Hi, I just want to clarify the solution as I now have the correct one. Be aware that my terminology may not be correct as I am still learning, but hopefully that might make it easier to read for the layman.
First of all, this refers to Python 3. Python 2 is apparently very forgiving of this issue and will simply not print a character it can't decipher, whereas Python 3 is very strict and will throw an error. I have been learning Python 3 exclusively.
Although my above solution does indeed work, it makes using the file very tricky afterwards, as Python will now read the lines in as bytes. If the file is read in as a properly encoded text file, it is a lot easier (for newbies like myself) to work with, as each line is then read in as a string that splits into words (by word I mean a string of characters between white space). So the solution was indeed quite simple once I had established the correct encoding of the file. The error message names the codec that failed on the file, which at least tells you which encoding it is not, and with any luck the right one is likely to be one of only a few:
The default, which depends on your system (I believed it was ASCII, but Python 3 uses the locale's preferred encoding, often utf-8 on Linux and macOS)
utf-8
and then the one that I had
iso-8859-1 also referred to as Latin-1
So the correct way to open a text file for reading is to also state the encoding:
Code:
f = open('/path/to/file.py', 'r', encoding='iso-8859-1')
where:
f - the variable that the opened file object is assigned to.
'/path/to/file.py' - the path to the file, which I hope is obvious, though it wasn't to me when I first started.
'r' - the mode, in this case read.
encoding='iso-8859-1' - the last bit is the subject of this post, so it should be self-explanatory.
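To show the difference between reading bytes and reading properly decoded text, here is a minimal sketch. The sample line and the temporary file are made up for illustration (they are not from my actual .vbo file), and I have used "with", which closes the file for you when the block ends.

```python
import os
import tempfile

# Hypothetical sample: one line of text containing a non-ASCII character.
sample = 'café 12.5 34.0\n'

# Write the sample to a temporary file as iso-8859-1 (Latin-1) bytes.
with tempfile.NamedTemporaryFile(delete=False, suffix='.vbo') as tmp:
    tmp.write(sample.encode('iso-8859-1'))
    path = tmp.name

# Binary mode: each line comes back as a bytes object.
with open(path, 'rb') as f:
    raw_line = f.readline()

# Text mode with the right encoding: each line comes back as a str.
with open(path, 'r', encoding='iso-8859-1') as f:
    text_line = f.readline()

os.remove(path)  # tidy up the temporary file

raw_words = raw_line.split()    # a list of bytes objects
text_words = text_line.split()  # a list of ordinary strings
print(raw_words)
print(text_words)
```

The second list is the one that is easy to work with, as every item is a normal string.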
Your data should now be fully accessible for any form of manipulation. I would just like to add that the file I was trying to open was not marked as a text file. It was a .vbo file, a custom file extension from a company that makes video equipment, but it is just a text file holding all the extra data that is accumulated with the video file. Hence, if a file can be opened with a text editor and can be read as English, then the data can be extracted so long as the encoding is correct.
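If you are not sure which of the few likely encodings a file uses, one approach is to try them in turn. This is only a rough sketch: read_text_with_fallback is a name I made up, and the sample file is invented for the demo. Note that iso-8859-1 can decode any byte at all, so it has to come last, and getting no error from it does not prove it is the right encoding; you still need to look at the text.

```python
import os
import tempfile

def read_text_with_fallback(path, encodings=('utf-8', 'iso-8859-1')):
    """Hypothetical helper: return (text, encoding) for the first
    candidate encoding that decodes the whole file without an error."""
    for enc in encodings:
        try:
            with open(path, 'r', encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError:
            continue  # this codec failed, so try the next candidate
    raise ValueError('none of the candidate encodings could decode ' + path)

# Demo with an invented file: the degree sign is a single byte in
# iso-8859-1, and that byte is not valid utf-8 on its own.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write('speed 50 km/h at 20°C\n'.encode('iso-8859-1'))
    demo_path = tmp.name

text, used = read_text_with_fallback(demo_path)
os.remove(demo_path)  # tidy up the temporary file

print(used)
print(text)
```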
This article is well worth reading to explain how we got into the whole encoding mess in the first place and also explains why it is important:
https://www.joelonsoftware.com/2003/...ts-no-excuses/
Apologies to anyone who thinks this is massively oversimplified. It's just that this is the kind of thing I would have liked to have found when searching. I hope it helps somebody somewhere.
Kind regards
iFunction