[SOLVED] Strange Characters reading .csv file in Python 2
I am trying to read a csv file in Python, but there are strange characters about!!
This file is downloaded from a Solar Photovoltaic Array System. The file looks fine in gedit, geany and vim, and I can import it into LibreOffice Calc with no problem. Here is what it should look like in gedit:
Code:
20/09/2016 00:00:00;16901.962;0.000
I open the file in Python with:
Code:
_file = open(fl,'rU')
and a line printed from it looks like this:
Code:
2 0 / 0 9 / 2 0 1 6 0 0 : 0 0 : 0 0 ; 1 6 9 0 1 . 9 6 2 ; 0 . 0 0 0
In LibreOffice Writer I get:
#2#0#/#0#9#/#2#0#1#6# #0#0#:#0#0#:#0#0#;#1#6#9#0#1#.#9#6#2#;#0#.#0#0#0##
#2#0#/#0#9#/#2#0#1#6# #0#0#:#0#5#:#0#0#;#1#6#9#0#1#.#9#6#2#;#0#.#0#0#0##
What are all the hashes about? Is this some sort of strange encoding issue? I am using utf-8 encoding at the beginning of my script. Thanks in advance |
Answering my own post, I think this is definitely an encoding problem. The original .csv file comes from a Windows 10 machine.
Here is the string using repr(line):
Code:
'\\'\\x002\\x000\\x00/\\x000\\x009\\x00/\\x002\\x000\\x001\\x006\\x00 \\x000\\x000\\x00:\\x000\\x000\\x00:\\x000\\x000\\x00;\\x001\\x006\\x009\\x000\\x001\\x00.\\x009\\x006\\x002\\x00;\\x000\\x00.\\x000\\x000\\x000\\x00\\r\\x00\\n\\''
I think this would suggest UTF-16, but I am not sure. Here is the output from a line:
Code:
2 0 / 0 9 / 2 0 1 6 0 0 : 0 0 : 0 0 ; 1 6 9 0 1 . 9 6 2 ; 0 . 0 0 0
Here are some things I have tried, with no change to the output, where the variable 'line' is a line from the csv file:
Code:
codecs.encode(unicode(line),'utf-8')
Code:
def to_unicode_or_bust(
Code:
to_unicode_or_bust(line)
Anyone good on codecs? |
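A note on why the attempts above leave the output unchanged: the repr shows a \x00 byte next to every character, which is what UTF-16 little-endian text looks like when it is read as raw bytes. In Python 2, unicode(line) decodes with the default ASCII codec, and NUL is a valid ASCII character, so the NULs survive and re-encoding to UTF-8 changes nothing. The leading \x00 in the repr is also consistent with the raw bytes having been split on \n, which cuts the two-byte newline in half. Here is a minimal sketch of decoding with the matching codec instead. The sample bytes are rebuilt from the expected line rather than taken from the real file, and the names `expected`, `raw`, `text` and `fields` are chosen only for illustration:

```python
# Hypothetical sample: the expected line encoded the way the repr suggests
# (UTF-16 little-endian, no BOM). These are not bytes from the real file.
expected = u"20/09/2016 00:00:00;16901.962;0.000\r\n"
raw = expected.encode("utf-16-le")

# Every ASCII character is followed by a NUL byte, as in the repr output.
assert raw[:4] == b"2\x000\x00"

# Decoding with the matching codec removes the NULs; rstrip drops the CRLF.
text = raw.decode("utf-16-le").rstrip(u"\r\n")
fields = text.split(u";")
print(fields)
```

This only works on a whole line (or better, the whole file) with its byte pairs intact, so the decoding has to happen before the data is split into lines.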
dos2unix ?
|
Perfect!!! Works a treat
Thank you so much, saves a load of faffing about. |
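For anyone who would rather stay inside Python than convert the file first, here is a minimal sketch using io.open, which is available in Python 2.6+ and Python 3 and decodes the file before it is split into lines. It assumes the file is UTF-16 with a byte order mark, which is common for Windows exports but is not confirmed anywhere in this thread. The function name `read_rows` and the stand-in file are hypothetical:

```python
import io


def read_rows(path, encoding="utf-16"):
    """Return the rows of a semicolon-separated file as lists of text fields.

    The "utf-16" codec takes the byte order from the BOM. That the file
    has a BOM is an assumption; without one, try encoding="utf-16-le".
    """
    rows = []
    # io.open decodes first and splits into lines afterwards, so the
    # two-byte line endings are never cut in half.
    with io.open(path, "r", encoding=encoding) as handle:
        for line in handle:
            line = line.rstrip(u"\r\n")
            if line:
                rows.append(line.split(u";"))
    return rows


if __name__ == "__main__":
    import os
    import tempfile

    # Build a small stand-in file; the real export is not available here.
    sample = (u"20/09/2016 00:00:00;16901.962;0.000\r\n"
              u"20/09/2016 00:05:00;16901.962;0.000\r\n")
    fd, demo_path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "wb") as out:
        out.write(sample.encode("utf-16"))  # this codec writes a BOM
    print(read_rows(demo_path))
    os.remove(demo_path)
```

Splitting on ';' by hand keeps the sketch working on both Python versions; the fields come back as text, so the numeric columns still need float() applied.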