LinuxQuestions.org - [SOLVED] Strange Characters reading .csv file in Python 2

I am trying to read a csv file in Python, but strange characters appear in the output.
This file is downloaded from a Solar Photovoltaic Array System.
The file looks fine in gedit, geany and vim, and I can import it into LibreOffice Calc with no problem.
Here are a couple of lines as shown in gedit, which is what it should look like:

Code:

20/09/2016 00:00:00;16901.962;0.000
20/09/2016 00:05:00;16901.962;0.000

However, when I try to read the file in Python using

Code:

_file = open(fl, 'rU')
for line in _file:
    print line

I get gaps between each character and extra lines:-

Code:

 2 0 / 0 9 / 2 0 1 6  0 0 : 0 0 : 0 0 ; 1 6 9 0 1 . 9 6 2 ; 0 . 0 0 0



 



 2 0 / 0 9 / 2 0 1 6  0 0 : 0 5 : 0 0 ; 1 6 9 0 1 . 9 6 2 ; 0 . 0 0 0

Without the 'rU' in open() (using just 'r'), I get only one odd character printed, followed by a blank line, instead of the actual line, even though I can see the line in the debugger. I am using PyCharm.
In LibreOffice Writer I get:-
#2#0#/#0#9#/#2#0#1#6# #0#0#:#0#0#:#0#0#;#1#6#9#0#1#.#9#6#2#;#0#.#0#0#0##
#2#0#/#0#9#/#2#0#1#6# #0#0#:#0#5#:#0#0#;#1#6#9#0#1#.#9#6#2#;#0#.#0#0#0##
What are all the hashes about? Is this some sort of strange encoding issue? I am using utf-8 encoding at the beginning of my script.
Thanks in advance
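
A minimal sketch of where the gaps and extra lines could come from, assuming (it is only a guess at this point) that the file is UTF-16 little-endian, so that every ASCII character is followed by a NUL byte. The sample bytes are built in the script to imitate one line; they are not read from the real file.

```python
# Sketch only: assumes the file is UTF-16 little-endian, which is a guess.
sample = u'20/09/2016 00:00:00;16901.962;0.000\r\n'
raw = sample.encode('utf-16-le')  # the bytes as they would sit on disk (no BOM)

# Every ASCII character is followed by a NUL byte, so the byte string is
# twice as long as the text. Printed as-is, the NULs can show up as gaps.
nul_count = raw.count(b'\x00')

# Splitting on byte-level line endings treats '\r' and '\n' as two separate
# line breaks, because a NUL byte sits between them.
pieces = raw.splitlines()

print(len(sample), len(raw), nul_count)
print(pieces)
```

If the assumption holds, one line of text turns into three byte-level pieces, which would match the extra lines seen with 'rU'.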

Answering my own post, I think this is definitely an encoding problem. The original .csv file comes from a Windows 10 machine.
Here is the string using repr(line):

Code:

'\x002\x000\x00/\x000\x009\x00/\x002\x000\x001\x006\x00 \x000\x000\x00:\x000\x000\x00:\x000\x000\x00;\x001\x006\x009\x000\x001\x00.\x009\x006\x002\x00;\x000\x00.\x000\x000\x000\x00\r\x00\n'
I think this would suggest UTF-16, but I am not sure.
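
One rough way to check the UTF-16 suspicion is to look at the raw bytes from the start of the file (opened in binary mode). The helper below is a hypothetical heuristic, not a library function: it looks for a byte-order mark first, then for NUL bytes in every second position, which is what ASCII text stored as UTF-16 would show. It should be fed bytes from the beginning of the file, since a line already split on '\n' starts in the middle of a two-byte unit.

```python
import codecs

def guess_utf16(raw):
    """Return a codec name if raw bytes look like UTF-16, else None.

    Heuristic only: checks for a byte-order mark, then for NUL bytes in
    every second position. The function name is made up for this sketch.
    """
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return 'utf-16'  # this codec reads the BOM to pick the byte order
    if len(raw) >= 2:
        even = raw[0::2]  # bytes at positions 0, 2, 4, ...
        odd = raw[1::2]   # bytes at positions 1, 3, 5, ...
        if odd.count(b'\x00') == len(odd) and even.count(b'\x00') == 0:
            return 'utf-16-le'
        if even.count(b'\x00') == len(even) and odd.count(b'\x00') == 0:
            return 'utf-16-be'
    return None

# Bytes built here to imitate the start of the file (an assumption).
sample = u'20/09/2016 00:00:00;16901.962;0.000\r\n'.encode('utf-16-le')
print(guess_utf16(sample))
print(guess_utf16(b'plain ascii text'))
```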
Here is the output from a line

Code:

2 0 / 0 9 / 2 0 1 6 0 0 : 0 0 : 0 0 ; 1 6 9 0 1 . 9 6 2 ; 0 . 0 0 0

I have tried all sorts of things with unicode.
Here are some things I have tried, with no change to the output (the variable 'line' is a line from the csv file):

Code:

codecs.encode(unicode(line), 'utf-8')
line.encode('utf-8')

Then, from a good presentation, I tried this function:

Code:

def to_unicode_or_bust(obj, encoding='utf-8'):
    if isinstance(obj, basestring):
        if not isinstance(obj, unicode):
            obj = unicode(obj, encoding)
    return obj

called as:

Code:

to_unicode_or_bust(line)

and the output is identical.
Anyone good on codecs?
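
A possible reason these attempts leave the output unchanged, sketched under the assumption that the bytes are UTF-16 little-endian as the repr suggests: a NUL byte is also a valid UTF-8 (and ASCII) character, so treating the data as 'utf-8' raises no error and simply keeps every NUL. Only a UTF-16 codec pairs the bytes up into single characters.

```python
# Sketch only: the bytes are built here to imitate one line of the file,
# assuming UTF-16 little-endian as the repr output suggests.
raw = u'20/09/2016 00:00:00;16901.962;0.000\r\n'.encode('utf-16-le')

as_utf8 = raw.decode('utf-8')       # no error: NUL is a valid UTF-8 character
as_utf16 = raw.decode('utf-16-le')  # each pair of bytes becomes one character

print(repr(as_utf8))
print(repr(as_utf16))
print(u'\x00' in as_utf8, u'\x00' in as_utf16)
```

Lines that were already split on '\n' while reading bytes are cut in the middle of a two-byte unit, so decoding is more reliable on the whole file than line by line.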

Perfect!!! Works a treat
Thank you so much, saves a load of faffing about.
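
The reply that solved this is not included above. One approach that fits the symptoms, assuming the file is UTF-16 with a byte-order mark, is to let io.open decode the file while reading; io.open is available in Python 2.6+ and Python 3. The helper name read_rows and the demo file are illustrative only and do not come from the thread.

```python
import io
import os
import tempfile

def read_rows(path, encoding='utf-16'):
    """Read a semicolon-separated file and return a list of field lists.

    Assumes the file is UTF-16 with a BOM; 'utf-16-le' may be needed if
    there is no BOM. The function name is made up for this sketch.
    """
    rows = []
    # io.open decodes to text and turns '\r\n' into '\n' while reading.
    with io.open(path, 'r', encoding=encoding) as handle:
        for line in handle:
            line = line.rstrip(u'\n')
            if line:
                rows.append(line.split(u';'))
    return rows

# Demo with a throwaway file that imitates the export (hypothetical data).
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
with io.open(demo_path, 'w', encoding='utf-16', newline=u'\r\n') as handle:
    handle.write(u'20/09/2016 00:00:00;16901.962;0.000\n')
    handle.write(u'20/09/2016 00:05:00;16901.962;0.000\n')

for row in read_rows(demo_path):
    print(row)
```

Decoding at the file level avoids the problem of splitting lines on single bytes before the two-byte units have been put back together.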