LinuxQuestions.org
Old 12-06-2016, 09:16 AM   #1
iFunction
Member
 
Registered: Nov 2015
Posts: 248

Rep: Reputation: Disabled
Reading iso-8859-1 file from command line


Hi there,

I am trying to write a Python script that will read multiple files to get some data out. The files are .VBO files used with the VBOX camera system for a car, but they are plain text files in the encoding named in the title (iso-8859-1). I can open them in any text editor, but I can't do a simple read of the file from the command line. This is what I tried:
Code:
line=$(head -n 1 ./test_vbo.vbo)
I have tried changing the file extension to .txt, but it didn't work (I didn't think it would, really). I have also tried to read it in Python:
Code:
infile = open('./test_vbo.vbo', 'r')
firstline = infile.readline()
and that was what made me realize it was an encoding issue. The following line of bash:
Code:
file -bi test_vbo.vbo
gave me the following output:
Code:
text/plain; charset=iso-8859-1
How can I use this, please?

How can I get to read this file without changing the file itself, as I have over a thousand of these to look at. I wanted to attach the file for people to look at, is that possible?
 
Old 12-06-2016, 02:15 PM   #2
MensaWater
LQ Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, CoreOS, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 7,831
Blog Entries: 15

Rep: Reputation: 1669
Perhaps the strings command? It extracts printable text from binary data and other file types,
e.g.

strings test_vbo.vbo
 
Old 12-06-2016, 02:22 PM   #3
iFunction
Member
 
Registered: Nov 2015
Posts: 248

Original Poster
Rep: Reputation: Disabled
Hi, thanks for your reply, I used the binary option in the end, that worked: 'rb'
 
Old 12-06-2016, 04:03 PM   #4
MensaWater
LQ Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, CoreOS, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 7,831
Blog Entries: 15

Rep: Reputation: 1669
Quote:
Originally Posted by iFunction View Post
Hi, thanks for your reply, I used the binary option in the end, that worked: 'rb'
Could you elaborate? I thought your question was how to extract text from the files. It appears rb is for receiving files not extracting text from them?
 
Old 12-07-2016, 12:13 AM   #5
iFunction
Member
 
Registered: Nov 2015
Posts: 248

Original Poster
Rep: Reputation: Disabled
Yes, I have about 1000 of these files that have time code and 'canbus' data from track cars, and I simply wanted to be able to trawl them to assess on-track running time. The 'rb' I mentioned is the mode argument to Python's open(), meaning "read binary". The files would not open with a standard text read, but they opened with a binary read:
Code:
f = open('/path/to/file', 'rb')
firstline = f.readline()
print(firstline)
This worked for what I needed. On this occasion, I couldn't get the file to read as a straightforward text file, though the files would open in any text editor, so there had to be some way to read them in Python. Reading in binary was the answer, and as we generate massive amounts of these files, this is a simple solution to an ever-expanding situation.
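For anyone who stays with 'rb', here is a minimal sketch of turning the bytes back into text. It assumes the content really is iso-8859-1. The sample line and the temporary file are made up for illustration and do not come from a real .vbo file.

```python
import os
import tempfile

# Hypothetical sample line; byte 0xB0 is the degree sign in ISO-8859-1.
sample = b"heading 90\xb0\n"

# Write the raw bytes to a temporary file standing in for a .vbo file.
with tempfile.NamedTemporaryFile(suffix=".vbo", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# Binary mode returns bytes, so nothing is decoded and nothing can fail to decode.
with open(path, "rb") as f:
    raw_line = f.readline()

# Decode explicitly to get an ordinary str that can be split into words.
text_line = raw_line.decode("iso-8859-1")
words = text_line.split()

print(raw_line)
print(words)

# Clean up the temporary file.
os.remove(path)
```

The extra decode step is the price of binary mode: every line has to be converted by hand before normal string handling works on it.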
 
1 members found this post helpful.
Old 12-08-2016, 06:46 AM   #6
iFunction
Member
 
Registered: Nov 2015
Posts: 248

Original Poster
Rep: Reputation: Disabled
Hi, I just want to clarify the solution, as I now have the correct one. Be aware that my terminology is not correct as I am still learning, but hopefully it might make it easier to read for the layman.

First of all, this refers to Python 3. Python 2 is apparently very forgiving of this issue, because its open() hands back raw byte strings without decoding them, whereas Python 3 decodes a text file as it reads it and will throw an error on a byte it can't decode. I have been learning Python 3 exclusively.

Although my above solution does indeed work, it makes using the file afterwards very tricky, as Python now reads the lines in as bytes. If the file is read in as a properly decoded text file, it is a lot easier (for newbies like myself) to work with, as each line is then read in as a string that can be split into words (by word I mean a string of characters between white space). So the solution was quite simple once I had established the correct encoding of the file. Note that Python's error message names the codec it tried and the byte it could not decode, not the file's actual encoding; the file -bi output in my first post is what pointed to the right one. With any luck it will be one of only a few:
The default, which Python 3 takes from your locale (usually UTF-8 on Linux, so not necessarily ASCII)
utf-8
and then the one that I had
iso-8859-1, also referred to as Latin-1
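To illustrate how these encodings differ, the following sketch decodes the same bytes three ways. The byte string is a made-up example, not taken from a real .vbo file, and try_decode is a hypothetical helper written only for this illustration.

```python
# Hypothetical bytes: "25°C" stored as ISO-8859-1, where 0xB0 is the degree sign.
data = b"25\xb0C"


def try_decode(raw, encoding):
    """Return (decoded text, None) on success or (None, the error) on failure."""
    try:
        return raw.decode(encoding), None
    except UnicodeDecodeError as exc:
        # exc.encoding is the codec that was attempted, not the data's real encoding.
        return None, exc


ascii_text, ascii_error = try_decode(data, "ascii")
utf8_text, utf8_error = try_decode(data, "utf-8")
latin1_text, latin1_error = try_decode(data, "iso-8859-1")

# The two failures report the codec that was tried and where it gave up.
print(ascii_error)
print(utf8_error)
# ISO-8859-1 maps every byte value to a character, so this decode cannot fail.
print(latin1_text)
```

Because iso-8859-1 never raises a decoding error, a successful decode does not prove the file really is iso-8859-1; it is still worth checking that the text looks right.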

So the correct way to open a text file for reading is to also state the encoding:
Code:
f = open('/path/to/file.py', 'r', encoding='iso-8859-1')
where:
f - the variable that the opened file object is assigned to.
'/path/to/file.py' - the path to the file to read; I hope this is obvious, though it wasn't to me when I first started.
'r' - the mode, in this case read (as text).
encoding='iso-8859-1' - the last bit is the subject of this post, so it should be self-explanatory.

Your data should now be fully accessible for any form of manipulation. I would just like to add that the file I was trying to open was not marked as a text file. It was a .vbo file, a custom file extension from a company that makes video equipment, but it is just a text file with all the extra data that is accumulated with the video file. Hence, if a file can be opened with a text editor and can be read as English, then the data can be extracted so long as the encoding is correct.
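For the thousand or so files mentioned at the start of the thread, the same open() call can be used in a loop without changing the files themselves. This is only a sketch: the function name first_lines, the '*.vbo' pattern and the demo file names and contents are assumptions for illustration, not anything defined by the VBOX format.

```python
import tempfile
from pathlib import Path


def first_lines(directory, pattern="*.vbo", encoding="iso-8859-1"):
    """Return {file name: first line} for every matching file in directory."""
    result = {}
    for path in sorted(Path(directory).glob(pattern)):
        # State the encoding explicitly so the locale default is never used.
        with open(path, "r", encoding=encoding) as f:
            result[path.name] = f.readline().rstrip("\r\n")
    return result


# Demo on throwaway files; the names and contents are made up.
with tempfile.TemporaryDirectory() as tmpdir:
    Path(tmpdir, "run1.vbo").write_bytes(b"lap 1 at 90\xb0\r\nmore data\r\n")
    Path(tmpdir, "run2.vbo").write_bytes(b"lap 2\r\n")
    demo = first_lines(tmpdir)

print(demo)
```

The files are only opened for reading, so nothing on disk is converted or rewritten.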


This article is well worth reading to explain how we got into the whole encoding mess in the first place and also explains why it is important:
https://www.joelonsoftware.com/2003/...ts-no-excuses/

Apologies to anyone who thinks this is massively oversimplified; it's just that this is the kind of thing I would have liked to have found when searching. I hope it helps somebody somewhere.

Kind regards
iFunction
 
1 members found this post helpful.
  

