LinuxQuestions.org
04-26-2010, 11:33 AM   #1
pwc101
Senior Member
Registered: Oct 2005
Location: UK
Distribution: Slackware
Posts: 1,847
Reading binary file in Python


I'm trying to read in the header from a binary file format called XTF. The specification is in this document (pdf) http://www.tritonimaginginc.com/site...Format_X18.pdf.

Since this is my first attempt at reading binary files, and since my Python is pretty poor, I'm finding it difficult to interpret the information in the spec.

For example, the spec says:
Quote:
Originally Posted by XTF Header
2.4.1. Xtf File Header Layout
The XTF File header structure is described in Table C. The size is 1024 bytes. If more than six channels of data are to be logged in the XTF file, then the header can grow in increments of 1024 bytes to allow for additional CHANINFO structures are required.
So, the header is 1024 bytes long — sounds easy:
Code:
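#!/usr/bin/env python

import sys

for path in sys.argv[1:]:
   # Open in binary mode so the bytes are read exactly as stored
   with open(path, "rb") as f:
      s = f.read(1024)  # the first 1024 bytes: the XTF file header
   print(s)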
This produces something slightly understandable:
Code:
{MaxView223$D:\lcusbl\calibration1309.xtfPort CM2 S/S�@Stbd CM2 S/S�@
The MaxView bit makes some sense: it's the recording program name, i.e. the software used to generate the file. 223 is the program's version (2.23). D:\lcusbl\calibration1309.xtf is the filename as it was being generated, which is fair enough. The "Port CM2 S/S" and "Stbd CM2 S/S" entries come from separate structures in the header (ChanInfo). The spec is something like this:
Quote:
Originally Posted by Table C
Code:
XTFFILEHEADER
Field                            Byte Offset Status Comment
BYTE FileFormat                  0           M      Set to 123 (0x7B)
BYTE SystemType                  1           M      Set to 1
char RecordingProgramName[8]     2           M      Example: "Isis"
char RecordingProgramVersion[8]  10          M      Example: "556" for version 5.56
char SonarName[16]               18          R      Name of server used to access sonar. Example: "C31_SERV.EXE"
WORD SonarType                   34          M      0 = NONE, default.
<snip>
My problem arises mainly because the output of this little Python snippet doesn't show all the information I was expecting. Notably, there's lots of information missing, which I presume is stored in the $ and �@ characters.

How do I go about correctly decoding these to get the information I need? Any pointers greatly appreciated.
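One way to see what sits behind characters such as $ and �@ is to print the bytes as numbers instead of as text. The following is a minimal Python 3 sketch, not taken from the thread: the sample bytes are made up to match the first rows of Table C and are not read from a real XTF file.

```python
# Hypothetical sample: FileFormat, SystemType, then the start of the program name.
sample = bytes([0x7B, 0x01]) + b"MaxView\x00"

# Printing every byte as hex also shows the values that have no printable character.
print(" ".join("%02x" % b for b in sample))

# The first two fields in Table C are single bytes, so indexing is enough.
file_format = sample[0]   # 123 (0x7B), which is the "{" seen at the start of the output
system_type = sample[1]   # 1, a control character that prints as nothing visible
print(file_format, system_type)

# A printable character in the dump is just a byte value too: "$" is 36.
print(ord("$"))
```

Seen this way, the leading "{" in the printed header is simply the FileFormat byte 123, and the unprintable characters are numeric fields rather than missing data.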
 
04-26-2010, 03:56 PM   #2
Sergei Steshenko
Senior Member
Registered: May 2005
Posts: 4,481

Your link to spec doesn't work.

Anyway, your problem is not reading binary file in Python, but extracting fields from the header you've read.

The table clearly states field offsets and field lengths, so you need to extract blocks of characters at the given offsets and with the given lengths from the 's' variable.

I am not a Python guy; in Perl there is a 'substr' function which could easily be used for the task, and I guess Python has string manipulation functions that do the same.

...

I've tried searching for

python string manipulation
python substring

- the latter, in Yahoo, gives this as its very first match:

http://www.tutorialspoint.com/python/python_strings.htm

which, I think, is what you need to extract fields.
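In Python the counterpart of Perl's substr is slice notation, s[offset:offset + length]. Below is a minimal Python 3 sketch using two offsets from Table C. The header is built by hand as a stand-in for f.read(1024), the helper name field is made up for illustration, and NUL padding of the text fields is an assumption.

```python
# Stand-in for s = f.read(1024): a zero-filled header with two fields set.
s = bytearray(1024)
s[2:10] = b"MaxView\x00"                 # RecordingProgramName: offset 2, 8 bytes
s[10:18] = b"223\x00\x00\x00\x00\x00"    # RecordingProgramVersion: offset 10, 8 bytes
s = bytes(s)

def field(data, offset, length):
    """Return the fixed-width text field at offset, cut at the first NUL byte."""
    raw = data[offset:offset + length]
    return raw.split(b"\x00", 1)[0].decode("ascii")

program = field(s, 2, 8)
version = field(s, 10, 8)
print(program, version)
```

Slicing works well for text fields. Numeric fields such as WORD SonarType still need to be converted from bytes to integers afterwards.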
 
04-26-2010, 04:26 PM   #3
pwc101
Senior Member
Registered: Oct 2005
Location: UK
Distribution: Slackware
Posts: 1,847

Original Poster
Quote:
Originally Posted by Sergei Steshenko View Post
Your link to spec doesn't work.
Grr. Here's the link http://www.tritonimaginginc.com/site...Format_X26.pdf.
Thanks, I'll have a look at manipulating strings in Python - the little I've done of it before seemed to be pretty straightforward. Famous last words...
 
04-27-2010, 05:53 PM   #4
perforser
LQ Newbie
Registered: Nov 2008
Posts: 4

Take a look at the struct module.
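As a rough illustration of that suggestion, here is a Python 3 sketch that unpacks the six fields listed in Table C with a single format string. The header bytes are built by hand as a stand-in for f.read(1024), and the "<" prefix (little-endian, no padding) is an assumption that the table shown in the thread does not state.

```python
import struct

# Format for the first six fields of Table C:
# B = unsigned byte, 8s / 16s = fixed-width byte strings, H = unsigned 16-bit WORD.
HEADER_START = struct.Struct("<BB8s8s16sH")

# Hand-built stand-in for f.read(1024); pack() pads short strings with NUL bytes.
header = HEADER_START.pack(123, 1, b"MaxView", b"223", b"", 0).ljust(1024, b"\x00")

(file_format, system_type, program, version,
 sonar_name, sonar_type) = HEADER_START.unpack_from(header, 0)

print(file_format, system_type, program.rstrip(b"\x00"), version.rstrip(b"\x00"))
```

The format covers 36 bytes, which matches the table: SonarType starts at offset 34 and is two bytes wide.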
 
04-28-2010, 04:01 AM   #5
pwc101
Senior Member
Registered: Oct 2005
Location: UK
Distribution: Slackware
Posts: 1,847

Original Poster
Quote:
Originally Posted by perforser View Post
Take a look at the struct module.
This is what I ended up using, though it took me a little while to realise I could stack the format characters to match the length of the string. Also, adding a [0] to the end of struct.unpack() leaves a nice, usable string. Now that I've figured those things out, it's all working pretty well.
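For anyone following along, a small Python 3 sketch of those two points. The 8-byte value is made up, and the repeat count behaves differently for "s" than for other format characters.

```python
import struct

raw = b"MaxView\x00"  # made-up 8-byte field

# "8s" yields one item: the whole field as a single bytes object.
as_string = struct.unpack("8s", raw)
# "8c" yields eight items, each a bytes object of length one.
as_chars = struct.unpack("8c", raw)

print(len(as_string), len(as_chars))

# unpack() always returns a tuple, so [0] picks out the value itself.
name = as_string[0].rstrip(b"\x00").decode("ascii")
print(name)
```

With a format such as "bb", the [0] index returns only the first of the two unpacked values, so the second byte is silently dropped.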

It is, however, laborious work because the header is 1024 bytes long, and you can fit a lot of information in 1024 bytes, it seems!

Here's a sample, in case anyone ever needs an idea of what I'm talking about:
Code:
      PacketHeader=f.read(256)
      Magic=struct.unpack('bb',PacketHeader[0:2])[0] # FIXME: Wrong; should be 64206 (0xFACE)
      HeaderType=struct.unpack('b',PacketHeader[2:3])[0]
      SubChannelNumber=struct.unpack('b',PacketHeader[3:4])[0] # Unused
      NumChansToFollow=struct.unpack('bb',PacketHeader[4:6])[0] # Unused
Edit: The FIXME there is bugging me because it's the data magic number (0xFACE, which in decimal is 64206, I think). However, I cannot get the code to output either 0xFACE or 64206. Any ideas how to craft those bytes into the magic number? Everything else works fine with the format characters listed here.
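A possible fix, sketched in Python 3 and assuming the file is little-endian (the thread does not say so): the two bytes have to be read as one unsigned 16-bit integer with "H", not as two signed bytes with "bb". The packet header is built by hand here and does not come from a real file.

```python
import struct

# Hand-built stand-in for PacketHeader[0:6]; 0xFACE is stored low byte first.
packet_header = struct.pack("<HBBH", 0xFACE, 0, 0, 1)

# "bb" reads two signed bytes, and [0] keeps only the first one.
wrong = struct.unpack("bb", packet_header[0:2])[0]

# "<H" reads the same two bytes as one little-endian unsigned 16-bit integer.
magic = struct.unpack("<H", packet_header[0:2])[0]

print(wrong, magic, hex(magic))
```

The same change would apply to NumChansToFollow, which the snippet also reads with 'bb' and [0]. If the value comes out as 52986 (0xCEFA), the byte order assumption is wrong and ">H" is the one to try.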

Last edited by pwc101; 04-28-2010 at 04:10 AM.
 
  

