Reading a binary file in Python

pwc101 · 04-26-2010, 11:33 AM

I'm trying to read in the header from a binary file format called XTF. The specification is in this document (pdf) http://www.tritonimaginginc.com/site...Format_X18.pdf.

Since this is my first attempt at reading binary files, and since my Python is pretty poor, I'm finding it difficult to interpret the information in the spec.

For example, the spec says:

Quote:

Originally Posted by XTF Header

2.4.1. Xtf File Header Layout
The XTF File header structure is described in Table C. The size is 1024 bytes. If more than six channels of data are to be logged in the XTF file, then the header can grow in increments of 1024 bytes to allow for additional CHANINFO structures are required.

So, the header is 1024 bytes long — sounds easy:

Code:

#!/usr/bin/env python

import sys

for file in sys.argv[1:]:
   f=open(file,"rb")
   s=f.read(1024)
   print s

This produces something slightly understandable:

Code:

{MaxView223$D:\lcusbl\calibration1309.xtfPort CM2 S/Sï¿½@Stbd CM2 S/Sï¿½@

The MaxView bit makes some sense - it's the recording program name, i.e. the software used to generate the file. 223 is the program's version (2.23). D:\lcusbl\calibration1309.xtf is the filename as it was being generated, again fair enough. The "Port CM2 S/S" and "Stbd CM2 S/S" entries belong to a separate structure in the header (ChanInfo). The spec is something like this:

Quote:

Originally Posted by Table C

Code:

XTFFILEHEADER 
Field                            Byte Offset Status Comment 
BYTE FileFormat                  0           M      Set to 123 (0x7B) 
BYTE SystemType                  1           M      Set to 1 
char RecordingProgramName[8]     2           M      Example: "Isis" 
char RecordingProgramVersion[8]  10          M      Example: "556" for version 5.56 
char SonarName[16]               18          R      Name of server used to access sonar.  Example: "C31_SERV.EXE" 
WORD SonarType                   34          M      0 = NONE , default. 
<snip>

My problem arises mainly because the information I get from this little python snippet doesn't describe all the information I was expecting. Notably, there's lots of information missing, which I presume is stored in the $ and ï¿½@ characters.

How do I go about correctly decoding these to get the information I need? Any pointers greatly appreciated.

Sergei Steshenko · 04-26-2010, 03:56 PM

Your link to spec doesn't work.

Anyway, your problem is not reading a binary file in Python, but extracting fields from the header you've read.

The table clearly states the field offsets and field lengths, so you need to extract a block of characters at each given offset and of the given length from the 's' variable.

I am not a Python guy; in Perl there is a 'substr' function which could easily be used for the task, and I guess Python has string manipulation functions that do the same.

...

I've tried

python string manipulation
python substring

- the latter in Yahoo produces this very first match:

http://www.tutorialspoint.com/python/python_strings.htm

which, I think, is what you need to extract fields.
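
For what it's worth, the Python counterpart of Perl's substr is slicing: s[offset:offset + length]. Below is a minimal sketch of that idea, assuming Python 3 (where read() on a file opened with "rb" returns bytes). The sample_header bytes and the text_field helper are made up for illustration, using the offsets from Table C; a real script would slice the s read from the file instead. Slicing like this suits the fixed-width char fields; multi-byte numeric fields still need decoding afterwards.

```python
# Hypothetical stand-in for the first 34 bytes of an XTF file header, built by
# hand from the Table C layout. A real script would use s = f.read(1024).
sample_header = (
    b"\x7b"                        # BYTE FileFormat, offset 0 (123, prints as "{")
    + b"\x01"                      # BYTE SystemType, offset 1
    + b"MaxView\x00"               # char RecordingProgramName[8], offset 2
    + b"223\x00\x00\x00\x00\x00"   # char RecordingProgramVersion[8], offset 10
    + b"\x00" * 16                 # char SonarName[16], offset 18
)


def text_field(data, offset, length):
    """Slice a fixed-width char field and keep only the part before the first NUL."""
    raw = data[offset:offset + length]
    return raw.split(b"\x00", 1)[0].decode("ascii", "replace")


program_name = text_field(sample_header, 2, 8)
program_version = text_field(sample_header, 10, 8)
file_format = sample_header[0]  # indexing bytes yields an int in Python 3

print(program_name, program_version, file_format)
```

This also accounts for the leading "{" in the printed header: it is simply byte 0x7B, the FileFormat value.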

pwc101 · 04-26-2010, 04:26 PM

Quote:

Originally Posted by Sergei Steshenko

Your link to spec doesn't work.

Grr. Here's the link http://www.tritonimaginginc.com/site...Format_X26.pdf.

Thanks, I'll have a look at manipulating strings in Python - the little I've done of it before seemed to be pretty straightforward. Famous last words...

perforser · 04-27-2010, 05:53 PM

Take a look at the struct module.

pwc101 · 04-28-2010, 04:01 AM

Quote:

Originally Posted by perforser

Take a look at the struct module.

This is what I ended up using, though it took me a little while to realise I could stack the format characters to match the length of the string. Also, adding a [0] to the end of struct.unpack() leaves a nice, usable string. Now that I've figured those things out, it's all working pretty well.
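
A rough sketch of those two points, assuming Python 3 and assuming the file is little-endian (hence the "<" prefix, which also switches off alignment padding). The sample bytes are fabricated with struct.pack to mirror the first six fields of Table C; the HEADER_START name is just for illustration.

```python
import struct

# Format for the first six Table C fields: two BYTEs, three char arrays, one WORD.
# "<" = little-endian, no padding (byte order is an assumption, not from the thread).
HEADER_START = "<BB8s8s16sH"

# Hypothetical sample data; a real script would use the bytes read from the file.
sample_header = struct.pack(HEADER_START, 123, 1, b"MaxView", b"223", b"", 0)

# A count in front of "s" reads one string of that many bytes...
name = struct.unpack("8s", sample_header[2:10])[0]
# ...whereas a count in front of "c" yields that many separate 1-byte values.
eight_chars = struct.unpack("8c", sample_header[2:10])

# unpack() always returns a tuple, which is why a single field needs [0].
# Several fields can also be unpacked in one call:
(file_format, system_type, prog_name,
 prog_version, sonar_name, sonar_type) = struct.unpack(
    HEADER_START, sample_header[:struct.calcsize(HEADER_START)])

# Fixed-width char fields are NUL-padded, so trim the padding before use.
prog_name = prog_name.rstrip(b"\x00").decode("ascii")
prog_version = prog_version.rstrip(b"\x00").decode("ascii")

print(file_format, system_type, prog_name, prog_version, sonar_type)
```

Unpacking a whole run of fields with one format string cuts down on the repetitive slicing, and struct.calcsize() gives a quick check that the format covers the expected number of bytes (36 here, matching the offset of the field after SonarType).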

It is, however, laborious work because the header's 1024 bytes long, and you can fit a lot of information in 1024 bytes, it seems!

Here's a sample, in case anyone ever needs an idea of what I'm talking about:

Code:

      PacketHeader=f.read(256)
      Magic=struct.unpack('bb',PacketHeader[0:2])[0] # FIXME: Wrong; should be 64206 (0xFACE)
      HeaderType=struct.unpack('b',PacketHeader[2:3])[0]
      SubChannelNumber=struct.unpack('b',PacketHeader[3:4])[0] # Unused
      NumChansToFollow=struct.unpack('bb',PacketHeader[4:6])[0] # Unused

Edit: The FIXME there is bugging me because it's the data magic number (0xFACE, which in decimal is 64206, I think). However, I cannot get the code to output either 0xFACE or 64206. Any ideas how to craft those bytes into the magic number? Everything else works fine with the format characters listed here.
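
One likely explanation: 'bb' reads two separate signed bytes, and [0] then keeps only the first of them. Reading the two bytes as a single unsigned 16-bit value ('H') should give the magic number. The sketch below assumes Python 3, little-endian byte order, and that Magic and NumChansToFollow are 2-byte fields (inferred from the [0:2] and [4:6] slices above); the packet_header bytes are invented for illustration.

```python
import struct

# Hypothetical first six bytes of a packet header: 0xFACE stored little-endian
# (low byte 0xCE first), then HeaderType, SubChannelNumber, NumChansToFollow.
packet_header = b"\xce\xfa\x00\x00\x02\x00"

# What the 'bb' version sees: two signed bytes, 0xCE -> -50 and 0xFA -> -6.
as_two_bytes = struct.unpack("bb", packet_header[0:2])

# "<" = little-endian, H = unsigned 16-bit, B = unsigned 8-bit.
magic, header_type, sub_channel, num_chans = struct.unpack(
    "<HBBH", packet_header[0:6])

print(as_two_bytes)       # (-50, -6)
print(hex(magic), magic)  # 0xface 64206
```

If the value comes out as 0xCEFA (52986) on real data, the byte order assumption is wrong for that file and ">H" is worth trying instead.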