LinuxQuestions.org
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 03-29-2012, 03:45 AM   #1
justinlapin
LQ Newbie
 
Registered: Mar 2012
Posts: 2

Rep: Reputation: Disabled
write a file of data arrays (with float values) in binary format in Python


Hi,

I'm a new member of this forum; I hope someone can help me...

I am trying to write a file in binary format in Python.

My file is composed of several data arrays containing float values.
I first learned to write these data to a .txt file in ASCII format, which was not very complicated, and I thought it would be much the same for a binary file... but I have not succeeded! (I'm only a beginner...)

I open my file as :

data = open("Results2.txt", 'wb')

Then I do my calculations and write, for example:

data.todata(str(np.round(time[i-1], decimals=2)))
data.tofile(str((mean[i]))

Is it my use of "str" that is wrong here?

At first, to write in ASCII, I only did the following, and it worked well:

data.write(str(np.round(time, decimals=2)))


In ASCII format, an example of my results is:


Time    Mean of positions
0.0     -0.000287037441469
0.01    -0.00036335778945
0.02    -0.000540046100227
0.03    0.000936252721119
0.04    0.000614809409481


What's more, I will have a lot of data, and my tutor told me to write all my files in binary format. How can I do that?

Thank you very much for helping me...

Last edited by justinlapin; 03-29-2012 at 03:47 AM.
 
Old 03-29-2012, 06:08 AM   #2
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 640

Rep: Reputation: 375
Hi.

What exactly do you mean by "binary format"? If you only need to save disk space, use the np.savetxt() function with a file name ending in .gz. In that case your files will be gzipped (compressed), and np.loadtxt() handles such files transparently.
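For example (the file name and sample values here are just an illustration, using the numbers from the post above):

```python
import numpy as np

# Hypothetical sample data matching the ASCII results shown above.
time = np.array([0.0, 0.01, 0.02, 0.03, 0.04])
mean = np.array([-0.000287, -0.000363, -0.000540, 0.000936, 0.000615])

# A file name ending in .gz makes np.savetxt write gzip-compressed text.
np.savetxt('Results2.txt.gz', np.column_stack((time, mean)), fmt='%.6g')

# np.loadtxt decompresses such files transparently.
data = np.loadtxt('Results2.txt.gz')
```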

Last edited by firstfire; 03-29-2012 at 06:18 AM.
 
Old 03-29-2012, 02:36 PM   #3
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 943
Please use [CODE][/CODE] tags around your code and data examples, so they're easier to read.

Quote:
Originally Posted by justinlapin View Post
My file is composed of several data arrays containing float values [in binary format].
You use struct.pack() to pack the float data. For example:

Code:
import sys
import struct

data = [ 1.2, 1.5, 1.7 ]

handle = open('floats.bin', 'wb')
handle.write(struct.pack('<%df' % len(data), *data))
handle.close()

handle = open('doubles.bin', 'wb')
handle.write(struct.pack('<%dd' % len(data), *data))
handle.close()
In the above example, the pack statement evaluates to struct.pack('<3f', 1.2, 1.5, 1.7) for floats.bin file and to struct.pack('<3d', 1.2, 1.5, 1.7) for doubles.bin file. Check the struct format characters in the Python standard library documentation, and you'll see the pack format strings read as "in little-endian byte order, three IEEE-754 float (binary32) values" and "in little-endian byte order, three IEEE-754 double (binary64) values".

Reading the data is just as simple, except you use struct.unpack() instead. The only important point is that you need to know exactly how many values you want to unpack. I personally prefer to embed the count somewhere in the file header, but if the file contains pure float/double data, you can just divide the length of the data by 4 (for floats) or 8 (for doubles). Oh, and struct.unpack() always returns a tuple, which you probably want to convert to a list using list(); I do.
Code:
import sys
import struct

handle = open('floats.bin', 'rb')
datastr = handle.read()
data = list(struct.unpack('<%df' % (len(datastr) // 4), datastr))
sys.stdout.write('Read %d floats from floats.bin:\n' % len(data))
for i in data:
    sys.stdout.write('\t%g\n' % i)
handle.close()

handle = open('doubles.bin', 'rb')
datastr = handle.read()
data = list(struct.unpack('<%dd' % (len(datastr) // 8), datastr))
sys.stdout.write('Read %d doubles from doubles.bin:\n' % len(data))
for i in data:
    sys.stdout.write('\t%g\n' % i)
handle.close()
Finally, a couple of notes.

The above code is platform-agnostic. The < at the start of the struct format means the data will be little-endian in the file, even if the computer architecture is big-endian. According to the Python docs, it will also work even if the architecture uses a non-IEEE-754 floating-point format.

My code above reads and writes the entire data set at once, which wastes memory. You're better off slicing the data into, say, 131072-value units (yielding half a megabyte or a megabyte of data per slice), and packing-and-writing or reading-and-unpacking one slice at a time, to limit the amount of temporary storage needed. This becomes very important very suddenly when the data set grows. Obviously, if your entire data set is smaller than twice that, it is better to just do it all at once. (But please, do not just assume that "640k is enough for everyone", that "nobody will ever use this for that much data". That way of thinking leads to tools that break for no discernible reason when the dataset grows large enough, and that is extremely frustrating.)

(The I/O block size calculation is a bit of black magic. First, it should be a power of two, because all current filesystems use power-of-two block sizes, so non-power-of-two writes end up being much slower. Second, it should be large enough that the OS can handle the write optimally. Third, it should be small enough that both source and target data, during the conversion, fit comfortably in the CPU core cache. Thus, there is really no right answer, but in practical terms I find 131072/262144/524288/1048576/2097152/4194304/8388608-byte blocks to work best -- and they have done so for the last ten years or so, so I do not expect that to change anytime soon. The smaller end works better with desktop-type machines, and the larger end with server- and cluster-type machines.)

The above code snippets work in both Python 2 and Python 3.
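One possible sketch of that slicing approach (the chunk size and helper names here are my own, not a fixed recipe):

```python
import struct

CHUNK = 131072   # values per slice; a power of two, as discussed above

def write_doubles(filename, data):
    # Pack and write little-endian doubles one CHUNK-sized slice at a time,
    # so only one slice needs temporary storage at any moment.
    with open(filename, 'wb') as f:
        for start in range(0, len(data), CHUNK):
            piece = data[start:start + CHUNK]
            f.write(struct.pack('<%dd' % len(piece), *piece))

def read_doubles(filename):
    # Read and unpack little-endian doubles one CHUNK-sized slice at a time.
    values = []
    with open(filename, 'rb') as f:
        while True:
            buf = f.read(CHUNK * 8)          # 8 bytes per double
            if not buf:
                break
            values.extend(struct.unpack('<%dd' % (len(buf) // 8), buf))
    return values
```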

@firstfire: Reading floating-point values from text strings, and writing them as strings, is a very, very slow process. The rounding requirements in the relevant standards mean very precise intermediate values must be used. Even on high-end CPUs, the conversion speed is less than a million values per second!

With classical atomic simulations (molecular dynamics), you may have, say, ten million atoms in a system. To animate it in a video, you might save just the coordinates for, say, a hundred frames (yielding just a few seconds of animation). In practice, the simulation is measurably slowed down by the conversion of coordinates to text format, even when the system is distributed over eight CPU cores (or nodes). On the other hand, saving the coordinates in native binary format (thirty million doubles per frame, for 100 frames, is about 24 gigabytes of data) is only limited by the local I/O speed. I also prefer to do it in the background, so as long as the calculations take up enough time, and there is enough memory to keep one frame in memory (about 240 megabytes in this case), the save is essentially free. If I do the conversion to text, it still takes a large amount of CPU processing power away from the calculations, and it is very difficult to distribute it evenly to the computing cores -- meaning it is usually one core that does the job, with the rest sitting idle. What a waste.

Additionally, the binary data format ends up being easier to handle. You can slice out exactly the parts of the data you want, simply by seeking within the file; if it were text, you'd have to read the file from the start to find the exact row you want. Not much fun if the file is a couple of dozen gigabytes in size...
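For instance, fetching a single value by seeking looks roughly like this (the file name and helper are hypothetical; it assumes a file of pure little-endian doubles as written above):

```python
import struct

def read_double_at(filename, i):
    # Fetch the i-th little-endian double without reading anything before it.
    with open(filename, 'rb') as f:
        f.seek(i * 8)                        # 8 bytes per double
        return struct.unpack('<d', f.read(8))[0]

# Demonstration on a small file of four doubles.
with open('seekdemo.bin', 'wb') as f:
    f.write(struct.pack('<4d', 1.5, 2.5, 3.5, 4.5))

value = read_double_at('seekdemo.bin', 2)    # the third value
```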

Hope this helps,

Last edited by Nominal Animal; 03-29-2012 at 02:53 PM.
 
1 member found this post helpful.
Old 04-06-2012, 05:35 AM   #4
justinlapin
LQ Newbie
 
Registered: Mar 2012
Posts: 2

Original Poster
Rep: Reputation: Disabled
Thank you very much for your answers, and for the quality of them!

I tried a lot of different things and finally used the pickle module.

Nominal Animal, I will keep your whole answer in mind, because it could really help me with the rest of my work!

Thanks
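(For reference, the pickle approach mentioned above looks roughly like this; the data layout is just an illustration, not the poster's actual code:)

```python
import pickle

# Hypothetical results: lists of floats keyed by column name.
results = {'time': [0.0, 0.01, 0.02],
           'mean': [-0.000287037441469, -0.00036335778945, -0.000540046100227]}

# pickle.dump writes any Python object in a binary format...
with open('results.pkl', 'wb') as f:
    pickle.dump(results, f)

# ...and pickle.load reads it back unchanged.
with open('results.pkl', 'rb') as f:
    loaded = pickle.load(f)
```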
 
  

