LinuxQuestions.org


dracuss 04-10-2009 06:09 AM

Writing binary data under python.
 
I'm trying to implement the Huffman algorithm in Python. It works by counting the characters of a file and finding their probabilities; the characters with higher probability then get a shorter binary code, and the characters with lower probability a longer one.
I've written the functions that build the code for each character, but I cannot write the codes to the file.
Python 2 only lets me write binary data as hex or octal digits, but I need to write binary codes of variable length.
I tried experimenting with Python 3. Here are the results I got:
Code:

>>> file=open("tryme","r+b")
>>> file.write(b"01010102")
8
>>> file.read()
b''
>>> file.write(b"0101010")
7
>>> file.read()
b''

As far as I understand, every character of the string is converted to binary and then written to the file. But anyway, as you can see, after writing, the file remains empty.
So, how should I write the code into the file?
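
For readers following along, the code-building step described at the top of this post (frequent characters get shorter codes) can be sketched in Python 3 with the standard heapq module. This is only an illustration, not dracuss's actual functions; the name huffman_codes is made up here.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a dict mapping each character of text to a '0'/'1' string."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {char: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(n, i, {ch: ""}) for i, (ch, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


codes = huffman_codes("aaaabbc")
print(codes)
```

The result is a mapping from characters to strings of '0' and '1'; turning those strings into real bits in a file is the problem the rest of the thread deals with.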

ghostdog74 04-10-2009 06:22 AM

Explicitly close the file after writing, or just do:
Code:

open("tryme","wb").write(b'0101010')
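
A minimal Python 3 sketch of the "explicitly close" advice, using a with block so the file is closed (and its buffer flushed) automatically. The file name tryme is from the thread; putting it in a temporary directory is just an assumption so the example does not touch real files.

```python
import os
import tempfile

# Hypothetical location: the thread's file name inside a fresh temp directory.
path = os.path.join(tempfile.mkdtemp(), "tryme")

# Leaving the with block closes the file, which flushes the written data.
with open(path, "wb") as f:
    f.write(b"0101010")

# Reopen for reading to confirm the data reached the disk.
with open(path, "rb") as f:
    contents = f.read()

print(contents)  # seven ASCII characters, not seven bits
```

Note that this still stores one byte per '0' or '1' character; it only addresses the "file remains empty" symptom.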

David1357 04-10-2009 07:20 AM

Quote:

Originally Posted by dracuss (Post 3504460)
But anyway, as you can see, after writing, the file remains empty.

You need to seek to the beginning of the file before reading. The file position points to the location after what you wrote.
Code:

file.seek(0)

If you write multiple values to the file, you will need to seek to specific offsets.
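
A short Python 3 sketch of the point about the file position. The temporary path is an assumption for illustration; the rest mirrors the session from the first post.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "tryme")
open(path, "wb").close()  # "r+b" requires the file to exist already

with open(path, "r+b") as f:
    f.write(b"0101010")
    at_end = f.read()       # position is just past the written data
    f.seek(0)               # rewind to the beginning
    from_start = f.read()
    f.seek(2)               # jump to a specific byte offset
    middle = f.read(3)

print(at_end, from_start, middle)
```

Here at_end is empty, which is the same b'' seen in the original session, while from_start holds everything that was written.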

dracuss 04-13-2009 03:41 AM

Thank you very much for your replies, but I still didn't get the results I wanted. I've opened the file with a hex editor and I see that the characters were simply encoded as bytes, one byte per '0' or '1' character.
Here is the file written with Python, viewed in hexedit:
Code:

00000000  30 31 30 31  30 31 30 31  30 31 30 31              010101010101

Can you help me write the binary code? I don't need the digits to be re-encoded. I want to write raw binary numbers of different lengths directly into the file.

Hko 04-13-2009 05:53 AM

The problem is that b"101010" is still a string. The 'b' may be confusing you: it does not mean "convert this string to a binary byte".

To convert a string containing '1' and '0' to a numerical value (int) do this:
Code:

# Convert string to int using base 2
# (2=binary, 10=dec, 16=hex etc..)
byte = int('101010', 2)
print byte
print '%X' % byte  # print as hexadecimal

You could then write bytes to a file like this:
Code:

byte1 = int('110001', 2) # ascii for '1'
byte2 = int('110010', 2) # ascii for '2'
byte3 = int('110011', 2) # ascii for '3'
byte4 = int('11111111', 2) # 255, 0xFF

f = file('/tmp/bytes.bin', 'wb')
f.write('%c' % byte1)
f.write('%c' % byte2)
f.write('%c' % byte3)
f.write('%c' % byte4)
f.close()

Checking with hexdump:
Code:

shell$ hexdump -C /tmp/bytes.bin
00000000  31 32 33 ff    |123.|
00000004

Probably more efficient, and better (arguably closer to the way Python wants you to write binary data):
Code:

#!/usr/bin/env python

import array

# A Python array is like a restricted Python list
# for storing binary data.
#
data = array.array('B')  # create array of bytes.

data.append(int('1000001', 2)) # binary for ascii 'A'
data.append(int('1000010', 2)) # binary for ascii 'B'
data.append(int('1000011', 2)) # binary for ascii 'C'
data.append(int('11111111', 2)) # 255, 0xFF

print data

# Write the array at once to a file
#
f = file('/tmp/data.bin', 'wb')
data.tofile(f)
f.close()

Checking result:
Code:

shell$ python ./binary.py
array('B', [65, 66, 67, 255])
shell$ hexdump -C /tmp/data.bin
00000000  41 42 43 ff    |ABC.|
00000004

This page may also be of help.
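
The examples in this post are Python 2 (print statement, file() builtin). Since the original question experimented with Python 3, here is a sketch of the same idea there, assuming the built-in bytes type is acceptable in place of array. The temporary path is made up for the example.

```python
import os
import tempfile

# Same four values as above: 'A', 'B', 'C' and 0xFF.
values = [
    int("1000001", 2),
    int("1000010", 2),
    int("1000011", 2),
    int("11111111", 2),
]
data = bytes(values)  # each int in range(256) becomes exactly one byte

path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(data)

with open(path, "rb") as f:
    raw = f.read()

print(raw)  # b'ABC\xff'
```

In Python 3 a file opened in binary mode only accepts bytes-like objects, so the '%c' string formatting trick from the first example would not work there unchanged.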

dracuss 04-14-2009 03:33 AM

Hko, thank you very much. But what if there are not enough bits to complete an entire byte?

Hko 04-14-2009 04:34 AM

Quote:

Originally Posted by dracuss (Post 3508154)
Hko, thank you very much. But what if there are not enough bits to complete an entire byte?

Then the rest of the bits in that 'half-byte' are 0 of course...
Or did I get your question wrong?

bgeddy 04-14-2009 06:54 AM

Quote:

Hko, thank you very much. But what if there are not enough bits to complete an entire byte?

This is why Huffman encoding using this method is nonsense. All symbols will occupy a minimum of one byte, so nothing is gained. Handling variable symbol lengths and their prefix codes requires a more complex algorithm than just representing all codes in a fixed number of bits.

dracuss 04-14-2009 07:48 AM

Hko, yes :)
Thank you once again, you've helped me very much.
bgeddy, not really. The point is that if I write this file as binary, I lose at most 1 byte :) and that's not a really big loss. It's not like writing every character as a number. It's concatenating the binary string for all the characters and after that writing it in the file. What I fear most is that these spare bits could ruin my decoding algorithm, because I cannot find out how many 0s I have to "clear" in order to decode the first letter correctly.

Hko 04-14-2009 07:54 AM

True, so you obviously need to fill them all, possibly except the last byte.
Then only the last byte may contain up to 7 (worst case) unused bits. But there's no way around that anyway.

Creating a string of '0' and '1' characters first, and then converting those to (real binary) bytes is obviously an unneeded detour using more memory and CPU than necessary. But IMO it's OK to start like that: it's easier to debug and see what is going on. You can always optimize the character strings away later.
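
The string-first approach described above could look roughly like this in Python 3. It also shows one possible answer to the "how many 0s do I clear" worry: store the number of padding bits in a one-byte header. That header is an assumption of this sketch, not something proposed in the thread, and pack_bits / unpack_bits are made-up names.

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' characters into bytes.

    The first output byte records how many zero bits were appended
    to fill up the final byte (0 to 7).
    """
    pad = (8 - len(bits) % 8) % 8
    padded = bits + "0" * pad
    body = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return bytes([pad]) + body


def unpack_bits(data):
    """Inverse of pack_bits: recover the original '0'/'1' string."""
    pad = data[0]
    bits = "".join(format(byte, "08b") for byte in data[1:])
    return bits[:len(bits) - pad]


packed = pack_bits("1011001")      # 7 bits, so 1 padding bit is added
restored = unpack_bits(packed)
print(packed, restored)
```

Here the padding goes at the end of the last byte, so every byte before it is completely filled and the decoder can drop exactly the recorded number of trailing bits.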

dracuss 04-14-2009 08:10 AM

Hko, I thought that the first bits would be 0. Isn't 1 the same as int('00000001', 2)?

Hko 04-14-2009 08:25 AM

Quote:

Originally Posted by dracuss (Post 3508396)
Hko, I thought that the first bits would be 0. Isn't 1 the same as int('00000001', 2)?

Yes, sure.
But I was trying to tell bgeddy that does not necessarily happen on every byte...
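
A quick Python 3 check of the leading-zeros point, with a contrast case added here for illustration: zeros placed in front of the digits do not change the value, while zeros appended after them do.

```python
one = int("00000001", 2)
same = int("1", 2)

# format(..., "08b") shows the value as a full, zero-padded byte.
as_byte = format(one, "08b")

# Zeros added on the right shift the bits and change the value.
right_padded = int("1" + "0000000", 2)

print(one, same, as_byte, right_padded)
```

So when a short code is converted with int() on its own, the unused bits end up at the front of the byte, whereas a concatenated bit stream that is padded at the end leaves them at the back of the last byte only.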

bgeddy 04-14-2009 08:57 AM

Quote:

It's concatenating the binary string for all the characters and after that writing it in the file

Aha - problem solved then!

Quote:

But I was trying to tell bgeddy that does not necessarily happen on every byte...

Thanks for the information.


All times are GMT -5.