LinuxQuestions.org - [SOLVED] Writing binary data under python.

- - Writing binary data under python. (https://www.linuxquestions.org/questions/programming-9/writing-binary-data-under-python-718165/)

Writing binary data under python.

I'm trying to implement the Huffman algorithm in Python. The idea is to count the characters in a file and work out their probabilities; characters with a higher probability then get a shorter binary code, and characters with a lower probability get a longer one.
I've written the functions that build the code for each character, but I can't work out how to write the codes to the file.
Python 2 only lets me write binary data as hex or octal digits, but I need to write binary codes of variable length.
I tried experimenting with Python 3. Here are the results I got:

Code:

>>> file=open("tryme","r+b")
>>> file.write(b"01010102")
8
>>> file.read()
b''
>>> file.write(b"0101010")
7
>>> file.read()
b''

As far as I understand, every character of the string is converted to binary and then written to the file. But as you can see, the file still appears to be empty after writing.
So how should I write the codes to the file?
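For context, the code table described in the question could be built roughly as follows. This is only a sketch, assuming Python 3; the function name `huffman_codes` and the sample text are made up for illustration and are not the poster's actual functions.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return {symbol: bit string}; frequent symbols get shorter codes."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:  # degenerate case: only one distinct symbol
        return {symbol: '0' for symbol in counts}
    # Heap entries are (weight, tie_breaker, {symbol: code so far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(weight, index, {symbol: ''})
            for index, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, low = heapq.heappop(heap)    # two lightest subtrees
        w2, _, high = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in low.items()}
        merged.update({s: '1' + c for s, c in high.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


codes = huffman_codes('aaaaaaaabbbc')
print(sorted(codes.items()))  # [('a', '1'), ('b', '01'), ('c', '00')]
```

The codes come out as strings of '0' and '1' characters, which is exactly the form that still has to be turned into real bits before it is written to a file.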

Explicitly close the file after writing, or just do

Code:

open("tryme","wb").write(b'0101010')

Quote:

Originally Posted by dracuss (Post 3504460)

But anyway, as you can see, after writing, the file remains empty.

You need to seek to the beginning of the file before reading. The file position points to the location after what you wrote.

Code:

file.seek(0)

If you write multiple values to the file, you will need to seek to specific offsets.
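A minimal sketch of the seek advice, assuming Python 3. The temporary directory is an assumption of the example so that it does not touch an existing `tryme` file.

```python
import os
import tempfile

# Scratch file for the demonstration (hypothetical location).
path = os.path.join(tempfile.mkdtemp(), "tryme")
open(path, "wb").close()  # "r+b" requires the file to exist already

with open(path, "r+b") as f:
    f.write(b"0101010")    # the position is now 7, just past the data
    at_end = f.read()      # nothing left to read from here
    f.seek(0)              # rewind to the start of the file
    from_start = f.read()  # now the seven bytes come back

print(at_end)      # b''
print(from_start)  # b'0101010'
```

So the file was never empty in the original session; `read()` simply started at the end of what had just been written.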

Thank you very much for your replies, but I still didn't get the result I wanted. I opened the file in a hex editor and found that each '0' and '1' character was simply stored as its own byte (0x30 or 0x31).
Here is the file written with Python, viewed in hexedit:

Code:

00000000 30 31 30 31 30 31 30 31 30 31 30 31 010101010101

Can you help me write the binary code? I don't want the digits to be re-encoded. I want to write raw binary numbers of different lengths directly into the file.

The problem is that b"101010" is still a string. The 'b' may be confusing you: it does not mean "convert this string to a binary byte".

To convert a string containing '1' and '0' to a numerical value (int) do this:

Code:

# Convert string to int using base 2
# (2=binary, 10=dec, 16=hex etc.)
byte = int('101010', 2)
print(byte)          # 42
print('%X' % byte)   # print as hexadecimal: 2A

You could then write bytes to a file like this:

Code:

byte1 = int('110001', 2)    # ASCII for '1'
byte2 = int('110010', 2)    # ASCII for '2'
byte3 = int('110011', 2)    # ASCII for '3'
byte4 = int('11111111', 2)  # 255, 0xFF

# open() replaces the Python 2-only file(); the b'%c' form works on
# Python 2.6+ and 3.5+ and writes exactly one byte per value.
f = open('/tmp/bytes.bin', 'wb')
f.write(b'%c' % byte1)
f.write(b'%c' % byte2)
f.write(b'%c' % byte3)
f.write(b'%c' % byte4)
f.close()

Checking with hexdump:

Code:

shell$ hexdump -C /tmp/bytes.bin
00000000  31 32 33 ff    |123.|
00000004
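In Python 3 the same four bytes can also be built in one step with `bytes()`. This is a sketch rather than part of the original answer; the scratch path is an assumption and stands in for `/tmp/bytes.bin`.

```python
import os
import tempfile

# The same four values, parsed from strings of binary digits.
values = [int(bits, 2) for bits in ('110001', '110010', '110011', '11111111')]

# bytes() accepts an iterable of integers in range(256).
payload = bytes(values)

# Hypothetical scratch path used only for this example.
path = os.path.join(tempfile.mkdtemp(), 'bytes.bin')
with open(path, 'wb') as f:
    f.write(payload)

with open(path, 'rb') as f:
    written = f.read()

print(written)  # b'123\xff'
```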

Probably more efficient, and arguably closer to the way Python wants you to write binary data:

Code:

#!/usr/bin/env python

import array

# A Python array is like a restricted Python list
# for storing binary data.
data = array.array('B')  # create an array of unsigned bytes

data.append(int('1000001', 2))   # binary for ASCII 'A'
data.append(int('1000010', 2))   # binary for ASCII 'B'
data.append(int('1000011', 2))   # binary for ASCII 'C'
data.append(int('11111111', 2))  # 255, 0xFF

print(data)

# Write the whole array to a file in one go
f = open('/tmp/data.bin', 'wb')
data.tofile(f)
f.close()

Checking result:

Code:

shell$ python ./binary.py
array('B', [65, 66, 67, 255])
shell$ hexdump -C /tmp/data.bin
00000000  41 42 43 ff    |ABC.|
00000004

This page may also be of help.
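The result can also be checked from Python itself instead of hexdump. A small sketch, assuming Python 3 and a scratch path that is not part of the thread: it reads the file back with `array.fromfile()` and prints each byte as eight binary digits.

```python
import array
import os
import tempfile

# Hypothetical scratch path used only for this example.
path = os.path.join(tempfile.mkdtemp(), 'data.bin')

data = array.array('B', [0b1000001, 0b1000010, 0b1000011, 0b11111111])
with open(path, 'wb') as f:
    data.tofile(f)

# Read the file back into a fresh array; fromfile() needs the item count.
loaded = array.array('B')
with open(path, 'rb') as f:
    loaded.fromfile(f, os.path.getsize(path))

# format(value, '08b') renders a byte as eight binary digits.
bit_strings = [format(value, '08b') for value in loaded]
print(bit_strings)  # ['01000001', '01000010', '01000011', '11111111']
```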

Hko, thank you very much. But what if there are not enough bits to complete an entire byte?

Quote:

Originally Posted by dracuss (Post 3508154)

Hko, thank you very much. But what if there are not enough bits to complete an entire byte?

Then the rest of the bits in that 'half-byte' are 0, of course...
Or did I get your question wrong?

Quote:

Hko, thank you very much. But what if there are not enough bits to complete an entire byte?

This is why Huffman encoding using this method is nonsense. Every symbol will occupy at least one byte, so nothing is gained. Handling variable symbol lengths and their prefix codes requires a more complex algorithm than representing every code in a fixed number of bits.

Hko, yes :)
Thank you once again, you've helped me very much.
bgeddy, not really. The point is that if I write the file in binary, I lose at most 1 byte :) and that's not a big loss. I'm not writing every character as a separate number; I'm concatenating the binary strings for all the characters and then writing the result to the file. What I fear most is that the spare bits could ruin my decoding algorithm, because I can't tell how many 0s I have to "clear" in order to decode the first letter correctly.
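One common way around the spare-bits worry (not something proposed in the thread) is to pad at the end of the stream and record the number of padding bits in a header byte. A minimal sketch, assuming Python 3; `pack_bits`, `unpack_bits` and the sample code words are hypothetical names and values.

```python
def pack_bits(bit_string):
    """Pack a string of '0'/'1' characters into bytes.

    Layout assumed by this sketch: one header byte holding the number
    of zero bits appended at the end, followed by the packed data.
    """
    padding = (-len(bit_string)) % 8
    padded = bit_string + '0' * padding
    body = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return bytes([padding]) + body


def unpack_bits(blob):
    """Inverse of pack_bits(): return the original bit string."""
    padding = blob[0]
    bits = ''.join(format(byte, '08b') for byte in blob[1:])
    return bits[:len(bits) - padding]


encoded = '0' + '10' + '110' + '0' + '111'  # made-up code words, 10 bits
blob = pack_bits(encoded)
print(blob.hex())                    # 0659c0
print(unpack_bits(blob) == encoded)  # True
```

Because the first code word starts at the first bit of the first data byte, the decoder never has to guess how many zeros to skip at the front; it only drops the counted bits at the end.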

True, so you obviously need to fill them all, possibly except the last byte.
Then only the last byte may contain up to 7 (worst case) padding bits. But there's no way around that anyway.

Creating a string of '0' and '1' characters first, and then converting those to (real binary) bytes, is obviously an unneeded detour that uses more memory and CPU than necessary. But IMO it's OK to start like that: it's easier to debug and to see what is going on. You can always optimize the character strings away later.
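To illustrate what optimizing the character strings away might look like, here is a sketch of a small bit writer that shifts code words straight into an integer accumulator. It assumes Python 3; the `BitWriter` class and the sample code words are invented for this example.

```python
class BitWriter:
    """Collects variable-length code words into bytes using shifts."""

    def __init__(self):
        self.buffer = bytearray()
        self.accumulator = 0  # bits not yet flushed to the buffer
        self.bit_count = 0    # number of valid bits in the accumulator

    def write(self, value, length):
        """Append the lowest `length` bits of `value`, high bit first."""
        mask = (1 << length) - 1
        self.accumulator = (self.accumulator << length) | (value & mask)
        self.bit_count += length
        while self.bit_count >= 8:  # flush every complete byte
            self.bit_count -= 8
            self.buffer.append((self.accumulator >> self.bit_count) & 0xFF)
        self.accumulator &= (1 << self.bit_count) - 1

    def finish(self):
        """Zero-pad the last partial byte; return (data, padding_bits)."""
        padding = (8 - self.bit_count) % 8
        if self.bit_count:
            self.buffer.append((self.accumulator << padding) & 0xFF)
        self.accumulator = 0
        self.bit_count = 0
        return bytes(self.buffer), padding


writer = BitWriter()
# Made-up code words as (value, length) pairs: 0, 10, 110, 0, 111
for value, length in [(0b0, 1), (0b10, 2), (0b110, 3), (0b0, 1), (0b111, 3)]:
    writer.write(value, length)
data, padding = writer.finish()
print(data.hex(), padding)  # 59c0 6
```

Only the final byte carries padding, at most 7 bits, which matches the worst case mentioned above.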

Hko, I thought that the leading bits would be 0. Isn't 1 == int('00000001', 2)?

Quote:

Originally Posted by dracuss (Post 3508396)

Hko, I thought that the leading bits would be 0. Isn't 1 == int('00000001', 2)?

Yes, sure.
But I was trying to tell bgeddy that this does not necessarily happen in every byte...
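A short sketch of where the zeros end up, assuming Python 3 and a made-up 3-bit code word. Converting a short bit string with `int()` right-aligns it, so the zeros appear in front; padding the string on the right first keeps the code at the start of the byte.

```python
code = '101'  # hypothetical 3-bit code word

# int() ignores leading zeros, so a short code ends up right-aligned:
right_aligned = int(code, 2)
print(format(right_aligned, '08b'))  # 00000101

# Padding on the right keeps the code at the start of the byte:
left_aligned = int(code.ljust(8, '0'), 2)
print(format(left_aligned, '08b'))   # 10100000

print(int('00000001', 2) == 1)       # True
```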

Quote:

It's concatenating the binary string for all the characters and after that writing it in the file

Aha, problem solved then!

Quote:

But I was trying to tell bgeddy that this does not necessarily happen in every byte...

Thanks for the information..