Writing binary data in Python
I'm trying to implement the Huffman algorithm in Python. It works roughly like this: count the characters in a file, compute their probabilities, and then give the more probable characters shorter binary codes and the less probable characters longer ones.
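For reference, the code-building step described above can be sketched with a priority queue. This is a minimal sketch assuming Python 3; `huffman_codes` is a hypothetical helper name, not the poster's function. It repeatedly merges the two least frequent subtrees, prefixing '0' to one side's codes and '1' to the other's, so frequent characters end up with short codes.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Hypothetical helper: map each character of `text` to a '0'/'1' code string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a 1-bit code.
        return {next(iter(freq)): '0'}
    tie = count()  # unique tie-breaker so heapq never has to compare the dicts
    heap = [(n, next(tie), {ch: ''}) for ch, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)   # least frequent subtree
        n2, _, right = heapq.heappop(heap)  # second least frequent subtree
        merged = {ch: '0' + code for ch, code in left.items()}
        merged.update({ch: '1' + code for ch, code in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


codes = huffman_codes("abracadabra")
bits = ''.join(codes[ch] for ch in "abracadabra")
print(codes)
print(bits, len(bits))
```

The result is still a string of '0' and '1' characters, which is exactly the thing that has to be packed into real bytes before it goes to disk.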
I've written the functions that build the code for each character, but I can't write the code to the file. Python 2 seems to let me write binary values only as hex or octal digits, but I need to write bit sequences of variable length. I tried experimenting with Python 3. Here are the results I got:
Code:
>>> file=open("tryme","r+b")
So, how should I write the code into the file? |
Explicitly close the file after writing, or just do:
Code:
open("tryme","wb").write(b'0101010') |
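A sketch of the "close the file" advice, assuming Python 3: a `with` block closes (and therefore flushes) the file automatically, even if an exception is raised. Note that the file has to be opened in a write mode such as "wb" for `write()` to work; "rb" is read-only. As the next posts show, this writes seven ASCII characters, not seven bits.

```python
# "wb" = write, binary mode; the file is truncated or created.
with open("tryme", "wb") as f:
    f.write(b'0101010')

print(f.closed)  # True: leaving the with-block closed the file

# Read it back in binary mode to see exactly what is on disk.
with open("tryme", "rb") as f:
    data = f.read()
print(data)      # b'0101010'
```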
Quote:
Code:
file.seek(0); |
Thank you very much for your replies, but I still didn't get the result I wanted. I opened the file in a hex editor and saw that the characters were simply stored as ordinary text, not as bits.
Here is the file written with Python, as shown in hexedit:
Code:
00000000  30 31 30 31 30 31 30 31 30 31 30 31    010101010101 |
The problem is that b"101010" is still a string (of bytes): each '1' and '0' is stored as its ASCII character. The b prefix may be confusing you: it does not mean "convert this string to a binary byte".
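A quick way to see this in Python 3 (where `bytes.hex()` is available from 3.5 on): every element of a bytes literal is one whole byte, and here each byte is the ASCII code of '1' (0x31) or '0' (0x30), which matches the hex dump above.

```python
data = b"101010"

# Each element is one 8-bit byte holding an ASCII code.
print(list(data))  # [49, 48, 49, 48, 49, 48]
print(data.hex())  # 313031303130
print(len(data))   # 6: six bytes on disk, not six bits
```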
To convert a string containing '1' and '0' characters to a numerical value (an int), do this:
Code:
#!/usr/bin/env python
# Convert string to int using base 2
byte1 = int('110001', 2)  # 49, the ASCII code for '1'
# Write the int to the file as one raw byte
with open('/tmp/bytes.bin', 'wb') as f:
    f.write(bytearray([byte1]))
Code:
shell$ python ./binary.py
shell$ hexdump -C /tmp/bytes.bin |
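Extending the `int(..., 2)` trick to a whole code string: a minimal sketch, assuming Python 3 and a bit string whose length is already a multiple of 8 (the leftover-bits question comes up in the following posts). `bits_to_bytes` is a hypothetical helper name.

```python
def bits_to_bytes(bits):
    """Hypothetical helper: pack a '0'/'1' string into bytes, 8 bits per byte.

    Assumes len(bits) is a multiple of 8; padding must be handled separately.
    """
    if len(bits) % 8:
        raise ValueError("bit string length must be a multiple of 8")
    # Take 8 characters at a time and parse each slice as a base-2 integer.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


packed = bits_to_bytes('0011000101000001')
print(packed)  # b'1A': 0b00110001 == 49 == '1', 0b01000001 == 65 == 'A'
```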
Hko, thank you very much. But what if there aren't enough bits to complete an entire byte?
|
Quote:
Or did I get your question wrong? |
Hko, yes :)
Thank you once again, you've helped me very much. bgeddy, not really. The point is that if I write this file in binary, I waste at most one byte :) and that's not a big loss. I'm not writing each character as a separate number; I concatenate the bit strings for all the characters and then write the result to the file. What I fear most is that these spare bits could break my decoding algorithm, because I can't tell how many padding 0s I have to "clear" in order to decode the first letter correctly. |
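One common way to handle the spare bits, sketched here under stated assumptions (Python 3, a made-up file layout with one header byte): append the padding zeros at the end rather than the front, so the first bits in the file belong to the first character's code, and record how many padding bits were added. `encode_bits` and `decode_bits` are hypothetical names. Other schemes store the number of encoded symbols or use an end-of-stream symbol instead.

```python
def encode_bits(bits):
    """Pack a '0'/'1' string into bytes, prefixed by one header byte
    that records how many padding zeros were appended at the end."""
    pad = (-len(bits)) % 8          # 0..7 filler bits for the last byte
    bits += '0' * pad
    body = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return bytes([pad]) + body


def decode_bits(data):
    """Inverse of encode_bits: recover the original '0'/'1' string."""
    pad = data[0]                   # indexing bytes gives an int in Python 3
    # '08b' keeps the leading zeros of each byte.
    bits = ''.join(format(byte, '08b') for byte in data[1:])
    return bits[:len(bits) - pad]   # drop only the filler bits at the end


blob = encode_bits('10110')
print(blob)               # b'\x03\xb0'
print(decode_bits(blob))  # 10110
```

Slicing with `len(bits) - pad` rather than `-pad` matters: `bits[:-0]` would return an empty string when no padding was needed.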
True, so you obviously need to fill every byte completely, except possibly the last one.
Then only the last byte may contain unused bits (7 of them in the worst case). But there's no way around that anyway. Creating a string of '0' and '1' characters first and then converting it to (real binary) bytes is obviously an unneeded detour that uses more memory and CPU than necessary. But IMO it's OK to start like that, because it's easier to debug and see what is going on. You can always optimize the character strings away later. |
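As for optimizing the character strings away later, one possible approach is sketched below, assuming Python 3 and assuming each Huffman code is kept as an (integer value, bit length) pair instead of a '0'/'1' string. `BitWriter` is a hypothetical class, not a standard library API. It shifts bits into an integer accumulator and emits a byte every time 8 bits have collected.

```python
class BitWriter:
    """Hypothetical helper: collect variable-length codes as real bits."""

    def __init__(self):
        self.out = bytearray()
        self.acc = 0     # pending bits, oldest bit most significant
        self.nbits = 0   # number of pending bits (always < 8 between calls)

    def write(self, value, length):
        """Append the lowest `length` bits of `value`, high bit first."""
        for shift in range(length - 1, -1, -1):
            self.acc = (self.acc << 1) | ((value >> shift) & 1)
            self.nbits += 1
            if self.nbits == 8:          # a full byte is ready
                self.out.append(self.acc)
                self.acc = 0
                self.nbits = 0

    def flush(self):
        """Zero-pad the last partial byte; return (bytes, padding count)."""
        pad = (8 - self.nbits) % 8
        if self.nbits:
            self.out.append(self.acc << pad)
            self.acc = 0
            self.nbits = 0
        return bytes(self.out), pad


w = BitWriter()
w.write(0b101, 3)    # code '101'
w.write(0b10, 2)     # code '10'
data, pad = w.flush()
print(data, pad)     # b'\xb0' 3
```

The padding count returned by `flush()` is what the decoder needs to ignore the filler bits in the last byte.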
Hko, I thought the first bits would be 0. Isn't 1 == int('00000001', 2)?
|
Quote:
But I was trying to tell bgeddy that this does not necessarily happen on every byte... |
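On the leading-zeros question: `int('00000001', 2)` is indeed 1, because leading zeros carry no value. The catch is on the way back: `bin()` drops them again, so when decoding, each byte has to be expanded to exactly 8 digits, for example with `format(n, '08b')`. A short Python 3 illustration:

```python
n = int('00000001', 2)
print(n)                 # 1: leading zeros carry no value
print(bin(n))            # 0b1: bin() drops the leading zeros
print(format(n, '08b'))  # 00000001: always 8 digits, safe for decoding
```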