LinuxQuestions.org
Old 04-10-2009, 06:09 AM   #1
dracuss
Member
 
Registered: May 2006
Location: Chisinau, Moldova
Distribution: Gentoo, Debian sid
Posts: 151

Rep: Reputation: 29
Writing binary data under python.


I'm trying to implement the Huffman algorithm in Python. The idea is to count the characters of a file and find their probabilities; the characters with higher probability then get a shorter binary code, and the characters with lower probability get a longer one.
I've written the functions that build the code for each character, but I cannot write the code to the file.
Python 2 only supports writing binary data as hex or octal digits, but I need to write binary codes of variable length.
I tried experimenting with Python 3. Here are the results I got:
Code:
>>> file=open("tryme","r+b")
>>> file.write(b"01010102")
8
>>> file.read()
b''
>>> file.write(b"0101010")
7
>>> file.read()
b''
As far as I understand, every character of the string is converted to its binary representation and then written to the file. But anyway, as you can see, after writing, the file remains empty.
So, how should I write the code into the file?

Last edited by dracuss; 04-10-2009 at 06:15 AM.
 
Old 04-10-2009, 06:22 AM   #2
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 240
Explicitly close the file after writing, or just do this (note that the mode must be "wb", since a file opened with "rb" cannot be written to):
Code:
open("tryme","wb").write(b'0101010')
 
Old 04-10-2009, 07:20 AM   #3
David1357
Senior Member
 
Registered: Aug 2007
Location: South Carolina, U.S.A.
Distribution: Ubuntu, Fedora Core, Red Hat, SUSE, Gentoo, DSL, coLinux, uClinux
Posts: 1,300
Blog Entries: 1

Rep: Reputation: 107
Quote:
Originally Posted by dracuss
But anyway, as you can see, after writing, the file remains empty.
You need to seek to the beginning of the file before reading. The file position points to the location after what you wrote.
Code:
file.seek(0)
If you write multiple values to the file, you will need to seek to specific offsets.
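The effect of the file position can be reproduced in a short Python 3 session. This is only a sketch: the temporary file created with `tempfile.mkstemp()` stands in for the "tryme" file from the original post, and the variable names are made up for illustration.

```python
import os
import tempfile

# Create an empty scratch file (stand-in for "tryme" in the original post).
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "r+b") as f:
    written = f.write(b"0101010")  # number of bytes written
    at_end = f.read()              # position is after the written bytes
    f.seek(0)                      # go back to the start of the file
    from_start = f.read()          # now the written bytes are returned

os.remove(path)
print(written, at_end, from_start)  # 7 b'' b'0101010'
```

The first `read()` returns nothing because reading starts where the write left off, not because the file is empty.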
 
Old 04-13-2009, 03:41 AM   #4
dracuss
Member
 
Registered: May 2006
Location: Chisinau, Moldova
Distribution: Gentoo, Debian sid
Posts: 151

Original Poster
Rep: Reputation: 29
Thank you very much for your replies, but I still didn't get the results I wanted. I opened the file with a hex editor and saw that the characters were simply stored as their character codes (0x30 for '0' and 0x31 for '1').
Here is the file written with Python, viewed in hexedit:
Code:
00000000  30 31 30 31  30 31 30 31  30 31 30 31               010101010101
Can you help me write the binary code? I don't need the characters to be re-encoded. I want to write raw binary numbers of different lengths directly into the file.
 
Old 04-13-2009, 05:53 AM   #5
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: ubuntu
Posts: 2,530

Rep: Reputation: 108
The problem is that b"101010" is still a string (of bytes). The 'b' may be confusing you: it does not mean "convert this string to a binary byte".

To convert a string containing '1' and '0' to a numerical value (int) do this:
Code:
# Convert string to int using base 2 
# (2=binary, 10=dec, 16=hex etc..)
byte = int('101010', 2) 
print byte
print '%X' % byte  # print as hexadecimal
You could then write bytes to a file like this:
Code:
byte1 = int('110001', 2) # ascii for '1'
byte2 = int('110010', 2) # ascii for '2'
byte3 = int('110011', 2) # ascii for '3'
byte4 = int('11111111', 2) # 255, 0xFF

f = file('/tmp/bytes.bin', 'wb')
f.write('%c' % byte1)
f.write('%c' % byte2)
f.write('%c' % byte3)
f.write('%c' % byte4)
f.close()
Checking with hexdump:
Code:
shell$ hexdump -C /tmp/bytes.bin
00000000  31 32 33 ff    |123.|
00000004
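The snippet above is Python 2. On Python 3, `file()` and the `print` statement no longer exist, and `'%c' % n` produces text rather than bytes, so the same idea might look like the sketch below. The file name `bytes_py3.bin` and the variable names are hypothetical.

```python
import os
import tempfile

# Convert each string of '0'/'1' characters to an integer in the range 0..255.
values = [
    int("110001", 2),    # ASCII code of '1'
    int("110010", 2),    # ASCII code of '2'
    int("110011", 2),    # ASCII code of '3'
    int("11111111", 2),  # 255, 0xFF
]
payload = bytes(values)  # one raw byte per integer

# Hypothetical output path in the system temp directory.
path = os.path.join(tempfile.gettempdir(), "bytes_py3.bin")
with open(path, "wb") as f:
    f.write(payload)

# Read the file back to confirm what was stored.
with open(path, "rb") as f:
    data = f.read()
print(data.hex())  # 313233ff
```

`bytes(values)` raises a `ValueError` if any integer is outside 0..255, which is a useful check when packing bit strings.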
Probably more efficient, and better (arguably closer to the way Python wants you to write binary data):
Code:
#!/usr/bin/env python

import array

# A Python array is like a restricted Python list
# for storing binary data.
#
data = array.array('B')  # create array of bytes.

data.append(int('1000001', 2)) # binary for ascii 'A'
data.append(int('1000010', 2)) # binary for ascii 'B'
data.append(int('1000011', 2)) # binary for ascii 'C'
data.append(int('11111111', 2)) # 255, 0xFF

print data

# Write the array at once to a file
#
f = file('/tmp/data.bin', 'wb')
data.tofile(f)
f.close()
Checking result:
Code:
shell$ python ./binary.py 
array('B', [65, 66, 67, 255])
shell$ hexdump -C /tmp/data.bin 
00000000  41 42 43 ff     |ABC.|
00000004
This page may also be of help.

Last edited by Hko; 04-13-2009 at 06:05 AM.
 
Old 04-14-2009, 03:33 AM   #6
dracuss
Member
 
Registered: May 2006
Location: Chisinau, Moldova
Distribution: Gentoo, Debian sid
Posts: 151

Original Poster
Rep: Reputation: 29
Hko, thank you very much. But what if there are not enough bits to complete an entire byte?
 
Old 04-14-2009, 04:34 AM   #7
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: ubuntu
Posts: 2,530

Rep: Reputation: 108
Quote:
Originally Posted by dracuss
Hko, thank you very much. But what if there are not enough bits to complete an entire byte?
Then the rest of the bits in that 'half-byte' are 0, of course...
Or did I get your question wrong?
 
Old 04-14-2009, 06:54 AM   #8
bgeddy
Senior Member
 
Registered: Sep 2006
Location: Liverpool - England
Distribution: slackware64 13.37 and -current, Dragonfly BSD
Posts: 1,810

Rep: Reputation: 227
Quote:
Hko, thank you very much. But what if there are not enough bits to complete an entire byte?
This is why Huffman encoding using this method is nonsense. All symbols will occupy a minimum of one byte, so nothing is gained. Programming variable symbol lengths and their prefix codes requires a more complex algorithm than just representing all codes in a fixed number of bits.
 
Old 04-14-2009, 07:48 AM   #9
dracuss
Member
 
Registered: May 2006
Location: Chisinau, Moldova
Distribution: Gentoo, Debian sid
Posts: 151

Original Poster
Rep: Reputation: 29
Hko, yes.
Thank you once again, you've helped me very much.
bgeddy, not really. The point is that if I write this file in binary, I lose at most 1 byte, and that's not a big loss. It's not like writing every character as a number. It's concatenating the binary string for all the characters and after that writing it in the file. What I fear most is that these spare bits could ruin my decoding algorithm, because I cannot tell how many 0s I have to discard in order to decode the first letter correctly.

Last edited by dracuss; 04-14-2009 at 07:49 AM.
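One common way around the padding worry, not something proposed in this thread, is to store the number of padding bits in the file itself. The sketch below assumes a made-up layout: one header byte holding the pad count, followed by the packed bits with zeros appended at the end. The function names `encode` and `decode` are hypothetical.

```python
def encode(bits):
    """Return one header byte (pad count) followed by the packed bits."""
    padding = (8 - len(bits) % 8) % 8      # zeros needed to fill the last byte
    padded = bits + "0" * padding          # pad at the END of the bit string
    body = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return bytes([padding]) + body


def decode(data):
    """Invert encode(): rebuild the bit string and drop the padding zeros."""
    padding = data[0]
    bits = "".join(format(byte, "08b") for byte in data[1:])
    return bits[:len(bits) - padding]


packed = encode("10101")
print(packed)          # b'\x03\xa8'
print(decode(packed))  # 10101
```

With the pad count known, the decoder can discard exactly the spare zeros and nothing else. Storing the total bit count or the number of encoded symbols would work just as well.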
 
Old 04-14-2009, 07:54 AM   #10
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: ubuntu
Posts: 2,530

Rep: Reputation: 108
True, so you obviously need to fill them all, possibly except the last byte.
Then only the last byte may contain up to 7 (worst case) unused bits. But there's no way around that anyway.

Creating a string of '0' and '1' characters first, and then converting those to (real binary) bytes, is obviously an unneeded detour that uses more memory and CPU than necessary. But IMO it's OK to start like that: it is easier to debug and to see what is going on. You can always optimize the character strings away later.
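For the later optimization, the character strings can be replaced by an integer accumulator that collects bits and emits a byte whenever eight are available. This is a minimal sketch, not code from the thread; the class name `BitWriter` and its methods are made up for illustration, and each code is assumed to be given as an integer plus its length in bits.

```python
class BitWriter:
    """Collect bits in an integer accumulator and emit whole bytes."""

    def __init__(self):
        self.buffer = bytearray()  # completed bytes
        self.acc = 0               # bits not yet flushed
        self.nbits = 0             # number of valid bits in acc

    def write(self, code, length):
        """Append the lowest `length` bits of `code`, most significant first."""
        self.acc = (self.acc << length) | (code & ((1 << length) - 1))
        self.nbits += length
        while self.nbits >= 8:
            self.nbits -= 8
            self.buffer.append((self.acc >> self.nbits) & 0xFF)
        self.acc &= (1 << self.nbits) - 1  # keep only the unflushed bits

    def getvalue(self):
        """Return the bytes so far; a partial last byte is zero-padded at the end."""
        out = bytes(self.buffer)
        if self.nbits:
            out += bytes([(self.acc << (8 - self.nbits)) & 0xFF])
        return out


writer = BitWriter()
writer.write(0b101, 3)    # a 3-bit code
writer.write(0b01, 2)     # a 2-bit code
print(writer.getvalue())  # b'\xa8'
```

The result of `getvalue()` can be written to a file opened in "wb" mode in one call.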
 
Old 04-14-2009, 08:10 AM   #11
dracuss
Member
 
Registered: May 2006
Location: Chisinau, Moldova
Distribution: Gentoo, Debian sid
Posts: 151

Original Poster
Rep: Reputation: 29
Hko, I thought that the first bits would be 0. Isn't 1 == int('00000001', 2)?
 
Old 04-14-2009, 08:25 AM   #12
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: ubuntu
Posts: 2,530

Rep: Reputation: 108
Quote:
Originally Posted by dracuss
Hko, I thought that the first bits would be 0. Isn't 1 == int('00000001', 2)?
Yes, sure.
But I was trying to tell bgeddy that this does not necessarily happen on every byte...
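The leading-zero behaviour is easy to check in Python 3. The short sketch below shows that `int(s, 2)` ignores leading zeros, so a bit string shorter than 8 characters ends up with its zeros at the front of the byte; if the code bits should stay at the front instead, the string has to be padded on the right before converting. The variable names are made up for illustration.

```python
# Leading zeros do not change the numeric value.
same = int("1", 2) == int("00000001", 2) == 1

# A short bit string converted directly: zeros appear at the front of the byte.
left_padded = format(int("101", 2), "08b")

# Padding on the right first keeps the code bits at the front of the byte.
right_padded_value = int("101".ljust(8, "0"), 2)

print(same)                # True
print(left_padded)         # 00000101
print(right_padded_value)  # 160
```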
 
Old 04-14-2009, 08:57 AM   #13
bgeddy
Senior Member
 
Registered: Sep 2006
Location: Liverpool - England
Distribution: slackware64 13.37 and -current, Dragonfly BSD
Posts: 1,810

Rep: Reputation: 227
Quote:
It's concatenating the binary string for all the characters and after that writing it in the file
Aha - problem solved then !
Quote:
But I was trying to tell bgeddy that this does not necessarily happen on every byte...
Thanks for the information..
 
  

