LinuxQuestions.org
LinuxQuestions.org > Forums > Linux Forums > Linux - General

10-14-2006, 10:28 PM   #1
darinbolson
Member
 
Registered: Jan 2006
Distribution: multi booting whatever I feel like. Grub rocks!
Posts: 85

Rep: Reputation: 15
question about checksums and data compression


Just a thought. If you can take an image file, let's say it's an ISO you want to burn on a DVD, and run a checksum on that file, and arrive at one very specific number, why can't the same be done in reverse?
Couldn't you create a program that would take an md5sum text file and build your ISO out of that? This would be fantastic data compression! Imagine downloading the latest version of your favorite distro in under a second!
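For reference, here is what "run a checksum on that file" amounts to. This is a minimal Python sketch using the standard `hashlib` module (the function name `md5_of_file` is hypothetical, not from the thread); it should produce the same hex string that `md5sum` prints. The result is always 128 bits (32 hex characters), whether the input is a three-byte file or a multi-gigabyte ISO.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file.

    The file is read in chunks so a large ISO never has to fit in memory.
    """
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    # Always 32 hex characters (128 bits), regardless of the file's size.
    return h.hexdigest()
```

For example, `md5_of_file("distro.iso")` (a hypothetical file name) returns the 32-character string you would compare against a published checksum.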
 
10-15-2006, 12:35 AM   #2
btmiller
Senior Member
 
Registered: May 2004
Location: In the DC 'burbs
Distribution: Arch, Scientific Linux, Debian, Ubuntu
Posts: 4,290

Rep: Reputation: 378
Unfortunately, life doesn't work that way. An MD5 hash is 128 bits, much smaller than most files on a computer. There are only 2^128 possible MD5 sums but far more possible files (there are already 2^136 different 17-byte files), so by the pigeonhole principle many files must share the same MD5 sum. In other words, these hash functions are susceptible to collisions (two files having the same checksum). I believe there are theoretically an infinite number of files with the same MD5 hash (I am not a mathematician, though). How is your "reconstructor" program supposed to know which of these infinitely many files to reconstruct?
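The pigeonhole argument is easy to see with a deliberately weakened hash. In the Python sketch below, `tiny_hash` and `find_collision` are illustrative names, and cutting MD5 down to its top 16 bits is an assumption made only so the demo runs instantly. It hashes 2^16 + 1 distinct inputs into 2^16 possible values, so a collision is guaranteed. Real MD5 behaves the same way, just with 2^128 buckets.

```python
import hashlib


def tiny_hash(data: bytes, bits: int = 16) -> int:
    """A deliberately weakened hash: the top `bits` bits of MD5."""
    digest = int.from_bytes(hashlib.md5(data).digest(), "big")
    return digest >> (128 - bits)


def find_collision(bits: int = 16):
    """Hash 2**bits + 1 distinct inputs and return two that collide.

    There are only 2**bits possible outputs, so by the pigeonhole
    principle at least two of the inputs must share one.
    """
    seen = {}  # hash value -> first input that produced it
    for i in range(2 ** bits + 1):
        msg = str(i).encode()
        h = tiny_hash(msg, bits)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg
    raise AssertionError("unreachable: pigeonhole guarantees a collision")
```

In practice the first collision usually turns up after only a few hundred inputs, roughly the square root of 2^16; that "birthday" effect is where figures like 2^64 for full MD5 come from.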

Also, getting an MD5 hash from a file is kind of like turning a cow into a bunch of steaks, hamburgers, etc. It's essentially a one-way process. You can't take a bunch of hamburger patties and steaks and reconstruct a cow, and the same applies to constructing a file from a hash.

Actually, that's a bit of an oversimplification. You *can* construct a file with a particular MD5 hash, but AFAIK it requires brute forcing (generating random files and seeing if they match the hash). This takes a long time. For a 128-bit hash there are 2^128 possible hashes, so each random file matches a given hash with probability 1 in 2^128, and on average it will take about 2^128 tries to find a match (let's assume we know how big the file is supposed to be, so we only have to generate random files of the correct length). The smaller 2^64 figure is the "birthday bound" for finding *any* two files that share a hash, not for hitting one specific hash. Suppose it takes 1/100th of a second to generate and check each possibility. If I did my math right, 2^128 tries would take on the order of 10^29 years on average (even 2^64 tries would take about 5.8 billion years), and even then, as mentioned above, we don't know if it will be the right file (because of hash collisions). Kind of makes moot the one second it took to download your favorite distro.
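The brute-force search can be sketched the same way. This is a toy Python version (the names are hypothetical, and MD5 is cut down to 12 bits as an assumption so it finishes quickly): it generates random files of a known length until one matches the target hash. Each random guess matches with probability 1 in 2^12, so it needs about 4,096 tries on average; every extra bit of hash doubles that, which is how a full 128-bit MD5 ends up at around 2^128 tries.

```python
import hashlib
import os


def tiny_hash(data: bytes, bits: int) -> int:
    # Top `bits` bits of MD5, standing in for a full 128-bit digest.
    return int.from_bytes(hashlib.md5(data).digest(), "big") >> (128 - bits)


def brute_force_preimage(target: int, bits: int, length: int, max_tries: int):
    """Generate random `length`-byte files until one hashes to `target`.

    Returns (candidate, tries) on success, or (None, max_tries) if it gives up.
    """
    for tries in range(1, max_tries + 1):
        candidate = os.urandom(length)
        if tiny_hash(candidate, bits) == target:
            return candidate, tries
    return None, max_tries
```

For instance, searching for an 11-byte file that matches the 12-bit hash of `b"hello world"` succeeds quickly, but on average about 2^76 different 11-byte files share each 12-bit value, so the match is almost never `b"hello world"` itself.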

Note: as mentioned above, it's been years since I sat in a graduate-level discrete math class, so the above "back of the envelope" calculation may be horribly wrong. If so, someone with more math smarts than I can hopefully correct it.
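The envelope arithmetic can be checked in a few lines of Python, keeping the post's assumption of 1/100th of a second per try (a 365.25-day year is an additional assumption). 2^64 tries does come to about 5.8 billion years, but 2^64 is the birthday-bound figure for finding *any* two files with the same hash. Matching one specific, given MD5 sum by random guessing takes about 2^128 tries on average, which works out to roughly 10^29 years.

```python
SECONDS_PER_TRY = 0.01                  # assumption carried over from the post
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumption: Julian year


def years_for(tries: float) -> float:
    """Wall-clock years needed for `tries` guesses at SECONDS_PER_TRY each."""
    return tries * SECONDS_PER_TRY / SECONDS_PER_YEAR


collision_years = years_for(2 ** 64)    # birthday bound: any two colliding files
preimage_years = years_for(2 ** 128)    # matching one specific, given MD5 sum
print(f"2^64 tries:  {collision_years:.2e} years")
print(f"2^128 tries: {preimage_years:.2e} years")
```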

Edit to add: I recommend having a look at the MD5 article on Wikipedia for more than you probably ever wanted to know about the MD5 algorithm.

Last edited by btmiller; 10-15-2006 at 12:37 AM.
 
10-15-2006, 11:37 PM   #3
darinbolson
Member
 
Registered: Jan 2006
Distribution: multi booting whatever I feel like. Grub rocks!
Posts: 85

Original Poster
Rep: Reputation: 15
OK, so using an md5sum would not be the way to go about it. I still think that there must be a way to rebuild the ISO out of a very small file with two parts: one would be something similar to an md5sum, and the other a list of the operations done to the data to arrive at that number. Our download time will now be over one second, but I can live with that.
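A small file plus "a list of operations" that rebuilds the ISO is, in effect, a compressed file plus a decompressor, and the pigeonhole argument from earlier in the thread still applies: there are 2^n different n-bit files but only 2^n - 1 bit strings shorter than n bits, so no lossless scheme can shrink every file. Real compressors win only on data with redundancy. A minimal Python sketch, using the standard `zlib` module purely as a convenient stand-in for any lossless compressor (the helper names are hypothetical):

```python
import os
import zlib


def count_shorter_strings(n: int) -> int:
    # Number of bit strings of length 0 .. n-1: 1 + 2 + ... + 2**(n-1) = 2**n - 1.
    return sum(2 ** k for k in range(n))


def compressed_ratio(data: bytes) -> float:
    # Size after zlib at maximum compression, relative to the original size.
    return len(zlib.compress(data, 9)) / len(data)


n = 20
# 2**n possible n-bit inputs, but only 2**n - 1 shorter outputs to map them to.
print(2 ** n, count_shorter_strings(n))

repetitive = b"Linux " * 100_000    # highly redundant: compresses well
random_data = os.urandom(600_000)   # no structure for zlib to exploit
print(f"repetitive: {compressed_ratio(repetitive):.4f}")
print(f"random:     {compressed_ratio(random_data):.4f}")
```

On typical runs the repetitive input shrinks to well under 1% of its size, while the random bytes come out about the same size or marginally larger, since deflate has nothing to exploit and adds a little framing overhead.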

Last edited by darinbolson; 10-15-2006 at 11:40 PM.
 
  


Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
client server data compression | Ammad | Linux - Networking | 1 | 10-08-2006 03:48 PM
compression question | canyon289 | General | 3 | 06-27-2006 10:07 PM
lossless data (de)compression algorithms for red hat | zivsh | Red Hat | 1 | 05-30-2006 02:08 AM
backup large data - good compression | slackman | Linux - General | 12 | 04-28-2006 01:01 AM
mkisofs -z (compression question) | minion01 | Linux - Newbie | 1 | 12-20-2003 12:55 PM

All times are GMT -5.