LinuxQuestions.org
Register a domain and help support LQ
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Reply
 
Search this Thread
Old 01-21-2008, 12:14 PM   #1
GATTACA
Member
 
Registered: Feb 2002
Location: USA
Distribution: Fedora, CENTOS
Posts: 201

Rep: Reputation: 32
encode text with md5sum or sha1sum


Hello.

I have these files that look something like this:
988.207079 2
144.9625 0.8969
147.1986 2.3309
148.7969 2.1577
156.0010 0.8954
158.7631 0.6406
168.3721 0.6769
169.0449 1.7547
171.2164 1.6338
173.0547 2.8426


I need to store the contents of entire file in a MySQL database. I was wondering if there was a way to compress/re-encode the contents of the file and then store the result. That way if I needed the file I could just de-code/convert the format.

I'm envisioning something like:
[ In Perl script calling database function via DBI ]
Code:
 my $txt = mysql_query(select * from table where file_name = 'someName');

 open(TOFILE, ">original_file_contents") or die; 
 print TOFILE convert($txt) . "\n";
 close(TOFILE);
Is this something I can do?


Thanks

Last edited by GATTACA; 01-21-2008 at 12:15 PM. Reason: Wanted to highlight code snippet
 
Old 01-21-2008, 04:10 PM   #2
rednuht
Member
 
Registered: Aug 2005
Posts: 239
Blog Entries: 1

Rep: Reputation: 30
yes, that is quite reasonable.

In the title of your post you refer to MD5/SHA1, you do realise that is not encoding and that more than one file could have the same MD5/SHA1 hash ?
More importantly that you can not create a hash from a file delete the file and expect to recreate the file from the hash.

look at zlib (what gzip uses) for compression, I am sure there are Perl APIs for it.
 
Old 01-22-2008, 05:52 AM   #3
Guttorm
Senior Member
 
Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,148

Rep: Reputation: 236Reputation: 236Reputation: 236
Hi

If all you want is compression/uncromression, MySql has functions you can use, so you don't need to do it in a script. But you should set the column type to BLOB since it's binary data. Here's an example:

CREATE TABLE files (filename VARCHAR(255),filecontent BLOB)
INSERT INTO files (filename,filecontent) VALUES ('thefilename.txt',COMPRESS('Lots of text here...'))
SELECT filename,UNCOMPRESS(filecontent) FROM files
 
Old 01-22-2008, 07:58 AM   #4
GATTACA
Member
 
Registered: Feb 2002
Location: USA
Distribution: Fedora, CENTOS
Posts: 201

Original Poster
Rep: Reputation: 32
Quote:
Originally Posted by Guttorm View Post
Hi

If all you want is compression/uncromression, MySql has functions you can use, so you don't need to do it in a script. But you should set the column type to BLOB since it's binary data. Here's an example:

CREATE TABLE files (filename VARCHAR(255),filecontent BLOB)
INSERT INTO files (filename,filecontent) VALUES ('thefilename.txt',COMPRESS('Lots of text here...'))
SELECT filename,UNCOMPRESS(filecontent) FROM files
Thanks for the responses!

I haven't found a perl library that compresses the text the way I was looking for. The closest I got was the encode_base64() function in the MIME::Base64 module. The zlib library seems to be for binary compression and I want to keep the resulting data in ascii or utf8.

Guttorm:
So how do you pass in the whole file into the table?
Code:
INSERT INTO files (filename, filecontent) VALUES ('thefilename.txt', COMPRESS(LOAD DATA INFILE '/abs/path/to/file')))
 
Old 01-22-2008, 09:35 AM   #5
Guttorm
Senior Member
 
Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,148

Rep: Reputation: 236Reputation: 236Reputation: 236
Hi

I was thinking about just reading the file into a variable, and putting the content of the file in the query. You can't use LOAD DATA INFILE like that.

But your aim is a bit strange. You want to store files in the database, and compress them to save space, but you don't want it to be binary data. Why is that? If you actually want to do it, you can compress, then base64_encode, but I doubt you save a lot of space that way.

It seems to me you are confused about some things:
compress (zlib) compresses data, but the result will be binary data.
base64_encode is used to convert binary data to ascii data. The result is typically 30% more data.
md5sum/sha1sum makes hashes (checksums) of data. It is not encoding of any kind - you just get checksums, and you can't get the data back with just a checksum.

All of those things can be done with Perl or with MySQL functions. It doesn't matter much how you do it.

But keep in mind that a database is for structuring information. Unstructured data can be put in a database, but it's usually better to keep files as files.

Also, if this is something for like an archive of old data, MySql has a special table type, where everything is compressed with zlib, so you wont have to worry about compression/decompression. See: http://dev.mysql.com/doc/refman/5.0/...ge-engine.html

But those tables have some limitations, like no indexes, and you cant do UPDATE or DELETE. If it really is an archive, it can be very useful though.
 
Old 01-22-2008, 10:09 AM   #6
GATTACA
Member
 
Registered: Feb 2002
Location: USA
Distribution: Fedora, CENTOS
Posts: 201

Original Poster
Rep: Reputation: 32
You are right I'm a bit confused. (And my post is confusing too).
Thanks for the clarification.

The data files I'm trying to store are just like the example I posted originally. Each file represents 1 experiment. Ultimately we would like the user to be able to run queries like this:

Code:
SELECT file_name, experiment_data
FROM filetable
WHERE date = '20070810'
for example.

This would return 1 row for each experiment an the experiment_data would contain the entire contents of the file as one long string.

The user then just saves this output to a file and runs a perl script on it to convert the long string back into the contents of the file.

As you pointed out encode_base64() in perl results in output that is much longer than the original file. I was hoping there was another (similar) method that doesn't result in such long strings.

Does this make more sense?

Thanks.
 
Old 01-24-2008, 07:24 AM   #7
Guttorm
Senior Member
 
Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,148

Rep: Reputation: 236Reputation: 236Reputation: 236
Hi again.

If you ask me, files like that don't belong in a database. Why not use files when that's what it is? You could then easily gzip/gunzip the files when needed. And things like incremental backups are a harder when the stuff is in the database.

But if you really need to put it in a MySQL database, why not let the database do the compression?

CREATE TABLE filetable (
filename VARCHAR(255) NOT NULL ,
experiment_data TEXT NOT NULL
) ENGINE = archive;

Note the "ENGINE = archive" - now all the data will be compressed/uncomressed without you or scripts having to do anything.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
md5sum .vs. SHA1SUM thelonius Linux - Software 2 07-18-2006 08:44 AM
sha1sum frolicchap Fedora 3 01-22-2006 06:48 AM
sha1sum check? help schitzo1978 Linux - Newbie 1 12-18-2005 10:19 PM
computing SHA1SUM mkvemuri Linux - General 2 07-31-2005 11:12 PM
MD5 and SHA1SUM Problem niztec Linux - Software 8 07-14-2005 09:36 PM


All times are GMT -5. The time now is 06:02 PM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration