encode text with md5sum or sha1sum
I have these files that look something like this:
I need to store the contents of entire file in a MySQL database. I was wondering if there was a way to compress/re-encode the contents of the file and then store the result. That way if I needed the file I could just de-code/convert the format.
I'm envisioning something like:
[ In Perl script calling database function via DBI ]
yes, that is quite reasonable.
In the title of your post you refer to MD5/SHA1, you do realise that is not encoding and that more than one file could have the same MD5/SHA1 hash ?
More importantly that you can not create a hash from a file delete the file and expect to recreate the file from the hash.
look at zlib (what gzip uses) for compression, I am sure there are Perl APIs for it.
If all you want is compression/uncromression, MySql has functions you can use, so you don't need to do it in a script. But you should set the column type to BLOB since it's binary data. Here's an example:
CREATE TABLE files (filename VARCHAR(255),filecontent BLOB)
INSERT INTO files (filename,filecontent) VALUES ('thefilename.txt',COMPRESS('Lots of text here...'))
SELECT filename,UNCOMPRESS(filecontent) FROM files
I haven't found a perl library that compresses the text the way I was looking for. The closest I got was the encode_base64() function in the MIME::Base64 module. The zlib library seems to be for binary compression and I want to keep the resulting data in ascii or utf8.
So how do you pass in the whole file into the table?
I was thinking about just reading the file into a variable, and putting the content of the file in the query. You can't use LOAD DATA INFILE like that.
But your aim is a bit strange. You want to store files in the database, and compress them to save space, but you don't want it to be binary data. Why is that? If you actually want to do it, you can compress, then base64_encode, but I doubt you save a lot of space that way.
It seems to me you are confused about some things:
compress (zlib) compresses data, but the result will be binary data.
base64_encode is used to convert binary data to ascii data. The result is typically 30% more data.
md5sum/sha1sum makes hashes (checksums) of data. It is not encoding of any kind - you just get checksums, and you can't get the data back with just a checksum.
All of those things can be done with Perl or with MySQL functions. It doesn't matter much how you do it.
But keep in mind that a database is for structuring information. Unstructured data can be put in a database, but it's usually better to keep files as files.
Also, if this is something for like an archive of old data, MySql has a special table type, where everything is compressed with zlib, so you wont have to worry about compression/decompression. See: http://dev.mysql.com/doc/refman/5.0/...ge-engine.html
But those tables have some limitations, like no indexes, and you cant do UPDATE or DELETE. If it really is an archive, it can be very useful though.
You are right I'm a bit confused. (And my post is confusing too).
Thanks for the clarification.
The data files I'm trying to store are just like the example I posted originally. Each file represents 1 experiment. Ultimately we would like the user to be able to run queries like this:
This would return 1 row for each experiment an the experiment_data would contain the entire contents of the file as one long string.
The user then just saves this output to a file and runs a perl script on it to convert the long string back into the contents of the file.
As you pointed out encode_base64() in perl results in output that is much longer than the original file. I was hoping there was another (similar) method that doesn't result in such long strings.
Does this make more sense?
If you ask me, files like that don't belong in a database. Why not use files when that's what it is? You could then easily gzip/gunzip the files when needed. And things like incremental backups are a harder when the stuff is in the database.
But if you really need to put it in a MySQL database, why not let the database do the compression?
CREATE TABLE filetable (
filename VARCHAR(255) NOT NULL ,
experiment_data TEXT NOT NULL
) ENGINE = archive;
Note the "ENGINE = archive" - now all the data will be compressed/uncomressed without you or scripts having to do anything.
|All times are GMT -5. The time now is 03:29 PM.|