Linux Compression
OK - I am sure there are a bunch of man pages and guides that cover this, but I would like to discuss it here and clarify some confusion I have.
According to basic Linux administration, there seem to be 3 popular compression options:
1 - tar.gz
2 - bzip2
3 - gzip
Now, I did try to do some research before just asking a question, and it appears that tar is simply a way of packaging files into one file, with little or no compression. I know you can create a tar.gz file with a compression option, but my main question is: what is the best way to get the best compression on a directory that has multiple subdirectories and files? Which format do you suggest I use as a Linux admin for backup and archiving purposes? Which will give me the most compression? Thanks for any help!
Hi.
Generally, bzip2 compresses better than gzip, so you should probably use a .tar.bz2 for a directory tree, e.g.
Code:
tar cjvf mydir.tar.bz2 /path/to/mydir/
gzip compresses and decompresses faster, though. Here's a quite comprehensive review: http://brej.org/compression/
Dave
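To see the size difference on your own data, here is a rough sketch that builds the same tree both ways and compares byte counts. It assumes GNU tar, gzip and bzip2 are installed; the sample directory "mydir" and its contents are made up, so swap in a real directory for a meaningful comparison.

```shell
#!/bin/sh
# Rough sketch: compare .tar.gz and .tar.bz2 sizes for one directory tree.
# The sample tree below is hypothetical; replace it with real data.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Build a small tree with sub-directories and some repetitive text.
mkdir -p "$work/mydir/a" "$work/mydir/b"
for i in $(seq 1 2000); do
    echo "line $i of some fairly repetitive log text"
done > "$work/mydir/a/log.txt"
cp "$work/mydir/a/log.txt" "$work/mydir/b/copy.txt"

cd "$work" || exit 1
tar czf mydir.tar.gz mydir    # -z: filter through gzip
tar cjf mydir.tar.bz2 mydir   # -j: filter through bzip2

raw=$(tar cf - mydir | wc -c) # size of the uncompressed tar stream
gz=$(wc -c < mydir.tar.gz)
bz=$(wc -c < mydir.tar.bz2)
echo "plain tar: $raw bytes"
echo "tar.gz:    $gz bytes"
echo "tar.bz2:   $bz bytes"
```

On text-heavy data the .tar.bz2 will usually come out smaller, but the margin depends heavily on the data, which is why it's worth measuring.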
It also depends on what you're compressing. Take a database dump, for instance: you'll see much better compression ratios than if you compress a bunch of random files, especially photos and media files, which are binary and usually already compressed.
For instance, at my present employer we use gzip to compress our MySQL dumps, and on average we see anywhere from a 5:1 to 9:1 compression ratio, which is quite good; databases contain a lot of repetitive data.
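If you want to check the ratio on your own dumps, here is a minimal sketch. The `ratio` helper and the generated dump.sql are hypothetical; with a real dump you would just pass its path. The synthetic dump is deliberately repetitive, so its ratio says nothing about what your own dumps will get.

```shell
#!/bin/sh
# Rough sketch: report the compression ratio a compressor gets on a file.
# dump.sql here is a fake stand-in for real mysqldump output.
ratio() {
    # $1 = compressor (gzip, bzip2, ...), $2 = input file
    orig=$(wc -c < "$2")
    comp=$("$1" -c < "$2" | wc -c)
    echo "$((orig / comp)):1"   # integer ratio, rounded down
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Many near-identical INSERT lines, like a real dump tends to contain.
for i in $(seq 1 5000); do
    echo "INSERT INTO users (id, name, status) VALUES ($i, 'user$i', 'active');"
done > "$work/dump.sql"

echo "gzip:  $(ratio gzip "$work/dump.sql")"
echo "bzip2: $(ratio bzip2 "$work/dump.sql")"
```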
My experience on large (over 1 TB) data sets is that, at the default compression levels, bzip2 is much slower than gzip, on the order of 2-3 times as slow. If compression speed as well as ratio is a concern, this might be a factor in your decision.
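If speed matters more than size, you can also pick the compression level instead of accepting the default. A sketch, assuming gzip and bzip2 are installed and using made-up sample data: gzip accepts -1 (fastest) through -9 (smallest) and defaults to -6; for bzip2 the -1 to -9 flag sets the block size (100k to 900k), and -9 is already its default. Prefix any of the tar lines with `time` to measure speed on your own data.

```shell
#!/bin/sh
# Rough sketch: choose the compression level by piping tar into the compressor.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/data"
seq 1 200000 > "$work/data/numbers.txt"   # hypothetical sample data

cd "$work" || exit 1
# "tar cf -" writes the archive to stdout so the level can be set explicitly.
tar cf - data | gzip -1 > data-fast.tar.gz
tar cf - data | gzip -9 > data-best.tar.gz
tar cf - data | bzip2 -9 > data.tar.bz2

for f in data-fast.tar.gz data-best.tar.gz data.tar.bz2; do
    echo "$f: $(wc -c < "$f") bytes"
done
```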
Quote:
Code:
[root@lt2fs1 tmp]# ls -lh
3.7G old_users/
Code:
tar cjvf old_users.bz2 /tmp/old_users/
tar cjvf /path/to/old_users.tar.bz2 /tmp/old_users
where /path/to is the directory you want to put it in. It's a good idea to give the file a .tar.bz2 (or .tbz2) suffix to remind yourself it's a bzipped tarball.
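A quick way to confirm where the archive ended up and what's inside it, as a sketch with a made-up directory layout under a temporary directory. The -C option makes tar change into a directory first, so the stored names are relative; without it, GNU tar strips the leading / from absolute member names and prints a warning.

```shell
#!/bin/sh
# Rough sketch: -f takes the archive's path, so a full path puts it anywhere.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/old_users/alice" "$work/backups"   # hypothetical layout
echo "hello" > "$work/old_users/alice/notes.txt"

# -C: change to $work first, so members are stored as old_users/...
tar cjf "$work/backups/old_users.tar.bz2" -C "$work" old_users

# -t lists the archive's contents without extracting anything.
tar tjf "$work/backups/old_users.tar.bz2"
```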
From memory, the reason we use tar.gz or tar.bz2 is because Gzip and Bzip2 can only deal with a single file, unlike zip.
Therefore, when compressing a set of files, use tar to combine them into a single file, and then compress that file with either Gzip or Bzip2 (or use tar's flags to do both in one command). Gzip is faster; Bzip2 is slower but creates smaller archives. --Ian
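The two routes Ian describes can be checked against each other. A sketch with made-up file names, assuming GNU tar and gzip: the archive members should come out identical either way.

```shell
#!/bin/sh
# Rough sketch: two-step (tar, then gzip) versus one-step (tar -z).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/set/sub"                  # hypothetical file set
echo "a" > "$work/set/a.txt"
echo "b" > "$work/set/sub/b.txt"
cd "$work" || exit 1

# Two steps: bundle into one file, then compress that single file.
tar cf set.tar set
gzip set.tar                  # replaces set.tar with set.tar.gz

# One step: tar runs gzip itself because of -z.
tar czf set-onestep.tar.gz set

# Compare the member lists of both archives.
tar tzf set.tar.gz | sort > two-step.txt
tar tzf set-onestep.tar.gz | sort > one-step.txt
cmp -s two-step.txt one-step.txt && echo "same members"
```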
Quote:
OK - let's say you're backing up your home directory "/home/<user>", so I would then run the command:
Code:
tar cjvf home.bz2 /home/user
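It puts it in the directory specified in the command!
"home.bz2" on its own means /path/path/hereinthisdirectorywhereweare/home.bz2, i.e. the current working directory (this is always true, not just in tar).
Note: if you are IN your home directory (e.g. /home/user), then just type tar -cjvf home.bz2 * (but be aware that the shell's * does not match hidden dot-files such as .bashrc, so they will not be archived).
Another note: better to name the target home.tar.bz2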
tar is ubiquitous on Unix-like OSes, but it has its limitations. The tar format doesn't include compression itself, but as a convenience GNU tar can compress a tar file with gzip, bzip2, or old-school Unix compress (the -z, -j and -Z options). You can also use some other compression program (GNU tar's --use-compress-program option), so long as it supports the right usage on the command line.
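For the "other compression program" case, a sketch with a made-up directory. It uses bzip2 as the external program purely so the result can be checked with the built-in -j; any filter that reads stdin, writes stdout and accepts -d for decompression should work the same way.

```shell
#!/bin/sh
# Rough sketch: built-in compression flags vs. an external compressor.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/src"
seq 1 1000 > "$work/src/n.txt"            # hypothetical content
cd "$work" || exit 1

tar czf src.tar.gz src                    # -z: gzip (-j bzip2, -Z compress)

# External program: tar pipes the archive through it, and passes -d
# to it when reading the archive back.
tar --use-compress-program=bzip2 -cf src.tar.bz2 src
tar --use-compress-program=bzip2 -tf src.tar.bz2
```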
There are problems with this. If you put several large, already-compressed files (e.g. Ogg Vorbis, MPEGs, etc.) into a tar file, compressing the resulting .tar file will waste a lot of CPU and achieve very little. There are other formats, like dar, which address this by handling compression inside the dar format itself, where it can be turned on or off. Dar is nice, and to me it seems like a good tar-killer (I don't need tar's sequential access options, which lots of people still use for tape backups), but it's not reasonable to expect dar to be installed. Thus .tar.{gz,bz2} looks set to remain common as a distribution mechanism for a while yet.
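The wasted effort is easy to see. A minimal sketch, assuming gzip is installed: random bytes from /dev/urandom stand in for already-compressed media, since such files look random to a general-purpose compressor. The random file barely changes size (it typically grows slightly), while the text shrinks a lot.

```shell
#!/bin/sh
# Rough sketch: gzip on already-compressed-looking data vs. plain text.
# Random bytes stand in for media files (ogg, mpeg, jpeg), which are
# already compressed and therefore look random to a compressor.
gz_size() { gzip -c < "$1" | wc -c; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
head -c 1000000 /dev/urandom > "$work/media.bin"
seq 1 150000 > "$work/text.txt"

for f in media.bin text.txt; do
    echo "$f: $(wc -c < "$work/$f") -> $(gz_size "$work/$f") bytes"
done
```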
Quote:
Why not just call it whatever.bz2 rather than whatever.tar.bz2?
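.bz2 is appended to the name of any file that is compressed with the bzip2 program. Doing this:
Code:
$ tar jcf tarfile.tar.bz2 ...
is equivalent to creating a plain tarball and then compressing it with bzip2, which turns tarfile.tar into tarfile.tar.bz2:
Code:
$ tar cf tarfile.tar ...
$ bzip2 tarfile.tar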
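The naming also pays off when decompressing: bunzip2 just strips the .bz2, so whatever.tar.bz2 comes back as whatever.tar and the name still tells you it's a tarball. A sketch with made-up file names, assuming tar and bzip2 are installed:

```shell
#!/bin/sh
# Rough sketch: what the .tar in whatever.tar.bz2 buys you.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/stuff"
echo "hi" > "$work/stuff/f.txt"           # hypothetical content
cd "$work" || exit 1

tar cf tarfile.tar stuff
bzip2 tarfile.tar             # replaces tarfile.tar with tarfile.tar.bz2
cp tarfile.tar.bz2 whatever.bz2   # same data, without .tar in the name

bunzip2 tarfile.tar.bz2       # gives back tarfile.tar: clearly a tarball
bunzip2 whatever.bz2          # gives back "whatever": no hint what it is
tar tf tarfile.tar
tar tf whatever               # still a tar archive, the name just doesn't say so
```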
I made a test of that with Total Commander, and ACE compression won! RAR was second, a bit further behind but still very good; zip and the others were behind that. (tar.gz... don't know.)
"won"? Based on what criteria? With what data sets? What about memory footprint and CPU usage? How asymmetric is the compression/decompression resource use? Is the algorithm patent-encumbered? Is there a reliable open source implementation? How many platforms does each format have good implementation on? Is there a usable library for each format?
There are a lot of things to consider before you say a format "wins".
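A rough harness for measuring a few of those criteria at once: compressed size, compression time, decompression time, and that the round trip is lossless. It assumes GNU date for the %N nanoseconds field, and the sample data is made up, so results on real data will differ. Memory footprint is not measured here; if GNU time is installed, /usr/bin/time -v reports the maximum resident set size of a command.

```shell
#!/bin/sh
# Rough sketch: size plus compress/decompress time for several compressors.
# Assumes GNU date (%N); sample.dat is hypothetical test data.
now_ms() { echo $(( $(date +%s%N) / 1000000 )); }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
seq 1 300000 > "$work/sample.dat"

for c in gzip bzip2 xz; do
    command -v "$c" >/dev/null 2>&1 || continue   # skip tools not installed
    t0=$(now_ms)
    "$c" -c < "$work/sample.dat" > "$work/out.$c"
    t1=$(now_ms)
    "$c" -dc < "$work/out.$c" > "$work/back.$c"
    t2=$(now_ms)
    # The round trip must give back exactly the original bytes.
    cmp -s "$work/sample.dat" "$work/back.$c" || echo "$c: round trip FAILED"
    echo "$c: $(wc -c < "$work/out.$c") bytes," \
         "compress $((t1 - t0)) ms, decompress $((t2 - t1)) ms"
done
```

Comparing the two timings per tool gives a rough sense of how asymmetric compression and decompression are for each format.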