Linux - General
This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.
OK - I am sure there are a bunch of man pages and guides that cover this, but I would like to discuss it here and clarify some confusion I have.
According to basic Linux administration, there seem to be 3 popular compression options.
1 - tar.gz
2 - bzip2
3 - gzip
Now I did try to do some research before asking, and it appears that tar is simply a way of packaging files into one file with little or no compression. I know you can create a tar.gz file with a compression option, but my main question is: what is the best way to get the best compression on a directory that has multiple subdirectories and files?
Which format do you guys suggest I use as a Linux Admin for backup and archiving purposes? Which will give me the most compression?
Depends on what you're compressing as well. Take a database dump, for instance: you'll see better compression ratios than if you were to compress a bunch of random files, especially photos and media files, which are binary and are already compressed.
For instance, at my present employer we use gzip to compress our MySQL dumps. On average we see compression ratios anywhere from 5:1 to 9:1, which is quite good and comes from databases having a lot of repetitive data.
My experience on large (over 1 TB) data sets is that at the default compression levels bzip2 is much slower than gzip -- on the order of 2 - 3 times as slow. If compression speed as well as ratio is a concern, then this might be a factor in your decision.
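To see how much the ratio depends on the input, here is a minimal sketch that gzips two files of equal size: one made of repetitive SQL-like lines, one made of random bytes (standing in for already-compressed media). The file names and the sample INSERT line are made up for illustration.

```shell
# Hypothetical scratch directory, used only for this comparison.
workdir=$(mktemp -d)

# Repetitive text, loosely similar to a database dump.
for i in $(seq 1 20000); do
    echo "INSERT INTO users VALUES ($i, 'example');"
done > "$workdir/repetitive.sql"
repetitive_raw=$(wc -c < "$workdir/repetitive.sql")

# The same number of random bytes, which gzip cannot shrink.
head -c "$repetitive_raw" /dev/urandom > "$workdir/random.bin"

# -c writes to stdout, so the originals stay in place for comparison.
gzip -c "$workdir/repetitive.sql" > "$workdir/repetitive.sql.gz"
gzip -c "$workdir/random.bin" > "$workdir/random.bin.gz"

repetitive_gz=$(wc -c < "$workdir/repetitive.sql.gz")
random_gz=$(wc -c < "$workdir/random.bin.gz")
echo "repetitive: $repetitive_raw -> $repetitive_gz bytes"
echo "random:     $repetitive_raw -> $random_gz bytes"
```

The repetitive file should shrink dramatically while the random one stays about the same size, which is the same effect you get with photos and media files.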
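If you want to check the speed difference on your own data, a rough sketch like the following times both compressors at their default levels. The sample file and the measure function are made up for illustration; the sample is far too small to show a real difference, so point src at a real directory to get meaningful numbers.

```shell
# Hypothetical sample data; replace src with a real directory for real numbers.
src=$(mktemp -d)
out=$(mktemp -d)
for i in $(seq 1 5000); do
    echo "row $i: some repetitive sample text"
done > "$src/sample.txt"

# Time one tar run with the given compression flag and report the archive size.
measure() {
    flag=$1
    archive=$2
    start=$(date +%s)
    tar "-c${flag}f" "$archive" -C "$src" .
    end=$(date +%s)
    echo "$archive: $((end - start)) s, $(wc -c < "$archive") bytes"
}

measure z "$out/sample.tar.gz"     # gzip at its default level
measure j "$out/sample.tar.bz2"    # bzip2 at its default level
```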
OK - So I am wanting to backup and compress the following directory I have under /tmp called "old_users".
Code:
[root@lt2fs1 tmp]# ls -lh
total 164K
drwx------ 2 root root 16K Jan 18 2006 lost+found
drwxr-xr-x 35 root root 4.0K Jan 29 22:00 old_users
That dir is about 3GB in size...
Code:
3.7G old_users/
I am attempting to tar and compress the directory as bz2, but I don't understand from the command below (if it's even correct) where the bzip2 file will be stored once created. What would the command look like if I wanted to specify a location for the compressed file rather than having it end up under /tmp?
From memory, the reason we use tar.gz or tar.bz2 is because Gzip and Bzip2 can only deal with a single file, unlike zip.
Therefore, when compressing a set of files, use tar to make them into a single file, and then compress with either Gzip or Bzip2 (or use the flags in tar to do it as one command).
Gzip is faster, BZip2 is slower. BZip2 creates smaller archives than Gzip.
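As a sketch of both approaches (the sample directory and file names are made up for illustration), the two-step and one-step forms look something like this:

```shell
# Hypothetical sample directory standing in for the data to archive.
src=$(mktemp -d)
out=$(mktemp -d)
mkdir -p "$src/old_users/alice"
echo "sample data" > "$src/old_users/alice/notes.txt"

# Two steps: bundle with tar, then compress the single resulting file.
tar -cf "$out/two_step.tar" -C "$src" old_users
gzip "$out/two_step.tar"            # replaces it with two_step.tar.gz

# One step: let tar call the compressor (-z for gzip, -j for bzip2).
tar -czf "$out/one_step.tar.gz" -C "$src" old_users
tar -cjf "$out/one_step.tar.bz2" -C "$src" old_users

# List the contents of an archive to confirm what went in.
tar -tzf "$out/two_step.tar.gz"
```

The -C option just tells tar to change into that directory first, so the paths stored in the archive start at old_users rather than at the full path.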
For anything over a GB I opt to use rzip, which handles large filesystems much better than gzip and bzip2.
OK - let's say you're backing up your home directory "/home/<user>", so I would then run the command:
Code:
tar cjvf home.bz2 /home/user
Now where does this place the file once it's compressed? Does it place it in my home directory or does it make the compressed file in the current directory I am in when I run the command?
It puts it in the directory specified in the command!!!
"home.bz2" means /path/path/hereinthisdirectorywhereweare/home.bz2
(This is always true---not just in tar)
Note: If you are IN your home directory---eg /home/user---then just type tar -cjvf home.bz2 *
'nother note: better to name the target home.tar.bz2
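To make that concrete, here is a small sketch. The scratch directories stand in for /home and for a backup location, and are made up for illustration:

```shell
# Hypothetical paths: home_parent stands in for /home, backup_dir for a backup location.
home_parent=$(mktemp -d)
backup_dir=$(mktemp -d)
mkdir -p "$home_parent/user"
echo "hello" > "$home_parent/user/file.txt"

# A relative name after -f lands in the current working directory.
cd "$home_parent"
tar -cjf home.tar.bz2 user
ls -l "$PWD/home.tar.bz2"

# An absolute path after -f puts the archive wherever you choose,
# regardless of the directory the command is run from.
tar -cjf "$backup_dir/home.tar.bz2" -C "$home_parent" user
ls -l "$backup_dir/home.tar.bz2"
```

So the argument right after f is the archive's path, and everything after that is what goes into it.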
tar is ubiquitous on unix-like OSes, but it has its limitations. The tar format doesn't include compression itself, but as a convenience the GNU tar utility can compress a tar file with gzip, bzip2, or old-school unix compression (the -z, -j and -Z options). You can also use some arbitrary compression program so long as it supports the right usage on the command line.
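A minimal sketch of the "arbitrary compression program" route, assuming GNU tar for the --use-compress-program option (the sample directory and file names are made up for illustration). Piping the tar stream through a compressor does the same job and works with any tar:

```shell
# Hypothetical sample data.
src=$(mktemp -d)
out=$(mktemp -d)
mkdir -p "$src/data"
echo "sample" > "$src/data/file.txt"

# GNU tar: name the compressor explicitly instead of using -z, -j or -Z.
tar --use-compress-program=bzip2 -cf "$out/data.tar.bz2" -C "$src" data

# Alternative: write the tar stream to stdout and pipe it through
# any compressor, with whatever options that compressor accepts.
tar -cf - -C "$src" data | gzip -9 > "$out/data.tar.gz"

# Extraction works the same way in reverse.
mkdir "$out/restored"
gzip -dc "$out/data.tar.gz" | tar -xf - -C "$out/restored"
```

The pipe form is handy when you want a non-default compression level, since the level is just an option to the compressor itself.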
There are problems with this. If you have several large files which you want to put in a tar file which are already compressed (e.g. ogg/vorbis, mpegs etc), compressing the resulting .tar file will waste a lot of CPU and not achieve very much.
There are other formats, like dar, which can address this by making compression optional inside the archive format itself.
Dar is nice, and for me it seems like a good tar-killer (I don't need tar's sequential access options which lots of people still use for tape backups), but it's not reasonable to expect dar to be installed. Thus .tar.{gz,bz2} looks set to remain common as a distribution mechanism for a while yet.
"won"? Based on what criteria? With what data sets? What about memory footprint and CPU usage? How asymmetric is the compression/decompression resource use? Is the algorithm patent-encumbered? Is there a reliable open source implementation? How many platforms does each format have a good implementation on? Is there a usable library for each format?
There are a lot of things to consider before you say a format "wins".