LinuxQuestions.org


carlosinfl 01-29-2007 09:17 PM

Linux Compression
 
OK - I am sure there are a bunch of man pages and guides that cover this, but I would like to discuss it here and clarify some confusion I have.

According to basic Linux administration, there seem to be 3 popular compression options.

1 - tar.gz
2 - bzip2
3 - gzip

Now I did try to do some research before just asking a question, and it appears that tar is simply a way of packaging files into one file with little or no compression. I know you can create a tar.gz file with a compression option, but my main question is: what is the best way to get the best compression on a directory that has multiple subdirectories and files?

Which format do you guys suggest I use as a Linux Admin for backup and archiving purposes? Which will give me the most compression?

Thanks for any help!

ilikejam 01-29-2007 09:46 PM

Hi.

Generally bzip2 has better compression than gzip, so you should probably use a .tar.bz2 for a directory tree.

e.g. tar cjvf mydir.tar.bz2 /path/to/mydir/

gzip compresses and decompresses faster, though...
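
A minimal sketch for checking the size difference on your own data. The directory and file names here are made up for illustration, and the generated number files are only a stand-in for real content; actual results depend heavily on what is being compressed.

```shell
#!/bin/sh
# Sketch: archive the same tree three ways and compare the sizes.
set -eu

workdir=$(mktemp -d)              # scratch area, removed on exit
trap 'rm -rf "$workdir"' EXIT

# Hypothetical sample data: two text files of consecutive numbers.
mkdir -p "$workdir/mydir/sub"
seq 1 200000 > "$workdir/mydir/numbers.txt"
seq 1 50000  > "$workdir/mydir/sub/more.txt"

# -C changes into $workdir first, so members are stored as "mydir/...".
tar cf  "$workdir/mydir.tar"     -C "$workdir" mydir   # no compression
tar czf "$workdir/mydir.tar.gz"  -C "$workdir" mydir   # gzip
tar cjf "$workdir/mydir.tar.bz2" -C "$workdir" mydir   # bzip2

tar_size=$(wc -c < "$workdir/mydir.tar")
gz_size=$(wc -c < "$workdir/mydir.tar.gz")
bz2_size=$(wc -c < "$workdir/mydir.tar.bz2")

echo "plain tar: $tar_size bytes"
echo "tar.gz:    $gz_size bytes"
echo "tar.bz2:   $bz2_size bytes"
```

Pointing the three tar commands at a real directory instead of the generated one gives a more honest comparison for that data.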

Here's a quite comprehensive review:
http://brej.org/compression/

Dave

trickykid 01-29-2007 11:19 PM

It depends on what you're compressing as well. Take a database dump, for instance: you'll see better compression ratios than if you were to compress a bunch of random files, especially photos and media files, which are binary and already compressed.

For instance, at my present employer we use gzip to compress our MySQL dumps. On average we see compression ratios anywhere from 5:1 to 9:1, which is quite good and comes from databases holding a lot of repetitive data.
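
The effect of the input data can be seen with a rough sketch like the one below. Both inputs are assumptions made for illustration: the generated INSERT statements stand in for a database dump, and random bytes stand in for already-compressed media. Neither is real data from this thread.

```shell
#!/bin/sh
# Sketch: gzip repetitive text versus incompressible data.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Stand-in for a database dump: many near-identical SQL lines.
awk 'BEGIN { for (i = 1; i <= 20000; i++) printf "INSERT INTO orders (id, status) VALUES (%d, \"shipped\");\n", i }' > "$workdir/dump.sql"

# Stand-in for already-compressed media: 1 MiB of random bytes.
dd if=/dev/urandom of="$workdir/media.bin" bs=1024 count=1024 2>/dev/null

dump_raw=$(wc -c < "$workdir/dump.sql")
dump_gz=$(gzip -c "$workdir/dump.sql" | wc -c)
media_raw=$(wc -c < "$workdir/media.bin")
media_gz=$(gzip -c "$workdir/media.bin" | wc -c)

# Report the compressed size as a percentage of the original.
echo "dump.sql:  $dump_raw -> $dump_gz bytes ($((100 * dump_gz / dump_raw))% of original)"
echo "media.bin: $media_raw -> $media_gz bytes ($((100 * media_gz / media_raw))% of original)"
```

The random file should come out at roughly its original size, while the SQL-like text shrinks to a small fraction.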

btmiller 01-30-2007 01:17 AM

My experience on large (over 1 TB) data sets is that at the default compression levels bzip2 is much slower than gzip, on the order of 2 to 3 times as slow. If compression speed as well as ratio is a concern, this might be a factor in your decision.
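
If speed matters, the compression level is the main knob. The sketch below pipes tar's output through the compressor so the level can be set explicitly; gzip defaults to -6 and bzip2 to -9. The sample data and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: choose the compression level by piping tar into the compressor.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir "$workdir/data"
seq 1 300000 > "$workdir/data/numbers.txt"   # hypothetical sample data

# "f -" sends the tar stream to stdout instead of a file.
tar cf - -C "$workdir" data | gzip -1  > "$workdir/fast.tar.gz"    # fastest gzip level
tar cf - -C "$workdir" data | gzip -9  > "$workdir/small.tar.gz"   # smallest gzip level
tar cf - -C "$workdir" data | bzip2 -9 > "$workdir/data.tar.bz2"   # bzip2 default level

fast_size=$(wc -c < "$workdir/fast.tar.gz")
small_size=$(wc -c < "$workdir/small.tar.gz")
bz2_size=$(wc -c < "$workdir/data.tar.bz2")

echo "gzip -1:  $fast_size bytes"
echo "gzip -9:  $small_size bytes"
echo "bzip2 -9: $bz2_size bytes"
```

In bash, prefixing each pipeline with `time` shows how long it takes, which is the other half of the trade-off described above.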

carlosinfl 01-30-2007 08:40 PM

Quote:

Originally Posted by ilikejam
Hi.

Generally bzip2 has better compression than gzip, so you should probably use a .tar.bz2 for a directory tree.

e.g. tar cjvf mydir.tar.bz2 /path/to/mydir/

gzip compresses and decompresses faster, though...

Here's a quite comprehensive review:
http://brej.org/compression/

Dave

OK - So I want to back up and compress the following directory I have under /tmp called "old_users".

Code:

[root@lt2fs1 tmp]# ls -lh
total 164K
drwx------  2 root    root  16K Jan 18  2006 lost+found
drwxr-xr-x  35 root    root  4.0K Jan 29 22:00 old_users

That dir is about 3GB in size...

Code:

3.7G    old_users/

I am attempting to tar and compress the directory as bz2, but I don't understand from the command below (if it's even correct) where the bzip2 file will be stored once created. What would the command look like if I wanted to specify a location for the compressed file rather than /tmp?

Code:

tar cjvf old_users.bz2 /tmp/old_users/

btmiller 01-30-2007 10:46 PM

tar cjvf /path/to/old_users.tar.bz2 /tmp/old_users

where /path/to is the directory you want to put it in.

It's a good idea to put a .tar.bz2 (or tbz2) suffix on the file to remind yourself it's a bzipped tarball.
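
As a rough sketch of the same idea: the paths below are hypothetical stand-ins for /tmp/old_users and a backup location, created in a scratch directory so the example can be run safely. It also uses -C, which is not in the command above, so that the archive stores relative member names.

```shell
#!/bin/sh
# Sketch: write the archive to a chosen location and check its contents.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical layout standing in for /tmp/old_users and a backup directory.
src_parent="$workdir/tmp"
dest="$workdir/backups"
mkdir -p "$src_parent/old_users/alice" "$dest"
echo "notes" > "$src_parent/old_users/alice/notes.txt"

# The argument after "f" is where the archive is written.
# -C makes tar change directory first, so members are stored as
# "old_users/..." rather than with the full source path.
tar cjf "$dest/old_users.tar.bz2" -C "$src_parent" old_users

# List the archive to confirm what was stored.
listing=$(tar tjf "$dest/old_users.tar.bz2")
echo "$listing"
```

Listing the archive before deleting the originals is a cheap sanity check for a backup.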

IBall 01-31-2007 12:04 AM

From memory, the reason we use tar.gz or tar.bz2 is because Gzip and Bzip2 can only deal with a single file, unlike zip.

Therefore, when compressing a set of files, use tar to make them into a single file, and then compress with either Gzip or Bzip2 (or use the flags in tar to do it as one command).

Gzip is faster, BZip2 is slower. BZip2 creates smaller archives than Gzip.

--Ian

trickykid 01-31-2007 04:46 AM

Quote:

Originally Posted by btmiller
My experience on large (over 1 TB) data sets is that at the default compression levels bzip2 is much slower than gzip, on the order of 2 to 3 times as slow. If compression speed as well as ratio is a concern, this might be a factor in your decision.

For anything over a GB I opt to use rzip; it handles large data sets much better than gzip and bzip2.

carlosinfl 01-31-2007 08:15 AM

OK - let's say you're backing up your home directory, "/home/<user>". I would then run the command:

Code:

tar cjvf home.bz2 /home/user

Now where does this place the file once it's compressed? Does it place it in my home directory or does it make the compressed file in the current directory I am in when I run the command?

pixellany 01-31-2007 08:38 AM

It puts it in the directory specified in the command!

A bare name like "home.bz2" means /path/to/the/directory/you/are/in/home.bz2, i.e. the current directory when you run the command.
(This is always true, not just in tar.)
Note: If you are IN your home directory, e.g. /home/user, then just type tar -cjvf home.bz2 *

'nother note: better to name the target home.tar.bz2
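
A small sketch that demonstrates this, using made-up directories in a scratch area rather than a real /home:

```shell
#!/bin/sh
# Sketch: a relative archive name is resolved against the current directory.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical stand-ins for /home/user and some other directory.
mkdir -p "$workdir/home/user" "$workdir/elsewhere"
echo "hello" > "$workdir/home/user/file.txt"

cd "$workdir/elsewhere"                        # current directory when tar runs
tar cjf home.tar.bz2 -C "$workdir/home" user   # archive name has no path

# The archive lands in the current directory, not in the directory archived.
ls -l "$workdir/elsewhere/home.tar.bz2"
```

Giving the archive an absolute path removes any dependence on where the command is run from.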

matthewg42 01-31-2007 10:25 AM

tar is ubiquitous on Unix-like OSes, but it has its limitations. The tar format doesn't include compression itself, but as a convenience the GNU tar utility can compress a tar file with gzip, bzip2, or old-school Unix compress (the -z, -j and -Z options). You can also use some arbitrary compression program, so long as it supports the right usage on the command line.
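
A short sketch of those options, assuming GNU tar with gzip and bzip2 installed. The project directory is hypothetical, and bzip2 is used as the explicitly named program only because it is likely to be present; any filter that compresses stdin to stdout and accepts -d to decompress should work the same way.

```shell
#!/bin/sh
# Sketch: built-in compression flags versus an explicitly named program.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir "$workdir/project"
seq 1 1000 > "$workdir/project/data.txt"    # hypothetical sample data

# Built-in shortcuts: z = gzip, j = bzip2.
tar czf "$workdir/project.tar.gz"  -C "$workdir" project
tar cjf "$workdir/project.tar.bz2" -C "$workdir" project

# Name the compression program explicitly instead of using a shortcut flag.
tar --use-compress-program=bzip2 -cf "$workdir/explicit.tar.bz2" -C "$workdir" project

# The result is an ordinary bzip2 tarball and lists with the usual j flag.
members=$(tar tjf "$workdir/explicit.tar.bz2")
echo "$members"
```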

There are problems with this. If you have several large files which you want to put in a tar file which are already compressed (e.g. ogg/vorbis, mpegs etc), compressing the resulting .tar file will waste a lot of CPU and not achieve very much.

There are other formats, like dar, which can address this by allowing compression or not internal to the dar format.

Dar is nice, and for me it seems like a good tar-killer (I don't need tar's sequential access options which lots of people still use for tape backups), but it's not reasonable to expect dar to be installed. Thus .tar.{gz,bz2} looks set to remain common as a distribution mechanism for a while yet.

carlosinfl 01-31-2007 10:25 AM

Quote:

Originally Posted by btmiller
It's a good idea to put a .tar.bz2 (or tbz2) suffix on the file to remind yourself it's a bzipped tarball.

Last question - I don't understand what that means above.

Why not just call it whatever.bz2 rather than whatever.tar.bz2?

matthewg42 01-31-2007 10:32 AM

.bz2 is appended to the name of any file which is compressed using the bzip2 program. Doing this:

Code:

$ tar jcf tarfile.tar.bz2 ...

is the same as doing this:

Code:

$ tar cf tarfile.tar ...
$ bzip2 tarfile.tar
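
The equivalence can be checked with a sketch like this one. The directory and archive names are invented for the example.

```shell
#!/bin/sh
# Sketch: one-step "tar cjf" versus tar followed by bzip2.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir "$workdir/stuff"
echo "example" > "$workdir/stuff/a.txt"
cd "$workdir"

# One step: tar writes the archive through bzip2.
tar cjf onestep.tar.bz2 stuff

# Two steps: make a plain tarball, then compress it.
# bzip2 replaces twostep.tar with twostep.tar.bz2.
tar cf twostep.tar stuff
bzip2 twostep.tar

# Both archives list the same members.
one=$(tar tjf onestep.tar.bz2)
two=$(tar tjf twostep.tar.bz2)
[ "$one" = "$two" ] && echo "same contents"
```

This is also why the .tar.bz2 suffix is informative: it records both steps, so it is clear that decompressing yields a tar file rather than a single document.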


patrick295767 01-31-2007 04:18 PM

I tested this with Total Commander, and ACE compression won!

RAR came second, a bit behind but still very good.

ZIP and the others were further behind.

(tar.gz: I don't know.)


Quote:

debian linux frontend:
apt-get -f install xarchive ark file-roller

matthewg42 01-31-2007 04:35 PM

"won"? Based on what criteria? With what data sets? What about memory footprint and CPU usage? How asymmetric is the compression/decompression resource use? Is the algorithm patent-encumbered? Is there a reliable open source implementation? How many platforms does each format have good implementation on? Is there a usable library for each format?

There are a lot of things to consider before you say a format "wins".


All times are GMT -5.