LinuxQuestions.org
Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

Old 01-29-2007, 09:17 PM   #1
carlosinfl
Senior Member
 
Registered: May 2004
Location: Orlando, FL
Distribution: Arch
Posts: 2,905

Rep: Reputation: 77
Linux Compression


OK - I am sure there are a bunch of man pages and guides that cover this, but I would like to discuss it here and clarify some confusion I have.

According to basic Linux administration, there seem to be 3 popular compression options.

1 - tar.gz
2 - bzip2
3 - gzip

Now, I did try to do some research before asking, and it appears that tar is simply a way of packaging files into one file with little or no compression. I know you can create a tar.gz file with a compression option, but my main question is: what is the best way to get the best compression on a directory that has multiple subdirectories and files?

Which format do you guys suggest I use as a Linux Admin for backup and archiving purposes? Which will give me the most compression?

Thanks for any help!
 
Old 01-29-2007, 09:46 PM   #2
ilikejam
Senior Member
 
Registered: Aug 2003
Location: Glasgow
Distribution: Fedora / Solaris
Posts: 3,109

Rep: Reputation: 97
Hi.

Generally bzip2 has better compression than gzip, so you should probably use a .tar.bz2 for a directory tree.

e.g. tar cjvf mydir.tar.bz2 /path/to/mydir/

gzip compresses and decompresses faster, though...
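
To see the size difference on your own data, something like the following sketch can help. It builds a throwaway directory of repetitive text (the sample names and the temporary directory are made up for illustration), archives it both ways, and lists the results. The bzip2 step is skipped if bzip2 is not installed, and the actual sizes will depend on the data.

```shell
#!/bin/sh
# Sketch: archive the same directory with gzip (-z) and bzip2 (-j), then compare sizes.
set -eu

work=$(mktemp -d)                 # scratch area, removed when the script exits
trap 'rm -rf "$work"' EXIT

# Hypothetical sample data: a small tree of repetitive text.
mkdir -p "$work/mydir/sub"
i=0
while [ "$i" -lt 5000 ]; do
    echo "record $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > "$work/mydir/sub/sample.txt"

# -C changes into $work first, so the archive stores "mydir/..." paths.
tar -czf "$work/mydir.tar.gz" -C "$work" mydir

if command -v bzip2 >/dev/null 2>&1; then
    tar -cjf "$work/mydir.tar.bz2" -C "$work" mydir
else
    echo "bzip2 not found; skipping the .tar.bz2 archive" >&2
fi

# Show the size of each archive that was created.
ls -l "$work"/mydir.tar.*
```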

Here's a quite comprehensive review:
http://brej.org/compression/

Dave

Last edited by ilikejam; 01-29-2007 at 09:50 PM.
 
Old 01-29-2007, 11:19 PM   #3
trickykid
LQ Guru
 
Registered: Jan 2001
Posts: 24,149

Rep: Reputation: 269
Depends on what you're compressing as well. Take a database dump, for instance: you'll see better compression ratios than if you were to compress a bunch of random files, especially photos and media files, which are binary and already compressed.

For instance, at my present employer we currently use gzip to compress our MySQL dumps. On average we see anywhere from 5:1 to 9:1 compression ratios, which is quite good and is due to databases having a lot of repetitive data, etc.
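
The effect of the input data is easy to check. This rough sketch gzips two files of the same size: one of repetitive text standing in for a database dump, and one of random bytes standing in for already-compressed media. Both files are invented for the illustration, and the exact ratios will vary.

```shell
#!/bin/sh
# Sketch: same compressor, very different results depending on the input.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Repetitive text, loosely imitating rows of a SQL dump (made-up content).
i=0
while [ "$i" -lt 4000 ]; do
    echo "INSERT INTO users VALUES ($i, 'name', 'active');"
    i=$((i + 1))
done > "$work/dump.sql"

# Arithmetic expansion normalizes the number printed by wc.
text_size=$(( $(wc -c < "$work/dump.sql") ))

# Random bytes of the same length, imitating already-compressed media.
head -c "$text_size" /dev/urandom > "$work/media.bin"

# -c writes to stdout and leaves the original files in place.
gzip -c "$work/dump.sql"  > "$work/dump.sql.gz"
gzip -c "$work/media.bin" > "$work/media.bin.gz"

text_gz=$(( $(wc -c < "$work/dump.sql.gz") ))
rand_gz=$(( $(wc -c < "$work/media.bin.gz") ))

echo "text:   $text_size -> $text_gz bytes"
echo "random: $text_size -> $rand_gz bytes"
```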
 
Old 01-30-2007, 01:17 AM   #4
btmiller
Senior Member
 
Registered: May 2004
Location: In the DC 'burbs
Distribution: Arch, Scientific Linux, Debian, Ubuntu
Posts: 4,290

Rep: Reputation: 378
My experience with large (over 1 TB) data sets is that, at the default compression levels, bzip2 is much slower than gzip, on the order of 2 to 3 times as slow. If compression speed matters as well as ratio, this might be a factor in your decision.
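
The levels can be changed: both gzip and bzip2 accept -1 (fastest) through -9 (smallest output), with gzip defaulting to -6 and bzip2 to -9. A minimal sketch of the size side of that trade-off, using gzip on a made-up log file (the file name and contents are only for illustration):

```shell
#!/bin/sh
# Sketch: the same file at gzip's fastest (-1) and best (-9) settings.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical sample log with some repetition in it.
i=0
while [ "$i" -lt 5000 ]; do
    echo "2007-01-30 host$((i % 7)) request $i served in $((i % 113)) ms"
    i=$((i + 1))
done > "$work/sample.log"

# -c writes to stdout so the input file is kept for the second run.
gzip -1 -c "$work/sample.log" > "$work/fast.gz"
gzip -9 -c "$work/sample.log" > "$work/best.gz"

fast=$(( $(wc -c < "$work/fast.gz") ))
best=$(( $(wc -c < "$work/best.gz") ))

echo "gzip -1: $fast bytes"
echo "gzip -9: $best bytes"
```

When going through tar, one way to pick a level is to pipe: tar cf - /path/to/mydir | gzip -9 > mydir.tar.gz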
 
Old 01-30-2007, 08:40 PM   #5
carlosinfl
Senior Member
 
Registered: May 2004
Location: Orlando, FL
Distribution: Arch
Posts: 2,905

Original Poster
Rep: Reputation: 77
Quote:
Originally Posted by ilikejam
Hi.

Generally bzip2 has better compression than gzip, so you should probably use a .tar.bz2 for a directory tree.

e.g. tar cjvf mydir.tar.bz2 /path/to/mydir/

gzip compresses and decompresses faster, though...

Here's a quite comprehensive review:
http://brej.org/compression/

Dave
OK - so I want to back up and compress the following directory I have under /tmp called "old_users".

Code:
[root@lt2fs1 tmp]# ls -lh
total 164K
drwx------   2 root    root   16K Jan 18  2006 lost+found
drwxr-xr-x  35 root    root  4.0K Jan 29 22:00 old_users
That dir is about 3GB in size...

Code:
3.7G    old_users/
I am attempting to tar and compress the directory as bz2, but I don't understand from the command below (if it's even correct) where the bzip2 file will be stored once created. What would the command look like if I wanted to specify a location for the compressed file rather than /tmp?

Code:
tar cjvf old_users.bz2 /tmp/old_users/
 
Old 01-30-2007, 10:46 PM   #6
btmiller
Senior Member
 
Registered: May 2004
Location: In the DC 'burbs
Distribution: Arch, Scientific Linux, Debian, Ubuntu
Posts: 4,290

Rep: Reputation: 378
tar cjvf /path/to/old_users.tar.bz2 /tmp/old_users

where /path/to is the directory you want to put it in.

It's a good idea to put a .tar.bz2 (or .tbz2) suffix on the file to remind yourself it's a bzipped tarball.
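
A sketch of the same idea with a scratch directory in place of the real paths (the backups directory and the sample tree are placeholders). It also uses -C, which makes tar change directory before reading its input, so the archive stores old_users/... rather than the full tmp/old_users/... path. tar -t lists the contents without extracting. The fallback to gzip is only there in case bzip2 is not installed.

```shell
#!/bin/sh
# Sketch: write the archive to a chosen directory and check what it contains.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Placeholder layout: $work/tmp/old_users is the data, $work/backups the target.
mkdir -p "$work/tmp/old_users/alice" "$work/backups"
echo "some old data" > "$work/tmp/old_users/alice/notes.txt"

# Use bzip2 (-j) when available, otherwise fall back to gzip (-z).
if command -v bzip2 >/dev/null 2>&1; then
    flag=j; ext=bz2
else
    flag=z; ext=gz
fi

archive="$work/backups/old_users.tar.$ext"

# The archive is written wherever the path after -f points.
tar "-c${flag}f" "$archive" -C "$work/tmp" old_users

# List the stored paths without extracting anything.
tar "-t${flag}f" "$archive"
```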
 
Old 01-31-2007, 12:04 AM   #7
IBall
Senior Member
 
Registered: Nov 2003
Location: Perth, Western Australia
Distribution: Ubuntu, Debian, Various using VMWare
Posts: 2,088

Rep: Reputation: 62
From memory, the reason we use tar.gz or tar.bz2 is because Gzip and Bzip2 can only deal with a single file, unlike zip.

Therefore, when compressing a set of files, use tar to make them into a single file, and then compress with either Gzip or Bzip2 (or use the flags in tar to do it as one command).

Gzip is faster; Bzip2 is slower but creates smaller archives.
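
A small sketch of the difference, using made-up file names: gzip turns each input file into its own .gz file, while piping tar into gzip yields one compressed archive of the whole directory.

```shell
#!/bin/sh
# Sketch: gzip works per file; tar + gzip gives a single compressed archive.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir "$work/docs"
echo "first file"  > "$work/docs/a.txt"
echo "second file" > "$work/docs/b.txt"

# tar bundles the directory and writes to stdout (-f -); gzip compresses the stream.
tar -cf - -C "$work" docs | gzip > "$work/docs.tar.gz"

# gzip alone compresses each file separately and replaces it with NAME.gz.
gzip "$work/docs/a.txt" "$work/docs/b.txt"

# Expect docs.tar.gz at the top level and a.txt.gz, b.txt.gz inside docs/.
ls "$work" "$work/docs"
```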

--Ian
 
Old 01-31-2007, 04:46 AM   #8
trickykid
LQ Guru
 
Registered: Jan 2001
Posts: 24,149

Rep: Reputation: 269
Quote:
Originally Posted by btmiller
My experience with large (over 1 TB) data sets is that, at the default compression levels, bzip2 is much slower than gzip, on the order of 2 to 3 times as slow. If compression speed matters as well as ratio, this might be a factor in your decision.
For anything over a GB I opt to use rzip; it handles large filesystems much better than gzip and bzip2.
 
Old 01-31-2007, 08:15 AM   #9
carlosinfl
Senior Member
 
Registered: May 2004
Location: Orlando, FL
Distribution: Arch
Posts: 2,905

Original Poster
Rep: Reputation: 77
OK - let's say you're backing up your home directory "/home/<user>", so I would then run the command:

Code:
tar cjvf home.bz2 /home/user
Now where does this place the file once it's compressed? Does it place it in my home directory or does it make the compressed file in the current directory I am in when I run the command?
 
Old 01-31-2007, 08:38 AM   #10
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
It puts it in the directory specified in the command!!!

"home.bz2" means /path/path/hereinthisdirectorywhereweare/home.bz2
(This is always true, not just in tar.)
Note: If you are IN your home directory, e.g. /home/user, then just type tar -cjvf home.bz2 *

'nother note: better to name the target home.tar.bz2
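
A sketch of both points, using a scratch directory as a stand-in for /home/user. A relative archive name is resolved against the directory the command is run from. One caveat worth knowing with *: the shell expands it before tar runs, and by default it does not match names beginning with a dot, so files such as .bashrc are left out. Archiving . picks them up. gzip (-z) is used here only to keep the sketch free of a bzip2 dependency; -j behaves the same way.

```shell
#!/bin/sh
# Sketch: relative archive paths, and what "*" does and does not include.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Stand-in for a home directory with one visible and one hidden file.
mkdir -p "$work/home/user"
echo "visible" > "$work/home/user/notes.txt"
echo "hidden"  > "$work/home/user/.bashrc"

cd "$work/home/user"

# Relative names are resolved from the current directory, so both
# archives land in $work/home, one level above where we are.
tar -czf ../star.tar.gz *      # the shell expands * to: notes.txt
tar -czf ../dot.tar.gz .       # tar walks the directory itself

echo "contents of star.tar.gz:"
tar -tzf ../star.tar.gz
echo "contents of dot.tar.gz:"
tar -tzf ../dot.tar.gz
```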
 
Old 01-31-2007, 10:25 AM   #11
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
tar is ubiquitous on unix-like OSes, but it has its limitations. The tar format doesn't include compression itself, but as a convenience the GNU tar utility can compress a tar file with gzip, bzip2, or old-school unix compression (the -z, -j and -Z options). You can also use some arbitrary compression program so long as it supports the right usage on the command line.
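
In GNU tar the arbitrary-program route is the --use-compress-program option. A minimal sketch, assuming GNU tar and using gzip as the stand-in program; any filter that compresses stdin to stdout and accepts -d to decompress should behave the same way. The directory name is made up.

```shell
#!/bin/sh
# Sketch: let tar pipe the archive through an external compressor of your choice.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir "$work/project"
echo "hello" > "$work/project/readme.txt"

# tar runs the named program on the archive stream instead of using -z/-j/-Z.
tar --use-compress-program=gzip -cf "$work/project.tar.gz" -C "$work" project

# When listing or extracting, tar runs the same program with -d.
tar --use-compress-program=gzip -tf "$work/project.tar.gz"
```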

There are problems with this. If several large files that you want to put in a tar file are already compressed (e.g. ogg/vorbis, mpegs etc), compressing the resulting .tar file will waste a lot of CPU time and not achieve very much.

There are other formats, like dar, which can address this by allowing compression or not internal to the dar format.

Dar is nice, and for me it seems like a good tar-killer (I don't need tar's sequential access options which lots of people still use for tape backups), but it's not reasonable to expect dar to be installed. Thus .tar.{gz,bz2} looks set to remain common as a distribution mechanism for a while yet.
 
Old 01-31-2007, 10:25 AM   #12
carlosinfl
Senior Member
 
Registered: May 2004
Location: Orlando, FL
Distribution: Arch
Posts: 2,905

Original Poster
Rep: Reputation: 77
Quote:
Originally Posted by btmiller
It's a good idea to put a .tar.bz2 (or .tbz2) suffix on the file to remind yourself it's a bzipped tarball.
Last question - I don't understand what that means above.

Why not just call it whatever.bz2 rather than whatever.tar.bz2?
 
Old 01-31-2007, 10:32 AM   #13
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
.bz2 is appended to the name of any file that is compressed using the bzip2 program. Doing this:
Code:
$ tar jcf tarfile.tar.bz2 ...
is the same as doing this:
Code:
$ tar cf tarfile.tar ...
$ bzip2 tarfile.tar
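
The suffix also tells you how to undo it. A sketch with invented names: decompressing strips the .bz2 and leaves a plain .tar, which tar can then list or extract. Had the file been named whatever.bz2, decompressing would leave a file called whatever, with no hint that it is a tar archive. gzip is used as a fallback here only in case bzip2 is missing.

```shell
#!/bin/sh
# Sketch: the two-step form, and what the suffixes look like at each stage.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir "$work/stuff"
echo "data" > "$work/stuff/file.txt"

# Pick bzip2 when installed; gzip behaves the same way with a .gz suffix.
if command -v bzip2 >/dev/null 2>&1; then
    comp=bzip2; ext=bz2
else
    comp=gzip; ext=gz
fi

# Step 1: bundle. Step 2: compress, which renames stuff.tar to stuff.tar.$ext.
tar -cf "$work/stuff.tar" -C "$work" stuff
"$comp" "$work/stuff.tar"
ls "$work"

# Reverse: -d decompresses and strips the suffix, leaving stuff.tar again.
"$comp" -d "$work/stuff.tar.$ext"
tar -tf "$work/stuff.tar"
```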
 
Old 01-31-2007, 04:18 PM   #14
patrick295767
Member
 
Registered: Feb 2006
Distribution: FreeBSD, Linux, Slackware, LFS, Gparted
Posts: 664

Rep: Reputation: 138
I tested this with Total Commander:

ACE compression won!

RAR came second, a bit behind but still very good.

zip and the others were further behind.

(tar.gz: don't know)


Quote:
debian linux frontend:
apt-get -f install xarchive ark file-roller

Last edited by patrick295767; 01-31-2007 at 04:20 PM.
 
Old 01-31-2007, 04:35 PM   #15
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
"won"? Based on what criteria? With what data sets? What about memory footprint and CPU usage? How asymmetric is the compression/decompression resource use? Is the algorithm patent-encumbered? Is there a reliable open source implementation? How many platforms does each format have good implementation on? Is there a usable library for each format?

There are a lot of things to consider before you say a format "wins".
 
  

