LinuxQuestions.org
Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

Old 12-12-2006, 10:20 PM   #1
Micro420
Senior Member
 
Registered: Aug 2003
Location: Berkeley, CA
Distribution: Mac OS X Leopard 10.6.2, Windows 2003 Server/Vista/7/XP/2000/NT/98, Ubuntux64, CentOS4.8/5.4
Posts: 2,986

Rep: Reputation: 45
Does TAR use any compression?


Does the tar -cf command actually use any sort of compression? I do incremental backups every day and they still seem pretty big: even with no file contents in them, the directory listings of the folders and sub-folders alone come out to about 100-200MB. I even tried tar -zcf (gzip) and it's still the same size. I then compared it with the 7z program, and it compresses my files about 50% better than tar. What's going on? Does 7z just have a better compression algorithm, or is tar not compressing at all?
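For reference, one common way to make daily incremental backups with GNU tar is `--listed-incremental` (`-g`). The post doesn't say which options were used, so this is only a minimal sketch under that assumption, with generated files and a hypothetical snapshot file name. My understanding is that GNU tar stores every directory entry (with a list of its contents) in each incremental archive, even when nothing inside changed, which is one reason such archives are never empty.

```shell
#!/bin/sh
# Sketch (GNU tar assumed): full backup, then an incremental one.
# backup.snar is a hypothetical snapshot file name; it records what was
# archived so the next run with the same file stores only changes.
set -eu

work=$(mktemp -d)
mkdir "$work/data"
echo "monday" > "$work/data/a.txt"
echo "monday" > "$work/data/b.txt"
sleep 1                                  # keep timestamps clearly apart for the demo

# First run: the snapshot file does not exist yet, so this is a full backup
tar -C "$work" -g "$work/backup.snar" -czf "$work/full.tar.gz" data

sleep 1
echo "tuesday" > "$work/data/b.txt"      # change one file

# Second run with the same snapshot file: only b.txt is stored as file data,
# plus the directory entry describing data/
tar -C "$work" -g "$work/backup.snar" -czf "$work/incr1.tar.gz" data

tar -tzf "$work/incr1.tar.gz"            # lists data/ and data/b.txt
```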
 
Old 12-12-2006, 10:32 PM   #2
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
I don't think tar by itself does any compression, but the "z" option invokes gzip. Other schemes include bzip2 and (new to me) 7zip.

Compression depends a lot on what the files are... I would simply try some experiments.
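A minimal sketch of such an experiment: archive the same directory three ways and compare sizes. The sample data here is generated (one repetitive text file, one file of random bytes) so you can see how much the content matters; swap in your own files to test real data.

```shell
#!/bin/sh
# Sketch: same input, three archive variants, sizes in bytes.
# The sample files are generated stand-ins for real data.
set -eu

size() { wc -c < "$1" | tr -d ' '; }     # file size in bytes

work=$(mktemp -d)
mkdir "$work/sample"
seq 1 100000 > "$work/sample/numbers.txt"                # repetitive text
head -c 100000 /dev/urandom > "$work/sample/random.bin"  # random bytes

tar -C "$work" -cf  "$work/plain.tar"     sample   # archive only, no compression
tar -C "$work" -czf "$work/gzip.tar.gz"   sample   # archive + gzip
tar -C "$work" -cjf "$work/bzip2.tar.bz2" sample   # archive + bzip2

for f in plain.tar gzip.tar.gz bzip2.tar.bz2; do
    printf '%-14s %8s bytes\n' "$f" "$(size "$work/$f")"
done
```

The plain .tar is at least as big as the two input files combined, while the random file barely shrinks under either compressor.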
 
Old 12-12-2006, 10:37 PM   #3
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
This doesn't compress at all:
Code:
tar cf file.tar files ...
This compresses using the gzip algorithm:
Code:
tar czf file.tar files ...
This compresses using the bzip2 algorithm (slower but better compression):
Code:
tar czf file.tar files ...
*edit* see pixellany's correction below

Note that appending files (tar -r) to an existing .tar archive that already contains those files will leave multiple versions of each file in the archive. This is useful for backups where you want to be able to retrieve a file from a specific date, but it might be misleading you in your comparison if that is what you are doing.
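A minimal sketch of that effect (GNU tar assumed; `notes.txt` and `backup.tar` are hypothetical names). `tar -r` always adds a new member and never replaces an old one; `tar -u` behaves the same way for files newer than the archived copy.

```shell
#!/bin/sh
# Sketch: appending a file that is already archived adds a second copy.
set -eu

work=$(mktemp -d)
cd "$work"

echo "version 1" > notes.txt
tar -cf backup.tar notes.txt          # archive holds one notes.txt

echo "version 2" > notes.txt
tar -rf backup.tar notes.txt          # append: now two notes.txt members

tar -tf backup.tar                    # prints notes.txt twice

# Extracting to disk writes both copies in order, so the last one wins
mkdir restore
tar -C restore -xf backup.tar
cat restore/notes.txt                 # version 2

# GNU tar can select an earlier copy with --occurrence=N
tar --occurrence=1 -xOf backup.tar notes.txt   # version 1
```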

Last edited by matthewg42; 12-12-2006 at 10:44 PM. Reason: note correction
 
Old 12-12-2006, 10:39 PM   #4
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
cjf for bzip2??
 
Old 12-12-2006, 10:42 PM   #5
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
Quote:
Originally Posted by pixellany
cjf for bzip2??
oops - the dangers of copy-paste, and too little sleep! *blush*

To use bzip2 compression:
Code:
tar cjf file.tar files ...

Last edited by matthewg42; 12-12-2006 at 10:43 PM. Reason: make it better
 
Old 12-12-2006, 11:20 PM   #6
Micro420
Senior Member
 
Registered: Aug 2003
Location: Berkeley, CA
Distribution: Mac OS X Leopard 10.6.2, Windows 2003 Server/Vista/7/XP/2000/NT/98, Ubuntux64, CentOS4.8/5.4
Posts: 2,986

Original Poster
Rep: Reputation: 45
Good to know! In my case, I was wondering why anyone would use tar, since the file size wasn't significantly reduced, if at all! I will use gzip compression (tar -zcf).

It's a long story, but I couldn't compare tar -cf with tar -czf because I was backing up to an external drive formatted with the VFAT filesystem, so my archives were maxing out at about 4GB. I didn't know that VFAT (FAT32) has a maximum file size of just under 4GB, so I thought tar -cf and tar -czf made no difference. I have reformatted the external hard drive as ext3, so I should be able to see whether there is a difference by the end of this week.
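If a FAT32 drive has to be used anyway, one workaround is to stream the archive through `split` so no single file reaches the limit. This is a minimal sketch with tiny pieces and generated data; for a real FAT32 target you would pick something like `-b 3900M` (an assumed, conservative value).

```shell
#!/bin/sh
# Sketch: write a .tar.gz as several pieces, then restore from them.
# backup.tar.gz.part- is a hypothetical prefix; split adds aa, ab, ...
set -eu

work=$(mktemp -d)
mkdir "$work/data"
head -c 300000 /dev/urandom > "$work/data/big.bin"

# "-f -" sends the archive to stdout; split cuts it into 100000-byte pieces
tar -C "$work" -czf - data | split -b 100000 - "$work/backup.tar.gz.part-"
ls "$work"/backup.tar.gz.part-*

# Restore: the glob sorts the pieces in order; cat rejoins them for tar
mkdir "$work/restore"
cat "$work"/backup.tar.gz.part-* | tar -C "$work/restore" -xzf -
cmp "$work/data/big.bin" "$work/restore/data/big.bin" && echo "restored OK"
```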

Last edited by Micro420; 12-12-2006 at 11:29 PM.
 
Old 12-12-2006, 11:42 PM   #7
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
tar was originally written for, and is still useful for, archiving to tape drives (the name comes from "tape archive"). tar files have an internal layout that minimises the amount of winding forward and backward a tape drive has to do during archiving operations. zip files are not suitable for this, and they lack other features, which makes zip cumbersome or simply inappropriate for serious archiving.

The drawback of tar's approach is that the whole tar file needs to be read to show a listing of what files it contains, whereas zip files contain the "table of contents" in one chunk which can be read quickly.
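A minimal sketch of what that means in practice, with generated files and hypothetical names: tar has no central index, so listing a .tar.gz, or pulling one member out of it, means decompressing and scanning the stream from the start.

```shell
#!/bin/sh
# Sketch: listing and single-member extraction both read the stream.
set -eu

work=$(mktemp -d)
mkdir -p "$work/project/docs"
echo "hello" > "$work/project/docs/readme.txt"
seq 1 50000 > "$work/project/data.txt"
tar -C "$work" -czf "$work/project.tar.gz" project

# Listing walks every header in the (decompressed) archive
tar -tzf "$work/project.tar.gz"

# Extracting one member still scans the archive; -O writes it to stdout
tar -xzOf "$work/project.tar.gz" project/docs/readme.txt
```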

The formats are designed for somewhat different purposes, and have different qualities accordingly.

One reason tar files are used so frequently is that they are supported on lots and lots of *nix systems "out of the box", so it's good for distributing files to multiple *nix platforms. I think tar probably pre-dates zip by quite a few moons.

If you're looking for something which takes some of the benefits of zip in a more tar-like format, have a look at dar. dar has some very nice features.
 
Old 12-13-2006, 12:02 AM   #8
edenCC
Member
 
Registered: May 2006
Location: China
Distribution: Debian
Posts: 198
Blog Entries: 1

Rep: Reputation: 32
Quote:
Originally Posted by matthewg42
oops - the dangers of copy-paste, and too little sleep! *blush*

To use bzip2 compression:
Code:
tar cjf file.tar files ...
Pay attention to the suffix:
tar czf abc.tgz FILE_OR_DIRECTORY_LIST
tar czf abc.tar.gz FILE_OR_DIRECTORY_LIST
tar cjf abc.tar.bz2 FILE_OR_DIRECTORY_LIST
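A related option in GNU tar (assumed here; other tar implementations may differ) is `-a` / `--auto-compress`, which picks the compressor from the suffix so the name and the contents can't disagree. When reading from a file, GNU tar also detects the compression on its own. A minimal sketch with generated files:

```shell
#!/bin/sh
# Sketch (GNU tar): suffix-driven compression and auto-detection.
set -eu

work=$(mktemp -d)
mkdir "$work/dir"
echo "example" > "$work/dir/file.txt"

tar -C "$work" -caf "$work/abc.tar.gz"  dir    # ".gz"  -> gzip
tar -C "$work" -caf "$work/abc.tar.bz2" dir    # ".bz2" -> bzip2

# Check that each file really is in the format its suffix claims
gzip -t  "$work/abc.tar.gz"
bzip2 -t "$work/abc.tar.bz2"

# No -z or -j needed when extracting from a file
mkdir "$work/out"
tar -C "$work/out" -xf "$work/abc.tar.bz2"
cat "$work/out/dir/file.txt"
```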
 
Old 12-13-2006, 10:50 AM   #9
Micro420
Senior Member
 
Registered: Aug 2003
Location: Berkeley, CA
Distribution: Mac OS X Leopard 10.6.2, Windows 2003 Server/Vista/7/XP/2000/NT/98, Ubuntux64, CentOS4.8/5.4
Posts: 2,986

Original Poster
Rep: Reputation: 45
Update: HOLY COW! Using gzip is taking forever to compress and save onto an external USB disk. It's been running for about 12 hours trying to compress 3.7GB. I guess this is my fault for backing up onto an external hard drive over USB 1.1.

I'm thinking of just using rsync to transfer modified files over to a remote computer, and then running gzip locally on the machine since this USB 1.1 is too slow.

Last edited by Micro420; 12-13-2006 at 10:52 AM.
 
Old 12-13-2006, 11:04 AM   #10
nx5000
Senior Member
 
Registered: Sep 2005
Location: Out
Posts: 3,307

Rep: Reputation: 57
http://tukaani.org/lzma/benchmarks

7z has the best mean compression ratio but it's also the slowest. It's a trade-off.
If I remember correctly, 7z will be the next official format for new kernels on www.kernel.org.
 
Old 12-13-2006, 11:16 AM   #11
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
You can speed up or slow down gzip - it has a setting to tune the speed / compression tradeoff.

For the gzip command-line tool this is done with the -1 ... -9 options (-1 is the fastest, -9 is the best compression). The bzip2 command-line tool provides similar options.

I'm not sure if there is a way to pass this through tar.
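One portable way to choose the level is to have tar write the uncompressed archive to stdout and pipe it into gzip yourself. A minimal sketch with generated sample data:

```shell
#!/bin/sh
# Sketch: tar archives, gzip compresses at an explicit level.
set -eu

size() { wc -c < "$1" | tr -d ' '; }   # file size in bytes

work=$(mktemp -d)
mkdir "$work/data"
seq 1 200000 > "$work/data/numbers.txt"

tar -C "$work" -cf - data | gzip -1 > "$work/fast.tar.gz"   # fastest
tar -C "$work" -cf - data | gzip -9 > "$work/best.tar.gz"   # smallest

echo "gzip -1: $(size "$work/fast.tar.gz") bytes"
echo "gzip -9: $(size "$work/best.tar.gz") bytes"

# The result is an ordinary .tar.gz that tar -tzf / -xzf reads as usual
tar -tzf "$work/best.tar.gz"
```

The same pattern works with `bzip2 -1` ... `bzip2 -9` in place of gzip.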
 
Old 12-13-2006, 01:03 PM   #12
Micro420
Senior Member
 
Registered: Aug 2003
Location: Berkeley, CA
Distribution: Mac OS X Leopard 10.6.2, Windows 2003 Server/Vista/7/XP/2000/NT/98, Ubuntux64, CentOS4.8/5.4
Posts: 2,986

Original Poster
Rep: Reputation: 45
Quote:
Originally Posted by matthewg42
You can speed up or slow down gzip - it has a setting to tune the speed / compression tradeoff.

For the gzip command-line tool this is done with the -1 ... -9 options (-1 is the fastest, -9 is the best compression). The bzip2 command-line tool provides similar options.

I'm not sure if there is a way to pass this through tar.
Thanks for the info. Unfortunately I think it is my slow USB 1.1 external hard drive that is the bottleneck. I will have to get myself a USB 2.0 PCI card. Ubuntu Linux should be able to recognize this USB 2.0 PCI card, right?

Quote:
Originally Posted by nx5000
http://tukaani.org/lzma/benchmarks

7z has the best mean compression ratio but it's also the slowest. It's a trade-off.
If I remember well, 7z will be the next official format for new kernels on www.kernel.org.
I really do like 7z's compression. Yes, it is slow, especially when I use the highest compression method, but man, this thing can really compress! I actually use it for my Windows 2000 server to do differential backups remotely. The compression saves me bandwidth. I was able to compress a 100MB Microsoft Access file down to just 10MB. I have had limited success with it in Linux, unfortunately. It sometimes gives me errors when I try to do differential backups. I contacted the author, Igor Pavlov, but he doesn't have a fix for this yet.

Last edited by Micro420; 12-13-2006 at 01:07 PM.
 
Old 12-13-2006, 01:24 PM   #13
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
Quote:
Originally Posted by Micro420
Thanks for the info. Unfortunately I think it is my slow USB 1.1 external hard drive that is the bottleneck. I will have to get myself a USB 2.0 PCI card. Ubuntu Linux should be able to recognize this USB 2.0 PCI card, right?
Should do. Works for me. I have an old digital camera which uses USB 1, and it's SO slow, but with a 2.5" HDD in a USB 2.0 caddy it seems nearly as fast as my internal disk. I'm using Ubuntu 6.10.

Quote:
Originally Posted by Micro420
I really do like 7z's compression. Yes, it is slow, especially when I use the highest compression method, but man, this thing can really compress! I actually use it for my Windows 2000 server to do differential backups remotely. The compression saves me bandwidth. I was able to compress a 100MB Microsoft Access file down to just 10MB. I have had limited success with it in Linux, unfortunately. It sometimes gives me errors when I try to do differential backups. I contacted the author, Igor Pavlov, but he doesn't have a fix for this yet.
I'm not familiar with 7zip. I think I saw it mentioned on The Open CD's site some years ago when I was trying to spread the Open Source word to some friends still using Windows...

It's a Free Software app in the style of WinZip, isn't it? Does it only run on win32?

It's been so long since I've had a Windows machine, I'm a bit out of touch.

Anyhow, check out bzip2: it gets significantly better compression than gzip at the cost of CPU time (and probably memory use) during compression.

Do a test... take your 100MB Access database and compress it with 7zip, then compress it with:
Code:
gzip -9
and with
Code:
bzip2 -9
And post the sizes here. I'm curious to see how they compare.
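A minimal sketch of that test. gzip and bzip2 normally replace the input file, so `-c` writes to stdout and leaves the original untouched. `FILE` is a hypothetical variable; if it isn't set, sample text is generated. In bash you can prefix each compression command with `time` to get the timings too.

```shell
#!/bin/sh
# Sketch: compare gzip -9 and bzip2 -9 on one file, keeping the original.
set -eu

size() { wc -c < "$1" | tr -d ' '; }   # file size in bytes

work=$(mktemp -d)
FILE=${FILE:-"$work/sample.txt"}
[ -f "$FILE" ] || seq 1 300000 > "$FILE"

gzip  -9 -c "$FILE" > "$work/out.gz"
bzip2 -9 -c "$FILE" > "$work/out.bz2"

echo "original: $(size "$FILE") bytes"
echo "gzip -9:  $(size "$work/out.gz") bytes"
echo "bzip2 -9: $(size "$work/out.bz2") bytes"
```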
 
Old 12-13-2006, 01:49 PM   #14
Micro420
Senior Member
 
Registered: Aug 2003
Location: Berkeley, CA
Distribution: Mac OS X Leopard 10.6.2, Windows 2003 Server/Vista/7/XP/2000/NT/98, Ubuntux64, CentOS4.8/5.4
Posts: 2,986

Original Poster
Rep: Reputation: 45
7-zip is available for both Linux and Windows. There's just that little bug in Linux which keeps me from using it as a differential backup tool. Other than that, it works fine on both OSes and I have no hesitation using it.

Here are the results for compression, although not completely objective since I used the 7-zip in Windows rather than in Linux (if you really want to see, I can install 7zip on my Linux machine):
Source file is a 104MB Microsoft Access database file

7-zip (Windows version 4.42), ULTRA compression settings (compression method: LZMA, dictionary size: 64MB, Word size: 64)
Compressed size: 25.4MB
Time it took to compress: ~3:45

bzip2 running command: time bzip2 -z -9 filename.mdb
Compressed size: 29MB
Time it took to compress as reported by `time`: 1:18

gzip running command: time gzip -9 filename.mdb
Compressed size: 45MB
Time it took to compress as reported by `time`: 45 seconds
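Putting the reported sizes in relative terms, here is some simple arithmetic on the figures above (104MB source). `ratios` is a hypothetical helper name and the numbers are copied from this post.

```shell
#!/bin/sh
# Sketch: compressed size as a percentage of the 104MB original.
ratios() {
    awk 'BEGIN {
        orig = 104
        n = split("7-zip:25.4 bzip2:29 gzip:45", rows, " ")
        for (i = 1; i <= n; i++) {
            split(rows[i], f, ":")
            pct = 100 * f[2] / orig
            printf "%-6s %5.1fMB  %4.1f%% of original, %4.1f%% saved\n",
                   f[1], f[2], pct, 100 - pct
        }
    }'
}
ratios
```

This prints:

```
7-zip   25.4MB  24.4% of original, 75.6% saved
bzip2   29.0MB  27.9% of original, 72.1% saved
gzip    45.0MB  43.3% of original, 56.7% saved
```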
 
Old 12-13-2006, 02:07 PM   #15
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
Thanks, that's a nice comparison. If 7zip could provide a utility with a similar interface to gzip/bzip2, tar could use it (it provides for the use of "external" compression programs which support such an interface).
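A minimal sketch of that mechanism in GNU tar (assumed): `--use-compress-program` runs an external filter with no arguments to compress (stdin to stdout) and with `-d` to decompress, the same interface gzip and bzip2 have. `gzip9` is a hypothetical wrapper that pins gzip to level 9; any tool with that interface could take its place. The long option is used because the short `-I` meant something different in some older tar releases.

```shell
#!/bin/sh
# Sketch (GNU tar): plug an external compressor into tar.
set -eu

work=$(mktemp -d)
cat > "$work/gzip9" <<'EOF'
#!/bin/sh
exec gzip -9 "$@"
EOF
chmod +x "$work/gzip9"

mkdir "$work/data"
seq 1 100000 > "$work/data/numbers.txt"

# Create: tar pipes the archive through "gzip9"
tar -C "$work" --use-compress-program="$work/gzip9" -cf "$work/data.tar.gz" data

# Extract: tar runs "gzip9 -d" to decompress
mkdir "$work/out"
tar -C "$work/out" --use-compress-program="$work/gzip9" -xf "$work/data.tar.gz"
```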
 
  

