LinuxQuestions.org


alaios 02-28-2013 05:41 AM

Lzma Vs Bzip2 (Stability and performance)
 
Dear all,
I have been reading the following link, which is about lzma performance compared to bzip2:

Lzma Vs Bzip2 – Better Compression than bzip2 on UNIX / Linux

I would like to ask whether you share the article's view that lzma can achieve a better compression ratio. I would also like to ask whether lzma is well supported on Linux (in the sense that I can download it and start using it without major or minor bugs floating around).

I would like to thank you in advance for your reply

Regards
Alex

kooru 02-28-2013 07:24 AM

I don't see the link in your post :)

H_TeXMeX_H 02-28-2013 08:04 AM

I recommend LZMA2, i.e. xz. Yes, it has been well tested and should be mostly bug-free. If you want the best compression algorithm available, this is it.

On a daily basis, however, I still use gz, because it is much faster and I've got plenty of space on the disk.
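To give a concrete feel for the tradeoff (the tarball name is just a placeholder; -k keeps the input so you can try both):

Code:

# best ratio, slowest; writes archive.tar.xz and keeps the original
xz -k -9 archive.tar

# much faster, worse ratio; writes archive.tar.gz
gzip -k -9 archive.tar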

alaios 02-28-2013 10:35 AM

Hi,
thanks for the answer.
In my openSUSE install I actually have almost all the tools already.
man lzma returns this header:
Quote:

XZ(1)                          XZ Utils                          XZ(1)

NAME
       xz, unxz, xzcat, lzma, unlzma, lzcat - Compress or decompress .xz and .lzma files

1. Is that what you mean?
2. I just wonder whether tar and the aforementioned algorithms were made with large files in mind. For example, I have a tar of 1.2 TB that should be highly compressible, since the file consists mostly of sequences of 0s and 1s. I did find the following text in man lzma, though:

Quote:

Level     xz         LZMA Utils 4.32.x
 -0         3 MiB    N/A
 -1         9 MiB      2 MiB
 -2        17 MiB     12 MiB
 -3        32 MiB     12 MiB
 -4        48 MiB     16 MiB
 -5        94 MiB     26 MiB
 -6        94 MiB     45 MiB
 -7       186 MiB     83 MiB
 -8       370 MiB    159 MiB
 -9       674 MiB    311 MiB

.....
Column descriptions:

· DictSize is the LZMA2 dictionary size. It is a waste of memory to use a dictionary bigger than the size of the uncompressed file. This is why it is good to avoid using the presets -7 ... -9 when there's no real need for them. At -6 and lower, the amount of memory wasted is usually low enough to not matter.



In my case the -9 setting should be okay, as my file is far larger than the dictionary. I just wonder, though, whether these compression schemes and tar were designed for files this large.
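In case it helps, this is roughly what I plan to run, streaming tar straight into xz so the uncompressed 1.2 TB archive never sits on disk (directory and file names are just examples):

Code:

# pipe tar through xz; only the compressed output touches the disk
tar -cf - mydata/ | xz -9 > mydata.tar.xz

# show total RAM and the memory usage limits xz will respect
xz --info-memory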
Alex

designator 02-28-2013 10:45 AM

If you have enough RAM and a fast processor, lzma2 is great. You can also play with the dictionary size to get even better ratios, especially since you say your files have a lot of repetitive sequences.

The easiest way to find out is to try both. In my experience with the everyday files you want to compress (backups), the difference between bzip2 and lzma2 is minimal and rarely justifies the extra time and memory lzma2 requires.
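A quick test on a representative sample would settle it; something along these lines (the sample name is hypothetical, and -k keeps the input around between runs):

Code:

# time both compressors on the same sample and compare sizes
time bzip2 -k -9 sample.tar
time xz -k -9 sample.tar
ls -l sample.tar.bz2 sample.tar.xz

# raising the LZMA2 dictionary explicitly; -f overwrites the .xz
# produced by the previous run
xz -k -f --lzma2=preset=9,dict=192MiB sample.tar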

jefro 02-28-2013 02:52 PM

It depends on the data to be compressed, as designator implied. My guess is that on a newish system most people can count on LZMA to give better results on general data.

It is quite easy to use the older tools and keep backward compatibility with older systems. That may be a reason to stick with the older formats.

Also consider the ability of your apps to use SMP. The newer tools tend to make much better use of multiple CPUs.
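For example, pigz and pbzip2 are parallel drop-in replacements for gzip and bzip2, if they are in your repos (directory and file names are just examples):

Code:

# compress on all cores; -c writes to stdout like gzip/bzip2 would
tar -cf - mydata/ | pigz -9 -c > mydata.tar.gz
tar -cf - mydata/ | pbzip2 -9 -c > mydata.tar.bz2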

H_TeXMeX_H 03-01-2013 02:07 AM

Wow, 1.2 TB ... that will probably take a long time to compress (days). Maybe try gzip first, because it is much faster.

alaios 03-05-2013 05:22 AM

Hi, I did it with gzip -9 as it finishes in only 4 days...

jefro 03-05-2013 04:56 PM

Ouch! "gzip -9 as it finishes in only 4 days..."

alaios 03-06-2013 01:51 AM

Indeed, I also had xz running during those first four days. It had written only 44 GB, and since the source was 1.2 TB, I figured it would take months to finish :p

Mr. Alex 03-06-2013 07:09 AM

LZMA is definitely better than bzip2. But it's also a lot slower.

Actually, there are barely any tasks that require compression these days. Algorithms like gzip/bzip2/lzma are designed to compress text, not binary data. So they are good for, say, a lot of source code (the Linux kernel). But when you try to compress images, videos, and other binary stuff, you just waste your time: processing several GB of that data takes a long time and saves you maybe a couple of megabytes of space. And then you'll transfer it over the Internet at something like 3 MB/s, so it saves you about one second of transfer time while costing hours to compress and decompress.
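One cheap way to find out whether a file is worth compressing at all is to run a fast compressor over a small sample first (the size and file name are arbitrary):

Code:

# compress only the first 100 MiB; if the output stays close to
# 104857600 bytes, the data is effectively incompressible
head -c 100M bigfile | gzip -9 | wc -c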

H_TeXMeX_H 03-06-2013 07:47 AM

This benchmark is relevant:
http://stephane.lesimple.fr/blog/201...-reloaded.html

jefro 03-06-2013 03:46 PM

Those kinds of numbers should make you look at much lower compression settings. It seems your system is pretty low on resources during all of this. All the compression tools lean on the processor(s); the newer ones may take advantage of SMP beyond 4 cores.

I use some very, very (very) old computers to back up a QNX system. Even at 1.5 GB it gets done in 2 or 3 hours.

