How can I detect compressible data?

yaplej · 04-13-2011, 03:37 PM

Hello,

I am working on a project that takes multiple blocks of memory, each no more than 1500 bytes in size, and tries to compress them. Currently I just compress each block and compare the old length to the new length. If the new length is smaller, the compressed data is copied into the original data's memory space.
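
For reference, the "compress, then compare" approach described here could be sketched roughly as follows. This is only an illustration: the post does not say which language or compressor is in use, so Python with zlib is an assumption, and the function name compress_if_smaller is hypothetical.

```python
import zlib

MAX_BLOCK = 1500  # upper bound on block size mentioned in the post


def compress_if_smaller(block):
    """Return (data, was_compressed).

    The block is always compressed first; the result is kept only if it
    is strictly smaller than the original, otherwise the original block
    is returned unchanged.
    """
    if len(block) > MAX_BLOCK:
        raise ValueError("block exceeds the assumed 1500-byte limit")
    packed = zlib.compress(block)
    if len(packed) < len(block):
        return packed, True
    return block, False
```

The full compression cost is paid for every block, including the ones that end up being kept uncompressed.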

The problem I have is that when lots of these blocks do not compress well, a huge amount of CPU time is wasted.

Is there any trick to detect how compressible some data is before using the CPU to compress it?

Thanks.

H_TeXMeX_H · 04-14-2011, 08:07 AM

I would use ent:
http://www.fourmilab.ch/random/

One thing it tells you is 'Optimum compression would reduce the size of this * character file by * percent.'

Of course, how you apply this depends on what language you are using. Basically you have to see whether the data is random, and how random. You may be able to tailor this to a certain compression algorithm.
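
As a rough illustration of the kind of figure ent reports, the Shannon entropy of the byte histogram can be computed in a single pass over a block. This is a sketch in Python, not ent's own code, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits_per_byte(block):
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    n = len(block)
    counts = Counter(block)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def estimated_reduction_percent(block):
    """Size reduction an ideal byte-by-byte coder could reach, in percent."""
    return 100.0 * (1.0 - entropy_bits_per_byte(block) / 8.0)
```

Two caveats apply. The histogram ignores the order of the bytes, so a block made of repeated sequences can score close to 8 bits per byte and still compress well with an LZ-style compressor. And on blocks as small as 1500 bytes even truly random data will score somewhat below 8, so any cutoff would have to be tuned against real data.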

theNbomr · 04-14-2011, 03:40 PM

Most non-lossy compressors work by finding and removing redundancies: patterns or sequences that are repeated in some way. It seems likely to me that any thorough test would be just as compute-intensive as the actual compression.

--- rod.

H_TeXMeX_H · 04-15-2011, 08:30 AM

Quote:

Originally Posted by theNbomr

Most non-lossy compressors work by finding and removing redundancies: patterns or sequences that are repeated in some way. It seems likely to me that any thorough test would be just as compute-intensive as the actual compression.

--- rod.

Yeah, it probably would, so I guess just ent would work better.

yaplej · 04-17-2011, 12:52 AM

I was hoping there would be some way to get an idea of how compressible the data is using just a little CPU, before the "compress then test" method that I am doing now. It would have to be a very quick test, though, to really make it worth using.

H_TeXMeX_H · 04-17-2011, 04:38 AM

The problem is that no quick test would be thorough enough to give you useful info. Any thorough test (as said above) would be too CPU intensive. It has to be something in-between.
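
One possible "in-between" test, offered only as a sketch, is to trial-compress a small prefix of each block at the fastest compression level and skip the full compression when the sample barely shrinks. Python with zlib is assumed here, and the sample size, the 10% threshold and the name looks_compressible are all hypothetical values that would need tuning by measurement.

```python
import zlib

SAMPLE_BYTES = 256  # hypothetical sample size
MIN_SAVING = 0.10   # hypothetical: the sample must shrink by at least 10%
MIN_SAMPLE = 32     # below this, zlib's fixed overhead dominates the result


def looks_compressible(block):
    """Cheap guess made by compressing only a prefix at zlib level 1."""
    sample = block[:SAMPLE_BYTES]
    if len(sample) < MIN_SAMPLE:
        return False
    packed = zlib.compress(sample, 1)
    return len(packed) <= len(sample) * (1.0 - MIN_SAVING)
```

The guess can be wrong in both directions: a block whose redundancy lies outside the sampled prefix is skipped, and a block with a compressible header followed by random payload is compressed for little gain. Whether the saved CPU outweighs those misses depends on the data.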

theNbomr · 04-17-2011, 11:55 AM

Since your data is in-memory, I assume you must be talking about an application for which you are a developer. Perhaps whatever code populates the memory could perform some simple evaluation to ascertain compressibility. Or, if the memory space is at the end of the process (a sort of output, in some sense), perhaps the code that populates the space could use some kind of serializing compressor, as the buffer is populated. LZW is a common example of this type of compression.

--- rod.
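
The idea of compressing while the buffer is being populated could look something like the sketch below. Note the substitution: Python's standard library has no LZW implementation, so this uses zlib's streaming DEFLATE compressor instead. The class name CompressingBuffer and its methods are hypothetical.

```python
import zlib


class CompressingBuffer:
    """Buffer that compresses data incrementally as it is written."""

    def __init__(self):
        self._comp = zlib.compressobj()
        self._chunks = []
        self.raw_len = 0  # number of uncompressed bytes written so far

    def write(self, data):
        """Feed more data; compressed output is collected as it appears."""
        self.raw_len += len(data)
        self._chunks.append(self._comp.compress(data))

    def finish(self):
        """Flush the compressor and return the complete compressed stream."""
        self._chunks.append(self._comp.flush())
        return b"".join(self._chunks)
```

Usage would be to call write() wherever the code currently fills the block, then compare len(finish()) with raw_len at the end. This spreads the compression work over the writes, but it does not by itself avoid spending CPU on data that turns out to be incompressible.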