LinuxQuestions.org
04-13-2011, 03:37 PM   #1
yaplej
Member
 
Registered: Apr 2009
Distribution: CentOS, Ubuntu, openSuSE
Posts: 165
Blog Entries: 1

Rep: Reputation: 22
How can I detect compressible data?


Hello,

I am working on a project that takes multiple blocks of memory, each no more than 1500 bytes in size, and tries to compress them. Currently I just compress each block and compare the old length to the new length. If the new length is smaller, the compressed data is copied into the original data's memory space.
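For reference, the "compress, then compare lengths" approach described above can be sketched roughly as follows. This is only an illustration: the thread does not say which language or compressor the project uses, so Python's zlib and the function name `compress_if_smaller` are assumptions.

```python
import os
import zlib

def compress_if_smaller(block):
    """Return (payload, was_compressed) for one block.

    Assumption: zlib stands in for whatever compressor the project uses.
    """
    packed = zlib.compress(block)
    if len(packed) < len(block):
        return packed, True   # keep the compressed form
    return block, False       # compression did not help; keep the original

# A repetitive block shrinks; a block of random bytes does not.
repetitive = b"a" * 1500
noisy = os.urandom(1500)
print(compress_if_smaller(repetitive)[1])
print(compress_if_smaller(noisy)[1])
```

Every block pays the full compression cost here, including the ones that end up being kept uncompressed.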

The problem is that when many of these blocks do not compress well, a huge amount of CPU is wasted.

Is there any trick to detect how compressible some data is before using the CPU to compress it?

Thanks.
 
04-14-2011, 08:07 AM   #2
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1292
I would use ent:
http://www.fourmilab.ch/random/

One thing it tells you is 'Optimum compression would reduce the size of this * character file by * percent.'

Of course, it depends on what language you are using. Basically, you have to see whether the data is random, and how random. You may be able to tailor this to a certain compression algorithm.
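As a rough illustration of measuring "how random" a block is, here is a minimal sketch that computes the Shannon entropy of the byte histogram and turns it into an estimated percent saving. It is not ent's code; the function names are made up for this example, and Python is used only for illustration.

```python
import math
from collections import Counter

def entropy_bits_per_byte(block):
    """Order-0 Shannon entropy of the byte histogram, from 0.0 to 8.0 bits per byte."""
    if not block:
        return 0.0
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in Counter(block).values())

def estimated_saving_percent(block):
    """Percent reduction an ideal coder could reach using byte frequencies alone."""
    return 100.0 * (1.0 - entropy_bits_per_byte(block) / 8.0)

print(estimated_saving_percent(b"abab" * 10))       # two equally likely symbols
print(estimated_saving_percent(bytes(range(256))))  # every byte value once
```

Note that this only looks at how often each byte value occurs. A block such as `bytes(range(256)) * 5` scores 8 bits per byte even though a compressor that finds repeated sequences would shrink it, so the estimate is a hint rather than a guarantee.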
 
04-14-2011, 03:40 PM   #3
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,398
Blog Entries: 2

Rep: Reputation: 908
Most non-lossy compressors work by finding and removing redundancies: patterns or sequences that are repeated in some way. It seems likely to me that any thorough test would be just as compute-intensive as the actual compression.

--- rod.
 
04-15-2011, 08:30 AM   #4
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1292
Quote:
Originally Posted by theNbomr
Most non-lossy compressors work by finding and removing redundancies: patterns or sequences that are repeated in some way. It seems likely to me that any thorough test would be just as compute-intensive as the actual compression.

--- rod.
Yeah, it probably would, so I guess just ent would work better.
 
04-17-2011, 12:52 AM   #5
yaplej
Member
 
Registered: Apr 2009
Distribution: CentOS, Ubuntu, openSuSE
Posts: 165

Original Poster
Blog Entries: 1

Rep: Reputation: 22
I was hoping there would be some way to get an idea of how compressible the data is using just a little CPU, before the "compress then test" method that I am doing now. It would have to be a very quick test, though, to really make it worth using.
 
04-17-2011, 04:38 AM   #6
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1292
The problem is that no quick test would be thorough enough to give you useful info, and any thorough test (as said above) would be too CPU-intensive. It has to be something in between.
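One possible "in-between" test, offered only as a sketch and not something proposed in this thread, is to run a fast compressor over a small prefix of the block and skip the full compression when the prefix barely shrinks. The names `looks_compressible`, `sample_size` and `min_saving` are made up, the default values are arbitrary, and zlib at level 1 is an assumption about the compressor.

```python
import os
import zlib

def looks_compressible(block, sample_size=256, min_saving=0.10):
    """Cheap gate: fast-compress only a prefix and check whether it shrinks enough.

    sample_size and min_saving are arbitrary illustrative defaults.
    """
    sample = block[:sample_size]
    if not sample:
        return False
    packed = zlib.compress(sample, 1)  # level 1 is zlib's fastest setting
    return len(packed) <= len(sample) * (1.0 - min_saving)

repetitive = b"abc" * 500
noisy = os.urandom(1500)
print(looks_compressible(repetitive))
print(looks_compressible(noisy))
```

The trade-off is the one described above: the prefix can misjudge a block whose compressible part comes later, so the thresholds would need tuning against real data.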
 
04-17-2011, 11:55 AM   #7
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,398
Blog Entries: 2

Rep: Reputation: 908
Since your data is in-memory, I assume you must be talking about an application for which you are a developer. Perhaps whatever code populates the memory could perform some simple evaluation to ascertain compressibility. Or, if the memory space is at the end of the process (a sort of output, in some sense), perhaps the code that populates the space could use some kind of serializing compressor, as the buffer is populated. LZW is a common example of this type of compression.
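A minimal sketch of compressing while the buffer is being populated might look like the following. Note the swap: it uses zlib's streaming interface (DEFLATE) rather than LZW, because that is what Python's standard library offers, and the class name `StreamingPacker` is hypothetical.

```python
import zlib

class StreamingPacker:
    """Compress chunks as they are produced instead of compressing a finished buffer."""

    def __init__(self, level=6):
        self._comp = zlib.compressobj(level)
        self._out = bytearray()
        self.raw_len = 0  # uncompressed bytes seen so far

    def write(self, chunk):
        self.raw_len += len(chunk)
        self._out += self._comp.compress(chunk)

    def finish(self):
        """Flush the compressor and return the complete compressed stream."""
        self._out += self._comp.flush()
        return bytes(self._out)

packer = StreamingPacker()
for _ in range(10):
    packer.write(b"hello world ")
packed = packer.finish()
print(packer.raw_len, len(packed))
```

Comparing `raw_len` with the length of the finished stream gives the same "did it shrink?" answer as before, without a separate pass over the completed buffer.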
--- rod.
 
  

