Old 09-26-2010, 02:51 AM   #1
dimpu
LQ Newbie
 
Registered: Sep 2010
Posts: 7

Rep: Reputation: 0
Compression help


Friends,

I was wondering if someone can help me out with my problem.

I have a C program which outputs a text file, test.out, containing hexadecimal addresses. The file size is a little more than 100GB. The problem is that I don't have enough space to store this file on my hard drive: I only have 72GB free. Can anyone tell me how to compress this file on the fly so that I can store it on my hard disk?

Then, once I have test.out in a compressed format such as .gz or .bz2, I want to use it as input to a shell script, for example ./simulator < test.out.gz. I would be really grateful if someone could help me out with this as well.

Thanks
 
Old 09-26-2010, 03:28 AM   #2
neonsignal
Senior Member
 
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Jessie (Fluxbox WM)
Posts: 1,387
Blog Entries: 52

Rep: Reputation: 355
You could just pipe it through gzip:
Code:
./test | gzip -c >test.out.gz
And then recover using
Code:
gunzip -c <test.out.gz | ./simulator
(the '-c' flag is just so that gzip will use stdin/stdout).
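A minimal sketch of that round trip, for anyone who wants to try it on a small scale first. The producer and consumer functions are hypothetical stand-ins for ./test and ./simulator (assuming ./test can print its addresses on stdout), and the temporary directory is only there to keep the sketch self-contained:

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical stand-in for ./test: prints hex addresses on stdout.
producer() {
    i=0
    while [ "$i" -lt 1000 ]; do
        printf '0x%08x\n' "$((i * 64))"
        i=$((i + 1))
    done
}

# Hypothetical stand-in for ./simulator: counts the lines it reads from stdin.
consumer() {
    wc -l | tr -d ' '
}

# Compress on the fly; the uncompressed text never touches the disk.
producer | gzip -c > "$workdir/test.out.gz"

# Decompress on the fly, straight into the consumer.
lines_read=$(gunzip -c < "$workdir/test.out.gz" | consumer)
echo "consumer saw $lines_read lines"
```

Only the compressed file is ever stored, so the disk needs room for the .gz and nothing more.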
 
Old 09-26-2010, 08:05 PM   #3
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,240

Rep: Reputation: 2324
An alternative command to read the compressed file is zcat.
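As a quick check that the two commands give the same result, here is a small sketch. The three sample addresses are made up for illustration, and the file is fed to zcat on stdin; with GNU gzip, zcat test.out.gz with the file name as an argument works as well:

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Small made-up sample standing in for the real test.out.gz.
printf '0x1000\n0x1040\n0x1080\n' | gzip -c > "$workdir/test.out.gz"

# Decompress the same file both ways and compare the text.
via_gunzip=$(gunzip -c < "$workdir/test.out.gz")
via_zcat=$(zcat < "$workdir/test.out.gz")

if [ "$via_gunzip" = "$via_zcat" ]; then
    echo "zcat and gunzip -c agree"
fi
```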
 
Old 09-27-2010, 12:47 AM   #4
dimpu
LQ Newbie
 
Registered: Sep 2010
Posts: 7

Original Poster
Rep: Reputation: 0
Compression help

Quote:
Originally Posted by neonsignal View Post
You could just pipe it through gzip:
Code:
./test | gzip -c >test.out.gz
And then recover using
Code:
gunzip -c <test.out.gz | ./simulator
(the '-c' flag is just so that gzip will use stdin/stdout).
Thanks Neo and Chris for your promptness, but the problem is that my C program sets its output file to test.out (this is hard-coded in the C code), and when I pipe it the way you suggested I get both files, test.out and testing.out.gz. The test.out file has all the required content, but testing.out.gz just says "done..." and contains no text (the hexadecimal addresses I am looking for).

May I know why this is happening?

Last edited by dimpu; 09-27-2010 at 12:50 AM.
 
Old 09-27-2010, 01:05 AM   #5
neonsignal
Senior Member
 
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Jessie (Fluxbox WM)
Posts: 1,387
Blog Entries: 52

Rep: Reputation: 355
Quote:
Originally Posted by dimpu View Post
The test.out file has all the required content, but testing.out.gz just says "done..." and contains no text (the hexadecimal addresses I am looking for). May I know why this is happening?
A pipe (the '|') only captures the standard output of the program. Because your C program writes its data directly to a file, only the "done..." message goes to standard output, so that is all that ends up in the .gz file.
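This can be reproduced on a small scale. In the sketch below, writes_to_file is a hypothetical stand-in for the C program (assuming it writes its data to a hard-coded test.out and prints only a status message), and the two addresses are made up:

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical stand-in for ./test: data goes to a hard-coded file,
# only a status message goes to stdout.
writes_to_file() {
    printf '0x1000\n0x1040\n' > test.out
    echo "done..."
}

# The pipe only sees stdout, so gzip only receives the status message.
writes_to_file | gzip -c > testing.out.gz

captured=$(gunzip -c < testing.out.gz)
echo "in the .gz:  $captured"
echo "in test.out: $(wc -l < test.out | tr -d ' ') lines"
```

The data still lands uncompressed in test.out, which is exactly the file there is no room for.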

Ideally you should change the C program to send its output to stdout. If you cannot do that, you can work around it in the following way:

1. Set up a named pipe to replace the normal output file:
Code:
rm test.out
mkfifo test.out
2. Have the compression program ready at the end of the pipe (the ampersand places it into the background so that you can keep using the shell):
Code:
gzip -c <test.out >test.out.gz &
3. Run the C program into the pipe:
Code:
./test
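Put together as one script, the three steps might look like the sketch below. The writes_to_file function is a hypothetical stand-in for ./test (assuming it always writes to a file named test.out). The wait and the final rm are additions that the steps above do not mention: wait makes sure gzip has finished before the .gz is used, and removing the pipe afterwards keeps a later run from finding a stale one.

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical stand-in for ./test: it insists on writing to "test.out".
writes_to_file() {
    i=0
    while [ "$i" -lt 1000 ]; do
        printf '0x%08x\n' "$((i * 64))"
        i=$((i + 1))
    done > test.out
}

rm -f test.out      # -f: no error if the file is not there yet
mkfifo test.out     # test.out is now a named pipe, not a regular file

# The reader blocks in the background until a writer opens the pipe.
gzip -c < test.out > test.out.gz &
gzip_pid=$!

writes_to_file      # the data streams through the pipe into gzip

wait "$gzip_pid"    # do not use test.out.gz until gzip has finished
rm -f test.out      # remove the pipe so later runs start clean

lines_stored=$(gunzip -c < test.out.gz | wc -l | tr -d ' ')
echo "compressed file holds $lines_stored lines"
```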

Last edited by neonsignal; 09-27-2010 at 01:12 AM.
 
Old 09-27-2010, 02:10 AM   #6
dimpu
LQ Newbie
 
Registered: Sep 2010
Posts: 7

Original Poster
Rep: Reputation: 0
Compression Help

Neo,

Thank you very much. You are just great and a wonderful person. I thought I would end up with no solution to my question, but you did it all! Kudos to you, you deserve it.

If you have time, please let me know what this mkfifo does. I ran a test program and everything works just fine. I am ready to run a few large programs and hopefully they'll give me similar results.

I saw something weird happening, though. When I run the command mkfifo test.out I don't see any disk space being consumed, but the output of my C program, test.out, is greyed out, and when I use the head command to look at the content of test.out (the greyed one) it doesn't show anything, although the zipped file does have everything.

I use df -h to check how much disk space is being consumed.

Last edited by dimpu; 09-27-2010 at 02:12 AM.
 
Old 09-27-2010, 02:34 AM   #7
neonsignal
Senior Member
 
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Jessie (Fluxbox WM)
Posts: 1,387
Blog Entries: 52

Rep: Reputation: 355
Quote:
Originally Posted by dimpu View Post
let me know what this mkfifo does
The mkfifo command creates a named pipe (FIFO): an object that can be used as a data pipe from a source to a sink. Although it looks like a file, it exists on the file system as a name only, and does not use up any other disk space. Any data written into it by one process goes into a kernel memory buffer, and is consumed by the process that you set up at the other end.
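A small sketch of how to tell a named pipe from a regular file (the temporary directory is only there to keep the sketch self-contained). The different file type is probably also why the file listing displays test.out differently:

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkfifo "$workdir/test.out"

# The -p test is true only for named pipes (FIFOs).
if [ -p "$workdir/test.out" ]; then
    kind="named pipe"
else
    kind="something else"
fi
echo "test.out is a $kind"

# The first character of the ls -l mode string is 'p' for a pipe,
# where a regular file shows '-'.
type_char=$(ls -l "$workdir/test.out" | cut -c1)
echo "ls -l type character: $type_char"
```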

Quote:
Originally Posted by dimpu View Post
when I use the head command to look at the content of test.out it doesn't show anything but the zipped file does have everything.
In the example, the gzip has already grabbed the data out of the pipe, so head does not see anything inside test.out (remember that test.out is now a pipe object, not a file).
 
Old 09-28-2010, 02:15 AM   #8
dimpu
LQ Newbie
 
Registered: Sep 2010
Posts: 7

Original Poster
Rep: Reputation: 0
Compression Help

Neo, I want to know how many minutes or hours a benchmark (you know, they are huge programs) should take to run using your technique. I remember it used to take a few hours to produce the text file, but now it takes just a few seconds. Do you think it is working fine? Also, what command should I use to find the size of a single file?

Thanks

Last edited by dimpu; 09-28-2010 at 02:24 AM.
 
Old 09-28-2010, 03:00 AM   #9
neonsignal
Senior Member
 
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Jessie (Fluxbox WM)
Posts: 1,387
Blog Entries: 52

Rep: Reputation: 355
Quote:
Originally Posted by dimpu View Post
I want to know how many minutes or hours a benchmark (you know, they are huge programs) should take to run using your technique.
Depends what is in the output. If it is highly repetitive, it might compress down a lot. What is the benchmark program you are using?

Quote:
Also I want to know what command should I use to find the size of a single file?
I'm not clear what you mean. Can't you just use 'ls -l'? If you mean the uncompressed length, then something like:
Code:
gunzip -c <test.out.gz | wc -c
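A short sketch covering both readings of the question. The 5000 generated lines are made-up sample data standing in for the real output. ls -lh and du -h print the size of one file in K/M/G units, while wc -c gives exact byte counts:

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Made-up sample data: 5000 lines of 11 bytes each, compressed on the fly.
i=0
while [ "$i" -lt 5000 ]; do
    printf '0x%08x\n' "$((i * 64))"
    i=$((i + 1))
done | gzip -c > "$workdir/test.out.gz"

# Size of a single file, human readable.
ls -lh "$workdir/test.out.gz"
du -h "$workdir/test.out.gz"

# Exact byte counts: compressed (as stored) and uncompressed (streamed,
# so the uncompressed data is never written to disk).
compressed_bytes=$(wc -c < "$workdir/test.out.gz" | tr -d ' ')
uncompressed_bytes=$(gunzip -c < "$workdir/test.out.gz" | wc -c | tr -d ' ')
echo "compressed:   $compressed_bytes bytes"
echo "uncompressed: $uncompressed_bytes bytes"
```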
 
Old 09-28-2010, 08:50 AM   #10
dimpu
LQ Newbie
 
Registered: Sep 2010
Posts: 7

Original Poster
Rep: Reputation: 0
Compression Help

Neo,

I am running several benchmarks but I first started with "Cactus". I've written a program and that took almost 5 hours to run.

The other program, which is the original program, took just a few seconds (with zipped output), but if I run the same program without gzip it takes an hour or two, or a little more, so I am a little skeptical. This output is also the input to my memory simulator.

With the zipped output the memory simulator reports that a million instructions were fetched, but with test.out written directly as a plain file it reports 100 million instruction fetches in just a few minutes.

It's quite possible that you are right. There is a lot of repetition, I believe.

For my second question, I wanted to know the size of a single file in GB or MB. Suppose I have several files in my directory and I want to find the size of the output file test.out; what command should I use?

Thanks

Last edited by dimpu; 09-28-2010 at 08:55 AM.
 
Old 09-29-2010, 02:30 AM   #11
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,240

Rep: Reputation: 2324
There's no way to predict the size of the compressed file in advance; it depends on the content, the compression algorithm and the compression level. Both gzip and bzip2 take a level flag with values 1-9. Better compression = slower to create. Start with some small files and see how the trend goes.
See the man pages.
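One way to try that on a small sample is sketched below. The generated addresses are made up and only stand in for the first part of the real output; the actual ratios will depend on your data:

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Made-up sample: 20000 hex addresses, 11 bytes per line.
i=0
while [ "$i" -lt 20000 ]; do
    printf '0x%08x\n' "$(( (i * 7919) % 65536 * 64 ))"
    i=$((i + 1))
done > "$workdir/sample.txt"

raw_bytes=$(wc -c < "$workdir/sample.txt" | tr -d ' ')

# -1 is the fastest level, -9 the slowest with the best compression.
fast_bytes=$(gzip -1 -c < "$workdir/sample.txt" | wc -c | tr -d ' ')
best_bytes=$(gzip -9 -c < "$workdir/sample.txt" | wc -c | tr -d ' ')

echo "raw:     $raw_bytes bytes"
echo "gzip -1: $fast_bytes bytes"
echo "gzip -9: $best_bytes bytes"
```

Prefixing each gzip command with time shows the speed side of the trade-off.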
 
  

