Compression help
Friends,
I was wondering if someone can help me out with my problem. I have a C program which outputs a text file, test.out, containing hexadecimal addresses. The file is a little more than 100GB, but I only have 72GB free on my hard drive. Can anyone tell me how to compress this file on the fly so that I can store it on my disk? Also, once I have test.out in a compressed format like .gz or .bz2, I want to use it as input to a shell script, for example ./simulator < test.out.gz. I would be really grateful if someone could help me out with this as well. Thanks |
You could just pipe it through gzip:
Code:
./test | gzip -c > test.out.gz
Then decompress on the fly when you feed it to the simulator:
Code:
gunzip -c < test.out.gz | ./simulator
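If you would rather have .bz2, bzip2 works the same way (smaller output, but slower):
Code:
./test | bzip2 -c > test.out.bz2
Code:
bunzip2 -c < test.out.bz2 | ./simulator
|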
An alternative command to read the file is zcat:
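For example:
Code:
zcat test.out.gz | ./simulator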
|
Compression help
May I know why this is happening? |
Ideally you should change the C program to send its output to stdout. If you cannot do that, you can work around it in the following way:
1. Set up a named pipe to replace the normal output file:
Code:
rm test.out
mkfifo test.out
2. Start gzip in the background, reading from the pipe:
Code:
gzip -c <test.out >test.out.gz &
3. Run the program as usual; everything it writes to test.out now flows straight into gzip:
Code:
./test
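A named pipe (FIFO) looks like a file in a directory listing, but it stores nothing on disk: bytes written to it are handed directly to whatever process is reading from it. You can check that test.out really is a pipe; the first character of the mode shown by ls -l will be p:
Code:
ls -l test.out
|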
Compression Help
Neo,
Thank you very much. You are just great and a wonderful person. I thought I would end up with no solution to my question, but you did it all! Kudos to you; you deserve it. If you have time, please let me know what this mkfifo does. I ran a test program and everything works just fine. I am ready to run a few large programs and hopefully they'll give me similar results. I saw something weird happening, though. When I run mkfifo test.out and then my C program, I don't see any disk space being consumed, but the test.out entry is greyed out, and when I use the head command to look at its contents it shows nothing, while the zipped file does have everything. I use df -h to check how much disk space is being consumed. |
Compression Help
Neo, I want to know how many minutes or hours a benchmark (you know, they are huge programs) should take to run using your technique. I remember it used to take a few hours to produce the text file, but now it just takes a few seconds. Do you think it is working fine? Also, what command should I use to find the size of a single file?
Thanks |
To check the uncompressed size without ever storing the file, count the bytes as gunzip produces them:
Code:
gunzip -c < test.out.gz | wc -c
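That prints the number of uncompressed bytes. For the size the compressed file actually occupies on disk, plain ls will do:
Code:
ls -lh test.out.gz
|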
Compression Help
Neo,
I am running several benchmarks, and I started with "Cactus". A program I wrote took almost 5 hours to run. The original program took just a few seconds with a zipped output, but if I run the same program without the gzip feature it takes an hour or two, or a little more, so I am a little skeptical. This output is also the input to my memory simulator. With the zipped output the simulator shows a million instructions fetched, but with test.out as a direct output file it shows 100 million instructions fetched in just a few minutes. It's quite possible that you are right; there is a lot of repetition, I believe. In my second question I wanted to know the size (GB/MB) of a single file; suppose I have several files in my directory and I want to find the size of the output file test.out, what command should I use? Thanks |
There's no way to predict the size of the compressed file; it depends on the content, the compression algorithm, and the compression level. Both gzip and bzip2 take a compression-level flag with values 1-9; better compression is slower to produce. Start with some small files and see how the trend goes.
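For example, on a small sample file (sample.txt here is just a stand-in for one of your own files), you can compare the two extremes:
Code:
gzip -1 -c sample.txt | wc -c    # fastest, least compression
gzip -9 -c sample.txt | wc -c    # slowest, best compression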
See the man pages. |