[SOLVED] How to read content of .gz file without extracting it anywhere, not even STDOUT!
Hello there!
I work with plain-text log files that are roughly 4-5 GB in size, and I need to pull out a few lines (say 30000) from these files, as per my requirement. Now, to save space, I compress these files to .gz, which takes hardly 300 MB for the same 4-5 GB file and saves a great amount of space.
So it would be good if I could read such .gz files directly, without extracting them anywhere.
I have tried zcat, zless, and zmore, but these tools uncompress the file to /dev/null. I don't want even that extraction.
So can anyone tell me, is it possible?
Any help is highly appreciated.
(..)I need to chop few lines (..) if I could read such .gz files directly, without extracting it anywhere. I have tried zcat, zless, zmore: but these shells, uncompresses the file to /dev/null. I don't want even this extraction.
More things than you think are "compressed" (the kernel, music, ebooks, OOo documents, etc.), and all of those need to be decompressed, either to work at all or to work on the stream, yet you wouldn't mind using them, right? Decompressing to /dev/null doesn't create temporary files the way "in place" editing would, so I don't understand why it bothers you. Do explain.
Well, thanks for quick reply.
What you said is indeed true.
Here, running zless on some .gz file ends up doing exec gzip -d -c "$1" 2>/dev/null, which means output is written to /dev/null.
What I want is to use the compressed file directly.
Say, if I write head -n 40000 file.gz | tail -n 20000, then it shows some weird output, which one cannot read.
Well, maybe I'm asking for something that is practically not possible.
Thanks.
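For what it's worth, the standard streaming approach never lands the decompressed data on disk: zcat writes to stdout, and head/tail select the line window from the pipe. A minimal sketch (the file names here are made up for illustration):

```shell
#!/bin/sh
# Build a small sample compressed log (stand-in for the 4-5 GB file).
seq 1 100000 > sample.log
gzip -f sample.log                  # leaves only sample.log.gz

# Lines 20001-40000, streamed: the decompressed data lives only in the
# pipe buffers, never as a file on disk.
zcat sample.log.gz | head -n 40000 | tail -n 20000 > window.txt

wc -l < window.txt                  # 20000 lines
rm -f sample.log.gz window.txt
```

The crucial difference from running head -n 40000 file.gz directly is that here head sees the decompressed stream, not the raw gzip bytes.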
You could try splitting large .gz files into volumes of, say, 100 MB and uncompressing only the one you need... I'm not sure this will work, but if each volume has its own dictionary you may get readable text.
@sunnydrake
Hi there.
Well, to find the small chunk I'm looking for, I'd first have to search through the entire archive before I could split it wherever required.
So, in the end, the same question.
I don't want to uncompress my archives by any means.
Thanks.
Smaller archives take less memory and disk space to process, so you'd get a smaller system load than searching one large file.
Your search procedure would be something like: for each file in the set, zcat it, and exit once the match is found.
The basic idea of an archive is to minimize file size by finding repeating patterns and reusing them; say the character '7' stands for '123456', which occurs in the file (while '7' itself does not), so without uncompressing it, the data is effectively scrambled.
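That per-volume search loop might look like this in shell (the volume names are whatever split generates; the search pattern is a made-up example):

```shell
#!/bin/sh
# Split a log into volumes, compress each on its own, then search
# volume by volume and stop at the first match.
seq 1 10000 > big.log
split -l 2500 big.log vol.          # vol.aa vol.ab vol.ac vol.ad
gzip -f vol.*                       # each volume is its own .gz

for f in vol.*.gz; do
    if zcat "$f" | grep -q '^7777$'; then
        echo "found in $f"          # 7777 lies in lines 7501-10000
        break                       # later volumes are never decompressed
    fi
done
rm -f big.log vol.*.gz
```

Since each volume is a complete gzip stream, any one of them can be decompressed without touching the others, which is the whole point of the suggestion.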
Last edited by sunnydrake; 09-08-2011 at 07:50 PM.
After reading over what you are really trying to do, the answer is no, you cannot do it. The reason is that you don't have a utility that can decompress only part of the original compressed file(s). The file or files are either compressed, or not. If they are, then you can't work with them directly unless they're fully decompressed first. Hope that sums it up well enough for you, man.
According to the explanation under "CGrep Library", it can search inside compressed files with no need for decompression.
Now, all I want to know is how to get this utility.
I'm using Ubuntu 10.04 and tried to run the cgrep command, but it says "command not found", which means I have to install some package first.
Please tell me how to get those packages and how to install them.
I tried apt-get install cgrep, but no such package exists.
Desperately waiting for help.
Thanks a ton.
Yeah, I wrote it in my post itself that zcat works.
But it extracts the file first.
What do you mean by “it extracts the file first” in detail? You need a temporary space in /tmp or so to hold the 4-5GB of data? This is not the way it should be.
The problem with the compressed file is that it doesn't compress each line as a single record. For your case it would be better to have a compression scheme like the one used for music streams: you can start at any point, and at the next frame boundary at the latest you can interpret the compressed data.
Thank you very much corp769 and everyone.
@Reuti: I meant that I want to operate directly on compressed files without extracting them.
Anyways, things are working just fine.
This was my first post here on LinuxQuestions.org and got very quick reply.
Thanks again.
What is working fine, in detail? The link about searching in compressed files you pointed to is interesting, but it requires the file to be compressed by their huffm in order to be searched by their cgrep (both available via the links on the page). To me it looks like the cgrep corp769 pointed to is a different one. But according to the paper, you could investigate zgrep, though it won't speed things up much if it's really just a combination of grep and gunzip.
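On the zgrep point: it is indeed just a wrapper that pipes gunzip -c into grep, so it searches the compressed file without any temporary decompressed copy. A quick check (the sample data and pattern are invented):

```shell
#!/bin/sh
# zgrep searches inside the .gz without extracting it to disk.
seq 1 1000 | gzip -c > app.log.gz   # sample compressed log
zgrep -c '^99[0-9]$' app.log.gz     # counts lines 990-999, prints 10
rm -f app.log.gz
```

As noted, this saves disk but not CPU: every search still decompresses the whole stream internally.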
Just thinking, I'm not sure if it would work, but how about a named pipe?
As the decompression operation is feeding the fifo on one end, grep or something could be reading it on the other for the lines you want, and discarding the rest.
Only a small part of the file should then be in memory at any given time.
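A quick sketch of that named-pipe idea (the file name and pattern are hypothetical): the decompressor writes into the FIFO in the background while grep consumes it on the other end, so only the pipe buffer is ever held in memory.

```shell
#!/bin/sh
# Feed a FIFO from gzip and read it with grep; nothing decompressed
# touches the disk.
seq 1 1000 | gzip -c > logs.gz      # sample data standing in for the real log

mkfifo logpipe
gzip -dc logs.gz > logpipe &        # writer: decompress into the FIFO
grep '^42$' logpipe                 # reader: consumes the stream, prints 42

wait
rm -f logpipe logs.gz
```

Functionally this is equivalent to the zcat | grep pipeline, just with the pipe given a name in the filesystem so the two ends can be separate commands.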