LinuxQuestions.org
Old 09-08-2011, 01:54 AM   #1
pawan613
LQ Newbie
 
Registered: Sep 2011
Posts: 23

Rep: Reputation: Disabled
How to read the content of a .gz file without extracting it anywhere, not even to STDOUT!


Hello there!
I work on plain-text log files that are approximately 4-5 GB in size, and I need to pull out a few lines (say 30,000) from these files as required. To save space I compress them with gzip; the .gz file takes barely 300 MB for the same 4-5 GB of data, which saves a great amount of space.
So it would be good if I could read such .gz files directly, without extracting them anywhere.
I have tried zcat, zless, and zmore, but these tools decompress the file to /dev/null. I don't want even that extraction.
So can anyone tell me: is it possible?
Any help is highly appreciated.
 
Old 09-08-2011, 02:39 AM   #2
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Quote:
Originally Posted by pawan613 View Post
(..)I need to chop few lines (..) if I could read such .gz files directly, without extracting it anywhere. I have tried zcat, zless, zmore: but these shells, uncompresses the file to /dev/null. I don't want even this extraction.
More things than you might think are "compressed" (kernels, music, ebooks, OOo documents, etc.), and all of them need to be decompressed, at least as a stream, in order to be used, and you don't mind using those, right? Decompressing to /dev/null doesn't create temporary files the way "in place" editing would, so I don't understand why it bothers you. Do explain.
 
Old 09-08-2011, 03:12 AM   #3
pawan613
LQ Newbie
 
Registered: Sep 2011
Posts: 23

Original Poster
Rep: Reputation: Disabled
Well, thanks for the quick reply.
What you said is indeed true.
Here, zless on a .gz file ends up running exec gzip -d -c "$1" 2>/dev/null, i.e. the file is decompressed to a stream (with errors sent to /dev/null).
What I want is to use the compressed file directly.
Say, if I write head -n 40000 file.gz | tail -n 20000, it shows weird output that one cannot read.
Well, maybe I'm asking for something that is practically not possible.
Thanks.
 
Old 09-08-2011, 04:11 AM   #4
sunnydrake
Member
 
Registered: Jul 2009
Location: Kiev,Ukraine
Distribution: Ubuntu,Slax,RedHat
Posts: 289
Blog Entries: 1

Rep: Reputation: 61
You could try splitting large .gz files into volumes of, say, 100 MB and decompressing only the ones you need... I'm not sure this will work, but if each volume is compressed with its own dictionary, you will get readable text from each piece.
 
Old 09-08-2011, 05:08 AM   #5
pawan613
LQ Newbie
 
Registered: Sep 2011
Posts: 23

Original Poster
Rep: Reputation: Disabled
@sunnydrake
Hi there.
Well, to find the small chunk I'm looking for, I would first have to search through the entire archive; only then could I split it where required.
So, in the end, it's the same question.
By any means, I don't want to uncompress my archives.
Thanks.
 
Old 09-08-2011, 06:48 AM   #6
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Quote:
Originally Posted by pawan613 View Post
say, if I write like head -n 40000 file.gz | tail -n 20000, then it shows some weird output, which one cannot read.
Something like zcat file.gz | sed -n '40000,59999p' should print 20000 lines starting at line 40000.
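For example, with a small sample file (the file name and line numbers here are just for illustration):

```shell
# Make a small sample log and compress it (a stand-in for the real 4-5GB file).
seq 1 100 > sample.log
gzip -f sample.log                  # creates sample.log.gz

# Stream-decompress and print lines 40-59 (20 lines). Nothing is written to
# disk, and '60q' makes sed quit instead of reading the rest of the stream.
zcat sample.log.gz | sed -n '40,59p;60q'
```

The q address at the end matters on multi-GB files: once the range has been printed, sed stops and zcat stops decompressing.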
 
Old 09-08-2011, 07:14 AM   #7
pawan613
LQ Newbie
 
Registered: Sep 2011
Posts: 23

Original Poster
Rep: Reputation: Disabled
Yeah, I wrote in my post itself that zcat works.
But it extracts the file first.
Thanks!
 
Old 09-08-2011, 07:43 PM   #8
sunnydrake
Member
 
Registered: Jul 2009
Location: Kiev,Ukraine
Distribution: Ubuntu,Slax,RedHat
Posts: 289
Blog Entries: 1

Rep: Reputation: 61
Smaller archives take less memory and disk space to process, so you get a smaller system load compared with searching one large file.
Your search procedure would be something like: for each file in the set, zcat it and exit if the pattern is found.
The basic idea of an archive is to minimize file size by finding repeating patterns and reusing them; for example, let the character '7' stand for the sequence '123456' that occurs in the file (while '7' itself does not). So without uncompressing it, the data is effectively scrambled.
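A rough sketch of that per-volume search, assuming GNU split/gzip and made-up file names:

```shell
# Create a sample log (a stand-in for the real multi-GB file).
seq 1 1000 | sed 's/^500$/500 ERROR/' > big.log

# Split into fixed-size pieces and compress each one independently;
# each .gz is then a self-contained stream that can be searched on its own.
split -b 1K -d big.log big.part.
gzip -f big.part.*

# Search the volumes one by one and stop at the first match.
for f in big.part.*.gz; do
    if zcat "$f" | grep -q 'ERROR'; then
        echo "found in $f"
        break
    fi
done
```

Note that splitting by bytes (-b) can cut a line in half at a volume boundary; for text logs, split -l (split by line count) avoids that.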

Last edited by sunnydrake; 09-08-2011 at 07:50 PM.
 
Old 09-08-2011, 07:54 PM   #9
corp769
LQ Guru
 
Registered: Apr 2005
Location: /dev/null
Posts: 5,818

Rep: Reputation: 1007
After reading over what you are really trying to do, the answer is no, you cannot do it. The reason is that there is no general utility that decompresses only part of a compressed file. A file is either compressed or it isn't; if it is, you can't work with its contents directly unless it is decompressed first. Hope that sums it up well enough for you, man.
 
Old 09-09-2011, 05:11 AM   #10
pawan613
LQ Newbie
 
Registered: Sep 2011
Posts: 23

Original Poster
Rep: Reputation: Disabled
Hello there.
I just got to know about a utility called cgrep.
You may already be aware of it, but in case not, check:
http://www.di.unipi.it/~ferragin/Lib...pressedSearch/

According to the explanation under "CGrep Library", it can search inside compressed files without any need for decompression.
Now, all I want to know is how to get this utility.
I'm using Ubuntu 10.04 and tried to run cgrep, but it says "command not found", which means I have to install some package first.
Please tell me how to get the package and how to install it.
I tried apt-get install cgrep, but no such package exists.
Desperately waiting for help.
Thanks a ton.
 
Old 09-09-2011, 07:59 AM   #11
corp769
LQ Guru
 
Registered: Apr 2005
Location: /dev/null
Posts: 5,818

Rep: Reputation: 1007Reputation: 1007Reputation: 1007Reputation: 1007Reputation: 1007Reputation: 1007Reputation: 1007Reputation: 1007
Quote:
Originally Posted by pawan613 View Post
Just got to know about a utility cgrep. (..) Please please tell me how get those packages and how to install them. I tried apt-get install cgrep, but nothing exists like that.
http://sourceforge.net/projects/cgrep/

There's the source, man. Enjoy!
 
1 members found this post helpful.
Old 09-09-2011, 09:00 AM   #12
Reuti
Senior Member
 
Registered: Dec 2004
Location: Marburg, Germany
Distribution: openSUSE 15.2
Posts: 1,339

Rep: Reputation: 260
Quote:
Originally Posted by pawan613 View Post
Yeah, I wrote it in my post itself that zcat works.
But it extracts the file first.
What do you mean by “it extracts the file first”, in detail? Do you need temporary space in /tmp or somewhere to hold the 4-5 GB of data? That is not the way it should work.

The problem with the compressed file is that it does not compress each line as a separate record. For your case it would be better to have a compression scheme like the ones used for music streams: you can start at (almost) any point, and from the next frame on you can interpret the compressed data.
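To see that the zcat pipeline needs no temporary space, try something like this (the file name is made up):

```shell
# Create a compressed sample (a stand-in for the real archive).
seq 1 1000000 | gzip > sample.gz

# zcat writes the decompressed stream to a pipe; head exits after 5 lines and
# zcat is then stopped by SIGPIPE, so only a prefix of the file is ever
# decompressed and nothing lands in /tmp or anywhere else on disk.
zcat sample.gz | head -n 5
```

Only a pipe buffer's worth of data exists decompressed at any moment; the rest of the archive is never even read.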
 
Old 09-11-2011, 05:03 AM   #13
pawan613
LQ Newbie
 
Registered: Sep 2011
Posts: 23

Original Poster
Rep: Reputation: Disabled
Thank you very much, corp769, and everyone.
@Reuti: I meant that I want to operate directly on the compressed files without extracting them.
Anyway, things are working just fine now.
This was my first post here on LinuxQuestions.org, and I got very quick replies.
Thanks again.
 
Old 09-11-2011, 06:26 AM   #14
Reuti
Senior Member
 
Registered: Dec 2004
Location: Marburg, Germany
Distribution: openSUSE 15.2
Posts: 1,339

Rep: Reputation: 260
Quote:
Originally Posted by pawan613 View Post
Anyways, things are working just fine.
What is working fine, in detail? The link about searching in compressed files that you pointed to is interesting, but it requires the file to be compressed with their huffm in order to be searched with their cgrep (both available via the links on that page). To me it looks like the cgrep corp769 pointed to is a different program. But following the paper, you could investigate zgrep, though it won't speed things up much if it's really only a combination of grep and gunzip.
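zgrep itself is easy to try; something like this (sample file made up here) shows that it just searches the decompressed stream, with no intermediate file:

```shell
# Create a small compressed sample.
printf 'alpha\nbeta ERROR\ngamma\n' | gzip > sample2.gz

# zgrep is roughly equivalent to 'zcat sample2.gz | grep ERROR':
# it decompresses to a pipe and greps the stream as it flows past.
zgrep 'ERROR' sample2.gz
```

So it saves typing rather than CPU time: the whole file still has to be decompressed once per search.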
 
Old 09-11-2011, 06:40 AM   #15
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037Reputation: 2037Reputation: 2037Reputation: 2037Reputation: 2037Reputation: 2037Reputation: 2037Reputation: 2037Reputation: 2037Reputation: 2037Reputation: 2037
Just thinking; I'm not sure if it would work, but how about a named pipe?

While the decompression operation feeds the FIFO on one end, grep or something similar could read from the other end, keeping the lines you want and discarding the rest.

Only a small part of the file would then be in memory at any given time.
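A minimal sketch of the idea (file and pipe names are made up):

```shell
# Create a compressed sample and a named pipe (FIFO).
seq 1 1000 | gzip > sample.gz
mkfifo logpipe

# Feed the decompressed stream into the FIFO in the background...
zcat sample.gz > logpipe &

# ...and read just the lines of interest from the other end; '120q' stops the
# reader, the writer then dies on SIGPIPE, and no temporary file is created.
sed -n '100,119p;120q' logpipe

# Remove the FIFO when done.
rm logpipe
```

A plain pipeline (zcat sample.gz | sed ...) does the same job; the FIFO just gives the stream a name that other tools, even in other terminals, can open like a file.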
 
  

