LinuxQuestions.org
Old 02-02-2012, 03:53 PM   #1
devUnix
Member
 
Registered: Oct 2010
Location: Bengaluru, India
Distribution: RHEL 5.1 on My PC, & SunOS / Sun Solaris, RHEL, SuSe, Debian, FreeBSD and other Linux flavors @ Work
Posts: 550

Rep: Reputation: 46
zcat on Windows/DOS


Hi All,


I am working on a small project that requires searching for patterns/keywords in a specified log file. This log file is a simple text file but is very large, say, 5 GB or more. Hence, it is in a compressed format (.zip).

We cannot unzip the file first: it is very large, and unzipping it would take considerable time and system resources.

Please note: storage capacity is another reason not to uncompress the zipped files. Hence, we can only decompress on the fly, without writing to any output files. (Think "zcat" on a Linux/Unix box.)

If we had to deal with this file on a Unix or Linux box, it would not be a problem at all. But the gigantic zipped log files land from the application on a Windows box.

Here are some examples:

Code:
 Directory of C:\

10/24/2011  05:26 PM     1,625,721,289 winLog_0.zip
10/24/2011  04:15 PM       631,934,821 winLog_1.zip
Here is a sample Perl script that opens a file and reads its contents. I already knew it would not work, because the file is not plain text: it is a zipped (binary) file and is not human-readable:

Code:
# Naive attempt: read the .zip as if it were plain text.
open(my $fh, '<', "C:\\winLog_0.zip") or die("Zip File Opening Error: $!");

while (<$fh>) {
	print $_ if /sender/;
}
close($fh);

Execute the Perl Script:

Code:
C:\>perl zip.pl

C:\>
Well, there is no output, even though the original file does contain the word "sender". The script cannot find it among the compressed bytes, which show up as alien characters if we simply say "print $_;" in the "while" loop above, as shown below:

Code:
C:\>perl zip.pl log.zip
PK♥♦  7B@↓Y߼♂  ∞5
...
...
...
It is notable that on a Linux / Unix Box, "zcat" is identical to "gunzip -c". It uncompresses either a list of files on the command line or its standard input and writes the uncompressed data on standard output.

Can we perform similar tasks on a Windows/DOS platform too?

I have just devised a workaround:

Code:
my @lines = `unzip -c C:\\log.zip`;

foreach my $line (@lines) {
	print $line if $line =~ /sender/;
}
But it is very resource-intensive: the backticks collect the entire uncompressed output in memory before the loop starts, and the input files would be very large.

Looking for some insight and views from your end.
 
Old 02-02-2012, 03:58 PM   #2
anomie
Senior Member
 
Registered: Nov 2004
Location: Texas
Distribution: RHEL, Scientific Linux, Debian, Fedora, Lubuntu, FreeBSD
Posts: 3,930
Blog Entries: 5

Rep: Reputation: Disabled
Remember, there's (normally) a module for that.

See if this one helps: http://perldoc.perl.org/IO/Zlib.html

You can compare performance against the MS-DOS "unzip" approach.

---

Ooh.. I may have sent you to the wrong module. Also see:
http://search.cpan.org/~adamk/Archiv...Archive/Zip.pm

Last edited by anomie; 02-02-2012 at 04:02 PM.
 
Old 02-02-2012, 04:20 PM   #3
devUnix
Quote:
Originally Posted by anomie
Remember, there's (normally) a module for that.

See if this one helps: http://perldoc.perl.org/IO/Zlib.html

The example given on that page:

Code:
use IO::Zlib;
tie *FILE, 'IO::Zlib', "file.gz", "wb";
print FILE "line 1\nline2\n";
tie *FILE, 'IO::Zlib', "file.gz", "rb";
while (<FILE>) { print "LINE: ", $_ };

print "\nPress any key to exit...";
$PAUSE=<STDIN>;

does not work. It does create a .gz file and puts the text inside it, but it does not read the lines back.
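A likely cause, though not confirmed in this thread, is that the write handle is still open when the file is tied again for reading, so the compressed data may not have been flushed yet. Here is a minimal sketch that closes the writer before reopening, using IO::Zlib's object interface. The helper names write_gz and read_gz and the temporary file are made up for illustration. Note also that IO::Zlib handles gzip (.gz) files, not .zip archives, so it would not open winLog_0.zip even when the round trip works.

```perl
use strict;
use warnings;
use IO::Zlib;
use File::Temp qw(tempfile);

# Write the given lines to a gzip file, then CLOSE the handle so the
# compressed data is flushed before anyone reopens the file.
sub write_gz {
    my ($path, @lines) = @_;
    my $out = IO::Zlib->new($path, 'wb')
        or die "Cannot open $path for writing: $!";
    $out->print($_) for @lines;
    $out->close;
}

# Read the gzip file back, one line at a time.
sub read_gz {
    my ($path) = @_;
    my $in = IO::Zlib->new($path, 'rb')
        or die "Cannot open $path for reading: $!";
    my @lines;
    while (defined(my $line = $in->getline)) {
        push @lines, $line;
    }
    $in->close;
    return @lines;
}

# Hypothetical demo file; removed automatically when the script exits.
my (undef, $path) = tempfile(SUFFIX => '.gz', UNLINK => 1);
write_gz($path, "line 1\n", "line 2\n");
print "LINE: $_" for read_gz($path);
```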

I am going to try the other module you have added now.

Thanks!
 
Old 02-02-2012, 04:47 PM   #4
devUnix
Quote:
Originally Posted by anomie
Ooh.. I may have sent you to the wrong module. Also see:
http://search.cpan.org/~adamk/Archiv...Archive/Zip.pm

This example is given there:
Code:
# Read a Zip file
   my $somezip = Archive::Zip->new();
   unless ( $somezip->read( 'someZip.zip' ) == AZ_OK ) {
       die 'read error';
   }
and it does not print anything. I further read on the "read()" function and found that it only returns a status code:

Code:
$status = $somezip->read( 'someZip.zip' );
and not the contents of the file inside the zipped file.

Well, the example given there does work for creating a valid zip file. I tested it and it worked. But reading the contents back from the zipped file is not working.

Maybe I am missing something.

Please note: a zipped file will usually contain exactly one file, namely a single large text file.
 
Old 02-02-2012, 04:57 PM   #5
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 2,445

Rep: Reputation: 829
Quote:
It is notable that on a Linux / Unix Box, "zcat" is identical to "gunzip -c". It uncompresses either a list of files on the command line or its standard input and writes the uncompressed data on standard output.

Can we perform similar tasks on a Windows/DOS platform too?
InfoZip's "unzip -p" is equivalent to "gunzip -c".
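Applied to the original workaround, that suggests reading the output of "unzip -p" through a pipe rather than collecting it with backticks, so that only one line is in memory at a time. Unlike "unzip -c", "unzip -p" sends nothing but the file data to standard output. This is a minimal sketch, assuming Info-ZIP's unzip is on the PATH; the helper name grep_handle and the script name zipgrep.pl are made up for illustration.

```perl
use strict;
use warnings;

# Print every line read from an already-open handle that matches
# $pattern, and return the number of matching lines. Only one line
# is held in memory at a time.
sub grep_handle {
    my ($fh, $pattern) = @_;
    my $hits = 0;
    while (my $line = <$fh>) {
        next unless $line =~ /$pattern/;
        print $line;
        $hits++;
    }
    return $hits;
}

# Usage (hypothetical script name): perl zipgrep.pl C:\log.zip sender
if (@ARGV == 2) {
    my ($zipfile, $pattern) = @ARGV;

    # String-form pipe open; the list form of piped open is only
    # available on Windows in newer Perls. Nothing is written to disk.
    open(my $pipe, '-|', qq{unzip -p "$zipfile"})
        or die "Cannot run unzip: $!";
    my $hits = grep_handle($pipe, $pattern);
    close($pipe) or warn "unzip exited with status $?\n";
    print STDERR "$hits matching line(s)\n";
}
```

Because grep_handle only needs a filehandle, the same loop works on any other source of lines.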


Quote:
I further read on the "read()" function and found that it only returns a status code:
and not the contents of the file inside the zipped file.
Try the contents method, maybe?
 
Old 02-02-2012, 09:54 PM   #6
anomie
Crud. That is some fairly complicated module documentation I pointed you to. (Well, it's a very feature-rich module.)

Here's what I put together:
Code:
#!/usr/bin/perl

use warnings;
use strict;
use Archive::Zip qw(:ERROR_CODES);

my $zipfile = 'bar.zip';
my $member  = 'bar.txt';
my $zipo    = Archive::Zip->new();

# Load the archive's directory; read() returns a status code.
unless ($zipo->read($zipfile) == AZ_OK) {
  die "Unable to read $zipfile";
}

# Extract the member to a file of the same name on disk.
unless ($zipo->extractMember($member) == AZ_OK) {
  die "Unable to extract $member";
}
Note that you can list available members by printing the output of ->memberNames().
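As a rough illustration of listing members, the sketch below prints each member's name together with its uncompressed and compressed sizes, which helps to confirm what is inside a large archive before reading it. It assumes Archive::Zip is installed from CPAN; the helper name list_members is made up for this example.

```perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);

# Return one [name, uncompressed size, compressed size] entry
# for each member of the archive.
sub list_members {
    my ($zipfile) = @_;
    my $zip = Archive::Zip->new();
    $zip->read($zipfile) == AZ_OK
        or die "Unable to read $zipfile";

    my @info;
    for my $m ($zip->members) {
        push @info, [ $m->fileName, $m->uncompressedSize, $m->compressedSize ];
    }
    return @info;
}

# Usage: pass the path to a zip file as the first argument.
if (@ARGV) {
    printf "%-30s %12d -> %12d bytes\n", @$_ for list_members($ARGV[0]);
}
```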

Finally, if it's important to not have to extract the file at all (which you mentioned in your first post), better have a look at: http://search.cpan.org/~adamk/Archiv.../MemberRead.pm
 
Old 02-02-2012, 10:56 PM   #7
devUnix
Thanks to both of you! I will check it further and will let you know my findings.
 
Old 02-03-2012, 12:53 AM   #8
devUnix
To ntubski ->

Thanks for the pointer! It works:

Code:
print "\nxyz.txt contains: " . $zipo->contents( 'xyz.txt' );
Output:

Code:
C:\>perl try.pl

xyz.txt contains: Hello, world!

Last edited by devUnix; 02-03-2012 at 01:04 AM.
 
Old 02-03-2012, 01:01 AM   #9
devUnix
Quote:
Originally Posted by anomie
Finally, if it's important to not have to extract the file at all (which you mentioned in your first post), better have a look at: http://search.cpan.org/~adamk/Archiv.../MemberRead.pm
Thanks again! I can't give you more credit because of the limit set by the website.

It is helpful and really is more efficient: once a pattern is found, we can close the file instead of first reading all the contents from the zipped file.
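A minimal sketch of that early-exit idea with Archive::Zip::MemberRead follows. It assumes Archive::Zip is installed from CPAN; the helper name first_match is made up for illustration, and defaulting to the first member relies on each archive holding exactly one file. By default getline() returns the line without its terminator.

```perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
use Archive::Zip::MemberRead;

# Return the first line of $member in $zipfile that matches $pattern,
# or undef if no line matches. If $member is omitted, the first member
# of the archive is used.
sub first_match {
    my ($zipfile, $pattern, $member) = @_;
    my $zip = Archive::Zip->new();
    $zip->read($zipfile) == AZ_OK
        or die "Unable to read $zipfile";

    ($member) = $zip->memberNames unless defined $member;
    die "No members in $zipfile" unless defined $member;

    my $fh = Archive::Zip::MemberRead->new($zip, $member);
    my $found;
    while (defined(my $line = $fh->getline())) {
        if ($line =~ /$pattern/) {
            $found = $line;
            last;    # stop reading as soon as there is a hit
        }
    }
    $fh->close();
    return $found;
}

# Usage: perl script.pl C:\log.zip sender
if (@ARGV >= 2) {
    my $line = first_match(@ARGV[0, 1]);
    print defined $line ? "$line\n" : "no match\n";
}
```

The member is decompressed in chunks as lines are requested, so nothing is extracted to disk and the rest of the file is never read once a match turns up.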
 
  


All times are GMT -5.