LinuxQuestions.org
11-17-2004, 12:52 PM   #1
iago
LQ Newbie
 
Registered: Nov 2004
Location: Canada
Distribution: Slackware
Posts: 26

Rep: Reputation: 15
Unzip + Piping into Perl


I need to process about 10 GB of zipped files with a very small Perl script I'm writing. The problem is, I can't seem to get the data to pipe into it properly. The command I'm using is this:

$ unzip -p /mnt/cdrom/MyBigFile.zip | ProxyParser.pl

Right now, the script just prints everything it reads to stdout:
Code:
#!/usr/bin/perl -w

foreach(<STDIN>)
{
  print "$_";
}
The problem is, it seems to be unzipping the entire file into memory and not sending it to my Perl script. Because of the size, I'm running out of memory:

Code:
iago@Slayer:~$ dmesg | grep ProxyParser.pl
Out of Memory: Killed process 14423 (ProxyParser.pl).
Out of Memory: Killed process 14432 (ProxyParser.pl).
iago@Slayer:~$
This problem also happens on Windows, where I'm using a similar unzip program. My goal is for this to run on Windows, since that's what our server is running. But getting it to work right on Linux would be a good step forward.

Any help is appreciated!

Thanks,
-iago

<edit> I noticed that I can pipe it into "more" just fine, but not into my Perl script. So apparently I can't use foreach(<STDIN>) to parse it in Perl. Any idea how I can do it properly?

Last edited by iago; 11-17-2004 at 01:06 PM.
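
One possible explanation, offered as a sketch rather than a confirmed diagnosis: foreach(<STDIN>) evaluates <STDIN> in list context, so Perl reads every line of input into a list before the loop body runs once. A while loop reads one line per iteration instead. That would also fit the dmesg output above, where the killed process is ProxyParser.pl. The helper name copy_lines and the sample data below are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy lines from one filehandle to another,
# holding only the current line in memory.
sub copy_lines {
    my ($in, $out) = @_;
    my $count = 0;
    # In a while condition, <$in> is evaluated in scalar context and
    # returns one line per iteration (undef at end of input).
    while (my $line = <$in>) {
        print {$out} $line;
        $count++;
    }
    return $count;
}

# Small demonstration on an in-memory filehandle (made-up sample data).
my $sample = "line one\nline two\nline three\n";
open my $sample_fh, '<', \$sample
    or die "cannot open in-memory handle: $!";
my $copied = copy_lines($sample_fh, \*STDOUT);
print "copied $copied lines\n";
```

In the script from the question, the equivalent call would be copy_lines(\*STDIN, \*STDOUT), or simply replacing foreach(<STDIN>) with while(<STDIN>).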
 
11-17-2004, 01:55 PM   #2
Medievalist
Member
 
Registered: Aug 2003
Distribution: Dead Rat
Posts: 191

Rep: Reputation: 56
No joy in mudville

I apologize in advance for posting without a solution... but I can give you a lead or two.

First off, I'm not sure Windows implements a true pipe. DOS didn't; it ran the entire first process and stored the output, then ran the second program. This often led to storage exhaustion on DOS when the same program ran fine on Linux or VMS (which both implement true pipes). You'd probably be better off developing your code on the target OS because of such non-obvious incompatibilities.

Second, by default Perl reads records separated by newlines. Enormously long lines, which don't bother gzip at all, have to be held in memory whole before Perl hands them to your script. Perl's read behaviour is somewhat configurable through the $/ record separator, but it won't parse records on a regex like awk can, or read fixed-size data blocks as readily as C, so Perl might not be the best language for your task.

Good luck!
--Charlie
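
On the point about fixed-size blocks: Perl's record separator $/ can be set to a reference to an integer, in which case <FH> returns records of at most that many bytes instead of newline-terminated lines. A minimal sketch, assuming a 4096-byte block size (an arbitrary choice, not something from this thread) and a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: read a filehandle in fixed-size chunks and
# return the length of each chunk that was read.
sub block_lengths {
    my ($fh, $block_size) = @_;
    # Setting $/ to a reference to an integer switches <$fh> from
    # line mode to record mode; local restores $/ on return.
    local $/ = \$block_size;
    my @lengths;
    while (my $chunk = <$fh>) {
        push @lengths, length $chunk;
    }
    return @lengths;
}

# Demonstration on 10,000 bytes of made-up in-memory data.
my $data = 'x' x 10_000;
open my $data_fh, '<', \$data
    or die "cannot open in-memory handle: $!";
my @lengths = block_lengths($data_fh, 4096);
print "chunk sizes: @lengths\n";
```

This bounds memory use per read regardless of how long the lines are, at the cost of having to reassemble lines that straddle a block boundary.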
 
11-17-2004, 02:11 PM   #3
iago
LQ Newbie
 
Registered: Nov 2004
Location: Canada
Distribution: Slackware
Posts: 26

Original Poster
Rep: Reputation: 15
Right now I have a working version in Java. But here are some timings of our various strategies:

VB: 8 hours/month of logs (unzipped)
Java: 1 hour/month of logs (zipped)
Perl: 5 seconds/day of logs (unzipped), which is about 10 minutes for a month.

The longest line I encountered is about 1200 characters. I'm not sure whether or not Perl can handle that much.

If Windows is actually storing it and running the program with that input, then doing it with Perl is hopeless. After we finish up the first 3 sections, which are urgent, we can look into maybe moving onto a Linux server. The problem is, I'm the only Linux "expert" they have here, and I'm only here on a term position.
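
If the shell's pipe is the suspect, one option is to start the decompressor from inside Perl with a piped open and read its output one line at a time, so no shell pipe is involved. This is only a sketch: the helper name is hypothetical, it assumes unzip is on the PATH, and whether it behaves the same way on the Windows server is untested here. The demonstration uses the running Perl interpreter ($^X) as a stand-in for unzip so that it runs without a zip file.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command, hand each line of its output
# to a callback, and return the number of lines processed.
sub each_line_from_command {
    my ($callback, @command) = @_;
    # The '-|' mode starts @command as a child process and connects
    # its standard output to $fh.
    open(my $fh, '-|', @command)
        or die "cannot start '@command': $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $callback->($line);
        $count++;
    }
    # close on a piped handle waits for the child and reports failure
    # if it exited with a non-zero status.
    close($fh) or warn "'@command' exited with status $?\n";
    return $count;
}

# Stand-in command: a child perl that prints two lines.
my $lines = each_line_from_command(
    sub { print "got: $_[0]" },
    $^X, '-e', 'print "a\n", "b\n"',
);
print "$lines lines\n";
```

For the case in this thread, the command list would be 'unzip', '-p', '/mnt/cdrom/MyBigFile.zip', with the parsing logic in the callback.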
 
  

