LinuxQuestions.org
Old 05-18-2004, 12:13 PM   #1
chipix
LQ Newbie
 
Registered: Dec 2002
Location: Athens, HELLAS
Distribution: Fedora Core 4
Posts: 22

Rep: Reputation: 15
Split large file into several files using scripting (awk etc.)


Hi,

I have a large text file that contains several sections of text. Here is an example:
"Log 1:

mpla
mpla ok ok ,pla mpla mpla
mpla
mpla

Log2:
mpla2 mpla2 mpla2 mpla2
mpla2 mpla2
mpla2 mpla2

Log3:
mpla? mpla?

Log4:
&&& mpla
"

I want to split the above file into 4 different files, which will contain the appropriate log entries. That is:
file1 => Log1:
mpla
mpla ok ok ,pla mpla mpla
mpla
mpla

file2 => Log2:
mpla2 mpla2 mpla2 mpla2
mpla2 mpla2
mpla2 mpla2

etc.

Is there a template available using awk, sed, or something similar?

Thanks
 
Old 05-18-2004, 04:47 PM   #2
nukkel
Member
 
Registered: Mar 2003
Location: Belgium
Distribution: Hardened gentoo
Posts: 323

Rep: Reputation: 30
With Perl it's quite easy...
Look and learn:
Code:
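#!/usr/bin/perl
#
use strict;
use warnings;

my $thebigfile = "/path/to/input/file.log";    # input file location
my $log        = "path/to/output/log";         # output files basename

# open input file
open(my $in, '<', $thebigfile) or die "Cannot open $thebigfile: $!";

my $out;                                  # handle of the current output file
while (my $line = <$in>) {                # read one line at a time
  if ($line =~ /^Log\s*([0-9]+):\s*$/) {  # header line such as "Log2:" or "Log 1:"
    my $nr = $1;                          # the log number, e.g. 2
    close($out) if $out;                  # finish the previous section
    open($out, '>', "$log$nr") or die "Cannot write $log$nr: $!";
  }
  print $out $line if $out;               # skip any text before the first header
}

close($out) if $out;
close($in);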
Put this in a file, make it executable (chmod +x) and run it.
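If the numbers in the headers aren't reliable, a variant could number the output files by counting headers instead of parsing them. This is a minimal sketch, assuming each section starts with a line like "LogN:"; `split_by_header` is a hypothetical helper name, not something from the posts above:

```perl
use strict;
use warnings;

# Hypothetical helper: copy each "LogN:" section read from $in into its own
# file. Files are numbered in the order the headers appear ("${prefix}1",
# "${prefix}2", ...), so the number inside the header itself is ignored.
# Returns how many output files were written.
sub split_by_header {
    my ($in, $prefix) = @_;
    my ($out, $count) = (undef, 0);
    while (my $line = <$in>) {
        if ($line =~ /^Log\s*\d+:\s*$/) {    # a new section starts here
            close($out) if $out;             # keep only one output file open
            $count++;
            open($out, '>', "$prefix$count")
                or die "Cannot write $prefix$count: $!";
        }
        print {$out} $line if $out;          # ignore text before the first header
    }
    close($out) if $out;
    return $count;
}
```

For example: `open(my $in, '<', 'bigfile.log') or die $!; print split_by_header($in, 'file'), " files written\n";` would produce file1, file2, ... in the current directory.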

Enjoy
nukkel
 
Old 05-19-2004, 04:53 AM   #3
chipix
LQ Newbie
 
Registered: Dec 2002
Location: Athens, HELLAS
Distribution: Fedora Core 4
Posts: 22

Original Poster
Rep: Reputation: 15
Very good script.
Thanks

How can I check whether a file with the same name (e.g. log1) already exists in the output directory, so that I don't overwrite it?
 
Old 05-19-2004, 05:14 AM   #4
nukkel
Member
 
Registered: Mar 2003
Location: Belgium
Distribution: Hardened gentoo
Posts: 323

Rep: Reputation: 30
Something along the lines of
Code:
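use File::stat;

# $outfile holds the name you are about to write, e.g. "$log$nr"
if( stat($outfile) ) {           # File::stat's stat() returns undef if the file is missing
    print "$outfile already exists.\n";
    $outfile = "new_$outfile";   # only works if $outfile has no directory part
    print "Saving to $outfile instead.\n";
}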
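A simpler alternative is Perl's built-in `-e` file test, which needs no module. This sketch picks the first unused name by appending a counter; `next_free_name` and the ".1", ".2" suffix scheme are assumptions for illustration. Note there is still a small window between the check and the open; if several processes write into the same directory, `sysopen` with `O_CREAT|O_EXCL` from the Fcntl module avoids that.

```perl
use strict;
use warnings;

# Hypothetical helper: return $name if nothing exists there yet, otherwise
# try "$name.1", "$name.2", ... until an unused name is found.
sub next_free_name {
    my ($name) = @_;
    return $name unless -e $name;    # -e: true if the path exists
    my $i = 1;
    $i++ while -e "$name.$i";
    return "$name.$i";
}
```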
 
Old 05-21-2004, 04:05 AM   #5
chipix
LQ Newbie
 
Registered: Dec 2002
Location: Athens, HELLAS
Distribution: Fedora Core 4
Posts: 22

Original Poster
Rep: Reputation: 15
Thanks. I needed the stat function anyway to test the modification time of the file.

Another thing: I opened a file for logging and write to it, but nothing is written until I close the file handle. Is there a flush equivalent in Perl?

I also have a problem displaying the current time as a string with the localtime function, e.g. "Fri 21 May 2004 11:08:45". Any suggestions?

Thanks again
 
Old 05-21-2004, 04:15 AM   #6
chipix
LQ Newbie
 
Registered: Dec 2002
Location: Athens, HELLAS
Distribution: Fedora Core 4
Posts: 22

Original Poster
Rep: Reputation: 15
Do you know how I can check whether a file has been modified without polling? Right now I check the file in an endless loop, every 30 seconds.

Thanks
 
Old 05-21-2004, 04:59 AM   #7
nukkel
Member
 
Registered: Mar 2003
Location: Belgium
Distribution: Hardened gentoo
Posts: 323

Rep: Reputation: 30
See if this works for auto-flushing the output file:

use IO::Handle;                                  # provides the autoflush() method
open(FILE, ">$file") or die "Cannot open $file: $!";
FILE->autoflush(1);                              # flush after every print

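For the current local time as a string, try
print scalar localtime();
which prints something like "Fri May 21 11:08:45 2004" (month before day, year at the end).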
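To get exactly the "Fri 21 May 2004 11:08:45" layout, one option is `strftime` from the core POSIX module. A minimal sketch; `timestamp` is a hypothetical helper name, and the day and month names may depend on the locale (they are English in the C locale):

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format an epoch time (default: now) like "Fri 21 May 2004 11:08:45".
# %a / %b are the abbreviated day and month names, %d the zero-padded day.
sub timestamp {
    my ($epoch) = @_;
    $epoch = time unless defined $epoch;
    return strftime("%a %d %b %Y %H:%M:%S", localtime($epoch));
}

print timestamp(), "\n";
```

In the logging loop this would replace `print LOGFILE scalar localtime;` with `print LOGFILE timestamp();`.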

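For the polling... I first thought of select() at the beginning of the main loop (it's a bit like the POSIX select() used in C, see man 2 select), but that won't work here: select() always reports a regular file as ready, so it can't tell you when the file changes. Getting notified without polling needs a kernel mechanism such as dnotify or the newer inotify on Linux; otherwise checking with stat() every 30 seconds, as you do now, is the usual approach.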

Have fun programming
 
Old 05-21-2004, 05:54 AM   #8
chipix
LQ Newbie
 
Registered: Dec 2002
Location: Athens, HELLAS
Distribution: Fedora Core 4
Posts: 22

Original Poster
Rep: Reputation: 15
OK, here is my code after your suggestions:

Code:
my $thebigfile = "bigFile.dat";    # input file location
my $logFile = "logFile.dat";
my ($previousFileTimeSize, $currentFileTimeSize);
$previousFileTimeSize = 1;

print "START";
open(LOGFILE, ">$logFile");
LOGFILE->autoflush(1);
while (1) {
	$currentFileTimeSize = (stat($thebigfile))[7]; # size
	print $currentFileTimeSize;
	if ($currentFileTimeSize != $previousFileTimeSize) {
		print LOGFILE scalar localtime;
		print LOGFILE ": sent-mail MODIFIED\n";
		$previousFileTimeSize = $currentFileTimeSize;
	} else {
		print LOGFILE scalar localtime;
		print LOGFILE ": sent-mail no modification\n";
	}
	sleep 30;
}
close LOGFILE;
The strange things are the following:
1. START is never printed
2. $currentFileTimeSize never printed
3. The "if" block is executed only once, even though the file is modified

Any suggestions?
 
Old 05-21-2004, 06:28 AM   #9
nukkel
Member
 
Registered: Mar 2003
Location: Belgium
Distribution: Hardened gentoo
Posts: 323

Rep: Reputation: 30
1 & 2: It's more or less the same problem you had with the output file: when STDOUT is a terminal it is line-buffered, so it only gets flushed when a newline is written. So print "START\n", with a \n, is better.
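Another option is to switch off buffering for STDOUT itself, the same way as for the log file. A minimal sketch using IO::Handle's `autoflush` (setting `$| = 1` while STDOUT is the selected handle does the same thing):

```perl
use strict;
use warnings;
use IO::Handle;

# Flush STDOUT after every print, so text without a trailing newline
# (like "START" or the bare file size) appears immediately.
STDOUT->autoflush(1);    # same effect as: $| = 1;

print "START";           # shows up right away despite the missing "\n"
```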

3: When you use (stat(...))[7] you should delete the 'use File::stat' line, because the File::stat module overrides the built-in stat() with a more user-friendly one that returns an object, so you can say '$filesize = (stat($file))->size'.
That way you don't have to look up which array element is which number. But it also means (stat($file))[7] is always undef, so after the first comparison the size never seems to change.

So with the code as you wrote it, just remove the 'use File::stat' line and it should work.
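With File::stat loaded, the change check can use named accessors instead of index [7]. A minimal sketch; `file_changed` and the `%seen` hash are hypothetical names, and comparing the modification time as well as the size is an addition (a file can be rewritten without changing its size):

```perl
use strict;
use warnings;
use File::stat;    # stat() now returns an object, or undef on failure

# Hypothetical helper: true if $file's size or mtime differs from what was
# recorded in %$seen on the previous call (always true on the first call).
sub file_changed {
    my ($file, $seen) = @_;
    my $st = stat($file) or die "Cannot stat $file: $!";
    my $sig = join ':', $st->size, $st->mtime;    # named fields, no [7]/[9]
    my $changed = !defined $seen->{$file} || $seen->{$file} ne $sig;
    $seen->{$file} = $sig;
    return $changed;
}

# Possible polling loop (runs forever, like the script above):
#   my %seen;
#   while (1) {
#       my $state = file_changed($thebigfile, \%seen) ? "MODIFIED" : "no modification";
#       print LOGFILE scalar(localtime), ": sent-mail $state\n";
#       sleep 30;
#   }
```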
 
Old 05-21-2004, 07:20 AM   #10
chipix
LQ Newbie
 
Registered: Dec 2002
Location: Athens, HELLAS
Distribution: Fedora Core 4
Posts: 22

Original Poster
Rep: Reputation: 15
I kept the 'use File::stat' line and used your approach (more readable).

Thanks a lot
 
Old 05-21-2004, 07:48 AM   #11
nukkel
Member
 
Registered: Mar 2003
Location: Belgium
Distribution: Hardened gentoo
Posts: 323

Rep: Reputation: 30
No problem... See you around
 
Old 04-14-2006, 10:24 AM   #12
michaelyu33
LQ Newbie
 
Registered: Mar 2005
Posts: 8

Rep: Reputation: 0
Split large file into several small files using scripting

Quote:
Originally Posted by chipix
I kept the 'use File::stat' line and used your approach (more readable).

Thanks a lot
Hello,

I was reading your previous post on linuxquestions.org about splitting a large file into several small files using Perl. I have exactly the same problem you had. I tried your Perl script (I only changed the input and output file names) and got this Perl error:
Can't locate object method "autoflush" via package "IO::Handle" (perhaps you forgot to load "IO::Handle"?) at wbsplit.pl line 15.

Here is my script

#!/usr/bin/perl
#
use strict;

my ($line, $nr);

my $thebigfile = "/home/oracle/projects/test/wbreports.txt"; # input file location
my $logfile = "/home/oracle/projects/test/newwb"; # output files basename

my ($previousFileTimeSize, $currentFileTimeSize);
$previousFileTimeSize = 1;

print "START\n";
open(LOGFILE, ">$logfile");
LOGFILE->autoflush(1);
while (1) {
	$currentFileTimeSize = (stat($thebigfile))[7]; # size
	print $currentFileTimeSize;
	if ($currentFileTimeSize != $previousFileTimeSize) {
		print LOGFILE scalar localtime;
		print LOGFILE ": sent-mail MODIFIED\n";
		$previousFileTimeSize = $currentFileTimeSize;
	} else {
		print LOGFILE scalar localtime;
		print LOGFILE ": sent-mail no modification\n";
	}
	sleep 30;
}
close LOGFILE;

Do you have any idea what went wrong in the code? I would appreciate your time and help.

Thanks
Michael
 
Old 04-14-2006, 01:40 PM   #13
nukkel
Member
 
Registered: Mar 2003
Location: Belgium
Distribution: Hardened gentoo
Posts: 323

Rep: Reputation: 30
I think you'll need to put "use IO::Handle" at the beginning of the script, before the "autoflush" function can be used. Let me know if that works out for you.

Best regards,
nukkel
 
Old 04-14-2006, 03:58 PM   #14
michaelyu33
LQ Newbie
 
Registered: Mar 2005
Posts: 8

Rep: Reputation: 0

Hi Nukkel,

Thank you so much for your reply. I put "use IO::Handle;" at the top of the program. When I ran it, it printed START and then just hung there forever. I guess we only open the file and never really write the lines to it.
Here are the results of running the program:
oracle@dbsdata.nrtc.org:twsb$ perl wbsplit.pl
START

Any ideas?

Thanks
Michael
 
Old 10-29-2007, 12:16 PM   #15
vikas027
Senior Member
 
Registered: May 2007
Location: Sydney
Distribution: RHEL, CentOS, Debian, OS X
Posts: 1,275

Rep: Reputation: 99
Quote:
Originally Posted by chipix
Any template available using awk, sed or something alike?
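Just replace the pattern with the string that starts each section (in your file that's "Log"; note that awk patterns are case-sensitive).

You can simply use

awk '/^Log/ { if (out) close(out); out = f (++n) } out { print > out }' f=file bigfile

where f is the prefix for the output files (file1, file2, ...) and bigfile is the source file.

The shorter awk '/log/{n++}{print > f n}' f=destination source has a problem: it never closes its output files, and some awk implementations only allow a limited number of open files (in my case at most 10). Closing the previous file before opening the next one, as above, avoids that limit.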
 
  


All times are GMT -5.
