05-18-2004, 11:13 AM
|
#1
|
LQ Newbie
Registered: Dec 2002
Location: Athens, HELLAS
Distribution: Fedora Core 4
Posts: 22
Rep:
|
Split large file in several files using scripting (awk etc.)
Hi,
I have a large text file that contains several sections of text. Here is an example:
"Log1:
mpla
mpla ok ok ,pla mpla mpla
mpla
mpla
Log2:
mpla2 mpla2 mpla2 mpla2
mpla2 mpla2
mpla2 mpla2
Log3:
mpla? mpla?
Log4:
&&& mpla
"
I want to split the above file into 4 different files, each containing the corresponding log entries. That is:
file1 => Log1:
mpla
mpla ok ok ,pla mpla mpla
mpla
mpla
file2 => Log2:
mpla2 mpla2 mpla2 mpla2
mpla2 mpla2
mpla2 mpla2
etc.
Is there a template available using awk, sed or something similar?
Thanks
|
|
|
05-18-2004, 03:47 PM
|
#2
|
Member
Registered: Mar 2003
Location: Belgium
Distribution: Hardened gentoo
Posts: 323
Rep:
|
With Perl it's quite easy...
Look and learn
Code:
#!/usr/bin/perl
#
use strict;
use warnings;

my $thebigfile = "/path/to/input/file.log"; # input file location
my $log = "path/to/output/log";             # output files basename
my $out;                                    # current output handle, if any

# open input file
open(my $in, '<', $thebigfile) or die "Cannot open $thebigfile: $!";
while (my $line = <$in>) {                  # read one line at a time
    if ($line =~ /^Log([0-9]+):$/) {        # section header such as "Log2:"
        my $nr = $1;                        # section number, without the newline
        close($out) if $out;
        open($out, '>', "$log$nr") or die "Cannot open $log$nr: $!";
    }
    print {$out} $line if $out;             # skip lines before the first header
}
close($out) if $out;
close($in);
Put this in a file, make it executable and run it.
Enjoy
nukkel
|
|
|
05-19-2004, 03:53 AM
|
#3
|
LQ Newbie
Registered: Dec 2002
Location: Athens, HELLAS
Distribution: Fedora Core 4
Posts: 22
Original Poster
Rep:
|
Very good script.
Thanks
How can I check whether a file with the same name (e.g. log1) already exists in the output directory, so that I don't overwrite it?
|
|
|
05-19-2004, 04:14 AM
|
#4
|
Member
Registered: Mar 2003
Location: Belgium
Distribution: Hardened gentoo
Posts: 323
Rep:
|
Something along the lines of
Code:
use File::stat;
if( stat($outfile) ) {    # false if the file does not exist
    print "$outfile already exists.\n";
    $outfile = "new_$outfile";    # assumes $outfile is a bare file name, not a path
    print "Saving to $outfile instead.\n";
}
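For anyone doing the split from the shell instead of Perl, the same check can be sketched as a small helper. This is only an illustration under stated assumptions: the function name pick_name and the demo file names are made up here. It keeps the "new_" prefix from the snippet above but applies it to the file name only, so it also works when the argument includes a directory.

```shell
#!/bin/sh
# pick_name (hypothetical helper): print a file name that does not exist yet,
# adding "new_" in front of the file name until the name is free.
pick_name() {
    name=$1
    while [ -e "$name" ]; do
        name="$(dirname "$name")/new_$(basename "$name")"
    done
    printf '%s\n' "$name"
}

# Demo in a scratch directory: log1 is taken, log2 is not.
demo_dir=$(mktemp -d)
touch "$demo_dir/log1"
pick_name "$demo_dir/log1"    # proposes new_log1 in the scratch directory
pick_name "$demo_dir/log2"    # log2 is free, so the name comes back unchanged
```

Note that the check and the later write are two separate steps, so another process could still create the file in between.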
|
|
|
05-21-2004, 03:05 AM
|
#5
|
LQ Newbie
Registered: Dec 2002
Location: Athens, HELLAS
Distribution: Fedora Core 4
Posts: 22
Original Poster
Rep:
|
Thanks. I needed the stat function anyway to test the modification time of the file.
Another thing: I opened a file for logging and wrote to it, but nothing appears in the file until I close the file handle. Is there a flush equivalent in Perl?
I also have a problem displaying the current time as a string using the localtime function, for example "Fri 21 May 2004 11:08:45". Any suggestions?
Thanks again
|
|
|
05-21-2004, 03:15 AM
|
#6
|
LQ Newbie
Registered: Dec 2002
Location: Athens, HELLAS
Distribution: Fedora Core 4
Posts: 22
Original Poster
Rep:
|
Do you know how I can check whether a file has been modified without polling? Right now I check the file in an endless loop, testing every 30 seconds whether it has been modified.
Thanks
|
|
|
05-21-2004, 03:59 AM
|
#7
|
Member
Registered: Mar 2003
Location: Belgium
Distribution: Hardened gentoo
Posts: 323
Rep:
|
See if this works for auto-flushing the output file:
use IO::Handle;
open(FILE, ">$file");
FILE->autoflush(1);
For the current local time, try
print scalar localtime();
for the polling... you can use select() at the beginning of the main loop so the loop only gets executed when the file changes. But the downside is your program can't do anything else in the meantime. I forgot the exact syntax though... Think it's a bit like the posix select() used in C (man 2 select)
Have fun programming
|
|
|
05-21-2004, 04:54 AM
|
#8
|
LQ Newbie
Registered: Dec 2002
Location: Athens, HELLAS
Distribution: Fedora Core 4
Posts: 22
Original Poster
Rep:
|
OK, here is my code after your suggestions:
Code:
my $thebigfile = "bigFile.dat"; # input file location
my $logFile = "logFile.dat";
my ($previousFileTimeSize, $currentFileTimeSize);
$previousFileTimeSize = 1;
print "START";
open(LOGFILE, ">$logFile");
LOGFILE->autoflush(1);
while (1) {
    $currentFileTimeSize = (stat($thebigfile))[7]; # size
    print $currentFileTimeSize;
    if ($currentFileTimeSize != $previousFileTimeSize) {
        print LOGFILE scalar localtime;
        print LOGFILE ": sent-mail MODIFIED\n";
        $previousFileTimeSize = $currentFileTimeSize;
    } else {
        print LOGFILE scalar localtime;
        print LOGFILE ": sent-mail no modification\n";
    }
    sleep 30;
}
close LOGFILE;
The strange things are the following:
1. START is never printed
2. $currentFileTimeSize is never printed
3. The "if block" is executed only once, even though the file is modified
Any suggestions?
|
|
|
05-21-2004, 05:28 AM
|
#9
|
Member
Registered: Mar 2003
Location: Belgium
Distribution: Hardened gentoo
Posts: 323
Rep:
|
1 & 2: It's more or less the same problem you had with the output file: the buffer for stdout only gets flushed when a newline is written. So print "START\n", with a \n, is better.
3: When you use (stat(...))[7] you should delete the 'use File::stat' line, because the File::stat package overrides the stat() function with a more user-friendly one that returns an object, so you can say '$filesize = stat($file)->size'
and you don't have to look up which array member is which number. With File::stat loaded, (stat(...))[7] is always undefined, so the comparison only succeeds on the first pass.
So, the way you wrote it, remove the 'use File::stat' line and it should work.
|
|
|
05-21-2004, 06:20 AM
|
#10
|
LQ Newbie
Registered: Dec 2002
Location: Athens, HELLAS
Distribution: Fedora Core 4
Posts: 22
Original Poster
Rep:
|
I kept the use File::stat line and used your approach (more readable).
Thanks a lot
|
|
|
05-21-2004, 06:48 AM
|
#11
|
Member
Registered: Mar 2003
Location: Belgium
Distribution: Hardened gentoo
Posts: 323
Rep:
|
No problem... See you around
|
|
|
04-14-2006, 09:24 AM
|
#12
|
LQ Newbie
Registered: Mar 2005
Posts: 8
Rep:
|
Split Large file in several small files using scripting
Quote:
Originally Posted by chipix
I kept the use File::stat line and used your approach (more readable).
Thanks a lot
|
Hello,
I am reading your previous post on linuxquestions.org about splitting a large file into several small files using Perl. I have exactly the same problem you had in that post. I tried your Perl script (I only modified the input and output file names) and got this Perl error:
Can't locate object method "autoflush" via package "IO::Handle" (perhaps you forgot to load "IO::Handle"?) at wbsplit.pl line 15.
Here is my script
#!/usr/bin/perl
#
use strict;
my ($line, $nr);
my $thebigfile = "/home/oracle/projects/test/wbreports.txt"; # input file location
my $logfile = "/home/oracle/projects/test/newwb"; # output files basename
my ($previousFileTimeSize, $currentFileTimeSize);
$previousFileTimeSize = 1;
print "START\n";
open(LOGFILE, ">$logfile");
LOGFILE->autoflush(1);
while (1) {
    $currentFileTimeSize = (stat($thebigfile))[7]; # size
    print $currentFileTimeSize;
    if ($currentFileTimeSize != $previousFileTimeSize) {
        print LOGFILE scalar localtime;
        print LOGFILE ": sent-mail MODIFIED\n";
        $previousFileTimeSize = $currentFileTimeSize;
    } else {
        print LOGFILE scalar localtime;
        print LOGFILE ": sent-mail no modification\n";
    }
    sleep 30;
}
close LOGFILE;
Do you have any idea what went wrong in the code? I would appreciate your time and help.
Thanks
Michael
|
|
|
04-14-2006, 12:40 PM
|
#13
|
Member
Registered: Mar 2003
Location: Belgium
Distribution: Hardened gentoo
Posts: 323
Rep:
|
I think you'll need to put "use IO::Handle" at the beginning of the script, before the "autoflush" function can be used. Let me know if that works out for you.
Best regards,
nukkel
|
|
|
04-14-2006, 02:58 PM
|
#14
|
LQ Newbie
Registered: Mar 2005
Posts: 8
Rep:
|
Hi Nukkel,
Thank you so much for your reply. I have put "use IO::Handle;" at the top of the program. When I ran the program, it printed START and then it just hangs there forever. I guess we just open the file and don't really write the lines to it.
Here is the result of running the program:
oracle@dbsdata.nrtc.org:twsb$ perl wbsplit.pl
START
Any ideas?
Thanks
Michael
|
|
|
10-29-2007, 11:16 AM
|
#15
|
Senior Member
Registered: May 2007
Location: Sydney
Distribution: RHEL, CentOS, Ubuntu, Debian, OS X
Posts: 1,305
Rep: 
|
Quote:
Originally Posted by chipix
Hi,
I have a large text file that contains several sections of text. Here is an example:
"Log1:
mpla
mpla ok ok ,pla mpla mpla
mpla
mpla
Log2:
mpla2 mpla2 mpla2 mpla2
mpla2 mpla2
mpla2 mpla2
Log3:
mpla? mpla?
Log4:
&&& mpla
"
I want to split the above file into 4 different files, each containing the corresponding log entries. That is:
file1 => Log1:
mpla
mpla ok ok ,pla mpla mpla
mpla
mpla
file2 => Log2:
mpla2 mpla2 mpla2 mpla2
mpla2 mpla2
mpla2 mpla2
etc.
Is there a template available using awk, sed or something similar?
Thanks
|
Just use a string that marks the start of each section as the pattern; here it is simply log.
You can simply use
awk '/log/{n++}{print > (f n)}' f=destination_file source_file
The problem is that you can only have at most 10 files. I am also looking for another one-line command that can create more than 10 files.
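A hedged note on the 10-file limit: it most likely comes from the number of output files that some awk implementations can keep open at once (an assumption, since the post does not say which awk was used). Closing each output file before opening the next one should avoid it. The sketch below assumes the section headers look like Log1:, Log2:, and so on, as in the first post; the names big.log, log1, log2, ... are made up for the demonstration.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
split_dir=$(mktemp -d)
cd "$split_dir" || exit 1

# Build a sample input with 12 sections (more than the 10 files mentioned above).
i=1
while [ "$i" -le 12 ]; do
    printf 'Log%d:\nentry for section %d\n' "$i" "$i"
    i=$((i + 1))
done > big.log

# On each header line: close the previous output file, then switch to the next.
# Lines before the first header are skipped because "out" is still empty.
awk '
    /^Log[0-9]+:$/ { if (out) close(out); n++; out = "log" n }
    out            { print > out }
' big.log

ls
```

Because only one output file is open at any time, the number of sections is no longer tied to the open-file limit. Keeping the target name in the variable out also avoids the ambiguity of writing print > f n directly.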
|
|
|
All times are GMT -5. The time now is 11:32 PM.