Split a large file and get the names of output files using Perl
Hi,
I have to take a large file (name given on the command line), check its line count, and then split it into smaller files.
I want to capture the names of all the small files that are created and then parse through them to check for a condition in the date.
All of this has to be accomplished in a Perl script.
Do you want to specify a prefix for the split file names?
When you say "check for a condition in the date", what do you mean? It sounds like these are log files, so maybe each line starts with a date? Can you provide an example line?
I was able to parse through the data file and check the condition, but the file name was hardcoded in the Perl script.
Now my trouble is with the first part of the problem: splitting the file and getting the names of the files created so that I can parse them.
With
$count=`wc -l Temp.dat`;
I am able to get the count of the lines. Now, to split the large file, I want to use the split command, but how can I capture the names of the files created?
open(SPLIT, "split -l 10 --verbose input_file 2>&1 |") || die "couldn't run split : $!\n";
Then all you have to do is read the input and extract the file names. Be careful to escape meta-characters in the regular expression used to extract the file names:
Code:
my @files;
while (<SPLIT>) {
    if ( /^creating file \`(.*)'$/ ) {
        push(@files, $1);
    }
    else {
        warn "Oh dear, a line of input we can't parse: $_;";
    }
}
close(SPLIT);
# now you have extracted them, you can do what you like...
foreach my $file (@files) {
    print "got a file name: $file\n";
}
open(SPLIT, "split -l 10 --verbose input_file 2>&1 |") || die "couldn't run split : $!\n";
Is there any difference between executing this code and running split from the command prompt?
The program will be invoked in the same way, except that the output will not go to the terminal. Instead, it will be readable from the SPLIT filehandle.
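To make the filehandle idea concrete, here is a minimal sketch of reading a command's output through a pipe. It uses the three-or-more-argument form of open with '-|' and a lexical filehandle rather than the bareword SPLIT. The command is only a stand-in (the running perl itself, via $^X) so the example does not depend on any particular split. Note that the list form does not go through a shell, so a redirect such as 2>&1 would not work here as written.

```perl
use strict;
use warnings;

# Stand-in command: the currently running perl printing two lines.
# In the thread this would be the split command instead.
my @cmd = ($^X, '-e', 'print "first\n"; print "second\n";');

# '-|' means: run the command and let us read its standard output.
open(my $pipe, '-|', @cmd) or die "couldn't run command: $!\n";
my @lines = <$pipe>;    # nothing reaches the terminal; it all arrives here
close($pipe) or warn "command exited with status $?\n";

chomp(@lines);
print "read: $_\n" for @lines;
```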
Quote:
and what is this for in the while loop
" if ( /^creating file \`(.*)'$/ ) {"
I don't have much idea about Perl, and it's an urgent requirement.
The m/PATTERN/ operator (see the perlop manual page for full documentation) tests to see if some value (by default $_ - the current input line) matches a regular expression pattern, PATTERN.
/PATTERN/ is an abbreviation of m/PATTERN/.
Regular expressions are the most amazing things, but I'm not going to describe them fully here. You should read the perlre manual page. Parts of a regular expression in (parentheses), if matched, are assigned to the variables $1, $2, $3 etc. So the whole if block says: "if the current line of input matches this pattern, push the bit between the parentheses onto the array @files, otherwise print a warning".
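Here is a small sketch of that capture behaviour run against a few hard-coded lines instead of real split output. The sample lines are assumptions about what split --verbose prints. Older GNU versions seem to quote the name as `xaa' while newer ones, I believe, use 'xaa', so the pattern below accepts either opening quote.

```perl
use strict;
use warnings;

# Assumed sample output; real split output may differ between versions.
my @sample = (
    "creating file `xaa'\n",
    "creating file 'xab'\n",
    "split: illegal option -- -\n",
);

my (@files, @unparsed);
for my $line (@sample) {
    # (.*) is the capture group; whatever it matches lands in $1.
    if ($line =~ /^creating file [`'](.*)'$/) {
        push(@files, $1);
    }
    else {
        push(@unparsed, $line);
    }
}

print "files: @files\n";
print "lines that did not match: ", scalar(@unparsed), "\n";
```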
It is because you are initialising the array @file_array with one member, undef, and then pushing "a"s onto it in the for loop. After the for loop, it looks like this: (undef, "a", "a", "a", "a", "a").
There is a difference between doing this:
Code:
@array = undef;
and this:
Code:
@array = ();
The first creates @array with one member, undef. The second creates an array with no members.
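A quick way to see the difference is to count the members after pushing the same five values onto each array. This is only an illustrative sketch; the variable names are made up for the example.

```perl
use strict;
use warnings;

my @with_undef = undef;    # one member: undef
my @empty      = ();       # no members

push(@with_undef, "a") for 1 .. 5;
push(@empty,      "a") for 1 .. 5;

# An array in scalar context gives its number of members.
print "started as undef: ", scalar(@with_undef), " members\n";
print "started as ():    ", scalar(@empty), " members\n";
```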
By the way, if you started your script like this:
Code:
#!/usr/bin/perl -w
use strict;
...
You would have been warned, when you printed the first member of the array, that you were trying to print an undef. It's good practice to use strict and the -w flag whenever possible. There are a few exceptions where it's not worth it, but they are few and far between.
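As a rough sketch of what that warning looks like, the script below uses an undef array member in a string and collects the warning through a $SIG{__WARN__} handler so it can be inspected. It uses the "use warnings" pragma, which is the lexically scoped alternative to the -w flag. The handler and variable names are just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect warnings in an array instead of letting them go to STDERR.
my @caught;
local $SIG{__WARN__} = sub { push(@caught, $_[0]) };

my @file_array = undef;                            # one member: undef
my $text = "first member: " . $file_array[0];      # triggers a warning

print "warnings caught: ", scalar(@caught), "\n";
print $caught[0] if @caught;
```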
I executed your code and this is the output I got:
bash-2.03$ perl TestSpli.pl
Oh dear, a line of input we can't parse: split: illegal option -- -
; at TestSpli.pl line 10, <SPLIT> chunk 1.
Oh dear, a line of input we can't parse: Usage: split [-l #] [-a #] [file [name]]
; at TestSpli.pl line 10, <SPLIT> chunk 2.
Oh dear, a line of input we can't parse: split [-b #[k|m]] [-a #] [file [name]]
; at TestSpli.pl line 10, <SPLIT> chunk 3.
Oh dear, a line of input we can't parse: split [-#] [-a #] [file [name]]
; at TestSpli.pl line 10, <SPLIT> chunk 4.
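The usage message above suggests that this split does not accept GNU-style long options such as --verbose, so there is no "creating file" output to parse. One way around that, which is a different technique from the one suggested earlier in the thread, is to do the splitting in Perl itself and record each file name as it is created. The sketch below assumes a hypothetical helper called split_file and names the pieces xaa, xab and so on, loosely imitating split's defaults. It is not a drop-in replacement for split.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write $input out in pieces of at most $max_lines
# lines, named "${prefix}aa", "${prefix}ab", ..., and return the names.
sub split_file {
    my ($input, $max_lines, $prefix) = @_;
    my @files;
    my $suffix = 'aa';    # Perl's string increment turns 'aa' into 'ab'
    my $out;
    my $count = 0;

    open(my $in, '<', $input) or die "couldn't open $input: $!\n";
    while (my $line = <$in>) {
        # Start a new piece for the first line and whenever one is full.
        if (!defined $out || $count == $max_lines) {
            if (defined $out) {
                close($out) or die "couldn't close output file: $!\n";
            }
            my $name = $prefix . $suffix++;
            open($out, '>', $name) or die "couldn't create $name: $!\n";
            push(@files, $name);
            $count = 0;
        }
        print {$out} $line;
        $count++;
    }
    close($out) if defined $out;
    close($in);
    return @files;
}

# Take the large file's name from the command line, as in the question.
if (@ARGV) {
    my @files = split_file($ARGV[0], 10, 'x');
    print "got a file name: $_\n" for @files;
}
```

Because the names come back as a list, the later parsing step can loop over that list directly instead of relying on split's output.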