01-30-2007, 02:08 AM
|
#1
|
|
Member
Registered: Mar 2004
Location: India
Distribution: RedHat Linux 8
Posts: 60
Rep:
|
Split a large file and get the names of output files using Perl
Hi,
I have to take a large file (its name comes from the command line), check its line count, and then split it into smaller files.
I want to capture the names of all the small files that are created and then parse them to check for a condition in the date.
All of this has to be accomplished in a Perl script.
Thanks in advance
|
|
|
|
01-30-2007, 02:40 AM
|
#2
|
|
Senior Member
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530
Rep:
|
Does it have to split the files at a line break?
Do you want to specify a prefix for the split file names?
When you say "check for a condition in the date", what do you mean? It sounds like these are log files, so maybe each line starts with a date? Can you provide an example line?
|
|
|
|
01-30-2007, 02:55 AM
|
#3
|
|
Member
Registered: Mar 2004
Location: India
Distribution: RedHat Linux 8
Posts: 60
Original Poster
Rep:
|
Hi,
I was able to parse the data file and check the condition, but the file name was hardcoded in the Perl script.
Now my trouble is with the first part of the problem: splitting the file and getting the names of the files created so that I can parse them.
With
$count=`wc -l Temp.dat`;
I am able to get the line count. Now, to split the large file, I want to use the split command, but how can I capture the names of the files created?
Thanks
|
|
|
|
01-30-2007, 09:28 AM
|
#4
|
|
Member
Registered: Mar 2004
Location: India
Distribution: RedHat Linux 8
Posts: 60
Original Poster
Rep:
|
Does anyone know?
|
|
|
|
01-30-2007, 09:55 AM
|
#5
|
|
Senior Member
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530
Rep:
|
If you answer my questions, I can help.
|
|
|
|
01-30-2007, 11:42 PM
|
#6
|
|
Member
Registered: Mar 2004
Location: India
Distribution: RedHat Linux 8
Posts: 60
Original Poster
Rep:
|
I am splitting the file using split with the -l option.
It is working fine, but I want to capture the names of the generated files programmatically.
For the present, I am hardcoding the names.
I parse each generated file and check whether the length of a particular column exceeds a given value.
Regards
|
|
|
|
01-31-2007, 04:45 AM
|
#7
|
|
Senior Member
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530
Rep:
|
With the --verbose option, split prints the names of the files it creates to standard output:
Code:
$ split --verbose -l 10 input_file
creating file `xaa'
creating file `xab'
creating file `xac'
creating file `xad'
You can capture it like this:
Code:
open(SPLIT, "split -l 10 --verbose input_file 2>&1 |") || die "couldn't run split : $!\n";
Then all you have to do is read the input and extract the file names. Be careful to escape meta-characters in the regular expression used to extract the file names:
Code:
my @files;
while (<SPLIT>) {
    if ( /^creating file \`(.*)'$/ ) {
        push(@files, $1);
    }
    else {
        warn "Oh dear, a line of input we can't parse: $_;";
    }
}
close(SPLIT);
# now you have extracted them, you can do what you like...
foreach my $file (@files) {
    print "got a file name: $file\n";
}
|
|
|
|
01-31-2007, 08:23 AM
|
#8
|
|
Member
Registered: Mar 2004
Location: India
Distribution: RedHat Linux 8
Posts: 60
Original Poster
Rep:
|
Code:
open(SPLIT, "split -l 10 --verbose input_file 2>&1 |") || die "couldn't run split : $!\n";
Is there any difference between executing this code and running split from the command prompt?
And what is this line in the while loop for?
" if ( /^creating file \`(.*)'$/ ) {"
I don't know much Perl, and this is an urgent requirement.
|
|
|
|
01-31-2007, 09:41 AM
|
#9
|
|
Senior Member
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530
Rep:
|
Quote:
|
Originally Posted by Sherlock
Code:
open(SPLIT, "split -l 10 --verbose input_file 2>&1 |") || die "couldn't run split : $!\n";
Is there any difference between executing this code and running split from the command prompt?
|
The program will be invoked the same way, except that its output will not go to the terminal. Instead, it will be readable from the SPLIT filehandle.
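To illustrate reading a command's output from a filehandle, here is a minimal sketch. It uses a lexical filehandle and the list form of a pipe open rather than the bareword SPLIT handle above, and read_command() is a hypothetical helper name. A child perl process stands in for split so the example runs anywhere. Note that the list form bypasses the shell, so a redirection such as 2>&1 cannot be written this way.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a command and return its output lines with newlines removed.
# read_command() is a hypothetical helper, not something from this thread.
sub read_command {
    my @cmd = @_;
    # List-form pipe open: the command runs without a shell, and its
    # standard output becomes readable from $fh.
    open(my $fh, '-|', @cmd) or die "couldn't run $cmd[0]: $!\n";
    chomp(my @lines = <$fh>);
    close($fh) or warn "$cmd[0] exited with status $?\n";
    return @lines;
}

# Stand-in command: perl itself ($^X) printing two lines that look
# like the verbose output of split.
my @out = read_command($^X, '-e',
    'print "creating file xaa\ncreating file xab\n"');
print "read: $_\n" for @out;
```

Nothing is printed to the terminal by the child process itself; every line it writes arrives in @out.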
Quote:
And what is this line in the while loop for?
" if ( /^creating file \`(.*)'$/ ) {"
I don't know much Perl, and this is an urgent requirement.
|
The m/PATTERN/ operator (see the perlop manual page for full documentation) tests to see if some value (by default $_ - the current input line) matches a regular expression pattern, PATTERN.
/PATTERN/ is an abbreviation of m/PATTERN/.
Regular expressions are the most amazing things, but I'm not going to describe them fully here. You should read the perlre manual page. Parts of a regular expression in (parentheses), if found, are assigned to the variables $1, $2, $3 etc. So the whole if block says: "if the current line of input matches this pattern, push the bit between the parentheses onto the array @files, otherwise print a warning".
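As a small sketch of how the capture works, the following tries the pattern against one matching and one non-matching line. extract_name() is a hypothetical helper name. The character class [`'] is an addition that is not in the code above: it is there on the assumption that some versions of split print a plain apostrophe as the opening quote instead of a backtick.

```perl
use strict;
use warnings;

# Extract the file name from one line of split --verbose output.
# Returns undef when the line does not match.
# extract_name() is a hypothetical helper, not something from this thread.
sub extract_name {
    my ($line) = @_;
    # [`'] accepts either opening quote; (.*) captures the name into $1.
    return $line =~ /^creating file [`'](.*)'$/ ? $1 : undef;
}

for my $line ("creating file `xaa'", "split: illegal option -- -") {
    my $name = extract_name($line);
    print defined $name ? "matched: $name\n" : "no match: $line\n";
}
```

The first line matches and yields xaa; the second does not match, which is the case where the original loop prints its warning.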
|
|
|
|
02-01-2007, 12:40 AM
|
#10
|
|
Member
Registered: Mar 2004
Location: India
Distribution: RedHat Linux 8
Posts: 60
Original Poster
Rep:
|
Thanks matthew!
I have a doubt about push:
for ($i=0;$i < 5;$i++) {
    push(@file_array,"a");
}
# shift(@file_array);
When I print $file_array[0], the output is nothing, but for index [1] it is "a".
If I use shift it is fine: I am able to remove the first, non-existent value.
Why does push behave like this?
|
|
|
|
02-01-2007, 03:58 AM
|
#11
|
|
Senior Member
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530
Rep:
|
Works for me.
Code:
#!/usr/bin/perl -w
use strict;
my @a;
for(my $i=0; $i<3; $i++) {
push(@a, "a");
}
foreach my $val (@a) {
print "got: $val\n";
}
Output is:
Code:
got: a
got: a
got: a
Post your full code and output. Use [code] tags to make it more readable.
|
|
|
|
02-01-2007, 07:28 AM
|
#12
|
|
Member
Registered: Mar 2004
Location: India
Distribution: RedHat Linux 8
Posts: 60
Original Poster
Rep:
|
Code:
#!/usr/bin/perl
@file_array=undef;
for ($i=0;$i < 5;$i++) {
    push(@file_array,"a");
}
# shift(@file_array);
print $file_array[0]; #nothing prints
print $file_array[1]; #a
print $file_array[2]; #a
|
|
|
|
02-01-2007, 08:03 AM
|
#13
|
|
Senior Member
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530
Rep:
|
This is because you are initialising the array @file_array with one member, undef, and then pushing "a"s onto it in the for loop. After the for loop, it looks like this: (undef, "a", "a", "a", "a", "a").
There is a difference between doing this:
Code:
@array = undef;
and this:
Code:
@array = ();
The first creates @array with one member, undef. The second creates an array with no members.
By the way, if you started your script like this:
Code:
#!/usr/bin/perl -w
use strict;
...
You would have been warned when you print the first member of the array that you are trying to print an undef. It's good practice to use strict and the -w flag whenever possible. There are a few exceptions when it's not worth it, but they are few and far between.
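A quick way to check the difference between the two initialisations described above is to count the elements after the pushes. This is only a sketch; the variable names @with_undef and @empty are made up for the illustration.

```perl
use strict;
use warnings;

my @with_undef = undef;   # one element, and that element is undef
my @empty      = ();      # no elements at all

push(@with_undef, "a") for 1 .. 5;
push(@empty,      "a") for 1 .. 5;

print "with undef: ", scalar(@with_undef), " elements\n";   # 6
print "empty:      ", scalar(@empty),      " elements\n";   # 5
print "first element of \@empty: $empty[0]\n";              # a
```

With the empty list, index 0 already holds the first "a", so no shift is needed.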
|
|
|
|
02-01-2007, 08:28 AM
|
#14
|
|
Member
Registered: Mar 2004
Location: India
Distribution: RedHat Linux 8
Posts: 60
Original Poster
Rep:
|
Hi matthew,
Thanks for the input!!!
I executed your code and this is the output i got...
bash-2.03$ perl TestSpli.pl
Oh dear, a line of input we can't parse: split: illegal option -- -
; at TestSpli.pl line 10, <SPLIT> chunk 1.
Oh dear, a line of input we can't parse: Usage: split [-l #] [-a #] [file [name]]
; at TestSpli.pl line 10, <SPLIT> chunk 2.
Oh dear, a line of input we can't parse: split [-b #[k|m]] [-a #] [file [name]]
; at TestSpli.pl line 10, <SPLIT> chunk 3.
Oh dear, a line of input we can't parse: split [-#] [-a #] [file [name]]
; at TestSpli.pl line 10, <SPLIT> chunk 4.
I also tried running split directly at the prompt:
bash-2.03$ split --verbose -l 10 Employees.txt
split: illegal option -- -
Usage: split [-l #] [-a #] [file [name]]
split [-b #[k|m]] [-a #] [file [name]]
split [-#] [-a #] [file [name]]
Regards
|
|
|
|
02-01-2007, 08:41 AM
|
#15
|
|
Senior Member
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530
Rep:
|
Looks like you don't have the same version of split that I do - are you using the GNU implementation? What OS are you doing this on?
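If your split has no --verbose option, one possible way around it is to do the splitting in Perl itself, so that the script always knows which files it created. The following is only a sketch under some assumptions: split_file() is a hypothetical helper, the ".part.000" style naming is made up for the example and does not match the xaa, xab names that split produces, and the default of 10 lines per file simply mirrors the -l 10 used earlier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split $input into files of at most $lines_per_file lines, named
# "$prefix.000", "$prefix.001", ... and return the names created.
# split_file() and its naming scheme are illustrative assumptions.
sub split_file {
    my ($input, $lines_per_file, $prefix) = @_;
    my (@names, $out);
    my $count = 0;
    open(my $in, '<', $input) or die "can't read $input: $!\n";
    while (my $line = <$in>) {
        # Start a new output file at lines 1, N+1, 2N+1, ...
        if ($count++ % $lines_per_file == 0) {
            close($out) if $out;
            my $name = sprintf("%s.%03d", $prefix, scalar @names);
            open($out, '>', $name) or die "can't write $name: $!\n";
            push(@names, $name);
        }
        print {$out} $line;
    }
    close($out) if $out;
    close($in);
    return @names;
}

# Usage: perl script.pl big_file [lines_per_file]
if (@ARGV) {
    my ($file, $n) = @ARGV;
    print "created: $_\n" for split_file($file, $n || 10, "$file.part");
}
```

The list returned by split_file() can then be looped over in the same way as @files in the earlier code.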
|
|
|
|