LinuxQuestions.org
Old 01-30-2007, 02:08 AM   #1
Sherlock
Member
 
Registered: Mar 2004
Location: India
Distribution: RedHat Linux 8
Posts: 60

Rep: Reputation: 15
Split a large file and get the names of output files using Perl


Hi,

I have to take a large file (its name is given on the command line), check its line count, and then split it into smaller files.

I want to capture the names of all the small files that were created and then parse each of them to check for a condition in the date.

All of this has to be done in a Perl script.

Thanks in advance
 
Old 01-30-2007, 02:40 AM   #2
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 63
Does it have to split the files at a line break?

Do you want to specify a prefix for the split file names?

When you say "check for a condition in the date", what do you mean? It sounds like these are log files, so maybe each line starts with a date? Can you provide an example line?
 
Old 01-30-2007, 02:55 AM   #3
Sherlock
Original Poster
Hi,

I was able to parse the data file and check the condition, but the file name was hardcoded in the Perl script.

Now my trouble is with the first part of the problem: splitting the file and getting the names of the files created so that I can parse them.

With

$count=`wc -l Temp.dat`;

I am able to get the count of the lines. Now, to split the large file, I want to use the split command, but how can I capture the names of the files created?

Thanks
 
Old 01-30-2007, 09:28 AM   #4
Sherlock
Original Poster
Does anyone know?
 
Old 01-30-2007, 09:55 AM   #5
matthewg42
If you answer my questions, I can help.
 
Old 01-30-2007, 11:42 PM   #6
Sherlock
Original Poster
I am splitting the file using split with the -l option.
It is working fine, but I want to capture the names of the generated files programmatically.

For the present, I am hardcoding the names.

I parse each generated file and check whether the length of a particular column exceeds a given value.

Regards
 
Old 01-31-2007, 04:45 AM   #7
matthewg42
GNU split prints the name of each file it creates when run with the --verbose option:
Code:
$ split --verbose -l 10 input_file
creating file `xaa'
creating file `xab'
creating file `xac'
creating file `xad'
You can capture it like this:
Code:
open(SPLIT, "split -l 10 --verbose input_file 2>&1 |") || die "couldn't run split : $!\n";
Then all you have to do is read the input and extract the file names. Take care to escape any metacharacters in the regular expression used to extract the file names:
Code:
my @files;
while (<SPLIT>) {
        if ( /^creating file \`(.*)'$/ ) {
                push(@files, $1);
        }
        else {
                warn "Oh dear, a line of input we can't parse: $_;";
        }
}
close(SPLIT);

# now you have extracted them, you can do what you like...
foreach my $file (@files) {
        print "got a file name: $file\n";
}
 
Old 01-31-2007, 08:23 AM   #8
Sherlock
Original Poster
Code:
open(SPLIT, "split -l 10 --verbose input_file 2>&1 |") || die "couldn't run split : $!\n";
Is there any difference between running this code and running split from the command prompt?

And what is this line in the while loop for?

" if ( /^creating file \`(.*)'$/ ) {"

I don't know much about Perl, and this is an urgent requirement.
 
Old 01-31-2007, 09:41 AM   #9
matthewg42
Quote:
Originally Posted by Sherlock
Code:
open(SPLIT, "split -l 10 --verbose input_file 2>&1 |") || die "couldn't run split : $!\n";
Is there any difference between running this code and running split from the command prompt?
The program is invoked in the same way, except that its output does not go to the terminal; instead it can be read from the SPLIT filehandle.
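
A minimal sketch of the same idea, written with a lexical filehandle and the list form of open. The function name read_command_output is made up for this example, and the command it runs is just the current perl interpreter ($^X) printing two lines, chosen only so the sketch runs anywhere; it stands in for split.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a command and return the lines it writes to standard output.
# The list form of open with '-|' starts the command without a shell
# and connects its standard output to the filehandle.
sub read_command_output {
    my @command = @_;
    open(my $fh, '-|', @command) or die "couldn't run $command[0]: $!\n";
    my @lines = <$fh>;
    close($fh) or warn "$command[0] exited with status $?\n";
    chomp(@lines);
    return @lines;
}

# Stand-in command (an assumption for illustration): the current perl
# interpreter printing two lines, in place of split.
my @output = read_command_output($^X, '-e', 'print "first\nsecond\n"');
print "read from the filehandle: $_\n" for @output;
```

Nothing appears on the terminal until the script prints what it read. Note that 2>&1 is shell syntax, so the one-string form of open is still the one to use when the command's standard error has to be captured as well.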

Quote:
And what is this line in the while loop for?

" if ( /^creating file \`(.*)'$/ ) {"

I don't know much about Perl, and this is an urgent requirement.
The m/PATTERN/ operator (see the perlop manual page for full documentation) tests to see if some value (by default $_ - the current input line) matches a regular expression pattern, PATTERN.

/PATTERN/ is an abbreviation of m/PATTERN/.

Regular expressions are the most amazing things, but I'm not going to describe them fully here. You should read the perlre manual page. Parts of a regular expression in (parentheses) capture the text they match, which is assigned to the variables $1, $2, $3, etc. So the whole if block says: "if the current line of input matches this pattern, push the bit between the parentheses onto the array @files, otherwise print a warning".
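
To see the capture on its own, here is a small sketch that applies the same pattern to two sample lines. The function name extract_file_name is invented for the example; the sample lines are taken from the split output shown in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the file name from a line such as: creating file `xaa'
# Returns undef if the line does not match the pattern.
sub extract_file_name {
    my ($line) = @_;
    chomp($line);
    if ($line =~ /^creating file `(.*)'$/) {
        return $1;    # text matched by the first pair of parentheses
    }
    return;
}

my @samples = ("creating file `xaa'\n", "split: illegal option -- -\n");
foreach my $line (@samples) {
    my $name = extract_file_name($line);
    print defined $name ? "matched: $name\n" : "no match\n";
}
```

The first sample prints "matched: xaa" and the second prints "no match", which is the same decision the if/else in the while loop makes for each line.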
 
Old 02-01-2007, 12:40 AM   #10
Sherlock
Original Poster
Thanks matthew!

I have a question about push.


for ($i=0;$i < 5;$i++) {
push(@file_array,"a");
}
# shift(@file_array);

When I print $file_array[0], the output is nothing, but for index [1] it is a.

If I use shift it is fine; I am able to remove the first, non-existent value.

Why does push behave like this?
 
Old 02-01-2007, 03:58 AM   #11
matthewg42
Works for me.
Code:
#!/usr/bin/perl -w

use strict;

my @a;
for(my $i=0; $i<3; $i++) {
        push(@a, "a");
}

foreach my $val (@a) {
        print "got: $val\n";
}
Output is:
Code:
got: a
got: a
got: a
Post your full code and output. Use [code] tags to make it more readable.
 
Old 02-01-2007, 07:28 AM   #12
Sherlock
Original Poster
Code:
#!/usr/bin/perl

@file_array=undef;

for ($i=0;$i < 5;$i++) {
    push(@file_array,"a");
}
 # shift(@file_array);
print $file_array[0]; #nothing prints
print $file_array[1]; #a
print $file_array[2]; #a
 
Old 02-01-2007, 08:03 AM   #13
matthewg42
It is because you are initialising the array @file_array with one member, undef, and then pushing "a"s onto it in the for loop. After the for loop, it looks like this: (undef, "a", "a", "a", "a", "a").

There is a difference between doing this:
Code:
@array = undef;
and this:
Code:
@array = ();
The first creates @array with one member, undef. The second creates an array with no members.
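
A quick way to check this for yourself is to count the members after pushing onto each kind of array. This is only a sketch; the variable names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @with_undef = undef;   # one member, and that member is undef
my @empty      = ();      # no members at all

# Push three "a"s onto each array, as in the loop from the question.
push(@with_undef, "a") for 1 .. 3;
push(@empty,      "a") for 1 .. 3;

print "started as undef: ", scalar(@with_undef), " members\n";
print "started as ():    ", scalar(@empty), " members\n";
print "first member of the undef array is ",
      (defined $with_undef[0] ? "defined" : "not defined"), "\n";
```

The first array ends up with four members and the second with three, so push is doing its job in both cases; the extra slot at index 0 was there before the loop started.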

By the way, if you started your script like this:
Code:
#!/usr/bin/perl -w

use strict;
...
You would have been warned, when you printed the first member of the array, that you were trying to print an undef. It's good practice to use strict and the -w flag whenever possible. There are a few exceptions when it's not worth it, but they are few and far between.
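
Here is a small sketch of what that looks like. It uses "use warnings", the in-script equivalent of the -w flag, and a $SIG{__WARN__} handler to collect the warning text so it can be printed; the handler and the @caught array are only there for the demonstration and are not needed in a real script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect warnings instead of letting them go to standard error,
# so the example can show what perl reports.
my @caught;
local $SIG{__WARN__} = sub { push(@caught, $_[0]) };

my @file_array = undef;                      # the mistake from the script above
push(@file_array, "a") for 1 .. 5;
my $text = "first member: $file_array[0]";   # uses the undef member

print "perl warned: $_" for @caught;
```

The warning names the problem as a use of an uninitialized value and gives the line number, which points straight at the undef member in slot 0.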
 
Old 02-01-2007, 08:28 AM   #14
Sherlock
Original Poster
Hi matthew,

Thanks for the input!!!

I executed your code and this is the output I got:

bash-2.03$ perl TestSpli.pl
Oh dear, a line of input we can't parse: split: illegal option -- -
; at TestSpli.pl line 10, <SPLIT> chunk 1.
Oh dear, a line of input we can't parse: Usage: split [-l #] [-a #] [file [name]]
; at TestSpli.pl line 10, <SPLIT> chunk 2.
Oh dear, a line of input we can't parse: split [-b #[k|m]] [-a #] [file [name]]
; at TestSpli.pl line 10, <SPLIT> chunk 3.
Oh dear, a line of input we can't parse: split [-#] [-a #] [file [name]]
; at TestSpli.pl line 10, <SPLIT> chunk 4.

I also tried running split directly from the shell:

bash-2.03$ split --verbose -l 10 Employees.txt
split: illegal option -- -
Usage: split [-l #] [-a #] [file [name]]
split [-b #[k|m]] [-a #] [file [name]]
split [-#] [-a #] [file [name]]


Regards
 
Old 02-01-2007, 08:41 AM   #15
matthewg42
Looks like you don't have the same version of split that I do - are you using the GNU implementation? What OS are you doing this on?
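
If your split has no --verbose option, one way round it is to do the splitting in Perl itself, so the script already knows every name it creates. This is a rough sketch, not tested against your data; the function name split_file and the "prefix.000" naming scheme are my own choices for the example, not anything split does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split $input into pieces of at most $lines_per_file lines each.
# Pieces are named "$prefix.000", "$prefix.001", ... (a hypothetical
# naming scheme chosen for this sketch). Returns the list of names.
sub split_file {
    my ($input, $lines_per_file, $prefix) = @_;
    die "lines per file must be positive\n" unless $lines_per_file > 0;

    open(my $in, '<', $input) or die "couldn't read $input: $!\n";
    my (@names, $out);
    my $lines_in_piece = 0;
    while (my $line = <$in>) {
        # Start a new piece at the first line and whenever one is full.
        if (!defined $out || $lines_in_piece == $lines_per_file) {
            if (defined $out) {
                close($out) or die "couldn't close $names[-1]: $!\n";
            }
            my $name = sprintf("%s.%03d", $prefix, scalar(@names));
            open($out, '>', $name) or die "couldn't write $name: $!\n";
            push(@names, $name);
            $lines_in_piece = 0;
        }
        print {$out} $line;
        $lines_in_piece++;
    }
    if (defined $out) {
        close($out) or die "couldn't close $names[-1]: $!\n";
    }
    close($in);
    return @names;
}

# Usage: perl thisscript.pl input_file [lines_per_file]
if (@ARGV) {
    my ($file, $lines) = @ARGV;
    my @files = split_file($file, $lines || 10, "$file.part");
    print "got a file name: $_\n" for @files;
}
```

Alternatively, the usage message you posted shows that your split accepts an output name as its last argument, so the script could pass a prefix of its own choosing and then collect the pieces afterwards with glob("prefix*").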
 
  

