LinuxQuestions.org


monkeyorhunter 02-18-2011 03:59 PM

Breaking up large .txt file
 
Hi,

I have a large text file with three columns. I'm trying to write a Perl script that splits the file up based on the value of the 3rd column. Every time the third column reads 0, a new file should be created, and all the data up to (but not including) the next 0 should be written to that new file. This should repeat until the initial file has been entirely split up.

Example data:


0 0 0
2 0 24
2 2 43
2 1 43
96 96 2871
97 97 2878
0 0 0
2 0 34
3 0 34
3 3 52


So with the data above, the file would be split into two files.

data_1.txt would contain

0 0 0
2 0 24
2 2 43
2 1 43
96 96 2871
97 97 2878

and data_2.txt would contain

2 3 0
2 0 34
3 0 34
3 3 52

Any help would be much appreciated.

Thanks!

monkeyorhunter 02-18-2011 04:01 PM

Oops, the file data_2.txt should contain

0 0 0
2 0 34
3 0 34
3 3 52

TB0ne 02-18-2011 04:36 PM

Ok, we'll be glad to help. Post what you've written so far, and where you're stuck...

theNbomr 02-18-2011 05:22 PM

May I suggest creating filenames whose numeric indexes are padded with enough leading zeros that they sort equivalently both alphabetically and numerically?
Code:

    $filename=sprintf("data_%06d.txt",$counter++);
--- rod.
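
To see why the padding matters: a plain string sort puts data_10.txt ahead of data_2.txt, while zero-padded names come out in the same order whether they are sorted as strings or by number. The following is only a minimal sketch; the helper name padded_name and the sample indexes are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a zero-padded filename for a given index.
sub padded_name {
    my ($counter) = @_;
    return sprintf("data_%06d.txt", $counter);
}

my @indexes = (1, 2, 10);                                   # illustrative indexes
my @plain   = map { sprintf("data_%d.txt", $_) } @indexes;  # unpadded names
my @padded  = map { padded_name($_) } @indexes;             # padded names

# Default sort compares strings, which is what "ls" or a shell glob does.
my @plain_sorted  = sort @plain;
my @padded_sorted = sort @padded;

print "@plain_sorted\n";    # data_1.txt data_10.txt data_2.txt
print "@padded_sorted\n";   # data_000001.txt data_000002.txt data_000010.txt
```

Six digits is simply the width used in the suggestion above; any width that covers the largest expected index works.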

monkeyorhunter 02-18-2011 05:59 PM

Hi,

This is my script so far. What seems to happen, though, is that all the data simply gets rewritten into the new file.


#!/usr/bin/perl
my $chr = 1;
my $Input = "data4.txt";
my $Output= "data_$chr.txt";
open (Data,"<$Input");
open (NData,">$Output");
foreach $line(<Data>){
($a, $b, $c) = split/\t/,$line;
if ($c eq 0) {
$chr++;
close NData;
open (NData,">$Output");
}
print NData ($line);
}
}

Thanks for the help!

theNbomr 02-18-2011 06:47 PM

Code:

#!/usr/bin/perl -w
use strict;
my $chr = 1;
my $Input = "data4.txt";
my $Output= "data_$chr.txt";
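open (Data,"<$Input") or die "Cannot open $Input: $!";
open (NData,">$Output") or die "Cannot open $Output: $!";
foreach my $line (<Data>){
    # Only the third column is needed; it is declared so "use strict" compiles.
    my (undef, undef, $c) = split/\t/,$line;
    if ($c eq 0) {
        $chr++;
        close NData;
        # Rebuild the name: the string was interpolated once, before $chr changed.
        $Output= "data_$chr.txt";
        open (NData,">$Output") or die "Cannot open $Output: $!";
    }
    print NData $line;
}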

--- rod.

paulsm4 02-18-2011 07:04 PM

And of course, there's always that perennial, pop favorite "split" :)

Won't necessarily work the way you want ... but might actually work a lot better :)

Just a thought...

Tinkster 02-18-2011 07:21 PM

Quote:

Originally Posted by paulsm4 (Post 4263355)
And of course, there's always that perennial, pop favorite "split" :)

Won't necessarily work the way you want ... but might actually work a lot better :)

Just a thought...


Split won't be any good if the input isn't always split on the same
interval, which is what his sample data suggests; the criterion is
of the "0 0 0" kind, not "split at every 5th line".



Cheers,
Tink

monkeyorhunter 02-18-2011 08:10 PM

Hi

Unfortunately the files are still all being rewritten to data_1.txt.

Does anyone know why this might be happening?

Thanks!

paranoidx 02-18-2011 09:06 PM

Change:

    if ($c eq 0) {

to

    if ($c == 0) {

"eq" compares strings, and $c is the last field on the line, so it still carries the trailing newline and never equals "0". Another alternative is to chomp $c and compare with $c eq "0".
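
The difference between the two comparisons can be checked in isolation. This is a small sketch, assuming tab-separated input as in the script above; the sample line and the variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "0\t0\t0\n";                      # illustrative input line
my (undef, undef, $c) = split /\t/, $line;   # $c is "0\n", newline included

# String comparison: "0\n" is not the same string as "0".
my $string_match  = ($c eq 0) ? "yes" : "no";

# Numeric comparison: "0\n" converts to the number 0.
my $numeric_match = ($c == 0) ? "yes" : "no";

# Stripping the newline first makes the string comparison work too.
chomp(my $trimmed = $c);
my $chomped_match = ($trimmed eq "0") ? "yes" : "no";

print "eq: $string_match, ==: $numeric_match, chomp+eq: $chomped_match\n";
# eq: no, ==: yes, chomp+eq: yes
```

Note that == also treats a non-numeric field such as "abc" as 0 (with a warning under -w), so chomp followed by eq "0" is the stricter of the two tests.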

Since the first line matches the if, the script will immediately close the first file (data_1.txt) with 0 bytes in it, but it shouldn't be much trouble to exclude that case with a condition.
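
Putting the suggestions in this thread together, one possible shape for the script is sketched below. It is not the script the original poster ended up with. The function name split_on_zero and the $prefix parameter are made up for illustration. The sketch also makes two assumptions of its own: it splits on any whitespace rather than on tabs only, and it treats a row as a boundary only when column three is exactly the string 0. The output file is opened only when there is a line to write, so no empty data_1.txt is left behind.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split $input into numbered files; a new file starts at every line whose
# third column is 0. Returns the list of file names written.
sub split_on_zero {
    my ($input, $prefix) = @_;
    open(my $in, '<', $input) or die "Cannot open $input: $!";

    my ($out, $count, @written) = (undef, 0);
    while (my $line = <$in>) {
        # split ' ' splits on runs of whitespace and drops the trailing
        # newline, so no chomp is needed before comparing.
        my @cols = split ' ', $line;

        # Start a new file at a block boundary, or for any leading lines
        # that appear before the first boundary row.
        if (!defined($out) || (@cols >= 3 && $cols[2] eq '0')) {
            close $out if defined $out;
            my $name = sprintf("%s%d.txt", $prefix, ++$count);
            open($out, '>', $name) or die "Cannot open $name: $!";
            push @written, $name;
        }
        print {$out} $line;
    }
    close $out if defined $out;
    close $in;
    return @written;
}

# Usage: perl script.pl data4.txt   (writes data_1.txt, data_2.txt, ...)
split_on_zero($ARGV[0], 'data_') if @ARGV;
```

Reading with while instead of foreach keeps only one line in memory at a time, which helps with a large input file.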

monkeyorhunter 02-18-2011 11:35 PM

Thank you all very much for your help. The script is working great!


All times are GMT -5.