Breaking up large .txt file

monkeyorhunter · 02-18-2011, 03:59 PM

Hi,

I have a large text file with three columns. I'm trying to write a Perl script that splits the file up based on the value of the third column. Every time the third column reads 0, a new file should be created, and all the data up to the next 0 should be written to that new file. This should repeat until the initial file has been entirely split up.

ex data:

0 0 0
2 0 24
2 2 43
2 1 43
96 96 2871
97 97 2878
0 0 0
2 0 34
3 0 34
3 3 52

so with the data above, the file would be split into two files

data_1.txt would contain

0 0 0
2 0 24
2 2 43
2 1 43
96 96 2871
97 97 2878

and data_2.txt would contain

2 3 0
2 0 34
3 0 34
3 3 52

any help would be much appreciated.

Thanks!

monkeyorhunter · 02-18-2011, 04:01 PM

Oops, the file data_2.txt should contain

0 0 0
2 0 34
3 0 34
3 3 52

TB0ne · 02-18-2011, 04:36 PM

Ok, we'll be glad to help. Post what you've written so far, and where you're stuck...

theNbomr · 02-18-2011, 05:22 PM

May I suggest creating filenames whose numeric indexes are padded with enough leading zeros that they sort equivalently both alphabetically and numerically?

Code:

    $filename=sprintf("data_%06d.txt",$counter++);
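
As a rough illustration of why the padding helps, here is a minimal sketch. The helper name `padded_name` and the sample indexes 1, 2 and 10 are made up for the demo; only the `sprintf` format comes from the line above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a zero-padded output name for a given index.
sub padded_name {
    my ($index) = @_;
    return sprintf("data_%06d.txt", $index);
}

my @indexes = (1, 2, 10);

# Unpadded names: a plain string sort puts data_10.txt before data_2.txt.
my @plain        = map { sprintf("data_%d.txt", $_) } @indexes;
my @plain_sorted = sort @plain;

# Padded names: the same string sort keeps them in numeric order.
my @padded        = map { padded_name($_) } @indexes;
my @padded_sorted = sort @padded;

print "plain:  @plain_sorted\n";
print "padded: @padded_sorted\n";
```

With six digits the ordering should hold up to 999999 files; a wider field would be needed beyond that.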

--- rod.

monkeyorhunter · 02-18-2011, 05:59 PM

Hi,

This is my script so far. What seems to happen, though, is that all the data simply gets written into the same new file.

#!/usr/bin/perl
my $chr = 1;
my $Input = "data4.txt";
my $Output= "data_$chr.txt";
open (Data,"<$Input");
open (NData,">$Output");
foreach $line(<Data>){
    ($a, $b, $c) = split/\t/,$line;
    if ($c eq 0) {
        $chr++;
        close NData;
        open (NData,">$Output");
    }
    print NData ($line);
}

Thanks for the help!

theNbomr · 02-18-2011, 06:47 PM

Code:

#!/usr/bin/perl -w
use strict;
my $chr = 1;
my $Input = "data4.txt";
my $Output= "data_$chr.txt";
open (Data,"<$Input");
open (NData,">$Output");
foreach my $line (<Data>){
    my (undef, undef, $c) = split/\t/,$line;
    if ($c eq 0) {
        $chr++;
        close NData;
        $Output= "data_$chr.txt";
        open (NData,">$Output");
    }
    print NData $line;
}

--- rod.

paulsm4 · 02-18-2011, 07:04 PM

And of course, there's always that perennial, pop favorite "split"

Won't necessarily work the way you want ... but might actually work a lot better

Just a thought...

Tinkster · 02-18-2011, 07:21 PM

Quote:

Originally Posted by paulsm4

And of course, there's always that perennial, pop favorite "split"

Won't necessarily work the way you want ... but might actually work a lot better

Just a thought...

Split won't be any good if the input isn't always split on the same
interval, which is what his sample data suggests; the criterion is
of the "0 0 0" kind, not "split at every 5th line".

Cheers,
Tink

monkeyorhunter · 02-18-2011, 08:10 PM

Hi

Unfortunately, all the data is still being written to data_1.txt.

Does anyone know why this might be happening?

Thanks!

paranoidx · 02-18-2011, 09:06 PM

change:

Quote:

if ($c eq 0) {

to

Quote:

if ($c == 0) {

"eq" compares strings, and $c still carries the line's trailing newline ("0\n"), so it never equals "0"; "==" compares numerically. Another alternative is to chomp $c and compare with $c eq "0".
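
A small sketch of the difference, assuming tab-separated input as in the posted script (the variable names here are only for the demo):

```perl
use strict;
use warnings;

# A marker line exactly as it would be read from the file.
my $line = "0\t0\t0\n";
my (undef, undef, $c) = split /\t/, $line;   # $c is "0\n", newline included

# String comparison: "0\n" is not the same string as "0".
my $string_match = ($c eq 0) ? 1 : 0;

# Numeric comparison: "0\n" converts to the number 0.
my $numeric_match = ($c == 0) ? 1 : 0;

# Alternative: strip the newline first, then compare as strings.
chomp(my $trimmed = $c);
my $chomped_match = ($trimmed eq "0") ? 1 : 0;

print "eq: $string_match, ==: $numeric_match, chomp+eq: $chomped_match\n";
# prints "eq: 0, ==: 1, chomp+eq: 1"
```

One caveat: "==" converts any non-numeric string to 0 as well, so the chomp-and-eq form is the stricter test if the third column could ever hold non-numeric text.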

Since the first line matches the if, the script will immediately close the first file (data_1.txt) with 0 bytes, but it shouldn't be much drama to exclude that case with a condition.
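
Putting the pieces together, one possible version that avoids the empty first file might look like the sketch below. It is not the only way to do it. The helper name `split_on_marker` and the prefix argument are invented here, and it assumes tab-separated columns as in the posted script. An output file is only opened when a line actually needs one, so nothing is created up front.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split $input into "${prefix}1.txt", "${prefix}2.txt", ...
# A new file starts at every line whose third tab-separated column is "0".
# Returns the number of files written.
sub split_on_marker {
    my ($input, $prefix) = @_;
    open(my $in, '<', $input) or die "Cannot read $input: $!";

    my $count = 0;
    my $out;                          # no output file exists until needed
    while (my $line = <$in>) {
        chomp(my $copy = $line);      # compare without the trailing newline
        my @fields = split /\t/, $copy;
        my $is_marker = (@fields >= 3 && $fields[2] eq '0');

        # Open a new file on a marker line, or on the very first line if
        # the input happens not to begin with a marker.
        if ($is_marker || !defined $out) {
            close($out) if defined $out;
            $count++;
            my $name = "$prefix$count.txt";
            open($out, '>', $name) or die "Cannot write $name: $!";
        }
        print {$out} $line;
    }
    close($out) if defined $out;
    close($in);
    return $count;
}

# Usage: perl split.pl data4.txt   (writes data_1.txt, data_2.txt, ...)
split_on_marker($ARGV[0], 'data_') if @ARGV;
```

The lexical filehandles and three-argument open are a matter of taste, but checking the return value of open makes a missing input file fail loudly instead of silently producing nothing.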

monkeyorhunter · 02-18-2011, 11:35 PM

Thank you all very much for your help. The script is working great!