LinuxQuestions.org
Old 02-18-2011, 03:59 PM   #1
monkeyorhunter
LQ Newbie
 
Registered: Feb 2011
Posts: 10

Rep: Reputation: 0
Breaking up large .txt file


Hi,

I have a large text file with three columns. I'm trying to write a Perl script that splits the file up based on the value of the third column: every time the third column reads 0, a new file is created, and all the data up to the next 0 is written to that new file. This should repeat until the initial file has been entirely split up.

ex data:


0 0 0
2 0 24
2 2 43
2 1 43
96 96 2871
97 97 2878
0 0 0
2 0 34
3 0 34
3 3 52


So with the data above, the file would be split into two files:

data_1.txt would contain

0 0 0
2 0 24
2 2 43
2 1 43
96 96 2871
97 97 2878

and data_2.txt would contain

2 3 0
2 0 34
3 0 34
3 3 52

Any help would be much appreciated.

Thanks!
 
Old 02-18-2011, 04:01 PM   #2
monkeyorhunter
LQ Newbie
 
Registered: Feb 2011
Posts: 10

Original Poster
Rep: Reputation: 0
Oops, the file data_2.txt should contain

0 0 0
2 0 34
3 0 34
3 3 52
 
Old 02-18-2011, 04:36 PM   #3
TB0ne
Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 14,778

Rep: Reputation: 2625
Ok, we'll be glad to help. Post what you've written so far, and where you're stuck...
 
Old 02-18-2011, 05:22 PM   #4
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,395
Blog Entries: 2

Rep: Reputation: 903
May I suggest creating filenames whose numeric indexes are padded with enough leading zeros that they sort equivalently both alphabetically and numerically?
Code:
    $filename=sprintf("data_%06d.txt",$counter++);
--- rod.
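
To illustrate the padding suggestion, here is a minimal sketch comparing how padded and unpadded names sort as plain strings. The sub names plain_name and padded_name are made up for this example; the "data_" prefix and the width of 6 come from the sprintf line above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Unpadded name, as in the original question (hypothetical helper).
sub plain_name {
    my ($counter) = @_;
    return sprintf("data_%d.txt", $counter);
}

# Zero-padded name, same format string as the sprintf line above
# (hypothetical helper).
sub padded_name {
    my ($counter) = @_;
    return sprintf("data_%06d.txt", $counter);
}

# sort with no comparator compares strings, like a directory listing would.
my @plain  = sort map { plain_name($_) }  (1, 2, 10);
my @padded = sort map { padded_name($_) } (1, 2, 10);

print "plain:  @plain\n";
print "padded: @padded\n";
```

With the unpadded names, data_10.txt sorts between data_1.txt and data_2.txt; with the padded names, the string order matches the numeric order.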
 
Old 02-18-2011, 05:59 PM   #5
monkeyorhunter
LQ Newbie
 
Registered: Feb 2011
Posts: 10

Original Poster
Rep: Reputation: 0
Hi,

This is my script so far. What seems to happen, though, is that all the data simply gets rewritten into the new file.


#!/usr/bin/perl
my $chr = 1;
my $Input = "data4.txt";
my $Output= "data_$chr.txt";
open (Data,"<$Input");
open (NData,">$Output");
foreach $line(<Data>){
    ($a, $b, $c) = split/\t/,$line;
    if ($c eq 0) {
        $chr++;
        close NData;
        open (NData,">$Output");
    }
    print NData ($line);
}

Thanks for the help!
 
Old 02-18-2011, 06:47 PM   #6
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,395
Blog Entries: 2

Rep: Reputation: 903
Code:
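#!/usr/bin/perl -w
use strict;
my $chr = 1;
my $Input = "data4.txt";
my $Output= "data_$chr.txt";
open (Data,"<$Input") or die "Cannot read $Input: $!";
open (NData,">$Output") or die "Cannot write $Output: $!";
foreach my $line (<Data>){
    my ($a, $b, $c) = split/\t/,$line;
    if ($c eq 0) {
        $chr++;
        close NData;
        # Rebuild the name: $Output was interpolated once, before $chr changed.
        $Output= "data_$chr.txt";
        open (NData,">$Output") or die "Cannot write $Output: $!";
    }
    print NData $line;
}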
--- rod.
 
Old 02-18-2011, 07:04 PM   #7
paulsm4
Guru
 
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
And of course, there's always that perennial, pop favorite "split"

Won't necessarily work the way you want ... but might actually work a lot better

Just a thought...

Last edited by paulsm4; 02-18-2011 at 07:05 PM.
 
Old 02-18-2011, 07:21 PM   #8
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,986
Blog Entries: 11

Rep: Reputation: 880
Quote:
Originally Posted by paulsm4
And of course, there's always that perennial, pop favorite "split"

Won't necessarily work the way you want ... but might actually work a lot better

Just a thought...

Split won't be any good here: his sample data suggests the input
isn't divided at a fixed interval. The criterion is of the
"0 0 0" kind, not "split at every 5th line".



Cheers,
Tink
 
Old 02-18-2011, 08:10 PM   #9
monkeyorhunter
LQ Newbie
 
Registered: Feb 2011
Posts: 10

Original Poster
Rep: Reputation: 0
Hi

Unfortunately, all the data is still being written to data_1.txt.

Does anyone know why this might be happening?

Thanks!
 
Old 02-18-2011, 09:06 PM   #10
paranoidx
LQ Newbie
 
Registered: Jul 2006
Posts: 24

Rep: Reputation: 2
change:
Quote:
if ($c eq 0) {
to

Quote:
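if ($c == 0) {
"eq" compares strings, and $c still carries the line's trailing newline, so it holds "0\n" rather than "0" and the test never matches. "==" compares numerically, and "0\n" converts to the number 0. Another alternative is to chomp $c and compare with $c eq "0".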

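A small sketch of the difference, assuming a tab-separated line with the newline still attached, as the scripts above read it. The variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One line as read from the file: tab-separated, newline still attached.
my $line = "0\t0\t0\n";
my (undef, undef, $c) = split /\t/, $line;   # $c is now "0\n"

# String comparison: "0\n" is not the same string as "0".
my $string_match = ($c eq "0") ? 1 : 0;

# Numeric comparison: "0\n" converts to the number 0.
my $numeric_match = ($c == 0) ? 1 : 0;

# After chomp removes the newline, the string comparison matches too.
chomp $c;
my $chomped_match = ($c eq "0") ? 1 : 0;

print "eq before chomp: $string_match\n";   # 0
print "== before chomp: $numeric_match\n";  # 1
print "eq after chomp:  $chomped_match\n";  # 1
```
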
Since the first line also matches the condition, the script immediately closes the first file (data_1.txt), leaving it at 0 bytes, and the output starts in data_2.txt. It shouldn't be much drama to exclude that case with an extra condition.
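
One possible way to put the pieces together, as a sketch rather than a definitive version. It opens an output file only when one is needed, so no empty data_1.txt is left behind. Assumptions that are not from the thread: the sub name split_on_zero and its $prefix argument are made up, columns are split on any whitespace instead of tabs only, blank lines are skipped, and the third column is assumed to be numeric.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split $input into numbered files, starting a new file at every line whose
# third column is 0. Returns the names of the files written.
# split_on_zero and $prefix are hypothetical names for this sketch.
sub split_on_zero {
    my ($input, $prefix) = @_;
    my ($out, @written);
    my $count = 0;

    open(my $in, '<', $input) or die "Cannot read $input: $!";
    while (my $line = <$in>) {
        # Whitespace split (an assumption); it also discards the newline.
        my @cols = split ' ', $line;
        next unless @cols;    # skip blank lines

        # Start a new file on a 0 marker, or when data appears before
        # any marker and no file is open yet.
        if (!$out || (defined $cols[2] && $cols[2] == 0)) {
            close($out) if $out;
            my $name = sprintf("%s_%d.txt", $prefix, ++$count);
            open($out, '>', $name) or die "Cannot write $name: $!";
            push @written, $name;
        }
        print {$out} $line;
    }
    close($out) if $out;
    close($in);
    return @written;
}

# Run as: perl split.pl data4.txt
if (@ARGV) {
    print "$_\n" for split_on_zero($ARGV[0], "data");
}
```

Reading with while instead of foreach handles one line at a time, which may matter for a large file since foreach (<Data>) reads the whole file into memory first.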
 
Old 02-18-2011, 11:35 PM   #11
monkeyorhunter
LQ Newbie
 
Registered: Feb 2011
Posts: 10

Original Poster
Rep: Reputation: 0
Thank you all very much for your help. The script is working great!
 
  


Tags
perl

