02-18-2011, 03:59 PM
#1
LQ Newbie
Registered: Feb 2011
Posts: 10
Breaking up large .txt file
Hi,
I have a large text file with three columns. I'm trying to write a Perl script that splits the file up based on the value of the third column. Every time the third column reads 0, a new file should be created, and all the data up to the next 0 should be written to that new file. This should repeat until the initial file has been entirely split up.
Example data:
0 0 0
2 0 24
2 2 43
2 1 43
96 96 2871
97 97 2878
0 0 0
2 0 34
3 0 34
3 3 52
So with the data above, the file would be split into two files.
data_1.txt would contain
0 0 0
2 0 24
2 2 43
2 1 43
96 96 2871
97 97 2878
and data_2.txt would contain
2 3 0
2 0 34
3 0 34
3 3 52
Any help would be much appreciated.
Thanks!

02-18-2011, 04:01 PM
#2
LQ Newbie
Registered: Feb 2011
Posts: 10
Original Poster
Oops, the file data_2.txt should contain
0 0 0
2 0 34
3 0 34
3 3 52

02-18-2011, 04:36 PM
#3
Guru
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack, CentOS
Posts: 11,805
Ok, we'll be glad to help. Post what you've written so far, and where you're stuck...

02-18-2011, 05:22 PM
#4
LQ 5k Club
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,265
May I suggest creating filenames whose numeric indexes are padded with enough leading zeros that they sort equivalently both alphabetically and numerically?
Code:
$filename=sprintf("data_%06d.txt",$counter++);
--- rod.
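To illustrate the padding suggestion above, here is a minimal sketch comparing how unpadded and zero-padded names behave under a plain string sort. The helper name chunk_name() and the sample indexes 1, 2 and 10 are assumptions made for the example; only the "data_%06d.txt" format comes from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# chunk_name() is a hypothetical helper: it formats an output filename
# for a chunk index using the given sprintf format.
sub chunk_name {
    my ($format, $counter) = @_;
    return sprintf($format, $counter);
}

# Unpadded names: a string sort puts data_10.txt before data_2.txt.
my @plain  = sort map { chunk_name("data_%d.txt", $_) } (1, 2, 10);

# Zero-padded names: string order and numeric order agree.
my @padded = sort map { chunk_name("data_%06d.txt", $_) } (1, 2, 10);

print "@plain\n";
print "@padded\n";
```

With padding, a directory listing or a shell glob such as data_*.txt returns the chunks in the order they were written, which is usually what you want when stitching them back together.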

02-18-2011, 05:59 PM
#5
LQ Newbie
Registered: Feb 2011
Posts: 10
Original Poster
Hi,
This is my script so far. What seems to happen, though, is that all the data simply gets rewritten into the new file.
#!/usr/bin/perl
my $chr = 1;
my $Input = "data4.txt";
my $Output= "data_$chr.txt";
open (Data,"<$Input");
open (NData,">$Output");
foreach $line(<Data>){
($a, $b, $c) = split/\t/,$line;
if ($c eq 0) {
$chr++;
close NData;
open (NData,">$Output");
}
print NData ($line);
}
}
Thanks for the help!

02-18-2011, 06:47 PM
#6
LQ 5k Club
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,265
Code:
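#!/usr/bin/perl -w
use strict;

my $chr    = 1;
my $Input  = "data4.txt";
my $Output = "data_$chr.txt";

open(Data,  "<$Input")  or die "Cannot open $Input: $!";
open(NData, ">$Output") or die "Cannot open $Output: $!";

foreach my $line (<Data>) {
    my ($col1, $col2, $c) = split /\t/, $line;
    if ($c eq 0) {
        $chr++;
        close NData;
        # Rebuild the name: "data_$chr.txt" was interpolated once, above,
        # so $Output does not change when $chr does.
        $Output = "data_$chr.txt";
        open(NData, ">$Output") or die "Cannot open $Output: $!";
    }
    print NData $line;
}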
--- rod.

02-18-2011, 07:04 PM
#7
Guru
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,861
And of course, there's always that perennial pop favorite, "split"
Won't necessarily work the way you want ... but might actually work a lot better
Just a thought...
Last edited by paulsm4; 02-18-2011 at 07:05 PM.

02-18-2011, 07:21 PM
#8
Moderator
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,903
Quote:
Originally Posted by paulsm4
And of course, there's always that perennial pop favorite, "split"
Won't necessarily work the way you want ... but might actually work a lot better
Just a thought...

Split won't be any good if the input isn't always split on the same interval, which is what his sample data suggests; the criterion is of the "0 0 0" kind, not "split at every 5th line".
Cheers,
Tink

02-18-2011, 08:10 PM
#9
LQ Newbie
Registered: Feb 2011
Posts: 10
Original Poster
Hi
Unfortunately the files are still all being rewritten to data_1.txt.
Does anyone know why this might be happening?
Thanks!

02-18-2011, 09:06 PM
#10
LQ Newbie
Registered: Jul 2006
Posts: 22
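Change "if ($c eq 0)" to "if ($c == 0)".
"eq" compares strings, and $c still ends with the newline read from the file, so it never equals "0". Another alternative is to chomp $c and then compare with $c eq "0".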
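A small sketch of the difference, assuming a tab-separated row whose last field still carries its newline (the sample row and variable names are made up for illustration):

```perl
use strict;
use warnings;

# A sample tab-separated row, newline intact, as read from a file.
my $line = "2\t0\t0\n";
my (undef, undef, $c) = split /\t/, $line;   # $c is "0\n"

# String comparison: "0\n" is not the same string as "0".
my $string_match = ($c eq "0") ? 1 : 0;

# Numeric comparison: "0\n" converts to the number 0.
my $numeric_match = ($c == 0) ? 1 : 0;

# After chomp removes the trailing newline, the string test succeeds too.
chomp $c;
my $chomped_match = ($c eq "0") ? 1 : 0;

print "eq: $string_match, ==: $numeric_match, eq after chomp: $chomped_match\n";
```

One caveat with ==: any field that does not look like a number also converts to 0, so the chomp-then-eq form is the stricter test if the third column could ever hold non-numeric text.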
Since the first line matches the if, the script will immediately close the first file (data_1.txt) with 0 bytes, but it shouldn't be much drama to exclude it with a condition.
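One possible way to avoid the empty first file is to open an output file only when a row is about to be written to it. The following is a sketch under assumptions, not the script the original poster ended up with: split_on_zero() is a hypothetical helper, it splits rows on any whitespace rather than strictly on tabs, and it uses lexical filehandles with three-argument open.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# split_on_zero() is a hypothetical helper, not code from the thread.
# It starts a new data_N.txt in $dir whenever the third column is 0
# and returns the number of files written.
sub split_on_zero {
    my ($input, $dir) = @_;
    my $count = 0;
    my $out;                                  # current output handle, if any

    open(my $in, '<', $input) or die "Cannot read $input: $!";
    while (my $line = <$in>) {
        # split ' ' ignores leading whitespace and the trailing newline,
        # so the third field can be compared as a plain string.
        my @fields = split ' ', $line;
        my $is_marker = defined $fields[2] && $fields[2] eq '0';

        # Open a file lazily: on a marker row, or for rows that appear
        # before any marker. No file is created until a row needs it.
        if ($is_marker || !defined $out) {
            close $out if defined $out;
            $count++;
            my $name = sprintf("%s/data_%d.txt", $dir, $count);
            open($out, '>', $name) or die "Cannot write $name: $!";
        }
        print {$out} $line;
    }
    close $out if defined $out;
    close $in;
    return $count;
}

# Example: perl split.pl data4.txt
if (@ARGV) {
    my $files = split_on_zero($ARGV[0], '.');
    print "Wrote $files file(s)\n";
}
```

Because the marker row itself triggers the open, the first "0 0 0" line lands in data_1.txt instead of leaving it empty, and the numbering matches the layout described in the first post.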

02-18-2011, 11:35 PM
#11
LQ Newbie
Registered: Feb 2011
Posts: 10
Original Poster
Thank you all very much for your help. The script is working great!
All times are GMT -5.