04-13-2006, 04:48 PM
|
#1
|
LQ Newbie
Registered: Mar 2005
Posts: 8
Rep:
|
How to split a file into more sub files
Hello All,
I have a report file in plain text format, which my Web team produced by concatenating 5 different BASE files. In the report file, the first 20 lines represent the first base file. Then come 2 blank lines, then the second base file, then 2 more blank lines, and so on until the 5th base file.
It looks like the following (In my report file):
12, test1
....
....
20, test20
get, report2, name
fiad, dfdfd, dfdfd
....
....
....
dff, fdfd, fdfd
get, report3, file, time
dfd, fdfd, fdfd, rrf
...
...
fdfd, hhg, ere, erer
What I want is to split the one report file back into five base files. In the report file, the 5 portions are separated by 2 blank lines. I have tried csplit, awk, and cut. It just doesn't work out. Please help...
|
|
|
04-13-2006, 06:34 PM
|
#2
|
Member
Registered: Dec 2005
Location: Tväråmark, Sweden
Distribution: Debian/Kubuntu
Posts: 105
Rep:
|
Try the line editor ed. I haven't used it recently, but I would apply it like this:
First read the file, then
repeat until the file is empty:
find the double empty lines
write line (1,.) to a new file
delete line (1,.)
endrepeat
done!
Read "man ed" and work it out!
|
|
|
04-14-2006, 09:30 AM
|
#3
|
LQ Newbie
Registered: Mar 2005
Posts: 8
Original Poster
Rep:
|
Split one big file to 5 files
Thank you so much for the reply. I will try that. If possible, would you please provide some sample code?
Thanks
Michael
|
|
|
04-14-2006, 10:14 AM
|
#4
|
LQ Newbie
Registered: Mar 2005
Posts: 8
Original Poster
Rep:
|
I have found the exact post on this forum at http://www.linuxquestions.org/questions/showthread.php?t=182909. But when I tried the Perl script in that post, it didn't work out. I have modified the Perl script as follows:
#!/usr/bin/perl
#
use strict;
use IO::Handle;
my ($line, $nr);
my $thebigfile = "/home/oracle/projects/Achaya/test/wbreports.txt"; # input file location
my $logfile = "/home/oracle/projects/Achaya/test/newwb"; # output files basename
my ($previousFileTimeSize, $currentFileTimeSize);
$previousFileTimeSize = 1;
print "START\n";
open(LOGFILE, ">$logfile");
LOGFILE->autoflush(1);
while (1) {
$currentFileTimeSize = (stat($thebigfile))[7]; # size
print $currentFileTimeSize;
if ($currentFileTimeSize != $previousFileTimeSize) {
print LOGFILE scalar localtime;
print LOGFILE ": sent-mail MODIFIED\n";
$previousFileTimeSize = $currentFileTimeSize;
} else {
print LOGFILE scalar localtime;
print LOGFILE ": sent-mail no modification\n";
}
sleep 30;
}
close LOGFILE;
Unfortunately, no outfile was generated.
Michael
|
|
|
04-14-2006, 10:30 AM
|
#5
|
Moderator
Registered: Mar 2003
Location: Scotland
Distribution: Slackware, RedHat, Debian
Posts: 12,047
Rep:
|
Using head and tail would probably be quicker:
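#!/bin/bash
# Assumes every section is exactly 20 lines, with 2 blank lines between sections.
head -n 20 /tmp/report.txt > /tmp/part.1
head -n 42 /tmp/report.txt | tail -n 20 > /tmp/part.2
head -n 64 /tmp/report.txt | tail -n 20 > /tmp/part.3
head -n 86 /tmp/report.txt | tail -n 20 > /tmp/part.4
head -n 108 /tmp/report.txt | tail -n 20 > /tmp/part.5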
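The line numbers in the head/tail commands follow a fixed stride of 22 lines (20 lines of data plus 2 blank lines). As a rough sketch of the same arithmetic in a loop, assuming every section really is exactly 20 lines long (the original question only states this for the first section), something like the following could work. The temporary directory, the file names, and the sample data are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes 5 sections of exactly 20 lines each, separated by
# 2 blank lines. The directory and file names here are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# Build a sample report with that layout.
for s in 1 2 3 4 5; do
    i=1
    while [ "$i" -le 20 ]; do
        echo "section$s, line$i"
        i=$((i + 1))
    done
    # Two blank separator lines after every section except the last.
    if [ "$s" -lt 5 ]; then printf '\n\n'; fi
done > report.txt

# Section k occupies lines (k-1)*22+1 through (k-1)*22+20.
k=1
while [ "$k" -le 5 ]; do
    start=$(( (k - 1) * 22 + 1 ))
    end=$(( start + 19 ))
    sed -n "${start},${end}p" report.txt > "part.$k"
    k=$((k + 1))
done
```

If the sections vary in length, fixed line numbers like these will cut in the wrong places, so a pattern-based split is the safer choice.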
|
|
|
04-14-2006, 11:12 AM
|
#6
|
LQ Newbie
Registered: Feb 2006
Location: Vienna, VA
Distribution: SUSE Leap 15.2
Posts: 9
Rep:
|
Csplit is a contextual splitter, so you can split files depending on matching lines. Head and tail would work, but you don't need to know how many lines there are with csplit.
For your file, run something like this:
csplit -z infile /"get,"/ '{*}'
You'll get files xx00, xx01, etc. xx00 contains the part of infile from the start up to the first matching line. xx01 contains the matching line and up to the next matching line. Etc.
Check the man page for more stuff that it'll do.
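To make the behaviour concrete, here is a small sketch that runs csplit on a made-up file shaped like the sample in the first post. It assumes GNU csplit (the -z option and the '{*}' repeat count are GNU extensions) and assumes every section after the first starts with a "get," header line. The -s and -f options and the "part" prefix are additions for the demo, not part of the command above.

```shell
#!/bin/sh
# Sketch only: needs GNU csplit (-z and '{*}' are GNU extensions).
# The sample data and file names below are made up for illustration.
set -e
work=$(mktemp -d)
cd "$work"

# Three sections; the 2nd and 3rd start with a "get," header line.
# Blank separator lines are included to mimic the report layout.
printf '%s\n' \
    '12, test1' '20, test20' '' '' \
    'get, report2, name' 'fiad, dfdfd, dfdfd' '' '' \
    'get, report3, file, time' 'dfd, fdfd, fdfd, rrf' > infile

# -s: do not print byte counts; -z: drop empty output files;
# -f part: name the pieces part00, part01, ... instead of xx00, xx01, ...
csplit -s -z -f part infile '/^get,/' '{*}'

# Show the first line of each piece.
head -n 1 part00 part01 part02
```

Note that the blank separator lines stay at the end of the piece that precedes each header, so they may need trimming afterwards.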
Dogmatix
edit: fixed argument order...
Last edited by Dogmatix; 04-14-2006 at 11:25 AM.
|
|
|
04-14-2006, 11:14 AM
|
#7
|
Member
Registered: Dec 2005
Location: Tväråmark, Sweden
Distribution: Debian/Kubuntu
Posts: 105
Rep:
|
Or simply use ed. If 'text5.txt' is the file with the five sections of arbitrary length subdivided by double empty lines, and if you prepare the content of the file 'edinp' like this:
Code:
e text5.txt
/^$/
/^$/
1,.w part1
1,.d
/^$/
/^$/
1,.w part2
1,.d
/^$/
/^$/
1,.w part3
1,.d
/^$/
/^$/
1,.w part4
1,.d
/^$/
/^$/
1,.w part5
q
Then run the command 'ed < edinp' to produce the five part# files.
P.S. You may extend edinp with more part#s than are actually present in the input file with no harm. Then ed will gracefully exit with ?. And, of course, this approach permits that you change the section location regexp(s) to something more relevant for each section in cases when two empty lines wouldn't suffice. Nice old line editor!
Last edited by toreric; 04-14-2006 at 11:16 AM.
|
|
|
04-14-2006, 02:09 PM
|
#8
|
LQ Newbie
Registered: Mar 2005
Posts: 8
Original Poster
Rep:
|
Thank you all for your help. Because I will get the big report file daily, I need a program to split it into 5 small files based on the blank lines. I tried csplit, but it just won't take the blank lines as a pattern to split the file. I would like to stick with the solution posted in the previous thread at: http://www.linuxquestions.org/questi...d.php?t=182909.
It looks like that's the reasonable solution for my case. Unfortunately, I am not a Perl guy. I got stuck on writing the output to the file.
Thanks and have a great weekend
Michael
|
|
|
04-15-2006, 10:09 AM
|
#9
|
LQ Newbie
Registered: Feb 2006
Location: Vienna, VA
Distribution: SUSE Leap 15.2
Posts: 9
Rep:
|
Most *nix utilities are line-based, so you can't search for two blank lines, only one. I thought each report had a similar header that you could search for ("get," in your example). If not, csplit on the header won't work. Another thought: you could use sed to look for a blank line and replace it with a unique string, then use csplit to find that string, then use sed again to change the unique string back to a blank line.
Searching for a blank line in csplit is easy. Use "/^$/" as the regexp. You'll end up with 9 files, though, since 4 of them will be just one blank line. If you just ignore them and use xx00, xx02, xx04, xx06, and xx08, you'll have your 5 reports.
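A quick way to see the 9-file result is to run it on a tiny made-up file. This is only a sketch: it assumes GNU csplit for the '{*}' repeat count, and the sample data is invented.

```shell
#!/bin/sh
# Sketch only: needs GNU csplit for '{*}'. Sample data is made up.
set -e
work=$(mktemp -d)
cd "$work"

# Five short sections separated by two blank lines each.
for s in 1 2 3 4 5; do
    echo "section$s, a"
    echo "section$s, b"
    if [ "$s" -lt 5 ]; then printf '\n\n'; fi
done > infile

# Split at every blank line; pieces are named xx00, xx01, ...
csplit -s infile '/^$/' '{*}'

# Show the line count of each piece.
wc -l xx*
```

In this run the odd-numbered pieces hold a single blank line, and each even-numbered piece after xx00 starts with the second blank line of its separator, which could be dropped afterwards with something like sed '1d'.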
Or, you could whip up a perl script. I'd be tempted to write a short C program since I don't know perl either.
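For a short script along those lines, one option that has not come up in this thread is awk's paragraph mode. The sketch below is an assumption about what would suit the report, not a tested solution for the real file: it treats any run of blank lines as a separator, and the file names and sample data are made up.

```shell
#!/bin/sh
# Sketch only: uses awk's paragraph mode, an alternative not proposed
# earlier in the thread. File names and sample data are made up.
set -e
work=$(mktemp -d)
cd "$work"

# Five short sections separated by two blank lines each.
for s in 1 2 3 4 5; do
    echo "section$s, a"
    echo "section$s, b"
    if [ "$s" -lt 5 ]; then printf '\n\n'; fi
done > report.txt

# With RS set to the empty string, awk reads blank-line-separated blocks
# as single records, so each section arrives whole in $0.
awk -v RS= '{ out = "part" NR; print > out; close(out) }' report.txt

ls part*
```

Because a single blank line also ends a record in this mode, a section that contains a blank line of its own would be split in two, so this only fits if blank lines appear solely between sections.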
Dogmatix
|
|
|
04-15-2006, 10:23 AM
|
#10
|
Member
Registered: Dec 2005
Location: Tväråmark, Sweden
Distribution: Debian/Kubuntu
Posts: 105
Rep:
|
Or, if you know neither Perl nor C/C++ very well but some Bash: Just make a nice Bash script where the Ed line editor is utilized: a straightforward way to obtain the desired functionality in five minutes...
|
|
|
All times are GMT -5. The time now is 08:01 AM.