LinuxQuestions.org
Old 04-13-2006, 03:48 PM   #1
michaelyu33
LQ Newbie
 
Registered: Mar 2005
Posts: 8

Rep: Reputation: 0
How to split a file into more sub files


Hello All,

I have a report file in plain text format, which my Web team created by concatenating 5 different base files. In the report file, the first 20 lines are the first base file. Then come 2 blank lines, then the second base file, then 2 blank lines, and so on until the 5th base file.

It looks like the following (In my report file):
12, test1
....
....
20, test20


get, report2, name
fiad, dfdfd, dfdfd
....
....
....
dff, fdfd, fdfd


get, report3, file, time
dfd, fdfd, fdfd, rrf
...
...
fdfd, hhg, ere, erer

What I want is to split the one report file back into five base files. In the report file, the 5 portions are separated by 2 blank lines. I have tried csplit, awk, and cut, but I couldn't get any of them to work. Please help...
 
Old 04-13-2006, 05:34 PM   #2
toreric
Member
 
Registered: Dec 2005
Location: Tväråmark, Sweden
Distribution: Debian/Kubuntu
Posts: 96

Rep: Reputation: 16
Try the line editor ed. I haven't used it recently, but I would apply it like this:

First read the file, then
repeat until the file is empty:
    find the double empty lines
    write lines (1,.) to a new file
    delete lines (1,.)
endrepeat
done!

Read "man ed" and work it out!
 
Old 04-14-2006, 08:30 AM   #3
michaelyu33
LQ Newbie
 
Registered: Mar 2005
Posts: 8

Original Poster
Rep: Reputation: 0
Split one big file to 5 files

Thank you so much for the reply. I will try that. If possible, would you please provide some sample code?

Thanks
Michael
 
Old 04-14-2006, 09:14 AM   #4
michaelyu33
LQ Newbie
 
Registered: Mar 2005
Posts: 8

Original Poster
Rep: Reputation: 0
I have found the exact same question on this forum at http://www.linuxquestions.org/questions/showthread.php?t=182909, but when I tried the Perl script in that post, it didn't work out. I have modified the Perl as follows:
#!/usr/bin/perl
#
# Note: as written, this script only watches the size of the input file and
# logs whether it changed. It never reads the file's contents or splits it.
use strict;
use IO::Handle;

my ($line, $nr); # unused

my $thebigfile = "/home/oracle/projects/Achaya/test/wbreports.txt"; # input file location
my $logfile = "/home/oracle/projects/Achaya/test/newwb"; # output files basename

my ($previousFileTimeSize, $currentFileTimeSize);
$previousFileTimeSize = 1;

print "START\n";
open(LOGFILE, ">$logfile") or die "Cannot open $logfile: $!";
LOGFILE->autoflush(1);
while (1) { # loops forever, checking every 30 seconds
    $currentFileTimeSize = (stat($thebigfile))[7]; # size in bytes
    print $currentFileTimeSize;
    if ($currentFileTimeSize != $previousFileTimeSize) {
        print LOGFILE scalar localtime;
        print LOGFILE ": sent-mail MODIFIED\n";
        $previousFileTimeSize = $currentFileTimeSize;
    } else {
        print LOGFILE scalar localtime;
        print LOGFILE ": sent-mail no modification\n";
    }
    sleep 30;
}
close LOGFILE; # never reached because of the endless loop above

Unfortunately, no outfile was generated.

Michael
 
Old 04-14-2006, 09:30 AM   #5
david_ross
Moderator
 
Registered: Mar 2003
Location: Scotland
Distribution: Slackware, RedHat, Debian
Posts: 12,047

Rep: Reputation: 64
Using head and tail would probably be quicker:
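#!/bin/bash
# Assumes every section is exactly 20 lines long, with 2 blank lines between sections.

head -n 20 /tmp/report.txt > /tmp/part.1
head -n 42 /tmp/report.txt | tail -n 20 > /tmp/part.2
head -n 64 /tmp/report.txt | tail -n 20 > /tmp/part.3
head -n 86 /tmp/report.txt | tail -n 20 > /tmp/part.4
head -n 108 /tmp/report.txt | tail -n 20 > /tmp/part.5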
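
The same idea can be written as a loop. This is only a sketch, and it relies on the same assumption as the head/tail approach: every section is exactly 20 lines long and sections are separated by exactly 2 blank lines. The demo input and the names workdir, len and gap are made up for illustration.

```shell
#!/bin/bash
# Build a throwaway demo report: 5 sections of 20 lines, 2 blank lines between them.
workdir=$(mktemp -d)
report="$workdir/report.txt"
for s in 1 2 3 4 5; do
    for i in $(seq 1 20); do
        echo "s$s, line$i"
    done
    if [ "$s" -lt 5 ]; then
        printf '\n\n'
    fi
done > "$report"

len=20   # assumed number of lines in every section
gap=2    # assumed number of blank lines between sections

for n in 1 2 3 4 5; do
    start=$(( (n - 1) * (len + gap) + 1 ))
    end=$(( start + len - 1 ))
    # sed -n 'START,ENDp' prints only that range of lines
    sed -n "${start},${end}p" "$report" > "$workdir/part.$n"
done

wc -l "$workdir"/part.*
```

If the sections are not all the same length, fixed line numbers will not work and a pattern-based split is the safer route.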
 
Old 04-14-2006, 10:12 AM   #6
Dogmatix
LQ Newbie
 
Registered: Feb 2006
Location: Vienna, VA
Distribution: SUSE 10.3
Posts: 3

Rep: Reputation: 0
csplit is a context-based splitter, so you can split a file at lines that match a pattern. head and tail would work, but with csplit you don't need to know how many lines each part has.

For your file, run something like this:

csplit -z infile '/get,/' '{*}'

You'll get files xx00, xx01, etc. xx00 contains the part of infile from the start up to the first matching line. xx01 contains the matching line and up to the next matching line. Etc.

Check the man page for more stuff that it'll do.
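
To see what that does on a small sample, here is a hedged sketch. It assumes GNU csplit; the sample data, the demo directory and the "part" prefix are invented for the example, and the pattern is anchored with ^ so that only lines starting with "get," count as headers.

```shell
#!/bin/bash
demo=$(mktemp -d)
cd "$demo" || exit 1

# Small made-up sample: three sections, two blank lines between them.
printf '%s\n' \
    '12, test1' '20, test20' '' '' \
    'get, report2, name' 'fiad, dfdfd, dfdfd' '' '' \
    'get, report3, file, time' 'dfd, fdfd, fdfd, rrf' > infile

# -s: quiet, -z: drop empty pieces, -f part: name the pieces part00, part01, ...
# '{*}' repeats the pattern as often as it matches (a GNU extension).
csplit -s -z -f part infile '/^get,/' '{*}'

ls part*
```

Each piece keeps the blank lines that followed it in the input, so part00 and part01 end with two blank lines here.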

Dogmatix

edit: fixed argument order...

Last edited by Dogmatix; 04-14-2006 at 10:25 AM.
 
Old 04-14-2006, 10:14 AM   #7
toreric
Member
 
Registered: Dec 2005
Location: Tväråmark, Sweden
Distribution: Debian/Kubuntu
Posts: 96

Rep: Reputation: 16
Or simply use ed. If 'text5.txt' is the file with the five sections of arbitrary length subdivided by double empty lines, and if you prepare the content of the file 'edinp' like this:
Code:
e text5.txt
/^$/
/^$/
1,.w part1
1,.d
/^$/
/^$/
1,.w part2
1,.d
/^$/
/^$/
1,.w part3
1,.d
/^$/
/^$/
1,.w part4
1,.d
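1,$w part5
Q
Then run the command 'ed < edinp' to produce the five part# files. The last section is written with 1,$ instead of being located by a search, because no blank lines follow it, and a failed search makes ed stop before it reaches the write command.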

P.S. You may extend edinp with more search-and-write blocks than there are sections in the input file with no harm: ed will then exit gracefully with ?. And, of course, this approach lets you change the section-locating regexp(s) to something more relevant for each section in cases where two empty lines wouldn't suffice. Nice old line editor!

Last edited by toreric; 04-14-2006 at 10:16 AM.
 
Old 04-14-2006, 01:09 PM   #8
michaelyu33
LQ Newbie
 
Registered: Mar 2005
Posts: 8

Original Poster
Rep: Reputation: 0
Thank you all for your help. Because I will get the big report file daily, I need a program to split it into 5 small files based on the blank lines. I tried csplit, but it just won't take the blank lines as a pattern to split the file. I would like to stick with the solution posted in the earlier thread at: http://www.linuxquestions.org/questi...d.php?t=182909.

It looks like that's the reasonable solution for my case. Unfortunately, I am not a Perl guy. I got stuck on writing the output to the file.

Thanks and have a great weekend
Michael
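
For a daily job, one option the thread has not covered is awk, which can treat any run of blank lines as a separator. What follows is a minimal sketch, not a drop-in solution: it assumes the sections themselves never contain empty lines and that the "blank" lines are truly empty (no spaces or carriage returns). The sample data, the work directory and the "base" output prefix are invented for the example.

```shell
#!/bin/bash
work=$(mktemp -d)
report="$work/wbreports.txt"

# Made-up sample: three sections, two blank lines between them.
printf '%s\n' \
    '12, test1' '20, test20' '' '' \
    'get, report2, name' 'fiad, dfdfd, dfdfd' '' '' \
    'get, report3, file, time' 'dfd, fdfd, fdfd, rrf' > "$report"

# A blank line closes the current output file; the next non-blank line
# opens a new one (base1.txt, base2.txt, ...).
awk -v prefix="$work/base" '
    /^$/ { if (out != "") { close(out); out = "" }; next }
    { if (out == "") out = prefix (++n) ".txt"
      print > out }
' "$report"

ls "$work"
```

Because the blank lines are skipped rather than copied, the output files contain only the report lines.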
 
Old 04-15-2006, 09:09 AM   #9
Dogmatix
LQ Newbie
 
Registered: Feb 2006
Location: Vienna, VA
Distribution: SUSE 10.3
Posts: 3

Rep: Reputation: 0
Most *nix utilities are line-based, so you can't search for two blank lines, only one. I thought each report had a similar header that you could search for ("get," in your example). If not, csplit won't work. Another thought: you could use sed to look for a blank line and replace it with a unique string, then use csplit to find it, then use sed again to change the unique string back to a blank line.
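
The marker idea could look roughly like this. It is a sketch rather than a tested recipe for the real report: it assumes GNU sed and csplit, the @@SPLIT@@ marker is an arbitrary string that must not occur in the data, and it deletes the marker afterwards instead of turning it back into a blank line. The sample data and file names are invented.

```shell
#!/bin/bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1

printf '%s\n' \
    '12, test1' '20, test20' '' '' \
    'get, report2, name' 'fiad, dfdfd, dfdfd' '' '' \
    'get, report3, file, time' 'dfd, fdfd, fdfd, rrf' > report.txt

# Step 1: on a blank line, pull in the next line (N); if that one is blank
# too, replace the pair with a single marker line.
sed '/^$/{N;s/^\n$/@@SPLIT@@/;}' report.txt > marked.txt

# Step 2: split in front of every marker line.
csplit -s -z -f piece marked.txt '/^@@SPLIT@@$/' '{*}'

# Step 3: remove the marker line from each piece.
for f in piece[0-9][0-9]; do
    sed '/^@@SPLIT@@$/d' "$f" > "$f.txt"
    rm "$f"
done

ls piece*.txt
```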

Searching for a blank line in csplit is easy. Use "/^$/" as the regexp. You'll end up with 9 files, though, since 4 of them will be just one blank line. If you just ignore them and use xx00, xx02, xx04, xx06, and xx08, you'll have your 5 reports.
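
A sketch of the blank-line split, assuming GNU csplit and made-up sample data with three reports (so five pieces rather than nine). Every second piece holds a lone blank line, and each later report starts with one leftover blank line, so the loop below keeps only pieces that contain text and strips their empty lines. The report1.txt, report2.txt, ... names are invented.

```shell
#!/bin/bash
d=$(mktemp -d)
cd "$d" || exit 1

printf '%s\n' \
    '12, test1' '20, test20' '' '' \
    'get, report2, name' 'fiad, dfdfd, dfdfd' '' '' \
    'get, report3, file, time' 'dfd, fdfd, fdfd, rrf' > report.txt

# Split in front of every blank line: xx00, xx01, ...
csplit -s -z report.txt '/^$/' '{*}'

# Keep only pieces that contain some text, dropping their empty lines.
n=0
for f in xx[0-9][0-9]; do
    if grep -q . "$f"; then
        n=$((n + 1))
        grep -v '^$' "$f" > "report$n.txt"
    fi
done

ls report[0-9]*.txt
```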

Or, you could whip up a perl script. I'd be tempted to write a short C program since I don't know perl either.

Dogmatix
 
Old 04-15-2006, 09:23 AM   #10
toreric
Member
 
Registered: Dec 2005
Location: Tväråmark, Sweden
Distribution: Debian/Kubuntu
Posts: 96

Rep: Reputation: 16
Or, if you know neither Perl nor C/C++ very well but some Bash: Just make a nice Bash script where the Ed line editor is utilized: a straightforward way to obtain the desired functionality in five minutes...
 
  

