LinuxQuestions.org
Old 11-17-2009, 01:49 PM   #1
bartonski
Member
 
Registered: Jul 2006
Location: Louisville, KY
Distribution: Fedora 12, Slackware, Debian, Ubuntu Karmic, FreeBSD 7.1
Posts: 443
Blog Entries: 1

Rep: Reputation: 47
Splitting files by pattern match


I have a series of files which have been concatenated together. Each of the files has a header, something like this:

metasyntactic_variables.txt


Code:
header----- # primary
xyzzy foo
header----- # secondary
bar
baz
header----- # quux family
quux
quuux
quuuux
quuuuux
Does anyone know if there is a tool, similar to split, which can separate these into files, by header, something like this?

Code:
$ wondersplit --pattern="^header-----" metasyntactic_variables.txt
which produces

xa
Code:
header----- # primary
xyzzy foo
xb
Code:
header----- # secondary
bar
baz
xc
Code:
header----- # quux family
quux
quuux
quuuux
quuuuux
I know that I could hack this together in perl in about 15 minutes, but it would be nice to know if this exists as a standalone tool.
 
Old 11-17-2009, 02:01 PM   #2
sarum1990
Member
 
Registered: Dec 2008
Distribution: Gentoo, Debian
Posts: 31

Rep: Reputation: 21
This can be solved with a fairly simple gawk command:

gawk 'BEGIN{fnum=0; out="outf";} /^header----/ {fnum++;} {print $0 >> out""fnum}' <INPUT-FILE>

Essentially, it scans through the input, writing every line to the file "outf#", and increments # every time it finds a line matching the regexp ^header----.
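A spelled-out sketch of the same idea, for anyone who wants to try it on the sample from the first post. It assumes a POSIX-style awk (gawk or mawk should both do); the temporary directory and the sample lines are only there to make it self-contained. The output name is parenthesized because unparenthesized concatenation after a redirection is not portable across awk implementations.

```shell
# Build a small sample input with the same layout as the first post.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' \
  'header----- # primary' 'xyzzy foo' \
  'header----- # secondary' 'bar' 'baz' \
  'header----- # quux family' 'quux' 'quuux' > metasyntactic_variables.txt

# fnum counts the headers seen so far; each line goes to "outf<fnum>".
awk '
  /^header-----/ { fnum++ }        # a new section starts on this line
  { print $0 > ("outf" fnum) }     # write the line to the current section file
' metasyntactic_variables.txt

ls outf*
```

Note that this sketch uses > where the one-liner uses >>. Within a single awk run, > truncates the file when it is first opened and then keeps appending, so re-running it does not pile duplicate lines onto old output files.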

Hope This Helps
 
Old 11-17-2009, 02:20 PM   #3
bartonski
Original Poster
I knew that someone would jump in with an awk script...

very nice.

Thanks.
 
Old 11-17-2009, 02:25 PM   #4
bartonski
Original Poster
Wow. That ran so fast I thought that it had failed, but I got output.

note to self: learn some awk.
 
Old 11-19-2009, 07:08 PM   #5
sarum1990
Just wanted to update this: I stumbled across a much better option today, the command csplit ("context split").

Running man csplit will show you how to use it. I feel kinda silly running to gawk when this option was available.
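For the sample file from the first post, a csplit invocation might look like the sketch below. It assumes GNU csplit from coreutils: -z and the '{*}' repeat count are GNU extensions, so other implementations may need a different spelling. The x prefix is just chosen to resemble split's default output names.

```shell
# Build a small sample input with the same layout as the first post.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' \
  'header----- # primary' 'xyzzy foo' \
  'header----- # secondary' 'bar' 'baz' \
  'header----- # quux family' 'quux' 'quuux' > metasyntactic_variables.txt

# -s  suppress the byte counts csplit normally prints for each piece
# -z  remove empty pieces (the part before the first header is empty here)
# -f  prefix for the output file names
# '/^header-----/' splits before each matching line;
# '{*}' repeats that pattern as many times as the input allows.
csplit -s -z -f x metasyntactic_variables.txt '/^header-----/' '{*}'

ls x*
```

Each piece starts with its header line. The suffix is numeric rather than alphabetic, so the files come out as x plus a two-digit number instead of xa, xb, xc.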
 
Old 11-19-2009, 07:27 PM   #6
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
the awk code is simply
Code:
awk '/header/{++d}{print $0>"file_"d}' file
 
Old 11-19-2009, 07:46 PM   #7
sarum1990
Quote:
Originally Posted by ghostdog74 View Post
the awk code is simply
Code:
awk '/header/{++d}{print $0>"file_"d}' file
Running that with the awk that ships with Mac OS X, I get an error about too many files being open for writing when the input combines more than 18 files. I get the same error with the version I posted earlier.

If I'm not mistaken, closing each file once it has been written fixes this, though.

Code:
awk '/header/{close("file_"d);++d}{print $0>("file_"d)}' file
But still, I think I'd use csplit for anything like this in the future, since that is the entire purpose of that program.
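To check the close() fix against an input with many sections, something like the following sketch can be used. The 50-section input and the name combined.txt are made up for the test, and the pattern is anchored to ^header----- as in the first post rather than the looser /header/.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Generate a hypothetical input with 50 concatenated sections.
i=1
while [ "$i" -le 50 ]; do
  printf 'header----- # section %d\nline a\nline b\n' "$i"
  i=$((i + 1))
done > combined.txt

awk '
  /^header-----/ {
    if (d) close("file_" d)      # done with the previous section file
    d++
  }
  { print $0 > ("file_" d) }     # at most one output file is open at a time
' combined.txt

ls file_* | wc -l
```

Because the previous file is closed before the next one is opened, the number of sections no longer runs into the per-process limit on open files.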
 
Old 11-19-2009, 09:30 PM   #8
bartonski
Original Poster
Well... I'll be damned. I think that I used csplit about 10 years ago, and I totally forgot about it. How did you run across it?
 
Old 11-20-2009, 11:05 AM   #9
sarum1990
I was in a situation without internet and was navigating around info pages looking for the pr -m or paste command to merge two files line by line for awk processing. When I got to the text-processing commands I noticed csplit and decided to check out what it did.
 
  



All times are GMT -5. The time now is 08:41 PM.
