LinuxQuestions.org
11-10-2010, 08:54 PM   #1
taskmaster
LQ Newbie
 
Registered: Jul 2006
Location: Vanceboro, NC
Distribution: CentOS
Posts: 10

Rep: Reputation: 0
Will gawk extract bits of text fields from a few thousand identically structured files?


I have several thousand small text files with the following structure:

<NAME> WATSON ANTHONY M </NAME>
<DOB> 01 21 70 </DOB>
<xref image="00003RHV.TIF|V3|1999:11:23:15:54:04.00|51981|0"> image: </xref>
------- <xref image="00003RHW.TIF|V3|1999:11:23:15:54:16.00|59254|0"> image: </xref>
------- <xref image="00003RHX.TIF|V3|1999:11:23:15:54:18.00|60390|0"> image: </xref>
------- <xref image="00003RHY.TIF|V3|1999:11:23:15:54:18.00|38973|0"> image: </xref>
-------

Each file has a different NAME value and a different DOB value, and can contain from 1 to 50 "image" lines.

I am going to write a shell program to read in the couple of thousand text files and write a single output file with the following format:

NAME1, DOB1, image1.tif
NAME1, DOB1, image2.tif
NAME1, DOB1, image3.tif
NAME2, DOB2, image4.tif
etc

Is sed or awk (gawk) the best tool for this process?

Thanks in advance for the assistance. I haven't used awk or sed in approximately 8 years and have forgotten their specific attributes and suitability, but I programmed professionally for approximately 10 years (which I gave up for Hardware Platform / Infrastructure Support about 8 years ago).
 
11-10-2010, 09:01 PM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 12,445

Rep: Reputation: 1068
[g]awk - unquestionably. Or perl if you're into that.
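
For anyone landing here later, here is a minimal sketch of how the awk approach might look. It is not a tested solution for the real data set. It assumes each NAME and DOB tag sits on a single line, that both appear before the xref lines in a file, and that there is at most one xref per line, as in the sample above. The function name extract_records is hypothetical, and the sketch sticks to POSIX awk features, so it should not need gawk specifically.

```shell
#!/bin/sh
# extract_records (hypothetical name): print "NAME, DOB, IMAGE" for every
# xref line found in the files given as arguments.
extract_records() {
    awk '
        # FNR == 1 is the first line of each input file: clear the state
        # so values never leak from one file into the next.
        FNR == 1 { name = ""; dob = "" }

        # Keep only the text between <NAME> and </NAME>.
        /<NAME>/ {
            name = $0
            sub(/.*<NAME>[ \t]*/, "", name)
            sub(/[ \t]*<\/NAME>.*/, "", name)
        }

        # Keep only the text between <DOB> and </DOB>.
        /<DOB>/ {
            dob = $0
            sub(/.*<DOB>[ \t]*/, "", dob)
            sub(/[ \t]*<\/DOB>.*/, "", dob)
        }

        # The image file name runs from just after image=" to the first |.
        # The prefix image=" is 7 characters long, hence the offsets.
        match($0, /image="[^|"]+/) {
            img = substr($0, RSTART + 7, RLENGTH - 7)
            print name ", " dob ", " img
        }
    ' "$@"
}
```

It could be called as `extract_records /path/to/records/*.txt > all_records.csv` (both paths are placeholders). Note that a NAME value containing a comma would produce an ambiguous line, so quoting the fields may be worth adding if that can occur in the data.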
 
11-10-2010, 09:23 PM   #3
taskmaster
LQ Newbie
 
Registered: Jul 2006
Location: Vanceboro, NC
Distribution: CentOS
Posts: 10

Original Poster
Rep: Reputation: 0
So [g]awk is the way to go, cool. Steam is coming out of my ears and lightbulbs are lighting up over my head. Now, where on the internet can I find a great [g]awk reference?
 
11-10-2010, 09:27 PM   #4
GrapefruiTgirl
Guru
 
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594

Rep: Reputation: 550
Best place to begin, I think, for instructions and examples: http://www.gnu.org/manual/gawk/gawk.html

Though you will likely also find the information you need in other LQ threads - see the Search page - but not in the same nice orderly fashion as in the manual.
 
11-10-2010, 09:46 PM   #5
taskmaster
LQ Newbie
 
Registered: Jul 2006
Location: Vanceboro, NC
Distribution: CentOS
Posts: 10

Original Poster
Rep: Reputation: 0
I am at the site, and it is in fact a perfect starting point (http://www.gnu.org/manual/gawk/gawk.html). Thanks, all, for your assistance.
 
  






All times are GMT -5.
