LinuxQuestions.org
04-18-2011, 11:42 PM   #1
pikcolo
LQ Newbie
 
Registered: Apr 2011
Posts: 2

Using sed to extract data from XML tags and display the result on one line


I have an XML file similar to this.
Suppose the file is named Example.

<PMID>10605436</PMID>
<Year>2000</Year>
<ArticleTitle>Steroids</ArticleTitle>
<MedlinePgn>255-60</MedlinePgn>
<AbstractText>Steroids Abstracts </AbstractText>
<PMID>10605437</PMID>
<Year>2001</Year>
<ArticleTitle>Hormone</ArticleTitle>
<MedlinePgn>123-34</MedlinePgn>
<AbstractText>Hormones Abstracts</AbstractText>

I used:
sed -n -e 's/.*<PMID>\(.*\)<\/PMID>.*/\1/p' \
-e 's/.*<ArticleTitle>\(.*\)<\/ArticleTitle>.*/\1/p' \
-e 's/.*<AbstractText>\(.*\)<\/AbstractText>.*/\1/p' \
Example

I get the output
10605436
Steroids
Steroids Abstracts
10605437
Hormone
Hormones Abstracts


How do I modify my sed command so that it prints the information I need on one line per record, i.e.
10605436 Steroids Steroids Abstracts
10605437 Hormone Hormones Abstracts
 
04-19-2011, 12:59 AM   #2
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,066
Blog Entries: 11

Hi, welcome to LQ!

And because I know awk better than sed ... ;}

Code:
awk '{payload=gensub(/[^>]+>([^<]+).*/, "\\1", "1")}/PMID|ArticleTitle/{printf "%s\t",payload}/AbstractText/{printf "%s\n",payload}' Example


Cheers,
Tink
 
04-19-2011, 01:51 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,253

Or maybe:
Code:
awk -F"[><]" '/PMID|ArticleTitle|AbstractText/{ORS=/AbstractText/?"\n":" ";print $3}' file
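For anyone less used to awk one-liners, the command above can be spelled out as below. This is only an expanded sketch of the same logic with comments added; the function name join_records and the inline sample lines are made up for illustration, and the sample omits the stray space before the first closing AbstractText tag.

```shell
# join_records is a hypothetical helper name; it reads the file given as $1,
# or standard input when no file is given.
join_records() {
  awk -F'[><]' '
    # With < and > as field separators, a line such as <PMID>123</PMID>
    # splits into "", "PMID", "123", "/PMID", "" so the value is field 3.
    /PMID|ArticleTitle|AbstractText/ {
      # AbstractText is the last wanted tag of a record, so end the line there;
      # after the other two tags, follow the value with a single space.
      ORS = ($0 ~ /AbstractText/) ? "\n" : " "
      print $3
    }
  ' "$@"
}

# Sample lines modelled on post #1.
joined=$(printf '%s\n' \
  '<PMID>10605436</PMID>' \
  '<Year>2000</Year>' \
  '<ArticleTitle>Steroids</ArticleTitle>' \
  '<MedlinePgn>255-60</MedlinePgn>' \
  '<AbstractText>Steroids Abstracts</AbstractText>' \
  '<PMID>10605437</PMID>' \
  '<Year>2001</Year>' \
  '<ArticleTitle>Hormone</ArticleTitle>' \
  '<MedlinePgn>123-34</MedlinePgn>' \
  '<AbstractText>Hormones Abstracts</AbstractText>' | join_records)
printf '%s\n' "$joined"
```

Note that this relies on every record ending with an AbstractText line, as in the sample; a record without one would run into the next record's output line.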
 
04-19-2011, 08:02 PM   #4
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 333

Using GNU sed.
The h and H commands build the output in the hold space.
The g command copies the contents of the hold space back into the pattern space.
s/\n/ /g replaces the newlines with spaces.
Code:
sed -n '/<PMID>/{s/.*>\(.*\)<.*/\1/;h}
/<ArticleTitle>/{s/.*>\(.*\)<.*/\1/;H}
/<AbstractText>/{s/.*>\(.*\)<.*/\1/;H;g;s/\n/ /g;p}' Example
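The hold-space steps described above can also be written one command per line with a comment on each stage. This is a sketch of the same script rather than a different method; the name extract_records and the scratch file are made up for illustration, and the sample data is the one from post #1 without the stray space before the first closing AbstractText tag.

```shell
# extract_records is a hypothetical helper name; $1 is the file to read.
extract_records() {
  sed -n '
# PMID line: strip the tags, then h overwrites the hold space (starts a new record)
/<PMID>/{
  s/.*>\(.*\)<.*/\1/
  h
}
# ArticleTitle line: strip the tags, then H appends a newline plus the title
/<ArticleTitle>/{
  s/.*>\(.*\)<.*/\1/
  H
}
# AbstractText line: append it as well, g copies the whole record back into
# the pattern space, the newlines become spaces, and p prints the result
/<AbstractText>/{
  s/.*>\(.*\)<.*/\1/
  H
  g
  s/\n/ /g
  p
}
' "$1"
}

# Scratch copy of the sample data.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<PMID>10605436</PMID>
<Year>2000</Year>
<ArticleTitle>Steroids</ArticleTitle>
<MedlinePgn>255-60</MedlinePgn>
<AbstractText>Steroids Abstracts</AbstractText>
<PMID>10605437</PMID>
<Year>2001</Year>
<ArticleTitle>Hormone</ArticleTitle>
<MedlinePgn>123-34</MedlinePgn>
<AbstractText>Hormones Abstracts</AbstractText>
EOF

result=$(extract_records "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Because h on the PMID line replaces whatever was in the hold space, each record starts clean even if an earlier record was missing its AbstractText line.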
 
04-20-2011, 12:13 AM   #5
pikcolo
LQ Newbie
 
Registered: Apr 2011
Posts: 2

Original Poster
Many thanks to all. It works perfectly!!
 
04-20-2011, 02:27 AM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,253

Please mark as SOLVED if you have a solution.
 
All times are GMT -5.