LinuxQuestions.org

pikcolo 04-18-2011 10:42 PM

Using sed to extract data from XML tags and display the result on one line
 
I have an XML file similar to this one.
Suppose the file is named Example.

<PMID>10605436</PMID>
<Year>2000</Year>
<ArticleTitle>Steroids</ArticleTitle>
<MedlinePgn>255-60</MedlinePgn>
<AbstractText>Steroids Abstracts </AbstractText>
<PMID>10605437</PMID>
<Year>2001</Year>
<ArticleTitle>Hormone</ArticleTitle>
<MedlinePgn>123-34</MedlinePgn>
<AbstractText>Hormones Abstracts</AbstractText>

I used:
sed -n -e 's/.*<PMID>\(.*\)<\/PMID>.*/\1/p' \
       -e 's/.*<ArticleTitle>\(.*\)<\/ArticleTitle>.*/\1/p' \
       -e 's/.*<AbstractText>\(.*\)<\/AbstractText>.*/\1/p' \
       Example

and got this output:
10605436
Steroids
Steroids Abstracts
10605437
Hormone
Hormones Abstracts


How can I modify my sed command so that it prints the fields of each record on one line, i.e.:
10605436 Steroids Steroids Abstracts
10605437 Hormone Hormones Abstracts
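A minimal sketch of one more option, not raised in the thread: keep the original sed command unchanged and pipe its output through paste, which joins every three consecutive lines into one. It assumes every record has exactly one PMID, ArticleTitle and AbstractText element, always in that order. The sample data is held in a hypothetical shell variable `xml` instead of the file Example.

```shell
#!/bin/sh
# Sample input copied from the question (stand-in for the file Example).
xml='<PMID>10605436</PMID>
<Year>2000</Year>
<ArticleTitle>Steroids</ArticleTitle>
<MedlinePgn>255-60</MedlinePgn>
<AbstractText>Steroids Abstracts </AbstractText>
<PMID>10605437</PMID>
<Year>2001</Year>
<ArticleTitle>Hormone</ArticleTitle>
<MedlinePgn>123-34</MedlinePgn>
<AbstractText>Hormones Abstracts</AbstractText>'

# The original sed command prints one field per line; paste then reads
# standard input three times per output line ("- - -") and joins the
# three fields with a single space (-d ' ').
result=$(printf '%s\n' "$xml" |
  sed -n -e 's/.*<PMID>\(.*\)<\/PMID>.*/\1/p' \
         -e 's/.*<ArticleTitle>\(.*\)<\/ArticleTitle>.*/\1/p' \
         -e 's/.*<AbstractText>\(.*\)<\/AbstractText>.*/\1/p' |
  paste -d ' ' - - -)
printf '%s\n' "$result"
```

Because paste only counts lines, a record that lacks one of the three elements would misalign every record after it.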

Tinkster 04-18-2011 11:59 PM

Hi, welcome to LQ!

And because I know awk better than sed ... ;}

Code:

awk '{payload=gensub(/[^>]+>([^<]+).*/, "\\1", "1")}/PMID|ArticleTitle/{printf "%s\t",payload}/AbstractText/{printf "%s\n",payload}' Example
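gensub() is a GNU awk (gawk) extension, so the one-liner above needs gawk. A minimal sketch of the same idea using only the POSIX sub() function, assuming (as in the sample) that each element sits on its own line; the variable `xml` is a hypothetical stand-in for the file Example.

```shell
#!/bin/sh
# Sample input copied from the question (stand-in for the file Example).
xml='<PMID>10605436</PMID>
<Year>2000</Year>
<ArticleTitle>Steroids</ArticleTitle>
<MedlinePgn>255-60</MedlinePgn>
<AbstractText>Steroids Abstracts </AbstractText>
<PMID>10605437</PMID>
<Year>2001</Year>
<ArticleTitle>Hormone</ArticleTitle>
<MedlinePgn>123-34</MedlinePgn>
<AbstractText>Hormones Abstracts</AbstractText>'

result=$(printf '%s\n' "$xml" | awk '
/<(PMID|ArticleTitle|AbstractText)>/ {
    payload = $0
    sub(/^[^>]*>/, "", payload)   # strip everything up to the end of the opening tag
    sub(/<.*$/, "", payload)      # strip the closing tag and anything after it
}
/<(PMID|ArticleTitle)>/ { printf "%s\t", payload }   # tab after the first two fields
/<AbstractText>/        { printf "%s\n", payload }   # newline ends the record
')
printf '%s\n' "$result"
```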


Cheers,
Tink

grail 04-19-2011 12:51 AM

Or maybe:
Code:

awk -F"[><]" '/PMID|ArticleTitle|AbstractText/{ORS=/AbstractText/?"\n":" ";print $3}' file
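To see why `$3` holds the value in this version, here is a small sketch that prints every field awk produces when one sample line is split with -F'[><]' (the `fields` variable is only for illustration):

```shell
#!/bin/sh
# Split one element on "<" or ">" and print each resulting field in brackets.
fields=$(echo '<PMID>10605436</PMID>' |
  awk -F'[><]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }')
printf '%s\n' "$fields"
# $1=[]
# $2=[PMID]
# $3=[10605436]
# $4=[/PMID]
# $5=[]
```

Splitting on both < and > leaves an empty field before the first <, the tag name in $2, the text in $3 and the closing tag in $4, so $3 is the payload for every element.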

Kenhelm 04-19-2011 07:02 PM

Using GNU sed.
The h command copies the extracted PMID into the hold space, and each H command appends the next field to it, preceded by a newline.
The g command copies the contents of the hold space back into the pattern space.
s/\n/ /g then replaces those newlines with spaces.
Code:

sed -n '/<PMID>/{s/.*>\(.*\)<.*/\1/;h}
/<ArticleTitle>/{s/.*>\(.*\)<.*/\1/;H}
/<AbstractText>/{s/.*>\(.*\)<.*/\1/;H;g;s/\n/ /g;p}' Example
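The one-liner above relies on GNU sed accepting `}` directly after a command. A minimal sketch of the same hold-space technique with one command per line, which should also work with sed implementations that want a newline or `;` before `}` (BSD sed, for example); `xml` is a hypothetical stand-in for the file Example.

```shell
#!/bin/sh
# Sample input copied from the question (stand-in for the file Example).
xml='<PMID>10605436</PMID>
<Year>2000</Year>
<ArticleTitle>Steroids</ArticleTitle>
<MedlinePgn>255-60</MedlinePgn>
<AbstractText>Steroids Abstracts </AbstractText>
<PMID>10605437</PMID>
<Year>2001</Year>
<ArticleTitle>Hormone</ArticleTitle>
<MedlinePgn>123-34</MedlinePgn>
<AbstractText>Hormones Abstracts</AbstractText>'

# h starts a new record in the hold space, H appends "\n<field>",
# g copies the record back, s/\n/ /g joins it, p prints it.
result=$(printf '%s\n' "$xml" | sed -n '
/<PMID>/{
s/.*>\(.*\)<.*/\1/
h
}
/<ArticleTitle>/{
s/.*>\(.*\)<.*/\1/
H
}
/<AbstractText>/{
s/.*>\(.*\)<.*/\1/
H
g
s/\n/ /g
p
}
')
printf '%s\n' "$result"
```

Since h overwrites the hold space at every PMID, each record starts fresh even if the previous one was incomplete.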


pikcolo 04-19-2011 11:13 PM

Many thanks to all. It works perfectly!!

grail 04-20-2011 01:27 AM

Please mark as SOLVED if you have a solution.


All times are GMT -5.