Using sed to extract data from XML tags and display the result on one line
I have an XML file similar to the one below. Suppose the file is named Example.

<PMID>10605436</PMID>
<Year>2000</Year>
<ArticleTitle>Steroids</ArticleTitle>
<MedlinePgn>255-60</MedlinePgn>
<AbstractText>Steroids Abstracts </AbstractText>
<PMID>10605437</PMID>
<Year>2001</Year>
<ArticleTitle>Hormone</ArticleTitle>
<MedlinePgn>123-34</MedlinePgn>
<AbstractText>Hormones Abstracts</AbstractText>

I used

sed -n -e 's/.*<PMID>\(.*\)<\/PMID>.*/\1/p' -e 's/.*<ArticleTitle>\(.*\)<\/ArticleTitle>.*/\1/p' -e 's/.*<AbstractText>\(.*\)<\/AbstractText>.*/\1/p' Example

and I get the output

10605436
Steroids
Steroids Abstracts
10605437
Hormone
Hormones Abstracts

How do I modify my sed command so that it prints the information I need on one line per record, i.e.

10605436 Steroids Steroids Abstracts
10605437 Hormone Hormones Abstracts
Hi, welcome to LQ!
And because I know awk better than sed ... ;}
Code:
awk '{payload=gensub(/[^>]+>([^<]+).*/, "\\1", "1")}/PMID|ArticleTitle/{printf "%s\t",payload}/AbstractText/{printf "%s\n",payload}' Example
Cheers, Tink
Or maybe:
Code:
awk -F"[><]" '/PMID|ArticleTitle|AbstractText/{ORS=/AbstractText/?"\n":" ";print $3}' file
Using GNU sed.
The h and H commands build the output in the hold space: h copies the pattern space into the hold space, and H appends the pattern space to it after a newline. The g command copies the contents of the hold space back into the pattern space. s/\n/ /g replaces the newlines with spaces.
Code:
sed -n '/<PMID>/{s/.*>\(.*\)<.*/\1/;h}
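A minimal sketch of a complete command along the lines described above, assuming GNU sed and one tag per line as in the question. Only the first sed line comes from the post; the ArticleTitle and AbstractText lines and the helper name join_records are assumptions added for illustration.

```shell
# Sketch only: assumes GNU sed and one XML tag per line.
# join_records is a hypothetical helper name, not something from the thread.
#   h       start a new record: overwrite the hold space with the PMID
#   H       append a newline plus the current value to the hold space
#   g       copy the finished record back into the pattern space
#   s/\n/ /g;p   join the collected values with spaces and print them
join_records() {
  sed -n '
    /<PMID>/{s/.*>\(.*\)<.*/\1/;h}
    /<ArticleTitle>/{s/.*>\(.*\)<.*/\1/;H}
    /<AbstractText>/{s/.*>\(.*\)<.*/\1/;H;g;s/\n/ /g;p}
  ' "$@"
}

# Feed it the sample data from the question (or call: join_records Example).
printf '%s\n' \
  '<PMID>10605436</PMID>' \
  '<Year>2000</Year>' \
  '<ArticleTitle>Steroids</ArticleTitle>' \
  '<MedlinePgn>255-60</MedlinePgn>' \
  '<AbstractText>Steroids Abstracts </AbstractText>' \
  '<PMID>10605437</PMID>' \
  '<Year>2001</Year>' \
  '<ArticleTitle>Hormone</ArticleTitle>' \
  '<MedlinePgn>123-34</MedlinePgn>' \
  '<AbstractText>Hormones Abstracts</AbstractText>' \
  | join_records
```

This should print one line per record. Any whitespace inside a tag is kept, so the first record ends with the trailing space that is present in its AbstractText.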
Many thanks to all. It works perfectly!!
Please mark as SOLVED if you have a solution.