LinuxQuestions.org
Old 01-20-2011, 02:53 PM   #1
bcrawl
LQ Newbie
 
Registered: Jan 2011
Posts: 11

Rep: Reputation: 0
xml parsing using sed?


Hey guys,

I have a huge xml file like this...
Code:
<manufacturers>

<manufacturer_data>
<action>UPDATE</action>
<mfr_id>6515951</mfr_id>
<local_content>0</local_content>
<name>Johnsonville Sausage, Llc</name>
</manufacturer_data>

<manufacturer_data>
<action>INSERT</action>
<mfr_id>6594084</mfr_id>
<local_content>0</local_content>
<name>Foodmark</name>
</manufacturer_data>

</manufacturers>

<brands>

<brand_data>
<action>INSERT</action>
<brand_id>6594088</brand_id>
<mfr_id>6594084</mfr_id>
<local_content>0</local_content>
<name>Good Food Made Simple</name>
</brand_data>

<brand_data>
<action>INSERT</action>
<brand_id>6523125</brand_id>
<mfr_id>105873</mfr_id>
<local_content>0</local_content>
<name>Hawaiian(Tm) Kettle Style Potato Chips</name>
</brand_data>
<brand_data>
</brands>
Yesterday I asked for assistance to extract mfr_id from the list and I used
Code:
grep mfr_id | sed -rn 's@</?mfr_id>@@gp'
to extract the IDs, which I then sorted and de-duplicated for my actual analysis.

Today, I want to extract <mfr_id> and <name>, but only from the <manufacturer_data> records.

The issue I am having:
- sed is extracting all instances of <name>, including the ones under <brand_data>

So I need to
- tell sed to "hold" the data between the <manufacturer_data> tags, strip the <mfr_id> and <name> tags from it, and print the values in columns.

This is a little above my league. Can someone help me out?
 
Old 01-20-2011, 06:57 PM   #2
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 910
I'm sure this can be done w/ sed, but I'd use awk for this one:
Code:
awk '/<manufacturers>/,/<\/manufacturers>/{if($0~/<name>/){print gensub(/.*>([^<]+)<.*/,"\\1","1")}}' hooga.xml
Johnsonville Sausage, Llc
Foodmark
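The one-liner above relies on gensub, which is a gawk extension, and it prints only the names. Since the question asked for mfr_id and name side by side, a minimal sketch in the same spirit might look like the following. It assumes one element per line with no indentation, and that <mfr_id> comes before <name> in each record, as in the posted sample. The function name mfr_columns and the temporary file are made up for illustration.

```shell
# Trimmed copy of the sample data from the thread, written to a temp file.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<manufacturers>
<manufacturer_data>
<mfr_id>6515951</mfr_id>
<name>Johnsonville Sausage, Llc</name>
</manufacturer_data>
<manufacturer_data>
<mfr_id>6594084</mfr_id>
<name>Foodmark</name>
</manufacturer_data>
</manufacturers>
<brands>
<brand_data>
<brand_id>6594088</brand_id>
<mfr_id>6594084</mfr_id>
<name>Good Food Made Simple</name>
</brand_data>
</brands>
EOF

# Hypothetical helper: print "mfr_id<TAB>name" for each manufacturer record.
mfr_columns() {
  awk '
    # Only look at lines between <manufacturers> and </manufacturers>.
    /<manufacturers>/,/<\/manufacturers>/ {
      # Remember the id once its tags are stripped.
      if ($0 ~ /<mfr_id>/) { gsub(/<\/?mfr_id>/, ""); id = $0 }
      # On the name line, strip the tags and print both columns.
      if ($0 ~ /<name>/)   { gsub(/<\/?name>/, "");   print id "\t" $0 }
    }
  ' "$1"
}

mfr_columns "$xml"
```

This sticks to gsub, so it should also run under awk implementations other than gawk. Names from the <brands> section are skipped because the range pattern stops at </manufacturers>.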
Btw, the grep in your solution above was superfluous: sed -n with the p flag already prints only the lines where the substitution matched.
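A quick way to check that claim is to run the command with and without grep and compare the results. The sketch below does that on a trimmed copy of the sample. The sort -u step is an assumption about how the IDs were sorted and de-duplicated, since the thread does not show that part, and the variable names are illustrative.

```shell
# Trimmed copy of the sample data from the thread, written to a temp file.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<manufacturers>
<manufacturer_data>
<mfr_id>6515951</mfr_id>
<name>Johnsonville Sausage, Llc</name>
</manufacturer_data>
<manufacturer_data>
<mfr_id>6594084</mfr_id>
<name>Foodmark</name>
</manufacturer_data>
</manufacturers>
<brands>
<brand_data>
<brand_id>6594088</brand_id>
<mfr_id>6594084</mfr_id>
<name>Good Food Made Simple</name>
</brand_data>
<brand_data>
<brand_id>6523125</brand_id>
<mfr_id>105873</mfr_id>
<name>Hawaiian(Tm) Kettle Style Potato Chips</name>
</brand_data>
</brands>
EOF

# Original form: grep pre-filters the lines, then sed strips the tags.
with_grep=$(grep mfr_id "$xml" | sed -rn 's@</?mfr_id>@@gp')

# sed alone: -n suppresses default output, and the p flag prints a line
# only when the substitution changed it, so no pre-filter is needed.
sed_only=$(sed -rn 's@</?mfr_id>@@gp' "$xml")

[ "$with_grep" = "$sed_only" ] && echo "same output with and without grep"

# Assumed follow-up step: sort the IDs and drop duplicates.
unique_ids=$(printf '%s\n' "$sed_only" | sort -u)
printf '%s\n' "$unique_ids"
```

The -r option is GNU sed syntax, as in the commands posted in this thread.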


Cheers,
Tink

Last edited by Tinkster; 01-20-2011 at 07:04 PM. Reason: [italics]
 
Old 01-21-2011, 01:34 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,508

Rep: Reputation: 2890
The sed looks kinda the same:
Code:
sed -rn '/<manufacturers>/,/<\/manufacturers>/s@</?name>@@pg' file
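Like the awk version, this prints only the names. If the IDs are wanted in a second column, the "hold" idea from the original question can be sketched with sed's hold space roughly as follows. It assumes GNU sed (for -r and for \t in the replacement), one element per line, and <mfr_id> appearing before <name> in every record. The function name mfr_columns_sed and the temporary file are made up for illustration.

```shell
# Trimmed copy of the sample data from the thread, written to a temp file.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<manufacturers>
<manufacturer_data>
<mfr_id>6515951</mfr_id>
<name>Johnsonville Sausage, Llc</name>
</manufacturer_data>
<manufacturer_data>
<mfr_id>6594084</mfr_id>
<name>Foodmark</name>
</manufacturer_data>
</manufacturers>
<brands>
<brand_data>
<brand_id>6594088</brand_id>
<mfr_id>6594084</mfr_id>
<name>Good Food Made Simple</name>
</brand_data>
</brands>
EOF

# Hypothetical helper. Inside the <manufacturers> range:
#   on the mfr_id line: strip the tags, copy the id to the hold space (h)
#   on the name line:   strip the tags, append the name to the hold space (H),
#                       swap it into the pattern space (x), turn the joining
#                       newline into a tab, and print the result (p)
mfr_columns_sed() {
  sed -rn '/<manufacturers>/,/<\/manufacturers>/{
    /<mfr_id>/{
      s@</?mfr_id>@@g
      h
    }
    /<name>/{
      s@</?name>@@g
      H
      x
      s/\n/\t/
      p
    }
  }' "$1"
}

mfr_columns_sed "$xml"
```

Each new mfr_id line overwrites the hold space, so leftovers from the previous record do not leak into the next one.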
 
Old 01-24-2011, 01:57 PM   #4
bcrawl
LQ Newbie
 
Registered: Jan 2011
Posts: 11

Original Poster
Rep: Reputation: 0
Thanks guys, both commands worked. I thought I had replied to this thread, but when I was cross-checking it just now I realized my response never got posted. I deeply apologize. I used the awk example in this case.
 
  

