LinuxQuestions.org
01-20-2011, 03:53 PM   #1
bcrawl
LQ Newbie
 
Registered: Jan 2011
Posts: 11

Rep: Reputation: 0
xml parsing using sed?


Hey guys,

I have a huge xml file like this...
Code:
<manufacturers>

<manufacturer_data>
<action>UPDATE</action>
<mfr_id>6515951</mfr_id>
<local_content>0</local_content>
<name>Johnsonville Sausage, Llc</name>
</manufacturer_data>

<manufacturer_data>
<action>INSERT</action>
<mfr_id>6594084</mfr_id>
<local_content>0</local_content>
<name>Foodmark</name>
</manufacturer_data>

</manufacturers>

<brands>

<brand_data>
<action>INSERT</action>
<brand_id>6594088</brand_id>
<mfr_id>6594084</mfr_id>
<local_content>0</local_content>
<name>Good Food Made Simple</name>
</brand_data>

<brand_data>
<action>INSERT</action>
<brand_id>6523125</brand_id>
<mfr_id>105873</mfr_id>
<local_content>0</local_content>
<name>Hawaiian(Tm) Kettle Style Potato Chips</name>
</brand_data>
<brand_data>
</brands>
Yesterday I asked for help extracting mfr_id from this list, and I used
Code:
grep mfr_id | sed -rn 's@</?mfr_id>@@gp'
to extract the IDs, which I then sorted and de-duplicated for my actual analysis.

Today, I am looking to extract <mfr_id> and <name> from <manufacturer_data> only.

The issue I am having:
- sed is extracting all instances of <name>, including the ones under <brand_data>.

So I need to
- tell sed to "hold" the data between the <manufacturer_data> tags, do a pattern search to strip the <mfr_id> and <name> tags, and print the values in columns.

This is a little above my league. Can someone help me out?
 
01-20-2011, 07:57 PM   #2
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,066
Blog Entries: 11

Rep: Reputation: 910
I'm sure this can be done w/ sed, but I'd use awk for this one:
Code:
awk '/<manufacturers>/,/<\/manufacturers>/{if($0~/<name>/){print gensub(/.*>([^<]+)<.*/,"\\1","1")}}' hooga.xml
Johnsonville Sausage, Llc
Foodmark
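The one-liner above prints only the names, and gensub is a gawk extension. If the IDs are wanted next to the names, a minimal sketch along the same lines could look like the following. It assumes each element sits on its own line, as in the sample; the file name sample.xml and the helper name mfr_columns are made up for illustration.

```shell
# Hypothetical sample file, cut down from the data in the first post.
cat > sample.xml <<'EOF'
<manufacturers>
<manufacturer_data>
<action>UPDATE</action>
<mfr_id>6515951</mfr_id>
<local_content>0</local_content>
<name>Johnsonville Sausage, Llc</name>
</manufacturer_data>
<manufacturer_data>
<action>INSERT</action>
<mfr_id>6594084</mfr_id>
<local_content>0</local_content>
<name>Foodmark</name>
</manufacturer_data>
</manufacturers>
<brands>
<brand_data>
<action>INSERT</action>
<brand_id>6594088</brand_id>
<mfr_id>6594084</mfr_id>
<local_content>0</local_content>
<name>Good Food Made Simple</name>
</brand_data>
</brands>
EOF

# mfr_columns is a made-up helper name, not something from the thread.
mfr_columns() {
  awk '
    # Return the text between the tags of a one-element line.
    function value(line) {
      sub(/^[^>]*>/, "", line)   # drop everything up to the first ">"
      sub(/<.*$/, "", line)      # drop everything from the next "<" on
      return line
    }
    /<manufacturer_data>/   { inside = 1; id = ""; name = "" }
    inside && /<mfr_id>/    { id = value($0) }
    inside && /<name>/      { name = value($0) }
    /<\/manufacturer_data>/ { if (inside) print id "\t" name; inside = 0 }
  ' "$1"
}

mfr_columns sample.xml
```

Because the flag is only set between the manufacturer_data tags, the mfr_id and name lines under brand_data are skipped. The two values come out separated by a tab, one manufacturer per line.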
Btw, the grep statement in your solution above was superfluous.
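To illustrate why the grep is not needed, here is a small sketch comparing the two forms on a cut-down sample. The file name ids.xml is made up, and -r assumes GNU sed, as in the original command.

```shell
# Hypothetical cut-down sample; ids.xml is a made-up file name.
cat > ids.xml <<'EOF'
<manufacturer_data>
<action>UPDATE</action>
<mfr_id>6515951</mfr_id>
<local_content>0</local_content>
<name>Johnsonville Sausage, Llc</name>
</manufacturer_data>
EOF

# Original pipeline: grep pre-filters the lines, then sed strips the tags.
with_grep=$(grep mfr_id ids.xml | sed -rn 's@</?mfr_id>@@gp')

# sed on its own: -n turns off automatic printing, and the p flag prints
# only the lines where the substitution matched, so no pre-filter is needed.
sed_only=$(sed -rn 's@</?mfr_id>@@gp' ids.xml)

printf '%s\n' "$with_grep" "$sed_only"
```

Both variables should end up holding the same bare ID.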


Cheers,
Tink

Last edited by Tinkster; 01-20-2011 at 08:04 PM. Reason: [italics]
 
01-21-2011, 02:34 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,252

Rep: Reputation: 2685
The sed looks kinda the same:
Code:
sed -rn '/<manufacturers>/,/<\/manufacturers>/s@</?name>@@pg' file
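The command above prints the names only. Since the original question asked about "holding" data, here is a rough sketch of how sed's hold space could pair each mfr_id with its name. It assumes GNU sed (for -r and for \t in the replacement), one element per line, and that mfr_id comes before name inside each block. The file name mfr.xml and the helper name mfr_columns_sed are made up for illustration.

```shell
# Hypothetical sample file, cut down from the data in the first post.
cat > mfr.xml <<'EOF'
<manufacturers>
<manufacturer_data>
<action>UPDATE</action>
<mfr_id>6515951</mfr_id>
<local_content>0</local_content>
<name>Johnsonville Sausage, Llc</name>
</manufacturer_data>
<manufacturer_data>
<action>INSERT</action>
<mfr_id>6594084</mfr_id>
<local_content>0</local_content>
<name>Foodmark</name>
</manufacturer_data>
</manufacturers>
<brands>
<brand_data>
<action>INSERT</action>
<brand_id>6594088</brand_id>
<mfr_id>6594084</mfr_id>
<local_content>0</local_content>
<name>Good Food Made Simple</name>
</brand_data>
</brands>
EOF

# mfr_columns_sed is a made-up helper name.
# Inside each manufacturer_data block:
#   mfr_id line: strip the tags, then h copies the bare ID into the hold space.
#   name line:   strip the tags, H appends "newline + name" to the hold space,
#                x swaps it back into the pattern space, the newline becomes
#                a tab, and p prints the finished row.
mfr_columns_sed() {
  sed -rn '/<manufacturer_data>/,/<\/manufacturer_data>/{
    /<mfr_id>/{ s@</?mfr_id>@@g; h; }
    /<name>/{ s@</?name>@@g; H; x; s/\n/\t/; p; }
  }' "$1"
}

mfr_columns_sed mfr.xml
```

Narrowing the address range to manufacturer_data keeps the brand_data entries out of the result.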
 
01-24-2011, 02:57 PM   #4
bcrawl
LQ Newbie
 
Registered: Jan 2011
Posts: 11

Original Poster
Rep: Reputation: 0
Thanks guys, both commands worked. I thought I had replied to this thread, but when I cross-checked it just now I realized my response never got posted. I deeply apologize. I used the awk example in this case.
 
  


All times are GMT -5.
