LinuxQuestions.org - [SOLVED] xml parsing using sed?

Hey guys,

I have a huge xml file like this...

Code:

<manufacturers>

<manufacturer_data>
<action>UPDATE</action>
<mfr_id>6515951</mfr_id>
<local_content>0</local_content>
<name>Johnsonville Sausage, Llc</name>
</manufacturer_data>

<manufacturer_data>
<action>INSERT</action>
<mfr_id>6594084</mfr_id>
<local_content>0</local_content>
<name>Foodmark</name>
</manufacturer_data>

</manufacturers>

<brands>

<brand_data>
<action>INSERT</action>
<brand_id>6594088</brand_id>
<mfr_id>6594084</mfr_id>
<local_content>0</local_content>
<name>Good Food Made Simple</name>
</brand_data>

<brand_data>
<action>INSERT</action>
<brand_id>6523125</brand_id>
<mfr_id>105873</mfr_id>
<local_content>0</local_content>
<name>Hawaiian(Tm) Kettle Style Potato Chips</name>
</brand_data>
<brand_data>
</brands>

Yesterday I asked for assistance to extract mfr_id from the list and I used

Code:

grep mfr_id | sed -rn 's@</?mfr_id>@@gp'

to extract the IDs, which I then sorted and de-duplicated for my actual analysis.

Today, I am looking to extract <mfr_id> and <name> from <manufacturer_data> only.

The issue I am having:
- sed is extracting all instances of <name>, including the ones inside <brand_data>.

So I need to:
- tell sed to "hold" the data between the <manufacturer_data> tags, strip the <mfr_id> and <name> tags, and print the values in columns.

This is a little above my league. Can someone help me out?

Quote:

Originally Posted by bcrawl (Post 4232084)

Yesterday I asked for assistance to extract mfr_id from the list and I used

Code:

grep mfr_id | sed -rn 's@</?mfr_id>@@gp'

I'm sure this can be done w/ sed, but I'd use awk for this one (gawk, to be precise, since gensub is a GNU extension):

Code:

awk '/<manufacturers>/,/<\/manufacturers>/{if($0~/<name>/){print gensub(/.*>([^<]+)<.*/,"\\1","1")}}' hooga.xml

Johnsonville Sausage, Llc
Foodmark
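
If both <mfr_id> and <name> are wanted side by side, something along these lines might do it. This is only a sketch: it assumes each tag sits on its own line as in the sample, builds a small temp file with a trimmed copy of the data from the first post, and uses made-up variable names (inside, id, name). It sticks to plain sub/gsub, so it should not need gawk.

```shell
# Trimmed copy of the sample data from the first post, written to a temp file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<manufacturers>
<manufacturer_data>
<action>UPDATE</action>
<mfr_id>6515951</mfr_id>
<name>Johnsonville Sausage, Llc</name>
</manufacturer_data>
<manufacturer_data>
<action>INSERT</action>
<mfr_id>6594084</mfr_id>
<name>Foodmark</name>
</manufacturer_data>
</manufacturers>
<brands>
<brand_data>
<brand_id>6594088</brand_id>
<mfr_id>6594084</mfr_id>
<name>Good Food Made Simple</name>
</brand_data>
</brands>
EOF

# Remember id and name while inside a manufacturer_data record,
# then print them tab-separated when the record closes.
result=$(awk '
  /<manufacturer_data>/   { inside = 1; id = ""; name = "" }
  inside && /<mfr_id>/    { id = $0;   gsub(/^[ \t]*<mfr_id>|<\/mfr_id>[ \t]*$/, "", id) }
  inside && /<name>/      { name = $0; gsub(/^[ \t]*<name>|<\/name>[ \t]*$/, "", name) }
  /<\/manufacturer_data>/ { if (inside) print id "\t" name; inside = 0 }
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

The <brand_data> records are skipped because the inside flag is only set between the <manufacturer_data> tags.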

Btw, the grep statement in your solution above was superfluous: sed -n with the p flag only prints the lines where the substitution matched.

Cheers,
Tink

The sed looks kinda the same:

Code:

sed -rn '/<manufacturers>/,/<\/manufacturers>/s@</?name>@@pg' file
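
For the "hold" idea from the first post, here is a rough sketch with sed's hold space that prints the ID and the name in two columns. It assumes GNU sed (for -r and for \t in the replacement), that <mfr_id> comes before <name> inside each record, and that every tag is on its own line. The temp file is just a trimmed copy of the sample data.

```shell
# Trimmed copy of the sample data from the first post, written to a temp file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<manufacturers>
<manufacturer_data>
<action>UPDATE</action>
<mfr_id>6515951</mfr_id>
<name>Johnsonville Sausage, Llc</name>
</manufacturer_data>
<manufacturer_data>
<action>INSERT</action>
<mfr_id>6594084</mfr_id>
<name>Foodmark</name>
</manufacturer_data>
</manufacturers>
<brands>
<brand_data>
<brand_id>6594088</brand_id>
<mfr_id>6594084</mfr_id>
<name>Good Food Made Simple</name>
</brand_data>
</brands>
EOF

# Within each manufacturer_data record:
#   mfr_id line: strip the tags, copy the ID into the hold space (h)
#   name line:   strip the tags, append to the hold space (H), swap it in (x),
#                turn the joining newline into a tab, and print
result=$(sed -rn '
  /<manufacturer_data>/,/<\/manufacturer_data>/ {
    /<mfr_id>/ { s@[[:space:]]*</?mfr_id>[[:space:]]*@@g; h }
    /<name>/   { s@[[:space:]]*</?name>[[:space:]]*@@g; H; x; s/\n/\t/; p }
  }
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Restricting the range to <manufacturer_data> rather than <manufacturers> makes no difference for this sample, but it keeps each ID/name pair tied to a single record.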

Thanks guys, both commands worked. I thought I had replied to this thread, but when I was cross-checking it I realized my response never got posted. I deeply apologize. I used the awk example in this case.