LinuxQuestions.org


adamzuber 04-09-2012 09:31 PM

Parse XML tags with attribute with SED
 
Hi people, I've been searching all over the net for the correct regex/command, but can't find anything.

file.xml
Code:

<koko value="92029">
<arnab>5</arnab>
<kambing>3</kambing>
</koko>
<xoxo value="13245">
<kambing>2</kambing>
<kambing>3</kambing>
</xoxo>
<popo value="12345">
<kambing>2</kambing>
<kambing>3</kambing>
</popo>

I would like to extract only a specific tag, including its attribute. E.g.:
Code:

<xoxo value="13245">
<kambing>2</kambing>
<kambing>3</kambing>
</xoxo>

I tried egrep before with this regex, but nothing came up.
Code:

egrep '<xoxo[\s\S]*?/xoxo>' file.xml
Currently I'm working toward a one-line solution with something like sed or egrep. I heard awk could do it too, but its regex syntax is too difficult for me to understand. Any hint toward a solution is much appreciated.

Thanks,
Adam.
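
A note on why the egrep attempt prints nothing: egrep matches one line at a time, so no pattern can span from the opening `<xoxo ...>` line to the `</xoxo>` line, and POSIX extended regex supports neither `\s`/`\S` inside brackets nor the lazy `*?` quantifier. Below is a minimal sketch of one way to make the same regex work, assuming GNU grep built with PCRE support (`-P`). The temporary file is a stand-in based on the sample data in the question.

```shell
# Sample data based on the question, written to a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
<koko value="92029">
<arnab>5</arnab>
<kambing>3</kambing>
</koko>
<xoxo value="13245">
<kambing>2</kambing>
<kambing>3</kambing>
</xoxo>
<popo value="12345">
<kambing>2</kambing>
<kambing>3</kambing>
</popo>
EOF

# -P: Perl-compatible regex, so [\s\S] and the lazy *? behave as intended.
# -z: records end at NUL instead of newline, so a match may span lines.
# -o: print only the matched text; tr turns the trailing NUL into a newline.
block=$(grep -Pzo '<xoxo[\s\S]*?</xoxo>' "$file" | tr '\0' '\n')
printf '%s\n' "$block"
```

This treats the file as plain text, not as XML, so it does not care whether the document is well-formed, but it will also be fooled by nested `<xoxo>` elements.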

jhwilliams 04-09-2012 10:01 PM

Awk would be a reasonable next step, but I'd suggest the xpath command instead. It's a wrapper around Perl's XML parser.

It assumes, though, that your document is valid XML with a single root element. E.g.:

Code:

<?xml version="1.0"?>
<root>
  <koko value="92029">
    <arnab>5</arnab>
    <kambing>3</kambing>
  </koko>
  <xoxo value="13245">
    <kambing>2</kambing>
    <kambing>3</kambing>
  </xoxo>
  <popo value="12345">
    <kambing>2</kambing>
    <kambing>3</kambing>
  </popo>
</root>

Then:

Code:

jameson@yellow:~$ xpath -q -e '/root/xoxo' input.xml
<xoxo value="13245">
  <kambing>2</kambing>
  <kambing>3</kambing>
</xoxo>
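
For the awk route mentioned at the top of this reply, here is a minimal sketch using a range pattern. It needs no XML parser, so it does not depend on the file being valid. It assumes the opening and closing tags each sit on their own line, as in the sample; the scratch directory is hypothetical and the file name `input.xml` is reused from the example.

```shell
# Recreate the sample document in a scratch directory.
dir=$(mktemp -d)
cat > "$dir/input.xml" <<'EOF'
<?xml version="1.0"?>
<root>
  <koko value="92029">
    <arnab>5</arnab>
    <kambing>3</kambing>
  </koko>
  <xoxo value="13245">
    <kambing>2</kambing>
    <kambing>3</kambing>
  </xoxo>
  <popo value="12345">
    <kambing>2</kambing>
    <kambing>3</kambing>
  </popo>
</root>
EOF

# A range pattern with no action prints every line from one matching
# /<xoxo/ through the next line matching /<\/xoxo>/, inclusive.
awk_out=$(awk '/<xoxo/,/<\/xoxo>/' "$dir/input.xml")
printf '%s\n' "$awk_out"
```

Unlike xpath, this keeps the lines exactly as they appear in the file, indentation included.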


adamzuber 04-09-2012 10:22 PM

Hi jhwilliams, thanks for the reply. I managed to get what I want with xpath, but what if it is not a valid XML file? I ask because I have several data files that I have to merge.

Example:
Code:

cat file1.xml file2.xml > file3.xml
Assuming file1.xml and file2.xml are each valid XML files, the merged file3.xml is not valid, because it now has more than one root element. I tested it with 'xpath', but it breaks because of the invalid XML file and path. Any workaround?
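
One possible workaround, sketched under the assumption that each input file is fine on its own and that any XML declaration sits alone on its own line: drop the per-file declarations and wrap the concatenation in a single root element, so a parser sees one document again. The element name `merged` and the sample file contents are made up for illustration.

```shell
# Two small stand-in input files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' '<?xml version="1.0"?>' '<koko value="92029">' \
  '<arnab>5</arnab>' '</koko>' > "$dir/file1.xml"
printf '%s\n' '<?xml version="1.0"?>' '<xoxo value="13245">' \
  '<kambing>2</kambing>' '</xoxo>' > "$dir/file2.xml"

{
  echo '<?xml version="1.0"?>'
  echo '<merged>'
  # Delete each file's own declaration line; keep everything else.
  sed '/<?xml/d' "$dir/file1.xml" "$dir/file2.xml"
  echo '</merged>'
} > "$dir/file3.xml"

cat "$dir/file3.xml"
```

With the wrapper in place, the earlier query would become something like xpath -q -e '/merged/xoxo' file3.xml.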

adamzuber 04-09-2012 10:36 PM

Solved using sed.

Code:

sed '/<xoxo/,/<\/xoxo>/!d' notvalidxml.xml
Many thanks,
Adam.
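
For anyone landing here later, a short sketch of what that command does: `/<xoxo/,/<\/xoxo>/` selects the lines from an opening tag through the next closing tag, `!` inverts the selection, and `d` deletes, so only the range survives. The more common spelling is `-n` with `p`, which prints the same lines. Like the original, it assumes the tags sit on separate lines and are not nested; the temporary file below is a stand-in for notvalidxml.xml.

```shell
# Stand-in for notvalidxml.xml: several top-level elements, no single root.
file=$(mktemp)
cat > "$file" <<'EOF'
<koko value="92029">
<arnab>5</arnab>
</koko>
<xoxo value="13245">
<kambing>2</kambing>
<kambing>3</kambing>
</xoxo>
<popo value="12345">
<kambing>2</kambing>
</popo>
EOF

# Delete every line that is NOT inside the <xoxo ...> to </xoxo> range.
deleted=$(sed '/<xoxo/,/<\/xoxo>/!d' "$file")

# Same lines: suppress default output (-n) and print (p) only the range.
printed=$(sed -n '/<xoxo/,/<\/xoxo>/p' "$file")

printf '%s\n' "$printed"
```

If the file contains several `<xoxo>` blocks, both forms print all of them, one after another.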

firstfire 04-09-2012 10:37 PM

Removed


All times are GMT -5.