LinuxQuestions.org

////// 02-11-2007 12:43 AM

[Grep,Awk,Sed]Parsing text between XML tags.
 
Hello, I have a little problem with my bash script: I need to get all the text between the XML section tags below (highlighted in blue in the original post) with awk, sed or grep.

Code:

<com_section>
<com_create_instance inprocserver32="&#x25;SystemRoot&#x25;\system32\shdocvw.dll" interfaceid="&#x7B;000214E6-0000-0000-C000-000000000046&#x7D;"/>
<com_get_class_object inprocserver32="C:\WINDOWS\system32\urlmon.dll" interfaceid="&#x7B;00000001-0000-0000-C000-000000000046&#x7D;"/>
</com_section>

<dll_handling_section>
<load_dll dll="c:\910ac0d71833d902c1a824c0335761eb.exe" successful="1"/>
<load_dll dll="C:\WINDOWS\system32\ntdll.dll" successful="1"/>
<load_dll dll="C:\WINDOWS\system32\kernel32.dll" successful="1"/>
<load_dll dll="VERSION.dll" successful="1"/>
<snip>1-50 lines removed to make it easier to read.</snip>
</dll_handling_section>

<filesystem_section>
<delete_file filetype="File" srcfile="C:\WINDOWS\system32\algs.exe" desiredaccess="FILE_ANY_ACCESS" flags="SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<copy_file filetype="File" srcfile="c:\910ac0d71833d902c1a824c0335761eb.exe" dstfile="C:\WINDOWS\system32\algs.exe" creationdistribution="CREATE_ALWAYS" desiredaccess="FILE_ANY_ACCESS" flags="SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<set_file_attributes filetype="File" srcfile="C:\WINDOWS\system32\algs.exe" desiredaccess="FILE_ANY_ACCESS" flags="FILE_ATTRIBUTE_HIDDEN,SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<open_file filetype="File" srcfile="C:\WINDOWS\system32\algs.exe" creationdistribution="OPEN_EXISTING" desiredaccess="FILE_ANY_ACCESS" shareaccess="SHARE_WRITE" flags="FILE_ATTRIBUTE_NORMAL,SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<set_file_time filetype="File" srcfile="C:\WINDOWS\system32\algs.exe" desiredaccess="FILE_ANY_ACCESS" flags="SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<delete_file filetype="File" srcfile="bonbw.bat" desiredaccess="FILE_ANY_ACCESS" flags="SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<snip>1-50 lines removed to make it easier to read.</snip>
</filesystem_section>

<mutex_section>
<create_mutex name="dcf7d2f7071938ba83b50c70eedd5ceb8984" owned="0"/>
<create_mutex name="CTF.LBES.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.Compart.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.Asm.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.Layouts.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.TMD.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.TimListCache.FMPDefaultS-1-5-21-1645522239-706699826-839522115-1003MUTEX.DefaultS-1-5-21-16455222" owned="0"/>
<create_mutex name="ZonesCounterMutex" owned="0"/>
<create_mutex name="ZonesCacheCounterMutex" owned="0"/>
<create_mutex name="ZonesLockedCacheCounterMutex" owned="0"/>
</mutex_section>

I have tried several different approaches, but I can only get the text between tags if those tags are on the same line, like this:
Code:

<com_section>foo</com_section>
But if those tags are on different lines, I do not know how to make it work:
Code:

<com_section>
foo
</com_section>

Any hints?

TIA,

///////

////// 02-11-2007 01:24 AM

Ooops.
Sorry, I found a solution for this one :o
Code:

sed -n '/<com_section>/,/<\/com_section>/p'
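
For anyone who wants to try the range address above, here is a minimal sketch run against a tiny made-up sample. The function names (sample, section_with_tags, section_body) are only for illustration, and the second variant, which drops the tag lines themselves, is an addition and not part of the original post. It assumes the tags sit on lines of their own.

```shell
#!/bin/sh
# Hypothetical sample input: two sections, tags on their own lines.
sample() {
    printf '%s\n' '<com_section>' 'foo' '</com_section>' \
                  '<mutex_section>' 'bar' '</mutex_section>'
}

# The command from the post: print every line from the opening tag
# through the closing tag, tag lines included.
section_with_tags() {
    sed -n '/<com_section>/,/<\/com_section>/p'
}

# Variation (assumption): inside the same range, skip any line that
# contains the opening or closing tag and print the rest.
section_body() {
    sed -n '/<com_section>/,/<\/com_section>/{/<\/*com_section>/!p;}'
}

sample | section_with_tags
sample | section_body
```

The first call prints the three com_section lines, the second prints only foo. If the opening and closing tags share a line, the range starts on that line and runs until a later closing tag (or to the end of the input), so this form is best suited to input laid out like the sample.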

iuaui 02-15-2007 03:44 PM

Oh dear...

You, my friend, just solved one of my biggest problems. I had a task to extract the RewriteRules from a very complex Apache config (a huge number of virtual hosts, only a few of them needed, with gigantic RewriteRule sets within).

I really do appreciate that you posted how you solved your problem...
It is not common (what a pity) that people share their solutions.

Thank you VERY MUCH!

sxjthefirst 01-20-2009 02:49 AM

I had data where the open and close tags could be on multiple lines or on the same line, but with no nested tags. This solution using awk worked:

Code:

awk -F'[<|>]' '/Testcase/{print $3}'
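
A small sketch of how that field separator behaves may help here. With the one-liner as posted, splitting on <, | or > makes the text between same-line tags field 3. The multi_line function is an assumed variation, not from the post, for the case where the tags sit on their own lines. Both function names are made up.

```shell
#!/bin/sh
# Same-line case, as posted: "<Testcase>foo</Testcase>" splits into
# "", "Testcase", "foo", "/Testcase", "" so $3 is the enclosed text.
same_line() {
    awk -F'[<|>]' '/Testcase/{print $3}'
}

# Multi-line sketch (assumption): a flag is switched on after the
# opening tag line and off at the closing tag line, so only the lines
# in between are printed. Expects each tag on a line of its own.
multi_line() {
    awk '/<\/Testcase>/{inside=0} inside{print} /<Testcase>/{inside=1}'
}

printf '%s\n' '<Testcase>foo</Testcase>' | same_line
printf '%s\n' '<Testcase>' 'foo' 'bar' '</Testcase>' 'baz' | multi_line
```

The first call prints foo; the second prints foo and bar, and leaves out baz because it comes after the closing tag.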

keshav.gp@gmail.com 07-26-2011 11:38 AM

Hello,

I have a similar problem: I want to check the data between the two tags <filesystem_section> and </filesystem_section>, and if it contains delete_file filetype="File", replace the filetype values so that they read
copy_file filetype="New_File1" and
set_file_attributes filetype="New_File2". The <filesystem_section> </filesystem_section> tags occur multiple times. How can we do this?

Thanks

theNbomr 07-26-2011 11:54 AM

Neither sed nor grep, alone or in combination, is up to the task. Awk might give you a fighting chance, but is not ideal.

Use Perl and one of the mature XML parsers written for it. Search CPAN for details. Don't try to re-invent that particular wheel unless you're convinced that you can improve upon it (and since you're asking the question here, that seems unlikely).

--- rod.
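
To show what the awk "fighting chance" could look like, here is a rough line-based sketch. It rests on one reading of the question: inside each <filesystem_section>, if a delete_file entry with filetype="File" is present, set filetype to "New_File1" on copy_file lines and to "New_File2" on set_file_attributes lines. It assumes every element sits on its own line, as in the sample at the top of the thread. The function name rewrite_sections is made up. This is not a real XML parser, so the advice above about using one still applies for anything less regular.

```shell
#!/bin/sh
# Buffer each <filesystem_section>, decide once the closing tag is
# seen, then print the (possibly rewritten) section.
rewrite_sections() {
    awk '
    /<filesystem_section>/ { in_sec = 1; n = 0; found = 0 }
    in_sec {
        buf[++n] = $0
        if ($0 ~ /<delete_file filetype="File"/) found = 1
        if ($0 ~ /<\/filesystem_section>/) {
            for (i = 1; i <= n; i++) {
                line = buf[i]
                if (found && line ~ /<copy_file /)
                    sub(/filetype="[^"]*"/, "filetype=\"New_File1\"", line)
                if (found && line ~ /<set_file_attributes /)
                    sub(/filetype="[^"]*"/, "filetype=\"New_File2\"", line)
                print line
            }
            in_sec = 0
        }
        next
    }
    { print }                       # lines outside any section pass through
    END { if (in_sec) for (i = 1; i <= n; i++) print buf[i] }  # unterminated section
    '
}

printf '%s\n' \
    '<filesystem_section>' \
    '<delete_file filetype="File" srcfile="a"/>' \
    '<copy_file filetype="File" srcfile="b"/>' \
    '</filesystem_section>' | rewrite_sections
```

Sections without a matching delete_file line come out unchanged, and because each section is buffered separately, the check works when the tags occur multiple times.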


All times are GMT -5.