
////// 02-11-2007 12:43 AM

[Grep,Awk,Sed]Parsing text between XML tags.
Hello, I have a little problem with my bash script: I need to extract all the text between pairs of XML tags with awk, sed, or grep.


<com_create_instance inprocserver32="&#x25;SystemRoot&#x25;\system32\shdocvw.dll" interfaceid="&#x7B;000214E6-0000-0000-C000-000000000046&#x7D;"/>
<com_get_class_object inprocserver32="C:\WINDOWS\system32\urlmon.dll" interfaceid="&#x7B;00000001-0000-0000-C000-000000000046&#x7D;"/>

<load_dll dll="c:\910ac0d71833d902c1a824c0335761eb.exe" successful="1"/>
<load_dll dll="C:\WINDOWS\system32\ntdll.dll" successful="1"/>
<load_dll dll="C:\WINDOWS\system32\kernel32.dll" successful="1"/>
<load_dll dll="VERSION.dll" successful="1"/>
<snip>1-50 lines removed to make it easier to read.</snip>

<delete_file filetype="File" srcfile="C:\WINDOWS\system32\algs.exe" desiredaccess="FILE_ANY_ACCESS" flags="SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<copy_file filetype="File" srcfile="c:\910ac0d71833d902c1a824c0335761eb.exe" dstfile="C:\WINDOWS\system32\algs.exe" creationdistribution="CREATE_ALWAYS" desiredaccess="FILE_ANY_ACCESS" flags="SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<set_file_attributes filetype="File" srcfile="C:\WINDOWS\system32\algs.exe" desiredaccess="FILE_ANY_ACCESS" flags="FILE_ATTRIBUTE_HIDDEN,SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<open_file filetype="File" srcfile="C:\WINDOWS\system32\algs.exe" creationdistribution="OPEN_EXISTING" desiredaccess="FILE_ANY_ACCESS" shareaccess="SHARE_WRITE" flags="FILE_ATTRIBUTE_NORMAL,SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<set_file_time filetype="File" srcfile="C:\WINDOWS\system32\algs.exe" desiredaccess="FILE_ANY_ACCESS" flags="SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<delete_file filetype="File" srcfile="bonbw.bat" desiredaccess="FILE_ANY_ACCESS" flags="SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<snip>1-50 lines removed to make it easier to read.</snip>

<create_mutex name="dcf7d2f7071938ba83b50c70eedd5ceb8984" owned="0"/>
<create_mutex name="CTF.LBES.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.Compart.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.Asm.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.Layouts.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.TMD.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.TimListCache.FMPDefaultS-1-5-21-1645522239-706699826-839522115-1003MUTEX.DefaultS-1-5-21-16455222" owned="0"/>
<create_mutex name="ZonesCounterMutex" owned="0"/>
<create_mutex name="ZonesCacheCounterMutex" owned="0"/>
<create_mutex name="ZonesLockedCacheCounterMutex" owned="0"/>

I have tried several different approaches, but I can only get the text between tags if both tags are on the same line, like this:

But if the tags are on different lines, I do not know how to make it work:


Any hints?



////// 02-11-2007 01:24 AM

Sorry, I found a solution for this one :o

sed -n '/<com_section>/,/<\/com_section>/p'
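For anyone landing here later, here is a minimal sketch of how that sed range address behaves. The `extract_section` helper, the sample input, and the `<analysis>` and `<mutex_section>` tag names are made up for illustration; only `<com_section>` comes from the command above.

```shell
#!/bin/sh
# Hypothetical helper: print every line from the one matching <TAG>
# through the one matching </TAG>, inclusive, for a given tag name.
extract_section() {
  tag=$1; shift
  # -n suppresses default output; p prints only lines inside the range.
  sed -n "/<$tag>/,/<\/$tag>/p" "$@"
}

# Made-up sample input in the style of the report in the first post.
sample='<analysis>
<com_section>
<com_create_instance interfaceid="x"/>
</com_section>
<mutex_section>
<create_mutex name="ZonesCounterMutex" owned="0"/>
</mutex_section>
</analysis>'

printf '%s\n' "$sample" | extract_section com_section
```

Note that the opening and closing tag lines are part of the output. Also, sed only starts looking for the closing pattern on the line after the one that opened the range, so if both tags sit on the same line the range keeps running until the next closing tag or the end of the input.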

iuaui 02-15-2007 03:44 PM

Oh dear...

You, my friend, just solved one of my biggest problems. I had a task to find RewriteRules in a very complex Apache config (a huge number of virtual hosts, only a few of them needed, with gigantic RewriteRule sets inside).

I really do appreciate that you posted how you solved your problem...
It is not common (what a pity) for people to share their solutions.

Thank you VERY MUCH!

sxjthefirst 01-20-2009 02:49 AM

I had data where the open and close tags could be on multiple lines or on the same line, but with no nested tags. This solution using awk worked:

awk -F'[<|>]' '/Testcase/{print $3}'

07-26-2011 11:38 AM


I have a similar problem: I want to check the data between the two tags <filesystem_section> and </filesystem_section> and test the value. If delete_file filetype="File" is present, then replace the values of
copy_file filetype="New_File1" and
set_file_attributes filetype="New_File2". The <filesystem_section> </filesystem_section> tags occur multiple times. How can we do this?


theNbomr 07-26-2011 11:54 AM

Neither sed nor grep, alone or in combination, is up to the task. Awk might give you a fighting chance, but is not ideal.

Use Perl and one of the mature XML parsers written for it. Search CPAN for details. Don't try to re-invent that particular wheel unless you're convinced that you can improve upon it (and since you're asking the question here, that seems unlikely).

--- rod.
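For completeness, here is a rough sketch of the "fighting chance" awk approach, for input that is as regular as the report in the first post (one element per line, no nesting). It is not an XML parser. It assumes one reading of the question: inside each `<filesystem_section>` block, if a `delete_file` element with `filetype="File"` appears, set `filetype` to `New_File1` on `copy_file` lines and to `New_File2` on `set_file_attributes` lines. The `rewrite_sections` name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; assumes one element per line and no nested sections.
rewrite_sections() {
  awk '
    # Start buffering when a section opens.
    /<filesystem_section>/ { in_sec = 1; n = 0; hit = 0 }
    in_sec {
      buf[++n] = $0
      # Remember whether this section contains the trigger element.
      if ($0 ~ /<delete_file[^>]*filetype="File"/) hit = 1
      # On the closing tag, rewrite (if triggered) and flush the buffer.
      if ($0 ~ /<\/filesystem_section>/) {
        for (i = 1; i <= n; i++) {
          line = buf[i]
          if (hit && line ~ /<copy_file /)
            sub(/filetype="[^"]*"/, "filetype=\"New_File1\"", line)
          if (hit && line ~ /<set_file_attributes /)
            sub(/filetype="[^"]*"/, "filetype=\"New_File2\"", line)
          print line
        }
        in_sec = 0
      }
      next
    }
    # Lines outside any section pass through unchanged.
    { print }
    # Flush an unterminated section as-is rather than dropping it.
    END { if (in_sec) for (i = 1; i <= n; i++) print buf[i] }
  ' "$@"
}
```

Called as `rewrite_sections report.xml > report.new.xml` (file names made up), it buffers each section because the `delete_file` element may come after the lines that need changing. Anything less regular than this (attributes split across lines, nested sections, CDATA) is where a real XML parser, as recommended above, becomes the safer choice.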
