Latest LQ Deal: Complete CCNA, CCNP & Red Hat Certification Training Bundle
Go Back > Forums > Non-*NIX Forums > Programming
User Name
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.


  Search this Thread
Old 02-11-2007, 12:43 AM   #1
Registered: Nov 2005
Location: Land of Linux :: Finland
Distribution: win 10 | OpenBSD 6.1
Posts: 225

Rep: Reputation: 59
[Grep,Awk,Sed]Parsing text between XML tags.

Hello, I have a little problem with my bash script, I need to get all text between those blue XML tags with awk,sed or grep.

<com_create_instance inprocserver32="&#x25;SystemRoot&#x25;\system32\shdocvw.dll" interfaceid="&#x7B;000214E6-0000-0000-C000-000000000046&#x7D;"/>
<com_get_class_object inprocserver32="C:\WINDOWS\system32\urlmon.dll" interfaceid="&#x7B;00000001-0000-0000-C000-000000000046&#x7D;"/>

<load_dll dll="c:\910ac0d71833d902c1a824c0335761eb.exe" successful="1"/>
<load_dll dll="C:\WINDOWS\system32\ntdll.dll" successful="1"/>
<load_dll dll="C:\WINDOWS\system32\kernel32.dll" successful="1"/>
<load_dll dll="VERSION.dll" successful="1"/>
<snip>1-50 lines removed to make it easier to read.</snip>

<delete_file filetype="File" srcfile="C:\WINDOWS\system32\algs.exe" desiredaccess="FILE_ANY_ACCESS" flags="SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<copy_file filetype="File" srcfile="c:\910ac0d71833d902c1a824c0335761eb.exe" dstfile="C:\WINDOWS\system32\algs.exe" creationdistribution="CREATE_ALWAYS" desiredaccess="FILE_ANY_ACCESS" flags="SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<set_file_attributes filetype="File" srcfile="C:\WINDOWS\system32\algs.exe" desiredaccess="FILE_ANY_ACCESS" flags="FILE_ATTRIBUTE_HIDDEN,SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<open_file filetype="File" srcfile="C:\WINDOWS\system32\algs.exe" creationdistribution="OPEN_EXISTING" desiredaccess="FILE_ANY_ACCESS" shareaccess="SHARE_WRITE" flags="FILE_ATTRIBUTE_NORMAL,SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<set_file_time filetype="File" srcfile="C:\WINDOWS\system32\algs.exe" desiredaccess="FILE_ANY_ACCESS" flags="SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<delete_file filetype="File" srcfile="bonbw.bat" desiredaccess="FILE_ANY_ACCESS" flags="SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<snip>1-50 lines removed to make it easier to read.</snip>

<create_mutex name="dcf7d2f7071938ba83b50c70eedd5ceb8984" owned="0"/>
<create_mutex name="CTF.LBES.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.Compart.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.Asm.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.Layouts.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.TMD.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.TimListCache.FMPDefaultS-1-5-21-1645522239-706699826-839522115-1003MUTEX.DefaultS-1-5-21-16455222" owned="0"/>
<create_mutex name="ZonesCounterMutex" owned="0"/>
<create_mutex name="ZonesCacheCounterMutex" owned="0"/>
<create_mutex name="ZonesLockedCacheCounterMutex" owned="0"/>
I have tried several different approaches but I can only get text between tags if those tags are in the same line like this:
But if those tags are in different lines I do not know how to make it work:
Any hints ?


Old 02-11-2007, 01:24 AM   #2
Registered: Nov 2005
Location: Land of Linux :: Finland
Distribution: win 10 | OpenBSD 6.1
Posts: 225

Original Poster
Rep: Reputation: 59
Sorry, I found solution for this one
sed -n '/<com_section>/,/<\/com_section>/p'
Old 02-15-2007, 03:44 PM   #3
Registered: Sep 2005
Location: Nummela, Southern Finland
Distribution: Fedora Core 4, Fedora Core 5, Redhat 9.0, Solaris 10, WInXP pro, W2k pro
Posts: 30

Rep: Reputation: 15
Oh dear...

You, my friend, just solved one of my biggest problem. I had a task to find RewriteRules from very complex Apache config (huge amount virtualhosts, but few needed, with gigantic rewriterule sets within).

I really do appreciate that you send this information about solving your problem...
It is not common (what a pity) that people share their solutions.

Thank you VERY MUCH!
Old 01-20-2009, 02:49 AM   #4
LQ Newbie
Registered: Jan 2009
Distribution: Red hat
Posts: 1

Rep: Reputation: 0
I had data where the open and close tags could be on multiple lines or on the same line but no nested tags. This solution using awk worked

awk -F'[<|>]' '/Testcase/{print $3}

Last edited by sxjthefirst; 01-20-2009 at 02:50 AM.
Old 07-26-2011, 11:38 AM   #5
LQ Newbie
Registered: Jul 2011
Posts: 1

Rep: Reputation: Disabled

i have a similar problem, where i want to check the data between the 2 tags <filesystem_section> </filesystem_section> and check the value: if delete_file filetype="File" then replace the values of
copy_file filetype="New_File1" and
set_file_attributes filetype="New_File2" . <filesystem_section> </filesystem_section> tags exist mutiple times. how we can we do this?

Old 07-26-2011, 11:54 AM   #6
LQ 5k Club
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,397
Blog Entries: 2

Rep: Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908
Neither sed nor grep alone or in combination are up to the task. Awk might give you a fighting chance, but is not ideal.

Use Perl and one of the mature XML parsers written for it. Search CPAN for details. Don't try to re-invent that particular wheel unless you're convinced that you can improve upon it (and since you're asking the question here, that seems unlikely).

--- rod.


parsing, text, xml

Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off

Similar Threads
Thread Thread Starter Forum Replies Last Post
awk question - parsing xml file epoo Programming 7 01-24-2007 02:13 PM
Sed or Awk question, looking for parsing help rwartell Linux - Software 2 05-17-2006 11:59 PM
Sed or Awk question, looking for parsing help rwartell Programming 1 05-17-2006 04:42 PM
Parsing XML tags with php, can't get attributes of a tag jimieee Programming 1 05-05-2004 10:32 AM

All times are GMT -5. The time now is 01:21 AM.

Main Menu
Write for LQ is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration