LinuxQuestions.org
Old 02-11-2007, 01:43 AM   #1
//////
Member
 
Registered: Nov 2005
Location: Land of Linux :: Finland
Distribution: Dual boot :: Slackware 14.1 64bit multilib | Kali linux 64bit multi lib
Posts: 192

Rep: Reputation: 42
[Grep, Awk, Sed] Parsing text between XML tags.


Hello, I have a small problem with my bash script: I need to get all the text between the XML section tags below (for example, everything from <com_section> to </com_section>) with awk, sed or grep.

Code:
<com_section>
<com_create_instance inprocserver32="&#x25;SystemRoot&#x25;\system32\shdocvw.dll" interfaceid="&#x7B;000214E6-0000-0000-C000-000000000046&#x7D;"/>
<com_get_class_object inprocserver32="C:\WINDOWS\system32\urlmon.dll" interfaceid="&#x7B;00000001-0000-0000-C000-000000000046&#x7D;"/>
</com_section>

<dll_handling_section>
<load_dll dll="c:\910ac0d71833d902c1a824c0335761eb.exe" successful="1"/>
<load_dll dll="C:\WINDOWS\system32\ntdll.dll" successful="1"/>
<load_dll dll="C:\WINDOWS\system32\kernel32.dll" successful="1"/>
<load_dll dll="VERSION.dll" successful="1"/>
<snip>1-50 lines removed to make it easier to read.</snip>
</dll_handling_section>

<filesystem_section>
<delete_file filetype="File" srcfile="C:\WINDOWS\system32\algs.exe" desiredaccess="FILE_ANY_ACCESS" flags="SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<copy_file filetype="File" srcfile="c:\910ac0d71833d902c1a824c0335761eb.exe" dstfile="C:\WINDOWS\system32\algs.exe" creationdistribution="CREATE_ALWAYS" desiredaccess="FILE_ANY_ACCESS" flags="SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<set_file_attributes filetype="File" srcfile="C:\WINDOWS\system32\algs.exe" desiredaccess="FILE_ANY_ACCESS" flags="FILE_ATTRIBUTE_HIDDEN,SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<open_file filetype="File" srcfile="C:\WINDOWS\system32\algs.exe" creationdistribution="OPEN_EXISTING" desiredaccess="FILE_ANY_ACCESS" shareaccess="SHARE_WRITE" flags="FILE_ATTRIBUTE_NORMAL,SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<set_file_time filetype="File" srcfile="C:\WINDOWS\system32\algs.exe" desiredaccess="FILE_ANY_ACCESS" flags="SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<delete_file filetype="File" srcfile="bonbw.bat" desiredaccess="FILE_ANY_ACCESS" flags="SECURITY_ANONYMOUS" fileinformationclass="FileBasicInformation"/>
<snip>1-50 lines removed to make it easier to read.</snip>
</filesystem_section>

<mutex_section>
<create_mutex name="dcf7d2f7071938ba83b50c70eedd5ceb8984" owned="0"/>
<create_mutex name="CTF.LBES.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.Compart.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.Asm.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.Layouts.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.TMD.MutexDefaultS-1-5-21-1645522239-706699826-839522115-1003" owned="0"/>
<create_mutex name="CTF.TimListCache.FMPDefaultS-1-5-21-1645522239-706699826-839522115-1003MUTEX.DefaultS-1-5-21-16455222" owned="0"/>
<create_mutex name="ZonesCounterMutex" owned="0"/>
<create_mutex name="ZonesCacheCounterMutex" owned="0"/>
<create_mutex name="ZonesLockedCacheCounterMutex" owned="0"/>
</mutex_section>
I have tried several different approaches, but I can only get the text between the tags if both tags are on the same line, like this:
Code:
<com_section>foo</com_section>
But if the tags are on different lines, I do not know how to make it work:
Code:
<com_section>
foo
</com_section>
Any hints?

TIA,

///////
 
Old 02-11-2007, 02:24 AM   #2
//////
Member
 
Registered: Nov 2005
Location: Land of Linux :: Finland
Distribution: Dual boot :: Slackware 14.1 64bit multilib | Kali linux 64bit multi lib
Posts: 192

Original Poster
Rep: Reputation: 42
Oops.
Sorry, I found a solution for this one:
Code:
sed -n '/<com_section>/,/<\/com_section>/p'
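For anyone reading later, here is a minimal sketch of how that range address behaves, assuming each opening and closing tag sits on its own line as in the sample above. The temporary sample file and the function names section_with_tags and section_body are made up for this illustration and are not part of the original script.

```shell
#!/bin/sh
# Hypothetical sample data in a temporary file, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<com_section>
foo
</com_section>
<mutex_section>
bar
</mutex_section>
EOF

# Print every line from the opening tag through the closing tag, tags included.
# $1 = section name, $2 = file
section_with_tags() {
    sed -n "/<$1>/,/<\/$1>/p" "$2"
}

# Same range, but delete the two tag lines so only the body is printed.
section_body() {
    sed -n "/<$1>/,/<\/$1>/{/<$1>/d;/<\/$1>/d;p;}" "$2"
}

section_with_tags com_section "$sample"
section_body mutex_section "$sample"
```

One caveat: sed only starts looking for the closing pattern on the line after the one that opened the range, so if both tags share a single line the range runs on to the next closing tag or to the end of the input.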
 
Old 02-15-2007, 04:44 PM   #3
iuaui
Member
 
Registered: Sep 2005
Location: Nummela, Southern Finland
Distribution: Fedora Core 4, Fedora Core 5, Redhat 9.0, Solaris 10, WInXP pro, W2k pro
Posts: 30

Rep: Reputation: 15
Oh dear...

You, my friend, just solved one of my biggest problems. I had the task of finding RewriteRules in a very complex Apache config (a huge number of virtual hosts, only a few of them needed, with gigantic RewriteRule sets within).

I really do appreciate that you posted the solution to your problem...
It is not common (what a pity) that people share their solutions.

Thank you VERY MUCH!
 
Old 01-20-2009, 03:49 AM   #4
sxjthefirst
LQ Newbie
 
Registered: Jan 2009
Distribution: Red hat
Posts: 1

Rep: Reputation: 0
I had data where the open and close tags could be on multiple lines or on the same line, but with no nested tags. This solution using awk worked:

awk -F'[<|>]' '/Testcase/{print $3}'
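
A small sketch of what that one-liner does, plus one possible variant for tags on separate lines. The sample input is invented for the illustration, and the function names same_line_text and multi_line_text are hypothetical. As far as I can tell, on a line that holds only a tag the third field is empty, so the one-liner prints blank lines for the multi-line layout; the flag-based variant below is a swapped-in technique, not the poster's command.

```shell
#!/bin/sh
# One-line form: splitting on <, | or > makes the text between the tags field 3.
same_line_text() {
    awk -F'[<|>]' '/Testcase/{print $3}'
}

# Multi-line form (a sketch): set a flag at the opening tag line, clear it at
# the closing tag line, and print only the lines seen while the flag is set.
multi_line_text() {
    awk '/<Testcase>/{inside=1; next} /<\/Testcase>/{inside=0} inside'
}

printf '<Testcase>foo</Testcase>\n' | same_line_text
printf '<Testcase>\nbar\nbaz\n</Testcase>\n' | multi_line_text
```

Neither form copes with nested tags or with attributes inside the opening tag, which matches the "no nested tags" condition stated above.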

Last edited by sxjthefirst; 01-20-2009 at 03:50 AM.
 
Old 07-26-2011, 12:38 PM   #5
keshav.gp@gmail.com
LQ Newbie
 
Registered: Jul 2011
Posts: 1

Rep: Reputation: Disabled
Hello,

I have a similar problem: I want to check the data between the two tags <filesystem_section> </filesystem_section> and check the value: if delete_file filetype="File", then replace the values of
copy_file filetype="New_File1" and
set_file_attributes filetype="New_File2". The <filesystem_section> </filesystem_section> tags exist multiple times. How can we do this?

Thanks
 
Old 07-26-2011, 12:54 PM   #6
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,396
Blog Entries: 2

Rep: Reputation: 903
Neither sed nor grep, alone or in combination, is up to the task. Awk might give you a fighting chance, but is not ideal.

Use Perl and one of the mature XML parsers written for it. Search CPAN for details. Don't try to re-invent that particular wheel unless you're convinced that you can improve upon it (and since you're asking the question here, that seems unlikely).

--- rod.
 
  

