LinuxQuestions.org
Old 03-30-2010, 06:57 AM   #1
worm5252
Member
 
Registered: Oct 2004
Location: Atlanta
Distribution: CentOS, RHEL, HP-UX, OS X
Posts: 567

Rep: Reputation: 57
BASH: Parse XML


Hey guys,
I am working on a script to list out the attributes of an XML file and then add a new IP to the list of nodes on the load balancer. I have a script that queries the load balancer to get its information. It returns the following XML file.

Code:
<?xml version="1.0"?>
<gogrid>
  <response method="/grid/loadbalancer/list" status="success">
    <summary total="1" start="0" numpages="0" returned="1"/>
    <list>
      <object name="loadbalancer">
        <attribute name="id">6577</attribute>
        <attribute name="name">Web Server LB</attribute>
        <attribute name="description"/>
        <attribute name="virtualip">
          <object name="ipportpair">
            <attribute name="ip">
              <object name="ip">
                <attribute name="id">1654289</attribute>
                <attribute name="ip">173.204.16.196</attribute>
                <attribute name="subnet">173.204.16.192/255.255.255.240</attribute>
                <attribute name="public">true</attribute>
              </object>
            </attribute>
            <attribute name="port">80</attribute>
          </object>
        </attribute>
        <attribute name="type">
          <object name="option">
            <attribute name="id">1</attribute>
            <attribute name="name">Round Robin</attribute>
            <attribute name="description"/>
          </object>
        </attribute>
        <attribute name="persistence">
          <object name="option">
            <attribute name="id">1</attribute>
            <attribute name="name">None</attribute>
            <attribute name="description"/>
          </object>
        </attribute>
        <attribute name="realiplist">
          <list>
            <object name="ipportpair">
              <attribute name="ip">
                <object name="ip">
                  <attribute name="id">1654287</attribute>
                  <attribute name="ip">173.204.16.194</attribute>
                  <attribute name="subnet">173.204.16.192/255.255.255.240</attribute>
                  <attribute name="public">true</attribute>
                </object>
              </attribute>
              <attribute name="port">80</attribute>
            </object>
            <object name="ipportpair">
              <attribute name="ip">
                <object name="ip">
                  <attribute name="id">1654288</attribute>
                  <attribute name="ip">173.204.16.195</attribute>
                  <attribute name="subnet">173.204.16.192/255.255.255.240</attribute>
                  <attribute name="public">true</attribute>
                </object>
              </attribute>
              <attribute name="port">80</attribute>
            </object>
          </list>
        </attribute>
        <attribute name="os">
          <object name="option">
            <attribute name="id">1</attribute>
            <attribute name="name">F5</attribute>
            <attribute name="description">The F5 Load Balancer.</attribute>
          </object>
        </attribute>
        <attribute name="state">
          <object name="option">
            <attribute name="id">1</attribute>
            <attribute name="name">On</attribute>
            <attribute name="description">Loadbalancer is enabled and on.</attribute>
          </object>
        </attribute>
      </object>
    </list>
  </response>
</gogrid>
I need to figure out how to pull the load balancer ID and the load balancer name. Then I need to pull the IP address and port number of every single node on the load balancer. I do not need the IP and port of the load balancer itself. So far I am having trouble figuring out how to do it, since everything uses an attribute tag.

Once I get this information, I can go from there on adding the new IP and port to the list.

Any help is appreciated.
 
Old 03-30-2010, 07:49 AM   #2
PMP
Member
 
Registered: Apr 2009
Location: ~
Distribution: RHEL, Fedora
Posts: 381

Rep: Reputation: 58
Do you have the option of using Perl?
 
Old 03-30-2010, 09:02 AM   #3
worm5252
Member
 
Registered: Oct 2004
Location: Atlanta
Distribution: CentOS, RHEL, HP-UX, OS X
Posts: 567

Original Poster
Rep: Reputation: 57
I prefer bash because it is the only scripting language I know. However, the API I am using supports Java, PHP, Python, Perl, Ruby, C#, and bash.

The API is a REST system, and I am requesting information about the load balancer. In the end I need to be able to iterate through all the IP addresses and ports of the nodes on the load balancer so I can add a new one. That is as simple as generating the URL and calling "curl -f" with it. However, with a REST system I need to list the load balancer ID, all the IPs and port numbers, and my authentication info within the URL. That brings me back to why I am parsing this info to begin with. Crappy API in my opinion, but it is what I have to work with.
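
The URL-building step described here might look roughly like the sketch below (Python, standard library only). Everything API-specific is a placeholder: the host `api.example.invalid`, the path, and the parameter names `id`, `realiplist.N.ip`, `realiplist.N.port` and `api_key` are assumptions made for illustration, not taken from the real API's documentation. The new node address is also made up.

```python
from urllib.parse import urlencode

# Placeholder endpoint, NOT the real API URL.
BASE_URL = "https://api.example.invalid/loadbalancer/edit"


def build_edit_url(lb_id, nodes, new_node, api_key):
    """Return a URL that lists every existing node plus the new one.

    All query parameter names used here are hypothetical stand-ins.
    """
    params = [("id", lb_id)]
    # Existing nodes first, then the node being added.
    for index, (ip, port) in enumerate(list(nodes) + [new_node]):
        params.append((f"realiplist.{index}.ip", ip))
        params.append((f"realiplist.{index}.port", port))
    params.append(("api_key", api_key))
    return BASE_URL + "?" + urlencode(params)


# The two existing nodes from the response in post #1.
existing = [("173.204.16.194", "80"), ("173.204.16.195", "80")]
# 173.204.16.197 is an invented address for the node to add.
url = build_edit_url("6577", existing, ("173.204.16.197", "80"), "EXAMPLE_KEY")
print(url)
```

The resulting string is what would then be handed to "curl -f", once the real endpoint and parameter names have been substituted.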
 
Old 03-30-2010, 09:03 AM   #4
PMP
Member
 
Registered: Apr 2009
Location: ~
Distribution: RHEL, Fedora
Posts: 381

Rep: Reputation: 58
Well, if it has to be bash,
this worked for me; you may have to modify it a bit to suit your needs.
Code:
echo `grep -A2 name=\"loadbalancer test.xml` | sed -e 's/.*name="id">\([0-9]\+\)\?.*name="name">\([-a-zA-Z0-9 ._]\+\)\?<.*/LB_ID=\1 LB_NAME=\2/'
With Perl you have lots of options; it has good XML parsing modules.

Last edited by PMP; 03-30-2010 at 09:05 AM.
 
Old 03-30-2010, 09:11 AM   #5
worm5252
Member
 
Registered: Oct 2004
Location: Atlanta
Distribution: CentOS, RHEL, HP-UX, OS X
Posts: 567

Original Poster
Rep: Reputation: 57
Thanks
 
Old 03-30-2010, 09:13 AM   #6
PMP
Member
 
Registered: Apr 2009
Location: ~
Distribution: RHEL, Fedora
Posts: 381

Rep: Reputation: 58
If this worked for you, please mark this thread as solved.
 
Old 03-30-2010, 09:46 AM   #7
worm5252
Member
 
Registered: Oct 2004
Location: Atlanta
Distribution: CentOS, RHEL, HP-UX, OS X
Posts: 567

Original Poster
Rep: Reputation: 57
I ran it and it seems to just hang. It is probably how I modified your command. I have the XML file located at /tmp/ggapi.list_load_balancer.xml, so I modified the command to read:
Code:
echo `grep -A2 name=\"/tmp/ggapi.list_load_balancer.xml` | sed -e 's/.*name="id">\([0-9]\+\)\?.*name="name">\([-a-zA-Z0-9 ._]\+\)\?<.*/LB_ID=\1 LB_NAME=\2/'
I have a feeling that is not correct.

Last edited by worm5252; 03-30-2010 at 09:47 AM. Reason: correcting case
 
Old 03-30-2010, 12:54 PM   #8
PMP
Member
 
Registered: Apr 2009
Location: ~
Distribution: RHEL, Fedora
Posts: 381

Rep: Reputation: 58
echo `grep -A2 name=\"loadbalancer test.xml` | sed -e 's/.*name="id">\([0-9]\+\)\?.*name="name">\([-a-zA-Z0-9 ._]\+\)\?<.*/LB_ID=\1 LB_NAME=\2/'

test.xml is the name of the file; change only that, not name=

The text in magenta is the pattern to grep for, and the text in red is the file name. You have to change the file name to your own file, along with its path.

Last edited by PMP; 03-30-2010 at 12:55 PM.
 
Old 03-30-2010, 02:05 PM   #9
tuxdev
Senior Member
 
Registered: Jul 2005
Distribution: Slackware
Posts: 2,012

Rep: Reputation: 115
Bash is completely and utterly inappropriate for parsing out XML files.
Quote:
You can't realistically parse tag-based markup languages like HTML and XML using Bash or utilities such as grep, sed or cut. If you just want to dump/render HTML, see (links|links2|lynx|w3m) -dump, html2text, vilistextum. For parsing out pieces of data, see tidy+(xmlstarlet|xmlgawk|xpath|xml2), or learn xslt. Ask #xml and #html for more help. See http://www.codinghorror.com/blog/archives/001311.html
- FreeNode #bash factoid
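
To illustrate the parser-based route recommended here, the following is a minimal sketch in Python (one of the languages the API is said to support), using only the standard library's `xml.etree.ElementTree`. It assumes the response has exactly the shape shown in post #1. The function name `summarize_load_balancer` and the trimmed `SAMPLE` document are made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from post #1: one virtual IP, two real IPs.
SAMPLE = """<?xml version="1.0"?>
<gogrid>
  <response method="/grid/loadbalancer/list" status="success">
    <list>
      <object name="loadbalancer">
        <attribute name="id">6577</attribute>
        <attribute name="name">Web Server LB</attribute>
        <attribute name="virtualip">
          <object name="ipportpair">
            <attribute name="ip">
              <object name="ip">
                <attribute name="id">1654289</attribute>
                <attribute name="ip">173.204.16.196</attribute>
              </object>
            </attribute>
            <attribute name="port">80</attribute>
          </object>
        </attribute>
        <attribute name="realiplist">
          <list>
            <object name="ipportpair">
              <attribute name="ip">
                <object name="ip">
                  <attribute name="id">1654287</attribute>
                  <attribute name="ip">173.204.16.194</attribute>
                </object>
              </attribute>
              <attribute name="port">80</attribute>
            </object>
            <object name="ipportpair">
              <attribute name="ip">
                <object name="ip">
                  <attribute name="id">1654288</attribute>
                  <attribute name="ip">173.204.16.195</attribute>
                </object>
              </attribute>
              <attribute name="port">80</attribute>
            </object>
          </list>
        </attribute>
      </object>
    </list>
  </response>
</gogrid>
"""


def summarize_load_balancer(xml_text):
    """Return (id, name, [(ip, port), ...]) for the first load balancer."""
    root = ET.fromstring(xml_text)
    lb = root.find(".//object[@name='loadbalancer']")
    # Direct children only, so nested ids (virtual IP, options) are ignored.
    lb_id = lb.findtext("attribute[@name='id']")
    lb_name = lb.findtext("attribute[@name='name']")
    nodes = []
    # Only the real IP list is walked; the virtual IP is skipped on purpose.
    pair_path = "attribute[@name='realiplist']/list/object[@name='ipportpair']"
    for pair in lb.findall(pair_path):
        ip = pair.findtext("attribute[@name='ip']/object/attribute[@name='ip']")
        port = pair.findtext("attribute[@name='port']")
        nodes.append((ip, port))
    return lb_id, lb_name, nodes


lb_id, lb_name, nodes = summarize_load_balancer(SAMPLE)
print(lb_id, lb_name)
for ip, port in nodes:
    print(f"{ip}:{port}")
```

Because the lookups follow the element structure rather than line positions, the fact that every element is called `attribute` stops being a problem: the `name` attribute and the nesting level together identify each value.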
 
Old 03-30-2010, 02:15 PM   #10
worm5252
Member
 
Registered: Oct 2004
Location: Atlanta
Distribution: CentOS, RHEL, HP-UX, OS X
Posts: 567

Original Poster
Rep: Reputation: 57
Well, my challenge is not that I can't parse it, so to speak. The problem is that the API provides a poorly structured response. You can see the XML above in my first post: instead of structuring it like <LoadBalancerID></LoadBalancerID>, they use
Code:
<object name="loadbalancer">
        <attribute name="id">6577</attribute>
Everything ends up being an attribute of <object name="loadbalancer">, so there is no real way for me to parse it to obtain information on the load balancer members, the load balancer ID, etc. That's where bash comes into play, aside from it being the only language I know.
 
Old 03-30-2010, 02:17 PM   #11
worm5252
Member
 
Registered: Oct 2004
Location: Atlanta
Distribution: CentOS, RHEL, HP-UX, OS X
Posts: 567

Original Poster
Rep: Reputation: 57
Quote:
Originally Posted by PMP
echo `grep -A2 name=\"loadbalancer test.xml` | sed -e 's/.*name="id">\([0-9]\+\)\?.*name="name">\([-a-zA-Z0-9 ._]\+\)\?<.*/LB_ID=\1 LB_NAME=\2/'

test.xml is the name of the file; change only that, not name=

The text in magenta is the pattern to grep for, and the text in red is the file name. You have to change the file name to your own file, along with its path.
I tried this on my XML and got the following results:
Code:
[root@WS1 scripts]# echo `grep -A2 name=\"loadbalancer /tmp/ggapi.list_load_balancer.xml` | sed -e 's/.*name="id">\([0-9]\+\)\?.*name="name">\([-a-zA-Z0-9 ._]\+\)\?<.*/LB_ID=\1 LB_NAME=\2/'
LB_ID=1 LB_NAME=On
So it works; it just doesn't give me the desired information.
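
A likely explanation for `LB_ID=1 LB_NAME=On`, assuming the file on disk is one long line rather than the pretty-printed form shown in post #1: `grep -A2` then returns the whole document, and the leading greedy `.*` in the sed expression skips ahead to the last `name="id"` / `name="name"` pair, which belongs to the `state` option. The sketch below reproduces that behaviour with Python's `re` module on a made-up one-line fragment (Python's regex engine is not sed, but greedy `.*` behaves the same way for this pattern) and shows a non-greedy variant that picks the first pair instead.

```python
import re

# Made-up one-line fragment: the load balancer's own id and name first,
# then the id and name of its "state" option.
ONE_LINE = (
    '<object name="loadbalancer">'
    '<attribute name="id">6577</attribute>'
    '<attribute name="name">Web Server LB</attribute>'
    '<attribute name="state"><object name="option">'
    '<attribute name="id">1</attribute>'
    '<attribute name="name">On</attribute>'
    '</object></attribute></object>'
)

# Same shape as the sed expression from post #4, written in Python syntax.
GREEDY = re.compile(
    r'.*name="id">([0-9]+)?.*name="name">([-a-zA-Z0-9 ._]+)?<.*'
)
# Variant: no leading ".*", and a lazy ".*?" between the two captures.
LAZY = re.compile(r'name="id">([0-9]+).*?name="name">([^<]*)<')


def first_groups(pattern, text):
    """Return the captured groups of the first match, or None."""
    match = pattern.search(text)
    return match.groups() if match else None


print(first_groups(GREEDY, ONE_LINE))  # last id/name pair in the line
print(first_groups(LAZY, ONE_LINE))    # first id/name pair in the line
```

Even the lazy variant is still a text match rather than a parse, so it depends on the id and name appearing first and in that order.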
 
Old 03-30-2010, 10:03 PM   #12
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,017

Rep: Reputation: 3196
Well, I would like a little feedback on how to do multidimensional arrays, but my workaround here seems to work with
the input you provided:

Code:
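#!/usr/bin/gawk -f
# gensub() is a gawk extension, so this needs gawk rather than plain awk
# (adjust the path if gawk lives elsewhere).

BEGIN {
    resp = 0      # inside the <response> of the loadbalancer call
    load = 0      # reading the load balancer's own id and name
    ip = 0        # past the virtual IP block
    ippair = 0    # inside an <object name="ipportpair"> of a real node
}

/<response.*loadbalancer/ { resp = 1 }

resp && /<\/response>/ { resp = 0 }

resp && /me="loadbalancer/ { load = 1 }

# Keep only the first id seen; otherwise the virtual IP's id overwrites it.
load && id == "" && /me="id/ { id = gensub(/.*id">|<\/at.*/, "", "g") }
load && /me="name/ { name = gensub(/.*me">|<\/at.*/, "", "g") }
# The first port line closes the virtual IP block.
load && /me="port/ { load = 0; ip = 1 }

ip && /me="ipportpair/ { ippair = 1 }

# Index each node by its IP id; the address is on the line that follows.
ippair && /me="id/ {
    arr_index = gensub(/.*id">|<\/at.*/, "", "g")
    getline arr[arr_index]
    gsub(/.*ip">|<\/at.*/, "", arr[arr_index])
}

# Store each node as "address:port".
ippair && /me="port/ {
    arr[arr_index] = arr[arr_index] ":" gensub(/.*rt">|<\/at.*/, "", "g")
    ippair = 0
}

END {
    print "Loadbalancer ID# " id
    print "Loadbalancer name: " name

    for (x in arr) {
        print "\nIP ID# " x " has the following address and port:"
        print "\tAddress: " gensub(/:.*/, "", "g", arr[x])
        print "\tPort: " gensub(/.*:/, "", "g", arr[x])
    }
}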
 
Old 03-31-2010, 02:34 AM   #13
PMP
Member
 
Registered: Apr 2009
Location: ~
Distribution: RHEL, Fedora
Posts: 381

Rep: Reputation: 58
I would suggest doing this task with Perl;
have a look at
http://search.cpan.org/~grantm/XML-S.../XML/Simple.pm
 
Old 03-31-2010, 03:38 AM   #14
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by PMP
I would suggest doing this task with Perl;
have a look at
http://search.cpan.org/~grantm/XML-S.../XML/Simple.pm
Yep.

Or any other full-fledged XML parser (Perl is still my favorite language).

The point is that XML is not line-based, while the standard UNIX text-munging tools are line-oriented, so an ad-hoc parser written with them won't be robust.
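
The point can be made concrete with a small sketch (Python, standard library only; the helper names and the tiny sample document are invented for illustration). The same XML is laid out once across several lines and once on a single line. A line-window lookup in the spirit of `grep -A2` returns different text for the two layouts, while a parser returns the same value for both.

```python
import xml.etree.ElementTree as ET

PRETTY = """<object name="loadbalancer">
  <attribute name="id">6577</attribute>
  <attribute name="name">Web Server LB</attribute>
  <attribute name="port">80</attribute>
</object>"""

# Identical XML with the line breaks and indentation removed.
FLAT = "".join(line.strip() for line in PRETTY.splitlines())


def lines_after_match(text, needle, after=2):
    """Rough stand-in for `grep -A2`: first matching line plus `after` more."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if needle in line:
            return lines[i:i + after + 1]
    return []


def parsed_id(text):
    """Look up the id with an XML parser, which ignores line layout."""
    return ET.fromstring(text).findtext("attribute[@name='id']")


# The line window covers three short lines in one layout and the whole
# document in the other.
print(len(lines_after_match(PRETTY, 'name="loadbalancer')))
print(len(lines_after_match(FLAT, 'name="loadbalancer')))
# The parser sees the same element tree either way.
print(parsed_id(PRETTY), parsed_id(FLAT))
```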
 
Old 03-31-2010, 03:54 AM   #15
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515

Rep: Reputation: 239
crikey you are making life hard for yourself.

as said here:
Quote:
Originally Posted by tuxdev
Bash is completely and utterly inappropriate for parsing out XML files.

- FreeNode #bash factoid


would you service a car with a knife and a pair of pliers?
learn how to use the proper tools.

I would use Perl and XML::Twig.
 
  

