How to get the data from a tag in an XML file

kingmaker2003 · 12-03-2008, 12:46 PM

Hi, I have a tag in an XML file on Unix like this:

<EmailAddress>abc@gmail.com</EmailAddress>

This tag appears multiple times in the XML file, and the data is on one continuous line, like below:

<State>UN</State><Zip/><CompanyName/><EmailAddress>FDF@gmail.COM</EmailAddress><PromoType>UNKNOWN</PromoType></Promotion></PromotionList<State>UN</State><Zip/><CompanyName/><EmailAddress>zd4946@gmail.com</EmailAddress>

I have to check whether the data between the <EmailAddress> tags is valid, that is, whether it is an email address or not.

I also have to find the length of the attribute (that is, the tag's value). The script is in ksh.

Sorry if this has already been asked. I checked, but I didn't find an exactly matching result for my requirement.

Any help with this?

Telemachos · 12-03-2008, 01:47 PM

You could try to write a regular expression to do this, but parsing XML, HTML, etc. with regular expressions is notoriously difficult. If ksh is like Bash in terms of syntax, that sounds like trying to sculpt a piece of marble with a spoon. I would recommend looking at a scripting language with XML parsers available (e.g., Perl, Python or Ruby).

kingmaker2003 · 12-03-2008, 01:52 PM

Quote:

Originally Posted by Telemachos

You could try to write a regular expression to do this, but parsing XML, HTML, etc. with regular expressions is notoriously difficult. If ksh is like Bash in terms of syntax, that sounds like trying to sculpt a piece of marble with a spoon. I would recommend looking at a scripting language with XML parsers available (e.g., Perl, Python or Ruby).

I think we can do it with awk. I got an answer, but it works only for the first occurrence of the <EmailAddress></EmailAddress> tag:

Code:

awk -F '</?EmailAddress>' '{print $2}' 456.xml

But I need it for every occurrence. The email address tag appears multiple times in the file,
so I need to check the whole XML file for an email address wherever an <EmailAddress></EmailAddress> tag is present.
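
One possible way to extend the awk one-liner above to every occurrence is sketched below. Splitting each line on the opening or closing tag leaves the tag contents in the even-numbered fields, so the loop prints $2, $4, and so on. The function names extract_emails and email_lengths are made up for illustration, and the sketch assumes the tags are spelled exactly <EmailAddress> with no attributes. The second function prints the length of each value, which was also part of the question.

```shell
# Print the contents of every <EmailAddress> element, one per line.
# The field separator matches either the opening or the closing tag,
# so the element contents end up in the even-numbered fields.
extract_emails() {
    awk -F '</?EmailAddress>' '{ for (i = 2; i <= NF; i += 2) print $i }' "$1"
}

# Print each value prefixed by its length in characters.
email_lengths() {
    extract_emails "$1" | awk '{ print length($0), $0 }'
}

# Usage (456.xml is the file name from the post above):
#   extract_emails 456.xml
#   email_lengths 456.xml
```

This is still text matching rather than real XML parsing, so it would miss tags that carry attributes or are split across lines.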

chrism01 · 12-03-2008, 05:25 PM

Concur with Telemachos

kingmaker2003 · 12-04-2008, 08:33 AM

Quote:

Originally Posted by chrism01

Concur with Telemachos

What does that mean?

chrism01 · 12-04-2008, 05:54 PM

http://dictionary.reference.com/dic?...&search=search

xhypno · 12-04-2008, 07:26 PM

Quote:

Originally Posted by kingmaker2003

I think we can do it with awk. I got an answer, but it works only for the first occurrence of the <EmailAddress></EmailAddress> tag:

Code:

awk -F '</?EmailAddress>' '{print $2}' 456.xml

But I need it for every occurrence. The email address tag appears multiple times in the file,
so I need to check the whole XML file for an email address wherever an <EmailAddress></EmailAddress> tag is present.

Take a look at egrep's multiline/return regex searching. It will allow you to parse the file for each occurrence of <></> and then pipe that to another egrep that uses -v and looks for <></>.
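
A rough sketch of that two-stage pipeline follows, under a few assumptions: the grep in use supports the -o option (GNU and BSD grep do, but it is not required by POSIX), the function name invalid_emails is invented for this example, and the pattern is only a loose "something@something.letters" check, not a full email validator. The first grep puts each <EmailAddress> element on its own line, and the second one, with -v, keeps only the elements that do not look like an email address.

```shell
# List every <EmailAddress> element whose contents do not look like an
# email address. Assumes a grep with -o support (GNU or BSD grep).
invalid_emails() {
    # Stage 1: print each <EmailAddress>...</EmailAddress> on its own line.
    grep -Eo '<EmailAddress>[^<]*</EmailAddress>' "$1" |
        # Stage 2: drop the lines that match a loose email pattern,
        # leaving only the suspicious ones.
        grep -Ev '^<EmailAddress>[^@<[:space:]]+@[^@<[:space:]]+\.[A-Za-z]+</EmailAddress>$'
}

# Usage: invalid_emails 456.xml
```

Note that grep -v exits with a non-zero status when every address passes the check, so a calling script should not treat that status as an error.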

paulsm4 · 12-04-2008, 11:12 PM

kingmaker2003 -

I concur with Telemachos and chrism01. Do yourself a big favor, and learn just enough Perl to parse a little bit of your file. Then see how easy it is to call Perl from your script. Just try it, and I think you'll concur, too.

IMHO .. PSM