LinuxQuestions.org

How To get the data from a tag in XML File (https://www.linuxquestions.org/questions/programming-9/how-to-get-the-data-from-a-tag-in-xml-file-688066/)

kingmaker2003 12-03-2008 12:46 PM

How to get data from XML file tags (from data tags)
 
Hi, I have a tag in an XML file on Unix like this:

<EmailAddress>abc@gmail.com</EmailAddress>

This tag appears multiple times in the XML file, and the data is all on one continuous line, like below:

<State>UN</State><Zip/><CompanyName/><EmailAddress>FDF@gmail.COM</EmailAddress><PromoType>UNKNOWN</PromoType></Promotion></PromotionList<State>UN</State><Zip/><CompanyName/><EmailAddress>zd4946@gmail.com</EmailAddress>

I have to check whether the data between the <EmailAddress> tags is valid, i.e. whether it is an email address or not.

I also have to find the length of the attribute, meaning the tag's value. The script is in ksh.

Sorry if this has already been asked. I checked, but I didn't find anything that exactly matched my requirement.


Any help with this?

Telemachos 12-03-2008 01:47 PM

You could try to write a regular expression to do this, but parsing XML, HTML, etc. is notoriously difficult. If ksh is like Bash in terms of syntax, that sounds like trying to sculpt a piece of marble with a spoon. I would recommend looking at a scripting language with XML parsers available (e.g., Perl, Python, or Ruby).

kingmaker2003 12-03-2008 01:52 PM

Quote:

Originally Posted by Telemachos (Post 3363359)
You could try to write a regular expression to do this, but parsing XML, HTML, etc. is notoriously difficult. If ksh is like Bash in terms of syntax, that sounds like trying to sculpt a piece of marble with a spoon. I would recommend looking at a scripting language with XML parsers available (e.g., Perl, Python, or Ruby).

I think we can do it with awk. I found a command, but it only returns the first occurrence of the <EmailAddress></EmailAddress> tag:

Code:

awk -F '</?EmailAddress>' '{print $2}' 456.xml

But I need it for every occurrence: the <EmailAddress> tag appears many times in the file, so I need to check the whole XML file wherever the <EmailAddress></EmailAddress> tag is present.
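A minimal sketch of one possible fix, assuming a POSIX awk. With -F '</?EmailAddress>' the fields alternate between "text outside the tags" and "an address", so when several tags share one line the addresses sit in every even-numbered field ($2, $4, $6, ...). Printing only $2 therefore stops at the first one; looping over the even fields catches them all. The check_emails name and the address pattern are illustrative assumptions, and the pattern is a rough plausibility test, not a full RFC 5322 validator. It is plain sh, so it should also run under ksh.

```shell
#!/bin/sh
# check_emails is a hypothetical helper name.
# Prints: address <TAB> length <TAB> valid|INVALID, one line per <EmailAddress> tag.
# Reads the files given as arguments, or stdin when there are none.
check_emails() {
    awk -F '</?EmailAddress>' '
    {
        # Even-numbered fields are the text between <EmailAddress> and </EmailAddress>.
        for (i = 2; i <= NF; i += 2) {
            addr = $i
            # Rough check: local part, "@", domain, ".", letters-only TLD.
            if (addr ~ /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]+$/)
                status = "valid"
            else
                status = "INVALID"
            printf "%s\t%d\t%s\n", addr, length(addr), status
        }
    }' "$@"
}

# Demo on the sample line from the question; prints (tab-separated):
#   FDF@gmail.COM     13  valid
#   zd4946@gmail.com  16  valid
printf '%s\n' '<State>UN</State><Zip/><CompanyName/><EmailAddress>FDF@gmail.COM</EmailAddress><PromoType>UNKNOWN</PromoType></Promotion></PromotionList<State>UN</State><Zip/><CompanyName/><EmailAddress>zd4946@gmail.com</EmailAddress>' | check_emails
```

For the file in the question this would be run as: check_emails 456.xml. Like any regex approach it assumes the tags are written exactly as in the sample (no attributes, no CDATA), which is the risk the other replies point out.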

chrism01 12-03-2008 05:25 PM

Concur with Telemachos

kingmaker2003 12-04-2008 08:33 AM

Quote:

Originally Posted by chrism01 (Post 3363543)
Concur with Telemachos

What does that mean?

chrism01 12-04-2008 05:54 PM

http://dictionary.reference.com/dic?...&search=search
:)

xhypno 12-04-2008 07:26 PM

Quote:

Originally Posted by kingmaker2003 (Post 3363367)
I think we can do it with awk. I found a command, but it only returns the first occurrence of the <EmailAddress></EmailAddress> tag:

Code:

awk -F '</?EmailAddress>' '{print $2}' 456.xml

But I need it for every occurrence: the <EmailAddress> tag appears many times in the file, so I need to check the whole XML file wherever the <EmailAddress></EmailAddress> tag is present.

Take a look at egrep's multiline/return regex searching. It will allow you to parse the file for each occurrence of <></> and then pipe that to another egrep that uses -v and looks for <></>.
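A sketch of the two-stage grep idea, with one substitution: instead of a multiline search it uses grep -o, which prints each match on its own line and so splits the run-together tags apart. This assumes a grep that supports -o and -h (GNU and BSD grep do; POSIX does not require them). A second grep -v then keeps only the values that fail a rough address pattern. The function names and the pattern are hypothetical.

```shell
#!/bin/sh
# extract_emails and list_invalid_emails are hypothetical helper names.

# Stage 1: grep -o prints every <EmailAddress>...</EmailAddress> match on its
# own line, even when many share one input line (-h drops file-name prefixes
# when several files are given); sed then strips the opening and closing tags.
extract_emails() {
    grep -ohE '<EmailAddress>[^<]*</EmailAddress>' "$@" |
        sed -e 's|<EmailAddress>||' -e 's|</EmailAddress>||'
}

# Stage 2: grep -v keeps only the values that do NOT look like an address,
# i.e. the ones that need attention.
list_invalid_emails() {
    extract_emails "$@" |
        grep -vE '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]+$'
}
```

Usage would be along the lines of: list_invalid_emails 456.xml. For the lengths the asker also wanted, the extracted values can be piped on, e.g. extract_emails 456.xml | awk '{ print length($0), $0 }'. Note that grep -v exits with status 1 when every address passes, which matters if the calling script uses set -e.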

paulsm4 12-04-2008 11:12 PM

kingmaker2003 -

I concur with Telemachos and Chrism01. Do yourself a big favor, and learn just enough Perl to parse a little bit of your file. Then see how easy it is to call Perl from your script. Just try it - and I think you'll concur, too.

IMHO .. PSM


All times are GMT -5.