LinuxQuestions.org

akhand jyoti 11-29-2011 02:05 PM

how to modify xml file using sed/awk
 
Hi all,
I have an XML file with the following contents:
$ cat phonebook.xml

<name>akhand jyoti</name>
<add>bangalore</add>
<num>123456</num>
<name>vivek roy</name>
<add>new delhi</add>
<num>765432</num>
<name>suhana</name>
<add>saudi arab</add>
<num>768768</num>

I want to delete the second half of each name (whatever comes after the space). The modified XML should look like this:

$ cat phonebook.xml

<name>akhand</name>
<add>bangalore</add>
<num>123456</num>
<name>vivek</name>
<add>new delhi</add>
<num>765432</num>
<name>suhana</name>
<add>saudi arab</add>
<num>768768</num>

I tried these two options.

$ sed -i 's^\(<name>\)\([a-z]*\) \([a-z]*\)\(</name>\)^\1\2\4^' phonebook.xml

It neither shows an error nor modifies the file as expected.

Then I tried awk:
$ cat xml_mod.awk
#!/bin/awk -f
BEGIN { FS="<|>" }
/name/ { string1=$3
string2=substr(string1,1,index(string1," ")-1)
$3=string2 }
{ print $0 }

After running it I got:
$ ./xml_mod.awk phonebook.xml

name akhand /name
<add>bangalore</add>
<num>123456</num>
name vivek /name
<add>new delhi</add>
<num>765432</num>
name suhana /name
<add>saudi arab</add>
<num>768768</num>

The angle brackets around name are missing.

Please suggest a fix.
Apart from that, can I modify phonebook.xml directly, rather than redirecting the output to a temp file and then renaming it (as sed's -i option does)?

sycamorex 11-29-2011 02:14 PM

Sorry if I'm missing something, but I can't see any difference between the original file and the one you want to achieve. What space are you talking about?

btw, please wrap your code in the code tags.

Tinkster 11-29-2011 02:43 PM

Quote:

Originally Posted by sycamorex (Post 4537617)
Sorry if I'm missing something, but I can't see any difference between the original file and the one you want to achieve. What space are you talking about?

btw, please wrap your code in the code tags.

The 2nd snippet is missing the last name in <name>.

Tinkster 11-29-2011 02:47 PM

Code:

sed -r '/<name>/ s@>([^ ]+).*</@>\1</@' vivek
<name>akhand</name>
<add>bangalore</add>
<num>123456</num>
<name>vivek</name>
<add>new delhi</add>
<num>765432</num>
<name>suhana</name>
<add>saudi arab</add>
<num>768768</num>

Btw, your sed snippet works here ... maybe the data is wonky?
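To apply the same substitution to the file itself, the -i option from the original attempt can be combined with this command. A minimal sketch, assuming GNU sed; the scratch file made with mktemp and the variable names are only there for illustration, so that phonebook.xml is not touched while testing:

```shell
#!/bin/sh
# Scratch copy of the sample data from the question (illustrative only).
f=$(mktemp)
cat > "$f" <<'EOF'
<name>akhand jyoti</name>
<add>bangalore</add>
<num>123456</num>
<name>vivek roy</name>
<add>new delhi</add>
<num>765432</num>
<name>suhana</name>
<add>saudi arab</add>
<num>768768</num>
EOF

# -i edits the file in place (GNU sed); -r enables extended regexes.
# Only lines containing <name> are touched: everything from the first
# space up to the closing tag is dropped.
sed -i -r '/<name>/ s@>([^ ]+).*</@>\1</@' "$f"

result=$(cat "$f")
printf '%s\n' "$result"
rm -f "$f"
```

Behind the scenes GNU sed still writes a temporary file and renames it over the original, so -i saves typing rather than avoiding the temp file altogether.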



Cheers,
Tink
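
On the awk question, a likely explanation for the missing angle brackets: assigning to $3 makes awk rebuild $0 from its fields, joined by OFS (a single space by default), so the < and > that served as field separators are dropped. A minimal sketch that sidesteps this by editing $0 with sub() instead of assigning to a field; the regex and the here-document are illustrative and not taken from the thread:

```shell
#!/bin/sh
# Sample data from the question, fed to awk through a here-document.
result=$(awk '
# On <name> lines, delete from the first space up to the closing tag.
# sub() edits $0 directly, so no field is assigned and the record is
# never rejoined with OFS; the angle brackets stay where they are.
/<name>/ { sub(/ [^<]*<\/name>/, "</name>") }
{ print }
' <<'EOF'
<name>akhand jyoti</name>
<add>bangalore</add>
<num>123456</num>
<name>vivek roy</name>
<add>new delhi</add>
<num>765432</num>
<name>suhana</name>
<add>saudi arab</add>
<num>768768</num>
EOF
)
printf '%s\n' "$result"
```

A name without a space, such as suhana, simply does not match the pattern and is printed unchanged.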

