LinuxQuestions.org


santosh0782 07-10-2014 06:22 AM

How to pull only values from the XML tags?
 
I have a file test.txt
Code:

cat test.txt

<data_block><customer>P1584 - INGENICO SOUTH COOP          </customer><Connect_Type>Internet      </Connect_Type><description>,                          </description>
<File>003 to BMS at 02:10</File>                                                                                                                                       
<File>023 to RBS at 03:10</File>                                                                                                                                       
<File>004 to BMS at 04:10</File>                                                                                                                                       
<File>007 to BMS at 05:01</File>                                                                                                                                       
<File>002 to BMS at 05:01</File>                                                                                                                                       
<File>001 to BMS at 05:01</File>                                                                                                                                       
<ACQ><ACQ_Detail>BMS - Expected  2 files between 02:00 - 05:30</ACQ_Detail><ACQ_Found>  5</ACQ_Found>                                                                 
<ACQ_TEXT>                        </ACQ_TEXT><ACQ_message>Y                            </ACQ_message></ACQ>                                                         
</data_block>

I want to retrieve a few values from this XML file, with leading and trailing spaces removed.

e.g.

1. To pull the value in <ACQ_message> I tried:

$ sed -n "/P1584/,/<\/data_block>/p" "test.txt"|grep "<ACQ_message>"|awk -F ' ' '{print $2}'|cut -c25-35
output:
Y

What is the best way to get only the value inside <ACQ_message>, with leading and trailing spaces removed? The value could be of any length.

2. Similarly, I want to pull only the value inside the <ACQ_Found> tag. I tried:
$ sed -n "/P1584/,/<\/data_block>/p" "test.txt"|grep "<ACQ_Found>"|awk -F '-' '{print $3}'
output:
05:30</ACQ_Detail><ACQ_Found> 5</ACQ_Found>


Could someone please help?

pan64 07-10-2014 06:34 AM

Do not use sed|grep|awk|cut chains; usually this can be solved with a single sed or awk or perl or ....
Anyway, an XML parser would probably be a better idea.
http://stackoverflow.com/questions/4...ng-shellscript
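
As a rough illustration of the "single sed" idea, here is a minimal sketch. The function name xml_value and its argument order are made up for this example and do not come from the thread. It assumes the opening and closing tag sit on the same line and that the tag occurs at most once per line, which holds for the sample file above; it is plain text processing, not real XML parsing.

```shell
#!/bin/sh
# Hypothetical helper (name and arguments are assumptions, not from the thread).
# $1 = text identifying the block (e.g. P1584), $2 = tag name, $3 = file
xml_value() {
    sed -n "/$1/,/<\/data_block>/ {
        /<$2>/ {
            # drop everything up to and including the opening tag
            s/.*<$2>//
            # drop the closing tag and everything after it
            s/<\/$2>.*//
            # trim leading and trailing whitespace
            s/^[[:space:]]*//
            s/[[:space:]]*\$//
            p
        }
    }" "$3"
}

# Cut-down sample in the same layout as test.txt
sample=$(mktemp)
cat > "$sample" <<'EOF'
<data_block><customer>P1584 - INGENICO SOUTH COOP          </customer><Connect_Type>Internet      </Connect_Type>
<File>003 to BMS at 02:10</File>
<ACQ><ACQ_Detail>BMS - Expected  2 files between 02:00 - 05:30</ACQ_Detail><ACQ_Found>  5</ACQ_Found>
<ACQ_TEXT>                        </ACQ_TEXT><ACQ_message>Y                            </ACQ_message></ACQ>
</data_block>
EOF

xml_value P1584 ACQ_message "$sample"
xml_value P1584 ACQ_Found "$sample"
```

Because the value is whatever remains between the tags after trimming, it does not matter how long it is or how much padding surrounds it. If a tag spans several lines or appears more than once on a line, this sketch will not cope and an XML parser is the safer route.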

ndc85430 07-10-2014 06:40 AM

Yeah, I'd also go with something meant for parsing XML (like Python's ElementTree, but there are undoubtedly many choices).

santosh0782 07-29-2014 04:33 AM

Quote:

Originally Posted by pan64 (Post 5201651)
Do not use sed|grep|awk|cut chains; usually this can be solved with a single sed or awk or perl or ....
Anyway, an XML parser would probably be a better idea.
http://stackoverflow.com/questions/4...ng-shellscript

The provided link is very helpful, thanks a lot :-)

pan64 07-29-2014 06:29 AM

Glad to help.


All times are GMT -5.