LinuxQuestions.org


littlebigman 05-26-2023 10:26 AM

[XML] How to get sub-elements (nodes + texts)?
 
Hello,

Since I'm not getting the subscription e-mails, I can't ask in those tools' forums on Sourceforge.

I tried xidel and XMLStarlet as command-line tools to parse XML files and grab everything (elements + text) below given nodes, to no avail: the tools only return the text they find, not the elements.

Code:

<?xml version="1.0" encoding="UTF-8"?>
<gpx>
  <metadata>
    <name>Some name</name>
  </metadata>
  <trk>
    <name>Track 1</name>
    <trkseg>
      <trkpt lat="48.81782" lon="2.24906">
        <ele>37.5</ele>
      </trkpt>
      <trkpt lat="48.81784" lon="2.24906">
        <ele>37.5</ele>
      </trkpt>
    </trkseg>
  </trk>
  <trk>
    <name>Track 2</name>
    <trkseg>
      <trkpt lat="48.81782" lon="2.24906">
        <ele>37.5</ele>
      </trkpt>
      <trkpt lat="48.81784" lon="2.24906">
        <ele>37.5</ele>
      </trkpt>
    </trkseg>
  </trk>
</gpx>

Code:

xidel -s input.gpx -e "//trk/trkseg/*"
xml sel -t -v "//trk/trkseg/*" input.gpx

Code:

37.5
37.5
37.5
37.5

Am I doing it totally wrong, or are those tools designed to ignore the elements themselves?

Thank you.

--
Edit: Yes, indeed. By default, only the text within the matched elements is output. In xidel, outer-xml() returns the elements themselves:

Code:

xidel input.gpx -e "//trkseg/*/outer-xml(.)"

teckk 05-26-2023 01:17 PM

With your example as MyFile.xml

Example using xml.etree:
Code:

#!/usr/bin/env python3

from xml.etree import ElementTree
from urllib import request

# User agent string for urllib to send with the request
agent = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) '
         'Gecko/20100101 Firefox/113.0')

user_agent = {'User-Agent': agent}

class MakeList():
    def __init__(self, url):

        # Fetch the XML and parse it; the response is closed afterwards
        req = request.Request(url, data=None, headers=user_agent)
        with request.urlopen(req) as response:
            tree = ElementTree.parse(response)
        root = tree.getroot()

        # Print the attributes of every <trkpt> element
        for i in root.iter('trkpt'):
            print(i.attrib)

if __name__ == "__main__":

    # Local file or remote url
    url = 'file:///path/to/MyFile.xml'
    MakeList(url)

Code:

python ./MyFile.py
{'lat': '48.81782', 'lon': '2.24906'}
{'lat': '48.81784', 'lon': '2.24906'}
{'lat': '48.81782', 'lon': '2.24906'}
{'lat': '48.81784', 'lon': '2.24906'}
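
The script above prints only the attributes of each trkpt. If the goal is the one from the first post (the sub-elements themselves, tags and text included), a minimal sketch along the same lines could serialize each match with ElementTree.tostring(). The sample string GPX and the helper name sub_elements_as_xml are made up for illustration, and the path is only an approximation of the //trk/trkseg/* query, since ElementTree supports just a subset of XPath.

```python
from xml.etree import ElementTree

# Shortened copy of the sample file from the first post (assumption:
# inlined here only so the sketch runs without an input file).
GPX = """<gpx>
  <trk>
    <name>Track 1</name>
    <trkseg>
      <trkpt lat="48.81782" lon="2.24906">
        <ele>37.5</ele>
      </trkpt>
      <trkpt lat="48.81784" lon="2.24906">
        <ele>37.5</ele>
      </trkpt>
    </trkseg>
  </trk>
</gpx>"""


def sub_elements_as_xml(xml_text, path=".//trk/trkseg/*"):
    """Return each element matched by `path` serialized as an XML string."""
    root = ElementTree.fromstring(xml_text)
    return [
        # tostring() keeps the tag, attributes and children; strip() drops
        # the whitespace tail that follows each element in the source.
        ElementTree.tostring(node, encoding="unicode").strip()
        for node in root.findall(path)
    ]


if __name__ == "__main__":
    for chunk in sub_elements_as_xml(GPX):
        print(chunk)
```

Each printed chunk is a complete trkpt element with its ele child, rather than just the text 37.5.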


littlebigman 05-26-2023 02:31 PM

Yes, it works fine with Python and ElementTree/BeautifulSoup, but I needed a lighter solution, with just a single binary.

Another way to solve it:

Code:

xidel input.gpx -se "//trk/trkseg/*" --printed-node-format xml

Too bad this isn't shown at the top of the readme, where it would be obvious, instead of at the very bottom ("Only the string value of elements is printed, unless the --printed-node-format is set to XML or HTML. (E.g. <a>bc</a> only prints "bc")").

