Any Way to Read a SINGLE XML VALUE from a big XML File in Linux? Or...?
We got data that was supposed to be CSV, but was sent in a huge XML file.
I've downloaded xmlstarlet, but I'm darned if I can get its "sel" feature to look down a path and return any sort of value. I see pieces of what should be paths, but they seem to have extraneous characters, and I don't know how to use the various <...> fields to make a decent query. For example,
I want to get: <es:mixedModeRadio>false</es:mixedModeRadio> from the below small piece of the XML file: How?
xmlstarlet sel "/<configData dnPrefix="Undefined">/<xn:SubNetwork id="ONRM_ROOT_MO_R">/<xn:SubNetwork id="MyTown">/<xn:MeContext id="LL12345">/<xn:VsDataContainer id="LL12345">"
Is there an easier way? Is there some intermediate step I'm missing?
It'll help to look up XPath syntax (Wikipedia's page has a reasonable overview), since that is the syntax many of xmlstarlet's options take. Knowing XSLT, which is what the sel command uses under the hood, is also helpful. Also look at chapter 5 of the user guide: your document uses namespaces, which are a common complication.
If you want to be precise, this should be the path you want:
Code:
xmlstarlet sel -t -c '/_:bulkCmConfigDataFile/_:configData[@dnPrefix="Undefined"]/xn:SubNetwork[@id="ONRM_ROOT_MO_R"]/xn:SubNetwork[@id="MyTown"]/xn:MeContext[@id="LL12345"]/xn:VsDataContainer[@id="LL12345"]/xn:attributes/es:vsDataMeContext/es:mixedModeRadio' input-file.xml
-t is just the generic option that starts a template, and -c means copy the whole node at the given XPath (use -v instead if you only want the node's text value). Depending on the other nodes in the file, you might be able to get away with less precision:
Code:
xmlstarlet sel -t -c '//xn:VsDataContainer[@id="LL12345"]//es:mixedModeRadio' input-file.xml
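For comparison, here is a rough Python sketch of the same shorter query using only the standard library. It also shows what the xn: and es: prefixes really are: aliases for namespace URIs. The URIs, the sample document and the function name mixed_mode_radio are made up for illustration; substitute whatever the xmlns:xn and xmlns:es attributes in your own file declare. ElementTree only supports a subset of XPath, but this query fits within it.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL namespace URIs: use the values of xmlns:xn and xmlns:es
# from the real file instead.
NS = {"xn": "urn:example:xn", "es": "urn:example:es"}

# Tiny stand-in document (assumed structure, modelled on the path above).
SAMPLE = """\
<configData xmlns:xn="urn:example:xn" xmlns:es="urn:example:es" dnPrefix="Undefined">
  <xn:SubNetwork id="MyTown">
    <xn:MeContext id="LL12345">
      <xn:VsDataContainer id="LL12345">
        <xn:attributes>
          <es:vsDataMeContext>
            <es:mixedModeRadio>false</es:mixedModeRadio>
          </es:vsDataMeContext>
        </xn:attributes>
      </xn:VsDataContainer>
    </xn:MeContext>
  </xn:SubNetwork>
</configData>
"""


def mixed_mode_radio(root, container_id):
    """Return the text of es:mixedModeRadio below the given container, or None."""
    path = f".//xn:VsDataContainer[@id='{container_id}']//es:mixedModeRadio"
    return root.findtext(path, namespaces=NS)


root = ET.fromstring(SAMPLE)
print(mixed_mode_radio(root, "LL12345"))            # false
# The prefix is expanded to its URI internally:
print(root.find(".//es:mixedModeRadio", NS).tag)    # {urn:example:es}mixedModeRadio
```

The second print is the reason a prefixed query fails when the prefix is not bound to the right URI: the element's real name includes the URI, not the prefix.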
XPath is the general syntax for navigating an XML document, and you should find many online tutorials easily. Here is one to get you started.
Xmlstarlet provides good interactive help by command and a simple man page.
What you want is to select certain elements from the document, so to get help on the sel command try this:
Code:
xmlstarlet sel --help |less
XMLStarlet Toolkit: Select from XML document(s)
Usage: xmlstarlet sel <global-options> {<template>} [ <xml-file> ... ]
where
<global-options> - global options for selecting
<xml-file> - input XML document file name/uri (stdin is used if missing)
<template> - template for querying XML document with following syntax:
<global-options> are:
-Q or --quiet - do not write anything to standard output.
-C or --comp - display generated XSLT
-R or --root - print root element <xsl-select>
-T or --text - output is text (default is XML)
-I or --indent - indent output
-D or --xml-decl - do not omit xml declaration line
-B or --noblanks - remove insignificant spaces from XML tree
-E or --encode <encoding> - output in the given encoding (utf-8, unicode...)
-N <name>=<value> - predefine namespaces (name without 'xmlns:')
ex: xsql=urn:oracle-xsql
Multiple -N options are allowed.
--net - allow fetch DTDs or entities over network
--help - display help
Syntax for templates: -t|--template <options>
where <options>
-c or --copy-of <xpath> - print copy of XPATH expression
-v or --value-of <xpath> - print value of XPATH expression
-o or --output <string> - output string literal
-n or --nl - print new line
-f or --inp-name - print input file name (or URL)
-m or --match <xpath> - match XPATH expression
--var <name> <value> --break or
--var <name>=<value> - declare a variable (referenced by $name)
-i or --if <test-xpath> - check condition <xsl:if test="test-xpath">
--elif <test-xpath> - check condition if previous conditions failed
--else - check if previous conditions failed
-e or --elem <name> - print out element <xsl:element name="name">
-a or --attr <name> - add attribute <xsl:attribute name="name">
...
So the basic command syntax to display the values of selected elements would be...
Code:
xmlstarlet sel -t -v 'XPath-expression' source-file.xml
where in your case...
XPath-expression is something like '//es:mixedModeRadio'
Honestly, I would say: "use a real programming language that has proper, built-in support for XML ... preferably, with more than one way of doing it."
The standard way of doing it is to leverage libxml2, which is readily available for all platforms with plenty of "wrappers" for its functionality. Use XPath expressions to request what you want.
Other tools, built for use with "truly massive files," are incremental XML parsers that never attempt to convert the file to an in-memory data structure.
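As a rough illustration of the incremental approach, the sketch below uses iterparse from Python's standard library (one possible tool, not the only one) to stream through the input and stop at the first matching element. The namespace URI, the sample bytes and the function name first_value are assumptions for the example; the real URI is whatever xmlns:es declares in your file.

```python
import io
import xml.etree.ElementTree as ET

# HYPOTHETICAL namespace URI: replace with the xmlns:es value from the real file.
ES_URI = "urn:example:es"
TARGET = "{" + ES_URI + "}mixedModeRadio"


def first_value(source, tag=TARGET):
    """Stream through source (a path or binary file object) and return the
    text of the first element whose qualified name equals tag, else None."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            return elem.text          # stop reading as soon as we have it
        elem.clear()                  # discard content of finished elements
    return None


# Small stand-in input (assumed structure) so the sketch runs on its own.
SAMPLE = b"""<configData xmlns:xn="urn:example:xn" xmlns:es="urn:example:es">
  <xn:MeContext id="LL12345">
    <es:mixedModeRadio>false</es:mixedModeRadio>
  </xn:MeContext>
</configData>"""

print(first_value(io.BytesIO(SAMPLE)))  # false
```

Because the function returns at the first hit, the rest of a very large file is never read. Passing a file name instead of the BytesIO object works the same way.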
Nevertheless:
This is a "thing already done." Do not waste your time trying to re-invent this, with regular expressions or otherwise.
Also, do not waste your time attempting to do this "in a bash-shell script." You can, thanks to #!shebang, use any language you please to "write a shell command." Use a real programming language. You have dozens to choose from.
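To make the shebang point concrete, here is a minimal sketch of such a "shell command" written in Python, aimed at the CSV the original poster expected to receive. It assumes one row per xn:MeContext with its id and its es:mixedModeRadio value; the namespace URIs, the column names and the function name write_csv are hypothetical and would need adjusting to the real file.

```python
#!/usr/bin/env python3
"""Print one CSV row per xn:MeContext: its id and its es:mixedModeRadio value."""
import csv
import sys
import xml.etree.ElementTree as ET

# HYPOTHETICAL namespace URIs: take the real ones from the file's xmlns:* attributes.
NS = {"xn": "urn:example:xn", "es": "urn:example:es"}


def write_csv(root, out):
    """Write a header plus one row per MeContext element to the stream out."""
    writer = csv.writer(out)
    writer.writerow(["MeContext", "mixedModeRadio"])
    for me in root.iterfind(".//xn:MeContext", NS):
        # An empty string is written when a MeContext has no mixedModeRadio.
        value = me.findtext(".//es:mixedModeRadio", default="", namespaces=NS)
        writer.writerow([me.get("id"), value])


if __name__ == "__main__":
    if len(sys.argv) == 2:
        write_csv(ET.parse(sys.argv[1]).getroot(), sys.stdout)
    else:
        print(f"usage: {sys.argv[0]} input-file.xml", file=sys.stderr)
```

Saved under some name of your choosing (say radio2csv, purely as an example) and made executable with chmod +x, it would be run as ./radio2csv input-file.xml > out.csv.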