Old 01-15-2016, 03:02 PM   #1
gmark
LQ Newbie
 
Registered: Nov 2013
Posts: 1

Any Way to Read a SINGLE XML VALUE from a big XML File in Linux? Or...?


We got data that was supposed to be CSV, but was sent in a huge XML file.

I've downloaded xmlstarlet, but I'm darned if I can get its "sel" feature to look down a path and return any sort of value. I see pieces of what should be paths, but they seem to have extraneous characters, and I don't know how to use the various <...> fields to make a decent query. For example,
I want to get <es:mixedModeRadio>false</es:mixedModeRadio> from the small piece of the XML file below. How?

xmlstarlet sel "/<configData dnPrefix="Undefined">/<xn:SubNetwork id="ONRM_ROOT_MO_R">/<xn:SubNetwork id="MyTown">/<xn:MeContext id="LL12345">/<xn:VsDataContainer id="LL12345">"

Is there an easier way? Is there some intermediate step I'm missing?

Here's a very tiny part of a very large file:

<?xml version="1.0" encoding="UTF-8"?>
<bulkCmConfigDataFile xmlns:un="utranNrm.xsd"
xmlns:es="Edward.15.25.xsd"
xmlns:xn="genericNrm.xsd" xmlns:gn="geranNrm.xsd" xmlns="configData.xsd">
<fileHeader fileFormatVersion="32.615 V4.5" vendorName="Edward"/>
<configData dnPrefix="Undefined">
<xn:SubNetwork id="ONRM_ROOT_MO_R">
<xn:SubNetwork id="MyTown">
<xn:attributes>
<xn:userDefinedNetworkType>MY_SERVERS</xn:userDefinedNetworkType>
<xn:userLabel>MyTown</xn:userLabel>
</xn:attributes>
<xn:MeContext id="LL12345">
<xn:VsDataContainer id="LL12345">
<xn:attributes>
<xn:vsDataType>vsDataMeContext</xn:vsDataType>
<xn:vsDataFormatVersion>EdwardSpecificAttributes.15.25</xn:vsDataFormatVersion>
<es:vsDataMeContext>
<es:userLabel>LL12345</es:userLabel>
<es:ipAddress>11.164.0.116</es:ipAddress>
<es:neMIMversion>vF.1.107</es:neMIMversion>
<es:lostSynchronisation>SYNCHRONISED</es:lostSynchronisation>
<es:bcrLastChange>1452424403156</es:bcrLastChange>
<es:bctLastChange>1452160614628</es:bctLastChange>
<es:multiStandardRbs6k>true</es:multiStandardRbs6k>
<es:mixedModeRadio>false</es:mixedModeRadio>
<es:mirrorMIBversion>F.1.100.S.1.6</es:mirrorMIBversion>
<es:stnNodes></es:stnNodes>
</es:vsDataMeContext>
</xn:attributes>
</xn:VsDataContainer>
<xn:ManagedElement id="1">
<xn:attributes>
<xn:locationName></xn:locationName>
<xn:userDefinedState></xn:userDefinedState>
<xn:vendorName>Edward</xn:vendorName>
<xn:userLabel>LL12345</xn:userLabel>
<xn:managedElementType>ERBS</xn:managedElementType>
<xn:swVersion>108991/23_R0DX</xn:swVersion>
<xn:managedBy>SubNetwork=ONRM_ROOT_MO_R,ManagementNode=ONRM</xn:managedBy>
 
Old 01-15-2016, 03:56 PM   #2
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,783

It'll help to look up XPath syntax (Wikipedia's page has a reasonable overview), since XPath is what many of xmlstarlet's options take. Knowing XSLT, which is what the sel command uses under the hood, is also helpful. Also look at chapter 5 of the user guide: your document uses namespaces, which are a common complication.

If you want to be precise, this should be the path you want:
Code:
xmlstarlet sel -t -c '/_:bulkCmConfigDataFile/_:configData[@dnPrefix="Undefined"]/xn:SubNetwork[@id="ONRM_ROOT_MO_R"]/xn:SubNetwork[@id="MyTown"]/xn:MeContext[@id="LL12345"]/xn:VsDataContainer[@id="LL12345"]/xn:attributes/es:vsDataMeContext/es:mixedModeRadio' input-file.xml
-t is just the generic option that starts a template, and -c copies the whole node at the given XPath (use -v instead to print only its text value). Depending on the other nodes in the file, you might be able to get away with less precision:
Code:
xmlstarlet sel -t -c '//xn:VsDataContainer[@id="LL12345"]//es:mixedModeRadio' input-file.xml
 
Old 01-15-2016, 03:57 PM   #3
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,268
Blog Entries: 24

Welcome to LQ!

Xmlstarlet is the right tool, but your usage syntax is not correct for extracting that data.

You will need to learn two things: XPath syntax and Xmlstarlet command options.

XPath is the general syntax for navigating an XML document, and you should find many online tutorials easily; here is one to get you started.

Xmlstarlet provides good interactive help by command and a simple man page.

What you want is to select certain elements from the document, so to get help on the sel command try this:

Code:
xmlstarlet sel --help |less

XMLStarlet Toolkit: Select from XML document(s)
Usage: xmlstarlet sel <global-options> {<template>} [ <xml-file> ... ]
where
  <global-options> - global options for selecting
  <xml-file> - input XML document file name/uri (stdin is used if missing)
  <template> - template for querying XML document with following syntax:

<global-options> are:
  -Q or --quiet             - do not write anything to standard output.
  -C or --comp              - display generated XSLT
  -R or --root              - print root element <xsl-select>
  -T or --text              - output is text (default is XML)
  -I or --indent            - indent output
  -D or --xml-decl          - do not omit xml declaration line
  -B or --noblanks          - remove insignificant spaces from XML tree
  -E or --encode <encoding> - output in the given encoding (utf-8, unicode...)
  -N <name>=<value>         - predefine namespaces (name without 'xmlns:')
                              ex: xsql=urn:oracle-xsql
                              Multiple -N options are allowed.
  --net                     - allow fetch DTDs or entities over network
  --help                    - display help

Syntax for templates: -t|--template <options>
where <options>
  -c or --copy-of <xpath>   - print copy of XPATH expression
  -v or --value-of <xpath>  - print value of XPATH expression
  -o or --output <string>   - output string literal
  -n or --nl                - print new line
  -f or --inp-name          - print input file name (or URL)
  -m or --match <xpath>     - match XPATH expression
  --var <name> <value> --break or
  --var <name>=<value>      - declare a variable (referenced by $name)
  -i or --if <test-xpath>   - check condition <xsl:if test="test-xpath">
  --elif <test-xpath>       - check condition if previous conditions failed
  --else                    - check if previous conditions failed
  -e or --elem <name>       - print out element <xsl:element name="name">
  -a or --attr <name>       - add attribute <xsl:attribute name="name">
 ...
So the basic command syntax to display the values of selected elements would be...

Code:
xmlstarlet sel -t -v 'XPath-expression' source-file.xml

where in your case...

XPath-expression is something like '//es:mixedModeRadio'
Hope that helps to get you started!
 
Old 01-17-2016, 10:51 AM   #4
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,665
Blog Entries: 4

Honestly, I would say: "use a real programming language that has proper, built-in support for XML ... preferably, with more than one way of doing it."

T-h-e standard way of doing it is to leverage libxml2, which is readily available for all platforms with plenty of "wrappers" for its functionality. Use XPath expressions to request what you want.
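
As a rough illustration of that advice, here is a minimal sketch in Python. Note the assumptions: it uses the standard library's xml.etree.ElementTree (built on expat rather than libxml2, and supporting only a subset of XPath) so that it runs without extra packages; the sample is a trimmed copy of the fragment from post #1, closed off so it is well-formed; and the names NS and mixed_mode_radio are made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the fragment from post #1, with closing tags added.
SAMPLE = """<bulkCmConfigDataFile xmlns:un="utranNrm.xsd"
 xmlns:es="Edward.15.25.xsd"
 xmlns:xn="genericNrm.xsd" xmlns:gn="geranNrm.xsd" xmlns="configData.xsd">
<fileHeader fileFormatVersion="32.615 V4.5" vendorName="Edward"/>
<configData dnPrefix="Undefined">
<xn:SubNetwork id="ONRM_ROOT_MO_R">
<xn:SubNetwork id="MyTown">
<xn:MeContext id="LL12345">
<xn:VsDataContainer id="LL12345">
<xn:attributes>
<xn:vsDataType>vsDataMeContext</xn:vsDataType>
<es:vsDataMeContext>
<es:userLabel>LL12345</es:userLabel>
<es:multiStandardRbs6k>true</es:multiStandardRbs6k>
<es:mixedModeRadio>false</es:mixedModeRadio>
</es:vsDataMeContext>
</xn:attributes>
</xn:VsDataContainer>
</xn:MeContext>
</xn:SubNetwork>
</xn:SubNetwork>
</configData>
</bulkCmConfigDataFile>
"""

# Prefix -> namespace URI, copied from the xmlns declarations on the root.
# The prefixes are only local labels; the URIs are what must match.
NS = {
    "cfg": "configData.xsd",   # the document's default namespace
    "xn": "genericNrm.xsd",
    "es": "Edward.15.25.xsd",
}


def mixed_mode_radio(root, container_id):
    """Return the text of es:mixedModeRadio inside the given container, or None."""
    path = f".//xn:VsDataContainer[@id='{container_id}']//es:mixedModeRadio"
    return root.findtext(path, namespaces=NS)


root = ET.fromstring(SAMPLE)
print(mixed_mode_radio(root, "LL12345"))  # prints: false
```

For a real file you would call ET.parse("input-file.xml").getroot() instead of ET.fromstring. If you need full XPath 1.0, a libxml2 wrapper such as lxml is the usual choice, but it is a separate install.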

Other tools, built for use with "truly massive files," are incremental XML parsers that never attempt to convert the file to an in-memory data structure.
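
To give an idea of what incremental parsing looks like, here is a hedged sketch using iterparse from Python's standard xml.etree.ElementTree. It is only one possible approach, not the tool the paragraph above has in mind. Assumptions: the function name stream_values is invented, the second MeContext (LL99999) in the sample is hypothetical data added so there is more than one result, and an in-memory buffer stands in for the real file.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree spells namespaced tags as "{namespace-uri}localname".
XN = "{genericNrm.xsd}"
ES = "{Edward.15.25.xsd}"

# Cut-down sample; the LL99999 entry is hypothetical.
SAMPLE = """<bulkCmConfigDataFile xmlns:es="Edward.15.25.xsd"
 xmlns:xn="genericNrm.xsd" xmlns="configData.xsd">
<configData dnPrefix="Undefined">
<xn:SubNetwork id="MyTown">
<xn:MeContext id="LL12345">
<xn:VsDataContainer id="LL12345">
<xn:attributes><es:vsDataMeContext>
<es:mixedModeRadio>false</es:mixedModeRadio>
</es:vsDataMeContext></xn:attributes>
</xn:VsDataContainer>
</xn:MeContext>
<xn:MeContext id="LL99999">
<xn:VsDataContainer id="LL99999">
<xn:attributes><es:vsDataMeContext>
<es:mixedModeRadio>true</es:mixedModeRadio>
</es:vsDataMeContext></xn:attributes>
</xn:VsDataContainer>
</xn:MeContext>
</xn:SubNetwork>
</configData>
</bulkCmConfigDataFile>
"""


def stream_values(source, leaf="mixedModeRadio"):
    """Yield (container id, text) for every es:<leaf>, reading incrementally."""
    open_containers = []  # ids of the VsDataContainers we are currently inside
    for event, elem in ET.iterparse(source, events=("start", "end")):
        is_container = elem.tag == XN + "VsDataContainer"
        if event == "start" and is_container:
            # Attributes are already available on the "start" event.
            open_containers.append(elem.get("id"))
        elif event == "end" and elem.tag == ES + leaf:
            owner = open_containers[-1] if open_containers else None
            yield owner, elem.text
        elif event == "end" and is_container:
            open_containers.pop()
            # Drop the finished container's children to keep memory low.
            # The emptied element itself stays attached to its parent.
            elem.clear()


# With a real file, pass its name: stream_values("input-file.xml")
for container_id, value in stream_values(io.BytesIO(SAMPLE.encode("utf-8"))):
    print(container_id, value)
```

Each line of output pairs a container id with its value, which is already close to the CSV the original poster was hoping for.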

Nevertheless:
  1. This is a "thing already done." Do not waste your time trying to re-invent this, with regular expressions or otherwise.
  2. Also, do not waste your time attempting to do this "in a bash-shell script." You can, thanks to #!shebang, use any language you please to "write a shell command." Use a real programming language. You have dozens to choose from.
 
  

