LinuxQuestions.org
Old 04-28-2021, 09:08 AM   #16
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,677

Rep: Reputation: Disabled

Quote:
Originally Posted by crts View Post
No, an xpath only selects nodes, you cannot delete nodes via xpath.
Actually, depends on the tools used.
Code:
xmlstarlet ed -d '//*[@lang!="fin"]' input.xml
I don't recommend tools from LT-XML 2 for this; nevertheless, lxreplace also works (more or less):
Code:
ln -s /usr/share/xmltv/xmltv.dtd
lxreplace -q '//*[@lang!="fin"]' -d input.xml|
xmllint --format -
xml_grep provided by Perl module XML::Twig supports only a subset of XPath and uses a slightly different syntax:
Code:
xml_grep -v '//*[@lang and @lang!="fin"]' input.xml|
xml_pp
And of course, you can always do it via XQuery. I'm pretty sure that Xidel, XQilla, Saxon or Zorba can handle this. E.g.
Code:
xidel --xml input.xml -se 'x:replace-nodes(//*[@lang!="fin"],())'|
xmllint --format -
Here is a quick and dirty solution using xml2 and sed:
Code:
xml2 <input.xml|
sed -n '/programme\/[^@]/!p;/@lang=fin$/,/=/p'|
2xml|
xmllint --format -
The same using xml-sed from xml-coreutils. Unfortunately, it's rather poorly documented, so perhaps this could be done more efficiently.
Code:
xml-sed '
  s/@lang=/@x=0&/x
  s/@x=0@lang=fin/@x=1@lang=fin/x
  s/.*@x=0@.*//z
  s/@x=1//x
  ' input.xml|xmllint --format -
Or using hxremove from HTML-XML-utils and filtering by CSS selectors:
Code:
<input.xml hxremove programme ':not([lang=fin])'|
xmllint --format -
There is also yq, a Python wrapper around jq for YAML (not to be confused with another yq written in Go). It provides the command xq for XML processing, so you can also filter XML using a jq expression:
Code:
xq -x 'del(.tv.programme[][][]?|select(."@lang"!="fin"))' input.xml

Last edited by shruggy; 04-28-2021 at 02:15 PM.
 
Old 04-29-2021, 04:04 AM   #17
crts
Senior Member
 
Registered: Jan 2010
Posts: 2,020

Rep: Reputation: 757
Quote:
Originally Posted by shruggy View Post
Actually, depends on the tools used.
Code:
xmlstarlet ed -d '//*[@lang!="fin"]' input.xml
Actually, it is as I said. The xpath itself cannot delete nodes. The xpath used in xmlstarlet still only returns a selection of nodes. If one wants to delete this selection from the XML document then one needs to use another tool (e.g., an XSLT processor), just as I suggested. I am not familiar with xmlstarlet but if it can delete nodes based on an xpath selection then it is to be preferred over an XSLT processor.
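The split described here (the selection picks the nodes, the host tool or language removes them) can be sketched in Python with only the standard library. This is an illustration, not what xmlstarlet does internally; the function name drop_other_languages and its keep parameter are made up for the example.

```python
import xml.etree.ElementTree as ET


def drop_other_languages(root, keep="fin"):
    """Remove every element whose lang attribute is set and differs from `keep`."""
    # Snapshot the tree first, because children are removed while walking it.
    for parent in list(root.iter()):
        for child in list(parent):
            lang = child.get("lang")
            if lang is not None and lang != keep:
                # The selection only found the node; the removal is a
                # separate operation performed through the parent.
                parent.remove(child)
    return root
```

Elements without a lang attribute are left alone, which mirrors the '@lang and @lang!="fin"' form used in the xml_grep example above.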
 
Old 04-30-2021, 11:35 AM   #18
mimorek
Member
 
Registered: Feb 2013
Distribution: Debian (jessie)
Posts: 42

Rep: Reputation: Disabled
Code:
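#!/bin/bash
# Line-based filter: reads the XML on stdin and writes the result to stdout.
# Keeps lines without a lang attribute and lines with lang="fin".
# Only works if every element sits on its own line, as in the sample file.

# IFS= and -r keep indentation and backslashes intact;
# the || test also handles a last line without a trailing newline.
while IFS= read -r line || [[ -n $line ]]
do
  if [[ $line == *'lang='* && $line != *'lang="fin"'* ]]
  then
    continue  # element in another language: drop it
  fi
  printf '%s\n' "$line"
done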
 
Old 10-07-2021, 07:27 AM   #19
Jtmstr09
LQ Newbie
 
Registered: Apr 2021
Posts: 3

Rep: Reputation: Disabled
Next puzzle.

What if I want to add custom poster images to the programmes in the XML? I have a text file that contains title and poster strings. Can I somehow search the XML file for the titles and, if one is found, add the poster string right below it?

Example text file:

Code:
<title lang="fin">Ei lähetystä</title>=<icon src="https://exampleposter.jpg"></icon>
# If this is found                       New line
So

Code:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE tv SYSTEM "xmltv.dtd">
<tv generator-info-name="TVHeadend-4.2.8-34~g24a2f59e9" source-info-name="tvh-Tvheadend">
<channel id="f7fa62af37560ce2835bf5a1ec414b2a">
  <display-name>Sky News</display-name>
  <display-name>85</display-name>
</channel>
<channel id="7c0cb8307321aa08b16e4ec05711e672">
  <display-name>HISTORY HD</display-name>
  <display-name>124</display-name>
</channel>
<programme start="20210425190000 +0300" stop="20210425230000 +0300" channel="6751678153fa02ec1bc10d516d2d1450">
  <title lang="fin">Ei lähetystä</title>
  <sub-title lang="fin">Kanavalla ei tällä hetkellä lähetetä ohjelmaa.</sub-title>
  <desc lang="fin">Kanavalla ei tällä hetkellä lähetetä ohjelmaa.</desc>
</programme>
<programme start="20210425230000 +0300" stop="20210426000000 +0300" channel="6751678153fa02ec1bc10d516d2d1450">
  <title lang="fin">Ei lähetystä</title>
  <sub-title lang="fin">Kanavalla ei tällä hetkellä lähetetä ohjelmaa.</sub-title>
  <desc lang="fin">Kanavalla ei tällä hetkellä lähetetä ohjelmaa.</desc>
</programme>
Becomes

Code:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE tv SYSTEM "xmltv.dtd">
<tv generator-info-name="TVHeadend-4.2.8-34~g24a2f59e9" source-info-name="tvh-Tvheadend">
<channel id="f7fa62af37560ce2835bf5a1ec414b2a">
  <display-name>Sky News</display-name>
  <display-name>85</display-name>
</channel>
<channel id="7c0cb8307321aa08b16e4ec05711e672">
  <display-name>HISTORY HD</display-name>
  <display-name>124</display-name>
</channel>
<programme start="20210425190000 +0300" stop="20210425230000 +0300" channel="6751678153fa02ec1bc10d516d2d1450">
  <title lang="fin">Ei lähetystä</title>
  <icon src="https://exampleposter.jpg"></icon>
  <sub-title lang="fin">Kanavalla ei tällä hetkellä lähetetä ohjelmaa.</sub-title>
  <desc lang="fin">Kanavalla ei tällä hetkellä lähetetä ohjelmaa.</desc>
</programme>
<programme start="20210425230000 +0300" stop="20210426000000 +0300" channel="6751678153fa02ec1bc10d516d2d1450">
  <title lang="fin">Ei lähetystä</title>
  <icon src="https://exampleposter.jpg"></icon>
  <sub-title lang="fin">Kanavalla ei tällä hetkellä lähetetä ohjelmaa.</sub-title>
  <desc lang="fin">Kanavalla ei tällä hetkellä lähetetä ohjelmaa.</desc>
</programme>
 
Old 10-07-2021, 07:37 AM   #20
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,617

Rep: Reputation: 2555
Quote:
Originally Posted by Jtmstr09 View Post
What if I want to add custom poster images to the programmes in the XML? I have a text file that contains title and poster strings. Can I somehow search the XML file for the titles and, if one is found, add the poster string right below it?
Show what you have tried so far.

 
Old 10-07-2021, 07:50 AM   #21
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,337
Blog Entries: 3

Rep: Reputation: 3732
Quote:
Originally Posted by Jtmstr09 View Post
Between <programme> and </programme>, every line that includes lang="fin" will be kept; any other line is removed.
XPaths can only select, but depending on the circumstances you might be able to craft an XPath which selects the elements you want while excluding the others. You'll need to provide some more concrete examples, but here is an approximate solution:

Code:
//programme//*[@lang="fin"]
That will select any element, n levels deep, within the <programme> elements, if the attribute lang is equal to "fin". If you meant for the attribute to be applied to the <programme> element itself, then the XPath is a little different.

Code:
//programme[@lang="fin"]
There's more of course, but, again, more specific examples are needed for a more specific answer. So show more of your data as well as the XPaths you have been trying.

For a quick overview of possibilities, see the XPath Cheat Sheet.
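For anyone who wants to try the two expressions outside a command-line tool, here is a small Python sketch. It assumes the standard library's xml.etree.ElementTree, which supports only a limited XPath subset and needs the paths written relative to the root (hence the leading dot); the two function names are made up for the example.

```python
import xml.etree.ElementTree as ET


def finnish_children(root):
    """Elements at any depth inside <programme> that carry lang="fin"."""
    return root.findall(".//programme//*[@lang='fin']")


def finnish_programmes(root):
    """<programme> elements that themselves carry lang="fin"."""
    return root.findall(".//programme[@lang='fin']")
```

Both calls only return a list of matching elements; nothing in the document is changed.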

Last edited by Turbocapitalist; 10-07-2021 at 07:51 AM.
 
Old 10-07-2021, 08:54 AM   #22
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,677

Rep: Reputation: Disabled
There was a similar thread not long ago.

IIUC you want to insert different poster image data based on the name of a TV show, which is in the <title> tag. I'd suggest doing this in two steps:
  1. add an empty tag <icon> after each <title>;
  2. go through the file again and for each <title> matching the name from your text file add the corresponding src attribute to the following <icon>.
Do it in a loop for all the TV shows from your list.

This will show how to do the matching for one iteration:
Code:
show1='Ei lähetystä'
poster1='https://exampleposter.jpg'
<input.xml xmlstarlet ed -a //title -t elem -n icon |
  xmlstarlet ed -a "//title[text()='$show1']/following-sibling::icon" -t attr -n src -v "$poster1"
For the text file, I'd suggest TSV format (tab separated values). Something like
Code:
p0.jpg	Ei lähetystä
p1.jpg	Neighbours
p2.jpg	Hope And Faith
p3.jpg	My Wife And Kids
p4.jpg	Friends
p5.jpg	Date Movie
p6.jpg	Chaos
p7.jpg	Girls Of The Playboy Mansion
p8.jpg	E! News
p9.jpg	The Daily 10
I guess you'll have to replace double quotes in show names with &quot; and ampersands with &amp;.
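If a scripting language is an option, the same idea can be sketched in Python with only the standard library. This is a rough sketch under assumptions: the TSV layout is the poster<TAB>title one shown above, the function names load_posters and add_icons are made up, and it inserts the <icon> in a single pass instead of the two xmlstarlet steps. ElementTree escapes quotes and ampersands itself when writing the file, so the manual &quot; and &amp; replacement is not needed here.

```python
import csv
import io
import xml.etree.ElementTree as ET


def load_posters(tsv_file):
    """Map show title -> poster URL from 'poster<TAB>title' lines."""
    # QUOTE_NONE keeps double quotes in show names as literal characters.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[1]: row[0] for row in reader if len(row) >= 2}


def add_icons(root, posters):
    """Insert <icon src="..."/> right after every <title> listed in posters."""
    for programme in root.iter("programme"):
        # Walk backwards so earlier indexes stay valid after an insert.
        for index, child in reversed(list(enumerate(programme))):
            if child.tag == "title" and child.text in posters:
                icon = ET.Element("icon", {"src": posters[child.text]})
                icon.tail = child.tail  # keep the original indentation
                programme.insert(index + 1, icon)
    return root


if __name__ == "__main__":
    demo_tsv = io.StringIO("https://exampleposter.jpg\tEi lähetystä\n")
    demo_xml = ('<tv><programme><title lang="fin">Ei lähetystä</title>'
                '<desc lang="fin">...</desc></programme></tv>')
    tree_root = add_icons(ET.fromstring(demo_xml), load_posters(demo_tsv))
    print(ET.tostring(tree_root, encoding="unicode"))
```

For a real file you would parse with ET.parse("input.xml") and write the result back with the tree's write() method; note that ElementTree does not preserve the DOCTYPE line.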

Last edited by shruggy; 10-07-2021 at 11:28 AM.
 
Old 10-14-2021, 09:05 AM   #23
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,679
Blog Entries: 4

Rep: Reputation: 3947
There is an industry-standard binary library called libxml2 which is exposed by most programming languages. It is the "go-to tool" for manipulating and searching XML documents. It offers "XPath expressions", which are an excellent way to search for content without writing code.

There are other tools, such as SAX, which are useful for handling very large XML documents. This tool is also exposed by most languages. It iterates through the document without first building an in-memory data structure, calling your handler subroutines at specified points.

You should never attempt to manipulate XML (or, JSON ...) using "regular expressions." You won't get it quite right, and you will waste a lot of your time in the attempt.

"bash" is suitable only for the most-basic things: it was never intended to be a programming language. (Only the Korn Shell – ksh – attempted to build-in an actual "language.") But you can put #!shebang as the first line of your script and thereafter implement it in any "real" language of your choosing. No one will ever know.

Last edited by sundialsvcs; 10-14-2021 at 09:09 AM.
 
Old 10-14-2021, 06:02 PM   #24
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,786

Rep: Reputation: 2083
Quote:
Originally Posted by sundialsvcs View Post
There is an industry-standard binary library called libxml2 which is exposed by most programming languages. It is the "go-to tool" for manipulating and searching XML documents.
Which incidentally, is what xmlstarlet uses.

Quote:
There are other tools, such as SAX, which are useful for handling very large XML documents. This tool is also exposed by most languages.
This is confused: SAX is a style of API, not a tool nor an alternative to libxml2. Libxml2 provides it: http://xmlsoft.org/html/libxml-SAX2.html
 
Old 10-15-2021, 11:43 AM   #25
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,679
Blog Entries: 4

Rep: Reputation: 3947
No, I am correct.

SAX is an XML parser that is designed to work with "enormous" files that won't work as an in-memory data structure. It walks through the XML file always treating it as a file. It calls your routines at the points which you designate.

"libxml2" is the tool used by most languages, which also has the benefit that the software library which produced the file that you are now reading is probably the same one that you are now using to read it. It doesn't matter if different languages were used as long as all of them are using the same library to actually do the work.

Last edited by sundialsvcs; 10-15-2021 at 11:48 AM.
 
Old 10-15-2021, 12:15 PM   #26
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,269
Blog Entries: 24

Rep: Reputation: 4206
Although there may be a tool or an application named sax (I don't really know in this context), having implemented a few SAX parsers myself, to my mind SAX generally refers to an algorithm, a mechanism, an API - that is an approach to the problem of parsing very large XML files, but not an application or a library itself.

Last edited by astrogeek; 10-15-2021 at 02:22 PM.
 
Old 10-15-2021, 03:08 PM   #27
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,677

Rep: Reputation: Disabled
@sundialsvcs. Are you confusing SAX with Saxon perhaps? See my post #16 above.
 
Old 10-15-2021, 10:22 PM   #28
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,786

Rep: Reputation: 2083
Quote:
Originally Posted by sundialsvcs View Post
SAX is an XML parser that is designed to work with "enormous" files that won't work as an in-memory data structure. It walks through the XML file always treating it as a file. It calls your routines at the points which you designate.
Yes, libxml2 provides a SAX parser like this (in addition to APIs which build up an in-memory structure for when that is more convenient).
 
Old 10-16-2021, 07:05 AM   #29
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,617

Rep: Reputation: 2555

You're all equally wrong. :P

Quote:
Originally Posted by http://www.saxproject.org/
SAX is the Simple API for XML, originally a Java-only API. SAX was the first widely adopted API for XML in Java, and is a “de facto” standard. The current version is SAX 2.0.1, and there are versions for several programming language environments other than Java.
Quote:
Originally Posted by https://en.wikipedia.org/wiki/Simple_API_for_XML
SAX (Simple API for XML) is an event-driven online algorithm for parsing XML documents, with an API developed by the XML-DEV mailing list. SAX provides a mechanism for reading data from an XML document that is an alternative to that provided by the Document Object Model (DOM). Where the DOM operates on the document as a whole—building the full abstract syntax tree of an XML document for convenience of the user—SAX parsers operate on each piece of the XML document sequentially, issuing parsing events while making a single pass through the input stream.
Quote:
Originally Posted by http://www.megginson.com/downloads/SAX/
David Megginson, principal of Megginson Technologies, led the development of the Simple API for XML (SAX), a widely-used specification that describes how XML parsers can pass information efficiently from XML documents to software applications. SAX was originally implemented in Java, but is now supported by nearly all major programming languages.
So "SAX" can refer to a specific API, and/or the generalised algorithm/specification, and/or any parser which implements that.
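To make the event-driven style concrete, here is a minimal sketch using Python's standard xml.sax module, which is one implementation of the SAX API (it is backed by expat, not libxml2). The handler class FinnishTitles and the helper finnish_titles are made-up names for this example; the parser calls the handler's methods as it streams through the input and never builds a tree.

```python
import xml.sax


class FinnishTitles(xml.sax.ContentHandler):
    """Collect the text of every <title lang="fin"> while streaming."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # a list while inside a matching <title>

    def startElement(self, name, attrs):
        if name == "title" and attrs.get("lang") == "fin":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def finnish_titles(xml_bytes):
    """Run the handler over an in-memory document and return the titles."""
    handler = FinnishTitles()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles
```

For a large file, xml.sax.parse(path, handler) does the same thing while reading from disk, so memory use depends on what the handler keeps, not on the size of the document.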

 
Old 10-16-2021, 10:23 AM   #30
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,679
Blog Entries: 4

Rep: Reputation: 3947
Let me just step back from the mostly-irrelevant messy details and say that "there are two major ways to do it." One translates the XML into an in-memory data structure, while the other one does not. Both approaches push the handling of the XML document out of "your code, which probably has a bug in it," and into "their code, which probably doesn't." The latter strategy is most appropriate when the document is gigantic – as some XML documents certainly are.

The only "distinctly wrong approach" is to treat it as a text file and try to attack it using your own regexes. That's the "Hotel California mistake." "They stab it with their steely knives, but they just can't kill the beast."

Last edited by sundialsvcs; 10-16-2021 at 10:25 AM.
 
  

