LinuxQuestions.org - Bash script to strip some content from XML file.


I've got a large XML file containing TV listings for my MythTV box, and I want to filter it so I'm left with just the channels I receive.

It's in this form:

Code:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE tv SYSTEM "xmltv.dtd">
<tv lots=of tags=which give=basic_info>

<channel id="1012.dvb.guide">
        <display-name>Nat Geographic</display-name>
        <icon src="http://********/epg/icons/national_geographic.jpg" />
</channel>

#There are lots of the above channel sections, and then lots of the following programme sections.

<programme channel="1021.dvb.guide" start="20071013003000 +1300" stop="20071013012500 +1300">
        <title lang="eng">Sexiest Action Heroes</title>
        <desc>They are our sexy heroes and hell-raisers, the lucsious victors and vixens who play by their own rules and always hold the winning hand - smashing a few skulls in the process.</desc>
        <category>tvshow</category>
        <category>Reality</category>
        <rating system="SKY-NZ">
                <value>R16</value>
        </rating>
</programme>

</tv>

<!--
        [{'rating': '6', 'description': 'So then there's a bunch of footer info.....

So we have these sections:

Code:

Header stuff
Channels
Programmes
Footer stuff

So basically, what I need to do is construct another file from this one, containing the header, the channels I have, the programmes on those channels, and then the footer stuff.

The difficult bit (as far as my bash text-processing skills go) is filtering the channels and programmes. I figure I'll put the IDs of the channels I want in a list in a file, from which the script can read them. Then I suppose the best way would be to strip everything that doesn't match, so what I'll end up doing is:

Find a text string which goes:

Code:

<channel id="$i">******</channel>

and if $i isn't in the list, remove it, then do the same kind of thing for the programme entries.
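For comparison, here is a minimal Python sketch of that plan (not the OP's script). It assumes a hypothetical list file with one channel ID per line, with blank lines and `#` comments ignored, and it uses the standard-library `xml.etree.ElementTree` parser instead of matching text. One caveat: ElementTree does not keep the DOCTYPE line or a comment after `</tv>`, so the header and footer would still have to be copied separately.

```python
import xml.etree.ElementTree as ET


def load_wanted(lines):
    """Collect channel IDs from an iterable of lines (e.g. an open file),
    skipping blank lines and '#' comments."""
    wanted = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            wanted.add(line)
    return wanted


def filter_listing(root, wanted):
    """Keep only the <channel> and <programme> children of <tv> whose
    channel ID is in `wanted`; everything else is dropped."""
    keep = []
    for child in root:
        if child.tag == "channel" and child.get("id") in wanted:
            keep.append(child)
        elif child.tag == "programme" and child.get("channel") in wanted:
            keep.append(child)
    root[:] = keep
    return root


# With real files (names are hypothetical):
#   with open("channels.txt") as f:
#       wanted = load_wanted(f)
#   tree = ET.parse("listings.xml")
#   filter_listing(tree.getroot(), wanted)
#   tree.write("filtered.xml")

if __name__ == "__main__":
    sample = """<tv>
<channel id="1012.dvb.guide"><display-name>Nat Geographic</display-name></channel>
<channel id="1021.dvb.guide"><display-name>E!</display-name></channel>
<programme channel="1021.dvb.guide" start="20071013190000 +1300"><title>A</title></programme>
<programme channel="1032.dvb.guide" start="20071016170000 +1300"><title>B</title></programme>
</tv>"""
    wanted = load_wanted(["# my channels", "1021.dvb.guide", ""])
    root = filter_listing(ET.fromstring(sample), wanted)
    print([(c.tag, c.get("id") or c.get("channel")) for c in root])
    # prints [('channel', '1021.dvb.guide'), ('programme', '1021.dvb.guide')]
```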

Cheers!

Try this perl script:

Code:

#!/usr/bin/perl -w
use strict;

my $l_input="progs.xml";
my $l_output="output.xml";
my $l_selection="1012.dvb.guide";
my $l_print=1;    # print the header lines until the first <channel or <programme

open(INPUT, "<", $l_input) or die "Unable to open input file: $!";
open(OUTPUT, ">", $l_output) or die "Unable to open output file: $!";

while (<INPUT>)
{
    if (/^<channel/)
    {
        ## Found an entry for channel, check if it matches criteria
        ## (\Q...\E makes the dots in the ID match literally)
        if (/\Q$l_selection\E/)
        {
            $l_print=1;
        } else {
            $l_print=0;
        }
    }

    if (/^<programme/)
    {
        ## Found an entry for programme, check if it matches criteria
        if (/\Q$l_selection\E/)
        {
            $l_print=1;
        } else {
            $l_print=0;
        }
    }

    if (/^<\/tv/)
    {
        ## End of the listings: print </tv> and the footer
        $l_print=1;
    }

    if ( $l_print == 1 )
    {
        print OUTPUT $_;
    }
}

close(INPUT);
close(OUTPUT);

It prints every line up to the first one that starts with either <programme or <channel.
At each of those lines, it checks the selection and decides whether to print that section.

Once it gets to a line starting with </tv,
it switches printing back on.

The code is not perfect and only allows for one selection criterion, but it's a starting point.
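The same line-by-line toggle can be sketched in Python (an illustration, not a drop-in replacement). It accepts a set of IDs and compares the extracted `id`/`channel` attribute exactly, rather than searching the whole line for the ID. Like the Perl, it assumes every `<channel` or `<programme` start tag begins its own line; the `START_TAG` regex and the `filter_lines` name are made up for this example.

```python
import re

# Matches the start of a line such as
#   <channel id="1012.dvb.guide">
#   <programme channel="1021.dvb.guide" start="..." stop="...">
# and captures the channel ID from the id= or channel= attribute.
START_TAG = re.compile(r'^<(?:channel|programme)\b[^>]*?\b(?:id|channel)="([^"]*)"')


def filter_lines(lines, wanted):
    """Yield the lines to keep, toggling a flag like the Perl $l_print."""
    printing = True                          # header lines are printed
    for line in lines:
        m = START_TAG.match(line)
        if m:
            printing = m.group(1) in wanted  # exact ID match, not a substring
        elif line.startswith("</tv"):
            printing = True                  # print </tv> and the footer
        if printing:
            yield line


# Usage (file names as in the Perl script; ISO-8859-1 as in the XML header):
#   with open("progs.xml", encoding="iso-8859-1") as src, \
#        open("output.xml", "w", encoding="iso-8859-1") as dst:
#       dst.writelines(filter_lines(src, {"1012.dvb.guide", "1021.dvb.guide"}))
```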

Thanks for that, it's very helpful. I tried and tried to make it accept more strings using arrays etc., but I don't really know Perl and didn't manage to get it working.

Is there any chance you could help me get it to accept more? Filling them in at the top of the script is perfectly acceptable for this.

Thanks

Can you provide a bigger sample of data? Maybe the full header/footer and a couple of channels, or maybe pastebin the whole file.

Also, I'm guessing you want some sort of array like this at the top of the script?
mychannels = [chan1, chan2, chan3]

Here's a much larger chunk of data (still far from the whole file):

Code:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE tv SYSTEM "xmltv.dtd">
<tv generator-info-name="epgsnoop/0.12beta" generator-info-url="http://nice.net.nz/epgsnoop" date="20071013030543 +1300">
<channel id="1205.dvb.guide">
        <display-name>Unknown</display-name>
</channel>
<channel id="1012.dvb.guide">
        <display-name>Nat Geographic</display-name>
        <icon src="http://nice.net.nz/epg/icons/national_geographic.jpg" />
</channel>
<channel id="1021.dvb.guide">
        <display-name>E!</display-name>
        <icon src="http://nice.net.nz/epg/icons/e.jpg" />
</channel>
<channel id="1026.dvb.guide">
        <display-name>BBC World</display-name>
        <icon src="http://nice.net.nz/epg/icons/bbc_world.jpg" />
</channel>
<channel id="1031.dvb.guide">
        <display-name>TV One</display-name>
        <icon src="http://nice.net.nz/epg/icons/1.jpg" />
        <url>http://www.tv1.co.nz</url>
</channel>
<channel id="1028.dvb.guide">
        <display-name>Southland TV</display-name>
        <icon src="http://nice.net.nz/epg/icons/southland_tv.jpg" />
</channel>
<channel id="1038.dvb.guide">
        <display-name>Prime</display-name>
        <icon src="http://nice.net.nz/epg/icons/prime.jpg" />
</channel>
<channel id="1050.dvb.guide">
        <display-name>Preview</display-name>
</channel>
<channel id="1063.dvb.guide">
        <display-name>SKY Box Office</display-name>
        <icon src="http://nice.net.nz/epg/icons/sky_box_office.jpg" />
</channel>
<channel id="1071.dvb.guide">
        <display-name>SKY Box Office</display-name>
        <icon src="http://nice.net.nz/epg/icons/sky_box_office.jpg" />
</channel>
<channel id="1192.dvb.guide">
        <display-name>Weather</display-name>
        <icon src="http://nice.net.nz/epg/icons/weather_channel.jpg" />
</channel>
<channel id="1079.dvb.guide">
        <display-name>KTV 2</display-name>
        <icon src="http://nice.net.nz/epg/icons/ktv2.jpg" />
</channel>
<channel id="1086.dvb.guide">
        <display-name>CTV 6</display-name>
        <icon src="http://nice.net.nz/epg/icons/ctv6.jpg" />
</channel>
<channel id="1078.dvb.guide">
        <display-name>KTV 1</display-name>
        <icon src="http://nice.net.nz/epg/icons/ktv1.jpg" />
</channel>
<channel id="1007.dvb.guide">
        <display-name>Juice TV</display-name>
        <icon src="http://nice.net.nz/epg/icons/juice_tv.jpg" />
</channel>
<programme channel="1032.dvb.guide" start="20071016170000 +1300" stop="20071016173000 +1300">
        <title lang="eng">Neighbours</title>
        <desc>Paul's shocking revelation ends his relationship with Rebecca - but she bounces back with a bold proposition for Toadie and Rosie. Carmella negotiates a compromise with Ollie over baby planning.</desc>
        <category>tvshow</category>
        <category>General Show</category>
        <rating system="SKY-NZ">
                <value>G</value>
        </rating>
</programme>
<programme channel="1032.dvb.guide" start="20071016173000 +1300" stop="20071016180000 +1300">
        <title lang="eng">Hope And Faith</title>
        <desc>Hope and Faith try to help Sydney and end up damaging of Charley's vintage convertable.</desc>
        <category>tvshow</category>
        <category>General Show</category>
        <rating system="SKY-NZ">
                <value>G</value>
        </rating>
</programme>
<programme channel="1032.dvb.guide" start="20071016180000 +1300" stop="20071016183000 +1300">
        <title lang="eng">My Wife And Kids</title>
        <desc>Michael goes overboard when Jay requests more romantic attention from him.</desc>
        <category>tvshow</category>
        <category>General Show</category>
        <rating system="SKY-NZ">
                <value>G</value>
        </rating>
</programme>
<programme channel="1032.dvb.guide" start="20071016183000 +1300" stop="20071016190000 +1300">
        <title lang="eng">Friends</title>
        <desc>Chandler and Monica's relationship becomes less of a secret; Ross must prove to his boss that he is sane in order to return to his job.</desc>
        <category>tvshow</category>
        <category>General Show</category>
        <rating system="SKY-NZ">
                <value>G</value>
        </rating>
</programme>
<programme channel="1002.dvb.guide" start="20071015112000 +1300" stop="20071015124500 +1300">
        <title lang="eng">Date Movie</title>
        <desc>Before Julia can have her Big Fat Greek Wedding, she has to Meet the Parents, deal with The Wedding Planner and confront a woman who wants to stop her Best Friend's Wedding. Starring: Alyson Hannigan. (WS)</desc>
        <category>movie</category>
        <category>Comedy</category>
        <rating system="SKY-NZ">
                <value>M S</value>
        </rating>
</programme>
<programme channel="1002.dvb.guide" start="20071015124500 +1300" stop="20071015143000 +1300">
        <title lang="eng">Chaos</title>
        <desc>Two cops, one a rookie and one a grizzled veteran, are partnered up and must try to uncover how five bank robbers escaped from a bank during a heist. Starring: Jason Statham, Ryan Phillippe, Wesley Snipes. (WS)</desc>
        <category>movie</category>
        <category>Action</category>
        <rating system="SKY-NZ">
                <value>M VLS</value>
        </rating>
</programme>
<programme channel="1021.dvb.guide" start="20071013190000 +1300" stop="20071013193000 +1300">
        <title lang="eng">Girls Of The Playboy Mansion</title>
        <desc>Let Them Eat Birthday Cake. For Holly's birthday, the Girls decide to throw a lavish party with a Marie Antoinette theme.</desc>
        <category>tvshow</category>
        <category>Reality</category>
        <rating system="SKY-NZ">
                <value>18+ S</value>
        </rating>
</programme>
<programme channel="1021.dvb.guide" start="20071013193000 +1300" stop="20071013200000 +1300">
        <title lang="eng">E! News</title>
        <desc>The most comprehensive, up-to-the-minute reports on the day's top entertainment news.</desc>
        <category>tvshow</category>
        <category>Reality</category>
        <rating system="SKY-NZ">
                <value>PG</value>
        </rating>
</programme>
<programme channel="1021.dvb.guide" start="20071013200000 +1300" stop="20071013203000 +1300">
        <title lang="eng">The Daily 10</title>
        <desc>The Daily 10 is a fast-paced, hosts-driven, topical entertainment news show with attitude that recaps the top ten entertainment stories of the moment.</desc>
        <category>tvshow</category>
        <category>Reality</category>
        <rating system="SKY-NZ">
                <value>PG</value>
        </rating>
</programme>
<programme channel="1021.dvb.guide" start="20071013203000 +1300" stop="20071013213000 +1300">
        <title lang="eng">Best Of The Girls Of The...</title>
        <desc>Playboy Mansion. Hef and the girls have a very special 'movie night' at the mansion. The happy quartet snuggle up to watch and relive some of their favorite moments from the past three seasons.</desc>
        <category>tvshow</category>
        <category>Reality</category>
        <rating system="SKY-NZ">
                <value>18+ S</value>
        </rating>
</programme>
<programme channel="1021.dvb.guide" start="20071013213000 +1300" stop="20071013220000 +1300">
        <title lang="eng">Girls Of The Playboy Mansion</title>
        <desc>Snow Place Like Home. It's Christmas at the Mansion and the staff work like elves to create a winter wonderland for Hef and the Girls - complete with a snow covered front yard!</desc>
        <category>tvshow</category>
        <category>Reality</category>
        <rating system="SKY-NZ">
                <value>18+ S</value>
        </rating>
</programme>
</tv>

<!--
        [{'rating': '6', 'description': 'Catch all the latest moves on No Mercy as history was made one more time. See who will be the next champion to hold the title when the smoke clears.', 'language': 'eng', 'start': '0xffffffffff', 'country': 'NZL', 'durationinfo': '03:26:11 (UTC)', 'title': '<EM>WWE On Demand - Scheduling Only</EM>', 'channel_id': '1097', 'duration': '0x0032611', 'startinfo': '2038-04-22 ff:ff:ff (UTC)', 'ratinginfo': 'minimum age: 9 years'}, ValueError("invalid literal for int() with base 10: 'ff'",)]
        [{'rating': '6', 'description': "Action/Sport: Rocky Balboa comes out of retirement to step into the ring for the last time and face the heavyweight champ Mason 'The Line' Dixon. Starring Sylvester Stallone, Burt Young. (WS)", 'language': 'eng', 'start': '0xffffffffff', 'country': 'NZL', 'durationinfo': '01:40:24 (UTC)', 'title': '<EM>Rocky Balboa</EM>', 'channel_id': '1097', 'duration': '0x0014024', 'startinfo': '2038-04-22 ff:ff:ff (UTC)', 'ratinginfo': 'minimum age: 9 years'}, ValueError("invalid literal for int() with base 10: 'ff'",)]
-->

Of course the one thing this doesn't do is provide a sample of channels and then programmes belonging to those same channels. Or you can just download the whole file here:
http://musther.googlepages.com/listings.xml.tar.gz

Hope you like python!

Here's my python solution:

Code:

#add the channels you want to this list
wanted = ['1035.dvb.guide',
          '1026.dvb.guide',
          ]

import sys
from xml.etree.ElementTree import ElementTree

#prints out usage message if number of arguments is wrong
if len(sys.argv) != 3:
    print("usage: %s input.xml output.xml" % sys.argv[0])
    sys.exit(1)

#reads the input xml file (binary mode, so the parser respects the
#encoding given in the XML declaration)
xml = open(sys.argv[1], 'rb')
data = xml.read()

#get header and footer, then seek the file back to 0 to start parsing xml info
header = data[:data.find(b'<tv')-1]
footer = data[data.find(b'<!--\n'):]
del data
xml.seek(0)

#parse the data
tree = ElementTree(file=xml)
xml.close()
root = tree.getroot()
lroot = list(root)
for element in lroot:
    if not (element.attrib.get('id') in wanted or
            element.attrib.get('channel') in wanted):
        root.remove(element)

#write the data (tree.write() defaults to US-ASCII, using character
#references for anything else, which suits the ISO-8859-1 header)
out_xml = open(sys.argv[2], 'wb')
out_xml.write(header + b'\n')
tree.write(out_xml)
out_xml.write(b'\n' + footer)
out_xml.close()

Just add whatever you want to the wanted list. Give the script an input file and an output file and it should work.
If you don't want to type ".dvb.guide" for every entry, change the wanted code to this:

Code:

wanted = ['1035',
          '1026',
          ]
wanted = [x+'.dvb.guide' for x in wanted]

Hope that does what you wanted... and I hope you have Python, or are capable of getting it.

PS: Always wanted to learn a bit about elementtree, and now I did :)

Edit: Ignore this script; Disillusionist's script is much better. Using ElementTree for this was overkill, and the performance suffers because of it.

Changing the Perl script to use an array for multiple values takes just a few small changes:

1. Change l_selection into an array.
2. Define a scalar variable l_choice to hold the individual contents of the array @l_selection.
3. Modify the search statement to use the new scalar.
4. Set the default of $l_print to 0 outside the inner loop.

Full listing here:

Code:

#!/usr/bin/perl -w
use strict;

my $l_input="progs.xml";
my $l_output="output.xml";
my @l_selection=qw( 1032.dvb.guide 1026.dvb.guide 1192.dvb.guide );
my $l_choice;
my $l_print=1;    # print the header lines until the first <channel or <programme

open(INPUT, "<", $l_input) or die "Failed to open input file: $!\n";
open(OUTPUT, ">", $l_output) or die "Failed to open output file: $!\n";

while (<INPUT>)
{
    if (/<channel/)
    {
        $l_print=0;
        foreach $l_choice (@l_selection) {
            ## \Q...\E makes the dots in the ID match literally
            if (/\Q$l_choice\E/)
            {
                $l_print=1;
            }
        }
    }

    if (/<programme/)
    {
        $l_print=0;
        foreach $l_choice (@l_selection) {
            if (/\Q$l_choice\E/)
            {
                $l_print=1;
            }
        }
    }

    if (/<\/tv/)
    {
        ## End of the listings: print </tv> and the footer
        $l_print=1;
    }

    if ( $l_print == 1 )
    {
        print OUTPUT $_;
    }
}

close(INPUT);
close(OUTPUT);

Angrybanana - Nice python script!

Thank you both very much. I've tried the Python script and it's perfect, and I'll fiddle with the Perl later too (I might as well learn something while I'm doing this).

I've just noticed angrybanana's note (I'm glad to have given you the opportunity to learn about ElementTree!), so when I've had a fiddle with Disillusionist's Perl, I'll be using that.

Thank you both again.

Whenever you parse XML, I'd advise against trying to parse the structure yourself using regular expressions and so on. This sort of approach usually makes assumptions about where line breaks and other non-syntax whitespace are, and while your program may work for some or most inputs, it will likely break when the input is subtly different because of whitespace.
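To make that concrete, here is a small Python demonstration with made-up sample data: the same two programmes, once with each start tag on a single line and once with the attributes wrapped onto the next line (still perfectly valid XML). A line-based filter in the spirit of the scripts above (`naive_titles`, a hypothetical helper) loses the programme in the rewrapped copy, while `xml.etree.ElementTree` gives the same answer for both.

```python
import xml.etree.ElementTree as ET

WANTED = {"1021.dvb.guide"}

# The same listing twice: once one start tag per line, once re-wrapped so the
# attributes of <programme> sit on the next line. Both parse to the same XML.
TIDY = """<tv>
<programme channel="1021.dvb.guide" start="20071013190000 +1300">
  <title>E! News</title>
</programme>
<programme channel="1032.dvb.guide" start="20071016170000 +1300">
  <title>Neighbours</title>
</programme>
</tv>"""
REWRAPPED = TIDY.replace('<programme channel', '<programme\n    channel')


def naive_titles(text):
    """Line-based selection, assuming each start tag is on one line."""
    keep, titles = True, []
    for line in text.splitlines():
        if line.startswith("<programme"):
            keep = any(ch in line for ch in WANTED)
        elif keep and "<title>" in line:
            titles.append(line.strip()[len("<title>"):-len("</title>")])
    return titles


def parsed_titles(text):
    """The same selection done with a real XML parser."""
    root = ET.fromstring(text)
    return [p.findtext("title") for p in root.iter("programme")
            if p.get("channel") in WANTED]


for name, text in (("tidy", TIDY), ("rewrapped", REWRAPPED)):
    print(name, naive_titles(text), parsed_titles(text))
# tidy ['E! News'] ['E! News']
# rewrapped [] ['E! News']
```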

There are good XML parser libraries for most languages, so it's a good idea to use them. Perl has the XML::Parser module, which is very flexible but a little tricky to use. There are also a bunch of modules built on top of it that are much easier to get to grips with, although they don't offer all the flexibility.

There are a few programs that let you manipulate XML from bash scripts, although they are more limited than a proper parsing library; xmlstarlet and xpath spring to mind. I had a brief go at using xmlstarlet and found a nice mechanism to remove sections with a given ID, but not one to remove all sections except those in a list of known IDs... I think this is just a little too complex for such a program, although I would love to be corrected if someone knows how to do it.

For the record, here's how to remove a named channel from your XML:

Code:

xmlstarlet ed -P -d "/tv/channel[@id='1035.dvb.guide']" input_file.xml > modified_file.xml
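For comparison, a rough ElementTree equivalent in Python (a sketch, not a translation of xmlstarlet's behaviour). ElementTree's limited XPath subset supports `[@attr='value']` predicates, so the channel can be found the same way; unlike the one-liner above, this version also removes that channel's `<programme>` entries. The ID is pasted into the predicate, so it is assumed not to contain a single quote.

```python
import xml.etree.ElementTree as ET


def remove_channel(root, channel_id):
    """Remove <channel id=channel_id> and its <programme> entries from the
    <tv> root element; returns how many elements were removed.
    channel_id is pasted into the predicate, so it must not contain a quote."""
    doomed = (root.findall(f"channel[@id='{channel_id}']")
              + root.findall(f"programme[@channel='{channel_id}']"))
    for elem in doomed:
        root.remove(elem)
    return len(doomed)


if __name__ == "__main__":
    root = ET.fromstring('<tv><channel id="1035.dvb.guide"/>'
                         '<channel id="1026.dvb.guide"/>'
                         '<programme channel="1035.dvb.guide"/></tv>')
    print(remove_channel(root, "1035.dvb.guide"))   # prints 2
    print(ET.tostring(root, encoding="unicode"))
```

With a real file you would call `ET.parse("input_file.xml")`, pass `tree.getroot()` to the function and then `tree.write(...)` the result.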

Angrybanana's python script looks like the right approach to me.

Oh, one more thing: I just found an XMLTV module for Perl, along with a bunch of command-line utilities. It looks like what you want to do is already implemented in the program tv_grep.

Here's a version of my code that wasn't written half asleep at 3am :)

I looked over the code and found the performance issue; this script will work MUCH faster. It also avoids the blind-parsing issues matthewg42 mentioned.
If this script does what you want, I'd say use this.

Code:

#add the channels you want to this list
wanted = ('1035.dvb.guide',
          '1026.dvb.guide',
          '1021.dvb.guide',
          '1050.dvb.guide',
          '1071.dvb.guide',
          '1025.dvb.guide',
          )

import sys
from xml.etree.ElementTree import ElementTree

#prints out usage message if number of arguments is wrong
if len(sys.argv) != 3:
    print("usage: %s input.xml output.xml" % sys.argv[0])
    sys.exit(1)

#reads the input xml file (binary mode, so the parser respects the
#encoding given in the XML declaration)
xml = open(sys.argv[1], 'rb')
data = xml.read()

#get header and footer, then seek the file back to 0 to start parsing xml info
header = data[:data.find(b'<tv')-1]
footer = data[data.find(b'<!--\n'):]
del data
xml.seek(0)

#parse the data
tree = ElementTree(file=xml)
xml.close()
root = tree.getroot()
wanted_channels = [x for x in root.findall('channel')
                   if x.attrib.get('id') in wanted]
wanted_programme = [x for x in root.findall('programme')
                    if x.attrib.get('channel') in wanted]

#replace children with the ones we want
root[:] = wanted_channels + wanted_programme

#write the data (tree.write() defaults to US-ASCII, using character
#references for anything else, which suits the ISO-8859-1 header)
out_xml = open(sys.argv[2], 'wb')
out_xml.write(header + b'\n')
tree.write(out_xml)
out_xml.write(b'\n' + footer)
out_xml.close()

One thing I don't like about my script is how it handles the header/footer. If they are static, I'd feel a lot better hard-coding them into the script. Currently they're found by searching the raw text, which could cause problems in the future if the format changes.

I also noticed that the footer is a comment... Do you need the footer?
If you don't need the footer, I'd make a small change to the script that'd make it a lot better.
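One possible way to make the header/footer search less fragile, sketched under the assumption that everything before the first `<tv` is header and everything after the last `</tv>` is footer (`split_header_footer` is a hypothetical name, not part of the script above): take the footer from the end of `</tv>` instead of searching for `<!--\n`, and raise an error instead of silently slicing at index -1 when a marker is missing.

```python
def split_header_footer(data):
    """Split raw XMLTV bytes into (header, footer) around the <tv> element.

    header: everything before the first "<tv" (XML declaration, DOCTYPE)
    footer: everything after the last "</tv>" (e.g. the trailing comment)
    Raises ValueError if either marker is missing, instead of slicing at -1.
    """
    start = data.find(b"<tv")
    end = data.rfind(b"</tv>")
    if start == -1 or end == -1:
        raise ValueError("no <tv> ... </tv> element found")
    header = data[:start].rstrip()
    footer = data[end + len(b"</tv>"):].strip()
    return header, footer


if __name__ == "__main__":
    sample = (b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'
              b'<!DOCTYPE tv SYSTEM "xmltv.dtd">\n'
              b'<tv>\n</tv>\n\n<!--\n  footer info\n-->\n')
    header, footer = split_header_footer(sample)
    print(header.decode())
    print(footer.decode())
```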

Hope this helps.

PS: If you're wondering what the performance issue was, I think it was iterating over the elements and removing them one at a time, so each removal had to go through the list of children again. Now it simply collects all the wanted elements and sets the children to that list in one go.
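A rough way to see the difference, using a synthetic `<tv>` element rather than the real listings (`build`, `remove_one_by_one` and `replace_children` are names made up for this sketch). `Element.remove()` has to find the child and shift the rest of the list down, so removing most of n children one at a time costs roughly O(n²), while building the wanted list and assigning it with `root[:] = ...` is a single pass. Absolute timings will vary by machine.

```python
import time
import xml.etree.ElementTree as ET


def build(n):
    """Synthetic <tv> with n programmes spread over 10 channels."""
    root = ET.Element("tv")
    for i in range(n):
        ET.SubElement(root, "programme", channel=f"{i % 10}.dvb.guide")
    return root


def remove_one_by_one(root, wanted):
    # First script's approach: each root.remove() searches and shifts the
    # child list, so dropping most of n children costs roughly O(n^2).
    for elem in list(root):
        if elem.get("channel") not in wanted:
            root.remove(elem)


def replace_children(root, wanted):
    # Second script's approach: one pass to pick keepers, one slice assignment.
    root[:] = [e for e in root if e.get("channel") in wanted]


if __name__ == "__main__":
    wanted = {"1.dvb.guide"}
    for n in (2_000, 8_000):
        for fn in (remove_one_by_one, replace_children):
            root = build(n)
            t0 = time.perf_counter()
            fn(root, wanted)
            elapsed = time.perf_counter() - t0
            print(f"{fn.__name__:18} n={n:6}: {elapsed:.4f}s, kept {len(root)}")
```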

EDIT: So we don't have to post back and forth, here's a version that ignores the footer and uses a static header that you put into the script.

Code:

#add the channels you want to this list
wanted = ('1035.dvb.guide',
          '1026.dvb.guide',
          '1021.dvb.guide',
          '1050.dvb.guide',
          '1071.dvb.guide',
          '1025.dvb.guide',
          )

#Put your custom header here (a bytes literal, so keep it to plain ASCII)
header = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE tv SYSTEM "xmltv.dtd">"""

import sys
from xml.etree.ElementTree import ElementTree

#prints out usage message if number of arguments is wrong
if len(sys.argv) != 3:
    print("usage: %s input.xml output.xml" % sys.argv[0])
    sys.exit(1)

#reads the input xml file (binary mode, so the parser respects the
#encoding given in the XML declaration)
xml = open(sys.argv[1], 'rb')

#parse the data
tree = ElementTree(file=xml)
xml.close()
root = tree.getroot()
wanted_channels = [x for x in root.findall('channel')
                   if x.attrib.get('id') in wanted]
wanted_programme = [x for x in root.findall('programme')
                    if x.attrib.get('channel') in wanted]

#replace children with the ones we want
root[:] = wanted_channels + wanted_programme

#write the data (tree.write() defaults to US-ASCII, using character
#references for anything else, which suits the ISO-8859-1 header)
out_xml = open(sys.argv[2], 'wb')
out_xml.write(header + b'\n')
tree.write(out_xml)
out_xml.close()

Hi. This is an old thread, but I have a similar problem to the OP. I'm trying to filter an XML file that has multiple languages. I've done lots of trial and error but had no success so far. My setup can't handle multiple languages, so I need to filter out the unneeded ones (everything except the lang="fin" entries). Below is my example:

Code:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE tv SYSTEM "xmltv.dtd">
<tv generator-info-name="TVHeadend-4.2.8-34~g24a2f59e9" source-info-name="tvh-Tvheadend">
<channel id="f7fa62af37560ce2835bf5a1ec414b2a">
  <display-name>Sky News</display-name>
  <display-name>85</display-name>
</channel>
<channel id="7c0cb8307321aa08b16e4ec05711e672">
  <display-name>HISTORY HD</display-name>
  <display-name>124</display-name>
</channel>
<programme start="20210425190000 +0300" stop="20210425230000 +0300" channel="6751678153fa02ec1bc10d516d2d1450">
  <title lang="eng">Airing Break</title>
  <title lang="fin">Ei lähetystä</title>
  <title lang="ger">Sendepause</title>
  <title lang="ita">Airing Break</title>
  <title lang="rus">Airing Break</title>
  <title lang="swe">Sändningsuppehåll</title>
  <sub-title lang="eng">The channel is currently not airing.</sub-title>
  <sub-title lang="fin">Kanavalla ei tällä hetkellä lähetetä ohjelmaa.</sub-title>
  <sub-title lang="ger">Momentan wird nichts ausgestrahlt.</sub-title>
  <sub-title lang="ita">Airing Break</sub-title>
  <sub-title lang="rus">Airing Break</sub-title>
  <sub-title lang="swe">Kanalen sänds för närvarande inte.</sub-title>
  <sub-title lang="tur">Airing Break</sub-title>
  <desc lang="eng">The channel is currently not airing.</desc>
  <desc lang="fin">Kanavalla ei tällä hetkellä lähetetä ohjelmaa.</desc>
  <desc lang="ger">Momentan wird nichts ausgestrahlt.</desc>
  <desc lang="ita">Airing Break</desc>
  <desc lang="rus">Airing Break</desc>
  <desc lang="swe">Kanalen sänds för närvarande inte.</desc>
  <desc lang="tur">Airing Break</desc>
</programme>
<programme start="20210425230000 +0300" stop="20210426000000 +0300" channel="6751678153fa02ec1bc10d516d2d1450">
  <title lang="eng">Airing Break</title>
  <title lang="fin">Ei lähetystä</title>
  <title lang="ger">Sendepause</title>
  <title lang="ita">Airing Break</title>
  <title lang="rus">Airing Break</title>
  <title lang="swe">Sändningsuppehåll</title>
  <sub-title lang="eng">The channel is currently not airing.</sub-title>
  <sub-title lang="fin">Kanavalla ei tällä hetkellä lähetetä ohjelmaa.</sub-title>
  <sub-title lang="ger">Momentan wird nichts ausgestrahlt.</sub-title>
  <sub-title lang="ita">Airing Break</sub-title>
  <sub-title lang="rus">Airing Break</sub-title>
  <sub-title lang="swe">Kanalen sänds för närvarande inte.</sub-title>
  <sub-title lang="tur">Airing Break</sub-title>
  <desc lang="eng">The channel is currently not airing.</desc>
  <desc lang="fin">Kanavalla ei tällä hetkellä lähetetä ohjelmaa.</desc>
  <desc lang="ger">Momentan wird nichts ausgestrahlt.</desc>
  <desc lang="ita">Airing Break</desc>
  <desc lang="rus">Airing Break</desc>
  <desc lang="swe">Kanalen sänds för närvarande inte.</desc>
  <desc lang="tur">Airing Break</desc>
</programme>

Old but interesting problem. Here is one more solution:

Code:

# works only with at least two channel ids
keep=1012.dvb.guide,1021.dvb.guide

ch=$(eval echo @id=\{\\\'${keep//,/\\\',\\\'}\\\'\}\" or \")
pr=$(eval echo @channel=\{\\\'${keep//,/\\\',\\\'}\\\'\}\" or \")
c="${ch% or }"
p="${pr% or }"

xmllint --xpath "/tv/programme[$p] | /tv/channel[$c]" input.xml > output.xml

PS:
The output still needs to be wrapped in <tv></tv> tags.
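The bash quoting in that one-liner is tricky, and as its comment notes, it needs at least two IDs. As a sketch (not crts's code), the same XPath union can be built in Python for any number of IDs with `str.join`, and passing the arguments to `subprocess.run` as a list avoids shell quoting altogether. `xpath_for` and `run_xmllint` are hypothetical names, the IDs are assumed not to contain single quotes, and `run_xmllint` assumes `xmllint` is installed. The wrapped output still has no XML declaration or DOCTYPE.

```python
import subprocess


def xpath_for(ids):
    """Build the XPath union used with xmllint above, for one or more IDs.
    Assumes no ID contains a single quote."""
    if not ids:
        raise ValueError("need at least one channel id")
    ch = " or ".join(f"@id='{i}'" for i in ids)
    pr = " or ".join(f"@channel='{i}'" for i in ids)
    return f"/tv/programme[{pr}] | /tv/channel[{ch}]"


def run_xmllint(ids, path):
    """Run xmllint (must be installed) and wrap the selected nodes in <tv>."""
    result = subprocess.run(["xmllint", "--xpath", xpath_for(ids), path],
                            capture_output=True, check=True)
    return b"<tv>\n" + result.stdout + b"\n</tv>\n"


if __name__ == "__main__":
    print(xpath_for(["1012.dvb.guide"]))
    # /tv/programme[@channel='1012.dvb.guide'] | /tv/channel[@id='1012.dvb.guide']
```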

Quote:

Originally Posted by crts (Post 6245552)

Old but interesting problem. Here is one more solution:

Can I adjust this code to suit my problem? Like this:

Between <programme> and </programme>, every line that includes lang="fin" should be kept, and every other line removed.

Thanks.

Quote:

Originally Posted by Jtmstr09 (Post 6245567)

Can I adjust this code to suit my problem?

No. An XPath expression only selects nodes; you cannot delete nodes with XPath. And my solution is not 100% complete, because you still need to put the result inside <tv></tv> tags. If you want to delete certain subnodes, you will probably need an XSLT processor, or to do this in another language, e.g., Python or Java. I suggest you open a new thread for your problem and maybe reference this one.
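Following that suggestion, here is a minimal Python sketch for the language question, assuming Finnish (`fin`) is the only language to keep. Inside each `<programme>` it removes child elements whose `lang` attribute names another language and leaves children without a `lang` attribute alone. Two caveats: a programme with no Finnish title would end up with no title at all, and ElementTree does not write the DOCTYPE line back out.

```python
import sys
import xml.etree.ElementTree as ET

KEEP_LANG = "fin"   # assumption: the only language the setup should see


def keep_one_language(root, lang=KEEP_LANG):
    """Inside every <programme>, remove child elements whose lang attribute
    names a different language. Children without a lang attribute
    (e.g. <category>) are kept. Returns the number of elements removed."""
    removed = 0
    for prog in root.findall("programme"):
        for child in list(prog):        # copy, because prog is modified
            if child.get("lang") not in (None, lang):
                prog.remove(child)
                removed += 1
    return removed


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python3 keep_lang.py input.xml output.xml
    tree = ET.parse(sys.argv[1])
    keep_one_language(tree.getroot())
    tree.write(sys.argv[2], encoding="utf-8", xml_declaration=True)
```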