LinuxQuestions.org

rammyp_1979 10-21-2010 08:42 AM

Perl Search and Replace XML tags conditionally
 
Hi there,

I am a newbie to Perl and I am trying to write a script that searches for a particular tag and conditionally replaces some attribute values within it.

Here is the example XML

<TRANSFORMATION DESCRIPTION ="" TYPE ="Expression">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Sequence">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>

The script I want to write would search for the Sequence element and change its datatype to Integer, i.e. the output should be:

<TRANSFORMATION DESCRIPTION ="" TYPE ="Expression">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Sequence">
<TRANSFORMFIELD DATATYPE ="Integer" PRECISION ="19"/>
</TRANSFORMATION>

I tried doing a find and replace, but that would clearly change every occurrence. I need to change only the Sequence transformation tags. Your expert advice would be appreciated!

Sergei Steshenko 10-21-2010 10:09 AM

Don't reinvent the wheel; start from http://search.cpan.org/search?query=XML+parser&mode=all . In particular, remember that XML is not a line-oriented language.

theNbomr 10-21-2010 10:53 AM

Your XML fragment is incomplete, so it is difficult to do any comprehensive testing, but I can offer this advice: don't try to contrive a Perl parser for your XML data. Use one of the numerous ready-made modules for the purpose. Which one you choose will depend somewhat on the nature of your data and your application. The easiest one to use is, not surprisingly, XML::Simple, which reads the whole document into a nested structure of Perl hashes and arrays. The potential downside to this parser is that the order of sibling XML elements may be lost in the parse (as may the order of attributes within a tag). It does provide XMLout() to write the data back, but the output will not necessarily match the layout of the original file.
The other two prominent XML parsers for Perl are the expat-based XML::Parser and the xerces-based XML::Xerces. The APIs for these two are distinctly different, and the XML::Xerces module also provides a method for writing your XML data back to a file.
A sample Perl script that uses the XML::Simple module:
Code:

#!/usr/bin/perl
#
#        LQrammyp_1979.pl
#
#        Reads LQrammyp_1979.xml (same name as the script) by default
#
use strict;
use warnings;

use XML::Simple;
use Data::Dumper;

# ForceArray keeps TRANSFORMATION an array even when only one is present;
# KeyAttr => [] stops XML::Simple folding elements into hashes keyed on an attribute.
my $xmlReader = XML::Simple->new( ForceArray => ['TRANSFORMATION'], KeyAttr => [] );
my $xmlData   = $xmlReader->XMLin();
# print Dumper( $xmlData );

foreach my $xmlKey ( keys %{$xmlData} ){
  print "Element: $xmlKey\n";
  if( $xmlKey eq "TRANSFORMATION" ){
    foreach my $transformation ( @{$xmlData->{$xmlKey}} ){
      print "DESCR: '$transformation->{DESCRIPTION}'\n";
      print "TYPE :'$transformation->{TYPE}'\n";
      print "\tXFORM DTYPE:'$transformation->{TRANSFORMFIELD}->{DATATYPE}'\n";
      print "\tXFORM PREC :'$transformation->{TRANSFORMFIELD}->{PRECISION}'\n";
      # Only Sequence transformations get the new datatype
      if( $transformation->{TYPE} eq "Sequence" ){
        $transformation->{TRANSFORMFIELD}->{DATATYPE} = "Integer";
      }
    }
  }
}

This code demonstrates reading the XML data (sample data modified to make a valid XML file), displaying the content, and modifying the relevant parts. You can create an XML file writer by doing roughly the reverse of the process demonstrated here. Note that even though your XML data was posted here completely unformatted, it was still parseable, which is the basis of my argument against writing a parser that assumes anything about the formatting of the input XML data.

--- rod.

EDIT: As I composed my reply, Sergei was posting his response, and I see that he has given principally the same advice.

rammyp_1979 10-21-2010 04:37 PM

Thanks for this. Unfortunately, when I use the XMLout function, some of the tags do not come out as they were. For example, in my original XML file I had

<PARTITION DESCRIPTION ="" NAME ="Partition #1"/>

this now translates to :-

<PARTITION NAME="Partition #1" DESCRIPTION="" />

Any ideas? Should I use something other than XML::Simple?

theNbomr 10-21-2010 05:20 PM

This is what I meant when I said "the order of sibling XML elements may be lost in the parse". While this is still perfectly valid XML, some applications are sensitive to order, and XML::Simple will not work where this is the case.

--- rod.

Sergei Steshenko 10-21-2010 06:15 PM

Quote:

Originally Posted by theNbomr (Post 4135231)
This is what I meant when I said "the order of sibling XML elements may be lost in the parse". While this is still perfectly valid XML, some applications are sensitive to order, and XML::Simple will not work where this is the case.

--- rod.

Or, in Perlish English: the attributes of an XML tag are like a Perl hash; by default, the order of the (key, value) pairs is not guaranteed after manipulation.

theNbomr 10-21-2010 06:34 PM

Not sure what you mean by "after manipulation", but as I understand it, Perl hashes are always unordered. Since XML::Simple stores data in hashes, the order of retrieval, and presumably therefore, the order of writing, is not preserved.

--- rod.

Sergei Steshenko 10-21-2010 06:52 PM

Quote:

Originally Posted by theNbomr (Post 4135284)
Not sure what you mean by "after manipulation", but as I understand it, Perl hashes are always unordered. Since XML::Simple stores data in hashes, the order of retrieval, and presumably therefore, the order of writing, is not preserved.

--- rod.

If you fill a Perl hash and then just iterate over it without adding, deleting, or changing keys or values, the order of keys is the same in each iteration.

If you change something in a Perl hash, preservation of the order is not guaranteed.

OTOH, there are modules that implement hashes with a fixed key order (Tie::IxHash, IIRC). That probably won't help the OP, since the XML parsers most likely don't use such hashes.
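
A fixed key order can also be had without an extra module by recording the keys in an array next to an ordinary hash. The following is only a minimal sketch of that idea; the variable names are made up for illustration, and the attribute names and 'NAME ="value"' spacing are taken from the PARTITION example earlier in the thread.

```perl
use strict;
use warnings;

# Attribute (name, value) pairs in the order they appear in the original tag.
my @pairs = ( DESCRIPTION => '', NAME => 'Partition #1' );

my @order;    # attribute names in first-seen order
my %value;    # ordinary, unordered hash used for lookups

while ( my ( $name, $val ) = splice @pairs, 0, 2 ) {
    push @order, $name unless exists $value{$name};
    $value{$name} = $val;
}

# Changing a value does not disturb the recorded order.
$value{NAME} = 'Partition #2';

# Emit the attributes by walking @order instead of calling keys %value.
my @attrs;
push @attrs, sprintf( '%s ="%s"', $_, $value{$_} ) for @order;
my $tag = '<PARTITION ' . join( ' ', @attrs ) . '/>';
print "$tag\n";
```

The point is only that the output order comes from @order, never from the hash itself.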

rammyp_1979 10-22-2010 04:20 AM

Quote:

Originally Posted by theNbomr (Post 4135231)
This is what I meant when I said "the order of sibling XML elements may be lost in the parse". While this is still perfectly valid XML, some applications are sensitive to order, and XML::Simple will not work where this is the case.

--- rod.

Thanks Rod. I guess my application is sensitive to order. Are there any other XML parsers out there that would preserve it? Or can we manipulate XML::Simple in any way to produce the same result?

Sergei Steshenko 10-22-2010 04:33 AM

Quote:

Originally Posted by rammyp_1979 (Post 4135652)
...I guess my application is sensitive to order. ...

Why not fix the application ?

rammyp_1979 10-22-2010 07:15 AM

Quote:

Originally Posted by Sergei Steshenko (Post 4135656)
Why not fix the application ?

If only :( Unfortunately the application is a 3rd party tool that I do not have control over.

kurumi 10-22-2010 07:34 AM

Code:

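#!/usr/bin/env ruby
require 'rexml/document'

file = ARGV[0]
xml = File.open(file) { |io| REXML::Document.new(io) }

# Change DATATYPE only inside TRANSFORMATION elements whose TYPE is "Sequence"
xml.elements.each("ROOT/TRANSFORMATION") do |elem|
  next unless elem.attributes["TYPE"] == "Sequence"
  elem.elements.each("TRANSFORMFIELD") do |field|
    field.attributes["DATATYPE"] = "Integer"
  end
end

# Write the document once, after all edits; writing inside the loop would
# emit one full copy of the document per matching element
s = String.new
REXML::Formatters::Default.new.write(xml, s)
print s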

Code:

$ cat xmlfile
<ROOT>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Sequence">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Expression">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
</ROOT>

$ ruby parse.rb xmlfile
<ROOT>
<TRANSFORMATION DESCRIPTION='' TYPE='Sequence'>
<TRANSFORMFIELD DATATYPE='Integer' PRECISION='19'/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION='' TYPE='Expression'>
<TRANSFORMFIELD DATATYPE='bigint' PRECISION='19'/>
</TRANSFORMATION>
</ROOT>


rammyp_1979 10-22-2010 08:06 AM

Interesting. Can this be done in ksh? I don't have Ruby.

theNbomr 10-22-2010 08:22 AM

Quote:

Originally Posted by rammyp_1979 (Post 4135652)
Thanks Rod. I guess my application is sensitive to order. Are there any other XML parsers out there that would preserve it? Or can we manipulate XML::Simple in any way to produce the same result?

If you have sufficient knowledge of the required XML data and the way it is organized, you could create your own writer. It would mean extracting the known data from the Perl data structures in the desired order, and writing them back out to the XML file that way.
The order of tag attributes is usually not a factor for XML-using applications, so if it has been shown that your application is sensitive in that way, it would not be surprising if it is also sensitive to other formatting issues, such as contained whitespace. Sounds like your application may have been written in just the manner that I and Sergei cautioned against. Pity.
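
A hand-written writer of the kind described above might look roughly like this. It is a sketch under assumptions, not a tested solution for the real file: the data structure mimics what XML::Simple returns for one TRANSFORMATION, the %attr_order table and the helper names are invented for illustration, and the 'NAME ="value"' spacing is copied from the sample in this thread.

```perl
use strict;
use warnings;

# Assumed attribute order per element, taken from the sample data.
my %attr_order = (
    TRANSFORMATION => [qw(DESCRIPTION TYPE)],
    TRANSFORMFIELD => [qw(DATATYPE PRECISION)],
);

# Minimal escaping for attribute values.
sub escape_attr {
    my ($value) = @_;
    $value =~ s/&/&amp;/g;
    $value =~ s/</&lt;/g;
    $value =~ s/"/&quot;/g;
    return $value;
}

# Render one element's attributes in the fixed order from %attr_order.
sub attrs {
    my ( $element, $data ) = @_;
    my @out;
    for my $name ( @{ $attr_order{$element} } ) {
        next unless exists $data->{$name};
        push @out, sprintf( '%s ="%s"', $name, escape_attr( $data->{$name} ) );
    }
    return join ' ', @out;
}

# Write one TRANSFORMATION with its single TRANSFORMFIELD child.
sub write_transformation {
    my ($t) = @_;
    return sprintf "<TRANSFORMATION %s>\n<TRANSFORMFIELD %s/>\n</TRANSFORMATION>\n",
        attrs( 'TRANSFORMATION', $t ),
        attrs( 'TRANSFORMFIELD', $t->{TRANSFORMFIELD} );
}

my $t = {
    DESCRIPTION    => '',
    TYPE           => 'Sequence',
    TRANSFORMFIELD => { DATATYPE => 'Integer', PRECISION => '19' },
};
print write_transformation($t);
```

Every element and attribute the real file contains would need an entry in the order table, which is why this only pays off when the file's structure is well known.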

--- rod.

Sergei Steshenko 10-22-2010 09:01 AM

Quote:

Originally Posted by rammyp_1979 (Post 4135793)
If only :( Unfortunately the application is a 3rd party tool that I do not have control over.

File a bug against the application.

crts 10-22-2010 09:11 AM

Hi,

seeing that
- your application is on bad terms with the perl-parser
- you do not have ruby
- your application might be very formatting sensitive , thus
- maybe also on bad terms with ruby

maybe you could use a simpler approach, like
Code:

$ cat file
<TRANSFORMATION DESCRIPTION ="" TYPE ="Expression">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Sequence">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Expression">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Expression">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Expression">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Sequence">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Sequence">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>

$ sed -r "/<TRANSFORMATION.*TYPE =\"Sequence\"/,/<\/TRANSFORMATION>/ {/TRANSFORMFIELD/ s/(DATATYPE[[:blank:]]*=[[:blank:]]*\")[^\"]*/\1Integer/}" file
<TRANSFORMATION DESCRIPTION ="" TYPE ="Expression">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Sequence">
<TRANSFORMFIELD DATATYPE ="Integer" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Expression">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Expression">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Expression">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Sequence">
<TRANSFORMFIELD DATATYPE ="Integer" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Sequence">
<TRANSFORMFIELD DATATYPE ="Integer" PRECISION ="19"/>
</TRANSFORMATION>

Maybe you do not need sed and can use the regex with Perl; I am not sure about that since I am not familiar with Perl. I am also not sure whether the '[[:blank:]]*' parts are needed, or whether more of them are needed. Since you did not use [CODE][/CODE] tags to post your sample data, there might be some formatting issues that went unnoticed.
If you need other tags changed, you might have to replace the keywords with variables.
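
For what it is worth, the sed range above can be ported to Perl fairly directly, with the keywords passed in as variables. This is a sketch, not a parser: like the sed version it assumes each tag sits on its own line, and the function name retype and the embedded sample are only for illustration. Lines that do not match are passed through unchanged, so attribute order and spacing stay as they were.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rewrite the DATATYPE value of TRANSFORMFIELD lines, but only between an
# opening <TRANSFORMATION ... TYPE ="$type"> line and the next closing tag.
sub retype {
    my ( $text, $type, $new_value ) = @_;
    my $in_block = 0;
    my $out      = '';
    for my $line ( split /^/m, $text ) {    # split into lines, newlines kept
        # An opening tag with the wanted TYPE starts a block.
        $in_block = 1
            if $line =~ /<TRANSFORMATION\b[^>]*\bTYPE\s*=\s*"\Q$type\E"/;
        # Inside a block, replace only the quoted DATATYPE value.
        $line =~ s/(\bDATATYPE\s*=\s*")[^"]*/$1$new_value/
            if $in_block && $line =~ /<TRANSFORMFIELD\b/;
        # The closing tag ends the block.
        $in_block = 0 if $line =~ m{</TRANSFORMATION>};
        $out .= $line;
    }
    return $out;
}

my $sample = <<'XML';
<TRANSFORMATION DESCRIPTION ="" TYPE ="Expression">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Sequence">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
XML

print retype( $sample, 'Sequence', 'Integer' );
```

To use it on a real file, read the file's contents into a string in place of the embedded sample and write the returned string back out.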

