LinuxQuestions.org
Old 10-21-2010, 08:42 AM   #1
rammyp_1979
LQ Newbie
 
Registered: Oct 2010
Posts: 6

Rep: Reputation: 0
Perl Search and Replace XML tags conditionally


Hi there,

I am a newbie to Perl and I am trying to write a script that conditionally searches for a particular tag and then replaces some attributes within it.

Here is the example XML:

<TRANSFORMATION DESCRIPTION ="" TYPE ="Expression">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Sequence">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>

The script I want to write would search for the Sequence element and change the datatype to Integer, e.g. the output should be:

<TRANSFORMATION DESCRIPTION ="" TYPE ="Expression">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Sequence">
<TRANSFORMFIELD DATATYPE ="Integer" PRECISION ="19"/>
</TRANSFORMATION>

I tried doing a find and replace, but clearly that would change everything. I need to do it only for the Sequence transformation tags. Your expert advice would be appreciated!
 
Old 10-21-2010, 10:09 AM   #2
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Don't reinvent the wheel; start from http://search.cpan.org/search?query=XML+parser&mode=all . In particular, remember that XML is not a line-oriented language.
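
To illustrate the parser-based approach, here is a minimal sketch that uses XML::LibXML, one of the modules that search turns up. It is not code from this thread. It assumes the module is installed from CPAN and that the fragment is wrapped in a root element (the <ROOT> name and the retype_sequence_fields helper are made up for the example). An XPath expression picks out only the TRANSFORMFIELD children of Sequence transformations, so the Expression ones are left alone. As far as I know the serializer keeps elements and attributes in document order, but I would expect the spacing around = inside tags to be normalized.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;    # CPAN module, assumed to be installed

# Sample data from the first post, wrapped in a hypothetical <ROOT> element.
my $xml = <<'XML';
<ROOT>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Expression">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Sequence">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
</ROOT>
XML

# Hypothetical helper: parse a string, retype the Sequence fields, serialize.
sub retype_sequence_fields {
    my ($source) = @_;
    my $doc = XML::LibXML->load_xml( string => $source );

    # XPath selects TRANSFORMFIELD children of Sequence transformations only.
    my $xpath = '//TRANSFORMATION[@TYPE="Sequence"]/TRANSFORMFIELD';
    for my $field ( $doc->findnodes($xpath) ) {
        $field->setAttribute( 'DATATYPE', 'Integer' );
    }
    return $doc->toString;
}

print retype_sequence_fields($xml);
```

Because the parser works on the document tree rather than on lines, the same script should behave identically if the input is reflowed onto one line or indented differently.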
 
Old 10-21-2010, 10:53 AM   #3
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
Your XML fragment is incomplete, so it is difficult to do any comprehensive testing, but I can offer this advice: don't try to contrive a Perl parser for your XML data. Use one of the numerous ready-made modules for the purpose. Which one you choose will depend somewhat on the nature of your data and your application. The easiest one to use is, not surprisingly, XML::Simple, which reads the whole document into nested Perl hashes and arrays rather than working as a SAX-style event parser. The potential downside to this parser is that the order of sibling XML elements may be lost in the parse, and that its XMLout() function, which writes the data back out as XML, does not necessarily reproduce the original ordering or layout.
The other two prominent XML parsers for Perl are the expat-based XML::Parser and the Xerces-based XML::Xerces. The APIs for these two are distinctly different, and the XML::Xerces module also provides a method for writing your XML data back to a file.
A sample Perl script that uses the XML::Simple module:
Code:
#!/usr/bin/perl
#
#	LQrammyp_1979.pl
#
#	Reads LQrammyp_1979.xml by default
#
use strict;
use warnings;

use XML::Simple;
use Data::Dumper;

# ForceArray keeps TRANSFORMATION an array ref even when only one is present;
# an empty KeyAttr turns off XML::Simple's folding of elements by name/key/id.
my $xmlReader = XML::Simple->new( ForceArray => ['TRANSFORMATION'], KeyAttr => [] );

# With no argument, XMLin() reads <script name>.xml from the script's directory.
my $xmlData = $xmlReader->XMLin();
# print Dumper( $xmlData );

foreach my $xmlKey ( keys %{$xmlData} ){
  print "Element: $xmlKey\n";
  if( $xmlKey eq "TRANSFORMATION" ){
    foreach my $transformation ( @{$xmlData->{$xmlKey}} ){
      print "DESCR: '$transformation->{DESCRIPTION}'\n";
      print "TYPE :'$transformation->{TYPE}'\n";
      print "\tXFORM DTYPE:'$transformation->{TRANSFORMFIELD}->{DATATYPE}'\n";
      print "\tXFORM PREC :'$transformation->{TRANSFORMFIELD}->{PRECISION}'\n";
      # Only Sequence transformations get the new datatype.
      if( $transformation->{TYPE} eq "Sequence" ){
        $transformation->{TRANSFORMFIELD}->{DATATYPE} = "Integer";
      }
    }
  }
}
This code demonstrates reading the XML data (sample data modified to create a valid XML file), displaying the content, and modifying the relevant parts. You can create an XML file writer by doing somewhat the reverse of the process demonstrated here. It should be noted that, even though your XML data was posted completely unformatted here, the data was still parseable, which is the basis for my argument against trying to write a parser that assumes anything about the formatting of the input XML data.

--- rod.

EDIT: As I composed my reply, Sergei was posting his response, and I see that he has given principally the same advice.

Last edited by theNbomr; 10-21-2010 at 11:24 AM.
 
Old 10-21-2010, 04:37 PM   #4
rammyp_1979
LQ Newbie
 
Registered: Oct 2010
Posts: 6

Original Poster
Rep: Reputation: 0
Thanks for this. Unfortunately, when I use the XMLout function some of the tags are not coming out correctly. For example, in my original XML file I had

<PARTITION DESCRIPTION ="" NAME ="Partition #1"/>

This now translates to:

<PARTITION NAME="Partition #1" DESCRIPTION="" />

Any ideas? Should I use something other than XML::Simple?
 
Old 10-21-2010, 05:20 PM   #5
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
This is what I meant when I said "the order of sibling XML elements may be lost in the parse". While this is still perfectly valid XML, some applications are sensitive to order, and XML::Simple will not work where this is the case.

--- rod.
 
Old 10-21-2010, 06:15 PM   #6
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by theNbomr View Post
This is what I meant when I said "the order of sibling XML elements may be lost in the parse". While this is still perfectly valid XML, some applications are sensitive to order, and XML::Simple will not work where this is the case.

--- rod.
Or, in Perlish English: XML attributes are like a Perl hash; by default, after manipulation, the order of (key, value) pairs is not guaranteed.
 
Old 10-21-2010, 06:34 PM   #7
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
Not sure what you mean by "after manipulation", but as I understand it, Perl hashes are always unordered. Since XML::Simple stores data in hashes, the order of retrieval, and presumably therefore, the order of writing, is not preserved.

--- rod.
 
Old 10-21-2010, 06:52 PM   #8
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by theNbomr View Post
Not sure what you mean by "after manipulation", but as I understand it, Perl hashes are always unordered. Since XML::Simple stores data in hashes, the order of retrieval, and presumably therefore, the order of writing, is not preserved.

--- rod.
If you fill a Perl hash and then just iterate over it without changing, deleting, or adding keys and without changing values, the order of keys is the same in each iteration.

If you change something in a Perl hash, you can't guarantee that the order is preserved.

OTOH, there are modules implementing hashes with a constant key order (IIRC). It probably won't help the OP, since most likely the XML parsers don't use such hashes.
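
A small sketch of the two points above, using core Perl only. The attribute names come from the PARTITION tag posted earlier; the @wanted list is an assumption about the order a consumer might expect. The first part checks that two passes over an unmodified hash agree with each other; the second shows that keeping the desired order in a separate list sidesteps the arbitrary hash order altogether.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %attr = ( DESCRIPTION => '', NAME => 'Partition #1' );

# Two passes over an unmodified hash return the keys in the same order...
my @first  = keys %attr;
my @second = keys %attr;
print "stable between passes: ", ( "@first" eq "@second" ? "yes" : "no" ), "\n";

# ...but that order is arbitrary, so keep the wanted order in a separate list.
my @wanted  = qw(DESCRIPTION NAME);    # hypothetical order the consumer expects
my $ordered = join ' ', map { qq{$_="$attr{$_}"} } @wanted;
print "$ordered\n";
```

The hash still gives fast lookup by name; only the traversal order is taken from the list.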
 
Old 10-22-2010, 04:20 AM   #9
rammyp_1979
LQ Newbie
 
Registered: Oct 2010
Posts: 6

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by theNbomr View Post
This is what I meant when I said "the order of sibling XML elements may be lost in the parse". While this is still perfectly valid XML, some applications are sensitive to order, and XML::Simple will not work where this is the case.

--- rod.
Thanks Rod. I guess my application is sensitive to order. Are there any other XML parsers out there that would do this? Or can we manipulate XML::Simple in any way to produce the same result?
 
Old 10-22-2010, 04:33 AM   #10
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by rammyp_1979 View Post
...I guess my application is sensitive to order. ...
Why not fix the application?
 
Old 10-22-2010, 07:15 AM   #11
rammyp_1979
LQ Newbie
 
Registered: Oct 2010
Posts: 6

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by Sergei Steshenko View Post
Why not fix the application?
If only! Unfortunately, the application is a 3rd party tool that I do not have control over.
 
Old 10-22-2010, 07:34 AM   #12
kurumi
Member
 
Registered: Apr 2010
Posts: 228

Rep: Reputation: 53
Code:
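#!/usr/bin/env ruby
require 'rexml/document'

file = ARGV[0]
output = String.new
File.open(file) do |io|
  xml = REXML::Document.new(io)
  # Change DATATYPE only inside TRANSFORMATION elements of TYPE "Sequence".
  xml.elements.each("ROOT/TRANSFORMATION") do |elem|
    next unless elem.attributes["TYPE"] == "Sequence"
    elem.elements["TRANSFORMFIELD"].attributes["DATATYPE"] = "Integer"
  end
  # Serialize the whole document once, after all edits have been made,
  # so it is written exactly once however many Sequence elements match.
  REXML::Formatters::Default.new.write(xml, output)
end
print output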
Code:
$ cat xmlfile
<ROOT>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Sequence">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION ="" TYPE ="Expression">
<TRANSFORMFIELD DATATYPE ="bigint" PRECISION ="19"/>
</TRANSFORMATION>
</ROOT>

$ ruby parse.rb xmlfile
<ROOT>
<TRANSFORMATION DESCRIPTION='' TYPE='Sequence'>
<TRANSFORMFIELD DATATYPE='Integer' PRECISION='19'/>
</TRANSFORMATION>
<TRANSFORMATION DESCRIPTION='' TYPE='Expression'>
<TRANSFORMFIELD DATATYPE='bigint' PRECISION='19'/>
</TRANSFORMATION>
</ROOT>
 
Old 10-22-2010, 08:06 AM   #13
rammyp_1979
LQ Newbie
 
Registered: Oct 2010
Posts: 6

Original Poster
Rep: Reputation: 0
Interesting. Can this be done in ksh? I don't have Ruby.
 
Old 10-22-2010, 08:22 AM   #14
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
Quote:
Originally Posted by rammyp_1979 View Post
Thanks Rod. I guess my application is sensitive to order. Are there any other XML parsers out there that would do this? Or can we manipulate XML::Simple in any way to produce the same result?
If you have sufficient knowledge of the required XML data and the way it is organized, you could create your own writer. It would mean extracting the known data from the Perl data structures in the desired order, and writing it back out to the XML file that way.
The order of tag attributes is usually not a factor for XML-using applications, so if it has been shown that your application is sensitive in that way, it would not be surprising if it is also sensitive to other formatting issues, such as contained whitespace. Sounds like your application may have been written in just the manner that Sergei and I cautioned against. Pity.
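
A rough sketch of what such a hand-rolled writer could look like, in core Perl. The %attr_order table, the escape_attr and empty_tag helpers, and the spacing convention (NAME ="value", copied from the posted sample) are all assumptions for illustration; the real order and layout would have to be taken from what the third-party tool accepts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical attribute order per tag; adjust to what the consuming tool expects.
my %attr_order = (
    PARTITION      => [qw(DESCRIPTION NAME)],
    TRANSFORMFIELD => [qw(DATATYPE PRECISION)],
);

# Escape the characters that are special inside a double-quoted attribute value.
sub escape_attr {
    my ($value) = @_;
    $value =~ s/&/&amp;/g;
    $value =~ s/</&lt;/g;
    $value =~ s/>/&gt;/g;
    $value =~ s/"/&quot;/g;
    return $value;
}

# Build an empty-element tag, writing attributes in the configured order
# and mimicking the NAME ="value" spacing of the original file.
sub empty_tag {
    my ( $name, $attrs ) = @_;
    # Fall back to alphabetical order for tags that are not in the table.
    my @order = @{ $attr_order{$name} || [ sort keys %$attrs ] };
    my $text  = join ' ',
        map  { sprintf '%s ="%s"', $_, escape_attr( $attrs->{$_} ) }
        grep { exists $attrs->{$_} } @order;
    return "<$name $text/>";
}

# Prints: <PARTITION DESCRIPTION ="" NAME ="Partition #1"/>
print empty_tag( 'PARTITION', { NAME => 'Partition #1', DESCRIPTION => '' } ), "\n";
```

The order in which the hash was filled does not matter here, since the output order comes entirely from the lookup table.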

--- rod.
 
Old 10-22-2010, 09:01 AM   #15
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by rammyp_1979 View Post
If only! Unfortunately, the application is a 3rd party tool that I do not have control over.
File a bug against the application.
 
  

