LinuxQuestions.org


jothish1 06-21-2010 06:30 AM

xml parser error in perl 5.8.9
 
Hi all,

I have a set of XML files where one will be referenced in another. For example A->B->C->D etc.

The intent of the script is to take information from all of these XMLs and then work on that data.

The script worked fine in perl version 5.8.8 but gives a peculiar error when run in perl-5.8.9.

The error is "xml declaration not at start of external entity at line 37, column 11, byte 1307 at /home/tools/perl-5.12.1/Linux-64bit/lib/site_perl/5.12.1/x86_64-linux/XML/Parser.pm line 187"

The error seems to come from the first line of each of the XMLs, which is <?xml version="1.0" encoding="ISO-8859-1"?>. As I understand it, this line declares the encoding used.

So where am I going wrong ??

Thanks for your help

Regards,
joe

sample top level xml file:



<?xml version="1.0" encoding="ISO-8859-1"?>
<top xmlns:xi="http://www.w3.org/2001/XInclude">
<module name="TOP">
<moduleref name="module_a"/>
<moduleref name="module_b"/>
<moduleref name="module_c"/>
<data>data</data>
<data>data</data>
</module>
<xi:include href="$WORK_ROOT/MOD_A/module_a.xml"/>
<xi:include href="$WORK_ROOT/MOD_B/module_b.xml"/>
<xi:include href="$WORK_ROOT/MOD_C/module_c.xml"/>
</top>



sample module xml



<?xml version="1.0" encoding="ISO-8859-1"?>
<record xmlns:xi="http://www.w3.org/2001/XInclude">
<module type="record" name="module_a">
<data>data</data>
<data>data</data>
</module>
</record>



Now for the code I use to process these XML files.

This code expands the top-level XML file:



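use strict;
use warnings;
use XML::DOM;
use XML::SAX;
use XML::SAX::Writer;
use FindBin;
use lib "$FindBin::Bin";
use PATHREF;    # XML::Filter::XInclude, modified to expand $WORK_ROOT (see note below)
use IO::File;
use File::Find;
use File::Copy;
use Cwd;

my $input_file  = "top.xml";
my $output_file = "output.xml";

# Open the file that will receive the expanded document.
my $output = IO::File->new(">$output_file")
    or die "Cannot open $output_file for writing: $!";
print "Expanding include tags...\n";

# SAX pipeline: parser -> XInclude filter -> writer.
my $parser = XML::SAX::ParserFactory->parser(
    Handler => XML::Filter::XInclude->new(
        Handler => XML::SAX::Writer->new(Output => $output)
    )
);
$parser->parse_uri($input_file);
close($output);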



This code is used to parse the expanded top.xml, which is where I get the error:



# Parse the expanded file into a DOM tree; the error is raised here.
$parser = XML::DOM::Parser->new;
my $doc = $parser->parsefile($output_file);



Note:
1. The module PATHREF is actually the module XML::Filter::XInclude, modified to process the ENV variable $WORK_ROOT.
2. I also ran the script in both 5.8.9 and 5.12.1 and still got the same error.




PS: This info might help.
I tried removing the content <?xml version="1.0" encoding="ISO-8859-1"?> from the expanded top.xml and there were no errors! So does that mean I need not give this info in each of the XMLs, or do I have to remove it after expansion every time to avoid the error?
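
On the PS: the XML specification only allows an XML declaration at the very start of a document or external entity, so declarations carried over from the included files are a plausible trigger once everything has been expanded into one file. Here is a minimal sketch of the "remove after expansion" option. It assumes the expanded file fits in memory and that no <?xml ...?> text occurs inside comments or CDATA sections. The name strip_embedded_xml_decls is a hypothetical helper, not part of XML::Filter::XInclude or XML::DOM.

```perl
use strict;
use warnings;

# Hypothetical helper: keep an XML declaration only when it is the very
# first thing in the text, and drop any later copies.
sub strip_embedded_xml_decls {
    my ($xml) = @_;
    my $leading = '';

    # Preserve a declaration that sits at offset 0, if there is one.
    if ($xml =~ s/\A(<\?xml\s[^?]*\?>)//) {
        $leading = $1;
    }

    # Remove every remaining declaration plus the whitespace after it.
    # The \s after "xml" avoids matching e.g. <?xml-stylesheet ...?>.
    $xml =~ s/<\?xml\s[^?]*\?>\s*//g;

    return $leading . $xml;
}

# Small stand-in for an expanded file with one embedded declaration.
my $expanded = <<'XML';
<?xml version="1.0" encoding="ISO-8859-1"?>
<top>
<?xml version="1.0" encoding="ISO-8859-1"?>
<record><data>data</data></record>
</top>
XML

print strip_embedded_xml_decls($expanded);
```

To apply this to the real output, one would read output.xml into a string, pass it through the helper, and write it back before calling parsefile. Changing the modified XInclude filter so it never writes the included files' declarations may be cleaner, but that depends on code not shown in this thread.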

paulsm4 06-21-2010 11:15 PM

Q: have you removed all blank lines preceding your <?xml ... ?> XML declaration tag?

Sergei Steshenko 06-21-2010 11:33 PM

Quote:

Originally Posted by paulsm4 (Post 4010954)
Q: have you removed all blank lines preceding your <?xml ... ?> XML declaration tag?

Just curious - is XML sensitive to blank lines? Sounds odd to me.
...
Regarding the OP's problem - I didn't read the module documentation; quite possibly there is a setting that would resolve the problem.

Anyway, since the error message comes with the source code file name and line number, it is possible to deduce what caused the message, and this is what I would do had I been the OP.
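
One way to follow up on the position given in the message (line 37, column 11, byte 1307) is to list where each <?xml ...?> declaration sits in the expanded file and compare. This is a rough sketch: find_xml_decls is a hypothetical helper name, the sample string stands in for the contents of output.xml, and counting columns from 0 is an assumption of the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: return [offset, line, column] for every "<?xml "
# declaration in a string. Offsets equal byte offsets when the file was
# read without decoding; lines start at 1, columns at 0.
sub find_xml_decls {
    my ($xml) = @_;
    my @hits;
    while ($xml =~ /<\?xml\s/g) {
        my $offset = $-[0];                      # start of this match
        my $before = substr($xml, 0, $offset);
        my $line   = 1 + ($before =~ tr/\n//);   # newlines seen so far
        my $nl     = rindex($before, "\n");      # -1 when on the first line
        push @hits, [$offset, $line, $offset - ($nl + 1)];
    }
    return @hits;
}

# Stand-in for the expanded file: the second declaration is misplaced.
my $sample = "<?xml version=\"1.0\"?>\n<top>\n  <?xml version=\"1.0\"?>\n</top>\n";

for my $hit (find_xml_decls($sample)) {
    my ($offset, $line, $column) = @$hit;
    my $status = $offset == 0 ? "ok" : "misplaced";
    print "declaration at offset $offset (line $line, column $column): $status\n";
}
```

Any declaration reported at an offset other than 0 is a candidate for the one the parser rejects.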

paulsm4 06-22-2010 12:19 AM

Hi, Sergei -

In general, yes: XML can handle blank lines (and whitespace/indentation between tags, etc etc).

But that's not always the case "in the real world". I think there's an excellent chance that the parser somehow thinks the document has already "started" by the time it gets to the <?xml ?> declaration. I don't know how or why (the OP hasn't told us enough information), but I honestly think there's a very good chance that deleting the 37 blank lines before the declaration (and any apparently "blank" spaces on the declaration line itself) might fix the problem.

For example:
Quote:

http://groups.google.com/group/cake-...6a019730ca046d
Quote:

Fixed it. CakePHP users beware of leaving spaces in your model,
controller, view etc.. This error was a pain to track down. All it
takes is one space and your fun bus is up on blocks for RSS... Only
took a couple of days to track down...........

IMHO .. PSM
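
The suggestion above (delete whatever precedes the declaration) can be tried without editing files by hand. This is only a sketch under assumptions: trim_before_xml_decl is a hypothetical helper name, and it only handles whitespace in front of a leading declaration, not declarations embedded further down in the document.

```perl
use strict;
use warnings;

# Hypothetical helper: drop blank lines and spaces that precede a leading
# XML declaration. Text that does not start (after whitespace) with a
# declaration is returned unchanged.
sub trim_before_xml_decl {
    my ($xml) = @_;
    $xml =~ s/\A\s+(?=<\?xml\s)//;
    return $xml;
}

# Stand-in for a file whose declaration is preceded by blank lines.
my $text = "\n\n  <?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n<top/>\n";
print trim_before_xml_decl($text);
```

If the parser still reports the same error after this, leading whitespace is probably not the cause.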

jothish1 06-23-2010 06:54 AM

I've not removed the blank lines and still this script was working without issues in perl-5.8.8
The problem arose when I migrated to perl-5.8.9.


All times are GMT -5.