Perl: handling of UTF-8 in XML and HTML
Not sure where to start.
Overview: I have a big file of Western-encoded messages (you could call it some sort of non-standard blog), marked up with HTML. I am now trying to clean things up: storing the messages as XML (more specifically, as RSS 2.0) and displaying them as HTML. The encoding should change to UTF-8. Most things work; only the UTF-8 encoding of special entities for the XML drives me nuts.
I have been using the XML::RSS Perl module, which, after testing it, may or may not have been a good idea. For example, the encode_output switch is sometimes ignored, depending on which server I run the script on. It also seems that XML::RSS does not correctly support the common way of encoding/decoding UTF-8 entities. In those cases where the mentioned "encode_output" of XML::RSS does work, it produces something like this for the lower-case 'a' with two dots on top:
Code:
ä
I have read _thousands_ of websites on the topic, and it _seems_ that the encoding for the above example should have been:
Code:
쎤
Question 1: Are both ways above correct when encoding UTF-8 in XML?
Question 2: Is using XML::RSS a bad idea? Any alternatives?
Question 3: How to best encode those entities in HTML for output?
Question 4: Could it be that RSS readers better support decimal encoding, e.g.
Code:
쎤
Code:
쎤
Thanks for any hints!! I am stuck here and feel clueless. |
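On Questions 1 and 4, a small sketch may help separate two things that are easy to mix up: the code point of a character and the bytes of its UTF-8 encoding. XML and HTML numeric character references are built from the code point, in decimal or hexadecimal, and both forms denote the same character. The variable names below are made up for illustration, and the sketch assumes Perl 5.8 or later, where Encode is a core module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# 'a' with two dots on top is the single code point U+00E4.
my $char = "\x{E4}";

# Numeric character references are built from the code point.
my $dec_ref = sprintf '&#%d;',  ord($char);   # decimal form
my $hex_ref = sprintf '&#x%X;', ord($char);   # hexadecimal form

# The UTF-8 encoding of that code point is two bytes.
my $bytes     = encode('UTF-8', $char);
my $bytes_hex = join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;

# Reading those two bytes as if they were one code point yields a
# reference to U+C3A4, which is a different character (a Hangul syllable).
my $wrong_ref = '&#x' . uc(unpack('H*', $bytes)) . ';';

print "decimal reference:     $dec_ref\n";    # &#228;
print "hexadecimal reference: $hex_ref\n";    # &#xE4;
print "UTF-8 bytes:           $bytes_hex\n";  # C3 A4
print "bytes misread as one:  $wrong_ref\n";  # &#xC3A4;
```

If that reading is right, the second form in the post is not an alternative spelling of the first but a reference to another character. In a file that is declared and actually written as UTF-8, the character can also appear directly as the two bytes C3 A4, with no reference at all.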
Those are some fairly specialized questions. While you're waiting here, you may also want to ask at www.perlmonks.org. It's where the Perl gurus hang out.
But do post the solution here when you get it so we all benefit. |
too old Perl
Btw, it gradually turned out this was all due to an old version of Perl and the correspondingly old Perl modules. Things have probably been fixed in newer Perl releases.
As I had to run things on some provider's server, I could not influence the Perl version deployed. So I gave up on making this work. I might just switch my software to PHP, which hosting providers usually keep more up to date. This type of thing (UTF encoding etc.) usually works more smoothly in PHP, in my experience. |
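For anyone stuck on a similar setup, the conversion described in the first post can be sketched without XML::RSS, using only the Encode module that has shipped with Perl since 5.8. This is a minimal sketch under assumptions, not a tested fix for the original problem: the helper names are hypothetical, and ISO-8859-1 is assumed as the "Western" source encoding (it might just as well be cp1252). Only the characters that are special in markup are escaped; everything else is written as plain UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Escape only the characters that are special in XML/HTML markup.
sub escape_markup {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later entities stay intact
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: Western-encoded bytes in, escaped UTF-8 bytes out.
sub western_to_utf8 {
    my ($raw_bytes, $source_encoding) = @_;
    my $chars = decode($source_encoding, $raw_bytes);   # bytes -> characters
    return encode('UTF-8', escape_markup($chars));      # characters -> UTF-8 bytes
}

# Sample input: "B", a-umlaut, "r & <b>" as ISO-8859-1 bytes (assumed encoding).
my $raw  = "B\xE4r & <b>";
my $utf8 = western_to_utf8($raw, 'ISO-8859-1');

print $utf8, "\n";
```

The output only reads correctly if the surrounding document also declares UTF-8, for example in the XML declaration or the HTML charset.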
Quote:
perl-5.10.0 is relocatable/portable (when you build it accordingly), i.e. you could build your own copy of perl-5.10.0 in any directory you have write permission to, and use it either from the place where it was built or from any other directory you copy it to. |
Thanks Sergei, an option one could consider. And in any case, I still am a Perl fan.
|