I'm trying to get XML to store HTML data. I figured that if I told it in the DTD that it was #PCDATA, it would just take it as text and wouldn't try to interpret it. Unfortunately, it thinks I'm bringing up tags I didn't mention in the DTD.
What do I do now? If I were to use a syntax like &lt;html&gt; (etc.), the file would be bigger, and I would have to do a bunch of text crunching. Why can't it just take it as #PCDATA?
Well, yeah, they're tags, but can't I tell it they're not? What's a DTD for besides telling it what things are? And what do you mean by throw out the DTD?
what exactly are you trying to do? if it's just your own XML data and your own XML parser, you don't really need the DTD. the DTD is just saying "this document is valid XML as long as it follows the rules of XML .... here are the elements which are valid for this XML document". that's why you can't throw random tags in there. On the other hand, if you omit the DTD then you're just saying "this document is valid XML as long as it follows the rules of XML .... the tags are arbitrary"
i've never used PHP & XML together. just stick with your original workaround. is the html you're trying to store static? if so the following perl code might help (i purposely avoided modules so you won't need to install anything else):
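To make the point about tags concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`, which is a non-validating parser (no DTD is involved). The element names `entry` and `body` are made up for illustration. It shows that unescaped HTML inside an element becomes child elements, while the escaped form stays plain character data.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: 'entry' and 'body' are illustrative names only.
raw = "<entry><body><b>hello</b> world</body></entry>"
body = ET.fromstring(raw).find("body")

# The parser treats <b> as a child element, not as text inside <body>.
child_tags = [child.tag for child in body]
print(child_tags)          # ['b']
print(repr(body.text))     # None: there is no text before <b>

# With the markup characters escaped, the same content is plain character data.
escaped = "<entry><body>&lt;b&gt;hello&lt;/b&gt; world</body></entry>"
escaped_body = ET.fromstring(escaped).find("body")
print(len(escaped_body))   # 0 child elements
print(escaped_body.text)   # <b>hello</b> world
```

So declaring an element as #PCDATA does not switch parsing off. The `<` characters still have to be escaped (or wrapped in CDATA) to arrive as text.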
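Perl code:
#!/usr/bin/perl
use strict;
use warnings;
# Read the whole HTML file into one string.
open(my $fh, '<', 'the_html_file.html') or die "Cannot open the_html_file.html: $!";
my $dirty = join '', <$fh>;
close($fh);
# Escape the markup characters so an XML parser sees text, not tags.
# '&' goes first so the entities added below are not escaped twice.
$dirty =~ s/&/&amp;/g;
$dirty =~ s/</&lt;/g;
$dirty =~ s/>/&gt;/g;
print $dirty;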
now that's totally untested, but it should work fine. if your html data is dynamic, you can use the same regexes in php. try to keep the DTD, it's good practice.
let me know how it turned out for you.
Last edited by lackluster; 08-20-2003 at 10:35 PM.
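For comparison, the same escaping workaround can be sketched in Python with the standard library's `xml.sax.saxutils.escape`. The wrapper element `page` and the helper `wrap_html` are hypothetical names chosen for this example. The sketch also checks that parsing the result gives back the original HTML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def wrap_html(html: str) -> str:
    """Return an XML document holding `html` as escaped character data."""
    # escape() replaces &, < and > with &amp;, &lt; and &gt;.
    return "<page>" + escape(html) + "</page>"

html = '<p class="x">Fish &amp; chips</p>'
xml_doc = wrap_html(html)
print(xml_doc)

# Parsing undoes the escaping, so the original HTML comes back unchanged.
recovered = ET.fromstring(xml_doc).text
print(recovered == html)
```

The stored file is larger than the raw HTML, as the original poster feared, but no text crunching is needed on the way out because the parser reverses the escaping.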
"A CDATA section is used when a significant amount of data should be passed on to the calling application without any XML parsing."
<![CDATA[
<html></html>
]]>
CDATA sections keep all whitespace, all bad characters, etc. etc. but you MUST be 100% sure that the contents do not contain "]]>", because that sequence ends the section and everything after it is parsed as ordinary XML again.
PCDATA stands for parsed character data, which means that the data goes through the XML parser. Any HTML tags in it are read as markup, so a validating parser rejects the ones the DTD does not declare, and HTML that is not well-formed XML raises a parse error.
CDATA is not parsed, so what you put in is essentially what you get out. However, I would suggest that you try to avoid wrapping HTML in XML. A cleaner way is to encode only your data in XML, then use XSL to generate your markup (HTML) and use CSS for your style. If speed is a concern then implement a cache.
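One common way around the "]]>" problem is to split the sequence across two adjacent CDATA sections. The sketch below assumes a hypothetical helper `to_cdata` and wrapper element `page`, and uses Python's standard `xml.etree.ElementTree` to check the round trip.

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' so it cannot end the section early."""
    # Close the section after ']]', then reopen a new one before '>'.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# This HTML happens to contain the forbidden sequence inside a script.
html = "<script>if (a[b[0]]> 1) { go(); }</script>"
doc = "<page>" + to_cdata(html) + "</page>"

# The parser joins the adjacent sections back into one text node.
recovered = ET.fromstring(doc).text
print(recovered == html)
```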
that is exactly the architecture I have been working with for ~2 years now, and it works extremely well. However, there are instances where you may wish to preserve someone else's preformatted HTML. Parsing it into XML and then back into HTML in a similar form would prove a PITA, so CDATA sections would probably come in useful.
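The data-only approach described above can be sketched roughly as follows. Python's standard library has no XSLT processor, so a plain function named `render` (hypothetical) stands in for the XSL stylesheet here. The element names `articles`, `article`, `title` and `body` are also assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical data document: it holds content only, no presentation markup.
data = (
    "<articles><article>"
    "<title>XML &amp; HTML</title>"
    "<body>Keep data here.</body>"
    "</article></articles>"
)

def render(xml_text: str) -> str:
    """Build HTML from the data, the job an XSL stylesheet would do."""
    parts = []
    for article in ET.fromstring(xml_text).iter("article"):
        # Escape the text again on output so it is safe inside HTML.
        title = escape(article.findtext("title", default=""))
        body = escape(article.findtext("body", default=""))
        parts.append(f"<h1>{title}</h1>\n<p>{body}</p>")
    return "\n".join(parts)

page = render(data)
print(page)
```

With this split the XML never contains HTML tags at all, so neither entity escaping of markup nor CDATA is needed in the stored file.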