LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 08-20-2003, 10:17 AM   #1
Travis86
Member
 
Registered: Dec 2002
Location: The land of GMT -6
Distribution: OS X, PS2 Linux, Ubuntu, IRIX 6.5
Posts: 399

Rep: Reputation: 31
XML can't store HTML?!?


I'm trying to get XML to store HTML data. I figured that if I told it in the DTD that it was #PCDATA, it would just take it as text and wouldn't try to interpret it. Unfortunately, it thinks I'm bringing up tags I didn't mention in the DTD.

What do I do now? If I were to use a syntax like &lt;html&gt; (etc...), the file would be bigger, and I would have to do a bunch of text crunching. Why can't it just take it as #PCDATA?

Thanks.
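To illustrate the problem described above, here is a minimal sketch in Python (swapped in for the PHP used in the thread; the `menu`/`item` document is a made-up example). It shows that a parser treats unescaped HTML inside an element as child elements, not as text, whatever the DTD says about #PCDATA.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: an <item> holding raw, unescaped HTML.
raw = "<menu><item>Click <b>here</b> now</item></menu>"

item = ET.fromstring(raw).find("item")

# The parser sees <b> as a child element, not as character data.
child_tags = [child.tag for child in item]

# item.text only holds the characters before the first child element.
leading_text = item.text

print(child_tags)
print(repr(leading_text))
```

Because the `<b>` tag becomes an element of its own, a validating parser then complains that it was never declared in the DTD.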
 
Old 08-20-2003, 10:20 AM   #2
lackluster
Member
 
Registered: Apr 2002
Location: D.C - USA
Distribution: slackware-current
Posts: 488

Rep: Reputation: 30
because they're tags, of course. if you want to store HTML stuff, either do what you suggested with &lt; &gt; or throw out the DTD ..... good luck
 
Old 08-20-2003, 12:09 PM   #3
Travis86
Member
 
Registered: Dec 2002
Location: The land of GMT -6
Distribution: OS X, PS2 Linux, Ubuntu, IRIX 6.5
Posts: 399

Original Poster
Rep: Reputation: 31
Well, yeah, they're tags, but can't I tell it they're not? What's a DTD for besides to tell it what things are? And what do you mean by throw out the DTD?
 
Old 08-20-2003, 01:51 PM   #4
lackluster
Member
 
Registered: Apr 2002
Location: D.C - USA
Distribution: slackware-current
Posts: 488

Rep: Reputation: 30
what exactly are you trying to do? if it's just your own XML data and your own XML parser, you don't really need the DTD. the DTD is just saying "this document is valid XML as long as it follows the rules of XML .... here are the elements which are valid for this XML document". that's why you can't throw random tags in there. On the other hand, if you omit the DTD then you're just saying "this document is valid XML as long as it follows the rules of XML .... the tags are arbitrary"
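The difference between "follows the rules of XML" and "follows the DTD" can be sketched in Python (not the PHP from the thread). Python's standard ElementTree parser does not validate against a DTD, so it only checks the first condition; `is_well_formed` and the sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML; no DTD validation is attempted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Arbitrary tags are fine as long as they are properly nested and closed.
arbitrary_tags = "<menu><whatever><b>bold</b></whatever></menu>"

# An unclosed <b> breaks the rules of XML itself, DTD or no DTD.
mismatched_tags = "<menu><b>bold</menu>"

print(is_well_formed(arbitrary_tags))
print(is_well_formed(mismatched_tags))
```

Real-world HTML often falls into the second case (unclosed tags, bare ampersands), so dropping the DTD alone does not make it safe to embed unescaped.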
 
Old 08-20-2003, 03:37 PM   #5
Travis86
Member
 
Registered: Dec 2002
Location: The land of GMT -6
Distribution: OS X, PS2 Linux, Ubuntu, IRIX 6.5
Posts: 399

Original Poster
Rep: Reputation: 31
I'm trying to send the XML through PHP. I think I tried it without a DTD when I first started, but PHP requires a DTD.

However, I might try something like:

<!DOCTYPE menu [
<!ELEMENT menu ANY>
]>

hmmm.... Do you think I could still use attributes and things then?

I'd really just like to tell it not to interpret what I've marked as #PCDATA, but alas.
 
Old 08-20-2003, 10:33 PM   #6
lackluster
Member
 
Registered: Apr 2002
Location: D.C - USA
Distribution: slackware-current
Posts: 488

Rep: Reputation: 30
i've never used PHP & XML together. just stick with your original workaround. is the html you're trying to store static? if so the following perl code might help (i purposely avoided modules so you won't need to install anything else):

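Code:
#!/usr/bin/perl
use strict;
use warnings;

# Read the whole HTML file into one string.
open(my $fh, '<', 'the_html_file.html') or die "Cannot open the_html_file.html: $!";
my $dirty = join '', <$fh>;
close($fh);

# Escape the XML markup characters; '&' must be handled first.
$dirty =~ s/&/&amp;/g;
$dirty =~ s/</&lt;/g;
$dirty =~ s/>/&gt;/g;

print $dirty;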

now that's totally untested, but it should work fine. if your html data is dynamic, you can use the same regexes in php. try to keep the DTD, it's good practice.

let me know how it turned out for you.
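For comparison, the same escaping idea can be sketched in Python using the standard library's `xml.sax.saxutils.escape` in place of hand-written regexes (this is not the PHP route mentioned above; the sample HTML string and the `menu`/`item` wrapper are assumptions made for the example). It also checks that the original HTML comes back unchanged after parsing.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

html = '<p class="x">Tom & Jerry</p>'

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(html)
doc = "<menu><item>" + escaped + "</item></menu>"

# After parsing, the element has no children and its text is the
# original HTML string again.
item = ET.fromstring(doc).find("item")
round_tripped = item.text

print(escaped)
print(round_tripped == html)
```

The parser undoes the escaping on the way out, so the "text crunching" only has to happen when the XML is written.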

Last edited by lackluster; 08-20-2003 at 10:35 PM.
 
Old 08-20-2003, 11:09 PM   #7
Travis86
Member
 
Registered: Dec 2002
Location: The land of GMT -6
Distribution: OS X, PS2 Linux, Ubuntu, IRIX 6.5
Posts: 399

Original Poster
Rep: Reputation: 31
I dunno. A little too dirty for me, but it might be what I'll have to do. I'll give this some more thought.
 
Old 08-23-2003, 10:00 AM   #8
german
Member
 
Registered: Jul 2003
Location: Toronto, Canada
Distribution: Debian etch, Gentoo
Posts: 312

Rep: Reputation: 30
Taken from O'Reilly's Java and XML book:

"A CDATA section is used when a significant amount of data should be passed on to the calling application without any XML parsing."

<![CDATA[

<html></html>

]]>

CDATA sections keep all whitespace, all bad characters, etc. etc. but you MUST be 100% sure that the contents do not contain "]]>" or bad things will happen.
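One common way around the "]]>" restriction is to split the sequence across two adjacent CDATA sections. The Python sketch below (not from the book quoted above; `wrap_in_cdata` and the sample document are hypothetical names for this example) shows the idea and checks that the parser hands the HTML back unchanged.

```python
import xml.etree.ElementTree as ET

def wrap_in_cdata(text):
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Sample HTML that contains the forbidden sequence.
html = "<html><body>a ]]> b</body></html>"
doc = "<menu><item>" + wrap_in_cdata(html) + "</item></menu>"

# The parser joins the adjacent CDATA sections into one text value.
item = ET.fromstring(doc).find("item")
recovered = item.text

print(recovered == html)
```

Note that the DTD can still declare the element as #PCDATA: a CDATA section is just another way of writing character data.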

HTH

B.
 
Old 08-23-2003, 04:25 PM   #9
cludwin
Member
 
Registered: Feb 2002
Distribution: Slack
Posts: 50

Rep: Reputation: 16
Yes that is correct,

PCDATA stands for parsed character data, which means that the data goes through the XML parser, so any HTML tags in it are treated as markup and will cause it to raise an error if they aren't declared in the DTD.

CDATA is not processed, so what you put in is essentially what you get out. However, I would suggest that you try to avoid wrapping HTML in XML; a cleaner way is to encode only your data in XML, then use XSL to generate your markup (HTML) and use CSS for your style. If speed is a concern then implement a cache.

hope this helps,
cludwin
 
Old 08-25-2003, 12:04 AM   #10
german
Member
 
Registered: Jul 2003
Location: Toronto, Canada
Distribution: Debian etch, Gentoo
Posts: 312

Rep: Reputation: 30
that is exactly the architecture I have been working with for ~2 years now, and it works extremely well. However, there are instances where you may wish to preserve someone else's preformatted HTML and you would have to parse it into XML, then back into HTML in a similar form which would prove a PITA, so CDATA sections would probably come in useful.

B.
 
Old 08-25-2003, 07:18 PM   #11
Travis86
Member
 
Registered: Dec 2002
Location: The land of GMT -6
Distribution: OS X, PS2 Linux, Ubuntu, IRIX 6.5
Posts: 399

Original Poster
Rep: Reputation: 31
Ah! Just what I was looking for. Thanks.
 
  

