HTML parsing with HTML::TreeBuilder
Hello everyone,
I am working on a script that parses content from web pages and inserts the data into a new page. The process is as follows:
1. Retrieve content from a web site (source) with WWW::Mechanize and parse the table content into a data structure.
2. Download the target web page via FTP.
3. Insert the desired table content from the data structure into the target page.
4. Upload the target web page via FTP.
The problem is that I'm using HTML::TreeBuilder both to parse the source page and to insert data into the target page. When it writes out the target page, the DOCTYPE ends up after the closing body tag (a documented bug in HTML::TreeBuilder). This is a problem for IE...
Has anyone used HTML::TreeBuilder before and found a solution to this problem?
I'm guessing that I could just insert the DOCTYPE into the HTML page after it has been created, but I want to retain the DOCTYPE from the original document. I was hoping someone out there had a better solution.
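For reference, here is a minimal sketch of that fallback using plain string handling, so the original DOCTYPE is kept rather than hard-coded. The function names `extract_doctype` and `restore_doctype` are made up for illustration, and the regex assumes an ordinary HTML DOCTYPE with no `>` inside it (no internal subset). It works around the misplaced declaration; it does not fix the underlying behaviour in HTML::TreeBuilder.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first DOCTYPE declaration in $html,
# or undef if the page has none.
# Assumption: the DOCTYPE contains no '>' before its closing bracket.
sub extract_doctype {
    my ($html) = @_;
    return $html =~ /(<!DOCTYPE\b[^>]*>)/i ? $1 : undef;
}

# Hypothetical helper: strip any DOCTYPE from the serialized page,
# wherever it ended up, and put the original one back on the first line.
sub restore_doctype {
    my ($doctype, $serialized) = @_;
    $serialized =~ s/<!DOCTYPE\b[^>]*>\s*//gi;
    return $serialized unless defined $doctype;
    return "$doctype\n$serialized";
}
```

In the script, `extract_doctype` would be called on the target page as downloaded, before it is parsed, and `restore_doctype` on the string returned by `$tree->as_HTML` just before the upload.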
Thanks.
-Shawn
Last edited by smaida; 07-11-2005 at 12:05 AM.