LinuxQuestions.org
07-10-2005, 09:58 PM   #1
smaida
Member
 
Registered: Apr 2004
Location: Richmond, VA - USA
Distribution: Debian
Posts: 62

Rep: Reputation: 15
HTML parsing with HTML::TreeBuilder


Hello everyone,

I am working on a script that parses content from web pages and inserts the data into a new page. The process is as follows:

1. Retrieve content from a web site (source) with WWW::Mechanize and parse the table content into a data structure.
2. Download the target web page via FTP.
3. Insert the desired table content from the data structure into the target page.
4. Upload the target web page via FTP.

The problem is that I'm using HTML::TreeBuilder both to parse the source page and to insert data into the target page. When the target page is written back out, the DOCTYPE ends up after the closing body tag (a documented bug in HTML::TreeBuilder). This is a problem for IE...

Has anyone used HTML::TreeBuilder before and found a solution to this problem?

I'm guessing that I could just insert the doctype into the HTML page after it has been created, but I want to retain the doctype from the original document. I was hoping someone out there had a better solution.
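
A minimal sketch of that workaround, using only core Perl. It assumes the rebuilt page is already available as a string (for example, whatever the tree serializer returned) and that the DOCTYPE has no internal subset containing ">". The helper names extract_doctype and reattach_doctype are hypothetical, and the two sample strings are stand-ins, not real pages:

```perl
use strict;
use warnings;

# Return the first DOCTYPE declaration found in $html, or undef if none.
# Assumption: the declaration contains no '>' before its closing '>'.
sub extract_doctype {
    my ($html) = @_;
    return $html =~ /(<!DOCTYPE\b[^>]*>)/i ? $1 : undef;
}

# Strip any DOCTYPE from $serialized (wherever it ended up) and put
# $doctype on the first line. With no doctype, only the stripping happens.
sub reattach_doctype {
    my ($doctype, $serialized) = @_;
    $serialized =~ s/\s*<!DOCTYPE\b[^>]*>\s*//gi;
    return defined $doctype ? "$doctype\n$serialized" : $serialized;
}

# Stand-in for the target page as downloaded.
my $original = qq{<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">\n}
             . qq{<html><body><p>old</p></body></html>\n};

# Stand-in for serializer output with the DOCTYPE misplaced after </body>.
my $rebuilt  = qq{<html><body><p>new</p></body>}
             . qq{<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">}
             . qq{</html>};

my $fixed = reattach_doctype(extract_doctype($original), $rebuilt);
print "$fixed\n";
```

Taking the doctype from the downloaded copy rather than hard-coding one keeps whatever declaration the target page originally had. It is still post-processing on a string, so it sidesteps the misplaced declaration rather than fixing it inside the tree.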

Thanks.

-Shawn

Last edited by smaida; 07-11-2005 at 12:05 AM.
 
  

