LinuxQuestions.org
Old 09-07-2012, 09:04 AM   #1
slinx
Downloading dynamically built web pages


Hello, I am trying to use wget to download a webpage that is dynamically generated.

My goal is to go through the generated links to product items that the page displays in a table, and check whether each link is correctly formed.

My problem is that wget does not download the actual HTML displayed for this table of items. When I download the page with wget, none of the links appear in the output. I'm not even sure where the item links come from, although they are supposed to be generated by something called Celebros (http://www.celebros.com/).

How can I "scrape" the page as it is rendered in a browser?

Thank you for your help.
 
Old 09-07-2012, 09:26 AM   #2
theNbomr
If you open the page in a conventional browser like Mozilla or Chrome, you can have it show you the page source. That should show you how the links and other elements are produced. It is hard to know why wget isn't working, but my first guess is that the 'links' are not in the HTML at all and are instead built by JavaScript using something like AJAX; wget only fetches the document and does not run scripts. It is reasonable to imagine that the site was constructed this way to defeat scraping.
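One rough way to test that guess is to look at the file wget saved and count what is actually in it. The sketch below uses only the Python standard library; the SAMPLE string is a made-up stand-in for the saved page, and the names TagCounter and summarize are hypothetical. If the result shows script tags but no links, the links are most likely being added by JavaScript after the page loads.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Collect <a href> values and count <script> tags in static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.scripts = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "script":
            self.scripts += 1


def summarize(html):
    """Return (list of hrefs, number of script tags) for an HTML string."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.links, parser.scripts


# Hypothetical stand-in for the page as saved by wget: an empty
# results container plus a script that would fill it in a browser.
SAMPLE = """<html><body>
<div id="results"></div>
<script src="/js/search.js"></script>
</body></html>"""

if __name__ == "__main__":
    links, scripts = summarize(SAMPLE)
    print(f"{len(links)} links, {scripts} script tags")
```

To try it on a real download, read the saved file into a string and pass that to summarize() instead of SAMPLE.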

--- rod.
 
Old 09-07-2012, 01:26 PM   #3
slinx
Yes, thanks. I do know how to do that, but so far I have only been able to find where the script that loads the actual search content is placed. I am going to look into Scrapy or Watir to do what I need. I'll look at linklint too.
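Whichever tool ends up producing the rendered HTML, the check on each link can be done separately with the standard library. This is only a sketch: the thread does not say what "correctly formed" means for this site, so the rule here (the link must resolve against the page URL to an http or https address with a host and no embedded whitespace) is an assumption, and check_link and the example.com URLs are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def check_link(href, base_url):
    """Return True if href resolves to an absolute http(s) URL with a host.

    This definition of "well formed" is an assumption, not the site's rule.
    """
    href = href.strip()
    if not href:
        return False
    # Unencoded whitespace inside a URL is treated as malformed here.
    if any(ch.isspace() for ch in href):
        return False
    try:
        parts = urlsplit(urljoin(base_url, href))
    except ValueError:
        # urllib raises ValueError for some malformed hosts.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    base = "http://example.com/search"  # hypothetical page URL
    for href in ["/products/42", "product page.html", "javascript:void(0)"]:
        print(href, check_link(href, base))
```

The hrefs could come from the rendered page source saved out of the browser or from the scraping tool's own link extraction.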

Last edited by slinx; 09-07-2012 at 01:31 PM.
 
  


Tags
browser, html, wget


All times are GMT -5.
