LinuxQuestions.org
Old 03-27-2014, 01:01 AM   #1
mia_tech
Member
 
Registered: Dec 2007
Location: FL, USA
Distribution: CentOS 5.3, Ubuntu 9.04
Posts: 245

Rep: Reputation: 16
what would be the best way to extract info out of an html page


Let's say I'm monitoring a web page whose information updates constantly, for example an eBay page showing current bids, and I want to check the latest bid every 10 minutes once there is only one hour left. Would it be better to parse the page's source code, or to dump the rendered page with lynx or another text-based browser and then parse out the info I'm interested in?
 
Old 03-27-2014, 01:17 AM   #2
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,223

Rep: Reputation: 5320
I'd look into this:

http://phantomjs.org/
 
2 members found this post helpful.
Old 03-27-2014, 11:30 AM   #3
bonnydeal
Member
 
Registered: Feb 2006
Posts: 47

Rep: Reputation: 29
I would look into THIS
http://developer.ebay.com/common/api/
 
1 members found this post helpful.
Old 03-27-2014, 11:32 AM   #4
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 4,278

Rep: Reputation: 1694
+1 for phantomjs. It really is the fastest way to accomplish this without starting from scratch.
 
Old 03-27-2014, 11:42 AM   #5
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,223

Rep: Reputation: 5320
There's also this, which also lets you easily get information out of an HTML page:

http://docs.seleniumhq.org/

So: plenty of good solutions that don't require you to scrape the HTML.

Definitely try the API first though.

Last edited by dugan; 03-27-2014 at 12:00 PM.
 
Old 03-27-2014, 11:53 AM   #6
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918
I've made a few site scrapers for XBMC using Python:
http://hyper.dyndns-home.com/xbmc/

If you know the URL of the item, you can do something like this:
Code:
[schneidz@hyper ebay]$ wget -q -O - 'http://www.ebay.com/itm/Google-Nexus-5-16GB-Black-Unlocked-Smartphone-/111310254176?pt=Cell_Phones&hash=item19ea9bb060' | grep -oE '("timeLeftInMins":[0-9]*,|itemprop="price".*[0-9])'
itemprop="price">US $255.00
"timeLeftInMins":7444,

Last edited by schneidz; 03-27-2014 at 12:03 PM.
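
For anyone who would rather do this from Python than a shell pipeline, here is a minimal sketch of the same idea as the one-liner above. It assumes the page still embeds a `"timeLeftInMins"` field and an `itemprop="price"` element, as in the sample output shown; eBay's markup may well have changed since. `extract_bid_info`, `should_poll`, and the `SAMPLE` snippet are illustrative names and data made up for this example, not part of any library or of the real page.

```python
import re

# These patterns mirror the grep expression above. The field names are taken
# from the sample output in this thread and are an assumption about the markup.
TIME_LEFT_RE = re.compile(r'"timeLeftInMins":(\d+)')
PRICE_RE = re.compile(r'itemprop="price"[^>]*>\s*([^<]*\d)')


def extract_bid_info(html):
    """Return (price_text, minutes_left); either value is None if not found."""
    price = PRICE_RE.search(html)
    mins = TIME_LEFT_RE.search(html)
    return (
        price.group(1).strip() if price else None,
        int(mins.group(1)) if mins else None,
    )


def should_poll(minutes_left, window=60):
    """True once the auction is inside the final `window` minutes."""
    return minutes_left is not None and minutes_left <= window


# Hypothetical snippet standing in for a downloaded page (no network needed).
SAMPLE = (
    '<span itemprop="price">US $255.00</span>\n'
    '<script>{"timeLeftInMins":7444,"other":1}</script>'
)

if __name__ == "__main__":
    price, minutes_left = extract_bid_info(SAMPLE)
    print(price, minutes_left, should_poll(minutes_left))
```

To run it against a live page, the HTML could be fetched with `urllib.request.urlopen(url).read().decode()` and passed to `extract_bid_info`; a scheduler such as cron could then rerun the script every 10 minutes once `should_poll` returns true. Regex scraping like this breaks whenever the markup changes, which is one more reason to try the API first.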
 
All times are GMT -5.