LinuxQuestions.org


ewingtux 12-16-2008 12:52 PM

Wget or cURL code for checking changes to a web page?
 
Does anyone know what command I could use to check a product page such as http://www.amazon.com/Tales-Beedle-B.../dp/0545128285 and be notified of any price changes? If the price changes to, say, $7.50, I need it to notify me.

Wget or cURL seem the best fit, but I don't know where to start or which command to use.

Any help is much appreciated.

MensaWater 12-16-2008 03:16 PM

You could do it with wget but it would be a little involved.

If it were me I'd do it with lynx instead:

Code:

lynx -dump http://www.amazon.com/Tales-Beedle-B.../dp/0545128285 |grep "  Price:" |awk '{print $1,$2}'
Note that the grep pattern has TWO spaces before "Price:". This ensures it matches the current price rather than the list price.

In my awk I'm printing both the "Price:" and the current price ($7.14 when I ran it).
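
For anyone who would rather do the grep/awk step in Python, here is a minimal Python 3 sketch of the same idea: find the first line containing the two-space marker and take its first two whitespace-separated fields. The function name find_price_fields and the sample text are made up for illustration; the sample is not real lynx output, and the actual page layout may differ.

Code:

```python
def find_price_fields(dump_text, marker="  Price:"):
    """Return the first two fields of the first line containing `marker`.

    Returns None when no line matches. The helper name and the default
    marker are illustrative assumptions, not part of lynx or Amazon.
    """
    for line in dump_text.splitlines():
        if marker in line:            # same test as: grep "  Price:"
            fields = line.split()     # split on runs of whitespace, like awk
            if len(fields) >= 2:
                return fields[0], fields[1]   # awk '{print $1,$2}'
    return None


# Hypothetical sample text (NOT real lynx -dump output).
sample = (
    "   List Price: $12.99\n"
    "   Price: $7.14 & eligible for free shipping\n"
)

print(find_price_fields(sample))
```

The "List Price:" line is skipped because it has only one space before "Price:", which is the same reason the grep pattern uses two.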

You could just print the current price ($2 in the awk print statement) and strip off the $ to do a numeric comparison using something like bc -l.

Code:

lynx -dump http://www.amazon.com/Tales-Beedle-B.../dp/0545128285 |grep "  Price:" |awk '{print $2}' |cut -c2-
The cut at the end strips the $, since it is always the first character: cut -c2- keeps everything from character 2 onward. (You could do it with sed or awk, but then you have to escape the dollar sign, since it has a special meaning in both.)
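
The strip-the-$ and compare step could also be sketched in Python 3 instead of bc. This is only a sketch under assumptions: the names parse_price and at_or_below are hypothetical, the price is assumed to look like "$7.14" (no thousands separators), and "notify when the price is at or below the target" is my reading of the original question, not something it states.

Code:

```python
import re
from decimal import Decimal

# Optional leading "$", then digits with an optional fractional part.
_PRICE_RE = re.compile(r"\$?(\d+(?:\.\d+)?)")


def parse_price(text):
    """Turn a string such as '$7.14' into a Decimal, or None if it is not a price."""
    match = _PRICE_RE.fullmatch(text.strip())
    return Decimal(match.group(1)) if match else None


def at_or_below(price_text, target_text):
    """True when both strings parse and the price is <= the target."""
    price = parse_price(price_text)
    target = parse_price(target_text)
    if price is None or target is None:
        return False
    return price <= target


print(at_or_below("$7.14", "$7.50"))
print(at_or_below("$7.99", "$7.50"))
```

Decimal is used rather than float so that prices compare exactly as written.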

jcookeman 12-16-2008 04:46 PM

Quick and nasty:

Code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib2
import re
import sys
try:
    pricefile = open('pricefile.txt')
    initial_price = pricefile.readline().strip()  # drop any trailing newline
    pricefile.close()
except IOError:
    initial_price = 0
# Setup urllib to look like Firefox on Ubuntu so those
# clever Amazon engineers don't catch on (as fast)
hdrs = {'User-Agent':'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.4) Gecko/2008111318 Ubuntu/8.04 (hardy) Firefox/3.0.4'}
req = urllib2.Request(sys.argv[1], headers=hdrs)
# Grab the page
try:
    pg = urllib2.urlopen(req)
except urllib2.HTTPError, err:
    print "%s: %s" % (sys.argv[1], err)
    sys.exit(1)
# Look through the page
try:
    while True:
        if re.search(r'.*("priceBlockLabelPrice").*', pg.next()):
            pricere = re.search(r'.*>([$]+[0-9]+\.[0-9]+)<.*', pg.next())
            if pricere is None:
                # Label found but no price on the following line; keep scanning
                continue
            price = pricere.group(1)
            break
except StopIteration:
    print "Could not find price"
    sys.exit(1)
if price != initial_price:
    print "Price has changed from %s to %s" % (initial_price, price)
else:
    print "Price is still %s" % price
try:
    pricefile = open('pricefile.txt', 'w+')
    pricefile.write(price)
    pricefile.close()
except IOError, err:
    # IOError carries (errno, strerror), in that order
    print "cannot write to pricefile: [%s] %s" % (err.errno, err.strerror)
    sys.exit(1)
sys.exit(0)
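
The script above is Python 2. For reference, the compare-and-save step at the end of it could be sketched in Python 3 roughly as follows. The function name record_price and the "First price recorded" message are my own assumptions; the other two messages are taken from the script, and fetching the page is left out entirely.

Code:

```python
from pathlib import Path


def record_price(price, store):
    """Compare `price` with the one saved in `store`, save the new one,
    and return a message describing what happened.

    `store` is a path to a small text file holding the last seen price.
    """
    store = Path(store)
    # Read the previously saved price, if there is one.
    previous = store.read_text().strip() if store.exists() else None
    # Always save the latest price for the next run.
    store.write_text(price)
    if previous is None:
        return "First price recorded: %s" % price
    if price != previous:
        return "Price has changed from %s to %s" % (previous, price)
    return "Price is still %s" % price
```

Calling record_price("$7.14", "pricefile.txt") twice in a row on a fresh file would report the first price and then "Price is still $7.14". The returned message could be printed or handed to whatever notification mechanism you prefer.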



All times are GMT -5.