LinuxQuestions.org
Old 01-18-2010, 07:36 AM   #1
qwertyjjj
Senior Member
 
Registered: Jul 2009
Location: UK
Distribution: Cent OS5 with Plesk
Posts: 1,013

Rep: Reputation: 30
check if a website file has changed


I have a site that I log in to in order to check for updates. It does not have RSS because users need to authenticate themselves before getting access to the page.
Is there a way to write a script that can log in to the page, check whether the HTML has changed, and then send me an email?
 
Old 01-18-2010, 08:40 AM   #2
nuwen52
Member
 
Registered: Feb 2009
Distribution: Debian, CentOS 5, Gentoo, FreeBSD, Fedora, Mint, Slackware64
Posts: 208

Rep: Reputation: 46
You could maybe script wget to do this. If wget asks for a login and password for the website, you could automate that using expect. Another way is to use urllib or urllib2 in Python. Depending on how you have to log in to the page, these should do what you are looking for.
 
Old 01-18-2010, 08:55 AM   #3
qwertyjjj
Senior Member
 
Registered: Jul 2009
Location: UK
Distribution: Cent OS5 with Plesk
Posts: 1,013

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by nuwen52 View Post
You could maybe script wget to do this. If wget asks for a login and password for the website, you could automate that using expect. Another way is to use urllib or urllib2 in Python. Depending on how you have to log in to the page, these should do what you are looking for.
I cannot get wget to log in to the page successfully. I submit the details by POST to a login form, but it doesn't pick up the correct page afterwards, just the standard page shown when not logged in.
I picked up the form name from the source on this page, not sure if it's the correct one or not: https://www.inthemoneystocks.com/pro...watch_list.php

Code:
[root@server ~]# wget --post-data='username=DUMMYUSER&password=DUMMYPASSWORD'--save-cookies=my-cookies.txt --keep-session-cookies https://www.inthemoneystocks.com/swing_trade_month.php
--2010-01-18 14:51:32--  https://www.inthemoneystocks.com/swing_trade_month.php
Resolving www.inthemoneystocks.com... 72.29.80.60
Connecting to www.inthemoneystocks.com|72.29.80.60|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 15559 (15K) [text/html]
Saving to: `swing_trade_month.php'

100%[=======================================================================================================================================>] 15,559      63.6K/s   in 0.2s

2010-01-18 14:51:33 (63.6 KB/s) - `swing_trade_month.php' saved [15559/15559]

[root@server ~]# wget https://www.inthemoneystocks.com/pro...watch_list.php
--2010-01-18 14:52:16--  https://www.inthemoneystocks.com/pro...watch_list.php
Resolving www.inthemoneystocks.com... 72.29.80.60
Connecting to www.inthemoneystocks.com|72.29.80.60|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17008 (17K) [text/html]
Saving to: `pro_trader_watch_list.php'

100%[=======================================================================================================================================>] 17,008      72.3K/s   in 0.2s

2010-01-18 14:52:17 (72.3 KB/s) - `pro_trader_watch_list.php' saved [17008/17008]

[root@server ~]# nano pro_trader_watch_list.php
[root@server ~]#

Last edited by qwertyjjj; 01-18-2010 at 09:30 AM.
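
Two details in the commands as posted may explain the result. There is no space between the closing quote of --post-data and --save-cookies, so the shell joins them into one word and wget would treat the cookie option as part of the POST body. The second wget call also does not pass --load-cookies, so no session cookie is sent with it. Below is a minimal sketch of the usual two-step pattern. The function name fetch_with_login and its arguments are made up for illustration, and the login URL and field names have to be taken from the site's real login form:

```shell
# Hypothetical helper: log in with a form POST, then fetch a protected page
# using the session cookies from the login response.
#   $1 = login form URL   $2 = URL-encoded POST data
#   $3 = page to fetch    $4 = file to save the page in
fetch_with_login() {
    login_url=$1
    post_data=$2
    page_url=$3
    out_file=$4
    jar=$(mktemp) || return 1

    # Step 1: submit the form and keep the cookies, including session cookies.
    wget --quiet --save-cookies "$jar" --keep-session-cookies \
         --post-data "$post_data" -O /dev/null "$login_url" \
        || { rm -f "$jar"; return 1; }

    # Step 2: request the protected page and send the saved cookies back.
    status=0
    wget --quiet --load-cookies "$jar" -O "$out_file" "$page_url" || status=$?

    rm -f "$jar"
    return "$status"
}

# Example call (placeholder values, not run here):
# fetch_with_login 'https://www.inthemoneystocks.com/login.php' \
#     'username=MYUSER&currentpassword=MYPASS' \
#     'https://www.inthemoneystocks.com/pro_trader_watch_list_prem.php' page.html
```

Whether this is enough depends on the site: some login forms also expect hidden fields or a cookie set by an earlier GET of the login page.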
 
Old 01-18-2010, 01:24 PM   #4
qwertyjjj
Senior Member
 
Registered: Jul 2009
Location: UK
Distribution: Cent OS5 with Plesk
Posts: 1,013

Original Poster
Rep: Reputation: 30
Any thoughts on the login part?
 
Old 01-18-2010, 04:17 PM   #5
qwertyjjj
Senior Member
 
Registered: Jul 2009
Location: UK
Distribution: Cent OS5 with Plesk
Posts: 1,013

Original Poster
Rep: Reputation: 30
After the initial request, I am redirected to the login page again:

Code:
[root@server ~]# wget --save-cookies cookies.txt --keep-session-cookies --post-data 'username=MYUSERN&currentpassword=MYPASS' http://www.inthemoneystocks.com/login.php
--2010-01-18 22:15:12--  http://www.inthemoneystocks.com/login.php
Resolving www.inthemoneystocks.com... 72.29.80.60
Connecting to www.inthemoneystocks.com|72.29.80.60|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16397 (16K) [text/html]
Saving to: `login.php.1'

100%[=======================================================================================================================================>] 16,397      62.9K/s   in 0.3s

2010-01-18 22:15:13 (62.9 KB/s) - `login.php.1' saved [16397/16397]

[root@server ~]# wget --load-cookies cookies.txt -p https://www.inthemoneystocks.com/pro_trader_watch_list_prem.php
--2010-01-18 22:15:45--  https://www.inthemoneystocks.com/pro_trader_watch_list_prem.php
Resolving www.inthemoneystocks.com... 72.29.80.60
Connecting to www.inthemoneystocks.com|72.29.80.60|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://www.inthemoneystocks.com/login.php [following]
--2010-01-18 22:15:45--  https://www.inthemoneystocks.com/login.php
Connecting to www.inthemoneystocks.com|72.29.80.60|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16254 (16K) [text/html]
Saving to: `www.inthemoneystocks.com/login.php'

100%[=======================================================================================================================================>] 16,254      68.3K/s   in 0.2s

2010-01-18 22:15:46 (68.3 KB/s) - `www.inthemoneystocks.com/login.php' saved [16254/16254]

Loading robots.txt; please ignore errors.
--2010-01-18 22:15:46--  https://www.inthemoneystocks.com/robots.txt
Reusing existing connection to www.inthemoneystocks.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 191 [text/plain]
Saving to: `www.inthemoneystocks.com/robots.txt'

100%[=======================================================================================================================================>] 191         --.-K/s   in 0s

2010-01-18 22:15:46 (6.77 MB/s) - `www.inthemoneystocks.com/robots.txt' saved [191/191]

--2010-01-18 22:15:46--  https://www.inthemoneystocks.com/main.css
Reusing existing connection to www.inthemoneystocks.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 19172 (19K) [text/css]
Saving to: `www.inthemoneystocks.com/main.css'

100%[=======================================================================================================================================>] 19,172      --.-K/s   in 0.1s

2010-01-18 22:15:47 (159 KB/s) - `www.inthemoneystocks.com/main.css' saved [19172/19172]

--2010-01-18 22:15:47--  https://www.inthemoneystocks.com/js/mootools1.11.js
Reusing existing connection to www.inthemoneystocks.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 43715 (43K) [application/javascript]
Saving to: `www.inthemoneystocks.com/js/mootools1.11.js'

100%[=======================================================================================================================================>] 43,715       183K/s   in 0.2s

2010-01-18 22:15:47 (183 KB/s) - `www.inthemoneystocks.com/js/mootools1.11.js' saved [43715/43715]

--2010-01-18 22:15:47--  https://www.inthemoneystocks.com/js/fValidator.js
Reusing existing connection to www.inthemoneystocks.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 8966 (8.8K) [application/javascript]
Saving to: `www.inthemoneystocks.com/js/fValidator.js'

100%[=======================================================================================================================================>] 8,966       --.-K/s   in 0.001s

2010-01-18 22:15:47 (10.4 MB/s) - `www.inthemoneystocks.com/js/fValidator.js' saved [8966/8966]

--2010-01-18 22:15:47--  https://www.inthemoneystocks.com/js/purchaseform.js
Reusing existing connection to www.inthemoneystocks.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 7248 (7.1K) [application/javascript]
Saving to: `www.inthemoneystocks.com/js/purchaseform.js'

100%[=======================================================================================================================================>] 7,248       --.-K/s   in 0.001s

2010-01-18 22:15:47 (8.26 MB/s) - `www.inthemoneystocks.com/js/purchaseform.js' saved [7248/7248]

--2010-01-18 22:15:47--  https://www.inthemoneystocks.com/ticker.swf
Reusing existing connection to www.inthemoneystocks.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 43872 (43K) [application/x-shockwave-flash]
Saving to: `www.inthemoneystocks.com/ticker.swf'

100%[=======================================================================================================================================>] 43,872      --.-K/s   in 0.1s

2010-01-18 22:15:47 (361 KB/s) - `www.inthemoneystocks.com/ticker.swf' saved [43872/43872]

FINISHED --2010-01-18 22:15:47--
Downloaded: 7 files, 136K in 0.7s (193 KB/s)
[root@server ~]#

Last edited by qwertyjjj; 01-18-2010 at 04:18 PM.
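
The 302 redirect to login.php in the log above suggests the second request was not recognised as logged in. A script can at least detect that case before it compares pages or sends mail. A minimal sketch, assuming the login form contains a field named currentpassword (the name used in the POST data above) and that the member page does not; is_login_page is a made-up helper name:

```shell
# Hypothetical helper: succeed (exit 0) when the saved file looks like the
# login form rather than the page that was requested.
# Assumption: only the login form contains the string "currentpassword".
is_login_page() {
    grep -q 'currentpassword' "$1"
}

# Example use after a download (not run here):
# if is_login_page page.html; then
#     echo "Login failed: got the login form instead of the page" >&2
# fi
```

Checking the saved content is cruder than inspecting the HTTP status, but it works the same way whether the file came from wget, curl, or a PHP script.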
 
Old 01-18-2010, 04:57 PM   #6
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,397

Rep: Reputation: 2777
For that sort of stuff I use Perl + http://search.cpan.org/~petdance/WWW...W/Mechanize.pm
 
Old 01-18-2010, 06:07 PM   #7
qwertyjjj
Senior Member
 
Registered: Jul 2009
Location: UK
Distribution: Cent OS5 with Plesk
Posts: 1,013

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by chrism01 View Post
For that sort of stuff I use Perl + http://search.cpan.org/~petdance/WWW...W/Mechanize.pm
I just thought wget would do it simply.
Could I use PHP as well? I'm more familiar with that than Perl.
What lib do I need for PHP?
 
Old 01-18-2010, 06:29 PM   #8
qwertyjjj
Senior Member
 
Registered: Jul 2009
Location: UK
Distribution: Cent OS5 with Plesk
Posts: 1,013

Original Poster
Rep: Reputation: 30
Here is my current curl file but it doesn't seem to download the file:

Code:
#!/usr/bin/php
<?php
// INIT CURL
$ch = curl_init();

// SET URL FOR THE POST FORM LOGIN
curl_setopt($ch, CURLOPT_URL,
'https://www.inthemoneystocks.com/login.php');

// ENABLE HTTP POST
curl_setopt ($ch, CURLOPT_POST, 1);

// SET POST PARAMETERS : FORM VALUES FOR EACH FIELD
curl_setopt ($ch, CURLOPT_POSTFIELDS,
'username=xxxxx&currentpassword=xxxxx');

// IMITATE CLASSIC BROWSER'S BEHAVIOUR : HANDLE COOKIES
curl_setopt ($ch, CURLOPT_COOKIEJAR, 'cookie.txt');

# Setting CURLOPT_RETURNTRANSFER variable to 1 will force cURL
# not to print out the results of its query.
# Instead, it will return the results as a string return value
# from curl_exec() instead of the usual true/false.
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);

// EXECUTE 1st REQUEST (FORM LOGIN)
$store = curl_exec ($ch);

// SWITCH BACK TO GET : OTHERWISE THE HANDLE REPEATS THE LOGIN POST
curl_setopt ($ch, CURLOPT_HTTPGET, 1);

// SET FILE TO DOWNLOAD
curl_setopt($ch, CURLOPT_URL,
'https://www.inthemoneystocks.com/pro_trader_watch_list_prem.php');

// EXECUTE 2nd REQUEST (FILE DOWNLOAD)
$content = curl_exec ($ch);

// CLOSE CURL
curl_close ($ch);

?>
I guess I need to write $content to a file? How do I do that?

Last edited by qwertyjjj; 01-18-2010 at 06:33 PM.
 
Old 01-18-2010, 08:17 PM   #9
qwertyjjj
Senior Member
 
Registered: Jul 2009
Location: UK
Distribution: Cent OS5 with Plesk
Posts: 1,013

Original Poster
Rep: Reputation: 30
I've got it into a file, and I am now using file_get_contents to check the new file every 10 minutes; if it changes, I email it to myself.
Unfortunately, curl or something else is changing the file size by a few bytes even when nothing has changed.
Any ideas?
 
Old 01-19-2010, 07:39 AM   #10
nuwen52
Member
 
Registered: Feb 2009
Distribution: Debian, CentOS 5, Gentoo, FreeBSD, Fedora, Mint, Slackware64
Posts: 208

Rep: Reputation: 46
Is there a date or perhaps a hit counter on the page? Curl should just fetch the contents unmodified, so it's probably the page itself that is changing. I would do a diff on two versions of the page and see what changed. Then you can exclude those sections from your checks. Just a thought.
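
As a rough sketch of that manual check, assuming two copies of the page saved a few minutes apart (the file names and the function name volatile_lines are only placeholders):

```shell
# Hypothetical helper: print only the lines that differ between two saved
# copies of the same page, so the parts that change on every fetch stand out.
volatile_lines() {
    # diff exits 1 when the files differ and grep exits 1 when nothing
    # matches, so the exit status is deliberately ignored here.
    diff "$1" "$2" | grep '^[<>]' || true
}

# Example use (not run here):
# volatile_lines page_10h00.html page_10h10.html
```

Lines starting with "<" come from the first file and lines starting with ">" from the second. If the same kind of line shows up every time, such as a date or a counter, that is the part to leave out of the comparison.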
 
Old 01-19-2010, 08:08 AM   #11
qwertyjjj
Senior Member
 
Registered: Jul 2009
Location: UK
Distribution: Cent OS5 with Plesk
Posts: 1,013

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by nuwen52 View Post
Is there a date or perhaps a hit counter on the page? Curl should just fetch the contents unmodified, so it's probably the page itself that is changing. I would do a diff on two versions of the page and see what changed. Then you can exclude those sections from your checks. Just a thought.
Any ideas on how to do a diff in PHP?
 
Old 01-19-2010, 08:58 AM   #12
nuwen52
Member
 
Registered: Feb 2009
Distribution: Debian, CentOS 5, Gentoo, FreeBSD, Fedora, Mint, Slackware64
Posts: 208

Rep: Reputation: 46
Well, what I meant was to do a diff manually, just to see what's different. If the differences are minor, you can program your PHP script to ignore them. For example, if the line that differs looks like "DATE: Thu Feb 27 2010", you can tell the script to ignore that line when deciding whether the file changed. Other than that...
Code:
// $file1 and $file2 hold the paths of the two saved copies
exec("diff " . escapeshellarg($file1) . " " . escapeshellarg($file2) . " > /tmp/difftmp.txt");
$diff_file = fopen("/tmp/difftmp.txt", "r");
and interpret the file. I don't know of a PHP library for diff.
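
The "ignore that line" idea can also be tried from the shell before it goes into the PHP script. This is only a sketch under assumptions: the ignore pattern '^DATE:' matches the made-up example line above, page_changed is an invented helper name, and the mail command in the usage comment assumes a mailx-style mail program is installed:

```shell
# Hypothetical helper: exit 0 when two saved copies of a page differ after
# every line matching the "ignore" pattern has been dropped from both,
# exit 1 when they are the same apart from those lines.
#   $1 = old copy   $2 = new copy   $3 = extended regex of lines to ignore
page_changed() {
    old=$1
    new=$2
    ignore=$3
    tmp_old=$(mktemp) || return 2
    tmp_new=$(mktemp) || { rm -f "$tmp_old"; return 2; }

    # grep -v removes the volatile lines; it exits 1 when no lines remain,
    # which is not an error for this purpose.
    grep -v -E -- "$ignore" "$old" > "$tmp_old" || true
    grep -v -E -- "$ignore" "$new" > "$tmp_new" || true

    # cmp -s is silent and exits 0 only when the filtered copies are identical.
    if cmp -s "$tmp_old" "$tmp_new"; then
        status=1
    else
        status=0
    fi
    rm -f "$tmp_old" "$tmp_new"
    return "$status"
}

# Example use from cron (not run here; the address is a placeholder):
# if page_changed last.html new.html '^DATE:'; then
#     mail -s "Watch list changed" you@example.com < new.html
#     cp new.html last.html
# fi
```

Comparing the filtered content directly avoids relying on file size, which can change by a few bytes for reasons that have nothing to do with the part of the page you care about.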

Last edited by nuwen52; 01-19-2010 at 08:59 AM.
 
  

