LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Old 04-03-2017, 10:18 PM   #1
ackerman57
LQ Newbie
 
Registered: Apr 2017
Posts: 8

Rep: Reputation: Disabled
Looking for command or code to check if a website has updated content


Greetings all,

I don't know where to start or what command to use, if any. Most search results want you to use some browser extension for that task, but I don't want to use browser extensions. If this can't be done in Linux, then I'll use those extensions. Thanks
 
Old 04-03-2017, 11:46 PM   #2
mrmazda
LQ Guru
 
Registered: Aug 2016
Location: SE USA
Distribution: openSUSE 24/7; Debian, Knoppix, Mageia, Fedora, others
Posts: 5,808
Blog Entries: 1

Rep: Reputation: 2066
Web pages are typically generated dynamically these days. With such pages, any check for newness that could conceivably work would always return true.
 
Old 04-04-2017, 12:32 AM   #3
ackerman57
LQ Newbie
 
Registered: Apr 2017
Posts: 8

Original Poster
Rep: Reputation: Disabled
Another problem is ads. Ads will change on a website and produce a false positive for changes to the actual content. It was worth a shot. Thanks
 
Old 04-04-2017, 12:46 AM   #4
bathory
LQ Guru
 
Registered: Jun 2004
Location: Piraeus
Distribution: Slackware
Posts: 13,163
Blog Entries: 1

Rep: Reputation: 2032
Quote:
Originally Posted by ackerman57
Greetings all,

I don't know where to start or what command to use, if any. Most search results want you to use some browser extension for that task, but I don't want to use browser extensions. If this can't be done in Linux, then I'll use those extensions. Thanks
You should take a look at the HEAD request

Regards
 
Old 04-04-2017, 01:08 AM   #5
ackerman57
LQ Newbie
 
Registered: Apr 2017
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by bathory
You should take a look at the HEAD request

Regards
I don't know how.

BTW folks, this is not a major thing I need. Don't fret too much over this :|

Last edited by ackerman57; 04-04-2017 at 01:16 AM.
 
Old 04-04-2017, 01:15 AM   #6
Jjanel
Member
 
Registered: Jun 2016
Distribution: any&all, in VBox; Ol'UnixCLI; NO GUI resources
Posts: 999
Blog Entries: 12

Rep: Reputation: 364
wget? http://wikipedia.org/wiki/HTTP_ETag maybe
http://thp.io/2008/urlwatch from web-search: linux check if a website has updated content
http://stackoverflow.com/questions/2...s-last-updated
http://bhfsteve.blogspot.com/2013/03...ges-using.html
A simple `curl` bash script (yes, it won't work for 'dynamic' pages, which are probably close to universal nowadays); replace prowl with any command of your choice: http://www.makingyouthink.com/2015/1...s-bash-script/

Last edited by Jjanel; 04-04-2017 at 01:51 AM.
 
Old 04-04-2017, 01:45 AM   #7
ackerman57
LQ Newbie
 
Registered: Apr 2017
Posts: 8

Original Poster
Rep: Reputation: Disabled
Hi Jjanel

I will look at those links and see what happens. Thanks
 
Old 04-04-2017, 01:58 AM   #8
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,308
Blog Entries: 3

Rep: Reputation: 3721
wget will do that. So would curl. The HTTP Request Header you want to use when forming your GET or HEAD request is the If-Modified-Since header. The HTTP Date has to conform to a specific format. (In my opinion they should have gone with a subset of ISO 8601.)

Code:
wget --header="If-Modified-Since: Tue, 04 Apr 2017 05:57:29 GMT" http://www.example.com/
wget --header="If-Modified-Since: $(date -u -d 'last week' +'%a, %d %b %Y %T GMT')" http://www.example.com/
The HEAD request, already mentioned, helps if you only need the metadata and not the object itself.

About ads: most ads these days are not embedded in the web page itself but pulled in from a set of external, unvetted servers via JavaScript. In all likelihood, the JavaScript pulling in the ads, clean or tainted, will not change. However, dynamically generated pages might not have an accurate time stamp and might show only the current time and date even if the content hasn't changed for a long time. That is common with PHP sites as well as others.
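
To act on the result from a script, one option is to look at the HTTP status code instead of the downloaded file. What follows is only a minimal sketch using curl: the function names are made up for illustration, and http_date assumes GNU date. A 304 reply means the server honored If-Modified-Since and the page is unchanged; a 200 can mean either a real change or a server that simply ignores the header.

```shell
#!/bin/sh
# Hypothetical helpers, not part of wget or curl.

# Map the status code of a conditional request to a short label.
classify_status() {
    case "$1" in
        304) echo "unchanged" ;;               # Not Modified
        200) echo "changed-or-unsupported" ;;  # full response was sent
        *)   echo "error" ;;
    esac
}

# Send a HEAD request with If-Modified-Since and print only the status code.
# $1 = URL, $2 = HTTP-date such as "Tue, 04 Apr 2017 05:57:29 GMT"
conditional_head() {
    curl -s -o /dev/null -I -w '%{http_code}' \
         -H "If-Modified-Since: $2" "$1"
}

# Build an HTTP-date from a date string (GNU date assumed).
# LC_ALL=C keeps the day and month names in English.
http_date() {
    LC_ALL=C date -u -d "$1" +'%a, %d %b %Y %T GMT'
}
```

For example, `classify_status "$(conditional_head http://www.example.com/ "$(http_date 'last week')")"` prints one of the three labels.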
 
Old 04-04-2017, 02:11 AM   #9
bathory
LQ Guru
 
Registered: Jun 2004
Location: Piraeus
Distribution: Slackware
Posts: 13,163
Blog Entries: 1

Rep: Reputation: 2032
Quote:
Originally Posted by ackerman57
I don't know how.

BTW folks, this is not a major thing I need. Don't fret too much over this :|
Just FYI: https://2buntu.com/articles/1493/mon...-etag-headers/
 
Old 04-04-2017, 03:15 AM   #10
ackerman57
LQ Newbie
 
Registered: Apr 2017
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Turbocapitalist
wget will do that. So would curl. The HTTP Request Header you want to use when forming your GET or HEAD request is the If-Modified-Since header. The HTTP Date has to conform to a specific format. (In my opinion they should have gone with a subset of ISO 8601.)

Code:
wget --header="If-Modified-Since: Tue, 04 Apr 2017 05:57:29 GMT" http://www.example.com/
wget --header="If-Modified-Since: $(date -u -d 'last week' +'%a, %d %b %Y %T GMT')" http://www.example.com/
The HEAD request, already mentioned, helps if you only need the metadata and not the object itself.

About ads: most ads these days are not embedded in the web page itself but pulled in from a set of external, unvetted servers via JavaScript. In all likelihood, the JavaScript pulling in the ads, clean or tainted, will not change. However, dynamically generated pages might not have an accurate time stamp and might show only the current time and date even if the content hasn't changed for a long time. That is common with PHP sites as well as others.
Quote:
Originally Posted by bathory
Code:
curl -I "http://magazine.odroid.com/" 

HTTP/1.1 200 OK
Date: Tue, 04 Apr 2017 08:11:21 GMT
Server: Apache/2.4.7 (Ubuntu) SVN/1.8.8 PHP/5.5.9-1ubuntu4.21
X-Powered-By: PHP/5.5.9-1ubuntu4.21
Link: <http://magazine.odroid.com/wp-json/>; rel="https://api.w.org/"
Content-Type: text/html; charset=UTF-8
No Last-Modified or ETag headers here.
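
The same check can be scripted instead of read by eye. This is a rough sketch only; list_validators is a made-up name, and it simply filters whatever headers it is given on stdin.

```shell
#!/bin/sh
# Hypothetical helper: read response headers on stdin and print the
# Last-Modified and ETag lines, or "none" if neither is present.
# Header names are case-insensitive, hence grep -i; tr strips the
# carriage returns that end each HTTP header line.
list_validators() {
    tr -d '\r' | grep -i -E '^(last-modified|etag):' || echo "none"
}
```

Usage would be something like `curl -sI "http://magazine.odroid.com/" | list_validators`. If it prints "none", conditional requests are unlikely to help and comparing the page body is the remaining option.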

Last edited by ackerman57; 04-04-2017 at 03:17 AM.
 
Old 04-04-2017, 04:53 AM   #11
Jjanel
Member
 
Registered: Jun 2016
Distribution: any&all, in VBox; Ol'UnixCLI; NO GUI resources
Posts: 999
Blog Entries: 12

Rep: Reputation: 364
FWIW, playing with wget http://magazine.odroid.com, I noticed that I only needed to filter out the lines containing userSettings:
grep -v userSettings
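
One way to use that filter, sketched under assumptions: drop the volatile lines, then hash what is left so two fetches can be compared by a single string. stable_hash is a made-up name, the default pattern is just the userSettings example from this post, and sha256sum from GNU coreutils is assumed to be installed.

```shell
#!/bin/sh
# Hypothetical helper: hash a page read from stdin after removing lines
# that match a pattern known to differ on every request.
# $1 = pattern passed to grep -v (defaults to userSettings)
stable_hash() {
    grep -v -e "${1:-userSettings}" | sha256sum | cut -d ' ' -f 1
}
```

Something like `wget -q -O - http://magazine.odroid.com | stable_hash` could then be run twice; if the two hashes match, nothing outside the filtered lines changed. Other sites will need their own pattern.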
 
Old 04-04-2017, 06:28 PM   #12
ackerman57
LQ Newbie
 
Registered: Apr 2017
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Jjanel
FWIW, playing with wget http://magazine.odroid.com, I noticed that I only needed to filter out the lines containing userSettings:
grep -v userSettings
http://magazine.odroid.com was used as an example. The real site is something else, but it doesn't have Last-Modified or ETag headers either. I am going to try one of the links you gave me using bash and diff.
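
For reference, the bash-and-diff idea might look roughly like the sketch below. It is not taken from any of the linked scripts; compare_snapshot and the file paths in the usage line are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: compare a freshly fetched copy of a page with the
# copy saved on the previous run.
# $1 = file holding the new copy, $2 = file holding the previous snapshot
# Prints "first-run", "unchanged" or "changed", then stores the new copy
# as the snapshot for next time.
compare_snapshot() {
    new=$1
    old=$2
    if [ ! -f "$old" ]; then
        result="first-run"            # nothing to compare against yet
    elif cmp -s "$old" "$new"; then
        result="unchanged"            # byte-for-byte identical
    else
        result="changed"
        diff -u "$old" "$new" >&2     # show what changed on stderr
    fi
    cp "$new" "$old"
    echo "$result"
}
```

A cron job could run `curl -s "$url" -o /tmp/page.new && compare_snapshot /tmp/page.new "$HOME/.cache/page.old"` every few days. On pages with ads or other per-request content this will report "changed" every time unless those lines are filtered out first.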

If that doesn't do it, then I'll just periodically check the site every few days as usual.

I want to thank everyone here who replied and for your suggestions.
 
  

