Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place! |
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
04-03-2017, 10:18 PM
|
#1
|
LQ Newbie
Registered: Apr 2017
Posts: 8
Rep:
|
Looking for command or code to check if a website has updated content
Greetings, all,
I don't know where to start nor what command to use if any. Most searches want you to use some browser extension for that task, but I don't want to use browser extensions. If this can't be done in linux, then I'll use those extensions. Thanks
|
|
|
04-03-2017, 11:46 PM
|
#2
|
LQ Guru
Registered: Aug 2016
Location: SE USA
Distribution: openSUSE 24/7; Debian, Knoppix, Mageia, Fedora, others
Posts: 6,109
|
Most web pages are generated dynamically these days. For those pages, nearly any check for newness that could conceivably work would report a change every time.
|
|
|
04-04-2017, 12:32 AM
|
#3
|
LQ Newbie
Registered: Apr 2017
Posts: 8
Original Poster
Rep:
|
Another problem is ads. Ads change on a website and would report a false positive even when the actual content hasn't changed. It was worth a shot. Thanks
|
|
|
04-04-2017, 12:46 AM
|
#4
|
LQ Guru
Registered: Jun 2004
Location: Piraeus
Distribution: Slackware
Posts: 13,204
|
Quote:
Originally Posted by ackerman57
Greetings, all,
I don't know where to start nor what command to use if any. Most searches want you to use some browser extension for that task, but I don't want to use browser extensions. If this can't be done in linux, then I'll use those extensions. Thanks
|
You should take a look at the HEAD request
Regards
|
|
|
04-04-2017, 01:08 AM
|
#5
|
LQ Newbie
Registered: Apr 2017
Posts: 8
Original Poster
Rep:
|
Quote:
Originally Posted by bathory
You should take a look at the HEAD request
Regards
|
I don't know how.
BTW folks, this is not a major thing I need. Don't fret too much over this :|
Last edited by ackerman57; 04-04-2017 at 01:16 AM.
|
|
|
04-04-2017, 01:15 AM
|
#6
|
Member
Registered: Jun 2016
Distribution: any&all, in VBox; Ol'UnixCLI; NO GUI resources
Posts: 999
|
Last edited by Jjanel; 04-04-2017 at 01:51 AM.
|
|
|
04-04-2017, 01:45 AM
|
#7
|
LQ Newbie
Registered: Apr 2017
Posts: 8
Original Poster
Rep:
|
Hi Jjanel
I will look at those links and see what happens. Thanks
|
|
|
04-04-2017, 01:58 AM
|
#8
|
LQ Guru
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,517
|
wget will do that. So would curl. The HTTP Request Header you want to use when forming your GET or HEAD request is the If-Modified-Since header. The HTTP Date has to conform to a specific format. (In my opinion they should have gone with a subset of ISO 8601.)
Code:
wget --header="If-Modified-Since: Tue, 04 Apr 2017 05:57:29 GMT" http://www.example.com/
wget --header="If-Modified-Since: $(date -u -d 'last week' +'%a, %d %b %Y %T GMT')" http://www.example.com/
The HEAD request, already mentioned, helps if you only need the metadata and not the object itself.
About ads, most ads these days are not embedded in the web page itself, but pulled in from a set of external, unvetted servers via javascript. In all likelihood, the javascript pulling in the ads, clean or tainted, will not change. However, dynamically generated pages might not have an accurate time stamp and might show only the current time and date even if the content hasn't changed for a long time. That is common with PHP sites as well as others.
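The same conditional request can be sketched with curl. This is only a sketch under a few assumptions: GNU date is available, the helper names http_date and modified_since are made up for illustration, and LC_ALL=C is forced so the day and month abbreviations come out in English, as the HTTP date format requires.

```shell
#!/bin/sh
# Hypothetical helper: format a date in HTTP-date form (GNU date assumed).
# LC_ALL=C keeps day and month abbreviations in English.
http_date() {
    LC_ALL=C date -u -d "$1" +'%a, %d %b %Y %T GMT'
}

# Hypothetical helper: print the HTTP status of a conditional GET.
# 304 means the server reports no change since the given date;
# 200 means it changed, or the server ignores the header.
modified_since() {
    url=$1
    since=$2
    curl -s -o /dev/null -w '%{http_code}\n' \
        -H "If-Modified-Since: $(http_date "$since")" "$url"
}

# Example (needs network access):
#   modified_since http://www.example.com/ 'last week'
```

A 200 from a dynamically generated page does not prove the content changed, for the reasons given above.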
|
|
|
04-04-2017, 02:11 AM
|
#9
|
LQ Guru
Registered: Jun 2004
Location: Piraeus
Distribution: Slackware
Posts: 13,204
|
Quote:
Originally Posted by ackerman57
I don't know how.
BTW folks, this is not a major thing I need. Don't fret too much over this :|
|
Just FYI: https://2buntu.com/articles/1493/mon...-etag-headers/
|
|
|
04-04-2017, 03:15 AM
|
#10
|
LQ Newbie
Registered: Apr 2017
Posts: 8
Original Poster
Rep:
|
Quote:
Originally Posted by Turbocapitalist
wget will do that. So would curl. The HTTP Request Header you want to use when forming your GET or HEAD request is the If-Modified-Since header. The HTTP Date has to conform to a specific format. (In my opinion they should have gone with a subset of ISO 8601.)
Code:
wget --header="If-Modified-Since: Tue, 04 Apr 2017 05:57:29 GMT" http://www.example.com/
wget --header="If-Modified-Since: $(date -u -d 'last week' +'%a, %d %b %Y %T GMT')" http://www.example.com/
The HEAD request, already mentioned, helps if you only need the metadata and not the object itself.
About ads, most ads these days are not embedded in the web page itself, but pulled in from a set of external, unvetted servers via javascript. In all likelihood, the javascript pulling in the ads, clean or tainted, will not change. However, dynamically generated pages might not have an accurate time stamp and might show only the current time and date even if the content hasn't changed for a long time. That is common with PHP sites as well as others.
|
Quote:
Originally Posted by bathory
|
Code:
curl -I "http://magazine.odroid.com/"
HTTP/1.1 200 OK
Date: Tue, 04 Apr 2017 08:11:21 GMT
Server: Apache/2.4.7 (Ubuntu) SVN/1.8.8 PHP/5.5.9-1ubuntu4.21
X-Powered-By: PHP/5.5.9-1ubuntu4.21
Link: <http://magazine.odroid.com/wp-json/>; rel="https://api.w.org/"
Content-Type: text/html; charset=UTF-8
No Last-Modified or ETag headers here.
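To make that check repeatable, the headers can be filtered for the two validators. A minimal sketch follows; has_validators is a hypothetical name, and it reads the headers on standard input so it can be fed from curl -sI.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and succeed
# (exit status 0) if a Last-Modified or ETag header is present.
# Header names are case-insensitive, hence grep -i.
has_validators() {
    grep -qiE '^(last-modified|etag):'
}

# Example (needs network access):
#   if curl -sI http://magazine.odroid.com/ | has_validators; then
#       echo "server sends Last-Modified or ETag"
#   else
#       echo "no validators; compare the page content instead"
#   fi
```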
Last edited by ackerman57; 04-04-2017 at 03:17 AM.
|
|
|
04-04-2017, 04:53 AM
|
#11
|
Member
Registered: Jun 2016
Distribution: any&all, in VBox; Ol'UnixCLI; NO GUI resources
Posts: 999
|
fwiw, playing with wget http://magazine.odroid.com I noticed that I only needed to
grep -v userSettings
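One reading of this is that, for that page, the lines containing userSettings were the only ones that differed between otherwise identical downloads. Under that assumption, a hedged sketch is to drop those lines and checksum the rest. The name page_fingerprint is hypothetical, and the userSettings default comes from this post, not from any general rule.

```shell
#!/bin/sh
# Hypothetical helper: read a page on stdin, drop lines matching a
# "volatile" pattern (default: userSettings, as in the post), and print
# the SHA-256 checksum of what remains.
page_fingerprint() {
    pattern=${1:-userSettings}
    grep -v -e "$pattern" | sha256sum | cut -d' ' -f1
}

# Example (needs network access):
#   wget -qO- http://magazine.odroid.com/ | page_fingerprint
# Save the value and compare it with the one from the next run.
```

Other sites will need a different pattern, found by diffing two downloads taken a few minutes apart.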
|
|
|
04-04-2017, 06:28 PM
|
#12
|
LQ Newbie
Registered: Apr 2017
Posts: 8
Original Poster
Rep:
|
Quote:
Originally Posted by Jjanel
|
http://magazine.odroid.com was used as an example. The real site is something else, but it doesn't have Last-Modified or ETag headers either. I am going to try one of the links you gave me using bash and diff.
If that doesn't do it, then I'll just periodically check the site every few days as usual.
I want to thank everyone here who replied and for your suggestions.
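For the bash-and-diff approach mentioned above, a minimal sketch could keep the last download in a file and compare each new download against it. The name page_changed and the snapshot path are hypothetical, and this is one possible shape rather than what the linked articles describe.

```shell
#!/bin/sh
# Hypothetical helper: read the freshly downloaded page on stdin and
# compare it with the snapshot file given as $1.
# Exit status 0: content changed (diff printed, snapshot updated).
# Exit status 1: unchanged, or first run (snapshot just created).
page_changed() {
    snapshot=$1
    new=$(mktemp) || return 2
    cat > "$new"
    if [ ! -f "$snapshot" ]; then
        mv "$new" "$snapshot"      # first run: nothing to compare against
        return 1
    fi
    if diff -u "$snapshot" "$new"; then
        rm -f "$new"               # identical: keep the old snapshot
        return 1
    fi
    mv "$new" "$snapshot"          # different: remember the new version
    return 0
}

# Example (needs network access):
#   wget -qO- http://www.example.com/ | page_changed "$HOME/.page.snapshot" \
#       && echo "page changed"
```

Run from cron every few days, this replaces the manual check; ads or other volatile lines would still need filtering out before the comparison.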
|
|
|
All times are GMT -5. The time now is 10:14 AM.
|