Looking for command or code to check if a website has updated content
Linux - Newbie: This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
Greetings, all.
I don't know where to start, nor what command to use, if any. Most searches suggest a browser extension for this task, but I don't want to use browser extensions. If it can't be done from Linux itself, then I'll use those extensions. Thanks.
wget will do that. So would curl. The HTTP Request Header you want to use when forming your GET or HEAD request is the If-Modified-Since header. The HTTP Date has to conform to a specific format. (In my opinion they should have gone with a subset of ISO 8601.)
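A minimal sketch of that conditional request with curl (the URL and the "2 days ago" cutoff are placeholders, not something from the thread):

```shell
# Ask the server whether the page changed since a given date,
# using the If-Modified-Since request header.
URL="http://example.com/"

# HTTP dates must use the fixed format from RFC 7231, e.g.
# "Sun, 06 Nov 1994 08:49:37 GMT". GNU date can produce it:
SINCE=$(date -u -d "2 days ago" +"%a, %d %b %Y %H:%M:%S GMT")

# -s silent, -o /dev/null discard the body, -w print only the
# HTTP status code: 304 = not modified, 200 = sent again.
STATUS=$(curl -s -o /dev/null \
  -H "If-Modified-Since: $SINCE" \
  -w '%{http_code}' "$URL")

if [ "$STATUS" = "304" ]; then
  echo "unchanged since $SINCE"
else
  echo "changed, or server ignores If-Modified-Since (status $STATUS)"
fi
```

Note this only works when the server actually honors conditional requests; curl also has a built-in `-z`/`--time-cond` option that sends the same header for you.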
The HEAD request, as already mentioned, helps if you only need the metadata, not the object itself.
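For instance (the URL is a placeholder), curl's `-I` option sends a HEAD request, which is a quick way to see whether the server sends a `Last-Modified` header at all:

```shell
# -I sends a HEAD request: response headers only, no body.
# grep -i: HTTP header names are case-insensitive.
curl -sI "http://example.com/" | grep -i '^last-modified:' \
  || echo "no Last-Modified header (or fetch failed)"
```

If nothing is printed but the fallback message, the server gives you no timestamp to compare against and you'll need to diff the content instead.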
About ads: most ads these days are not embedded in the web page itself but pulled in from a set of external, unvetted servers via JavaScript. In all likelihood the JavaScript pulling in the ads, clean or tainted, will not change. However, dynamically generated pages might not carry an accurate timestamp and might report only the current date and time even if the content hasn't changed in a long time. That is common with PHP sites, among others.
http://magazine.odroid.com was used as an example. The real site is something else, but it doesn't have Last-Modified or ETag headers either. I am going to try one of the links you gave me, using bash and diff.
If that doesn't do it, then I'll just keep checking the site manually every few days, as usual.
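A sketch of that bash-and-diff approach: keep a saved copy of the page and compare against it on each run. The cache path is an assumption; the URL is the example one from above.

```shell
#!/bin/sh
# Detect content changes by diffing the fetched page against a
# locally cached copy from the previous run.
URL="http://magazine.odroid.com"
CACHE="$HOME/.cache/site-check.html"   # assumed cache location
mkdir -p "${CACHE%/*}"

NEW=$(mktemp)
curl -s "$URL" -o "$NEW"

# diff -q just reports whether the files differ, without details.
if [ -f "$CACHE" ] && diff -q "$CACHE" "$NEW" >/dev/null; then
  echo "no change"
else
  echo "content changed (or first run)"
fi

# Save the current copy for next time.
mv "$NEW" "$CACHE"
```

Run it from cron every few days. One caveat from earlier in the thread: pages that embed the current date, rotating ads, or session tokens will diff as "changed" every time, so you may need to grep out the volatile parts before comparing.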
I want to thank everyone here who replied and for your suggestions.