LinuxQuestions.org
Old 06-25-2013, 12:04 AM   #1
patrokov
Member
 
Registered: Jan 2006
Location: Riviera Beach
Distribution: Slackware -current, ArchLinux
Posts: 59

Verifying links in a web page or RSS feed.


This is a tip, not a question. It's incredibly hard to find out how to do this with Google, since apparently Google's search engine can't tell...

Anyway, this bash script takes a URL as an argument, downloads it, extracts all of the hyperlinks from it, and then uses wget in spider mode to check whether each hyperlink is still good.

Very useful for checking "links pages" and RSS feeds.

The code is heavily commented.

Code:
#!/bin/bash

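# refuse to run without a URL argument
[ -n "$1" ] || { echo "usage: $0 URL" >&2; exit 1; }

# get the basename of the URL; fall back to index.html when the URL ends in "/"
RSSFILE=${1##*/}
RSSFILE=${RSSFILE:-index.html}

# make sure a stale copy doesn't exist (-f: no error if the file is missing)
rm -f -- "$RSSFILE"

# download the URL into that file
wget -O "$RSSFILE" "$1"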

# Scans the downloaded file and finds urls in it.
# awk splits the input on spaces, so every space-separated word becomes a record.
# Then it matches the records that start with href

# Within a record the field separator is ", so a record such as href="http://..." has three fields:
# $1 contains the part to the left of the first double quote, href=
# $2 contains the url and
# $3 contains whatever follows the second double quote (empty if a space comes next)
# RS=FS runs in BEGIN, where FS still has its default value (a single space), so RS becomes a space.
# FS='"' on the command line only takes effect afterwards, before the file is read.

URL=$(awk 'BEGIN{RS=FS}/^href/{print $2}' FS='"' "$RSSFILE")

# If href (html) turns up empty, then try again with url (used in rss)
# (the original test [ !$URL ] never checked for emptiness; -z does)
if [ -z "$URL" ]
then
        URL=$(awk 'BEGIN{RS=FS}/^url/{print $2}' FS='"' "$RSSFILE")
fi



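# $URL is left unquoted on purpose so that it splits into one word per link
for LINE in $URL
do
        # if a url doesn't report 200 (OK status), print the URL and error message
        # note: grep -v also hides any line that merely contains "200", e.g. inside the URL
        wget -nv --spider "$LINE" 2>&1 | grep -v "200"
done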

rm -f -- "$RSSFILE"

exit 0
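
For anyone who wants to try the two steps separately, here is a minimal sketch that splits the extraction and the checking from the script above into functions. The names extract_links, spider_ok and report_broken, and the example.com sample links, are made up for illustration and are not part of the original script. It also swaps the grep -v "200" filter for wget's exit status, which is nonzero when the request fails; that is a substitution, not what the script above does.

```shell
#!/bin/bash

# extract_links FILE
# Print every href="..." value found in FILE; if there are none,
# fall back to url="..." values (as used in RSS enclosures).
extract_links() {
    for key in href url; do
        # RS = " " is what RS=FS evaluates to in BEGIN (FS is still a space there);
        # FS='"' is applied afterwards, so $2 is the text between the first two quotes.
        found=$(awk -v key="$key" 'BEGIN { RS = " " } $0 ~ ("^" key) { print $2 }' FS='"' "$1")
        if [ -n "$found" ]; then
            printf '%s\n' "$found"
            return 0
        fi
    done
    return 0
}

# spider_ok URL
# Succeeds only if wget can reach URL; nothing is downloaded in spider mode.
spider_ok() {
    wget -q --spider "$1"
}

# report_broken CHECKER URL...
# Run CHECKER on each URL and print the ones for which it fails.
report_broken() {
    checker=$1
    shift
    for url in "$@"; do
        if ! "$checker" "$url" >/dev/null 2>&1; then
            printf 'BROKEN: %s\n' "$url"
        fi
    done
}

# Offline demo of the extraction step on a made-up snippet (no network access).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<p>See <a href="http://example.com/good">one</a> and
<a href="http://example.com/missing">two</a>.</p>
EOF
extract_links "$sample"   # prints the two example.com URLs, one per line
rm -f -- "$sample"
```

To check a real page that has already been downloaded to, say, page.html, something like report_broken spider_ok $(extract_links page.html) should work; the command substitution is left unquoted so that each link becomes its own argument. As in the original, links that contain spaces or that are relative to the page will not be handled.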
 
Old 06-26-2013, 09:23 AM   #2
yooy
Senior Member
 
Registered: Dec 2009
Posts: 1,387

Isn't that included in "Google Webmaster Tools"?
 
  



All times are GMT -5.
