Quote:
Originally Posted by 0.o
I am trying to build a regular expression that will match the following:
http://www.linkedin.com/[a-zA-Z0-9]+
(the above URL followed by anything on the site)
Could someone point me in the right direction?
Thanks!
Do you want to get what comes after 'http://www.linkedin.com/'?
If you're parsing an HTML file and want to retrieve this
from the hyperlinks found in file.html, something like this should do it:
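# The last sed is double-quoted so the ' inside the bracket does not end the
# script; \" and \$ are escaped for the shell. \n in the replacement needs GNU sed.
sed 's?href=[^ >]*?\n&\n?g' file.html | grep -i 'href=' | \
grep 'http://www\.linkedin\.com' | \
sed "s?^.*http://www\.linkedin\.com/\([^'\" \t>]*\).*\$?\1?" | \
sort -u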
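If you only need the part that matches the pattern from your post (letters and digits right after the slash), a shorter sketch with grep -o might be enough. The function name linkedin_paths and the sample input are made up for illustration, and it assumes a grep that supports -o (GNU and BSD grep both do):

```shell
# Hypothetical helper: read HTML on stdin and print the unique
# [a-zA-Z0-9]+ parts that follow http://www.linkedin.com/
linkedin_paths() {
    # -o prints only the matched text, one match per line; -E enables +
    grep -oE 'http://www\.linkedin\.com/[a-zA-Z0-9]+' |
        # strip the site prefix, leaving only what follows it
        sed 's?^http://www\.linkedin\.com/??' |
        sort -u
}

# Example usage with made-up input
printf '%s\n' '<a href="http://www.linkedin.com/jobs">jobs</a>' | linkedin_paths
```

Note that this stops at the first character outside [a-zA-Z0-9], so a link like http://www.linkedin.com/in/foo only yields "in". Widen the bracket expression if you want the whole path.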
Does that answer your question?