LinuxQuestions.org - grep help or sed or awk

I am trying to scrape a certain group of web pages for links. Let's say the links I am interested in begin with a / and end in xyz. I have tried to do this with the following grep command:

grep -o '[//]*xyz' file

It doesn't work: all I get printed is xyz.

I think it is possible to do similar things with sed and possibly awk, but I don't know how.

Thanks in advance

P.S. No, I am not doing anything immoral here.

Usually links don't end in xyz; but grep should work OK for this. Please show us a sample of an actual 'xyz' link that you'd like to match with your regex, and someone can perhaps suggest a regex to match it and similar links.
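In the meantime, here is a rough sketch of why the original pattern prints only xyz. `[//]` is a bracket expression that matches a single `/`, so `[//]*xyz` means "zero or more slashes, then xyz", and the zero-slash case matches the bare `xyz`. The sample line and the replacement pattern below are my own assumptions, not anything from your pages, and `-o` is a GNU/BSD grep option rather than a POSIX one.

```shell
# Hypothetical sample input; the path is made up for illustration.
sample='<a href="/images/pic-xyz">link</a>'

# Original attempt: zero or more "/" characters followed by xyz.
# Nothing but "-" precedes xyz here, so only "xyz" itself matches.
original=$(printf '%s\n' "$sample" | grep -o '[//]*xyz')

# Sketch of a fix: a "/", then any run of characters that cannot
# appear inside the quoted link, then xyz.
fixed=$(printf '%s\n' "$sample" | grep -o '/[^"<> ]*xyz')

printf 'original: %s\nfixed:    %s\n' "$original" "$fixed"
```

The character class `[^"<> ]` is a guess at what delimits a link in your HTML; adjust it once you know what the real links look like.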

Well, I was using that as an example. What I really want is links that end with "cs0.gif" (image files).

Not sure how much of the link you want. However, if I have a file named 'links' containing the following:

Code:

sasha@reactor: cat links
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.some-site.com/blab/blarg/image-cs1.gif
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.someothersite.com/blah/blarh/image-cs1.gif

and I want the links described by you, then the following works:

Code:

sasha@reactor: grep -o -e 'http://.*cs0\.gif' links
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.some-site.com/blah/blarh/image-cs0.gif
sasha@reactor:
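One caveat, sketched under the assumption that your real pages may carry more than one link per line: `.*` is greedy, so it runs from the first `http://` to the last `cs0.gif` on the line and returns everything in between as a single match. The HTML line below is invented for the sketch, and the tighter character class is only a guess at what surrounds your links.

```shell
# Hypothetical HTML line with two image links on it.
line='<img src="http://www.some-site.com/a/image-cs0.gif"><img src="http://www.some-site.com/b/image-cs0.gif">'

# Greedy .* spans both links, so grep reports one long match.
greedy=$(printf '%s\n' "$line" | grep -o -e 'http://.*cs0\.gif')

# Excluding quotes, angle brackets and spaces keeps each match
# inside a single link, so grep reports the two links separately.
tight=$(printf '%s\n' "$line" | grep -o -e 'http://[^"<> ]*cs0\.gif')

printf 'greedy:\n%s\n\ntight:\n%s\n' "$greedy" "$tight"
```

With one link per line, as in the 'links' file, both patterns give the same result.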

If this isn't right, please show an exact link you might come across, and exactly what output you want.

Cheers!

Code:

$ ruby -00 -ne 'puts $_.scan(/http.[^>"]*/);' file
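Since the original question also asked about sed and awk, here is a rough sketch of how the same extraction might look with each. The input lines are made up, and the `[^"<> ]` class is an assumption about what delimits a link. Both commands use only standard sed and awk features.

```shell
# Hypothetical input: one link per line, wrapped in made-up HTML.
input='<img src="http://www.some-site.com/blah/blarh/image-cs0.gif">
<img src="http://www.some-site.com/blab/blarg/image-cs1.gif">'

# sed: -n suppresses normal output; the trailing p prints a line only
# when the substitution succeeded, leaving just the captured link.
sed_out=$(printf '%s\n' "$input" | sed -n 's|.*\(http://[^"<> ]*cs0\.gif\).*|\1|p')

# awk: match() sets RSTART and RLENGTH for the first match on the line,
# and substr() cuts that match out of the line.
awk_out=$(printf '%s\n' "$input" | awk 'match($0, /http:\/\/[^"<> ]*cs0\.gif/) { print substr($0, RSTART, RLENGTH) }')

printf 'sed: %s\nawk: %s\n' "$sed_out" "$awk_out"
```

Note that both sketches return at most one link per input line (sed keeps the last match on the line, awk the first), so grep -o is still the simpler tool when a line can hold several links.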