LinuxQuestions.org

dmchess 09-29-2010 04:14 PM

grep help or sed or awk
 
I am trying to scrape a certain group of web pages for links. Let's say the links I'm interested in begin with a / and end in xyz. I have tried to do this with the following grep command:

grep -o '[//]*xyz' file

It doesn't work: all it prints is xyz.
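Why the command above prints only xyz: `[//]` is a bracket expression, which matches one character from the set listed inside it (here just `/`, listed twice), and `*` allows zero repetitions. So the pattern matches xyz plus any slashes directly in front of it, never the rest of the link. A minimal sketch of a fix, assuming the links sit inside double-quoted HTML attributes such as href="/path/to/fooxyz"; the function name `extract_xyz_links` and the demo input are hypothetical:

```shell
# Hypothetical helper: print each double-quoted value that starts with "/"
# and ends with "xyz". [^"]* cannot cross a quote, so one match never spans
# two attribute values; tr then strips the surrounding quotes.
extract_xyz_links() {
  grep -o '"/[^"]*xyz"' "$@" | tr -d '"'
}

# Demo on made-up input; prints /img/a.xyz and /pics/bxyz on separate lines.
printf '%s\n' '<a href="/img/a.xyz">A</a> <a href="/docs/page.html">B</a> <img src="/pics/bxyz">' |
  extract_xyz_links
```

Anchoring the match on the quotes is what enforces "begins with /" and "ends in xyz"; without them, a path like /fooxyzabc would also produce a partial match.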

I think it is possible to do similar things with sed and possibly awk, but I don't know how.
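sed can do this too, but pulling several links out of one line takes extra work there. In awk, the POSIX match() function sets RSTART and RLENGTH for the leftmost-longest match, so it can be looped over a line. A sketch under the same assumption (links in double-quoted attributes); `extract_xyz_links_awk` is a hypothetical name:

```shell
# Hypothetical helper: match() finds the leftmost-longest match and sets
# RSTART/RLENGTH; the loop then continues on the rest of the line, so
# every matching link on a line is printed.
extract_xyz_links_awk() {
  awk '{
    s = $0
    while (match(s, /"\/[^"]*xyz"/)) {
      # Skip the opening and closing quote of the matched value.
      print substr(s, RSTART + 1, RLENGTH - 2)
      s = substr(s, RSTART + RLENGTH)
    }
  }' "$@"
}

# Demo on made-up input.
printf '%s\n' '<a href="/img/a.xyz">A</a> <a href="/docs/page.html">B</a> <img src="/pics/bxyz">' |
  extract_xyz_links_awk
```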

Thanks in advance

PS: No, I am not doing anything immoral here.

GrapefruiTgirl 09-29-2010 04:48 PM

Links don't usually end in xyz, but grep should work fine for this. Please show us a sample of an actual 'xyz' link that you'd like to match, and someone can suggest a regex that matches it and similar links.

dmchess 09-29-2010 05:49 PM

Well, that was just an example. What I really want is links that end with "cs0.gif" (image files).

GrapefruiTgirl 09-29-2010 05:58 PM

I'm not sure how much of the link you want. However, if I have a file named 'links' containing the following:
Code:

sasha@reactor: cat links
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.some-site.com/blab/blarg/image-cs1.gif
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.someothersite.com/blah/blarh/image-cs1.gif

and I want the links you described, then the following works:
Code:

sasha@reactor: grep -o -e 'http://.*cs0\.gif' links
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.some-site.com/blah/blarh/image-cs0.gif
sasha@reactor:
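One thing to watch with `.*`: it is greedy, so when a single line of HTML holds several links, `http://.*cs0\.gif` matches from the first `http://` on the line through to the last `cs0.gif`. A sketch of a tighter pattern, assuming a link never contains a double quote, an angle bracket or whitespace; `extract_cs0_gif` is a hypothetical name and the demo line is made-up input:

```shell
# Hypothetical helper: the link body may not contain ", <, > or whitespace,
# so a match cannot run on from one link into the next on the same line.
extract_cs0_gif() {
  grep -o 'http://[^"<>[:space:]]*cs0\.gif' "$@"
}

# Made-up input: two links on one line, only the second ends in cs0.gif.
line='<a href="http://a.com/x.png">x</a><img src="http://b.com/y-cs0.gif">'

# Greedy original: prints http://a.com/x.png">x</a><img src="http://b.com/y-cs0.gif
printf '%s\n' "$line" | grep -o 'http://.*cs0\.gif'

# Tighter version: prints http://b.com/y-cs0.gif
printf '%s\n' "$line" | extract_cs0_gif
```

On a file with one link per line, like the 'links' file above, both patterns give the same result; the difference only shows up on real HTML.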

If this isn't right, please show an exact link you might come across, and exactly what you want as output.

Cheers!

kurumi 09-29-2010 06:53 PM

Code:

$ ruby -00 -ne 'puts $_.scan(/http.[^>"]*/);' file
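What the Ruby one-liner does: `-00` reads the input in paragraph mode (records separated by blank lines), `-n` runs the code once per record with the record in `$_`, `scan` collects every match of `/http.[^>"]*/`, and `puts` prints each element of the resulting array on its own line. As written it prints every link starting with http, not only the cs0.gif ones. A rough grep analogue that then keeps only links ending in cs0.gif, assuming no link spans a line break (grep works line by line, whereas Ruby's `[^>"]` can match a newline inside a paragraph); `links_ending_cs0_gif` is a hypothetical name:

```shell
# Hypothetical helper: grep -o prints each "http..." run up to the next
# > or ", one per line; the second grep keeps only those ending in cs0.gif.
links_ending_cs0_gif() {
  grep -o 'http[^>"]*' "$@" | grep 'cs0\.gif$'
}

# Demo on made-up input; prints http://b.com/y-cs0.gif
printf '%s\n' '<a href="http://a.com/x.png">x</a><img src="http://b.com/y-cs0.gif">' |
  links_ending_cs0_gif
```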

