LinuxQuestions.org
Old 09-29-2010, 04:14 PM   #1
dmchess
Member
 
Registered: Jan 2005
Posts: 71

Rep: Reputation: 20
grep help or sed or awk


I am trying to scrape a certain group of web pages for links. Let's say the links I am interested in begin with a / and end in xyz. I have tried to do this with the following grep command:

grep -o '[//]*xyz' file

It doesn't work, because all I get is xyz printed.

I think it is possible to do similar things with sed and possibly awk, but I don't know how.

Thanks in advance

ps: No, I am not doing anything immoral here.
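
A note on why the command above prints only xyz: `[//]*` is a bracket expression containing just `/`, so it matches zero or more slashes rather than "a slash followed by anything". Below is a minimal sketch of a pattern that starts at a `/` and runs up to `xyz`, assuming the links sit inside double-quoted attributes and contain no whitespace. The sample file contents and the variable names are invented for illustration.

```shell
# Hypothetical sample input; the paths are made up for this example.
sample=$(mktemp)
printf '%s\n' \
  '<a href="/pics/one.xyz">one</a>' \
  '<a href="/pics/two.abc">two</a>' > "$sample"

# Original attempt: [//]* only matches a run of slashes, and the text
# right before "xyz" is a dot, so the match is just "xyz".
original=$(grep -o '[//]*xyz' "$sample")

# A slash, then any run of characters that are neither a double quote
# nor whitespace, then the literal suffix xyz.
fixed=$(grep -o '/[^"[:space:]]*xyz' "$sample")

echo "original: $original"   # prints "original: xyz"
echo "fixed: $fixed"         # prints "fixed: /pics/one.xyz"

rm -f "$sample"
```

If the real links can contain quotes or are not quoted at all, the bracket expression would need adjusting.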
 
Old 09-29-2010, 04:48 PM   #2
GrapefruiTgirl
Guru
 
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594

Rep: Reputation: 550
Usually links don't end in xyz; but grep should work OK for this. Please show us a sample of an actual 'xyz' link that you'd like to match with your regex, and someone can perhaps suggest a regex to match it and similar links.
 
Old 09-29-2010, 05:49 PM   #3
dmchess
Member
 
Registered: Jan 2005
Posts: 71

Original Poster
Rep: Reputation: 20
Well, I was using that as an example. What I really want is links that end with "cs0.gif" (Image Files)
 
Old 09-29-2010, 05:58 PM   #4
GrapefruiTgirl
Guru
 
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594

Rep: Reputation: 550
Not sure how much of the link you want, however, if I have a file named 'links' containing the following:
Code:
sasha@reactor: cat links
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.some-site.com/blab/blarg/image-cs1.gif
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.someothersite.com/blah/blarh/image-cs1.gif
and I want the links described by you, then the following works:
Code:
sasha@reactor: grep -o -e 'http://.*cs0\.gif' links
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.some-site.com/blah/blarh/image-cs0.gif
http://www.some-site.com/blah/blarh/image-cs0.gif
sasha@reactor:
If this isn't right, please show an exact link you might come across, and exactly what output you want.

Cheers!
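
One caveat with `.*`, in case the real pages put several links on a single line: the match is greedy, so `http://.*cs0\.gif` runs from the first `http://` to the last `cs0.gif` on that line. The following is a hedged sketch that stops each match at a double quote or whitespace instead, assuming the URLs sit in double-quoted attributes. The HTML line and variable names are invented for illustration.

```shell
# Hypothetical page with three image links on one line.
page=$(mktemp)
printf '%s\n' '<img src="http://www.some-site.com/a/image-cs0.gif"><img src="http://www.some-site.com/b/image-cs1.gif"><img src="http://www.some-site.com/c/image-cs0.gif">' > "$page"

# Greedy: one long match spanning everything between the first http://
# and the last cs0.gif, including the cs1 link in the middle.
greedy=$(grep -o 'http://.*cs0\.gif' "$page")

# Bounded: each match may not cross a double quote or whitespace,
# so the two cs0 links come out on separate lines.
links=$(grep -o 'http://[^"[:space:]]*cs0\.gif' "$page")

printf '%s\n' "$links"

rm -f "$page"
```

With one link per line, as in the 'links' file above, both patterns give the same result.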
 
Old 09-29-2010, 06:53 PM   #5
kurumi
Member
 
Registered: Apr 2010
Posts: 223

Rep: Reputation: 45
Code:
$ ruby -00 -ne 'puts $_.scan(/http.[^>"]*/);' file

Last edited by kurumi; 09-29-2010 at 06:54 PM.
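
Since the original question also asked about sed and awk, here is a rough sketch of equivalents to the `grep -o` approach. It assumes links end at a double quote, a space, or `>`, and the sample lines and variable names are made up. Note the difference in behavior: the sed substitution prints at most one link per line (the last one, because the leading `.*` is greedy), while the awk loop prints every match.

```shell
# Hypothetical input: two cs0 links on the first line, a cs1 link on the second.
input=$(mktemp)
printf '%s\n' \
  '<img src="http://www.some-site.com/a/image-cs0.gif"> <img src="http://www.some-site.com/b/image-cs0.gif">' \
  '<img src="http://www.some-site.com/c/image-cs1.gif">' > "$input"

# sed: capture a link and print only lines where the substitution happened.
sed_out=$(sed -n 's|.*\(http://[^"[:space:]]*cs0\.gif\).*|\1|p' "$input")

# awk: repeatedly match, print the matched text, and continue with
# the remainder of the line until no match is left.
awk_out=$(awk '{
  rest = $0
  while (match(rest, /http:\/\/[^" >]*cs0\.gif/)) {
    print substr(rest, RSTART, RLENGTH)
    rest = substr(rest, RSTART + RLENGTH)
  }
}' "$input")

echo "sed found:"; printf '%s\n' "$sed_out"
echo "awk found:"; printf '%s\n' "$awk_out"

rm -f "$input"
```

For this kind of extraction `grep -o` is usually the shortest option; the awk version is mainly useful if further per-link processing is needed.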
 
  

