LinuxQuestions.org
Old 02-16-2012, 02:47 PM   #1
Advice Pro
Member
 
Registered: Mar 2009
Location: Virginia, US
Distribution: Ubuntu 10.10 & Debian 6.0.3,
Posts: 343

Rep: Reputation: 7
How do I extract a url with text?


I'm trying to extract Google search results. I think wget only extracts URLs, but maybe curl, sed, or other tools can do this.
 
Old 02-16-2012, 02:59 PM   #2
acid_kewpie
Moderator
 
Registered: Jun 2001
Location: UK
Distribution: Gentoo, RHEL, Fedora, Centos
Posts: 43,417

Rep: Reputation: 1976
Can you explain this in more detail please?
 
Old 02-16-2012, 04:27 PM   #3
Advice Pro
Member
 
Registered: Mar 2009
Location: Virginia, US
Distribution: Ubuntu 10.10 & Debian 6.0.3,
Posts: 343

Original Poster
Rep: Reputation: 7
An example: a search for the word cherries renders:

Cherry Wikipedia, the free encyclopedia

Cherries. America's "Super Fruit"

Cherry Health and Cherry Nutrition


I'd like to extract the text with the url embedded.

Last edited by Advice Pro; 02-16-2012 at 04:28 PM.
 
Old 02-16-2012, 04:28 PM   #4
John VV
LQ Muse
 
Registered: Aug 2005
Location: A2 area Mi.
Posts: 17,225

Rep: Reputation: 2521
So what then?
There are 3 links.

what do you want to do ?
 
Old 02-16-2012, 04:30 PM   #5
Advice Pro
Member
 
Registered: Mar 2009
Location: Virginia, US
Distribution: Ubuntu 10.10 & Debian 6.0.3,
Posts: 343

Original Poster
Rep: Reputation: 7
Quote:
Originally Posted by John VV
what do you want to do ?
Quote:
Originally Posted by Advice Pro
I'd like to extract the text with the url embedded.
qdqd
 
Old 02-16-2012, 04:47 PM   #6
devUnix
Member
 
Registered: Oct 2010
Location: Bengaluru, India
Distribution: RHEL 5.1 on My PC, & SunOS / Sun Solaris, RHEL, SuSe, Debian, FreeBSD and other Linux flavors @ Work
Posts: 584

Rep: Reputation: 59
Use "curl" to fetch the page that contains your search results. Store its contents in a variable or a file, then pipe the contents to "grep" to extract only the hyperlinks.

You gave the "cherries" example above. I right-clicked, viewed the page source, copied those lines, and saved them to a file named "data" for this example:

Code:
[demo@localhost ~]$ cat data
		<div id="post_message_4604415"><!-- google_ad_section_start -->An Example, a search for the word cherries, renders&quot;<br />
<br />
<a href="http://en.wikipedia.org/wiki/Cherry" target="_blank">Cherry Wikipedia, the free encyclopedia</a><br />
<br />
<a href="http://www.choosecherries.com" target="_blank">Cherries. America's &quot;Super Fruit&quot;</a><br />
<br />
<a href="http://www.choosecherries.com/health/main.aspx" target="_blank">Cherry Health and Cherry Nutrition</a><br />
<br />
<br />
I'd like to extract the text with the url embedded.<!-- google_ad_section_end --></div>

		<!-- / message -->
Code:
grep -oE '<a href.*</a>' data
<a href="http://en.wikipedia.org/wiki/Cherry" target="_blank">Cherry Wikipedia, the free encyclopedia</a>
<a href="http://www.choosecherries.com" target="_blank">Cherries. America's &quot;Super Fruit&quot;</a>
<a href="http://www.choosecherries.com/health/main.aspx" target="_blank">Cherry Health and Cherry Nutrition</a>
So you get the URLs (hyperlinks) together with their text.
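
If you would rather have the URL and its text as separate fields instead of the raw tag, a "sed" step can split them. This is only a minimal sketch: it assumes each link sits on its own line with a double-quoted href and no nested tags inside the link text, as in the "data" file above. The function name "links_to_tsv" is made up for this example.

```shell
# Hypothetical helper: reads HTML on stdin, prints "URL<TAB>link text" per anchor.
# Assumes one <a ...>...</a> per line, a double-quoted href, and plain link text.
links_to_tsv() {
    tab=$(printf '\t')   # literal tab, portable across sed implementations
    grep -oE '<a href="[^"]*"[^>]*>[^<]*</a>' |
        sed -E 's|<a href="([^"]*)"[^>]*>([^<]*)</a>|\1'"$tab"'\2|'
}
```

For example, "links_to_tsv < data" would print one line per link, with the URL first and the link text after the tab.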

Another example:

Code:
echo "$curled" | grep -oE '<a href.*</a>'
Note: The above example assumes that you have stored the output of the "curl" command in a variable named "curled".

You can further use "sed" to wrap each line in "<li>" and "</li>" tags so that you have a list of links.
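
One way the "curled" variable might be filled is sketched here. The function name "fetch_page", the User-Agent string, and the URL in the usage comment are placeholders, not anything from this thread. Some sites serve different HTML depending on the User-Agent, so the markup you get back may not match what your browser shows.

```shell
# Hypothetical helper: fetch a page and print its HTML on stdout.
# -s hides the progress meter, -L follows redirects, -A sets the User-Agent.
fetch_page() {
    curl -sL -A 'Mozilla/5.0' "$1"
}

# Usage (placeholder URL; needs network access):
#   curled=$(fetch_page 'http://www.example.com/')
#   echo "$curled" | grep -oE '<a href.*</a>'
```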

Code:
grep -oE '<a href.*</a>' data | sed -e 's/^/<li>/' -e 's/$/<\/li>/'
<li><a href="http://en.wikipedia.org/wiki/Cherry" target="_blank">Cherry Wikipedia, the free encyclopedia</a></li>
<li><a href="http://www.choosecherries.com" target="_blank">Cherries. America's &quot;Super Fruit&quot;</a></li>
<li><a href="http://www.choosecherries.com/health/main.aspx" target="_blank">Cherry Health and Cherry Nutrition</a></li>
Now you can surround the entire output with "<ol>" and "</ol>" or "<ul>" and "</ul>", whichever suits your needs.
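
The wrapping step could be folded into one small function, roughly like this. It is only a sketch that reuses the grep and sed commands from above; the name "links_to_ul" is made up for this example.

```shell
# Hypothetical helper: reads HTML on stdin, prints the links as a <ul> list.
links_to_ul() {
    echo '<ul>'
    # Same extraction as before, then wrap every matched link in <li>...</li>.
    grep -oE '<a href.*</a>' | sed -e 's/^/<li>/' -e 's/$/<\/li>/'
    echo '</ul>'
}
```

Running "links_to_ul < data > links.html" would write the list to a file you can open in a browser.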

Cheers!

Last edited by devUnix; 02-16-2012 at 04:55 PM.
 
  

