Old 11-15-2004, 01:41 PM   #1
LQ Newbie
Registered: Oct 2004
Location: United States
Distribution: Debian
Posts: 29

Rep: Reputation: 15
Finding Text in an html file

I am trying to put together a series of scripts to work for me, and I have been stuck on this one for some time now. I need a command or series of commands that will pull an e-mail address out of an HTML file.

grep "mailto:" file.html
#gives me this output

<td width="275"><font size="2" color="#000000"face="Verdana, Arial, Helvetica, sans-serif">pappy paperson<BR>29 smith<BR>Irvine, CA 92620<BR>US<BR><a href="" style=" "></a><BR></font></td>

I would like to narrow this down to only the actual e-mail address, so I can drop it into a new file. I have limited C++ knowledge from Windows but have never been able to convert my program to Linux. Any help that you could offer would be greatly appreciated. Thank you.

Last edited by Xaque208; 11-15-2004 at 01:43 PM.
Old 11-15-2004, 04:26 PM   #2
Registered: Dec 2003
Location: NC, US
Distribution: Novell Linux Eval (2.6.5)
Posts: 240

Rep: Reputation: 30
If you're on a Linux box, read the grep man page (man grep) and learn about regular expressions; a regexp will narrow the output down to exactly what you need. Maybe something like:

grep -oE 'mailto:[a-zA-Z0-9@._-]+\.com' file.html

Search Google for "regular expressions" and you will find a lot of good, clear explanations. The example I gave might not be exactly right, but it's supposed to find everything that starts with "mailto:" and ends with ".com". With GNU grep, -o prints only the matching part of the line and -E enables extended regexps so that + works.
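
A minimal sketch of that idea, not limited to ".com" addresses. It assumes GNU grep (for -o and -E) and that each address follows "mailto:" inside an href attribute. The function name extract_emails and the sample.html contents are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every address that follows "mailto:" in a file.
extract_emails() {
    # -o prints only the matched text; -E enables extended regexps so + works.
    # The match stops at the first character that cannot be part of an address,
    # such as the closing double quote of the href attribute.
    grep -oE 'mailto:[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+' "$1" |
        sed 's/^mailto://'   # drop the "mailto:" prefix, leaving the bare address
}

# Made-up sample input, not taken from the original page.
printf '%s\n' '<a href="mailto:pappy@example.com" style=" ">pappy</a>' > sample.html

# Write the extracted addresses, one per line, to a new file.
extract_emails sample.html > mail.txt
```

The character classes are deliberately loose; they are meant to pick addresses out of markup, not to validate them.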
Old 11-15-2004, 04:58 PM   #3
Registered: Oct 2004
Distribution: Fedora 7, OpenSuse 10.2
Posts: 108

Rep: Reputation: 15
You need a regexp to extract the substring you want. This might work:

MAIL=$(grep "mailto:" file.html)
expr "$MAIL" : '.*mailto:\(.*\)" s.*' > mail.txt
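
A hedged variant of the same expr approach that handles several matching lines and does not depend on a style attribute following the address. It assumes the address is terminated by a double quote and that there is at most one mailto link per line. The function name mailto_addresses and the sample file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the address from each line containing "mailto:".
mailto_addresses() {
    grep 'mailto:' "$1" | while IFS= read -r line; do
        # expr anchors the pattern at the start of the string and prints what
        # the \( \) group captured: everything after "mailto:" up to the next
        # double quote.
        expr "$line" : '.*mailto:\([^"]*\)"'
    done
}

# Made-up sample input, not taken from the original page.
printf '%s\n' '<td><a href="mailto:pappy@example.com" style=" ">pappy</a></td>' > sample.html

mailto_addresses sample.html > mail.txt
```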
Old 11-15-2004, 11:32 PM   #4
LQ Newbie
Registered: Oct 2004
Location: United States
Distribution: Debian
Posts: 29

Original Poster
Rep: Reputation: 15

Thanks for your help, I got what I was looking for.



All times are GMT -5.
