LinuxQuestions.org
Linux - Software This forum is for Software issues.
11-15-2004, 12:41 PM   #1
Xaque208
LQ Newbie
 
Registered: Oct 2004
Location: United States
Distribution: Debian
Posts: 29

Rep: Reputation: 15
Finding Text in an html file


Hello,
I am trying to put together a series of scripts to work for me, and I have been stuck on this one for some time now. I need a command or series of commands that will pull an e-mail address out of an HTML file.

grep "mailto:" file.html
#gives me this output

<td width="275"><font size="2" color="#000000"face="Verdana, Arial, Helvetica, sans-serif">pappy paperson<BR>29 smith<BR>Irvine, CA 92620<BR>US<BR><a href="mailto:pappy@pappy.com" style=" ">pappy@pappy.com</a><BR></font></td>


I would like to narrow this down to just the e-mail address so I can drop it into a new file. I have limited C++ knowledge from Windows but have never been able to convert my program to Linux. Any help that you could offer would be greatly appreciated. Thank you.

Last edited by Xaque208; 11-15-2004 at 12:43 PM.
 
11-15-2004, 03:26 PM   #2
feetyouwell
Member
 
Registered: Dec 2003
Location: NC, US
Distribution: Novell Linux Eval (2.6.5)
Posts: 240

Rep: Reputation: 30
If you're on a Linux box, do a man on grep and learn about regular expressions; they will narrow it down to exactly what you need. Maybe something like:

# -E enables +, -o prints only the matching text instead of the whole line
grep -Eo 'mailto:[a-zA-Z0-9@.]+\.com' file.html

Do a Google search on regular expressions and you will find a lot of good, clear explanations. The example I gave might not be exactly right, but it is supposed to find everything that starts with "mailto:" and ends with ".com".
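Building on the grep idea above, here is a minimal sketch that also strips the "mailto:" prefix so only the bare address is left. It assumes a grep that supports -E, -o, -h and -i (GNU and BSD grep both do); the function name extract_emails is made up for illustration, and the address pattern is a loose approximation, not a full validator.

```shell
# Hypothetical helper: print each address found in a mailto: link, once.
# Reads the files named as arguments, or standard input if there are none.
extract_emails() {
  # -E: extended regex, -o: only the match, -h: no file-name prefix,
  # -i: also match MAILTO: in upper case.
  grep -Eohi 'mailto:[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]+' "$@" |
    sed 's/^[Mm][Aa][Ii][Ll][Tt][Oo]://' |   # drop the mailto: prefix
    sort -u                                  # one address per line, no duplicates
}
```

For example: extract_emails file.html > mail.txt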
 
11-15-2004, 03:58 PM   #3
LasseW
Member
 
Registered: Oct 2004
Distribution: Fedora 7, OpenSuse 10.2
Posts: 108

Rep: Reputation: 15
You need a regexp to extract the substring you want. This might work:

MAIL=$(grep "mailto:" file.html)
# capture the text between "mailto:" and the quote that precedes " style"
expr "$MAIL" : '.*mailto:\(.*\)" s.*' > mail.txt
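The same capture-group idea can also be run with sed, which applies it to every line of the file instead of to a single variable. This is a sketch, not LasseW's exact command: the function name mailto_from_lines is hypothetical, and stopping at [^"]* assumes the address is followed by the closing quote of the href, which holds for the sample line in the first post.

```shell
# Hypothetical helper: the expr capture-group idea, applied by sed to each line.
mailto_from_lines() {
  # -n: print nothing by default; the p flag prints only lines where the
  # substitution succeeded. [^"]* stops at the closing quote of the href,
  # so the pattern no longer depends on the " style" attribute after it.
  # Because the leading .* is greedy, only the last mailto: on a line is kept.
  sed -n 's/.*mailto:\([^"]*\)".*/\1/p' "$@"
}
```

For example: mailto_from_lines file.html > mail.txt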
 
11-15-2004, 10:32 PM   #4
Xaque208
LQ Newbie
 
Registered: Oct 2004
Location: United States
Distribution: Debian
Posts: 29

Original Poster
Rep: Reputation: 15
Thanks

Thanks for your help; I got what I was looking for.
 
All times are GMT -5.