LinuxQuestions.org
03-14-2010, 01:52 PM   #1
flea89
LQ Newbie
 
Registered: Mar 2010
Posts: 15

Rep: Reputation: 0
Looking for an alternative 'cleaner' solution


Well, I have the following script, which takes a webpage URL as its input parameter
and extracts the email addresses from it.
The results are correct, but I was wondering whether I could rewrite it
in a better, 'cleaner' manner.
I think my for loop is a bit... primitive, and each email is also printed
twice.
Code:
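#!/bin/bash

# Usage: pass the URL of the webpage as the first argument.
echo "$1"

echo "The script starts now."

echo "Hi, $USER!"

# Download the page; "$1" is quoted so the URL is not word-split or glob-expanded.
wget "$1" -O webpage.txt

# Print every whitespace-separated field that contains letter@letter.
awk '
{
  for (i = 1; i <= NF; i++) {
    if ($i ~ /[[:alpha:]]@[[:alpha:]]/) {
      print $i
    }
  }
}' webpage.txt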
My main concern is the part inside awk '{...}'.
Any ideas?
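One possible way to tidy up the awk part, offered as a sketch rather than a definitive answer: keep the same field test but remember each match in an array, so an address that appears twice in the page is printed only once. The function name `extract_emails` and the array name `seen` are made up for illustration, and the sketch assumes the page has already been saved to a file such as webpage.txt.

```shell
#!/bin/bash
# Hypothetical helper: print each whitespace-separated field that contains
# letter@letter, skipping fields that were already printed.
extract_emails() {
  awk '
  {
    for (i = 1; i <= NF; i++) {
      # Same test as the original script; seen[] counts earlier occurrences.
      if ($i ~ /[[:alpha:]]@[[:alpha:]]/ && !seen[$i]++) {
        print $i
      }
    }
  }' "$@"
}

# Usage (assumes webpage.txt was fetched beforehand, e.g. with wget):
# extract_emails webpage.txt
```

The `!seen[$i]++` idiom is true only the first time a given field is met, which is what removes the duplicates while keeping the original order.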
 
03-14-2010, 01:56 PM   #2
AlucardZero
Senior Member
 
Registered: May 2006
Location: USA
Distribution: Debian
Posts: 4,652

Rep: Reputation: 536
grep -o "[[:alpha:]]@[[:alpha:]]" webpage.txt

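However, with -o grep prints only the text the pattern matched, and your regex matches just one letter on each side of the @. It won't pick up the rest of the address, including the TLD, so you'll have to expand it.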
 
03-14-2010, 02:07 PM   #3
flea89
LQ Newbie
 
Registered: Mar 2010
Posts: 15

Original Poster
Rep: Reputation: 0
I tried your code and my output is:
g@c
g@c
o@c
o@c
s@c
s@c
s@c
s@c
s@c
s@c
r@c
r@c

which means that for some reason the emails aren't printed correctly... Does this have to do with the regex you mentioned? I don't know what that is, by the way.
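For what it's worth, the three-character output can be reproduced on a single line. With -o, grep prints only the part of the line that the pattern matched, and this pattern matches exactly one letter, the @, and one more letter. The address below is invented for the demonstration and is not taken from the real webpage.txt.

```shell
# Made-up sample line, used only to show what -o prints for this pattern.
echo 'contact: blog@company.com' | grep -o '[[:alpha:]]@[[:alpha:]]'
# prints: g@c
```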
 
03-14-2010, 03:27 PM   #4
AlucardZero
Senior Member
 
Registered: May 2006
Location: USA
Distribution: Debian
Posts: 4,652

Rep: Reputation: 536
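Yes, your regex (short for "regular expression", the search pattern) is wrong. It's the "[[:alpha:]]@[[:alpha:]]" thing, which matches only a single letter on each side of the @. Google for "email regex" to find a better one.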
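A rough sketch of what an expanded pattern might look like, assuming a grep that supports -o and -E (GNU grep does). The pattern is a deliberate simplification and not a full email validator: it will miss some valid addresses and accept some invalid ones. The function name `find_emails` is hypothetical, and `sort -u` is added only to drop the duplicates mentioned in the first post.

```shell
#!/bin/bash
# Hypothetical helper: pull email-like strings out of a file (or stdin).
# Pattern: local part, "@", domain labels, then a dot and a TLD of at
# least two letters. This is an approximation, not a validator.
find_emails() {
  grep -oE '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}' "$@" | sort -u
}

# Usage, assuming webpage.txt already exists:
# find_emails webpage.txt
```

Because -o prints only the matched text, surrounding HTML such as `href="mailto:` or `</a>` is left out of the output.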
 
  


All times are GMT -5.
