LinuxQuestions.org
Old 01-19-2004, 06:12 AM   #1
kepler
LQ Newbie
 
Registered: Jan 2004
Distribution: Gentoo, RedHat, SUSE, Debian
Posts: 10

Rep: Reputation: 0
bash script, parsing email addresses


Hello All,
Hope someone might be able to point me in the right direction with this problem. I'm trying to generate a report using a bash script that lists a pile of email addresses and the number of times each one appears in a log file (spammers), but I only want to search for these email addresses by the 'domain name'.
e.g. I have the following email addresses in my log file:

..
cs6710132-199.houston.rrbbww.com
adsl-66-127-81-138.dsl.sntc01.pacabell342.net
earthping.co.uk
..

and I only want to cut out the last two or three parts of the domain name,
i.e.:

houston.rrbbww.com
sntc01.pacabell342.net
earthping.co.uk

I've tried using the cut command like so:

cut -d. -f2-5 logfile > result

which doesn't really work as it assumes that leaving out the first set of characters before the first period is enough... which it ain't! I need to work from the end and work backwards. Any ideas??

K.
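
One way to make cut "work from the end", as asked above, is to reverse each line, cut the leading fields, and reverse the result back. The following is only a sketch: it assumes the `rev` utility (part of util-linux on most Linux systems) is installed, and `last_fields` is a made-up helper name, not something from the thread.

```bash
#!/bin/sh
# last_fields N: read names on stdin, print the last N dot-separated parts.
# (last_fields is a hypothetical helper name used for illustration.)
last_fields() {
    # Reverse each line so the last part comes first, keep the first N
    # fields, then reverse again to restore the original order.
    rev | cut -d. -f1-"$1" | rev
}

printf '%s\n' cs6710132-199.houston.rrbbww.com earthping.co.uk | last_fields 3
```

Because everything runs as a pipeline, nothing is held in memory, and names with fewer than N parts pass through unchanged.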
 
Old 01-19-2004, 06:38 AM   #2
leonscape
Senior Member
 
Registered: Aug 2003
Location: UK
Distribution: Debian SID / KDE 3.5
Posts: 2,313

Rep: Reputation: 48
You could try loading the string into the script and using bash's string handling. A regex may be useful (or maybe not).
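
As a rough illustration of the shell "string handling" mentioned here, parameter expansion can strip text from either end of a variable: `${var##*.}` removes everything up to the last dot, and `${var%.*}` removes the last dot and everything after it. The helper name `last_two` is an assumption made for this sketch.

```bash
#!/bin/sh
# last_two HOST: print the last two dot-separated parts of HOST.
# (last_two is a hypothetical helper name used for illustration.)
last_two() {
    host=$1
    tld=${host##*.}      # text after the last dot
    rest=${host%.*}      # text before the last dot
    if [ "$rest" = "$host" ]; then
        # No dot at all: nothing to cut, so print the name unchanged.
        printf '%s\n' "$host"
        return
    fi
    sld=${rest##*.}      # last part of what remains
    printf '%s.%s\n' "$sld" "$tld"
}

last_two "cs6710132-199.houston.rrbbww.com"
```

This works on one string at a time, so for a very large log file it would have to be called in a `while read` loop, which is likely to be much slower than a single awk pass.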
 
Old 01-19-2004, 06:48 AM   #3
kepler
LQ Newbie
 
Registered: Jan 2004
Distribution: Gentoo, RedHat, SUSE, Debian
Posts: 10

Original Poster
Rep: Reputation: 0
Thanks for the quick reply.

I'm new to all this, so can you explain the bash string handling to me? Or just give me the name of the command or something.

The regcomp looks a bit complex for me to use... going by its man page.

The number of records I'm sorting through is roughly 1.3 million, so I can't hold it all in memory; I must dump everything into files during the whole process.
 
Old 01-19-2004, 07:30 AM   #4
leonscape
Senior Member
 
Registered: Aug 2003
Location: UK
Distribution: Debian SID / KDE 3.5
Posts: 2,313

Rep: Reputation: 48
Okay. First you might want to look at the Advanced Bash-Scripting Guide which has a lot of info about this.

Awk stuff is probably what you're looking for.
 
Old 01-20-2004, 10:29 AM   #5
kepler
LQ Newbie
 
Registered: Jan 2004
Distribution: Gentoo, RedHat, SUSE, Debian
Posts: 10

Original Poster
Rep: Reputation: 0
Fair enough.
I read through the awk page and tried using awk -F. '{print $fieldno}' to separate the email addresses into different fields. However, since the number of actual 'fields' varies from address to address, I'm kinda back to square one. Is there something handy that will allow me to go directly to the last field for each email address and work backwards from there?
 
Old 01-20-2004, 02:48 PM   #6
jim mcnamara
Member
 
Registered: May 2002
Posts: 964

Rep: Reputation: 36
Use IFS and read
Code:
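#! /bin/sh
email="somebody@some.domain.name.com"
# Split on "@". A here-document is used instead of "echo | read" because
# most shells run a piped read in a subshell, so its variables are lost.
IFS=@ read -r uname dmname <<EOF
$email
EOF
# Split the domain on "."; IFS is set only for the read, so no unset is needed.
IFS=. read -r var1 var2 var3 var4 var5 <<EOF
$dmname
EOF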
Anytime there are fewer than 5 "parts" the trailing variables will be null.
dmname has the name of the domain, plus the hostname sometimes.
var1...var5 parse out each component so you can use them.
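
A variation on the IFS idea that works from the end is to split the name into the positional parameters and shift off the leading parts until only the wanted number remain. This is a sketch under assumptions: `last_parts` is a made-up name, and the function body runs in a subshell (the parentheses) so the IFS change does not leak into the caller.

```bash
#!/bin/sh
# last_parts N HOST: print the last N dot-separated parts of HOST.
# (last_parts is a hypothetical helper name used for illustration.)
last_parts() (
    n=$1
    IFS=.
    set -f          # disable globbing while the name is split
    set -- $2       # split HOST on dots into $1, $2, ...
    # Drop leading parts until at most n remain.
    while [ "$#" -gt "$n" ]; do
        shift
    done
    # "$*" joins the remaining parts with the first character of IFS (".").
    printf '%s\n' "$*"
)

last_parts 3 "adsl-66-127-81-138.dsl.sntc01.pacabell342.net"
```

Names with fewer than N parts are printed unchanged. Starting a subshell per name is slow, so for millions of records this is more useful for understanding the splitting than for the bulk job.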
 
Old 01-26-2004, 06:47 AM   #7
kepler
LQ Newbie
 
Registered: Jan 2004
Distribution: Gentoo, RedHat, SUSE, Debian
Posts: 10

Original Poster
Rep: Reputation: 0
Well, thanks for all the advice. I've found a way to do what I need. The code isn't the best and can crash out under certain circumstances, but here it is:

awk -F. '{print $(NF-1) "." $NF}' emaillisting > domainsfile

(NF = Number of Fields)

This outputs the last two parts of the domain name. Changing the print section to:

{print $(NF-2) "." $(NF-1) "." $NF}

outputs the last three parts, and so on.

Though the three-part version can spit out an error if the original email address has no periods (and the two-part version then prints the name twice), which is the case with a lot of these spammers.

K.
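
A possible way to guard the awk approach above against short or empty lines, and to produce the per-domain counts the original question asked for, is sketched below. The function name `count_domains` and the choice to keep short names whole are assumptions made for illustration, not part of the thread.

```bash
#!/bin/sh
# count_domains N: read names on stdin and print "count domain" lines,
# where domain is the last N dot-separated parts of each name.
# (count_domains is a hypothetical helper name used for illustration.)
count_domains() {
    awk -F. -v n="$1" '
        NF == 0 { next }                  # skip blank lines
        {
            start = NF - n + 1
            if (start < 1) start = 1      # fewer than n parts: keep whole name
            d = $start
            for (i = start + 1; i <= NF; i++) d = d "." $i
            count[d]++
        }
        END { for (d in count) print count[d], d }
    ' | sort -k1,1nr -k2
}

printf '%s\n' a.houston.rrbbww.com b.houston.rrbbww.com earthping.co.uk nodots | count_domains 3
```

The awk array holds one entry per distinct domain rather than one per log line, so memory use depends on the number of unique domains, and `sort` spills to temporary files when its input is large.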
 
  


All times are GMT -5.
