LinuxQuestions.org
Old 04-10-2010, 01:05 PM   #1
S1GNZ
LQ Newbie
 
Registered: Apr 2010
Posts: 6

Rep: Reputation: 0
Count domains with AWK


Hi all,

I hope I can get some help with AWK.

What I need to achieve is counting all domains in an e-mail log file. The values are in field 7 ($7), and I need to count each domain separately.

This is what I have so far:

awk '$7=/@./ {som = som+1}; END {print som}' seip1_1_.log

But this results in counting 'all' the domains which appear behind the @.

I want this result:

@domain.com 22
@otherdomain.net 12
@somethingelse.org 5
@other.biz 3

where each number is the count of results for that domain. The AWK command needs to be a one-liner.

I really hope someone can help me with this!

Thanks in advance!!

Sincerely yours,

S1GNZ
 
Old 04-10-2010, 01:19 PM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

Is this what you are looking for?

awk '{ countArray[$7]++ } END { for (j in countArray) print j,countArray[j] }' infile

Hope this helps.
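For readers new to the idiom, here is a minimal sketch of what the one-liner does. awk arrays are associative, so `countArray[$7]++` creates an entry the first time a value appears in field 7 and increments it on every later occurrence. The three-line sample input below is made up for illustration (it is not the poster's log); only field 7 matters.

```shell
# Hypothetical sample: seven whitespace-separated fields, value of interest in $7.
sample=$(mktemp)
cat > "$sample" <<'EOF'
a b c d e f x@domain.com
a b c d e f y@other.net
a b c d e f x@domain.com
EOF

# One array entry per distinct $7; the END block prints each key with its count.
counts=$(awk '{ countArray[$7]++ } END { for (j in countArray) print j, countArray[j] }' "$sample")
echo "$counts"
rm -f "$sample"
```

The key is the whole field, so this counts full addresses; reducing the key to the domain is a separate step.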

Last edited by druuna; 04-10-2010 at 01:21 PM.
 
Old 04-10-2010, 07:28 PM   #3
S1GNZ
LQ Newbie
 
Registered: Apr 2010
Posts: 6

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by druuna View Post
Hi,

Is this what you are looking for?

awk '{ countArray[$7]++ } END { for (j in countArray) print j,countArray[j] }' infile

Hope this helps.
Almost! (A sample of the output is at the end of this reply.)

Now I get a nice list of all the e-mail addresses and how many e-mails have been sent from each address. But I need only the domains. That is probably just a small change in the code, but I don't really know how to do it.

What I have is this:

awk 'BEGIN {$8~/^@./} {countArray[$8]++ } END { for (j in countArray) print j,countArray[j] }' seip1_1_.log

If you change countArray[$8]++ to countArray[$8~/^@./], it counts everything that matches the specified filter. And if you change the way it prints, does it still count correctly, and do the numbers still add up?

Thanks, and thanks in advance again.

Sincerely yours,

S1GNZ

P.S. It might be handy to post part of the output:

<KAAG@Myanmar.com> 2
<HOOFDDORP@Canada.com> 2
<WIERINGERWAARD@Canada.com> 2
<ZAANDAM@Filipijnen.com> 2
<RIJNSATERWOUDE@Ascension.com> 3
<WARMOND@Colombia.com> 2
<GROET@Mali.com> 2
<WIERINGERWERF@Canada.com> 2
<OPPERDOES@Ivoorkust.com> 2
<HOOGKARSPEL@Canada.com> 2
<ZUIDOOSTBEEMSTER@Canada.com> 2
<OVERVEEN@Ghana.com> 2
<EGMOND-BINNEN@Rhodesie.com> 2
<BERGEN_NH@Mali.com> 1
<ZUID-SCHARWOUDE@Canada.com> 2
<MARKENBINNEN@Canada.com> 2
<STARNMEER@Canada.com> 2
<VOLENDAM@Malawi.com> 264
remote.LUCHTHAVEN_SCHIPHOL@Zwitserland.com 24
<UITDAM@Cuba.com> 34
<OTERLEEK@Canada.com> 2
<ZWAAGDIJK@Rhodesie.com> 2
<AVENHORN@Canada.com> 2
<KOUDEKERK_AAN_DEN_RIJN@Canada.com> 2
<HOOGMADE@Canada.com> 2
<ZWAANSHOEK@Canada.com> 2
<DEN_HOORN_TEXEL@Canada.com> 2
<HOOGWOUD@Canada.com> 2
<OUDE_WETERING@Rhodesie.com> 2
<WATERINGEN@Kroatie.com> 2
<ZUIDSCHERMER@Canada.com> 2
<RIJNSBURG@Frankrijk.com> 5
<SCHAGERBRUG@Albanie.com> 4
<DIRKSHORN@Canada.com> 2
<RIJNSBURG@Slovenie.com> 1
<HEILOO@Mali.com> 2
<ZUIDERMEER@Canada.com> 2
<OUDE_NIEDORP@Monaco.com> 16
<SCHARWOUDE@Zambia.com> 4
remote.LEIMUIDERBRUG@Vaticaanstad.com 2
<SANTPOORT_ZUID@Australisch_Nieuwguinea.com> 2
<UITGEEST@Canada.com> 2
<OUDENDIJK_NH@Canada.com> 2
remote.SPIJKERBOOR_NH@Zwitserland.com 4
<AALSMEERDERBRUG@Marokko.com> 4
remote.VALKENBURG_ZH@Zwitserland.com 1
<KATWIJK_ZH@Jemen.com> 2
<HEEMSTEDE@Canada.com> 2
<HEEMSKERK@Zweden.com> 2
<DEN_HELDER@Canada.com> 2
<T_VELD@Oostenrijk.com> 1
<WATERGANG@Togo.com> 8
remote.MONNICKENDAM@Zwitserland.com 132
<OUDESCHILD@Canada.com> 2
<DEN_OEVER@Canada.com> 2
<SCHARDAM@Canada.com> 2
<RIJPWETERING@Rusland.com> 10
<DEN_BURG@Canada.com> 2
<OUDESLUIS@Canada.com> 2
<SCHAGEN@Canada.com> 2
<LUTJEWINKEL@Canada.com> 2
<ZWAAG@Mali.com> 1
<ILPENDAM@Oostenrijk.com> 1
remote.ZUIDERWOUDE@Zwitserland.com 2
<HILVERSUM@Iran.com> 2
remote.MUIDERBERG@Zwitserland.com 12
remote.ZWANENBURG@Zwitserland.com 17
<LUTJEBROEK@Canada.com> 2
<KATWOUDE@Cuba.com> 4
remote.KORTENHOEF@Zwitserland.com 1


Last edited by S1GNZ; 04-10-2010 at 07:34 PM.
 
Old 04-11-2010, 12:24 AM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Actually it would be more helpful to post some of the input.
However, I did also notice that we have now changed from $7 to $8??
 
Old 04-11-2010, 03:44 AM   #5
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi again,

Without the appropriate information this will go nowhere.

Please post a few relevant lines of the input file and the desired output for those posted lines.
 
Old 04-11-2010, 06:44 AM   #6
S1GNZ
LQ Newbie
 
Registered: Apr 2010
Posts: 6

Original Poster
Rep: Reputation: 0
This is part of the file, as an example:

Code:
d k 1004083501.83190500 1004083501.156831500 1004083501.323597500 2950 <AMSTERDAM@Canada.com> local.AMSTERDAM_ZUIDOOST@Frankrijk.com 9238 81 

m 1004083501.83190500 1004083501.331906500 2950 1 0 0 <AMSTERDAM@Canada.com> 9238 81 

d k 1004083501.266650500 1004083501.323587500 1004083501.569195500 3076 <AMSTERDAM@Canada.com> remote.DIEMEN@Zwitserland.com 9246 512 

m 1004083501.266650500 1004083501.583383500 3076 1 0 0 <AMSTERDAM@Canada.com> 9246 512 

d k 1004083592.200293500 1004083592.265569500 1004083592.432499500 2295 <DUIVENDRECHT@Rhodesie.com> local.SCHIPHOL@Frankrijk.com 9293 81 

m 1004083592.200293500 1004083592.440690500 2295 1 0 0 <DUIVENDRECHT@Rhodesie.com> 9293 81 

d k 1004083592.375353500 1004083592.432487500 1004083592.728287500 2425 <DUIVENDRECHT@Rhodesie.com> remote.LUCHTHAVEN_SCHIPHOL@Zwitserland.com 9297 512
Sorry for the change from field 7 to 8; I hadn't noticed that these were the sender and the receiver.

Edit: I now know the output I need:

Domain Received Send
Canada.com 50 0
Malawi.com 61 0
Volendam.com 0 32


etc..

Thanks once again

Last edited by S1GNZ; 04-11-2010 at 08:14 AM.
 
Old 04-11-2010, 07:36 AM   #7
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

You changed the output again. I'm not sure why you need/want 2 numbers behind the domain (Canada.com 50 0 vs Canada.com 50). I'm going to assume that you want to see the name of the domain and the count (i.e. Canada.com 50).

awk '/^[a-z]/ { gsub(/.*@/,"",$8) ; gsub(/>/,"",$8) ; countArray[$8]++ } END { for (j in countArray) print j,countArray[j] }' infile

/^[a-z]/ => only lines that start with a-z

gsub(/.*@/,"",$8) => strip all up to and including the @ from $8

gsub(/>/,"",$8) => strip the > (if present) from $8

countArray[$8]++ => increment the counter for that domain (the array key)

END { for (j in countArray) print j,countArray[j] } => print what is found.

Sample run:
Code:
$ cat infile
d k 1004083501.83190500 1004083501.156831500 1004083501.323597500 2950 <AMSTERDAM@Canada.com> local.AMSTERDAM_ZUIDOOST@Frankrijk.com 9238 81 

m 1004083501.83190500 1004083501.331906500 2950 1 0 0 <AMSTERDAM@Canada.com> 9238 81 

d k 1004083501.266650500 1004083501.323587500 1004083501.569195500 3076 <AMSTERDAM@Canada.com> remote.DIEMEN@Zwitserland.com 9246 512 

m 1004083501.266650500 1004083501.583383500 3076 1 0 0 <AMSTERDAM@Canada.com> 9246 512 

d k 1004083592.200293500 1004083592.265569500 1004083592.432499500 2295 <DUIVENDRECHT@Rhodesie.com> local.SCHIPHOL@Frankrijk.com 9293 81 

m 1004083592.200293500 1004083592.440690500 2295 1 0 0 <DUIVENDRECHT@Rhodesie.com> 9293 81 

d k 1004083592.375353500 1004083592.432487500 1004083592.728287500 2425 <DUIVENDRECHT@Rhodesie.com> remote.LUCHTHAVEN_SCHIPHOL@Zwitserland.com 9297 512

$ awk '/^[a-z]/ { gsub(/.*@/,"",$8) ; gsub(/>/,"",$8) ; countArray[$8]++ } END { for (j in countArray) print j,countArray[j] }' infile
Zwitserland.com 2
Canada.com 2
Frankrijk.com 2
Rhodesie.com 1
Hope this is what you were looking for.
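Two things about the sample run are worth spelling out. First, the order of the output lines is simply the order in which awk traverses the array; `for (j in countArray)` guarantees no ordering. Second, in the posted input the `m` lines have a different layout from the `d` lines, and on `m` lines field 8 holds the sender, so the counts above mix senders and receivers. The sketch below assumes that only the receiver on `d` lines is wanted (an assumption, not something the thread confirms at this point) and sorts the result by count, highest first.

```shell
# Three "d" lines copied from the sample input in this thread.
infile=$(mktemp)
cat > "$infile" <<'EOF'
d k 1004083501.83190500 1004083501.156831500 1004083501.323597500 2950 <AMSTERDAM@Canada.com> local.AMSTERDAM_ZUIDOOST@Frankrijk.com 9238 81
d k 1004083501.266650500 1004083501.323587500 1004083501.569195500 3076 <AMSTERDAM@Canada.com> remote.DIEMEN@Zwitserland.com 9246 512
d k 1004083592.200293500 1004083592.265569500 1004083592.432499500 2295 <DUIVENDRECHT@Rhodesie.com> local.SCHIPHOL@Frankrijk.com 9293 81
EOF

# Count receiver domains ($8) on "d" lines only, then sort numerically by
# the count (column 2, descending) and by domain name for ties.
sorted=$(awk '$1 == "d" { gsub(/.*@/, "", $8); gsub(/>/, "", $8); countArray[$8]++ }
     END { for (j in countArray) print j, countArray[j] }' "$infile" | sort -k2,2nr -k1,1)
echo "$sorted"
rm -f "$infile"
```

Sorting outside awk keeps the one-liner portable; it does not rely on any awk-specific sorting extension.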
 
Old 04-11-2010, 08:00 AM   #8
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
You can place your gsubs together as well:

Code:
awk '!/^$/{gsub(/.*@|>/,"",$8);_[$8]++}END{for (i in _)print i, _[i]}' in.txt
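As a small illustration of why the combined pattern works: `/.*@|>/` is an alternation, so a single gsub call removes both everything up to and including the `@` and any `>` left over afterwards. The sketch below applies it to two addresses taken from the sample input, fed in on standard input rather than read from the log.

```shell
# Each input line is one address; gsub edits field 1 in place.
cleaned=$(printf '%s\n' '<AMSTERDAM@Canada.com>' 'remote.DIEMEN@Zwitserland.com' |
  awk '{ gsub(/.*@|>/, "", $1); print $1 }')
echo "$cleaned"
```

Both the bracketed and the unbracketed form come out as a bare domain, which is what makes the result usable as a single array key.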
 
Old 04-11-2010, 08:03 AM   #9
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Quote:
You can place your gsubs together as well:
I didn't know that. It's always nice to learn something!

Changing /^[a-z]/ to !/^$/ is also an improvement.
 
Old 04-11-2010, 08:22 AM   #10
S1GNZ
LQ Newbie
 
Registered: Apr 2010
Posts: 6

Original Poster
Rep: Reputation: 0
OK, I made a typo.

This is for school and the outcome needs to be:

Domain Received Send
Canada.com 50 0
Malawi.com 61 0
Volendam.com 0 32

And in the file field $7 is the sender and field $8 is the receiver, and basically I need to achieve the above.

Can you also use split instead of gsub? Because we need to use arrays and splits if I'm correct.

But this already helps us a bunch!

Thanks,

Sincerely yours,

S1GNZ


P.S. Maybe this can be done without a one-liner, because we can also use a script.

Last edited by S1GNZ; 04-11-2010 at 08:24 AM.
 
Old 04-11-2010, 09:03 AM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
As this is homework I think we have provided you with all the tools you will need.
Should be easy enough to get your field 7 details if you look at the code.

btw. we are already using arrays. If you require the use of split then look at
the details in your text on that and you should be sweet.
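For reference, a minimal sketch of how split can stand in for gsub here. split(string, array, separator) fills the array with the pieces and returns how many there are, so splitting on `@` puts the domain in the second element, and splitting that on `>` drops the closing bracket of addresses written as `<...>`. The array names `parts` and `d` are arbitrary, and the two addresses are taken from the sample input.

```shell
domains=$(printf '%s\n' '<AMSTERDAM@Canada.com>' 'local.SCHIPHOL@Frankrijk.com' |
  awk '{
    n = split($1, parts, "@")    # parts[1] = local part, parts[2] = domain (plus ">" if bracketed)
    if (n == 2) {
      split(parts[2], d, ">")    # d[1] = domain without the trailing ">"
      print d[1]
    }
  }')
echo "$domains"
```

Unlike gsub, split leaves the original field untouched, so the record is not modified while counting.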
 
Old 12-11-2010, 03:42 PM   #12
teun-arno
LQ Newbie
 
Registered: Dec 2010
Posts: 1

Rep: Reputation: 0
Late reply on this post

Hello,

Getting the domains from a mail server log (for the input file, see the post above with "cat infile").

Maybe I got the columns the other way around, but it seems to work.


Maybe this will help somebody in the future (and maybe somebody has some comments on this solution):

#!/usr/bin/awk -f
######################################################
# Domain        Received ($8 = to)   Sent ($7 = from)
# Canada.com    50                   0
# Malawi.com    61                   0
# Volendam.com  0                    32
######################################################

BEGIN {
    # Column headers are kept in Dutch: Ontvangen = received, Verzonden = sent.
    printf("%30.30s%10.10s%10.10s\n", "Domain", "Ontvangen", "Verzonden")
}

# Only "d" lines carry both the sender ($7) and the receiver ($8).
# split() on "@" isolates the domain; split() on "." keeps its first label,
# which also discards the trailing ">" of bracketed addresses.
$1 == "d" && split($8, teller, "@") && split(teller[2], result, ".") { ++Domains_to[result[1]] }
$1 == "d" && split($7, teller, "@") && split(teller[2], result, ".") { ++Domains_from[result[1]] }

END {
    # Collect every domain that occurs in Domains_to or Domains_from.
    for (domain in Domains_to)
        all_domains[domain] = 1
    for (domain in Domains_from)
        all_domains[domain] = 1

    for (domain in all_domains) {
        # Assumes every domain ends in ".com", as in this log.
        printf("%30.30s", domain ".com")

        if (domain in Domains_to)
            printf("%10.10s", Domains_to[domain])
        else
            printf("%10.10s", "0")

        if (domain in Domains_from)
            printf("%10.10s\n", Domains_from[domain])
        else
            printf("%10.10s\n", "0")
    }
}


Output:
# ./filter2_4.awk seip1_1.log
Domain                      Ontvangen Verzonden
Angola.com                  0         2
Hongarije.com               3         0
Burma.com                   0         6
Irak.com                    0         2
Taiwan.com                  0         10
Ghana.com                   0         4
Iran.com                    0         2
Tunesie.com                 12        16
Nieuwzeeland.com            39        0
Ethiopie.com                0         47
Zwitserland.com             418       0
Mauritius.com               2         0
Zuidvietnam.com             0         2
Zaire.com                   0         2
Kameroen.com                0         2
Togo.com                    0         8
Albanie.com                 0         8
Vaticaanstad.com            2         2
Liberia.com                 0         8
Laos.com                    0         2
Chili.com                   43        0
Ivoorkust.com               0         2
Cyprus.com                  0         2
Kashmir.com                 0         4
Saudi-Arabie.com            0         2
Singapore.com               0         2
Paraguay.com                0         4
Kroatie.com                 0         2
Denemarken.com              0         2
Mali.com                    0         28
Rusland.com                 0         10
Slovenie.com                0         1
Suriname.com                7         12
Zuidafrika.com              0         96
Rhodesie.com                0         48
Brunei.com                  0         2
Belgie.com                  55        0
Jemen.com                   0         4
Botswana.com                24        0
Noordjemen.com              1         0
Australisch_Nieuwguinea.com 0         2
Zambia.com                  0         4
Canada.com                  0         227
Colombia.com                0         8
Armenie.com                 0         2
Ascension.com               0         4
Marokko.com                 0         13
Frankrijk.com               357       12
Myanmar.com                 0         2
Monaco.com                  0         36
Cuba.com                    0         38
Malawi.com                  0         265
Filipijnen.com              0         4
Zweden.com                  0         2
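For comparison, here is a more compact sketch of the same report. It is not the script above, just one possible variant under these assumptions: only `d` lines are counted, `$7` is the sender and `$8` the receiver (as stated earlier in the thread), and the full domain is kept as the key instead of appending ".com". The helper function name `domain` and the array names are made up for this example, and the input is four lines from the posted sample.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
d k 1004083501.83190500 1004083501.156831500 1004083501.323597500 2950 <AMSTERDAM@Canada.com> local.AMSTERDAM_ZUIDOOST@Frankrijk.com 9238 81
m 1004083501.83190500 1004083501.331906500 2950 1 0 0 <AMSTERDAM@Canada.com> 9238 81
d k 1004083501.266650500 1004083501.323587500 1004083501.569195500 3076 <AMSTERDAM@Canada.com> remote.DIEMEN@Zwitserland.com 9246 512
d k 1004083592.200293500 1004083592.265569500 1004083592.432499500 2295 <DUIVENDRECHT@Rhodesie.com> local.SCHIPHOL@Frankrijk.com 9293 81
EOF

report=$(awk '
  # Return the part of addr after the at-sign, without a trailing bracket.
  function domain(addr,   p, d) { split(addr, p, "@"); split(p[2], d, ">"); return d[1] }

  # Count sender and receiver domains on delivery lines only.
  $1 == "d" { sent[domain($7)]++; received[domain($8)]++ }

  END {
    # Merge the keys of both arrays so every domain is printed exactly once.
    for (k in sent) seen[k]
    for (k in received) seen[k]
    # Columns: domain, received, sent. A missing count prints as 0.
    for (k in seen) printf "%s %d %d\n", k, received[k], sent[k]
  }' "$log" | sort)
echo "$report"
rm -f "$log"
```

Using `%d` for the counts means a domain that never appears as sender (or receiver) prints 0 without an explicit `in` test.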
 
  

