LinuxQuestions.org

socalheel 10-03-2013 01:00 PM

count the unique ip addresses for each five minute interval?
 
i was tasked to review our maillogs to determine if any unique ip address has exceeded 30 connections in a five minute interval.

can you guys/gals suggest the best way to tackle this?

here is a sample of my log file:

Code:

Oct 1 22:30:22 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:23 unknown[217.153.227.194]
Oct 1 22:30:23 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:25 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:26 173-14-15-134-colorado.hfc.comcastbusiness.net[173.14.15.134]
Oct 1 22:30:27 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:29 71-212-228-139.hlrn.qwest.net[71.212.228.139]
Oct 1 22:30:31 mail.adminco.com[50.193.158.201]
Oct 1 22:30:32 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:34 adsl-75-36-216-68.dsl.pltn13.sbcglobal.net[75.36.216.68]
Oct 1 22:30:44 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:46 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:47 rrcs-24-43-8-15.west.biz.rr.com[24.43.8.15]
Oct 1 22:30:47 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:50 unknown[166.181.67.138]
Oct 1 22:30:50 nw1-dsl-74-215-142-200.fuse.net[74.215.142.200]
Oct 1 22:30:51 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:52 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:52 nccptvaprcprd06.corp.lpl.com[10.22.92.126]
Oct 1 22:30:53 unknown[216.164.161.6]
Oct 1 22:30:55 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:56 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:58 unknown[162.17.88.6]
Oct 1 22:30:59 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:31:02 unknown[172.56.17.30]
Oct 1 22:31:04 mail.adrimar.org[50.78.121.129]
Oct 1 22:31:05 mobile-166-147-069-112.mycingular.net[166.147.69.112]
Oct 1 22:31:05 unknown[172.56.17.30]
Oct 1 22:31:06 mobile-166-147-069-112.mycingular.net[166.147.69.112]
Oct 1 22:31:08 unknown[172.56.17.30]
Oct 1 22:31:11 mobile-166-147-069-112.mycingular.net[166.147.69.112]
Oct 1 22:31:15 unknown[195.39.240.4]


danielbmartin 10-03-2013 01:13 PM

Quote:

Originally Posted by socalheel (Post 5039459)
i was tasked to review our maillogs to determine if any unique ip address has exceeded 30 connections in a five minute interval.

Please elaborate. Does the 30-connection threshold apply to any five-minute interval (such as 22:31:17-22:36:16) or only to crisp five-minute blocks (such as 22:30:00-22:34:59 and 22:35:00-22:39:59)? This is not a nit-picking detail; the complexity and processing time for the former case could be considerably more.

Daniel B. Martin
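
To illustrate why the "any five-minute interval" case is more work, here is a rough sketch of a sliding-window counter in awk. The function name sliding_hits and its arguments are made up for this example. It assumes the log is in chronological order and stays within a single day, since it only looks at the hh:mm:ss part.

```shell
# sliding_hits FILE [LIMIT] [WINDOW_SECONDS]   (hypothetical helper)
# Prints "max_hits ip" for every IP that had more than LIMIT hits
# inside some WINDOW_SECONDS-long span.
sliding_hits() {
    awk -F'[][ :]' -v limit="${2:-30}" -v window="${3:-300}" '
    {
        ip  = $7
        now = $3 * 3600 + $4 * 60 + $5        # seconds since midnight
        t[ip, ++tail[ip]] = now               # append to the queue for this IP
        if (!(ip in head)) head[ip] = 1
        # drop hits that fell out of the window ending at now
        while (now - t[ip, head[ip]] >= window) delete t[ip, head[ip]++]
        n = tail[ip] - head[ip] + 1           # hits still inside the window
        if (n > limit && n > max[ip]) max[ip] = n
    }
    END { for (ip in max) print max[ip], ip }' "$1"
}
```

Every IP needs its own queue of recent timestamps here, whereas with crisp five-minute blocks a single counter per block and IP is enough.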

schneidz 10-03-2013 01:24 PM

you can cron this every minute:
Code:

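# Stamps for the current (partial) minute and the five before it.
# GNU date: %-d prints the day without padding, as in the sample log ("Oct 1").
t0=$(date +'%b %-d %R')
t1=$(date +'%b %-d %R' --date="1 minute ago")
t2=$(date +'%b %-d %R' --date="2 minutes ago")
t3=$(date +'%b %-d %R' --date="3 minutes ago")
t4=$(date +'%b %-d %R' --date="4 minutes ago")
t5=$(date +'%b %-d %R' --date="5 minutes ago")

# substitute the path to your mail log
grep -E "($t0|$t1|$t2|$t3|$t4|$t5)" /var/log/httpd/access_log | awk -F'[][]' '{print $2}' | sort | uniq -c | sort -n | awk '$1 > 30'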


socalheel 10-03-2013 01:27 PM

Quote:

Originally Posted by danielbmartin (Post 5039468)
Please elaborate. Does the 30-connection threshold apply to any five-minute interval (such as 22:31:17-22:36:16) or only to crisp five-minute blocks (such as 22:30:00-22:34:59 and 22:35:00-22:39:59)? This is not a nit-picking detail; the complexity and processing time for the former case could be considerably more.

Daniel B. Martin


that's a good question.

for this purpose, it's going to be a fixed five minute period, i.e. 12:00:00-12:04:59.

danielbmartin 10-03-2013 03:08 PM

With this InFile ...
Code:

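Oct 1 22:30:22 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:23 unknown[217.153.227.194]
Oct 1 22:30:23 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:25 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:26 173-14-15-134-colorado.hfc.comcastbusiness.net[173.14.15.134]
Oct 1 22:30:27 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:29 71-212-228-139.hlrn.qwest.net[71.212.228.139]
Oct 1 22:30:31 mail.adminco.com[50.193.158.201]
Oct 1 22:30:32 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:34 adsl-75-36-216-68.dsl.pltn13.sbcglobal.net[75.36.216.68]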
Oct 1 22:30:44 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:46 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:47 rrcs-24-43-8-15.west.biz.rr.com[24.43.8.15]
Oct 1 22:30:47 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:50 unknown[166.181.67.138]
Oct 1 22:30:50 nw1-dsl-74-215-142-200.fuse.net[74.215.142.200]
Oct 1 22:30:51 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:52 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:52 nccptvaprcprd06.corp.lpl.com[10.22.92.126]
Oct 1 22:30:53 unknown[216.164.161.6]
Oct 1 22:30:55 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:56 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:58 unknown[162.17.88.6]
Oct 1 22:30:59 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:31:02 unknown[172.56.17.30]
Oct 1 22:31:04 mail.adrimar.org[50.78.121.129]
Oct 1 22:31:05 mobile-166-147-069-112.mycingular.net[166.147.69.112]
Oct 1 22:31:05 unknown[172.56.17.30]
Oct 1 22:31:06 mobile-166-147-069-112.mycingular.net[166.147.69.112]
Oct 1 22:31:08 unknown[172.56.17.30]
Oct 1 22:31:11 mobile-166-147-069-112.mycingular.net[166.147.69.112]
Oct 1 22:31:15 unknown[195.39.240.4]

... this code ...
Code:

tr " []" ":" <$InFile  \
|awk -F: '{$4=5*int($4/5); print $1,$2,$3":"$4,$7}'  \
|sort                \
|uniq -c            \
|sort -rn            \
|awk '$1>3 {print}'  \
>$OutFile

... produced this OutFile ...
Code:

      8 Oct 1 22:30 99.120.86.202
      5 Oct 1 22:30 50.158.213.87

Daniel B. Martin

socalheel 10-04-2013 08:27 AM

that's awesome daniel - thank you.

would it be too much for you to explain that code? it will help me learn the hows and whys you did that.

danielbmartin 10-04-2013 09:58 AM

Quote:

Originally Posted by socalheel (Post 5039996)
would it be too much for you to explain that code? it will help me learn the hows and whys you did that.

Starting with this InFile ...
Code:

Oct 1 22:30:22 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:23 unknown[217.153.227.194]
Oct 1 22:30:23 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:25 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:26 173-14-15-134-colorado.hfc.comcastbusiness.net[173.14.15.134]
Oct 1 22:30:27 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:29 71-212-228-139.hlrn.qwest.net[71.212.228.139]
Oct 1 22:30:31 mail.adminco.com[50.193.158.201]
Oct 1 22:30:32 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:34 adsl-75-36-216-68.dsl.pltn13.sbcglobal.net[75.36.216.68]
Oct 1 22:30:44 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:46 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:47 rrcs-24-43-8-15.west.biz.rr.com[24.43.8.15]
Oct 1 22:30:47 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:50 unknown[166.181.67.138]
Oct 1 22:30:50 nw1-dsl-74-215-142-200.fuse.net[74.215.142.200]
Oct 1 22:30:51 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:52 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:52 nccptvaprcprd06.corp.lpl.com[10.22.92.126]
Oct 1 22:30:53 unknown[216.164.161.6]
Oct 1 22:30:55 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:56 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:58 unknown[162.17.88.6]
Oct 1 22:30:59 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:31:02 unknown[172.56.17.30]
Oct 1 22:31:04 mail.adrimar.org[50.78.121.129]
Oct 1 22:31:05 mobile-166-147-069-112.mycingular.net[166.147.69.112]
Oct 1 22:31:05 unknown[172.56.17.30]
Oct 1 22:31:06 mobile-166-147-069-112.mycingular.net[166.147.69.112]
Oct 1 22:31:08 unknown[172.56.17.30]
Oct 1 22:31:11 mobile-166-147-069-112.mycingular.net[166.147.69.112]
Oct 1 22:31:15 unknown[195.39.240.4]

... we do a tr which replaces blanks and bracket symbols with colons to simplify subsequent parsing.
Code:

tr " []" ":" <$InFile  \
>$OutFile

This is the intermediate result ...
Code:

Oct:1:22:30:22:c-50-158-213-87.hsd1.il.comcast.net:50.158.213.87:
Oct:1:22:30:23:unknown:217.153.227.194:
Oct:1:22:30:23:c-50-158-213-87.hsd1.il.comcast.net:50.158.213.87:
Oct:1:22:30:25:c-50-158-213-87.hsd1.il.comcast.net:50.158.213.87:
Oct:1:22:30:26:173-14-15-134-colorado.hfc.comcastbusiness.net:173.14.15.134:
Oct:1:22:30:27:c-50-158-213-87.hsd1.il.comcast.net:50.158.213.87:
Oct:1:22:30:29:71-212-228-139.hlrn.qwest.net:71.212.228.139:
Oct:1:22:30:31:mail.adminco.com:50.193.158.201:
Oct:1:22:30:32:c-50-158-213-87.hsd1.il.comcast.net:50.158.213.87:
Oct:1:22:30:34:adsl-75-36-216-68.dsl.pltn13.sbcglobal.net:75.36.216.68:
Oct:1:22:30:44:99-120-86-202.lightspeed.livnmi.sbcglobal.net:99.120.86.202:
Oct:1:22:30:46:99-120-86-202.lightspeed.livnmi.sbcglobal.net:99.120.86.202:
Oct:1:22:30:47:rrcs-24-43-8-15.west.biz.rr.com:24.43.8.15:
Oct:1:22:30:47:99-120-86-202.lightspeed.livnmi.sbcglobal.net:99.120.86.202:
Oct:1:22:30:50:unknown:166.181.67.138:
Oct:1:22:30:50:nw1-dsl-74-215-142-200.fuse.net:74.215.142.200:
Oct:1:22:30:51:99-120-86-202.lightspeed.livnmi.sbcglobal.net:99.120.86.202:
Oct:1:22:30:52:99-120-86-202.lightspeed.livnmi.sbcglobal.net:99.120.86.202:
Oct:1:22:30:52:nccptvaprcprd06.corp.lpl.com:10.22.92.126:
Oct:1:22:30:53:unknown:216.164.161.6:
Oct:1:22:30:55:99-120-86-202.lightspeed.livnmi.sbcglobal.net:99.120.86.202:
Oct:1:22:30:56:99-120-86-202.lightspeed.livnmi.sbcglobal.net:99.120.86.202:
Oct:1:22:30:58:unknown:162.17.88.6:
Oct:1:22:30:59:99-120-86-202.lightspeed.livnmi.sbcglobal.net:99.120.86.202:
Oct:1:22:31:02:unknown:172.56.17.30:
Oct:1:22:31:04:mail.adrimar.org:50.78.121.129:
Oct:1:22:31:05:mobile-166-147-069-112.mycingular.net:166.147.69.112:
Oct:1:22:31:05:unknown:172.56.17.30:
Oct:1:22:31:06:mobile-166-147-069-112.mycingular.net:166.147.69.112:
Oct:1:22:31:08:unknown:172.56.17.30:
Oct:1:22:31:11:mobile-166-147-069-112.mycingular.net:166.147.69.112:
Oct:1:22:31:15:unknown:195.39.240.4:
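
A quick way to try the tr step on a single line. This sketch spells out one colon per input character (':::'), because POSIX leaves the result unspecified when the second set is shorter than the first; GNU tr pads it for you, which is what the short form relies on. The name colonize is just for illustration.

```shell
# colonize is a hypothetical name for this demo.
colonize() {
    tr ' []' ':::'      # blank, [ and ] each become a colon
}

printf '%s\n' 'Oct 1 22:30:23 unknown[217.153.227.194]' | colonize
# prints: Oct:1:22:30:23:unknown:217.153.227.194:
```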

Now we do an awk which copies the useful fields of each line, drops the colons, and puts a colon back between the hour and the minute for readability. Most important: $4=5*int($4/5) rounds the minute value down to the nearest multiple of 5 (0, 5, 10, ... 55). It does this by doing an integer divide by 5 followed by a multiply by 5.
Code:

tr " []" ":" <$InFile  \
|awk -F: '{$4=5*int($4/5); print $1,$2,$3":"$4,$7}'  \
>$OutFile

This is the intermediate result ...
Code:

Oct 1 22:30 50.158.213.87
Oct 1 22:30 217.153.227.194
Oct 1 22:30 50.158.213.87
Oct 1 22:30 50.158.213.87
Oct 1 22:30 173.14.15.134
Oct 1 22:30 50.158.213.87
Oct 1 22:30 71.212.228.139
Oct 1 22:30 50.193.158.201
Oct 1 22:30 50.158.213.87
Oct 1 22:30 75.36.216.68
Oct 1 22:30 99.120.86.202
Oct 1 22:30 99.120.86.202
Oct 1 22:30 24.43.8.15
Oct 1 22:30 99.120.86.202
Oct 1 22:30 166.181.67.138
Oct 1 22:30 74.215.142.200
Oct 1 22:30 99.120.86.202
Oct 1 22:30 99.120.86.202
Oct 1 22:30 10.22.92.126
Oct 1 22:30 216.164.161.6
Oct 1 22:30 99.120.86.202
Oct 1 22:30 99.120.86.202
Oct 1 22:30 162.17.88.6
Oct 1 22:30 99.120.86.202
Oct 1 22:30 172.56.17.30
Oct 1 22:30 50.78.121.129
Oct 1 22:30 166.147.69.112
Oct 1 22:30 172.56.17.30
Oct 1 22:30 166.147.69.112
Oct 1 22:30 172.56.17.30
Oct 1 22:30 166.147.69.112
Oct 1 22:30 195.39.240.4
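
To see the rounding on its own, here is a small sketch; floor5 is a made-up helper name. It also zero-pads the result with printf, which the pipeline above does not do (there, a block starting at minute 0 or 5 would print as 22:0 or 22:5).

```shell
# floor5 MINUTE   (hypothetical helper)
# Rounds a minute value down to the start of its five-minute block.
floor5() {
    awk -v m="$1" 'BEGIN { printf "%02d\n", 5 * int(m / 5) }'
}

floor5 31   # prints 30
floor5 4    # prints 00
floor5 59   # prints 55
```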

Now we do a sort and a uniq with count. uniq only collapses adjacent duplicate lines, so the sort is required first. The count field shows how many times each distinct ip address appears in each five-minute block.
Code:

tr " []" ":" <$InFile  \
|awk -F: '{$4=5*int($4/5); print $1,$2,$3":"$4,$7}'  \
|sort                  \
|uniq -c              \
>$OutFile

This is the intermediate result ...
Code:

      1 Oct 1 22:30 10.22.92.126
      1 Oct 1 22:30 162.17.88.6
      3 Oct 1 22:30 166.147.69.112
      1 Oct 1 22:30 166.181.67.138
      3 Oct 1 22:30 172.56.17.30
      1 Oct 1 22:30 173.14.15.134
      1 Oct 1 22:30 195.39.240.4
      1 Oct 1 22:30 216.164.161.6
      1 Oct 1 22:30 217.153.227.194
      1 Oct 1 22:30 24.43.8.15
      5 Oct 1 22:30 50.158.213.87
      1 Oct 1 22:30 50.193.158.201
      1 Oct 1 22:30 50.78.121.129
      1 Oct 1 22:30 71.212.228.139
      1 Oct 1 22:30 74.215.142.200
      1 Oct 1 22:30 75.36.216.68
      8 Oct 1 22:30 99.120.86.202
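
A tiny demonstration of why the sort has to come first. The name tally is made up for this sketch.

```shell
# tally is a hypothetical name for the sort-then-count step.
tally() {
    sort | uniq -c
}

printf '%s\n' b a b | uniq -c   # three lines, each with a count of 1
printf '%s\n' b a b | tally     # two lines: a counted once, b counted twice
```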

Now we do a sort on the left-most field (that's the count of occurrences) in descending order. The -n flag makes sort compare the counts as numbers rather than as text.
Code:

tr " []" ":" <$InFile  \
|awk -F: '{$4=5*int($4/5); print $1,$2,$3":"$4,$7}'  \
|sort                  \
|uniq -c              \
|sort -rn              \
>$OutFile

This is the intermediate result ...
Code:

      8 Oct 1 22:30 99.120.86.202
      5 Oct 1 22:30 50.158.213.87
      3 Oct 1 22:30 172.56.17.30
      3 Oct 1 22:30 166.147.69.112
      1 Oct 1 22:30 75.36.216.68
      1 Oct 1 22:30 74.215.142.200
      1 Oct 1 22:30 71.212.228.139
      1 Oct 1 22:30 50.78.121.129
      1 Oct 1 22:30 50.193.158.201
      1 Oct 1 22:30 24.43.8.15
      1 Oct 1 22:30 217.153.227.194
      1 Oct 1 22:30 216.164.161.6
      1 Oct 1 22:30 195.39.240.4
      1 Oct 1 22:30 173.14.15.134
      1 Oct 1 22:30 166.181.67.138
      1 Oct 1 22:30 162.17.88.6
      1 Oct 1 22:30 10.22.92.126
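
A side note on this sort step: a plain text comparison can put 10 below 9, and it only gets the counts right as long as the padding that uniq -c adds is taken into account, which may depend on the locale. A numeric sort (-n) avoids the question. A small sketch, with by_count_desc as a made-up name:

```shell
# by_count_desc is a hypothetical name for this demo.
by_count_desc() {
    sort -rn
}

printf '%s\n' '8 x' '10 y' '9 z' | sort -r         # text order: 9, 8, 10
printf '%s\n' '8 x' '10 y' '9 z' | by_count_desc   # numeric order: 10, 9, 8
```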

Now we do an awk to keep only those lines with a frequency count above a chosen threshold. I chose 3 for this example but you would use another value (such as 30) with real-world data.
Code:

tr " []" ":" <$InFile  \
|awk -F: '{$4=5*int($4/5); print $1,$2,$3":"$4,$7}'  \
|sort                  \
|uniq -c              \
|sort -rn              \
|awk '$1>3 {print}'    \
>$OutFile

This is the finished product...
Code:

      8 Oct 1 22:30 99.120.86.202
      5 Oct 1 22:30 50.158.213.87
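
If you would rather not edit the awk program each time the threshold changes, one option is to pass it in with -v. This is only a sketch; over_limit is a made-up name and the default of 30 comes from the original question.

```shell
# over_limit [LIMIT]   (hypothetical helper)
# Keeps only the lines whose first field (the count) exceeds LIMIT.
over_limit() {
    awk -v limit="${1:-30}" '$1 > limit'
}

printf '%s\n' '8 Oct 1 22:30 99.120.86.202' '3 Oct 1 22:30 172.56.17.30' | over_limit 3
# prints: 8 Oct 1 22:30 99.120.86.202
```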

Daniel B. Martin

socalheel 10-04-2013 10:04 AM

thank you so much daniel. you have no idea how much this helps me understand text manipulation.

danielbmartin 10-04-2013 10:16 AM

Quote:

Originally Posted by socalheel (Post 5040051)
thank you so much daniel. you have no idea how much this helps me understand text manipulation.

Here is a technique which may help you develop and debug this kind of code. Use tee to materialize the intermediate results in a series of work files. Then inspect each of the work files to see if you got the expected intermediate results.
Code:

tr " []" ":" <$InFile  \
|tee $Work1            \
|awk -F: '{$4=5*int($4/5); print $1,$2,$3":"$4,$7}'  \
|tee $Work2            \
|sort                  \
|uniq -c              \
|tee $Work3            \
|sort -rn              \
|tee $Work4            \
|awk '$1>3 {print}'    \
>$OutFile

When the code works as intended, remove all the tees and delete the work files.

Daniel B. Martin
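
A variation on the same idea, for quick checks where work files feel like overkill: send the copy to the terminal instead. This sketch assumes a system where /dev/stderr can be opened as a file (Linux does this); peek is a made-up name.

```shell
# peek is a hypothetical name: pass stdin through unchanged and also
# copy it to standard error so it shows up on the terminal.
peek() {
    tee /dev/stderr
}

# Usage: drop it between any two stages while debugging, for example
#   tr ' []' ':::' <"$InFile" | peek | sort | uniq -c
```

Because the copy goes to standard error, it does not end up in $OutFile when only standard output is redirected.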

grail 10-04-2013 12:05 PM

Here is a slightly cleaner version:
Code:

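#!/usr/bin/awk -f
# Needs GNU awk 4.0 or later (arrays of arrays, PROCINFO["sorted_in"]).

BEGIN { FS = "[][ :]"
        PROCINFO["sorted_in"] = "@val_num_desc"
        min = 3
    }

# s is the block key: month, day, hour, and the minute rounded down to a multiple of 5
{ s = sprintf("%s %s %s:%02d", $1, $2, $3, 5 * int($4 / 5)) }

{ a[s][$(NF-1)]++ }

END{
    for(i in a)
        for(j in a[i])
            if(a[i][j] > min)
                print a[i][j],j
}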

You run it like so (after making it executable):
Code:

$ ./script.awk file
8 99.120.86.202
5 50.158.213.87

This simply outputs the counts and the ips, so if you want the time portion as well you may need to play a little further, but it gives you something to think about :)

danielbmartin 10-08-2013 10:15 AM

I took my previous solution and hammered it into a one-liner.

This code ...
Code:

awk 'BEGIN {FS = "[][ :]"}
          {a[$1" "$2" "$3":"5*int($4/5)" "$7]++}
        END{for (j in a) {if (a[j]>3) print a[j],j}}' $InFile >$OutFile

... produced this OutFile ...
Code:

8 Oct 1 22:30 99.120.86.202
5 Oct 1 22:30 50.158.213.87

Daniel B. Martin

grail 10-08-2013 03:44 PM

One thing to be cautious of, Daniel: 'for (j in a)' has no guaranteed order by default, so the ordering of your current output is just luck.
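
If a stable order matters, one way to get it without relying on awk's array traversal is to sort the output afterwards. A minimal sketch, assuming the same log format as the sample; count_blocks is a made-up name, and the minute is zero-padded here, which the one-liner above does not do.

```shell
# count_blocks FILE [LIMIT]   (hypothetical helper)
# Counts hits per five-minute block and IP, keeps those above LIMIT,
# and sorts by count (descending), then by the rest of the line.
count_blocks() {
    awk -F'[][ :]' -v limit="${2:-30}" '
        { a[sprintf("%s %s %s:%02d %s", $1, $2, $3, 5 * int($4 / 5), $7)]++ }
        END { for (j in a) if (a[j] > limit) print a[j], j }' "$1" |
    sort -k1,1nr -k2
}
```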

danielbmartin 10-08-2013 04:26 PM

Quote:

Originally Posted by grail (Post 5042336)
One thing to be cautious of, Daniel: 'for (j in a)' has no guaranteed order by default, so the ordering of your current output is just luck.

The problem statement didn't require the output to be sorted. It didn't even require the number of observations to be printed. It asked only for a list of distinct ip addresses which were "heavy hitters."

Daniel B. Martin


All times are GMT -5.