danielbmartin
10-04-2013 09:58 AM
Quote:
Originally Posted by socalheel
(Post 5039996)
would it be too much for you to explain that code? it will help me learn the hows and whys you did that.
Starting with this InFile ...
Code:
Oct 1 22:30:22 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:23 unknown[217.153.227.194]
Oct 1 22:30:23 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:25 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:26 173-14-15-134-colorado.hfc.comcastbusiness.net[173.14.15.134]
Oct 1 22:30:27 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:29 71-212-228-139.hlrn.qwest.net[71.212.228.139]
Oct 1 22:30:31 mail.adminco.com[50.193.158.201]
Oct 1 22:30:32 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]
Oct 1 22:30:34 adsl-75-36-216-68.dsl.pltn13.sbcglobal.net[75.36.216.68]
Oct 1 22:30:44 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:46 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:47 rrcs-24-43-8-15.west.biz.rr.com[24.43.8.15]
Oct 1 22:30:47 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:50 unknown[166.181.67.138]
Oct 1 22:30:50 nw1-dsl-74-215-142-200.fuse.net[74.215.142.200]
Oct 1 22:30:51 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:52 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:52 nccptvaprcprd06.corp.lpl.com[10.22.92.126]
Oct 1 22:30:53 unknown[216.164.161.6]
Oct 1 22:30:55 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:56 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:30:58 unknown[162.17.88.6]
Oct 1 22:30:59 99-120-86-202.lightspeed.livnmi.sbcglobal.net[99.120.86.202]
Oct 1 22:31:02 unknown[172.56.17.30]
Oct 1 22:31:04 mail.adrimar.org[50.78.121.129]
Oct 1 22:31:05 mobile-166-147-069-112.mycingular.net[166.147.69.112]
Oct 1 22:31:05 unknown[172.56.17.30]
Oct 1 22:31:06 mobile-166-147-069-112.mycingular.net[166.147.69.112]
Oct 1 22:31:08 unknown[172.56.17.30]
Oct 1 22:31:11 mobile-166-147-069-112.mycingular.net[166.147.69.112]
Oct 1 22:31:15 unknown[195.39.240.4]
... we do a tr which replaces every blank and bracket character with a colon, so that each line becomes a colon-delimited record. This simplifies subsequent parsing.
Code:
tr " []" ":" <$InFile \
>$OutFile
This is the intermediate result ...
Code:
Oct:1:22:30:22:c-50-158-213-87.hsd1.il.comcast.net:50.158.213.87:
Oct:1:22:30:23:unknown:217.153.227.194:
Oct:1:22:30:23:c-50-158-213-87.hsd1.il.comcast.net:50.158.213.87:
Oct:1:22:30:25:c-50-158-213-87.hsd1.il.comcast.net:50.158.213.87:
Oct:1:22:30:26:173-14-15-134-colorado.hfc.comcastbusiness.net:173.14.15.134:
Oct:1:22:30:27:c-50-158-213-87.hsd1.il.comcast.net:50.158.213.87:
Oct:1:22:30:29:71-212-228-139.hlrn.qwest.net:71.212.228.139:
Oct:1:22:30:31:mail.adminco.com:50.193.158.201:
Oct:1:22:30:32:c-50-158-213-87.hsd1.il.comcast.net:50.158.213.87:
Oct:1:22:30:34:adsl-75-36-216-68.dsl.pltn13.sbcglobal.net:75.36.216.68:
Oct:1:22:30:44:99-120-86-202.lightspeed.livnmi.sbcglobal.net:99.120.86.202:
Oct:1:22:30:46:99-120-86-202.lightspeed.livnmi.sbcglobal.net:99.120.86.202:
Oct:1:22:30:47:rrcs-24-43-8-15.west.biz.rr.com:24.43.8.15:
Oct:1:22:30:47:99-120-86-202.lightspeed.livnmi.sbcglobal.net:99.120.86.202:
Oct:1:22:30:50:unknown:166.181.67.138:
Oct:1:22:30:50:nw1-dsl-74-215-142-200.fuse.net:74.215.142.200:
Oct:1:22:30:51:99-120-86-202.lightspeed.livnmi.sbcglobal.net:99.120.86.202:
Oct:1:22:30:52:99-120-86-202.lightspeed.livnmi.sbcglobal.net:99.120.86.202:
Oct:1:22:30:52:nccptvaprcprd06.corp.lpl.com:10.22.92.126:
Oct:1:22:30:53:unknown:216.164.161.6:
Oct:1:22:30:55:99-120-86-202.lightspeed.livnmi.sbcglobal.net:99.120.86.202:
Oct:1:22:30:56:99-120-86-202.lightspeed.livnmi.sbcglobal.net:99.120.86.202:
Oct:1:22:30:58:unknown:162.17.88.6:
Oct:1:22:30:59:99-120-86-202.lightspeed.livnmi.sbcglobal.net:99.120.86.202:
Oct:1:22:31:02:unknown:172.56.17.30:
Oct:1:22:31:04:mail.adrimar.org:50.78.121.129:
Oct:1:22:31:05:mobile-166-147-069-112.mycingular.net:166.147.69.112:
Oct:1:22:31:05:unknown:172.56.17.30:
Oct:1:22:31:06:mobile-166-147-069-112.mycingular.net:166.147.69.112:
Oct:1:22:31:08:unknown:172.56.17.30:
Oct:1:22:31:11:mobile-166-147-069-112.mycingular.net:166.147.69.112:
Oct:1:22:31:15:unknown:195.39.240.4:
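If you want to try the tr step on its own, here is a small sketch. The two sample lines follow the InFile format above; the variable names (line, converted, padded, squeezed) are mine. GNU tr extends a short second set by repeating its last character, which is why a single ":" covers all three input characters, but as far as I know not every tr does that, so the sketch spells out ":::". The -s variant is only an assumption about real syslog files, which often pad a single-digit day with two blanks.

```shell
# One sample line from the InFile above.
line='Oct 1 22:30:22 c-50-158-213-87.hsd1.il.comcast.net[50.158.213.87]'

# Map blank, '[' and ']' to ':' one-for-one (three characters in, three out).
converted=$(printf '%s\n' "$line" | tr ' []' ':::')
printf '%s\n' "$converted"

# Assumption: a real log may contain "Oct  1" with two blanks.
# -s squeezes each run of repeated colons into one, so the field
# positions stay the same as in the single-blank case.
padded='Oct  1 22:30:23 unknown[217.153.227.194]'
squeezed=$(printf '%s\n' "$padded" | tr -s ' []' ':::')
printf '%s\n' "$squeezed"
```

Both results have the same seven fields, so the later awk step can keep using the same field numbers.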
Now we do an awk which copies the useful portions of each line, drops the colons, and "plugs in" a colon between the hour number and the minute number for cosmetic value. Most important: $4=5*int($4/5) rounds the minute value down to the nearest multiple of 5 (so 31 through 34 all become 30). It does this with an integer divide by 5 followed by a multiply by 5.
Code:
tr " []" ":" <$InFile \
|awk -F: '{$4=5*int($4/5); print $1,$2,$3":"$4,$7}' \
>$OutFile
This is the intermediate result ...
Code:
Oct 1 22:30 50.158.213.87
Oct 1 22:30 217.153.227.194
Oct 1 22:30 50.158.213.87
Oct 1 22:30 50.158.213.87
Oct 1 22:30 173.14.15.134
Oct 1 22:30 50.158.213.87
Oct 1 22:30 71.212.228.139
Oct 1 22:30 50.193.158.201
Oct 1 22:30 50.158.213.87
Oct 1 22:30 75.36.216.68
Oct 1 22:30 99.120.86.202
Oct 1 22:30 99.120.86.202
Oct 1 22:30 24.43.8.15
Oct 1 22:30 99.120.86.202
Oct 1 22:30 166.181.67.138
Oct 1 22:30 74.215.142.200
Oct 1 22:30 99.120.86.202
Oct 1 22:30 99.120.86.202
Oct 1 22:30 10.22.92.126
Oct 1 22:30 216.164.161.6
Oct 1 22:30 99.120.86.202
Oct 1 22:30 99.120.86.202
Oct 1 22:30 162.17.88.6
Oct 1 22:30 99.120.86.202
Oct 1 22:30 172.56.17.30
Oct 1 22:30 50.78.121.129
Oct 1 22:30 166.147.69.112
Oct 1 22:30 172.56.17.30
Oct 1 22:30 166.147.69.112
Oct 1 22:30 172.56.17.30
Oct 1 22:30 166.147.69.112
Oct 1 22:30 195.39.240.4
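To see the rounding arithmetic from the awk step on its own: int(31/5) is 6 and 6*5 is 30, while int(7/5) is 1 and 1*5 is 5. One thing to watch is that the result is a plain number, so a minute of 07 would come out as 22:5 rather than 22:05. The sketch below is a variation, not the command used in this post: bucket is a hypothetical helper name, and it uses printf with %02d to keep two digits.

```shell
# Hypothetical helper: turn one colon-delimited line into
# "month day hour:minute ip", with the minute rounded down to a 5-minute slot.
bucket() {
    printf '%s\n' "$1" |
    awk -F: '{
        m = 5 * int($4 / 5)    # 31 -> 30, 7 -> 5, 4 -> 0
        printf "%s %s %s:%02d %s\n", $1, $2, $3, m, $7
    }'
}

bucket 'Oct:1:22:31:15:unknown:195.39.240.4:'
bucket 'Oct:1:22:07:41:unknown:195.39.240.4:'
```

The first call gives the same line as the last row of the result above; the second shows the zero-padded minute.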
Now we do a sort and a uniq with count. The sort is required because uniq only merges identical lines that are adjacent. The count field shows how many times each distinct ip address appeared in each five-minute slot, which identifies the most frequent ones.
Code:
tr " []" ":" <$InFile \
|awk -F: '{$4=5*int($4/5); print $1,$2,$3":"$4,$7}' \
|sort \
|uniq -c \
>$OutFile
This is the intermediate result ...
Code:
1 Oct 1 22:30 10.22.92.126
1 Oct 1 22:30 162.17.88.6
3 Oct 1 22:30 166.147.69.112
1 Oct 1 22:30 166.181.67.138
3 Oct 1 22:30 172.56.17.30
1 Oct 1 22:30 173.14.15.134
1 Oct 1 22:30 195.39.240.4
1 Oct 1 22:30 216.164.161.6
1 Oct 1 22:30 217.153.227.194
1 Oct 1 22:30 24.43.8.15
5 Oct 1 22:30 50.158.213.87
1 Oct 1 22:30 50.193.158.201
1 Oct 1 22:30 50.78.121.129
1 Oct 1 22:30 71.212.228.139
1 Oct 1 22:30 74.215.142.200
1 Oct 1 22:30 75.36.216.68
8 Oct 1 22:30 99.120.86.202
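A quick way to convince yourself that the sort really is needed before uniq -c is to run uniq with and without it on a few lines where the duplicates are not next to each other. This is only an illustration; the three sample lines are taken from the data above and the variable names are mine.

```shell
# Three lines; the duplicate pair is separated by a different line.
sample='Oct 1 22:30 172.56.17.30
Oct 1 22:30 50.78.121.129
Oct 1 22:30 172.56.17.30'

# Without sort: no two adjacent lines match, so nothing is merged.
unsorted_lines=$(printf '%s\n' "$sample" | uniq -c | wc -l | tr -d ' ')

# With sort: the duplicates become adjacent and collapse into one line.
sorted_lines=$(printf '%s\n' "$sample" | sort | uniq -c | wc -l | tr -d ' ')

echo "without sort: $unsorted_lines lines, with sort: $sorted_lines lines"
```

Without the sort you get three lines, each with a count of 1; with it you get two lines, one of them with a count of 2.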
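Now we do a sort on the left-most field (that's the count of occurrences) in descending numeric order. The -n flag makes sort compare the counts as numbers rather than as text, which matters once counts reach two or more digits.
Code:
tr " []" ":" <$InFile \
|awk -F: '{$4=5*int($4/5); print $1,$2,$3":"$4,$7}' \
|sort \
|uniq -c \
|sort -rn -k1,1 \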
>$OutFile
This is the intermediate result ...
Code:
8 Oct 1 22:30 99.120.86.202
5 Oct 1 22:30 50.158.213.87
3 Oct 1 22:30 172.56.17.30
3 Oct 1 22:30 166.147.69.112
1 Oct 1 22:30 75.36.216.68
1 Oct 1 22:30 74.215.142.200
1 Oct 1 22:30 71.212.228.139
1 Oct 1 22:30 50.78.121.129
1 Oct 1 22:30 50.193.158.201
1 Oct 1 22:30 24.43.8.15
1 Oct 1 22:30 217.153.227.194
1 Oct 1 22:30 216.164.161.6
1 Oct 1 22:30 195.39.240.4
1 Oct 1 22:30 173.14.15.134
1 Oct 1 22:30 166.181.67.138
1 Oct 1 22:30 162.17.88.6
1 Oct 1 22:30 10.22.92.126
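A note on sorting by the count column: unless sort is given -n, it compares the field as text. uniq -c right-aligns its counts, which is presumably why a text comparison can still produce the right order, but that depends on the padding and on the locale. The sketch below uses made-up, unpadded counts (8, 10, 3) to make the difference visible; the variable names are mine.

```shell
# Made-up counts with different digit widths, no padding.
counts='8 a
10 b
3 c'

# Text comparison, reversed: "8" > "3" > "10", so 10 ends up last.
lexical=$(printf '%s\n' "$counts" | LC_ALL=C sort -r -k1,1)

# Numeric comparison, reversed: 10, 8, 3.
numeric=$(printf '%s\n' "$counts" | LC_ALL=C sort -rn -k1,1)

printf 'text sort:\n%s\nnumeric sort:\n%s\n' "$lexical" "$numeric"
```

With real-world data, where some addresses may appear dozens of times, the numeric form is the safer choice.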
Now we do an awk to keep only those lines with a frequency count strictly above a chosen threshold. I chose 3 for this example (so the lines with a count of exactly 3 are dropped), but you would use another value (such as 30) with real-world data.
Code:
tr " []" ":" <$InFile \
|awk -F: '{$4=5*int($4/5); print $1,$2,$3":"$4,$7}' \
|sort \
|uniq -c \
|sort -rn -k1,1 \
|awk '$1>3 {print}' \
>$OutFile
This is the finished product...
Code:
8 Oct 1 22:30 99.120.86.202
5 Oct 1 22:30 50.158.213.87
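If you expect to change the threshold often, one option is to pass it into awk as a variable instead of editing the number inside the quotes. This is a sketch of that idea, not part of the original command: filter_counts and min are hypothetical names, and the input lines are made up for the demonstration.

```shell
# Hypothetical helper: keep lines whose first field is greater than
# the threshold given as the first argument.
filter_counts() {
    awk -v min="$1" '$1 > min'
}

# Made-up input in the same "count rest-of-line" shape as uniq -c output.
kept=$(printf '%s\n' '8 x' '3 y' '30 z' | filter_counts 3)
printf '%s\n' "$kept"
```

A pattern with no action prints the whole line, so '$1 > min' does the same job as '$1>3 {print}' with the threshold taken from the command line.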
Daniel B. Martin