AWK - How to parse a web log file to count a column and find the last occurrence of that column
I have a file (web_test.log) with lines like the one below (of course there are like a million of those lines):
Code:
subdomain.domain.com - - [01/Jun/2017:00:00:06 -0900] "GET /www/var/index.html HTTP/1.0" 200 323985
What I want is, for each subdomain, the number of occurrences and the timestamp of the last occurrence, like:
Code:
subdomain.domain.com - 2 - 01/Jun/2017:00:00:14
So far I can count the occurrences per subdomain:
Code:
awk '{print $1 " " $4;}' web_test.log | sed 's/\[//' | awk '{IP_ADDRESS[$1]++; } END { for (i in IP_ADDRESS) print i,IP_ADDRESS[i]}' OFS=" - "
which gives output like:
Code:
diffrentsubdomain.domain.com - 1
And I can find the last occurrence of one given subdomain:
Code:
awk '/'"nextsubdomain.domain.com"'/ { lines[last] = $0;} END { print "Last Occurrence: " lines[last] }' web_test.log
Code:
Last Occurrence: nextsubdomain.domain.com - - [01/Jun/2017:00:39:22 -0900] "GET /mystory/past/primaryschool/images/12312314.gif HTTP/1.0" 200 10211267
Is this possible? I do hope I explained this clearly enough... |
You don't necessarily have to use a one-liner. You can write an awk script and run the script.
|
First things first - you should never pass from awk to sed (or anything else) and back to awk; awk has all the regex and string processing you will ever need, so it's simple to combine your code. Or modify the FS - it can be a regex matching multiple characters, not just a single character.
And please use plain [code] tags. |
Many thanks, I am a little bit cleverer now, but I still do not know how to make it work for me - sorry.
|
Yes, first you need to combine that awk|sed|awk chain into one single script.
For example, sed can be replaced by a gsub() call inside awk, or you can use a better delimiter:
Code:
awk 'BEGIN { FS="[][ ]*"} ...' |
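A minimal sketch of the gsub() route, folding the original awk|sed|awk chain into a single awk pass (the two sample log lines below are invented for the demo):

```shell
# Sketch: the awk|sed|awk chain combined into one awk.
# The sample data is made up for illustration.
printf '%s\n' \
  'subdomain.domain.com - - [01/Jun/2017:00:00:06 -0900] "GET /a HTTP/1.0" 200 1' \
  'subdomain.domain.com - - [01/Jun/2017:00:00:14 -0900] "GET /b HTTP/1.0" 200 1' \
  > web_test.log
# gsub() strips the leading '[' from $4, so no sed is needed;
# one pass counts each host name.
awk '{ gsub(/^\[/, "", $4); count[$1]++ } END { for (i in count) print i, count[i] }' OFS=" - " web_test.log
```

This prints one line per host, e.g. subdomain.domain.com - 2 for the sample data above.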
Sorry to say, but I cannot find the solution - can you point me to the right track?
I can do something like:
Code:
awk '{print $1,substr($4,2)}' OFS=" - " web_test.log
which gives:
Code:
subdomain.domain.com - 01/Jun/2017:00:00:06
Yes, I want to learn, but most likely I have to go through the awk and sed material properly from the beginning - this seems like running, and I have to learn how to walk first. |
Code:
awk 'BEGIN { FS="[][ ]*"} |
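To show what that FS does, here is a small demo on an invented sample line. I use the one-or-more variant `[][ ]+` so the regex can never match an empty string; with brackets and spaces all acting as field separators, `$4` comes out as the timestamp with no leading `[`:

```shell
# Demo of a multi-character FS: ']', '[' and space are all separators,
# so the '[' before the date never reaches $4. The line is made up.
echo 'subdomain.domain.com - - [01/Jun/2017:00:00:06 -0900] "GET /x HTTP/1.0" 200 1' |
awk 'BEGIN { FS="[][ ]+" } { print $1, $4 }' OFS=" - "
```

Output: subdomain.domain.com - 01/Jun/2017:00:00:06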
I think I presumed you had more (awk) knowledge than you do - not your fault, we all have to learn. Commands can be combined as pan64 shows, separated by semi-colons. The {} brackets enclose a (related) block of commands.
The awk doco is very good, but is a reference, not a teaching book. Doing is the best teacher I find. |
Though not a direct answer to your question at the moment, you could make future parsing easy by adjusting your log format. I have sometimes done that in the past. On Apache2, there's no reason you cannot set the CustomLog directive to use a better format, say using tabs and an ISO-8601 date format:
Code:
LogFormat "%h\t%l\t%u\t%{%Y-%m-%d %H:%M:%S}t\t%r\t%>s\t%b" custom |
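Parsing that tab-separated format then becomes trivial with `awk -F'\t'`. A sketch, where the sample line (host name included) is invented to imitate what that LogFormat would produce:

```shell
# Sketch: with tab-separated fields, awk -F'\t' gets clean columns.
# The sample line is invented to match the LogFormat above.
printf 'host.example.com\t-\t-\t2017-06-01 00:00:06\tGET /index.html HTTP/1.0\t200\t323985\n' > custom_test.log
# $1 = host, $4 = ISO-8601 timestamp, no bracket-stripping needed
awk -F'\t' '{ print $1, $4 }' OFS=" - " custom_test.log
```

Output: host.example.com - 2017-06-01 00:00:06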
Many thanks to all you guys for your help and input on this question.
Seems to me like I have an answer - please have a look below.
Test data (as cat web_test.log):
Code:
subdomain.domain.com - - [01/Jun/2017:00:00:06 -0900] "GET /www/var/index.html HTTP/1.0" 200 323985
The command:
Code:
awk '{ lines[last] = $0;} { IP_ADDRESS[$1]++; IP_LAST[$1] = substr($4,2) } END { for (i in IP_ADDRESS) print i,IP_ADDRESS[i], IP_LAST[i]};' OFS=" - " web_test.log | sort -n -k3 | column -t | tail -29
And a sample line of the output:
Code:
diffrentsubdomain.domain.com - 1 - 01/Jun/2017:00:29:20
One more time - many thanks to all of you for your input, help, suggestions, and for pointing me to the right track. Greatly appreciated. |
That is great. But it looks like { lines[last] = $0;} is not used anywhere, so you can remove it.
|
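For reference, a sketch of the final command with that unused block removed (the two sample log lines are invented for the demo; the sort/column/tail tail of the pipeline is omitted since it only affects presentation):

```shell
# Sketch: the accepted answer minus the unused lines[last] assignment.
# Sample data is made up for illustration.
printf '%s\n' \
  'a.domain.com - - [01/Jun/2017:00:00:06 -0900] "GET /a HTTP/1.0" 200 1' \
  'a.domain.com - - [01/Jun/2017:00:00:14 -0900] "GET /b HTTP/1.0" 200 1' \
  > web_test.log
# count per host; remember the last timestamp (substr strips the leading '[')
awk '{ IP_ADDRESS[$1]++; IP_LAST[$1] = substr($4,2) } END { for (i in IP_ADDRESS) print i, IP_ADDRESS[i], IP_LAST[i] }' OFS=" - " web_test.log
```

For the sample data this prints: a.domain.com - 2 - 01/Jun/2017:00:00:14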