[SOLVED] AWK - How to parse a Web log file to count column and the last occurrence of that column
I want to extract data such as the domain name (first column), how many times that domain (first column) appears in the file, and the last time that domain (or IP address) was accessed, so the result should look like:
First things first - you should never pipe from awk to sed (or anything else) and back to awk. awk has all the regex and string processing you will ever need, so it is simple to combine your code into one script. You can also modify the FS - it can consist of multiple characters, not just one.
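As a small illustration of a multi-character FS (made-up input, not from the thread), awk accepts a full regular expression as the field separator:

```shell
# FS may be a regex matching several characters, not just a single one:
echo 'foo::bar--baz' | awk 'BEGIN { FS = "(::|--)" } { print $2 }'
# prints "bar"
```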
Yes, first you need to combine that awk|sed|awk chain into one single script.
For example, sed can be replaced by a gsub() call inside awk, or you can use a better delimiter.
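For instance (a hypothetical one-liner, not the OP's actual pipeline), a sed step that strips brackets can become a gsub() inside the same awk program:

```shell
# Instead of:  awk '{ print $2 }' | sed 's/[][]//g'
# do the substitution inside awk itself:
echo 'host.example.com [12/Mar/2015:10:05:59 +0100] GET /' |
  awk '{ gsub(/[][]/, "", $2); print $2 }'
# prints "12/Mar/2015:10:05:59"
```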
I managed to remove the whole awk|sed|awk chain (many thanks for that!), but combining part one (selecting the necessary field) with part two (finding the last occurrence) in one line - or in any other way - I simply do not know how to do.
Yes, I want to learn, but most likely I have to go through awk and sed properly from the beginning; this seems like running to me, and I have to learn how to walk first.
awk 'BEGIN { FS = "[][ ]*" }                          # split on "[", "]" and runs of spaces
/'"nextsubdomain.domain.com"'/ { last_line = $0 }     # remember the last line matching the domain
{ IP_ADDRESS[$1]++;                                   # count occurrences of each domain/IP (field 1)
  IP_DATE[$1] = $4;                                   # latest date seen for this domain/IP
  IP_LAST[$1] = $6 }
END { for (i in IP_ADDRESS) print i, IP_ADDRESS[i], IP_DATE[i], IP_LAST[i];
      print "Last occurrence: " last_line }'
You can combine things like this. It is not tested and probably not exactly what you need, but it shows a way you can follow (take it as an example).
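To see the combined approach in action, here is a self-contained run on made-up log lines (the field positions are assumptions about the log layout; adjust $1/$2 to your real format):

```shell
# Hypothetical log layout: domain [date:time zone] request
printf '%s\n' \
  'a.example.com [12/Mar/2015:10:00:01 +0100] GET /' \
  'b.example.com [12/Mar/2015:10:02:11 +0100] GET /' \
  'a.example.com [13/Mar/2015:09:15:30 +0100] GET /x' |
awk 'BEGIN { FS = "[][ ]*" }
     { count[$1]++; seen[$1] = $2 }         # with this FS, field 2 is the timestamp
     END { for (d in count) print d, count[d], seen[d] }'
# prints, in an unspecified order:
#   a.example.com 2 13/Mar/2015:09:15:30
#   b.example.com 1 12/Mar/2015:10:02:11
```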
I think I presumed you had more (awk) knowledge than you do - not your fault, we all have to learn. Commands can be combined as pan64 shows, separated by semi-colons; the {} braces enclose a (related) block of commands.
The awk doco is very good, but it is a reference, not a teaching book. Doing is the best teacher, I find.
Though not a direct answer to your question at the moment, you could make future parsing easier by adjusting your log format; I have sometimes done that in the past. On Apache 2 there is no reason you cannot set the CustomLog directive to use a better format, say tab-separated fields and an ISO-8601 date:
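A sketch of what such a directive could look like (the format string, nickname, and log path here are assumptions, not from the thread); Apache's %{...}t accepts an strftime-style format, so tabs plus an ISO-8601 timestamp might be:

```apache
# Tab-separated fields with an ISO-8601 timestamp (hypothetical names/paths):
LogFormat "%h\t%{%Y-%m-%dT%H:%M:%S%z}t\t\"%r\"\t%>s\t%b" tabbed
CustomLog /var/log/apache2/access.log tabbed
```

With tabs as delimiters, the awk script above would only need `FS = "\t"` and no bracket stripping at all.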