I am not an awk specialist. In short, I want to write a script (bash/awk) that reads the squid access log, reduces it to the unique destination hosts, and then prints for each host:
how many times the destination host was requested, the percentage of all requests, the destination host name, and the destination host IP.
All of this without writing to temporary files.
Say the input is:
1196377810.470 405 10.4.1.119 TCP_MISS/200 410 POST http://shttp.msg.yahoo.com/notify/ - DEFAULT_PARENT/localhost text/plain
1196377805.218 6260 10.1.50.237 TCP_MISS/502 1419 GET http://einstein.aei.mpg.de/download/3e7/h1_0646.35_S5R2 - ANY_PARENT/localhost text/html
1196377808.611 2651 10.1.50.237 TCP_MISS/502 1429 GET http://einstein.astro.gla.ac.uk/download/28d/l1_0646.40_S5R2 - ANY_PARENT/localhost text/html
1196377808.666 25343 10.1.24.144 TCP_MISS/200 226 GET http://18.104.22.168/din.aspx? - DIRECT/22.214.171.124 application/octet-stream
Yes, I want to sort the hosts from the most requested to the least requested.
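One way the counting, percentage and sorting could be done in awk without temporary files is sketched below. It assumes the native squid log layout shown in the sample, where the URL is the 7th whitespace-separated field; the function name `squid_host_summary` is made up for this example, and the IP column is left out because it needs a name lookup.

```shell
#!/bin/sh
# Sketch: summarise destination hosts from a squid access log read on stdin.
# Output columns: request count, percentage of all requests, host name.
# Assumption: the URL is field 7, as in the sample lines of this thread.
squid_host_summary() {
    awk '
    {
        host = $7
        sub("^[A-Za-z]+://", "", host)   # strip a scheme such as http://
        sub("[/:?].*$", "", host)        # drop port, path and query string
        if (host == "") next             # skip lines without a usable URL
        count[host]++                    # per-host counter, kept in memory
        total++
    }
    END {
        # The total is only known here, so percentages are computed at the end.
        for (h in count)
            printf "%d %.2f %s\n", count[h], 100 * count[h] / total, h
    }' | sort -k1,1nr -k3,3              # most requested first, ties by host name
}
```

It would be called as `squid_host_summary < access.log`. Everything is held in one awk array, so memory grows with the number of distinct hosts rather than with the size of the log, and nothing is written to disk.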
For practical use Perl would be OK, but I would really appreciate it if someone could give me a hint on how to do the same with bash/awk. The code that I pasted might not be elegant, but it is extremely fast (80 MB in 2 seconds) and I would really like to extend it.
I have tried your code: for very small files (4 lines, 4 different hosts requested) it works, but on a file with as few as 100 lines it aborts shortly after starting with:
Bad arg length for Socket::inet_ntoa, length is 0, should be 4 at -e line 8, <> line 449197.
END failed--call queue aborted, <> line 449197.
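That message is what Perl's `inet_ntoa` reports when it is handed an empty value, presumably because the lookup for one of the hosts returned nothing. Whatever the language, the lookup result needs a guard. A minimal shell sketch of such a guard follows; it assumes `getent` is available (as on glibc-based Linux systems), and the function name `resolve_host` and the placeholder word `unresolved` are made up for this example.

```shell
#!/bin/sh
# Sketch: print an IPv4/IPv6 address for a host name, or "unresolved"
# instead of failing when the lookup returns nothing.
resolve_host() {
    host=$1
    case $host in
        '')        echo unresolved; return ;;  # nothing to look up
        *[!0-9.]*) ;;                          # has non-digit characters: look it up
        *)         echo "$host"; return ;;     # digits and dots only: treat as an IP literal
    esac
    # getent prints "ADDRESS NAME ..." on success and nothing on failure.
    ip=$(getent hosts "$host" 2>/dev/null | awk 'NR == 1 { print $1 }')
    echo "${ip:-unresolved}"
}
```

Lines of the form "count percent host" arriving on stdin could then be extended with a fourth column using `while read -r n pct host; do echo "$n $pct $host $(resolve_host "$host")"; done`. Because this runs once per distinct host rather than once per log line, the lookups should not dominate the run time.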
(and it is missing one column:
column 1 = how many times the destination host was requested