I am not an awk specialist. In short, I want to write a script (bash/awk) that reads the squid access log, reduces it to the unique destination hosts, and then prints for each one:
how many times the destination host was requested, its percentage of all requests, the destination host name, and the destination host IP.
All of this without writing temporary files.
Say the input is:
Code:
1196377810.470 405 10.4.1.119 TCP_MISS/200 410 POST http://shttp.msg.yahoo.com/notify/ - DEFAULT_PARENT/localhost text/plain
1196377805.218 6260 10.1.50.237 TCP_MISS/502 1419 GET http://einstein.aei.mpg.de/download/3e7/h1_0646.35_S5R2 - ANY_PARENT/localhost text/html
1196377808.611 2651 10.1.50.237 TCP_MISS/502 1429 GET http://einstein.astro.gla.ac.uk/download/28d/l1_0646.40_S5R2 - ANY_PARENT/localhost text/html
1196377808.666 25343 10.1.24.144 TCP_MISS/200 226 GET http://41.246.102.212/din.aspx? - DIRECT/41.246.102.212 application/octet-stream
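For input like the sample above, the counting and sorting part can be sketched in awk without temporary files. This is only a minimal sketch under assumptions: the function name `count_hosts` is made up for illustration, field 7 is assumed to hold the URL (as in the sample lines), and CONNECT-style entries of the form `host:port` are assumed to carry no scheme. It does not resolve IP addresses.

```shell
# Hypothetical helper: count requests per destination host in a squid
# access log read from stdin. Prints "count percent host", most requested first.
count_hosts() {
    awk '
    NF >= 7 {
        if ($7 ~ /:\/\//) {
            # URL with a scheme: the host is the part between // and the next /
            split($7, parts, "/")
            host = parts[3]
        } else {
            # Assumed CONNECT-style entry such as host:443
            host = $7
        }
        sub(/:[0-9]+$/, "", host)   # drop an optional :port
        count[host]++
        total++
    }
    END {
        for (h in count)
            printf "%d %.1f%% %s\n", count[h], 100 * count[h] / total, h
    }' | LC_ALL=C sort -k1,1nr -k3,3
}
```

It would be used as `count_hosts < access.log`. The awk array keeps one counter per host in memory, and the final `sort` orders by count (descending), then by host name.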
Yes, I want to sort the hosts from the most requested to the least requested.
For practical use Perl would be fine, but I would really appreciate a hint on how to do the same with bash/awk. The code I pasted might not be elegant, but it is extremely fast (80MB in 2 sec), and I would really like to extend it.
I have tried your code: for very small files (4 lines, 4 different hosts requested) it works, but already on a file with 100 lines it breaks after a short while with:
Code:
Bad arg length for Socket::inet_ntoa, length is 0, should be 4 at -e line 8, <> line 449197.
END failed--call queue aborted, <> line 449197.
It also misses one column:
column 1 = how many times the destination host was requested
Using awk to do this is really ugly, and FAR more expensive. You'll be calling host, and having to parse the output from within awk, but that's not easy, as the output
The perl script is more efficient, and has a minor error - don't toss the baby out with the bathwater.
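To illustrate the cost, here is a rough sketch of what resolving names from within awk could look like. It is an assumption-laden example, not a recommendation: the function name `add_ip` is hypothetical, the input is assumed to be "count percent host" lines, and it relies on `getent hosts` (available on glibc systems) instead of `host`. Every distinct host name spawns one external process, which is where the expense comes from.

```shell
# Hypothetical helper: append the resolved address as a last column to
# "count percent host" lines read from stdin.
add_ip() {
    awk '
    {
        host = $3
        ip = "unresolved"
        if (host ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) {
            ip = host                       # already an address, no lookup needed
        } else if (host in cache) {
            ip = cache[host]                # reuse an earlier lookup
        } else if (host ~ /^[A-Za-z0-9][A-Za-z0-9.-]*$/) {
            # Only plain host names are passed to the shell command.
            cmd = "getent hosts " host
            if ((cmd | getline line) > 0) {
                split(line, f, " ")
                ip = f[1]                   # first column of getent output
            }
            close(cmd)
            cache[host] = ip
        }
        print $0, ip
    }'
}
```

Hosts that fail to resolve, or that contain unexpected characters, are marked `unresolved` rather than aborting the run.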
What is the input line where the script is exiting?
If you don't understand, or don't want, the perl scripts above, try just this last script to do your hostname -> IP translation. Add it to the end of your pipeline:
Code:
... your stuff here ... |
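perl -MSocket -lane '
# Column 3 is the destination host; resolve it unless it is already an IPv4 address.
if ($F[2] !~ /^\d+\.\d+\.\d+\.\d+$/) {
    if (! exists $name2ip{$F[2]}) {
        # gethostbyname returns an empty value when the lookup fails, and inet_ntoa dies on that.
        my $addr = gethostbyname($F[2]);
        $name2ip{$F[2]} = $addr ? inet_ntoa($addr) : "unresolved";
    }
    $F[2] = $name2ip{$F[2]};
}
printf "%02.f%% %s %s\n", @F;
'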