Programming: This forum is for all programming questions. The question does not have to be directly related to Linux and any language is fair game.
What have you got so far?
I've edited my response after I'd slightly misread the complexity.
OK, to compute an average we either need all the values for a given average stored before we calculate it (e.g. for 192.168.136.246's first column we store [10, 5]), or we do a rolling calculation that keeps the current average together with the count of values that produced it (e.g. [7.5, 2]).
So you need a mapping per IP address, which maps to either:
a list of lists, array of arrays, whatever you want to call it, so we can collect all the values for each 'column' entry e.g. 192.168.136.246->[ [10, 5], [23, 7], [1, 1] ]
for each new row found with the same IP, we retrieve any existing stored info, and append the new value to each 'column' list. e.g. after reading only the first row for that IP, the above would have been 192.168.136.246->[ [10], [23], [1] ]
Once all rows have been read we can calculate the averages and maybe store it in a new mapping of IP->list of averages with one entry per column.
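A minimal sketch of this first approach in Python (the sample rows and values are made up to match the 192.168.136.246 example above):

```python
from collections import defaultdict

# Hypothetical input rows: IP followed by three numeric columns.
rows = [
    ("192.168.136.246", 10, 23, 1),
    ("192.168.136.246", 5, 7, 1),
]

# Per IP, keep a list of lists: one inner list per column.
values = defaultdict(lambda: [[], [], []])
for ip, a, b, c in rows:
    for col, x in zip(values[ip], (a, b, c)):
        col.append(x)

# Once all rows have been read, compute the average of each column.
averages = {ip: [sum(col) / len(col) for col in cols]
            for ip, cols in values.items()}
print(averages["192.168.136.246"])  # [7.5, 15.0, 1.0]
```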
or:
your original IP mapping is to a class/data structure/simple 2 element array/list that has the current average and current values used count (IP->[ [a1, n1], [a2,n2], [a3, n3] ]).
again for each new row found with the same IP, we retrieve any existing stored info, but instead of just appending a new element to the end of a list, we recover the previous total by multiplying the two stored values, then calculate the new average with the count incremented by 1, and store the new average and count ([a1, n1] plus a new value x becomes [ (a1*n1 + x) / (n1 + 1), n1 + 1 ]).
This way we always have the current average per IP per column in the data structure, so once we've read the last row we're done. We also use far less memory: usage is proportional to the number of columns times the number of IP addresses, whereas the first option grows with each extra row read.
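The rolling-average variant can be sketched the same way (again with made-up rows matching the earlier example):

```python
from collections import defaultdict

# Hypothetical input rows: IP followed by three numeric columns.
rows = [
    ("192.168.136.246", 10, 23, 1),
    ("192.168.136.246", 5, 7, 1),
]

# Per IP, store [average, count] pairs, one per column.
state = defaultdict(lambda: [[0.0, 0], [0.0, 0], [0.0, 0]])
for ip, *cols in rows:
    for pair, x in zip(state[ip], cols):
        a, n = pair
        pair[0] = (a * n + x) / (n + 1)  # recover total, fold in x
        pair[1] = n + 1                  # bump the count
print(state["192.168.136.246"])  # [[7.5, 2], [15.0, 2], [1.0, 2]]
```

Memory stays at one [average, count] pair per IP per column, no matter how many rows are read.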
Since the OP has been quiet, here is an example awk script.
Code:
awk '(NF>=4) { n[$1]++
v1[$1]+=$2
v2[$1]+=$3
v3[$1]+=$4
}
END { for (ip in n)
printf("%s %f %f %f (%d)\n", ip, v1[ip]/n[ip], v2[ip]/n[ip], v3[ip]/n[ip], n[ip])
}' data-file
The IP addresses are the keys to the arrays. n counts the number of entries. (Incrementing an unset value yields 1, and adding something to an unset value yields the value itself, as per awk rules; in other words, unset is logically equal to zero.)
The END rule will be processed after all the records (lines) have been processed. ip will loop through all keys in the n array, therefore through all IP addresses. Since the v1, v2 and v3 arrays count the sums of the fields, dividing by the number of summands will yield the average.
I added the number of occurrences at the end of the line in parenthesis for illustration.
You can also format the fields to your needs (e.g. %.3f instead of just %f).
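For comparison, here is a rough Python equivalent of the awk script, written as a function so it can be fed any iterable of lines (the sample data is hypothetical, chosen to match the values used earlier in the thread):

```python
from collections import defaultdict

def average_per_ip(lines):
    """Sum fields 2-4 per IP address, then divide by the entry count."""
    n = defaultdict(int)
    v = defaultdict(lambda: [0.0, 0.0, 0.0])
    for line in lines:
        fields = line.split()
        if len(fields) >= 4:        # mirror the awk (NF>=4) guard
            ip = fields[0]
            n[ip] += 1
            for i in range(3):
                v[ip][i] += float(fields[i + 1])
    # One (averages, count) pair per IP, like the awk END block prints.
    return {ip: ([s / n[ip] for s in v[ip]], n[ip]) for ip in n}

result = average_per_ip(["192.168.136.246 10 23 1",
                         "192.168.136.246 5 7 1"])
print(result)  # {'192.168.136.246': ([7.5, 15.0, 1.0], 2)}
```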