LinuxQuestions.org
Old 08-12-2011, 09:06 AM   #1
sebelk
Member
 
Registered: Jan 2007
Posts: 66

Rep: Reputation: 15
Average from values of fields


Hi,

I have a file with lines like the following:

192.168.136.246 10 23 1
10.128.255.158 2 4 9
192.168.134.206 5 7 1
10.128.255.158 3 7 10

and so on...

The first field can repeat. I'd like to compute the average of each numeric field per IP address.

For example, for 10.128.255.158 the output would be:
10.128.255.158 2.5 5.5 9.5

Could you please help me write a script in awk or Perl?

Thanks in advance!
 
Old 08-12-2011, 09:23 AM   #2
Proud
Senior Member
 
Registered: Dec 2002
Location: England
Distribution: Used to use Mandrake/Mandriva
Posts: 2,794

Rep: Reputation: 116
What have you got so far?
I've edited my response after I'd slightly misread the complexity.

OK, to average we need either to store all the values for a given IP and column before calculating the average (e.g. for 10.128.255.158's first column we store [2, 3]), or to keep a rolling average: the current average plus the number of values that went into it (e.g. [2.5, 2]).

So you need a mapping per IP address, which maps to either:
  • a list of lists (array of arrays, whatever you want to call it), so we can collect all the values for each 'column', e.g. 10.128.255.158->[ [2, 3], [4, 7], [9, 10] ]
  • for each new row found with the same IP, we retrieve any existing stored info and append the new entries to each 'column' list, e.g. after reading just the first 2 rows the above would have been 10.128.255.158->[ [2], [4], [9] ]
  • once all rows have been read, we calculate the averages and maybe store them in a new mapping of IP->list of averages, with one entry per column.
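
As a rough illustration of this first option, here is a minimal Python sketch (not awk or Perl, just to show the data structure). The function name `average_by_key_stored` and the hard-coded sample rows are only assumptions for the example; the rows are the ones from the first post.

```python
from collections import defaultdict

def average_by_key_stored(lines):
    """Collect every value per IP and column, then average at the end."""
    # IP -> one list per column, e.g. [[2.0, 3.0], [4.0, 7.0], [9.0, 10.0]]
    columns = defaultdict(lambda: [[], [], []])
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or short lines
        ip = fields[0]
        for col, text in zip(columns[ip], fields[1:4]):
            col.append(float(text))
    # Only now, with all rows read, compute one average per column.
    return {ip: [sum(col) / len(col) for col in cols]
            for ip, cols in columns.items()}

sample = [
    "192.168.136.246 10 23 1",
    "10.128.255.158 2 4 9",
    "192.168.134.206 5 7 1",
    "10.128.255.158 3 7 10",
]
averages = average_by_key_stored(sample)
print(averages["10.128.255.158"])  # [2.5, 5.5, 9.5]
```

Memory use here grows with every row read, since each value is kept until the end.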

or:
  • the IP maps to a class/data structure/simple 2-element list per column that holds the current average and the count of values used so far (IP->[ [a1, n1], [a2, n2], [a3, n3] ]).
  • again, for each new row found with the same IP, we retrieve any existing stored info, but instead of appending a new element to a list, we recover the previous total by multiplying the 2 stored values, then calculate the new average with a count of +1, and store the new average and count ([a1, n1] and new x becomes [ (a1*n1 + x) / (n1 + 1), n1 + 1 ]).
  • we always have the average per IP per column in this data structure, so once we've read the last row we're done. We also use far less memory: usage is proportional to the number of columns times the number of IP addresses, whereas the other option grows with each extra row read.
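
The rolling-average option could look roughly like the Python sketch below. Again, this is only an illustration under assumptions: the name `average_by_key_running` and the inline sample rows are made up for the example, and the update step is the (a*n + x) / (n + 1) rule described above.

```python
def average_by_key_running(lines):
    """Keep only [average, count] per IP and column, updated row by row."""
    state = {}  # IP -> [[avg, n], [avg, n], [avg, n]]
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or short lines
        ip = fields[0]
        pairs = state.setdefault(ip, [[0.0, 0] for _ in range(3)])
        for pair, text in zip(pairs, fields[1:4]):
            avg, n = pair
            x = float(text)
            # Recover the previous total (avg * n), add x, divide by the new count.
            pair[0] = (avg * n + x) / (n + 1)
            pair[1] = n + 1
    return {ip: [avg for avg, _ in pairs] for ip, pairs in state.items()}

rows = [
    "192.168.136.246 10 23 1",
    "10.128.255.158 2 4 9",
    "192.168.134.206 5 7 1",
    "10.128.255.158 3 7 10",
]
running = average_by_key_running(rows)
print(running["10.128.255.158"])  # [2.5, 5.5, 9.5]
```

With floating-point values, repeatedly multiplying and dividing like this can accumulate small rounding differences compared with summing first and dividing once at the end.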

Last edited by Proud; 08-12-2011 at 09:45 AM.
 
Old 08-12-2011, 10:44 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

Rep: Reputation: 3190
Yes, awk could easily do this. What have you tried, and where are you getting stuck?
 
Old 08-14-2011, 07:40 AM   #4
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
Since the OP has been quiet, here is an example awk script.
Code:
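awk '(NF>=4) { n[$1]++        # rows seen for this IP
               v1[$1]+=$2     # running sum of field 2
               v2[$1]+=$3     # running sum of field 3
               v3[$1]+=$4     # running sum of field 4
             }
         END { for (ip in n)  # one output line per distinct IP
                   printf("%s %f %f %f (%d)\n", ip, v1[ip]/n[ip], v2[ip]/n[ip], v3[ip]/n[ip], n[ip])
             }' data-file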
The IP addresses are the keys to the arrays. n counts the number of entries for each IP. (Incrementing an unset value yields 1, and adding something to an unset value yields that something, as per awk rules; in other words, an unset value is treated as zero.)

The END rule is processed after all the records (lines) have been read. ip loops through all keys in the n array, and therefore through all IP addresses. Since the v1, v2 and v3 arrays hold the sums of the fields, dividing by the number of summands yields the average.

I added the number of occurrences in parentheses at the end of each line for illustration.
You can also format the fields to suit your needs (e.g. %.3f instead of just %f).
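
For anyone who would rather read this in Python than awk, here is a rough equivalent of the same sum-and-count logic. It is only a sketch: the name `average_by_sum_and_count` and the inline sample rows are assumptions for the example, and `defaultdict` stands in for awk's "unset is zero" behaviour. One difference to be aware of: `float()` raises an error on non-numeric text, whereas awk would treat it as 0.

```python
from collections import defaultdict

def average_by_sum_and_count(lines):
    """Mirror of the awk script: per-IP sums and a per-IP row count."""
    count = defaultdict(int)                     # like n[$1]++
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])  # like v1, v2, v3
    for line in lines:
        fields = line.split()
        if len(fields) < 4:                      # same guard as (NF>=4)
            continue
        ip = fields[0]
        count[ip] += 1
        for i in range(3):
            sums[ip][i] += float(fields[i + 1])
    # The equivalent of the END rule: divide each sum by the row count.
    return {ip: ([s / count[ip] for s in sums[ip]], count[ip]) for ip in count}

data = [
    "192.168.136.246 10 23 1",
    "10.128.255.158 2 4 9",
    "192.168.134.206 5 7 1",
    "10.128.255.158 3 7 10",
]
result = average_by_sum_and_count(data)
for ip, (avgs, n) in result.items():
    print("%s %.3f %.3f %.3f (%d)" % (ip, *avgs, n))
```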
 
  

