two line averaging

tabbygirl1990 · 10-08-2013, 10:31 PM

hi guys,

i have output data from one code that needs to get read into another, but the second code can't accept as high a data rate, so i need to average every two lines. there are seven columns in the file and there can be 100 or more rows, but the number of rows will always be even. the second column can be ignored, it's a constant. here's an output to input example:

the output of the first code would look like

line1 42 a b c d e
line2 42 aa bb cc dd ee
line3 42 aaa bbb ccc ddd eee
line4 42 aaaa bbbb cccc dddd eeee
.
.
.

so with the above example the input file to the next code would look like

line1 42 ave(a,aa) ave(b,bb) ave(c,cc) ave(d,dd) ave(e,ee)
line2 42 ave(aaa,aaaa) ave(bbb,bbbb) ave(ccc,cccc) ave(ddd,dddd) ave(eee,eeee)
.
.
.

tabby

Firerat · 10-08-2013, 11:08 PM

I have an idea of how I would do it

what have you tried?

what OS ? ( I see a Mac badge )

tabbygirl1990 · 10-09-2013, 09:40 AM

good morning Firerat (of the Princess Bride Swamp Firerats? i love that movie!)

i couldn't sleep laying in bed last night so i typed it from my mac laptop. my desktop is RHEL 5.5 Tikanga

so far i have:

Code:

#!/usr/bin/awk -f

BEGIN   {
        max=0
        }
        {
        if($5>max) max=5        
        }
END     {
        {
BEGIN   {
        min=0
        }
        {
        if($5>min) min=5        
        }
END     {
        {
!(NR%5) {
        sum+=5        
        ++n
        }
END     {
        print "average = sum/n
        {

but this only runs on the 5th column and i haven't figured out how to extend the averaging to all columns

and

Code:

#!/usr/bin/awk -f
awk 'NR1 {sum+=$5; ++n} END  {print "average = " sum/n}' output_file.dat

but this only runs on the 5th column and i haven't figured out how to extend the averaging to all columns

thanks for your help!!!

tabby
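
One way to get past the "only the 5th column" limit is to loop over the fields and keep one running sum per column in an array. The sketch below is only an illustration of that idea: `column_means` is a made-up helper name, the two-row input is invented, and it computes the mean over the whole file rather than over pairs of rows.

```shell
# column_means is a hypothetical helper; columns 1 and 2 (row label and constant) are skipped
column_means() {
    awk '
    {
        for (i = 3; i <= NF; i++)   # walk every data column instead of naming $5
            sum[i] += $i
        nf = NF                     # remember the column count for the END block
    }
    END {
        if (NR == 0) exit           # empty input: nothing to average
        for (i = 3; i <= nf; i++)
            printf "%s%g", (i > 3 ? "\t" : ""), sum[i] / NR
        print ""
    }' "$@"
}

# made-up two-row sample: prints 2 and 20 separated by a tab
printf '1 42 1 10\n2 42 3 30\n' | column_means
```

The same `for (i = 3; i <= NF; i++)` loop carries over to the pairwise case; only the point at which the sums are reset and printed changes.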

Firerat · 10-09-2013, 10:17 AM

do you have a better example of the inputs?

can you give 10 lines, and also show the result you expect

for instance, do you want the average of all the rows for each column

input

Code:

Line1 1
Line2 2
Line3 3
Line4 4

output

Code:

Line1 1
Line2 2
Line3 3
Line4 4
mean  2.5

or ..

Code:

Line1 1
Line2 2
mean  1.5
Line3 3
Line4 4
mean  3.5

schneidz · 10-09-2013, 10:32 AM

the original poster made a good faith effort so even if this is homework i think this guidance won't be cheating (although i have no idea where the variable n above comes from -- i get division by 0 errors when i try to run it as is because it is never defined).
here's my stab at it... (i did the first 2 fields only because i got bored -- season to taste):

Code:

[schneidz@hyper ~]$ cat tabbygirl1990.txt
1 2 3 4 5 6 7
7 6 5 4 3 2 1
100 200 300 400 500 600 700
10 20 30 40 50 60 70
5 10 15 20 25 30 35
0 1 1 2 3 5 8
[schneidz@hyper ~]$ awk 'NR % 2 == 0 {sum1+=$1;sum2+=$2} NR %2 == 1 {sum1=$1;sum2=$2}  NR % 2 == 0 {print "average-1 = " sum1/2 " -- average-2 = " sum2/2 }' tabbygirl1990.txt
average-1 = 4 -- average-2 = 4
average-1 = 55 -- average-2 = 110
average-1 = 2.5 -- average-2 = 5.5
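
On the division by 0: in the earlier one-liner the pattern is written `NR1`, which awk reads as a variable named `NR1` rather than as a test on `NR`. That variable is never set, so the pattern is always false, `++n` never runs, and `sum/n` divides by zero. A small demonstration on made-up input (`NR > 1` is only an example of a real comparison, not necessarily what was intended):

```shell
# NR1 is parsed as a variable name; it is empty, so the block never runs
broken=$(printf 'a\nb\nc\n' | awk 'NR1 { ++n } END { print n + 0 }')
# an explicit comparison on NR does run
fixed=$(printf 'a\nb\nc\n' | awk 'NR > 1 { ++n } END { print n + 0 }')
echo "NR1 matched $broken rows, NR > 1 matched $fixed rows"
```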

tabbygirl1990 · 10-09-2013, 10:56 AM

here's the innie

Code:

1	42	0.19796486	0.362090835	0.354344909	0.856582877	0.735671789
2	42	0.025016951	0.12691389	0.210235925	0.417773321	0.091685902
3	42	0.610085038	0.445050311	0.756565733	0.180007685	0.216628711
4	42	0.458264832	0.359423811	0.488949963	0.073800802	0.091902447
5	42	0.268522443	0.648344889	0.983886158	0.436349095	0.949035235
6	42	0.176264501	0.059806075	0.860509502	0.488146158	0.240509861
7	42	0.89882842	0.004340198	0.959885061	0.083707755	0.636907775
8	42	0.175407396	0.752946341	0.037497858	0.738027088	0.59901326
9	42	0.929893486	0.110036987	0.109945346	0.788329303	0.303932011
10	42	0.788359742	0.356803805	0.954558374	0.93942156	0.474722704

and the outie

Code:

1	42	ave(col3,row1&row2)	ave(col4,row1&row2)	ave(col5,row1&row2)	ave(col6,row1&row2)	ave(col7,row1&row2)
2	42	ave(col3,row3&row4)	ave(col4,row3&row4)	ave(col5,row3&row4)	ave(col6,row3&row4)	ave(col7,row3&row4)
3	42	ave(col3,row5&row6)	ave(col4,row5&row6)	ave(col5,row5&row6)	ave(col6,row5&row6)	ave(col7,row5&row6)
4	42	ave(col3,row7&row8)	ave(col4,row7&row8)	ave(col5,row7&row8)	ave(col6,row7&row8)	ave(col7,row7&row8)
5	42	ave(col3,row9&row10)	ave(col4,row9&row10)	ave(col5,row9&row10)	ave(col6,row9&row10)	ave(col7,row9&row10)

thanks soooo much firerat!!!

tabby

colucix · 10-09-2013, 11:09 AM

Maybe you need something like this:

Code:

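#!/usr/bin/awk -f
NR % 2 {                        # on every odd row...
  for (i=3;i<=NF;i++)
    _[i]=$i                     # remember its data columns
  getline                       # then read the following even row
  printf "%d\t%d", ++c, $2      # new row number and the constant column
  for (i=3;i<=NF;i++)
    printf "\t%f", ($i+_[i])/2  # mean of the two rows, column by column
  print ""                      # end the output line
}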

tabbygirl1990 · 10-09-2013, 12:05 PM

hi schneidz,

i tried your awk command and i got syntax errors, and i don't understand what's

Code:

--

so i modified it to

Code:

awk 'NR % 2 == 0 {sum1+=$1;sum2+=$2} NR %2 == 1 {sum1=$1;sum2=$2}  NR % 2 == 0 {print "average-1 = " sum1/2} {print "average-2 = " sum2/2}' tabbygirl1990.dat

what i got out was

Code:

average-2 = 21
average-1 = 1.5
average-2 = 42
average-2 = 63
average-1 = 5
average-2 = 84
average-2 = 105
average-1 = 10.5
average-2 = 126
average-2 = 147
average-1 = 18
average-2 = 168
average-2 = 189
average-1 = 27.5
average-2 = 210

which is yesterday's makeup
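
A note on the output above: in the modified command the last block, `{print "average-2 = " sum2/2}`, has no pattern in front of it, so awk runs it on every row instead of only on even rows. That would explain why `average-2` shows up twice as often as `average-1`. The `--` in the original command is just text inside the quoted string, not awk syntax. The running totals also suggest the sums were not being reset in the version that was actually run, though that part is a guess. A minimal sketch that keeps both values under a single `NR % 2 == 0` pattern, on a made-up four-row input:

```shell
result=$(printf '1\t42\n2\t42\n3\t42\n4\t42\n' | awk '
    NR % 2 == 1 { sum1 = $1; sum2 = $2 }        # odd row: start a new pair
    NR % 2 == 0 { sum1 += $1; sum2 += $2        # even row: finish the pair
                  print "average-1 = " sum1/2 " -- average-2 = " sum2/2 }
')
printf '%s\n' "$result"
```

With this input it prints `average-1 = 1.5 -- average-2 = 42` and then `average-1 = 3.5 -- average-2 = 42`.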

schneidz · 10-09-2013, 12:44 PM

can you please copy-paste the command and the error you are getting ?

Firerat · 10-09-2013, 01:07 PM

./Script.sh /path/to/tabbygirl1990.dat

Code:

#!/bin/bash
# average columns 3-7 of each pair of rows; column 2 is copied through
Input="$1"
tick=1
LineNo=1
while read -r Line;do
    case $tick in
        1)
           X=($Line)    # first row of the pair, split into fields
           tick=2
        ;;
        2)
           Y=($Line)    # second row of the pair
           tick=1
           printf "%s" "Line${LineNo} ${X[1]}"
           for i in {2..6};do
               awk '{printf "\t%.11f",($1 + $2)/2}' <<< "${X[i]} ${Y[i]}"
           done
           printf '\n'
           LineNo=$((LineNo+1))
        ;;
    esac
done < "$Input"

Input

Code:

1	42	0.19796486	0.362090835	0.354344909	0.856582877	0.735671789
2	42	0.025016951	0.12691389	0.210235925	0.417773321	0.091685902
3	42	0.610085038	0.445050311	0.756565733	0.180007685	0.216628711
4	42	0.458264832	0.359423811	0.488949963	0.073800802	0.091902447
5	42	0.268522443	0.648344889	0.983886158	0.436349095	0.949035235
6	42	0.176264501	0.059806075	0.860509502	0.488146158	0.240509861
7	42	0.89882842	0.004340198	0.959885061	0.083707755	0.636907775
8	42	0.175407396	0.752946341	0.037497858	0.738027088	0.59901326
9	42	0.929893486	0.110036987	0.109945346	0.788329303	0.303932011
10	42	0.788359742	0.356803805	0.954558374	0.93942156	0.474722704

Output

Code:

Line1 42	0.11149090550	0.24450236250	0.28229041700	0.63717809900	0.41367884550
Line2 42	0.53417493500	0.40223706100	0.62275784800	0.12690424350	0.15426557900
Line3 42	0.22239347200	0.35407548200	0.92219783000	0.46224762650	0.59477254800
Line4 42	0.53711790800	0.37864326950	0.49869145950	0.41086742150	0.61796051750
Line5 42	0.85912661400	0.23342039600	0.53225186000	0.86387543150	0.38932735750

But the numbers don't really make much sense:
0.19796486 and 0.025016951 are very different, yet they get averaged to a mean of 0.11149090550

tabbygirl1990 · 10-09-2013, 02:22 PM

colucix does the trick

can you explain what the _[i] is doing? i know the "i" is an iterator but the rest of it i haven't seen before

also there are two printf and one print statement. i understand what the last print is doing, but what are the printf statements doing in this script? i mean i know what a printf is, just not what they are doing in the script

thanks guys!!!

tabby

colucix · 10-10-2013, 03:07 AM

1. The

Code:

_[i]

notation is simply the i-th element of the array _ (often I use a single underscore as variable name for brevity).

2. The first printf statement prints out the new line number using a C notation to increment the variable c by one, before its value is used

Code:

++c

keep in mind that an uninitialized variable in awk has value 0.

3. The second printf statement prints out the average of the i-th field, as per your requirement. It is the body of the second for loop, which is executed from the 3rd field to the last one.
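
The points above can be tried out on a tiny made-up input. In this sketch `saved` is an ordinary array with a longer name standing in for `_`, and `c` is never assigned before `++c` is used:

```shell
shown=$(printf 'a 1\nb 2\nc 3\n' | awk '
    {
        saved[NR] = $2        # saved[] plays the role of _[]: one array slot per row here
        printf "%d:", ++c     # c was never set, so it starts as 0 and this prints 1:, 2:, 3:
    }
    END {
        print ""              # finish the line the printf calls started
        print saved[1] + saved[3]
    }')
printf '%s\n' "$shown"
```

It prints `1:2:3:` on one line and `4` on the next, which shows the array holding values from earlier rows and the counter starting from zero without any setup.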

tabbygirl1990 · 10-10-2013, 09:40 AM

thanks!

i thought that the _[] was some kind of special character/operator on i

i know that ++ is standard C notation for iterate over (although i'm not sure of the diff between ++i or i++ i'll try to find out), but i hadn't seen the little c before so I didn't know that it is a variable

so are the printf statements kind of like storing/holding the data in the for loops?

thanks soooo much,

tabby
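
On the two open questions here: `++` increments a variable by one (it is not a loop by itself), and `printf` does not store anything, it writes its text immediately but without a trailing newline, so several `printf` calls build up one output line that the final `print ""` ends. A small sketch with made-up variable names showing both, including the difference between `++i` and `i++`:

```shell
demo=$(awk 'BEGIN {
    i = 0; a = ++i                # pre-increment: i becomes 1 first, then a gets 1
    j = 0; b = j++                # post-increment: b gets the old 0, then j becomes 1
    printf "a=%d i=%d", a, i      # written out right away, with no newline
    printf " b=%d j=%d", b, j     # so this continues the same line
    print ""                      # the empty print only ends the line
}')
printf '%s\n' "$demo"
```

This prints `a=1 i=1 b=0 j=1` on a single line.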

tabbygirl1990 · 10-15-2013, 10:54 AM

back again guys,

now i'd like to create averaged lines of data from their nearest neighbors, so like here's an input file

Code:

1	42	0.0	0.4	0.3	0.8	0.7
2	42	0.3	0.1	0.2	0.4	0.1

in the output file the newline of averaged number is line #2, line #1 is the same as the input, and line #3 is line #2 from above, if this thing was run on a 100line file, i think it would give back 198 lines, right?

Code:

1	42	0.0	0.4	0.3	0.8	0.7
2	42	0.15	0.25	0.25	0.6	0.4
3	42	0.3	0.1	0.2	0.4	0.1

ok so here i don't know how to create the new in-between line? and how do i "store" the memory of the two nearest neighbors lines so i can average them? when i know how to do those two things, i should hope i could write it???

tabby

tabbygirl1990 · 10-15-2013, 11:24 AM

maybe a better example input file cause it has more lines

Code:

1	42	0.0	0.4	0.3	0.8	0.7
2	42	0.3	0.1	0.2	0.4	0.1
3	42	0.0	0.1	0.0	0.2	0.4
4	42	0.7	0.1	0.0	0.0	0.8
5	42	0.3	0.2	0.3	0.8	0.1

in the output file the newline of averaged number is line #2, line #1 is the same as the input, and line #3 is line #2 from above, if this thing was run on a 100line file, i think it would give back 198 lines, right?

Code:

1	42	0.0	0.4	0.3	0.8	0.7
an odd index line
2	42	0.3	0.1	0.2	0.4	0.1
an odd index line
3	42	0.0	0.1	0.0	0.2	0.4
an odd index line
4	42	0.7	0.1	0.0	0.0	0.8
an odd index line
5	42	0.3	0.2	0.3	0.8	0.1

so my first thought would be to go through the file once creating the in-between lines, these would all be odd indexed lines, then somehow reference the script only to work on odd indexed lines, right?
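
One possible single-pass approach, sketched under the assumption that the input looks like the examples above: keep the previous row's data columns in an array, and on every row after the first print the averaged in-between row before printing the row itself. The names `interleave_means`, `prev` and `out` are made up for this sketch. For N input rows it prints 2N - 1 rows, so a 100-line file would give 199 lines rather than 198.

```shell
# interleave_means is a hypothetical helper name
interleave_means() {
    awk '
    NR > 1 {
        # in-between row: mean of the previous row and the current row, columns 3..NF
        printf "%d\t%s", ++out, $2
        for (i = 3; i <= NF; i++)
            printf "\t%g", (prev[i] + $i) / 2
        print ""
    }
    {
        # the current row itself, renumbered, with its values remembered for the next row
        printf "%d\t%s", ++out, $2
        for (i = 3; i <= NF; i++) {
            printf "\t%s", $i
            prev[i] = $i
        }
        print ""
    }' "$@"
}

# the two-row example from the post above
printf '1\t42\t0.0\t0.4\t0.3\t0.8\t0.7\n2\t42\t0.3\t0.1\t0.2\t0.4\t0.1\n' | interleave_means
```

On the two-row example this prints the three rows shown in the expected output: the first input row, then `2 42 0.15 0.25 0.25 0.6 0.4`, then the second input row renumbered as 3. No second pass or odd/even bookkeeping is needed, because the in-between row is produced as soon as both of its neighbors have been read.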