LinuxQuestions.org


tabbygirl1990 10-08-2013 10:31 PM

two line averaging
 
hi guys,

i have output data from one code that needs to be read into another, but the second code can't accept as high a data rate, so i need to average every two lines. there are seven columns in the file and there can be 100 or more rows, but the number of rows will always be even, and the second column can be ignored since it's a constant. here's an output to input example:

the output of the first code would look like

line1 42 a b c d e
line2 42 aa bb cc dd ee
line3 42 aaa bbb ccc ddd eee
line4 42 aaaa bbbb cccc dddd eeee
.
.
.


so with the above example the input file to the next code would look like

line1 42 ave(a,aa) ave(b,bb) ave(c,cc) ave(d,dd) ave(e,ee)
line2 42 ave(aaa,aaaa) ave(bbb,bbbb) ave(ccc,cccc) ave(ddd,dddd) ave(eee,eeee)
.
.
.

tabby

Firerat 10-08-2013 11:08 PM

I have an idea of how I would do it

what have you tried?

what OS ? ( I see a Mac badge )

tabbygirl1990 10-09-2013 09:40 AM

good morning Firerat (of the Princess Bride Swamp Firerats? i love that movie!)

i couldn't sleep laying in bed last night so i typed it from my mac laptop. my desktop is RHEL 5.5 Tikanga

so far i have:

Code:

#!/usr/bin/awk -f

BEGIN  {
        max=0
        }
        {
        if($5>max) max=5       
        }
END    {
        {
BEGIN  {
        min=0
        }
        {
        if($5>min) min=5       
        }
END    {
        {
!(NR%5) {
        sum+=5       
        ++n
        }
END    {
        print "average = sum/n
        {

but this only runs on the 5th column and i haven't figured out how to extend the averaging to all columns

and

Code:

#!/usr/bin/awk -f
awk 'NR1 {sum+=$5; ++n} END  {print "average = " sum/n}' output_file.dat

but this only runs on the 5th column and i haven't figured out how to extend the averaging to all columns

thanks for your help!!!

tabby

Firerat 10-09-2013 10:17 AM

do you have a better example of the inputs?

can you give 10 lines, and also show the result you expect


for instance, do you want the average of all the rows for each column

input
Code:

Line1 1
Line2 2
Line3 3
Line4 4

Code:

Line1 1
Line2 2
Line3 3
Line4 4
mean  5

or ..
Code:

Line1 1
Line2 2
mean  1.5
Line3 3
Line4 4
mean  3.5


schneidz 10-09-2013 10:32 AM

the original poster made a good faith effort so even if this is homework i think this guidance won't be cheating (although i have no idea where the variable n above comes from -- i get division by 0 errors when i try to run it as is because it is never defined).
here's my stab at it... (i did the first 2 fields only because i got bored -- season to taste):
Code:

[schneidz@hyper ~]$ cat tabbygirl1990.txt
1 2 3 4 5 6 7
7 6 5 4 3 2 1
100 200 300 400 500 600 700
10 20 30 40 50 60 70
5 10 15 20 25 30 35
0 1 1 2 3 5 8
[schneidz@hyper ~]$ awk 'NR % 2 == 0 {sum1+=$1;sum2+=$2} NR %2 == 1 {sum1=$1;sum2=$2}  NR % 2 == 0 {print "average-1 = " sum1/2 " -- average-2 = " sum2/2 }' tabbygirl1990.txt
average-1 = 4 -- average-2 = 4
average-1 = 55 -- average-2 = 110
average-1 = 2.5 -- average-2 = 5.5


tabbygirl1990 10-09-2013 10:56 AM

here's the innie :)

Code:

1        42        0.19796486        0.362090835        0.354344909        0.856582877        0.735671789
2        42        0.025016951        0.12691389        0.210235925        0.417773321        0.091685902
3        42        0.610085038        0.445050311        0.756565733        0.180007685        0.216628711
4        42        0.458264832        0.359423811        0.488949963        0.073800802        0.091902447
5        42        0.268522443        0.648344889        0.983886158        0.436349095        0.949035235
6        42        0.176264501        0.059806075        0.860509502        0.488146158        0.240509861
7        42        0.89882842        0.004340198        0.959885061        0.083707755        0.636907775
8        42        0.175407396        0.752946341        0.037497858        0.738027088        0.59901326
9        42        0.929893486        0.110036987        0.109945346        0.788329303        0.303932011
10        42        0.788359742        0.356803805        0.954558374        0.93942156        0.474722704

and the outie

Code:

1        42        ave(col3,row1&row2)        ave(col4,row1&row2)        ave(col5,row1&row2)        ave(col6,row1&row2)        ave(col7,row1&row2)
2        42        ave(col3,row3&row4)        ave(col4,row3&row4)        ave(col5,row3&row4)        ave(col6,row3&row4)        ave(col7,row3&row4)
3        42        ave(col3,row5&row6)        ave(col4,row5&row6)        ave(col5,row5&row6)        ave(col6,row5&row6)        ave(col7,row5&row6)
4        42        ave(col3,row7&row8)        ave(col4,row7&row8)        ave(col5,row7&row8)        ave(col6,row7&row8)        ave(col7,row7&row8)
5        42        ave(col3,row9&row10)        ave(col4,row9&row10)        ave(col5,row9&row10)        ave(col6,row9&row10)    ave(col7,row9&row10)

thanks soooo much firerat!!!

tabby

colucix 10-09-2013 11:09 AM

Maybe you need something like this:
Code:

#!/usr/bin/awk -f
NR % 2 {
  for (i=3;i<=NF;i++)
    _[i]=$i
  getline
  printf "%d\t%d", ++c, $2
  for (i=3;i<=NF;i++)
    printf "\t%f", ($i+_[i])/2
  print ""
}


tabbygirl1990 10-09-2013 12:05 PM

hi schneidz,

i tried your awk command and i got syntax errors, and i don't understand what's
Code:

--
so i modified it to

Code:

awk 'NR % 2 == 0 {sum1+=$1;sum2+=$2} NR %2 == 1 {sum1=$1;sum2=$2}  NR % 2 == 0 {print "average-1 = " sum1/2} {print "average-2 = " sum2/2}' tabbygirl1990.dat
what i got out was

Code:

average-2 = 21
average-1 = 1.5
average-2 = 42
average-2 = 63
average-1 = 5
average-2 = 84
average-2 = 105
average-1 = 10.5
average-2 = 126
average-2 = 147
average-1 = 18
average-2 = 168
average-2 = 189
average-1 = 27.5
average-2 = 210

which is yesterday's makeup :)

schneidz 10-09-2013 12:44 PM

can you please copy-paste the command and the error you are getting ?

Firerat 10-09-2013 01:07 PM

./Script.sh /path/to/tabbygirl1990.dat
Code:

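#!/bin/bash
# Average each pair of consecutive lines, columns 3-7.
Input="$1"
tick=1      # 1 = first line of a pair, 2 = second line
LineNo=1    # output line counter
while read -r Line;do
    case $tick in
        1)
          X=($Line)    # split the first line into an array (index 0 = column 1)
          tick=2
        ;;
        2)
          Y=($Line)    # split the second line the same way
          tick=1
          printf "%s" "Line${LineNo} ${X[1]}"
          for i in {2..6};do
              # awk does the floating-point arithmetic that bash lacks
              awk '{printf "\t%.11f",($1 + $2)/2}' <<< "${X[i]} ${Y[i]}"
          done
          printf '\n'
          LineNo=$(($LineNo+1))
        ;;
    esac
done < "$Input"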

Input
Code:

1        42        0.19796486        0.362090835        0.354344909        0.856582877        0.735671789
2        42        0.025016951        0.12691389        0.210235925        0.417773321        0.091685902
3        42        0.610085038        0.445050311        0.756565733        0.180007685        0.216628711
4        42        0.458264832        0.359423811        0.488949963        0.073800802        0.091902447
5        42        0.268522443        0.648344889        0.983886158        0.436349095        0.949035235
6        42        0.176264501        0.059806075        0.860509502        0.488146158        0.240509861
7        42        0.89882842        0.004340198        0.959885061        0.083707755        0.636907775
8        42        0.175407396        0.752946341        0.037497858        0.738027088        0.59901326
9        42        0.929893486        0.110036987        0.109945346        0.788329303        0.303932011
10        42        0.788359742        0.356803805        0.954558374        0.93942156        0.474722704

Output
Code:

Line1 42        0.11149090550        0.24450236250        0.28229041700        0.63717809900        0.41367884550
Line2 42        0.53417493500        0.40223706100        0.62275784800        0.12690424350        0.15426557900
Line3 42        0.22239347200        0.35407548200        0.92219783000        0.46224762650        0.59477254800
Line4 42        0.53711790800        0.37864326950        0.49869145950        0.41086742150        0.61796051750
Line5 42        0.85912661400        0.23342039600        0.53225186000        0.86387543150        0.38932735750

But the numbers don't really make much sense.
0.19796486 and 0.025016951 are very different, which leads to a mean of 0.11149090550

tabbygirl1990 10-09-2013 02:22 PM

colucix does the trick

can you explain what the _[i] is doing? i know the "i" is an iterator but the rest of it i haven't seen before

also there are two printf and one print statement, i understand what the last print is doing but what are the printf statements doing in this script, i mean i know what a printf is just not what they are doing in the script

thanks guys!!!

tabby

colucix 10-10-2013 03:07 AM

1. The
Code:

_[i]
notation is simply the i-th element of the array _ (often I use a single underscore as variable name for brevity).

2. The first printf statement prints the new line number, using C notation to increment the variable c by one before its value is used
Code:

++c
Keep in mind that an uninitialized variable in awk has the value 0.

3. The second printf statement prints out the average of the i-th field, as per your requirement. It is the body of the second for loop, which is executed from the 3rd field to the last one.
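
For comparison, here is a minimal sketch of the same pairwise averaging written without getline, so each step is spelled out. It is not the script posted above: the function name pair_average and the array name first are made up for illustration, and it assumes whitespace-separated input with an even number of lines.

```shell
#!/bin/sh
# pair_average: hypothetical helper, averages fields 3..NF of each pair of lines
pair_average() {
    awk '
    NR % 2 {
        # odd line: remember fields 3..NF in the array "first"
        for (i = 3; i <= NF; i++)
            first[i] = $i
        next
    }
    {
        # even line: new row number, the constant column 2, then the averages
        printf "%d\t%d", ++c, $2
        for (i = 3; i <= NF; i++)
            printf "\t%f", ($i + first[i]) / 2
        print ""    # finish the output line with a newline
    }' "$@"
}

# small demonstration on two made-up lines
printf '1 42 1 2\n2 42 3 4\n' | pair_average
```

It can read a file name as an argument (pair_average output_file.dat) or standard input, as in the demonstration line.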

tabbygirl1990 10-10-2013 09:40 AM

thanks!

i thought that the _[] was some kind of special character/operator on i

i know that ++ is standard C notation for iterate over (although i'm not sure of the diff between ++i or i++ i'll try to find out), but i hadn't seen the little c before so I didn't know that it is a variable

so are the printf statements kind of like storing/holding the data in the for loops?

thanks soooo much,

tabby
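
On the ++c versus c++ question and on what printf does, a small sketch that can be pasted into a shell may help. The function name increment_demo and the variables c and d are hypothetical, used only for this illustration. printf does not store anything: it writes its text straight away and simply leaves the newline off, so several printf calls build up one output line, while the array is what holds the data.

```shell
#!/bin/sh
# increment_demo: hypothetical helper showing pre/post increment and printf in awk
increment_demo() {
    awk 'BEGIN {
        # c and d start uninitialized, which awk treats as 0 in arithmetic
        print ++c    # pre-increment: c becomes 1 first, then 1 is printed
        print d++    # post-increment: the old value 0 is printed, then d becomes 1
        print d      # d is now 1
        # printf adds no newline, so these three calls produce a single line
        printf "a"
        printf "\tb"
        print ""
    }'
}

increment_demo
```

The four output lines should be 1, 0, 1 and then a and b separated by a tab.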

tabbygirl1990 10-15-2013 10:54 AM

back again guys,

now i'd like to create averaged lines of data from their nearest neighbors, so like here's an input file
Code:

1        42        0.0        0.4        0.3        0.8        0.7
2        42        0.3        0.1        0.2        0.4        0.1

in the output file the newline of averaged number is line #2, line #1 is the same as the input, and line #3 is line #2 from above, if this thing was run on a 100line file, i think it would give back 198 lines, right?

Code:

1        42        0.0        0.4        0.3        0.8        0.7
2      42      0.15    0.25    0.25    0.6    0.4
3        42        0.3        0.1        0.2        0.4        0.1

ok so here i don't know how to create the new in-between line? and how do i "store" the memory of the two nearest neighbors lines so i can average them? when i know how to do those two things, i should hope i could write it???

tabby

tabbygirl1990 10-15-2013 11:24 AM

maybe a better example input file cause it has more lines

Code:

1        42        0.0        0.4        0.3        0.8        0.7
2        42        0.3        0.1        0.2        0.4        0.1
3      42      0.0    0.1    0.0    0.2    0.4
4      42      0.7    0.1    0.0    0.0    0.8
5      42      0.3    0.2    0.3    0.8    0.1

in the output file the newline of averaged number is line #2, line #1 is the same as the input, and line #3 is line #2 from above, if this thing was run on a 100line file, i think it would give back 198 lines, right?


Code:

1        42        0.0        0.4        0.3        0.8        0.7
an odd index line
2        42        0.3        0.1        0.2        0.4        0.1
an odd index line
3      42      0.0    0.1    0.0    0.2    0.4
an odd index line
4      42      0.7    0.1    0.0    0.0    0.8
an odd index line
5      42      0.3    0.2    0.3    0.8    0.1

so my first thought would be to go through the file once creating the in-between lines, these would all be odd indexed lines, then some way reference the script only to work on odd indexed lines, right?
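
One possible way to approach the in-between lines, as a sketch rather than a definitive answer: keep the previous line's fields in an array and, for every line after the first, print the average of the previous and current line before printing the current line itself. That needs only one pass over the file. The names interleave_average and prev are invented for this example; it assumes whitespace-separated input, renumbers column 1, and copies column 2 unchanged. Note that N input lines give 2N-1 output lines, so a 100-line file would come back as 199 lines.

```shell
#!/bin/sh
# interleave_average: hypothetical helper, inserts an averaged line between neighbors
interleave_average() {
    awk '
    NR > 1 {
        # in-between line: average of the previous and current line, fields 3..NF
        printf "%d\t%s", ++c, $2
        for (i = 3; i <= NF; i++)
            printf "\t%g", (prev[i] + $i) / 2
        print ""
    }
    {
        # the current line itself, renumbered; also remember it for the next round
        printf "%d\t%s", ++c, $2
        for (i = 3; i <= NF; i++) {
            printf "\t%s", $i
            prev[i] = $i
        }
        print ""
    }' "$@"
}

# the two-line example from the post above
printf '1 42 0.0 0.4 0.3 0.8 0.7\n2 42 0.3 0.1 0.2 0.4 0.1\n' | interleave_average
```

The array prev is the "memory" of the nearest neighbor: it always holds the line read just before the current one, so no second pass and no odd/even bookkeeping is needed.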


All times are GMT -5.