LinuxQuestions.org

Linux - Newbie forum thread: https://www.linuxquestions.org/questions/linux-newbie-8/createing-sum-column-4175415926/

udiubu 07-10-2012 10:51 AM

Creating a sum column
 
Dear all,

This is a simple running-sum task that could also be done in Excel for small files.

I have a one-column file with a sequence of 3s and 6s:

3
3
6
3
6
6

I need to create a second column that starts with the value "6" on the first line. From the second line on, each new value must be the previous value in column two plus the column-one value from the previous line (b2 = b1 + a1, b3 = b2 + a2, and so on), with the final total on the last line:

3 6
3 9
6 12
3 18
6 21
6 27
33

Any suggestions would be highly appreciated!

Thank you very much for your attention.

Sincerely,

Udiubu

montel 07-10-2012 11:33 AM

This can't be run in Excel, but I have written it in bash, and it works:

Code:

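#!/bin/bash
x=6                              # running total starts at 6
while read -r line; do           # read "file" one line at a time
    echo "$line $x" >> "newFile" # write the value and the total so far
    x=`expr $x + $line`          # then add this value to the total
done < "file"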

file:
3
3
6
9
3
3
6
6

newFile:
3 6
3 9
6 12
9 18
3 27
3 30
6 33
6 39
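For comparison, here is a sketch of the same loop using the built-in arithmetic that David the H. recommends further down, plus `read -r`, a single output redirection for the whole loop (so re-running it overwrites newFile instead of appending to it), and the final total line the original question asked for. The function name `running_column` is hypothetical, and the input is assumed to be integers, one per line, with no blank lines.

```shell
#!/bin/bash
# Hypothetical helper: prints each input value next to the running total of
# all PREVIOUS values (starting from $1), then the grand total on its own line.
# Assumes one integer per line on stdin and no blank lines.
running_column() {
    local total=$1 line
    while read -r line; do
        printf '%s %s\n' "$line" "$total"   # value, total so far
        total=$(( total + line ))           # built-in arithmetic, no expr
    done
    printf '%s\n' "$total"                  # final total on the last line
}

# Usage: redirect the whole loop once instead of appending line by line
#   running_column 6 < file > newFile
```

With the 3/6 input from the first post, this prints the requested column, ending with the lone 33.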

David the H. 07-10-2012 11:52 AM

In awk:

Code:

awk 'BEGIN{ a=6 } { print $1 , a ; a += $1 } END{ print a }' file

But this... gah!
Code:

x=`expr $x + $line`
$(..) is highly recommended over `..`

"expr is a relic of ancient Rome. Do not wield it."

As long as the input consists of integers, just use one of the shell's built-in arithmetic forms instead. If the numbers can be floating point, you'll need an external tool like awk or bc.
Code:

x=$(( x + line ))                  # integers: built-in arithmetic expansion

x=$( echo "$x + $line" | bc )      # floating point: let bc do the math
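To illustrate the floating-point case: bash's `$(( ))` rejects a value such as `1.5` with a syntax error, while awk does floating-point arithmetic natively. A minimal sketch of the first running column for possibly fractional input, assuming awk is available; the function name `running_column_float` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: running column where awk does the arithmetic,
# so fractional values such as 1.5 are summed correctly.
running_column_float() {
    awk -v a="$1" '
        { print $1, a; a += $1 }   # print value and total so far, then add
        END { print a }            # final total on its own line
    '
}

# Bash arithmetic would stop with a syntax error on the decimal point:
#   echo $(( 6 + 1.5 ))
```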


montel 07-10-2012 12:07 PM

Haha, my bad :3

I was not aware that you should not use "`" or "expr". I will read up on this and start changing things. Thanks for the input.

grail 07-10-2012 01:14 PM

Code:

awk -va=6 '$2=NR==1?a:a+=b;{b=$1}' file
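This one-liner packs the work into an awk pattern: `$2=NR==1?a:a+=b` assigns column two, and because a pattern with no action prints the record whenever its value is true (non-zero), the line gets printed as a side effect; `{b=$1}` then remembers the current value for the next line. Below is a sketch of the same logic with an explicit action (the function name `shifted_sum_explicit` is hypothetical). One difference: if a total ever came out as 0 (possible only with negative input, not with the 3s and 6s here), the pattern would be false and the one-liner would skip that line, whereas the explicit `print` always prints.

```shell
#!/bin/bash
# Hypothetical helper: the first one-liner rewritten with an explicit action
# instead of relying on the pattern's truth value to trigger a print.
shifted_sum_explicit() {
    awk -v a="$1" '{
        if (NR == 1) $2 = a     # first line: column two is the start value
        else $2 = (a += b)      # later lines: add the previous line value b
        b = $1                  # remember this value for the next record
        print                   # always print, even when the total is 0
    }'
}
```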

udiubu 07-10-2012 04:35 PM

All three work very well. Thanks a lot to all of you.
May I ask a follow-up? Let's say I still need to create a second column that starts with the value "6" on the first line. This time, from the second line on, each new value must be the previous value in column two plus the column-one value of the current line, i.e. one line below the previous total: b2 = b1+a2, b3 = b2+a3, b4 = b3+a4, and so on:

3 6
3 9
6 15
3 18
6 24
6 30
3 33

This would help me organize a lot of files.
I would be very thankful for any help.

TB0ne 07-10-2012 06:40 PM

Just curious...you've been given quite a bit of help in the above posts. Certainly enough for you to continue, but this is sounding very much like homework, and we've yet to see anything that YOU have written. Can you show us what you've done to work towards your goals??

udiubu 07-10-2012 07:08 PM

Sure I can show you, TB0ne! Sorry, but at my age it can't be about homework anymore.
Since I need to check different jittering options for an fMRI simulation, I thought I could create all the option files in one go with a simple awk command.

David the H.'s suggestion works nicely for the first case, where each line gets the total of the lines before it:

awk 'BEGIN{ a=6 } { print $1 , a ; a += $1 } END{ print a }' file

I have been experimenting with something similar, printing the value in $1 a second time:

awk 'BEGIN{ a=6 } { print $1 , a ; print $1, a += $1 } END{ print a }' file

But this doesn't work: you get the same value of $1 twice, and totals similar to those above.
I'm looking for a way to say "a += $1 (but from the next line)".
I know how to get the lines after and before a matched line with grep (I believe the options are -A and -B), but I really don't know how to proceed with an expression like the one above.

I hope I clarified my position.

Sincerely,

Udiubu

grail 07-10-2012 09:49 PM

A simple change to mine gives the second solution:
Code:

awk -va=6 '$2=NR==1?a:a+=$1' file

udiubu 07-11-2012 07:31 AM

That worked great!

Thanks grail
I just don't get the question mark in this expression.

Best,

Udiubu

grail 07-11-2012 08:01 AM

?: is the conditional (ternary) operator, a shorthand version of an 'if' statement:
Code:

$2=NR==1?a:a+=$1

If NR == 1
    $2 = a
else
    $2 = ( a += $1 )
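The same `?:` operator also exists inside bash's `$(( ))` arithmetic, so the follow-up column can be sketched in pure bash as well. This assumes integer input with no blank lines; the function name `inclusive_column` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper for the follow-up question: line 1 gets the start value,
# every later line gets the previous total plus its own column-one value.
inclusive_column() {
    local start=$1 total=0 n=0 line
    while read -r line; do
        n=$(( n + 1 ))
        # ternary: if n == 1 then start, else total + line
        total=$(( n == 1 ? start : total + line ))
        printf '%s %s\n' "$line" "$total"
    done
}
```

Fed the seven values from the follow-up post, it prints the same column as the awk version, ending in `3 33`.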


udiubu 07-11-2012 08:25 AM

OK got it!
Thanks a lot grail.

David the H. 07-11-2012 09:47 AM

Quote:

Originally Posted by montel (Post 4724099)
Haha, my bad :3

I was not aware that you should not use "`" or "expr". I will read up on this and start changing things. Thanks for the input.

Not to worry. You weren't wrong, per se, just out of date. These were once commonly used forms, but modern shells have replaced them with newer, better operators. I'm just helping to spread the word. :jawa:



If you hadn't noticed, grail is our resident awk super-guru. I don't know how he does it, but he always manages to distill whatever I write into something half as long, but which takes twice as long to figure out. ;)

I've noticed recently that this often involves ternary operators. I've been working on using them more myself because of him, but I'm nowhere near as proficient yet. I tend to avoid them when posting here anyway, as not many people are familiar with them, and regular if/else clauses tend to be easier to parse.

So here's my entry for the second challenge using more standard syntax. Logic-wise, it's not much different from grail's version.

Code:

awk '{ if ( NR==1 ) { a=6 } else { a+=$1 } ; print $1 , a }' file.txt

Quote:

I have been trying around with something similar, by reprinting the value in $1 once again:

awk 'BEGIN{ a=6 } { print $1 , a ; print $1, a += $1 } END{ print a }' file

But this doesn't work, so you get the same value in $1 twice, and the similar values as above.
I'm looking for a way to say "a += $1 (but the next line)".

Yep, you're just printing the current line's value twice, with a bit of addition thrown in. Remember, awk loads and processes one record at a time; a single line by default. The values on the next line aren't available until it finishes processing the commands for the current one and moves on to the next. We use the variable to carry the current total over into the next step so that that line's value can be added to it then.
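One way to watch this record-at-a-time behaviour is to trace the variable: an awk variable keeps its value from one record to the next, and an unset variable starts out as 0. The sketch below (the function name `trace_total` is hypothetical) prints the running total before and after each line is added.

```shell
#!/bin/bash
# Hypothetical helper: shows that awk's variable "a" survives between records.
trace_total() {
    awk '{
        before = a              # value carried over from the previous record (0 at first)
        a += $1                 # add this record value
        printf "NR=%d value=%s before=%d after=%d\n", NR, $1, before, a
    }'
}
```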

Notice that the main difference in the two versions is in the order we do the actions. In the first solution, we added the $1 to the variable after printing, but before moving on to the next line (so that it always prints the total from the previous line), whereas in this one we add the current line's $1 before printing (so that the total includes the current line's value when printed).
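The effect of that ordering is easiest to see with the first-line and last-line handling stripped out. In this sketch (both function names are hypothetical, and a start value of 0 is used to isolate the ordering), printing before adding gives the total of the previous lines only, while adding before printing includes the current line.

```shell
#!/bin/bash
# Print first, then add: column two holds the sum of the PREVIOUS lines.
print_then_add() { awk -v a="$1" '{ print $1, a; a += $1 }'; }

# Add first, then print: column two includes the CURRENT line.
add_then_print() { awk -v a="$1" '{ a += $1; print $1, a }'; }
```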

The BEGIN/END/if parts are only there to handle the edge cases of the first and last lines.


By the way, please use [code][/code] tags around your code and data, to preserve formatting and to improve readability. Please do not use quote tags, bolding, colors, or other fancy formatting.

udiubu 07-11-2012 11:26 AM

I couldn't have understood it better, David! Thanks so much.

Quote:

Yep, you're just printing the current line's value twice, with a bit of addition thrown in. Remember, awk loads and processes one record at a time; a single line by default. The values on the next line aren't available until it finishes processing the commands for the current one and moves on to the next. We use the variable to carry the current total over into the next step so that that line's value can be added to it then.

Notice that the main difference in the two versions is in the order we do the actions. In the first solution, we added the $1 to the variable after printing, but before moving on to the next line (so that it always prints the total from the previous line), whereas in this one we add the current line's $1 before printing (so that the total includes the current line's value when printed).
Best,

Udiubu

TB0ne 07-11-2012 12:53 PM

Quote:

Originally Posted by udiubu (Post 4725150)
I couldn't have got it better David! Thanks so much

Best,
Udiubu

Outstanding, and thanks for following up.

grail 07-11-2012 01:11 PM

Quote:

Originally Posted by David the H. (Post 4725050)
If you hadn't noticed, grail is our resident awk super-guru. I don't know how he does it, but he always manages to distill whatever I write into something half as long, but which takes twice as long to figure out. ;)

Thanks David ... it is nice when we get positive feedback from the regulars too :)


All times are GMT -5.