LinuxQuestions.org
Visit Jeremy's Blog.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices



Reply
 
Search this Thread
Old 08-02-2012, 06:42 PM   #1
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 112

Rep: Reputation: 15
Script to sum values across columns if they have the same row title


Hi all,

I have a large file that looks like this:
Code:
0007	10
0007	2
0007	14
0010	13
0010	3
0010	6
0010	15
0010	8
0010	24
0010	9
0010	1
0010	5
0010	14
0010	4
0010	1
0010	30
0011	6
0011	15
0011	15
0011	8
0011	15
0011	14
0011	8
0011	16
0011	27
0011	3
0011	5
0011	18
0011	6
0011	3
0011	7
0015	14
0015	20
0015	15
0015	5
0015	27
0015	13
0015	20
0015	28
0015	5
0015	6
0015	2
0015	14
0015	13
0015	7
0015	3
I would like to get the summation for all the values associated with 0007, 0010, 0011, and 0015 and produce an output file that looks like this:
Code:
7	26
10	133
11	166
15	192
Can anyone suggest a straightforward way to implement this?

Thank so much!
Kevin
 
Old 08-02-2012, 09:54 PM   #2
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,692

Rep: Reputation: 1987Reputation: 1987Reputation: 1987Reputation: 1987Reputation: 1987Reputation: 1987Reputation: 1987Reputation: 1987Reputation: 1987Reputation: 1987Reputation: 1987
What is your attempt and where are you stuck?
 
Old 08-02-2012, 10:27 PM   #3
schneidz
Senior Member
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 4,197

Rep: Reputation: 642Reputation: 642Reputation: 642Reputation: 642Reputation: 642Reputation: 642
awk would be the most straightforward way. it could match the regular expressions and sum the fields that match that regex.
 
Old 08-03-2012, 12:09 PM   #4
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 112

Original Poster
Rep: Reputation: 15
Got something that works! Thanks all!

Code:
sed -n '/^0007/p' number_of_positions_cleaned_in_each_file.txt | awk '{ sum+=$2} END {print sum}'
 
Old 08-03-2012, 12:22 PM   #5
schneidz
Senior Member
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 4,197

Rep: Reputation: 642Reputation: 642Reputation: 642Reputation: 642Reputation: 642Reputation: 642
Quote:
Originally Posted by kmkocot View Post
Got something that works! Thanks all!

Code:
sed -n '/^0007/p' number_of_positions_cleaned_in_each_file.txt | awk '{ sum+=$2} END {print sum}'
good job. this mite be more efficient
Code:
awk '/^0007/ { sum+=$2} END {print sum}' kmkocot.txt
 
Old 08-04-2012, 01:13 AM   #6
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1950Reputation: 1950Reputation: 1950Reputation: 1950Reputation: 1950Reputation: 1950Reputation: 1950Reputation: 1950Reputation: 1950Reputation: 1950Reputation: 1950
I think the OP made a good-faith effort, so here's how I would solve it. We use an array instead of a simple variable.

Code:
gawk 'BEGIN{ FS=OFS="\t" ; PROCINFO["sorted_in"]="@ind_num_asc" } { sum[$1]+=$2 } END{ for ( i in sum ){ print int( i ) , sum[i] } }' infile.txt
This relies on a feature found only in gnu awk version 4+, array sorting, so I called it specifically. On most Linuxes it should be the default awk.

Code:
BEGIN{ FS=$OFS="\t" ; PROCINFO["sorted_in"]="@ind_num_asc" }
This sets the input and output delimiters to tab, and gawk's (again v4+) internal array sorting to index-numeric-ascending. Otherwise the final output will be random in respect to the input.

http://www.gnu.org/software/gawk/man...y-Sorting.html

We could use the asorti function instead, but I find this way to be easier.
If you're using a version of awk that doesn't support sorting, then the easiest option is probably to just pipe the output through sort -n -k1 afterwards.

Code:
{ sum[$1]+=$2 }
Run through every line and store the values in an array, With indexes based on field 1. Every line that has the same $1 will have it's $2 value added to that entry. This is exactly like the "sum+=$2" the OP used, but allows for tracking multiple arbitrary values.

Code:
END{ for ( i in sum ){ print int( i ) , sum[i] } }
At the end of the file loop through the array. Print the index ( the $1 fields ), and the final value for that array entry. I used the int function on the i values to strip off the leading zeroes first. It's also possible to use printf and the %d tokens to format the output in the same way.
 
Old 08-04-2012, 10:35 AM   #7
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,692

Rep: Reputation: 1987Reputation: 1987Reputation: 1987Reputation: 1987Reputation: 1987Reputation: 1987Reputation: 1987Reputation: 1987Reputation: 1987Reputation: 1987Reputation: 1987
How about:
Code:
awk -vidx="0007" '{sum[$1]+=$2}END{print sum[idx]}' file
 
Old 08-05-2012, 08:19 PM   #8
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,004
Blog Entries: 11

Rep: Reputation: 903Reputation: 903Reputation: 903Reputation: 903Reputation: 903Reputation: 903Reputation: 903Reputation: 903
Moved: This thread is more suitable in <programming> and has been moved accordingly to help your thread/question get the exposure it deserves.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
sum up values from each columns (awk) lcvs Linux - Newbie 10 06-20-2012 05:16 AM
Sum up value of columns in 4 files using shell script reach.sree@gmail.com Programming 6 06-07-2012 08:29 PM
text manipulation in bash: sort columns according to the first row lethalfang Linux - Newbie 5 06-20-2011 05:10 PM
mysql sum from one of two columns secretlydead Programming 1 12-16-2009 02:11 AM
Shell script to parse csv-like output, row by row utahnix Linux - General 8 12-08-2007 06:03 AM


All times are GMT -5. The time now is 12:20 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration