I think the OP made a good-faith effort, so here's how I would solve it. The key is to keep the sums in an array, indexed by the first field, instead of in a single variable.
Code:
gawk 'BEGIN{ FS=OFS="\t" ; PROCINFO["sorted_in"]="@ind_num_asc" } { sum[$1]+=$2 } END{ for ( i in sum ){ print int( i ) , sum[i] } }' infile.txt
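This relies on a feature found only in GNU awk version 4+ (array sorting), so I called gawk specifically. On many Linux distributions it is the default awk, though not on all of them; Debian and Ubuntu, for example, typically install mawk as the default awk instead.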
Code:
BEGIN{ FS=OFS="\t" ; PROCINFO["sorted_in"]="@ind_num_asc" }
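This sets both the input and output field separators to a tab, and sets gawk's (again, v4+) array traversal order to index-numeric-ascending. Otherwise the order in which "for ( i in sum )" visits the entries is unspecified, so the output would come out in no particular order rather than sorted.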
http://www.gnu.org/software/gawk/man...y-Sorting.html
We could use the asorti function instead, but I find this way to be easier.
If you're using a version of awk that doesn't support sorting, the easiest option is probably to pipe the output through sort -n -k1 afterwards.
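A sketch of that portable route, assuming only a POSIX awk (mawk, busybox awk, etc.) plus sort: the summing is unchanged, and sort puts the keys in numeric order afterwards. The function name and sample data are hypothetical.

```shell
# Hypothetical portable variant: no PROCINFO, so the END loop runs in
# whatever order the awk implementation uses, and sort fixes it up.
sum_by_key_portable() {
    awk 'BEGIN{ FS=OFS="\t" }
         { sum[$1]+=$2 }
         END{ for ( i in sum ) print int( i ), sum[i] }' |
    sort -n -k1
}

printf '002\t5\n001\t3\n002\t7\n010\t1\n001\t4\n' | sum_by_key_portable
```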
Code:
{ sum[$1]+=$2 }
This runs on every line and stores the values in an array, with indexes based on field 1. Every line that has the same $1 has its $2 value added to that entry. This is exactly like the "sum+=$2" the OP used, but it can keep any number of separate totals at once.
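To make the difference from a single variable concrete, this sketch (made-up data, hypothetical function name) keeps both side by side: one running total, as with sum+=$2, and one total per $1 value in an array. It uses only POSIX awk features.

```shell
# Hypothetical comparison: a plain variable lumps every line together,
# while the array keeps a separate running total for each $1 value.
compare_sums() {
    awk -F'\t' '
        { total += $2       # single variable: one grand total
          sum[$1] += $2 }   # array: one total per key
        END { print total; print sum["a"], sum["b"] }'
}

printf 'a\t1\nb\t2\na\t3\n' | compare_sums
# prints:
# 6
# 4 2
```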
Code:
END{ for ( i in sum ){ print int( i ) , sum[i] } }
At the end of the file, loop through the array and print each index (the $1 values) along with the final total for that entry. I used the int function on the i values to strip off the leading zeroes first. It's also possible to use printf with the %d format specifier to get the same output.