CSV file - Duplicate records need merging | BASH?
Hi,
I hope you can help. I'm looking to take a CSV file which will have two fields: Customer Number and Balance. The file will have duplicate records which need merging, along with summing the value of the second field (Balance). In theory you could have the following file:
Code:
Customer Number, Balance
So the above example would look like this:
Code:
Customer Number, Balance
Thanks for your help. |
This awk code should do the trick:
Code:
BEGIN{ FS = ","; getline }
{ balance[$1] = balance[$1] + $2 }
END{
  n = asorti(balance, indices)
  for (i = 1; i <= n; i++)
    printf "%s, %.2f\n", indices[i], balance[indices[i]]
} |
How about:
Code:
awk -F, '{a[$1]+=$2}END{for(val in a) printf("%s, %.2f\n", val, a[val])}' |
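For example, with a small made-up input file (the file name and values here are just for illustration), piping through sort to get a stable order:

```shell
# Hypothetical sample data: duplicate customer numbers with balances
printf '%s\n' '001, 12.50' '002, 7.25' '001, 3.00' > balances.csv

awk -F, '{a[$1]+=$2}END{for(val in a) printf("%s, %.2f\n", val, a[val])}' balances.csv | sort
# 001, 15.50
# 002, 7.25
```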
That's fantastic. Many thanks for the reply.
Would it be too much to ask just to explain a little as to what is going on in the syntax please? I assume the $ is not a variable but the field location? |
Quote:
In more detail:
Code:
BEGIN{ FS = ","; getline }
sets the field separator to a comma and reads (and discards) the header line before the main loop starts.
Code:
balance[$1] = balance[$1] + $2
for every record, adds the balance in the second field to a running total in the array balance, indexed by the customer number in the first field. You are right: $1 and $2 are not shell variables here, they are the first and second fields of the current record.
Code:
n = asorti(balance, indices)
sorts the array indices (the customer numbers) into a new array called indices and returns the number of elements. Note that this function is specific to GNU awk.
Code:
for (i = 1; i <= n; i++)
loops over the sorted customer numbers to print each one together with its total. A must-read for learning awk programming is the official guide GAWK: Effective AWK Programming. |
Thank you. And...really stupid question..........
How do I run the above script? |
Save it in a file, e.g. sumbalance.awk, then run awk with the -f option:
Code:
awk -f sumbalance.awk file |
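For example, a full session might look like this (the script below is a simplified variant of the one-liner above, without header handling; the file names are placeholders):

```shell
# Write a hypothetical sumbalance.awk equivalent to the one-liner
cat > sumbalance.awk <<'EOF'
BEGIN { FS = "," }                                  # comma-separated fields
{ balance[$1] += $2 }                               # sum per customer number
END { for (c in balance) printf "%s, %.2f\n", c, balance[c] }
EOF

awk -f sumbalance.awk file
```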
I'm getting
Quote:
|
So you do not have GNU awk? In general, it is a pain to do sorting with nawk itself, but you can always pipe out:
Code:
awk 'BEGIN{FS = ",";getline}{balance[$1]+=$2}
END{for (cust in balance) printf "%s, %.2f\n", cust, balance[cust]}' file | sort |
Maybe you have an older version of gawk (asorti is not available in old releases); try
Code:
awk --version
If asorti is missing, the same script works without in-awk sorting:
Code:
BEGIN{ FS = ","; getline }
{ balance[$1] = balance[$1] + $2 }
END{ for (cust in balance) printf "%s, %.2f\n", cust, balance[cust] } |
I had to install gawk but the script worked.
Thank you so much for your help!! :D |
Merge and sum lines problem
I have a file containing these lines:
Code:
15 11Aug20080102 test.com/info/ zone1 test.com/
81 11Aug20080200 test.com/info/ zone1 test.com/
29 11Aug20080301 test.com/info/ zone1 test.com/
49 11Aug20080400 test.com/info/ zone1 test.com/
18 11Aug20080205 test.com/info/ zone1 test.com.mx/
23 11Aug20080316 test.com/info/ zone1 test.com.mx/
How can I merge these lines and sum the values in the first column, but printing the first date occurrence? Desired output:
Code:
174 11Aug20080102 test.com/info/ zone1 test.com/
41 11Aug20080102 test.com/info/ zone1 test.com.mx/
I just tried to modify the code from this post but I haven't gotten it to work. Cheers, EM |
Not sure about the "first date occurrence", but you can assign the different fields to different arrays, using the last field as the index. When assigning the date you can first check if the array element is null and, only then, assign the date from field 2. I'm thinking about something like this:
Code:
{
  sum[$NF] += $1
  if (date[$NF] == "")
    date[$NF] = $2
  rest[$NF] = $3 " " $4
}
END {
  for (url in sum)
    print sum[url], date[url], rest[url], url
}
The output of the above code is:
Code:
$ awk -f test.awk file |
Try this:
Code:
BEGIN { OFS = FS = "," }
{ x[$1] = x[$1] + $2 }
END {
  for (i in x)
    printf "%s, %5.2f\n", i, x[i]
} |
Hi,
I'm trying to reuse this code on the same kind of input file, but under Solaris with gawk. Although the translation by professor Colucix works perfectly, I don't know how to modify it a bit. The result I need is slightly different. The input file.txt:
Code:
Customer Number, Balance
The output I need:
Code:
001, 40.00, 72.00 |