LinuxQuestions.org - CSV | GAWK | Record merge problem!

- Programming (https://www.linuxquestions.org/questions/programming-9/)

- - CSV | GAWK | Record merge problem! (https://www.linuxquestions.org/questions/programming-9/csv-%7C-gawk-%7C-record-merge-problem-659045/)

CSV | GAWK | Record merge problem!

Hi,

I'm looking to take a CSV file which will have three fields:-

Customer Number
Balance
Payor ID

The file will have duplicate records which need merging along with the value of the second field (balance).

In theory you could have the following file:

Code:

customer, balance, payor

001, 200.00, Microsoft

002, 100.00, Apple

002, 100.00, Apple

003, 800.00, IBM

004, 250.00, Microsoft

001, 100.00, Microsoft

Please see my previous post and the excellent answers I got. This worked great with 2 fields but I need the third field (Payor ID) to be printf out with the results so it would like this..

Code:

customer, balance, payor

001, 300.00, Microsoft

002, 200.00, Apple

003, 800.00, IBM

004, 250.00, Microsoft

In cases where there is a different payor ID for the same customer id a "XXX" would need to be listed instead of a payor...

Code:

005, 500.00, XXX

Here is my current code which only printf's the two fields...

Code:

BEGIN{ FS = ","; getline }

{

 # populate array with customer as index and sum of balances as values

 balance[$1] = balance[$1] + $2

}

END{ # sort the indices of the array and put them in array indices

    n = asorti(balance,indices)



    # print the sorted array

    for (i = 1; i <= n; i++)

        printf "%s, %5.2f\n", indices[i], balance[indices[i]]

}

I have tried all permutations to get this to work and read the man page for awk many times so I'm desperate for some help.

Many thanks!

Trying to avoid GNU Awk only features:

Code:

awk -F, 'END { 

  while (++i <= c)    

    printf "%s,%.2f,%s\n",\

    p[i], s[p[i]], t[p[i]]  

  }

NR == 1 { 

  print

  next

  }

{

  if (!_[$1]++) {

    p[++c] = $1

    t[$1] = $3

    }

  if (_[$1] > 1 && $3 != t[$1])

    t[$1] = " XXX"  

  s[$1] += $2

  }' filename

The above code tries to preserve the original order of the field1.

With GNU Awk:

Code:

awk -F, 'NR == 1 { print; next }

{ r[$1] += $2; t[$1] = (_[$1] ++ && $3 != t[$1]) ? " XXX" : $3 }

END { n = asorti(r, idx) 

while (++i <= n)

  printf "%s, %.2f,%s\n", idx[i], r[idx[i]], t[idx[i]] 

  }' filename

Thanks radoulov that worked great.:cool:

However, is it possible where there are different payor ID's for the same customer id instead of the XXX could the routine prinf a new line for each payor ID associated with a customer?

So in the example below:-

Code:

customer, balance, payor

001, 200.00, Microsoft

002, 100.00, Apple

002, 100.00, Apple

003, 800.00, IBM

004, 250.00, Microsoft

001, 100.00, Microsoft

001, 100.00, 001

001, 100.00, Apple

You would get as a result:-

Code:

customer, balance, payor

001, 300.00, Microsoft

001, 100.00, 001

001, 100.00, Apple

002, 200.00, Apple

003, 800.00, IBM

004, 250.00, Microsoft

I hop this makes sense, sorry to mess you around but I thought it wouldn't be possible to do this but perhaps it is?

Thank you.:confused:

Try this, I don't know if the payor order matters:

[GNU Awk]

Code:

awk -F', *' 'NR == 1 { print; next }

{ _[$1, $3] += $2 }

END { 

  n = asorti(_, __)

  while (++i <= n) {

    split(__[i], t, SUBSEP)

    printf "%s, %.2f, %s\n",

      t[1], _[__[i]], t[2]    

      }

    }' filename

Quote:

Originally Posted by radoulov (Post 3230850)

Try this, I don't know if the payor order matters:

[GNU Awk]

Code:

awk -F', *' 'NR == 1 { print; next }

{ _[$1, $3] += $2 }

END { 

  n = asorti(_, __)

  while (++i <= n) {

    split(__[i], t, SUBSEP)

    printf "%s, %.2f, %s\n",

      t[1], _[__[i]], t[2]    

      }

    }' filename

You are most excellent! Thank you!!