LinuxQuestions.org
Visit the LQ Articles and Editorials section
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Reply
 
Search this Thread
Old 07-29-2008, 11:35 AM   #1
lmedland
LQ Newbie
 
Registered: Jun 2008
Location: England
Posts: 21

Rep: Reputation: 15
Question CSV | GAWK | Record merge problem!


Hi,

I'm looking to take a CSV file which will have three fields:-

Customer Number
Balance
Payor ID

The file will have duplicate records which need merging along with the value of the second field (balance).

In theory you could have the following file:

Code:
customer, balance, payor
001, 200.00, Microsoft
002, 100.00, Apple
002, 100.00, Apple
003, 800.00, IBM
004, 250.00, Microsoft
001, 100.00, Microsoft
Please see my previous post and the excellent answers I got. This worked great with 2 fields but I need the third field (Payor ID) to be printf out with the results so it would like this..

Code:
customer, balance, payor
001, 300.00, Microsoft
002, 200.00, Apple
003, 800.00, IBM
004, 250.00, Microsoft
In cases where there is a different payor ID for the same customer id a "XXX" would need to be listed instead of a payor...

Code:
005, 500.00, XXX
Here is my current code which only printf's the two fields...

Code:
BEGIN{ FS = ","; getline }
{
 # populate array with customer as index and sum of balances as values
 balance[$1] = balance[$1] + $2
}
END{ # sort the indices of the array and put them in array indices
     n = asorti(balance,indices)

     # print the sorted array
     for (i = 1; i <= n; i++)
         printf "%s, %5.2f\n", indices[i], balance[indices[i]]
}
I have tried all permutations to get this to work and read the man page for awk many times so I'm desperate for some help.

Many thanks!
 
Old 07-29-2008, 01:57 PM   #2
radoulov
Member
 
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212

Rep: Reputation: 35
Trying to avoid GNU Awk only features:

Code:
awk -F, 'END { 
  while (++i <= c)     
    printf "%s,%.2f,%s\n",\
    p[i], s[p[i]], t[p[i]]   
  }
NR == 1 { 
  print
  next
  }
{
  if (!_[$1]++) {
    p[++c] = $1
    t[$1] = $3
    }
  if (_[$1] > 1 && $3 != t[$1])
    t[$1] = " XXX"  
  s[$1] += $2
   }' filename
The above code tries to preserve the original order of the field1.

With GNU Awk:

Code:
awk -F, 'NR == 1 { print; next }
{ r[$1] += $2; t[$1] = (_[$1] ++ && $3 != t[$1]) ? " XXX" : $3 }
END { n = asorti(r, idx) 
while (++i <= n)
  printf "%s, %.2f,%s\n", idx[i], r[idx[i]], t[idx[i]] 
  }' filename

Last edited by radoulov; 07-29-2008 at 02:30 PM.
 
Old 07-30-2008, 06:08 AM   #3
lmedland
LQ Newbie
 
Registered: Jun 2008
Location: England
Posts: 21

Original Poster
Rep: Reputation: 15
Question

Thanks radoulov that worked great.

However, is it possible where there are different payor ID's for the same customer id instead of the XXX could the routine prinf a new line for each payor ID associated with a customer?

So in the example below:-

Code:
customer, balance, payor
001, 200.00, Microsoft
002, 100.00, Apple
002, 100.00, Apple
003, 800.00, IBM
004, 250.00, Microsoft
001, 100.00, Microsoft
001, 100.00, 001
001, 100.00, Apple
You would get as a result:-

Code:
customer, balance, payor
001, 300.00, Microsoft
001, 100.00, 001
001, 100.00, Apple
002, 200.00, Apple
003, 800.00, IBM
004, 250.00, Microsoft
I hop this makes sense, sorry to mess you around but I thought it wouldn't be possible to do this but perhaps it is?

Thank you.
 
Old 07-30-2008, 06:58 AM   #4
radoulov
Member
 
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212

Rep: Reputation: 35
Try this, I don't know if the payor order matters:

[GNU Awk]

Code:
awk -F', *' 'NR == 1 { print; next }
{ _[$1, $3] += $2 }
END { 
  n = asorti(_, __)
  while (++i <= n) {
    split(__[i], t, SUBSEP)
    printf "%s, %.2f, %s\n",
      t[1], _[__[i]], t[2]    
      }
    }' filename
 
Old 07-30-2008, 08:10 AM   #5
lmedland
LQ Newbie
 
Registered: Jun 2008
Location: England
Posts: 21

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by radoulov View Post
Try this, I don't know if the payor order matters:

[GNU Awk]

Code:
awk -F', *' 'NR == 1 { print; next }
{ _[$1, $3] += $2 }
END { 
  n = asorti(_, __)
  while (++i <= n) {
    split(__[i], t, SUBSEP)
    printf "%s, %.2f, %s\n",
      t[1], _[__[i]], t[2]    
      }
    }' filename
You are most excellent! Thank you!!
 
  


Reply

Tags
awk, csv, gawk, merge, printf


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Comparing two csv files and write different record in third CSV file irfanb146 Linux - Newbie 3 06-30-2008 09:15 PM
CSV file merge help lmedland Programming 2 06-04-2008 11:11 AM
CSV Merge CorvusE Programming 3 12-12-2007 03:37 PM
Gawk and Apt-get Problem saltas Debian 9 09-28-2004 07:06 PM
Gawk install problem safrout Linux From Scratch 5 09-08-2002 08:11 PM


All times are GMT -5. The time now is 04:17 PM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration