I've found solutions to similar situations online, but I'm going crazy trying to get them to work correctly for my particular need. I've been pursuing an awk solution as it seems to be the simplest tool for my needs, but I'm open to other solutions as well.
I have two CSV files. One we'll call the master file. It looks like this:
Note: The only unique key is the combination of field 1 and 2. In other words, keywords are not unique, except within a category. There are thousands of records in this file.
I have several files containing subsets of that data, which have been edited by my client. The returned file might look a little like this:
I need to parse each line of the edited file, find the matching category/keyword combo in the master file, and overwrite it (true, I don't technically need to overwrite if the complete record is the same, but a hammer will work fine for this job). If it doesn't exist (because the client changed the keyword), I need to append the line to the master file.
I can handle the sorting of everything once the merge takes place (and the deletion of obsolete words from the master file), so we don't need to worry about that.
Trying to find a solution has been fun (albeit seriously maddening), but it's taking me away from the primary work of adding data to the master file.
Can anyone throw me a lifeline or point me to where a lifeline already exists? Thanks in advance!
I wonder if join could be used for this? AFAIK it is limited to a single key field, but I would love to know if there is a way to specify more than one.
Assuming a simpler join or sort based solution cannot be found, it is not a difficult task for the likes of Awk or Perl. Here's my first go at a Perl solution:
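For comparison, the same overwrite-or-append idea can be sketched in awk, the tool the question started with. This is only a rough sketch and not the Perl listing from this post: `merge_csv` is a made-up name, and it assumes plain comma-separated records with no quoted or embedded commas, with fields 1 and 2 (category, keyword) forming the key.

```shell
# merge_csv MASTER EDITED: print the merged records to stdout.
# Assumptions (not from the thread): simple CSV without quoted commas,
# and fields 1 and 2 together form the unique key.
merge_csv() {
    awk -F, '
        # First file on the command line (the edited subset):
        # remember each record under its category,keyword key.
        FILENAME == ARGV[1] { edited[$1 FS $2] = $0; next }

        # Second file (the master): print the edited version if one
        # exists, otherwise keep the master record unchanged.
        {
            key = $1 FS $2
            if (key in edited) {
                print edited[key]
                delete edited[key]
            } else {
                print
            }
        }

        # Edited records that matched nothing in the master are appended.
        # Their relative order is unspecified, so sort afterwards if needed.
        END { for (key in edited) print edited[key] }
    ' "$2" "$1"
}
```

Something like `merge_csv master.csv edited.csv > merged.csv` would write the result to a new file, so the master is not overwritten until the output has been checked.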
I'm subjecting this to some pretty rigorous testing, but at first glance it appears to be exactly what I need.
Were I so equipped, I would offer to bear your children. As it is, I'll have to settle for offering my deepest thanks.
I suppose you should update from one returned file
at a time, otherwise you have to decide how to handle
duplicate keys if they exist.
If every returned file contains a different subset
(duplicate keys are not possible):
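A minimal awk sketch of that multi-file case, under the same assumptions as the question (simple comma-separated records, key = fields 1 and 2). The name `merge_many` is hypothetical, and the master file must have a different path from every returned file. If a key did show up in more than one returned file, this sketch would silently keep the record from the file listed last.

```shell
# merge_many MASTER EDITED1 [EDITED2 ...]: print the merged records to stdout.
# Assumptions (not from the thread): simple CSV without quoted commas,
# fields 1 and 2 form the key, and the returned files do not share keys.
merge_many() {
    master_file=$1
    shift
    awk -F, '
        # Every file except the last argument is a returned (edited) file:
        # store each record under its category,keyword key.
        FILENAME != ARGV[ARGC - 1] { edited[$1 FS $2] = $0; next }

        # The last argument is the master: replace matching records.
        {
            key = $1 FS $2
            if (key in edited) {
                print edited[key]
                delete edited[key]
            } else {
                print
            }
        }

        # Append edited records that had no match in the master
        # (output order of these is unspecified).
        END { for (key in edited) print edited[key] }
    ' "$@" "$master_file"
}
```

It could be called as `merge_many master.csv returned1.csv returned2.csv > merged.csv`.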