LinuxQuestions.org

silkysue 01-27-2011 07:06 AM

awk command to merge two files
 
Hi all

Hoping for some help with a script I need to implement. New-ish to shell scripting but no-one else here has any clue about shell scripting, let alone AWK :) and I am struggling.

I have 2 files; the 1st column in both is the RecID, so to speak. The MergeFile should contain all records in the PrimFile. In addition, any records which aren't in the PrimFile but are in the SecFile should get added to the MergeFile.

PrimFile:
DE3001 16/06/09 P
DE4001 16/06/09 P
DE4101 16/06/09 P
DU3101 16/06/09 0

SecFile:
DE4101 13/06/04 0
DU3101 27/01/11 P
DS2111 19/01/07 P

MergeFile:
DE3001 16/06/09 P
DE4001 16/06/09 P
DE4101 16/06/09 P
DU3101 16/06/09 0
DS2111 19/01/07 P

Appreciate any help..

Suzanne

onebuck 01-27-2011 07:13 AM

Hi,

Welcome to LQ!

What have you done to find a solution to the problem? Other than to post here.

We will aid you when you help yourself to a solution. Provide us with what you have attempted and then maybe someone will be able to assist.

Just a few links to help you gain some understanding. Sure, some may seem beyond a newbie, but you must start somewhere:

Linux Documentation Project
Rute Tutorial & Exposition
Linux Command Guide
Ultimate Linux Newbie Guide
LinuxSelfHelp
Bash Beginners Guide
Bash Reference Manual
Advanced Bash-Scripting Guide
Linux Home Networking

The above links and others can be found at 'Slackware-Links'. More than just Slackware® links!

schneidz 01-27-2011 09:18 AM

something like:
Code:

while read -r id rest
do
 if ! grep -q "^$id[[:space:]]" prim-file
 then
  echo "$id $rest" >> prim-file
 fi
done < sec-file

would probably work.

druuna 01-27-2011 09:44 AM

Hi,

Code:

#!/bin/bash

awk 'BEGIN { while ( ( getline < "PrimFile" ) > 0 )
uniquarray[$1] = $2" "$3
}
{
if ( ! uniquarray[$1] ) {
  uniquarray[$1] = $2" "$3 }
}
END {
for (i in uniquarray) {print i,uniquarray[i]}
}' SecFile | sort > MergeFile

exit 0

As stated by onebuck, you should post what you have tried or what parts are unclear. On the other hand I do believe that an awk solution for this problem isn't something that can be easily done by someone without any scripting knowledge......

I won't include a description of how the above script works, that's for you (silkysue) to find out (and hopefully learn something in the process).

Anyway, hope this helps.

grail 01-27-2011 09:53 AM

Maybe something like:
Code:

awk 'FNR==NR{arr[$1]++;print;next}!($1 in arr)' PrimFile SecFile > MergeFile
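For anyone reading along, here is one way the one-liner above might be unpacked. It is only a sketch: the scratch directory and the renamed array (seen) are assumptions made for the demo, and the sample data is copied from the first post.

```shell
#!/bin/sh
# Demo setup (assumption): recreate the sample files in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > PrimFile <<'EOF'
DE3001 16/06/09 P
DE4001 16/06/09 P
DE4101 16/06/09 P
DU3101 16/06/09 0
EOF

cat > SecFile <<'EOF'
DE4101 13/06/04 0
DU3101 27/01/11 P
DS2111 19/01/07 P
EOF

awk '
    # FNR == NR holds only while awk is reading the first file (PrimFile).
    FNR == NR {
        seen[$1] = 1   # remember the record ID
        print          # every PrimFile line goes to the output unchanged
        next           # do not fall through to the rule below
    }
    # Second file (SecFile): print only lines whose ID was not in PrimFile.
    !($1 in seen)
' PrimFile SecFile > MergeFile

cat MergeFile
```

Because nothing is sorted, the PrimFile records come out in their original order and the extra SecFile records follow them, which is the order shown in the MergeFile example in the first post.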

druuna 01-27-2011 10:02 AM

@grail: Nice and short!!

silkysue 01-27-2011 10:08 AM

Hi guys

Thanks for all the responses, lots of googling and even a little bit of learning has got me there!

Quote:

awk '{a[$1]=$0}END{for(i in a)print a[i]}' oldfile primaryfile | sort > mergefile

So Druuna your solution also worked and am looking through it trying to see how it works:
1st field in PrimFile is the key, stick into an array
Loop through the array and for any key in SecFile but not PrimFile print to file
Sort to Mergefile
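
A side note on the one-liner quoted at the top of this post: it seems to work because a later line with the same key overwrites the earlier one, so the file listed last wins. The sketch below tries that on the sample data, assuming oldfile holds the SecFile records and primaryfile holds the PrimFile records (that mapping, the scratch directory, and the array name rec are assumptions for the demo).

```shell
#!/bin/sh
# Demo setup (assumption): sample data from the first post, in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > oldfile <<'EOF'
DE4101 13/06/04 0
DU3101 27/01/11 P
DS2111 19/01/07 P
EOF

cat > primaryfile <<'EOF'
DE3001 16/06/09 P
DE4001 16/06/09 P
DE4101 16/06/09 P
DU3101 16/06/09 0
EOF

awk '
    # Store each whole line under its record ID. A later line with the
    # same ID replaces the earlier one, so primaryfile (read last) wins.
    { rec[$1] = $0 }
    # for (id in rec) visits keys in no particular order, hence the sort.
    END { for (id in rec) print rec[id] }
' oldfile primaryfile | sort > mergefile

cat mergefile
```

Swapping the two file names on the command line would make the oldfile records win instead, so the argument order matters here.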

Thanks all!!

SOLVED

druuna 01-27-2011 10:14 AM

Hi,

You're welcome :)

One small comment:

Quote:

Originally Posted by silkysue (Post 4239585)
So Druuna your solution also worked and am looking through it trying to see how it works:
1st field in PrimFile is the key, stick into an array
Loop through the array and for any key in SecFile but not PrimFile print to file
Sort to Mergefile

You have the second step (looping through the array) backwards....

SecFile is "looped" (read line by line by awk) and its first field is checked against the array. If it isn't in there yet, it gets added.
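
The flow described above could be sketched roughly like this. It follows the same getline-in-BEGIN approach as the earlier script, but the membership test is written with awk's `in` operator and the array is renamed to merged; both changes, and the scratch directory, are assumptions for illustration rather than the original script.

```shell
#!/bin/sh
# Demo setup (assumption): sample data from the first post, in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > PrimFile <<'EOF'
DE3001 16/06/09 P
DE4001 16/06/09 P
DE4101 16/06/09 P
DU3101 16/06/09 0
EOF

cat > SecFile <<'EOF'
DE4101 13/06/04 0
DU3101 27/01/11 P
DS2111 19/01/07 P
EOF

awk '
    BEGIN {
        # Step 1: load PrimFile into the array before any input is read.
        # Key = record ID, value = the remaining two fields.
        while ((getline < "PrimFile") > 0)
            merged[$1] = $2 " " $3
        close("PrimFile")
    }
    # Step 2: awk reads SecFile line by line; an ID that is not yet
    # in the array is added, an ID that is already there is left alone.
    !($1 in merged) { merged[$1] = $2 " " $3 }
    # Step 3: dump the array; key order is unspecified, so sort afterwards.
    END { for (id in merged) print id, merged[id] }
' SecFile | sort > MergeFile

cat MergeFile
```

Note that the final sort puts DS2111 ahead of DU3101, so the record order differs slightly from the MergeFile example in the first post even though the records are the same.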


All times are GMT -5.