[SOLVED] Need help with awk and two csv: field match / substitution
Linux - NewbieThis Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
Need help with awk and two csv: field match / substitution
Hi everybody,
I'm a newbie in awk and i'm trying to understand how to achieve this:
I have two .csv files, with semi-colon separated values
file1 is the "map" file, structured like this:
basically, i have to match field 1, 2, and 4 from file1 with, respectively, field 2,1 and 4 from file 2,
whenever a match is found, makem a substitution in file2 with the OLD_REGION and OLD_ZIP field from file1
and produce a third file with just the records modified.
I'm desperately trying to understand the awk 'NR==FNR{a[$1]=$3;next... logic, but I'mreally stuck at the moment.
Could you please give me any help, that would be much appreciated, thanks.
Cheers,
Archduke83
pan64 has the good oil on where to start. As for the snippet:
Code:
NR==FNR{a[$1]=$3;next}
NR - number of records in all files passed to awk
FNR - number of records per file starting at 1 each time a new file is started
So the first part will only ever happen for the first file passed to awk. After this the NR number will keep increasing to finally be the total of all FNRs for all files
a[$1] = $3 - create array "a" with an index equal to the value in the first field ($1) and assign the value of the third field ($3). As per pan64's link you will find a section on arrays
saying how all indexes are strings, even if they look like numbers.
next - process the next line from the current file
nawk -F";" 'NR==FNR{a[$1FS$2]=$1FS$2;print a[$1FS$2];next} $2FS$1 in a {printf $2FS$1 "\n"}' file1 file2
But it doesn't seem to act properly, I just set the correct pattern in memory from the 1st file, but that doesn't match with the pattern i create from the second file.
How does what you have written reflect your original query?
Code:
nawk -F";" 'NR==FNR{a[$1FS$2]=$1FS$2;print a[$1FS$2];next} $2FS$1 in a {printf $2FS$1 "\n"}' file1 file2
a[$1FS$2]=$1FS$2 - i have to match field 1, 2, and 4 from file1, OLD_REGION and OLD_ZIP
So your example makes no use of $4 and is not saving OLD_REGION and OLD_ZIP, which are fields 3 and 5
print a[$1FS$2] - Why? In the output you asked for, "makem a substitution in file2 with the OLD_REGION and OLD_ZIP field from file1". So no request to have field 1 & 2 from file 1
$2FS$1 in a - This part makes sense but of course is still missing field 4
printf $2FS$1 "\n" - Just use print as it applies a new line automatically.
You may need to read the link provided by pan64 a little further.
Also, be cautious as nawk does not support all the features of gawk, which is what the link provides details on.
ok, sorry if I made a mistake before, I didn't mention the right fields. Here I'm trying to make the replacements (the fields are correct). Now it's:
nawk -F";" 'NR==FNR{a[$1]=$1;b[$1]=$4;c[$1]=$2;d[$1]=$3;next}
{for (cont in a) if(match(a[cont],$2) > 0 && match(b[cont],$1) > 0)
gsub($2,c[cont]);gsub($1,d[cont])}' map_prov_tc.csv data1.csv
When, instead of putting gsub, I print the correspondences (print $2FS$1 "->" c[cont] FS d[cont]}) It retrieves the correct matches.
But I can't replace the correct values.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.