LinuxQuestions.org
Download your favorite Linux distribution at LQ ISO.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices


Reply
  Search this Thread
Old 04-26-2012, 04:37 AM   #1
archduke83
LQ Newbie
 
Registered: Apr 2012
Posts: 6

Rep: Reputation: Disabled
Unhappy Need help with awk and two csv: field match / substitution


Hi everybody,
I'm a newbie in awk and i'm trying to understand how to achieve this:
I have two .csv files, with semi-colon separated values
file1 is the "map" file, structured like this:

CITY_NAME;NEW_REGION;OLD_REGION;NEW_ZIP_CODE;OLD_ZIP_CODE

and file2 is the data file:

NEW_REGION;CITY_NAME;STREET_NAME;NEW_ZIP_CODE


basically, i have to match field 1, 2, and 4 from file1 with, respectively, field 2,1 and 4 from file 2,
whenever a match is found, makem a substitution in file2 with the OLD_REGION and OLD_ZIP field from file1
and produce a third file with just the records modified.

I'm desperately trying to understand the awk 'NR==FNR{a[$1]=$3;next... logic, but I'mreally stuck at the moment.
Could you please give me any help, that would be much appreciated, thanks.
Cheers,
Archduke83
 
Old 04-26-2012, 06:21 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,830

Rep: Reputation: 7308Reputation: 7308Reputation: 7308Reputation: 7308Reputation: 7308Reputation: 7308Reputation: 7308Reputation: 7308Reputation: 7308Reputation: 7308Reputation: 7308
so you have an awk script? Just post it and ask what is unclear.
start to read this: http://www.gnu.org/software/gawk/man...etting-Started, 1.3 Some Simple Examples
 
Old 04-26-2012, 07:01 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191
pan64 has the good oil on where to start. As for the snippet:
Code:
NR==FNR{a[$1]=$3;next}
NR - number of records in all files passed to awk

FNR - number of records per file starting at 1 each time a new file is started

So the first part will only ever happen for the first file passed to awk. After this the NR number will keep increasing to finally be the total of all FNRs for all files

a[$1] = $3 - create array "a" with an index equal to the value in the first field ($1) and assign the value of the third field ($3). As per pan64's link you will find a section on arrays
saying how all indexes are strings, even if they look like numbers.

next - process the next line from the current file

Hope that helps.
 
Old 04-27-2012, 07:05 AM   #4
archduke83
LQ Newbie
 
Registered: Apr 2012
Posts: 6

Original Poster
Rep: Reputation: Disabled
Ok, i came up with this:

nawk -F";" 'NR==FNR{a[$1FS$2]=$1FS$2;print a[$1FS$2];next} $2FS$1 in a {printf $2FS$1 "\n"}' file1 file2

But it doesn't seem to act properly, I just set the correct pattern in memory from the 1st file, but that doesn't match with the pattern i create from the second file.

What am I missing?
Thank you for your help guys.
 
Old 04-27-2012, 09:37 AM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191
How does what you have written reflect your original query?
Code:
nawk -F";" 'NR==FNR{a[$1FS$2]=$1FS$2;print a[$1FS$2];next} $2FS$1 in a {printf $2FS$1 "\n"}' file1 file2
a[$1FS$2]=$1FS$2 - i have to match field 1, 2, and 4 from file1, OLD_REGION and OLD_ZIP

So your example makes no use of $4 and is not saving OLD_REGION and OLD_ZIP, which are fields 3 and 5

print a[$1FS$2] - Why? In the output you asked for, "makem a substitution in file2 with the OLD_REGION and OLD_ZIP field from file1". So no request to have field 1 & 2 from file 1

$2FS$1 in a - This part makes sense but of course is still missing field 4

printf $2FS$1 "\n" - Just use print as it applies a new line automatically.

You may need to read the link provided by pan64 a little further.

Also, be cautious as nawk does not support all the features of gawk, which is what the link provides details on.
 
Old 04-27-2012, 10:57 AM   #6
archduke83
LQ Newbie
 
Registered: Apr 2012
Posts: 6

Original Poster
Rep: Reputation: Disabled
Unhappy

ok, sorry if I made a mistake before, I didn't mention the right fields. Here I'm trying to make the replacements (the fields are correct). Now it's:

nawk -F";" 'NR==FNR{a[$1]=$1;b[$1]=$4;c[$1]=$2;d[$1]=$3;next}
{for (cont in a) if(match(a[cont],$2) > 0 && match(b[cont],$1) > 0)
gsub($2,c[cont]);gsub($1,d[cont])}' map_prov_tc.csv data1.csv

When, instead of putting gsub, I print the correspondences (print $2FS$1 "->" c[cont] FS d[cont]}) It retrieves the correct matches.
But I can't replace the correct values.
 
Old 04-27-2012, 11:33 AM   #7
archduke83
LQ Newbie
 
Registered: Apr 2012
Posts: 6

Original Poster
Rep: Reputation: Disabled
Lightbulb

This seems to do the job:

nawk -F";" 'NR==FNR{a[$1]=$1;b[$1]=$4;c[$1]=$2;d[$1]=$3;next}
{for (cont in a) if(match(a[cont],$2) > 0 && match(b[cont],$1) > 0)
sub($2,c[cont]) sub($1,d[cont])}1' map_prov_tc.csv data1.csv
 
Old 04-27-2012, 11:34 AM   #8
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191
When you use gsub you are changing the value of the line, but if you never print the line it cannot be displayed or redirected into another file.
 
Old 04-27-2012, 11:41 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191Reputation: 3191
Yes, as now you have printed the results ... good work

Here is an alternative:
Code:
awk 'BEGIN{OFS=FS=";"}NR==FNR{a[$1$2$4]=$3FS$5;next}split(a[$2$1$4],b){$1=b[1];$4=b[2]}1' file1 file2
 
Old 04-27-2012, 11:49 AM   #10
archduke83
LQ Newbie
 
Registered: Apr 2012
Posts: 6

Original Poster
Rep: Reputation: Disabled
Thank you both grail and pan64
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
awk - rearrange column data in csv file to match columns wolverene13 Programming 9 12-21-2011 04:55 AM
[SOLVED] using awk to parse csv and exclude field range rylincoln Linux - Software 3 09-12-2011 11:26 PM
[SOLVED] awk: how to print a field when field position is unknown? elfoozo Programming 12 08-18-2010 03:52 AM
AWK: change a particular field in a csv file help help help!!!! haydar68 Programming 20 08-03-2008 01:10 AM

LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie

All times are GMT -5. The time now is 01:16 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration