LinuxQuestions.org
Old 04-29-2013, 05:11 PM   #1
bop-a-nator
LQ Newbie
 
Registered: Sep 2012
Location: North East USA
Distribution: at work: Red Hat Enterprise Linux Server release 5.8 (Tikanga); at home: what do you recommend?
Posts: 24

Rep: Reputation: Disabled
Matching two fields between two files with awk


Hi,

I have two files. I am trying to find the lines in File2 whose Field3 and Field4 match Field1 and Field2 of a line in File1. I may have other fields to check, but I figured that if I understood how to do two, I could go from there. I'm not really sure how to set this up. I do not have Perl, and I am looking at awk specifically; I realize grep may be an option too, but my question is about awk.

Appreciate the help,
bop-a-nator

File1:

Fruit,Kiwi
Fruit,Pear
Veggie,carrot
Dairy,Milk


File2:

Item,A,Veggie,tomato,20120903,Bin,42
Item,B,Fruit,Kiwi,20120901,Bin,2
Item,B,Fruit,Grapes,20120903,Bin,12
Item,A,Dairy,Milk,20120921,Fridge,3
Item,B,Fruit,Pear,20120903,Bin,14
Item,C,Veggie,carrot,20120903,Bin,45
Item,B,Veggie,celery,20120903,Bin,32

Output I am trying to get:

Item,B,Fruit,Kiwi,20120901
Item,A,Dairy,Milk,20120921
Item,B,Fruit,Pear,20120903
Item,C,Veggie,carrot,20120903

What I ran:

/bin/gawk 'BEGIN{OFS=FS=","}NR==FNR{file1[$3,$4]=$0;next} $1,$2 in file1 {print $1,$2,$3,$4,$5}' File1 File2 > Fileout

What came out on Fileout:

Item,A,Veggie,tomato,20120903
Item,B,Fruit,Kiwi,20120901
Item,B,Fruit,Grapes,20120903
Item,A,Dairy,Milk,20120921
Item,B,Fruit,Pear,20120903
Item,C,Veggie,carrot,20120903
Item,B,Veggie,celery,20120903
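
A note on why every line came through, as far as I can tell: awk reads `$1,$2 in file1` as a range pattern (start condition `$1`, end condition `$2 in file1`), not as a two-part key. Since `$1` is always the non-empty string "Item", the start condition is true on every record of File2, so every record is printed. The first rule also builds its keys from `$3,$4`, which are empty in File1. Below is a minimal sketch of a corrected command, assuming the sample data above; the `workdir` scratch directory and the array name `seen` are only for illustration.

```shell
# Scratch directory so the sample files do not clobber anything (illustrative).
workdir=$(mktemp -d)

cat > "$workdir/File1" <<'EOF'
Fruit,Kiwi
Fruit,Pear
Veggie,carrot
Dairy,Milk
EOF

cat > "$workdir/File2" <<'EOF'
Item,A,Veggie,tomato,20120903,Bin,42
Item,B,Fruit,Kiwi,20120901,Bin,2
Item,B,Fruit,Grapes,20120903,Bin,12
Item,A,Dairy,Milk,20120921,Fridge,3
Item,B,Fruit,Pear,20120903,Bin,14
Item,C,Veggie,carrot,20120903,Bin,45
Item,B,Veggie,celery,20120903,Bin,32
EOF

# Pass 1 (NR==FNR, i.e. File1): remember each Field1,Field2 pair as a key.
# Pass 2 (File2): print fields 1-5 when (Field3,Field4) is a stored key.
# The parentheses make this a subscript test rather than a range pattern.
awk 'BEGIN { OFS = FS = "," }
     NR == FNR { seen[$1, $2]; next }
     ($3, $4) in seen { print $1, $2, $3, $4, $5 }' \
    "$workdir/File1" "$workdir/File2" > "$workdir/Fileout"

cat "$workdir/Fileout"
```

The comma subscript is plain awk (the parts are joined with SUBSEP), so it should not depend on the gawk version.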
 
Old 04-29-2013, 08:36 PM   #2
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by bop-a-nator
... I realize grep may be an option ...
grep is a good choice for this matching problem. Try this ...
Code:
grep -F -f "$InFile1" "$InFile2" | cut -d, -f1-5 > "$OutFile"
Daniel B. Martin
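
One caveat with the grep approach: a pattern such as `Fruit,Kiwi` can match anywhere on a line, not just in fields 3 and 4. The following is only a sketch of a field-anchored variant. It assumes the lines of File1 contain no regular-expression metacharacters, and the "Dried Fruit" line plus the `workdir` and `patterns` names are made up for illustration.

```shell
workdir=$(mktemp -d)

cat > "$workdir/File1" <<'EOF'
Fruit,Kiwi
Fruit,Pear
Veggie,carrot
Dairy,Milk
EOF

# The third line is hypothetical: it contains "Fruit,Kiwi" but not in fields 3-4.
cat > "$workdir/File2" <<'EOF'
Item,B,Fruit,Kiwi,20120901,Bin,2
Item,B,Fruit,Grapes,20120903,Bin,12
Item,D,Dried Fruit,Kiwi chips,20120903,Bin,7
Item,A,Dairy,Milk,20120921,Fridge,3
EOF

# Turn each "Field1,Field2" line into a regex that skips two fields,
# then requires an exact match on fields 3 and 4.
sed 's/.*/^[^,]*,[^,]*,&,/' "$workdir/File1" > "$workdir/patterns"

grep -f "$workdir/patterns" "$workdir/File2" | cut -d, -f1-5 > "$workdir/Fileout"

cat "$workdir/Fileout"
```

With this sample, the unanchored fixed-string search also picks up the "Dried Fruit" line, while the anchored patterns keep only the Kiwi and Milk lines.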
 
Old 04-30-2013, 02:31 AM   #3
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800
Blog Entries: 4

Rep: Reputation: 286
Just try this:
Code:
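#!/bin/bash
# For each "Field1,Field2" line of File1, print fields 1-5 of the File2 lines containing it.
while IFS= read -r line; do
    grep -F -- "$line" File2 | cut -d, -f1-5    ## Either use grep plus cut
    # awk -F, -v OFS=, -v t="$line" 'index($0, t) {print $1,$2,$3,$4,$5}' File2   # Or use awk
done < File1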

Last edited by shivaa; 04-30-2013 at 08:07 AM. Reason: Little modification
 
Old 04-30-2013, 03:26 AM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Well you were close with awk:
Code:
awk 'BEGIN{OFS=FS=","}NR==FNR{a[$1][$2];next}$4 in a[$3]{print $1,$2,$3,$4,$5}' file1 file2
This should work for gawk v4+
 
Old 04-30-2013, 10:59 AM   #5
bop-a-nator
LQ Newbie
 
Registered: Sep 2012
Location: North East USA
Distribution: at work: Red Hat Enterprise Linux Server release 5.8 (Tikanga); at home: what do you recommend?
Posts: 24

Original Poster
Rep: Reputation: Disabled
Hi grail,

I tried the command above and I get a syntax error at the "[". I don't see any mismatched brackets, and I think I had the quotes all in the correct locations.

prompt> /bin/gawk 'BEGIN{OFS=FS=","}NR==FNR{a[$1][$2];next}$4 in a[$3]{print $1,$2,$3,$4,$5}' File1 File2
gawk: BEGIN{OFS=FS=","}NR==FNR{a[$1][$2];next}$4 in a[$3]{print $1,$2,$3,$4,$5}
gawk: ^ syntax error
gawk: BEGIN{OFS=FS=","}NR==FNR{a[$1][$2];next}$4 in a[$3]{print $1,$2,$3,$4,$5}
gawk: ^ syntax error

Thanks,
bop-a-nator

Last edited by bop-a-nator; 04-30-2013 at 11:02 AM. Reason: Interesting the syntax error is pointing at the [ in front of the $2 and at the [ in front of the $3 on my screen.
 
Old 04-30-2013, 11:51 AM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Which version of gawk do you have? Assuming it is an older version that does not support true multidimensional arrays (arrays of arrays, added in gawk 4.0), the following change should work:
Code:
awk 'BEGIN{OFS=FS=","}NR==FNR{a[$1$2];next}$3$4 in a{print $1,$2,$3,$4,$5}' file1 file2
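
One thing to watch with `a[$1$2]`: plain concatenation can make different pairs look alike, since "ab" followed by "c" and "a" followed by "bc" both become "abc". That does not happen with the sample data in this thread, but a comma subscript avoids it in general and also works in older awks. A small sketch with made-up data; the `keys` and `data` file names and `workdir` are hypothetical.

```shell
workdir=$(mktemp -d)

# Made-up data: the only wanted pair is (ab,c).
printf 'ab,c\n' > "$workdir/keys"
printf 'Item,X,a,bc,20120903\nItem,Y,ab,c,20120904\n' > "$workdir/data"

# Concatenated key: both data lines match, the first one wrongly.
# ($3 $4) in a is the same test as $3$4 in a, parenthesised for clarity.
awk 'BEGIN{OFS=FS=","} NR==FNR{a[$1 $2];next} ($3 $4) in a{print $1,$2,$3,$4,$5}' \
    "$workdir/keys" "$workdir/data" > "$workdir/concat.out"

# Comma subscript: awk joins the parts with SUBSEP, so only (ab,c) matches.
awk 'BEGIN{OFS=FS=","} NR==FNR{a[$1,$2];next} ($3,$4) in a{print $1,$2,$3,$4,$5}' \
    "$workdir/keys" "$workdir/data" > "$workdir/subsep.out"

cat "$workdir/concat.out"
cat "$workdir/subsep.out"
```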
 
  


All times are GMT -5.