help on awk print matched field

phpshell · 03-20-2013, 02:18 AM

I really need help.
I have two files:

file1
frank 101 4544444 glass
fahad 102 4547977 car

file2
101 transfer 888
105 transfer 999

current output
101 transfer 888

I am using
awk 'NR==FNR{a[$2];next} ($1 in a)' file1 file2

Can anybody help me print the following instead?
frank 101 transfer 888

pan64 · 03-20-2013, 02:51 AM

I'd like you to take the initiative: follow the pattern in the script and work it out yourself.

Do you know how that awk script works? If so, you can easily find the way...

awk ' NR==FNR { a[$2]=$4; next } ($1 in a) { print a[$1]$0 } ' file1 file2

phpshell · 03-20-2013, 03:01 AM

Dear pan64,
Many thanks for your quick response.
I am only just starting to understand the pattern because I am new to this.

pan64 · 03-20-2013, 03:48 AM

If you really want to say thanks, just click on YES (bottom right corner).

danielbmartin · 03-20-2013, 07:34 AM

Quote:

Originally Posted by pan64

Code:

awk ' NR==FNR { a[$2]=$4; next } ($1 in a) { print a[$1]$0 } ' file1 file2

Using this code I obtained this unexpected result:

Code:

glass101 transfer 888

Please advise.

Daniel B. Martin

pan64 · 03-20-2013, 07:54 AM

Yes, and it is your job to fix it.

danielbmartin · 03-20-2013, 08:09 AM

The problem statement calls for matching two files on a key value.
The Linux command join is suitable for this task.

InFile1 ...

Code:

frank 101 4544444 glass
fahad 102 4547977 car
herman 103 454212 clock
charles 107 454822 television
alfred 115 454629 radio
david 117 454133 table
george 122 454009 desk

InFile2 ...

Code:

101 transfer 888
105 transfer 999
106 sold 123
111 stolen 345
115 destroyed 234
122 missing 666

This code ...

Code:

join -1 1 -2 2 "$InFile2" "$InFile1" | cut -d" " -f1-3 > "$OutFile"

... produced this OutFile ...

Code:

101 transfer 888
115 destroyed 234
122 missing 666

The sample input files were sorted on the key value.
This proposed solution assumes this will always be the case.
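
The output above carries only the fields from the second file, while the original question asked for the name from the first file as well. A possible variant is sketched below, assuming both files are sorted on the key and that join's -o option is used to choose the output columns. The scratch directory and the variable name joined are illustrative only, and the sample is a subset of the data above.

```shell
# Scratch copies of a few sample lines, both sorted on the key field.
tmp=$(mktemp -d)
printf '%s\n' 'frank 101 4544444 glass' 'fahad 102 4547977 car' 'alfred 115 454629 radio' > "$tmp/file1"
printf '%s\n' '101 transfer 888' '105 transfer 999' '115 destroyed 234' > "$tmp/file2"

# Join field 2 of file1 with field 1 of file2.
# -o selects the output columns: the name (1.1), the join key (0),
# then fields 2 and 3 of file2.
joined=$(join -1 2 -2 1 -o 1.1,0,2.2,2.3 "$tmp/file1" "$tmp/file2")
echo "$joined"
rm -rf "$tmp"
```

With -o there is no need for the cut step, because only the listed columns are printed.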

Daniel B. Martin

grail · 03-20-2013, 09:19 AM

The awk solution, which obviously needs a little tweak (I believe pan64 is hinting at the direction without giving the full solution, which I like, as it is clear the OP has not done enough investigation yet), does not require the data to be sorted, just that the value of the array index is unique.
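
For readers who want to check their own attempt, one possible form of that tweak is sketched here. It assumes the wanted prefix is the name in field 1 of file1 rather than field 4, and that a space should separate it from the matched line. The scratch directory and the variable name result are illustrative only. file1 is deliberately written out of key order to show that sorting is not needed.

```shell
# Sample data from the thread, written to a scratch directory.
# file1 is listed out of key order on purpose.
tmp=$(mktemp -d)
printf '%s\n' 'fahad 102 4547977 car' 'frank 101 4544444 glass' > "$tmp/file1"
printf '%s\n' '101 transfer 888' '105 transfer 999' > "$tmp/file2"

# First pass (file1): remember the name ($1) under the key ($2).
# Second pass (file2): for keys seen in file1, print the name, then the line.
# The comma in print inserts the output field separator (a space by default).
result=$(awk 'NR==FNR { a[$2] = $1; next } ($1 in a) { print a[$1], $0 }' "$tmp/file1" "$tmp/file2")
echo "$result"
rm -rf "$tmp"
```

The two changes from the quoted script are storing $1 instead of $4 and printing with a comma, which is what removes the "glass101" run-together reported earlier in the thread.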

danielbmartin · 03-20-2013, 10:02 AM

Quote:

Originally Posted by grail

The awk solution ... does not require the data to be sorted just that the value of the array index is unique.

The short sample input files provided by OP have the key fields in sorted order. He did not explicitly state that his files are already sorted, and we cannot know the details of his application. If those files are already sorted there may be a performance advantage in using join.

I don't know the internals of awk, but technical intuition suggests that the ($1 in a) part of

Code:

awk ' NR==FNR { a[$2]=$4; next } ($1 in a) { print a[$1]$0 } ' file1 file2

results in a serial search. If the input files are large, a serial search can be painfully slow.

Daniel B. Martin

pan64 · 03-20-2013, 10:14 AM

Here you can find an interesting (but old) discussion: http://awk.info/?news/mawkHashing
and here is some other info: http://www.cse.yorku.ca/~oz/hash.html

If file1 is sorted, there is a much better solution, of course.

grail · 03-20-2013, 10:46 AM

Hmmm ... I'm not sure about the serial search idea (is testing whether N is in an array indexed by numbers really a serial search?), but a sorted system will of course return faster results.