Comparing two fields in two files using Awk.
I have a file1:
Code:
$ cat PF(1).out
Code:
$ cat Unipfam(1)
Code:
Tmp39 PF10271.3 423 ENSP00000326063 488 1.2e-201 41-478 D3DN80_HUMAN
Code:
awk 'NR==FNR{a[$3]=$1;next}a[$7]{print $0 "\t" a[$7]}' file2 file1 >outfile
What I see the code does is.. Quote:
Thanks. :) |
I am not sure I understand. When running the awk code on the examples you have shown I get only one line returned, which seems to be correct.
Maybe you could explain further what the issue is? |
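For readers following along, here is a minimal sketch of how the `NR==FNR` two-file idiom in the one-liner above behaves. The sample lines, the temporary file names, and the choice to store field 2 (the `_HUMAN`-style ID) instead of field 1 are assumptions made for illustration; they are not the poster's actual data. The sketch also swaps the `a[$7]` pattern for `($7 in id)`, which avoids creating an empty array entry on every lookup.

```shell
# Made-up sample data (hypothetical IDs and ranges, not from the thread).
tmpdir=$(mktemp -d)
printf '%s\n' '#=GS X1_HUMAN 10-50 AC X1.1' \
              '#=GS X2_HUMAN 60-90 AC X2.1' > "$tmpdir/file2"
printf '%s\n' 'domA PF00001.1 100 ENSP0001 120 1e-10 10-50' \
              'domB PF00002.1 100 ENSP0002 120 1e-10 70-95' > "$tmpdir/file1"

# NR==FNR is true only while awk reads the first file on the command line
# (file2 here), so that block builds a lookup table keyed on field 3.
# For the second file (file1), a line is printed with the stored ID
# appended only when its field 7 is an exact key in the table.
result=$(awk 'NR==FNR { id[$3] = $2; next }
              ($7 in id) { print $0 "\t" id[$7] }' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$result"   # only the domA line matches; domB has no exact key
rm -rf "$tmpdir"
```

Because the lookup is an exact string comparison, ranges such as 40-478 and 41-478 do not match under this idiom.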
Code:
$ cat tauro.awk
Code:
awk -f tauro.awk pf
Cheers, Tink |
Thanks a lot Tink :):)
That was great. It was just a simple getline, and I had been trying to improve the code for the past 8 hours. Jeez!
@grail When I have a list (a) of files to be matched with the files in list (b), say 4 files in each list: Quote:
2 is matched with B --> if there is a match, then print. But the output is blank, because there isn't any match between those two files. 2 is matched with C! 3 is matched with D! Line specificity is lost. Anyway, I got the result. Thanks again, Tink |
Glad I could help ;D
|
Tink.. 1 more thing
You see this line in file2: #=GS D3DN80_HUMAN 40-478 AC D3DN80.1 and this line in file1: Tmp39 PF10271.3 423 ENSP00000326063 488 1.2e-201 41-478. 40-478 and 41-478 are the same. However, I tried removing the field separators and matching the lines containing similar fields, and I got some weird, erroneous results. What is your take on this? |
When you say "Is the same" ... do you mean they should be treated
as if they were the same, or they are being treated the same and shouldn't? A larger amount of actual data would help :} |
I mean I want them to be treated the same.
File1: Code:
Code:
|
So it only depends on the bit after the hyphen whether you have a match or not?
|
The bit before the hyphen too. At least one bit, before or after the hyphen, should match.
|
Icky, icky, icky ...
:D Can you be more specific? Is it enough if the first digit of the left bit matches? Any bit matches the equivalent in the same position? Cheers, Tink |
Alright, as an example
File1
Code:
2-oxoacid_dh PF00198.17 231 ENSP00000445698 301 3.7e-85 69-298
Code:
#=GS B4E1Q7_HUMAN 67-298 AC B4E1Q7.1
Now there may be cases when
Code:
2-oxo*** PF****.17 231 ENSP******* 301 3.7e-85 67-295
Code:
#=GS B****_HUMAN 67-298 AC B4E1Q7.1
I need output for both the cases as:
Code:
2-oxoacid_dh PF00198.17 231 ENSP00000445698 301 3.7e-85 69-298 B4E1Q7_HUMAN |
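One way to read the requirement above (a match when either the start or the end of the range agrees) is sketched below. It is only an interpretation, not the code posted in the thread. It assumes the range is field 3 of file2 and field 7 of file1, that the ID to append is field 2 of file2, and it uses temporary files holding the first pair of example lines.

```shell
tmpdir=$(mktemp -d)
# The first pair of example lines from the post above.
printf '%s\n' '#=GS B4E1Q7_HUMAN 67-298 AC B4E1Q7.1' > "$tmpdir/file2"
printf '%s\n' '2-oxoacid_dh PF00198.17 231 ENSP00000445698 301 3.7e-85 69-298' > "$tmpdir/file1"

result=$(awk '
NR==FNR {                       # first file (file2): index each range twice
    split($3, r, "-")           # r[1] = start, r[2] = end
    by_start[r[1]] = $2         # assumption: field 2 is the ID to append
    by_end[r[2]]   = $2
    next
}
{                               # second file (file1): try start, then end
    split($7, r, "-")
    if (r[1] in by_start)     print $0, by_start[r[1]]
    else if (r[2] in by_end)  print $0, by_end[r[2]]
}' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Here 69 does not match 67, but 298 matches 298, so the file1 line is printed with B4E1Q7_HUMAN appended. If several file2 lines share a start or an end value, the last one read overwrites the earlier ones, so real data may need a stricter key.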
I'm really trying to understand :}
Does this come close to what you're after? Code:
BEGIN{ |
Quote:
Also no real need to split FS with OR either: Code:
BEGIN{ NR and FNR comparison to read into arrays. |
I get a single result with this code.
When the field separator "-" is replaced with " ", the number of fields increases, and it differs between the files. How? Here it is: Quote:
Quote:
I think I will make a field separator unique to $7 so that the hyphen thing won't be a problem. I'll be back with the final code. Thanks. |
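A small sketch of why the field counts change, using the two example lines quoted earlier in the thread. The separator `[ -]` (space or hyphen) is an assumption about what was tried; the point is that any hyphen on the line becomes a split point, including the ones in `2-oxoacid_dh` and `3.7e-85`. Splitting only field 7 with `split()` is shown as one possible way to keep the hyphen handling local to the range.

```shell
line1='2-oxoacid_dh PF00198.17 231 ENSP00000445698 301 3.7e-85 69-298'
line2='#=GS B4E1Q7_HUMAN 67-298 AC B4E1Q7.1'

# Default whitespace splitting.
nf1_default=$(printf '%s\n' "$line1" | awk '{ print NF }')
nf2_default=$(printf '%s\n' "$line2" | awk '{ print NF }')

# Space-or-hyphen splitting (assumed separator): every hyphen adds a field.
nf1_hyphen=$(printf '%s\n' "$line1" | awk -F'[ -]' '{ print NF }')
nf2_hyphen=$(printf '%s\n' "$line2" | awk -F'[ -]' '{ print NF }')

echo "file1 line: $nf1_default fields by default, $nf1_hyphen with [ -]"
echo "file2 line: $nf2_default fields by default, $nf2_hyphen with [ -]"

# Alternative: keep the default separator and split only the range field.
range=$(printf '%s\n' "$line1" | awk '{ split($7, r, "-"); print r[1], r[2] }')
echo "start and end of field 7: $range"
```

The file1 line gains three fields because it contains three hyphens, while the file2 line gains only one, which is why the counts drift apart between the files.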
Easy enough:
Code:
$10 -> $NF |
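To illustrate the `$10 -> $NF` change, here is a hedged sketch using the two file1 lines quoted in this thread. It assumes the separator in use was `[ -]` (space or hyphen), which the thread does not confirm. Under that assumption the end of the range is field 10 only when the first field itself contains a hyphen, whereas `$NF` always refers to the last field.

```shell
lines='Tmp39 PF10271.3 423 ENSP00000326063 488 1.2e-201 41-478
2-oxoacid_dh PF00198.17 231 ENSP00000445698 301 3.7e-85 69-298'

# Fixed position: the Tmp39 line has only 9 fields, so $10 is empty there.
fixed=$(printf '%s\n' "$lines" | awk -F'[ -]' '{ print $10 }')

# Last field: independent of how many hyphens occur earlier on the line.
last=$(printf '%s\n' "$lines" | awk -F'[ -]' '{ print $NF }')

printf 'with $10:\n%s\n' "$fixed"
printf 'with $NF:\n%s\n' "$last"
```

That difference would be consistent with getting only a single result when a fixed field number is used.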
Thanks a lot tink and grail.
:) |