LinuxQuestions.org
10-16-2012, 05:04 PM   #1
maudgalyan
LQ Newbie
Registered: Oct 2012
Posts: 3

Using an awk array to search and replace


Hello,
I have a file in which each line contains two comma-separated entries. The first is the text to search for in another file, and the second is the replacement to use when that text is found.
Here's an example:
Code:
uc007afh.1,Lypla1
uc007afg.1,Lypla1
uc007afi.2,Tcea1
uc011wht.1,Tcea1
uc011whu.1,Tcea1
uc007afn.1,Atp6v1h
uc007afm.1,Atp6v1h
uc007afo.1,Oprk1
uc007afp.1,Oprk1
uc007afq.1,Oprk1
Here is the file to be searched:
Code:
chr1    3195984 3205713 uc007aet.1      0       -       3195984 3195984 0       2       1414,2194,      0,7535,
chr1    3204562 3661579 uc007aeu.1      0       -       3206102 3661429 0       3       2487,200,947,   0,207220,456070,
chr1    3638391 3648985 uc007aev.1      0       -       3638391 3638391 0       2       2199,58,        0,10536,
chr1    4280926 4399322 uc007aew.1      0       -       4283061 4399268 0       4       2167,172,636,72,        0,61064,61356,118324,
chr1    4333587 4350395 uc007aex.2      0       -       4334680 4342906 0       4       6585,172,636,115,       0,8403,8695,16693,
chr1    4481008 4483816 uc007aey.1      0       -       4481796 4483487 0       2       1741,636,       0,2172,
chr1    4481008 4486494 uc007aez.1      0       -       4481796 4483487 0       5       1741,367,92,807,123,    0,2172,2844,4208,5363,
chr1    4481008 4486494 uc007afa.1      0       -       4481796 4485236 0       4       1741,92,807,123,        0,2844,4208,5363,
chr1    4481008 4486494 uc007afb.1      0       -       4481796 4482672 0       3       1741,92,123,    0,2844,5363,
chr1    4481008 4486494 uc007afc.1      0       -       4481796 4483487 0       4       1741,391,92,123,        0,2172,2844,5363,
I am using the following awk script to load the first file into an array to be used to search the second file:

Code:
awk 'NR==FNR{A[$1]=$2;next} $4 in A{$4=A[$2]}1' FS=, search.csv FS="\t" OFS= "\t" input.txt > output.txt
However, the output simply removes the field rather than replacing it:
Code:
chr1    3195984 3205713         0       -       3195984 3195984 0       2       1414,2194,      0,7535,
chr1    3204562 3661579         0       -       3206102 3661429 0       3       2487,200,947,   0,207220,456070,
chr1    3638391 3648985         0       -       3638391 3638391 0       2       2199,58,        0,10536,
chr1    4280926 4399322         0       -       4283061 4399268 0       4       2167,172,636,72,        0,61064,61356,118324,
chr1    4333587 4350395         0       -       4334680 4342906 0       4       6585,172,636,115,       0,8403,8695,16693,
chr1    4481008 4483816         0       -       4481796 4483487 0       2       1741,636,       0,2172,
chr1    4481008 4486494         0       -       4481796 4483487 0       5       1741,367,92,807,123,    0,2172,2844,4208,5363,
chr1    4481008 4486494         0       -       4481796 4485236 0       4       1741,92,807,123,        0,2844,4208,5363,
chr1    4481008 4486494         0       -       4481796 4482672 0       3       1741,92,123,    0,2844,5363,
chr1    4481008 4486494         0       -       4481796 4483487 0       4       1741,391,92,123,        0,2172,2844,5363,
What am I missing here?

Thanks!
 
10-17-2012, 01:58 AM   #2
ntubski
Senior Member
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,387
Code:
awk 'NR==FNR{A[$1]=$2;next} $4 in A{$4=A[$2]}1' ...
# surely you meant
awk 'NR==FNR{A[$1]=$2;next} $4 in A{$4=A[$4]}1' ...
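A small sketch of why the original version blanked the field, assuming a one-entry array in place of the full search.csv: A[$2] uses the second column of input.txt (a coordinate such as 3195984) as the key, and reading a key that was never stored evaluates to the empty string. The variable name `missing` is only for illustration.

```shell
missing=$(awk 'BEGIN {
    A["uc007afi.2"] = "Tcea1"          # one entry, as it would be loaded from search.csv
    # "in" tests for a key without creating it
    print (("3195984" in A) ? "present" : "absent")
    # reading a key that was never stored yields the empty string
    print "[" A["3195984"] "]"
    # the correct key returns the stored replacement
    print "[" A["uc007afi.2"] "]"
}')
printf '%s\n' "$missing"
```

The empty brackets on the second output line correspond to the empty fourth column in the output shown in post #1.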
 
10-17-2012, 08:13 AM   #3
maudgalyan
LQ Newbie
Registered: Oct 2012
Posts: 3
Original Poster
Quote:
Originally Posted by ntubski
Code:
awk 'NR==FNR{A[$1]=$2;next} $4 in A{$4=A[$2]}1' ...
# surely you meant
awk 'NR==FNR{A[$1]=$2;next} $4 in A{$4=A[$4]}1' ...
Thank you.
That worked, but it's left me a bit confused. I thought I should use A[$2] to use the second column of the array as the replacement value. What exactly is A[$4] pointing to?
 
10-17-2012, 09:18 AM   #4
ntubski
Senior Member
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,387
Quote:
Originally Posted by maudgalyan
I thought I should use A[$2] to use the second column of the array as the replacement value. What exactly is A[$4] pointing to?
Arrays don't have columns; they have keys that map to values.
Code:
# I'm using "=>" to mean "evaluates to"

# after loading search.csv into A:
A["uc007afh.1"] => "Lypla1"
A["uc007afg.1"] => "Lypla1"
A["uc007afi.2"] => "Tcea1"
...

# when you are going over input.txt:
# $4 is the 4th column of input.txt, suppose $4 = "uc007afi.2"
A[$4] => A["uc007afi.2"] => "Tcea1"
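The lookup can be tried end to end with a minimal sketch like the one below. It assumes two small scratch files created with mktemp that stand in for search.csv and input.txt (the contents are shortened, made-up rows), and it writes OFS="\t" without the stray space.

```shell
# Hypothetical scratch files standing in for search.csv and input.txt.
tmp=$(mktemp -d)
printf '%s\n' 'uc007afh.1,Lypla1' 'uc007afi.2,Tcea1' > "$tmp/search.csv"
printf 'chr1\t100\t200\tuc007afi.2\t0\t-\n' >  "$tmp/input.txt"
printf 'chr1\t300\t400\tuc007aet.1\t0\t-\n' >> "$tmp/input.txt"

# First file: store key => value. Second file: if $4 is a key, replace it with its value.
result=$(awk 'NR==FNR { A[$1] = $2; next }
              $4 in A { $4 = A[$4] }
              1' FS=, "$tmp/search.csv" FS='\t' OFS='\t' "$tmp/input.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The first row should come out with Tcea1 in the fourth column, while the second row, whose ID is not a key in the array, passes through unchanged.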
 
10-17-2012, 09:34 AM   #5
sundialsvcs
LQ Guru
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 9,078
Blog Entries: 4
"Arrays" in awk are what other languages call hashes. They are content-addressable.
 
10-17-2012, 11:26 AM   #6
grail
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,627
Hmmmm ... it must just be me, as everyone else seems to be OK, but I am curious how you got any output at all from your version, which sets FS more than once on the command line?
Here's what I get (using the correction of $4 in array lookup):
Code:
$ awk 'NR==FNR{A[$1]=$2;next} $4 in A{$4=A[$4]}1' FS=, search.csv FS="\t" OFS= "\t" input.txt
awk: fatal: cannot open file `\t' for reading (No such file or directory)
$ awk --version
GNU Awk 4.0.1
The above is what I expected, but I am curious why it affects no one else?
 
10-17-2012, 11:37 AM   #7
ntubski
Senior Member
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,387
Quote:
Originally Posted by grail
The above is what I expected, but I am curious why it affects no one else?
I admit, I didn't actually test anything. But the multiple settings of FS should be fine:
Quote:
Originally Posted by David the H.
another option is the little-known feature in awk that lets you set up variables as arguments after the expression, and before the filename(s) that they apply to.
There is an extra space between OFS and its value, though.
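To illustrate the effect of that space, here is a sketch using a hypothetical helper, count_args, that only counts the words the shell passes along (no files are opened). With the space, `OFS=` and `"\t"` arrive as two separate arguments, so awk sets OFS to the empty string and then treats `\t` as a file name, which matches the "cannot open file" error in post #6.

```shell
# count_args is a made-up helper: it prints how many arguments it received.
count_args() { echo "$#"; }

with_space=$(count_args FS=, search.csv FS="\t" OFS= "\t" input.txt)
without_space=$(count_args FS=, search.csv FS="\t" OFS="\t" input.txt)

echo "with space: $with_space arguments"
echo "without space: $without_space arguments"
```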
 
10-17-2012, 01:06 PM   #8
grail
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,627
Well, I stand corrected and have learned something new. I can't seem to find supporting literature in the manual, but testing has revealed that if a variable is set before a filename, the new value is used for the subsequent file(s).
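That behaviour can be checked with a short sketch, assuming two throwaway files (first.csv and second.tsv are made-up names) that use different separators. Each FS assignment is processed when awk reaches it in the argument list, so it applies only to the files that follow it.

```shell
# Hypothetical scratch files: one comma-separated, one tab-separated.
tmp=$(mktemp -d)
printf 'a,b\n'  > "$tmp/first.csv"
printf 'c\td\n' > "$tmp/second.tsv"

# FS=, applies to first.csv; FS='\t' takes over before second.tsv is read.
fields=$(awk '{ print NF, $2 }' FS=, "$tmp/first.csv" FS='\t' "$tmp/second.tsv")

printf '%s\n' "$fields"
rm -rf "$tmp"
```

Both lines should report two fields, which would not happen if a single FS value were applied to both files.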
 
10-17-2012, 01:19 PM   #9
ntubski
Senior Member
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,387
Quote:
Originally Posted by grail
I can't seem to find supporting literature in the manual
GNU Awk Manual: 6.1.3.2 Assigning Variables on the Command Line
 
10-17-2012, 01:27 PM   #10
maudgalyan
LQ Newbie
Registered: Oct 2012
Posts: 3
Original Poster
Quote:
Originally Posted by ntubski
Arrays don't have columns; they have keys that map to values.
Code:
# I'm using "=>" to mean "evaluates to"

# after loading search.csv into A:
A["uc007afh.1"] => "Lypla1"
A["uc007afg.1"] => "Lypla1"
A["uc007afi.2"] => "Tcea1"
...

# when you are going over input.txt:
# $4 is the 4th column of input.txt, suppose $4 = "uc007afi.2"
A[$4] => A["uc007afi.2"] => "Tcea1"
Ah... now I understand. I was confused with the way the original assignment of the array was coded, but now it's clear. Thanks!
 
10-17-2012, 01:39 PM   #11
grail
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,627
Thanks, ntubski ... I was looking in the FS sections.
 
  


All times are GMT -5.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration