LinuxQuestions.org
Old 10-18-2012, 11:16 AM   #1
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,067

Replace words and phrases in text


Have: InFile2, a file of text which requires word (or phrase) substitutions.
Code:
Once upon a midnight dreary, while I pondered weak and weary,
Over many a quaint and curious volume of forgotten lore,
While I nodded, nearly napping, suddenly there came a tapping,
As of some one gently rapping, rapping at my chamber door.
''Tis some visitor,' I muttered, 'tapping at my chamber door -
Only this, and nothing more.'

Ah, distinctly I remember it was in the bleak December,
And each separate dying ember wrought its ghost upon the floor.
Eagerly I wished the morrow; - vainly I had sought to borrow
From my books surcease of sorrow - sorrow for the lost Lenore -
For the rare and radiant maiden whom the angels named Lenore -
Nameless here for evermore.
Have: InFile1, a file of word (or phrase) substitutions such as:
Code:
door,window
rapping,knocking
Lenore,Annie
bleak December,frigid January
Want:
Code:
Read the text file and make substitutions.
Wherever "door" appears, change it to "window".
Wherever "rapping" appears, change it to "knocking".
Wherever "bleak December" appears, change it to "frigid January".
Etc.
As a learning exercise I coded a solution using sed.
Code:
sed -r 's|(^.*),(.*)|s/\1\/\2/g|' $InFile1 \
|sed -f - $InFile2 > $OutFile
This works.

Now, to continue the learning exercise, I attempted to code a solution using awk, but I'm stumped. I'm struggling with variations on this theme:
Code:
awk -F "," 'NR==FNR{A[$1]=$2;next} FNR<NR  \
  <something goes here>' $InFile1 $InFile2
What "goes here"? gensub?
awk experts, please advise.

Daniel B. Martin
 
Old 10-18-2012, 12:39 PM   #2
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,439

Code:
awk -F, 'FNR==NR{c[$1]=$2;next}{for(i in c)gsub(i,c[i])}1' InFile1 InFile2
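For readers new to the idiom, the one-liner can be unpacked roughly as below. This is only a commented sketch of the same logic, not a different method. The sample files are created inline purely for illustration, and it assumes the phrases in the replacement file contain no commas.

```shell
# Sample data, created here only so the sketch is self-contained.
tmpdir=$(mktemp -d)
printf '%s\n' 'door,window' 'rapping,knocking' 'Lenore,Annie' \
    'bleak December,frigid January' > "$tmpdir/InFile1"
printf '%s\n' 'As of some one gently rapping, rapping at my chamber door.' \
    'Ah, distinctly I remember it was in the bleak December,' > "$tmpdir/InFile2"

result=$(awk -F, '
    # First file (FNR == NR holds only while reading it): store old -> new.
    FNR == NR { c[$1] = $2; next }
    # Second file: apply every stored pair to the current line ($0).
    # gsub() treats each key as a regular expression, and the
    # "for (i in c)" loop visits the keys in an unspecified order.
    { for (i in c) gsub(i, c[i]) }
    # A bare true pattern prints the (possibly modified) line.
    1
' "$tmpdir/InFile1" "$tmpdir/InFile2")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because the loop order is unspecified, results could differ between awk implementations if one replacement's output happens to contain another pair's search phrase. That does not occur with the sample pairs here.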
 
Old 10-18-2012, 12:58 PM   #3
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,067

Original Poster
Quote:
Originally Posted by grail
Code:
awk -F, 'FNR==NR{c[$1]=$2;next}{for(i in c)gsub(i,c[i])}1' InFile1 InFile2
Beautiful! Thank you! This thread is SOLVED!

Daniel B. Martin
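One caveat with the gsub() loop quoted above: each phrase from InFile1 is used as a regular expression, and an `&` in a replacement stands for the matched text. If the phrases might contain characters such as `.` or `*`, a literal-string variant may be safer. The sketch below is one possible approach and was not proposed in this thread; `replace_literal` is a hypothetical helper name, and the sample pair `a.b,x&y` is made up to show the difference.

```shell
# Made-up sample data containing a regex metacharacter and an ampersand.
tmpdir=$(mktemp -d)
printf '%s\n' 'a.b,x&y' > "$tmpdir/InFile1"
printf '%s\n' 'a.b and aXb' > "$tmpdir/InFile2"

literal=$(awk -F, '
    # Replace every literal occurrence of old in s with new.
    # index() and substr() do plain string matching, so regex
    # metacharacters in old and "&" in new have no special meaning.
    function replace_literal(s, old, new,    out, p) {
        if (old == "") return s          # avoid looping on an empty key
        out = ""
        while ((p = index(s, old)) > 0) {
            out = out substr(s, 1, p - 1) new
            s = substr(s, p + length(old))
        }
        return out s
    }
    FNR == NR { c[$1] = $2; next }
    { for (i in c) $0 = replace_literal($0, i, c[i]) }
    1
' "$tmpdir/InFile1" "$tmpdir/InFile2")

# The gsub() version from the thread, run on the same data for comparison.
regex=$(awk -F, 'FNR==NR{c[$1]=$2;next}{for(i in c)gsub(i,c[i])}1' \
    "$tmpdir/InFile1" "$tmpdir/InFile2")

printf 'literal: %s\nregex:   %s\n' "$literal" "$regex"
rm -rf "$tmpdir"
```

For plain words and phrases like those in the poem, the two versions should give the same result; the literal one only matters when the phrases contain special characters.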
 
Old 10-18-2012, 04:14 PM   #4
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 2,396

You could use the compiler strategy (as opposed to the interpreter strategy) in awk too:
Code:
awk -F, 'BEGIN{printf"{"} {printf("gsub(\"%s\", \"%s\");", $1, $2)} END{print"print}"}' replacements.txt \
| awk -f - the-raven.txt
I think the interpreter strategy would be much more difficult in sed though.
 
Old 10-18-2012, 06:29 PM   #5
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,067

Original Poster
Quote:
Originally Posted by ntubski
Code:
awk -F, 'BEGIN{printf"{"} {printf("gsub(\"%s\", \"%s\");", $1, $2)} END{print"print}"}' replacements.txt \
| awk -f - the-raven.txt
Nice!

In order to better understand your technique I inserted a tee to make the code "spit out" the intermediate file which works the magic.
Code:
echo "Method of LQ Senior Member ntubski, using awk"
awk -F,  \
  'BEGIN{printf"{"}                           \
   {printf("gsub(\"%s\", \"%s\");", $1, $2)}  \
   END{print"print}"}' $InFile1               \
|tee $Work01                                  \
|awk -f - $InFile2
The work file contains this:
Code:
{gsub("door","window");gsub("rapping","knocking");gsub("Lenore","Annie");gsub("bleak December","frigid January");print}
Daniel B. Martin
 
Old 10-18-2012, 09:36 PM   #6
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,067

Original Poster
Quote:
Originally Posted by ntubski
I think the interpreter strategy would be much more difficult in sed though.
The first post in this thread gave a sed solution which is comparable to your awk. Here I have added a tee so we may see the intermediate file.
Code:
sed -r 's|(^.*),(.*)|s/\1\/\2/g|' $InFile1 \
|tee $Work02                               \
|sed -f - $InFile2 > $OutFile
The work file contains this:
Code:
s/door/window/g
s/rapping/knocking/g
s/Lenore/Annie/g
s/bleak December/frigid January/g
Daniel B. Martin

Last edited by danielbmartin; 10-18-2012 at 09:36 PM. Reason: Cosmetic improvements
 
Old 10-20-2012, 11:40 AM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Don't forget that you can use bash's process substitution too, to clean up the command a bit:

Code:
sed -f <( sed -r 's|(.*),(.*)|s/\1\/\2/|' "$InFile1" ) "$InFile2" > "$OutFile"
Also never forget to do proper quoting, to be safe.

'g' isn't needed either, since the expression is applied only once, nor is the initial '^' anchor.

In fact, here's a variation that replaces the complex regex with two simple substitutions:
Code:
sed 's|,|/| ; s|.*|s/&/|'
 
Old 10-20-2012, 11:47 AM   #8
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 2,396

Quote:
Originally Posted by David the H.
'g' isn't needed either, since the expression is applied only once
Watch out, the g is for the generated sed program, not the compiler.
 
Old 10-20-2012, 11:59 AM   #9
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,067

Original Poster
Quote:
Originally Posted by David the H.
'g' isn't needed either...
Without the 'g', the input line ...
Code:
As of some one gently rapping, rapping at my chamber door.
... is transformed to ...
Code:
As of some one gently knocking, rapping at my chamber window.
The desired transformation is ...
Code:
As of some one gently knocking, knocking at my chamber window.
Daniel B. Martin
 
Old 10-20-2012, 02:02 PM   #10
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Ah yes. Thanks for catching that. It's hard to keep track of what's what when you're using code to generate code.
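Pulling the thread's corrections together: the 'g' has to stay in the generated script so that every occurrence on a line is replaced. Below is a sketch of the two-substitution generator from post #7 with the flag restored. It writes the generated script to a temporary file so that it runs in any POSIX shell; in bash, the `<( ... )` form should work the same way. The sample files and the `Script` variable are assumptions for illustration, and like the original it assumes the phrases contain no `/`, `&`, or regex metacharacters.

```shell
# Sample data, created here only so the sketch is self-contained.
tmpdir=$(mktemp -d)
InFile1=$tmpdir/InFile1
InFile2=$tmpdir/InFile2
OutFile=$tmpdir/OutFile
Script=$tmpdir/generated.sed    # hypothetical name for the generated script
printf '%s\n' 'door,window' 'rapping,knocking' > "$InFile1"
printf '%s\n' 'As of some one gently rapping, rapping at my chamber door.' > "$InFile2"

# "Compiler" step: turn each "old,new" line into "s/old/new/g".
# The g belongs to the generated command, not to the generator itself.
sed 's|,|/| ; s|.*|s/&/g|' "$InFile1" > "$Script"

# Run the generated script against the text.
sed -f "$Script" "$InFile2" > "$OutFile"

fixed=$(cat "$OutFile")
printf '%s\n' "$fixed"
rm -rf "$tmpdir"
```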
 
  



Tags
awk, text processing

