Old 10-18-2012, 11:16 AM   #1
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Replace words and phrases in text


Have: InFile2, a file of text which requires word (or phrase) substitutions.
Code:
Once upon a midnight dreary, while I pondered weak and weary,
Over many a quaint and curious volume of forgotten lore,
While I nodded, nearly napping, suddenly there came a tapping,
As of some one gently rapping, rapping at my chamber door.
''Tis some visitor,' I muttered, 'tapping at my chamber door -
Only this, and nothing more.'

Ah, distinctly I remember it was in the bleak December,
And each separate dying ember wrought its ghost upon the floor.
Eagerly I wished the morrow; - vainly I had sought to borrow
From my books surcease of sorrow - sorrow for the lost Lenore -
For the rare and radiant maiden whom the angels named Lenore -
Nameless here for evermore.
Have: InFile1, a file of word (or phrase) substitutions such as:
Code:
door,window
rapping,knocking
Lenore,Annie
bleak December,frigid January
Want:
Code:
Read the text file and make substitutions.
Wherever "door" appears, change it to "window".
Wherever "rapping" appears, change it to "knocking".
Wherever "bleak December" appears, change it to "frigid January".
Etc.
As a learning exercise I coded a solution using sed.
Code:
sed -r 's|(^.*),(.*)|s/\1\/\2/g|' $InFile1 \
|sed -f - $InFile2 > $OutFile
This works.

Now, to continue the learning exercise, I attempted to code a solution using awk, but I'm stumped. I'm struggling with variations on this theme:
Code:
awk -F "," 'NR==FNR{A[$1]=$2;next} FNR<NR  \
  <something goes here>' $InFile1 $InFile2
What "goes here"? gensub?
awk experts, please advise.

Daniel B. Martin
 
Old 10-18-2012, 12:39 PM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193
Code:
awk -F, 'FNR==NR{c[$1]=$2;next}{for(i in c)gsub(i,c[i])}1' InFile1 InFile2
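For anyone new to the idiom, the one-liner above can be unpacked roughly as follows. This is only an expanded sketch of the same logic: the array name `repl` and the sample files created in a scratch directory are assumptions for illustration. Two things to keep in mind: gsub treats each key as a regular expression, and the order in which `for (k in repl)` visits the keys is unspecified, so overlapping entries might not always be applied in the order they appear in the list.

```shell
# Build small sample inputs in a scratch directory (hypothetical file names).
tmp=$(mktemp -d)
printf '%s\n' 'door,window' 'rapping,knocking' > "$tmp/InFile1"
printf '%s\n' 'As of some one gently rapping, rapping at my chamber door.' > "$tmp/InFile2"

result=$(awk -F, '
  # First file (FNR == NR only while reading it): remember each "old,new" pair.
  FNR == NR { repl[$1] = $2; next }
  # Second file: apply every remembered pair to the whole line ($0), then print it.
  {
    for (k in repl)
      gsub(k, repl[k])
    print
  }
' "$tmp/InFile1" "$tmp/InFile2")

echo "$result"
rm -rf "$tmp"
```

With the two sample pairs, the line comes out with both occurrences of "rapping" changed to "knocking" and "door" changed to "window".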
 
Old 10-18-2012, 12:58 PM   #3
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
Quote:
Originally Posted by grail
Code:
awk -F, 'FNR==NR{c[$1]=$2;next}{for(i in c)gsub(i,c[i])}1' InFile1 InFile2
Beautiful! Thank you! This thread is SOLVED!

Daniel B. Martin
 
Old 10-18-2012, 04:14 PM   #4
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,784

Rep: Reputation: 2083
You could use the compiler strategy (as opposed to the interpreter strategy) in awk too:
Code:
awk -F, 'BEGIN{printf"{"} {printf("gsub(\"%s\", \"%s\");", $1, $2)} END{print"print}"}' replacements.txt \
| awk -f - the-raven.txt
I think the interpreter strategy would be much more difficult in sed though.
 
Old 10-18-2012, 06:29 PM   #5
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
Quote:
Originally Posted by ntubski
Code:
awk -F, 'BEGIN{printf"{"} {printf("gsub(\"%s\", \"%s\");", $1, $2)} END{print"print}"}' replacements.txt \
| awk -f - the-raven.txt
Nice!

In order to better understand your technique I inserted a tee to make the code "spit out" the intermediate file which works the magic.
Code:
echo "Method of LQ Senior Member ntubski, using awk"
awk -F,  \
  'BEGIN{printf"{"}                           \
   {printf("gsub(\"%s\", \"%s\");", $1, $2)}  \
   END{print"print}"}' $InFile1               \
|tee $Work01                                  \
|awk -f - $InFile2
The work file contains this:
Code:
{gsub("door", "window");gsub("rapping", "knocking");gsub("Lenore", "Annie");gsub("bleak December", "frigid January");print}
Daniel B. Martin
 
Old 10-18-2012, 09:36 PM   #6
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
Quote:
Originally Posted by ntubski
I think the interpreter strategy would be much more difficult in sed though.
The first post in this thread gave a sed solution which is comparable to your awk. Here I have added a tee so we may see the intermediate file.
Code:
sed -r 's|(^.*),(.*)|s/\1\/\2/g|' $InFile1 \
|tee $Work02                               \
|sed -f - $InFile2 > $OutFile
The work file contains this:
Code:
s/door/window/g
s/rapping/knocking/g
s/Lenore/Annie/g
s/bleak December/frigid January/g
Daniel B. Martin

Last edited by danielbmartin; 10-18-2012 at 09:36 PM. Reason: Cosmetic improvements
 
Old 10-20-2012, 11:40 AM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Don't forget that you can use bash's process substitution too, to clean up the command a bit:

Code:
sed -f <( sed -r 's|(.*),(.*)|s/\1\/\2/|' "$InFile1" ) "$InFile2" > "$OutFile"
Also never forget to do proper quoting, to be safe.

'g' isn't needed either, since the expression is applied only once, nor is the initial '^' anchor.

In fact, here's a variation that replaces the complex regex with two simple substitutions:
Code:
sed 's|,|/| ; s|.*|s/&/|'
 
Old 10-20-2012, 11:47 AM   #8
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,784

Rep: Reputation: 2083
Quote:
Originally Posted by David the H.
'g' isn't needed either, since the expression is applied only once
Watch out, the g is for the generated sed program, not the compiler.
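To make the two stages concrete, here is a small sketch. The scratch directory and the file names `list` and `script.sed` are assumptions for illustration. The generator is the two-substitution variation from post #7, written with two -e options and with a g added to the command it emits. The generator's own s commands need no g, because each list line contains a single comma and `.*` matches the line once; the g that matters is the literal one written into the generated script.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'door,window' 'rapping,knocking' > "$tmp/list"

# Stage 1, the "compiler": turn each "old,new" line into "s/old/new/g".
# The g here is plain replacement text; it is not a flag of this sed run.
sed -e 's|,|/|' -e 's|.*|s/&/g|' "$tmp/list" > "$tmp/script.sed"
generated=$(cat "$tmp/script.sed")
echo "$generated"

# Stage 2, the generated program: here the g flag takes effect, so every
# occurrence on the line is replaced, not just the first.
result=$(printf '%s\n' 'As of some one gently rapping, rapping at my chamber door.' \
  | sed -f "$tmp/script.sed")
echo "$result"
rm -rf "$tmp"
```

Dropping the g from the emitted text would leave the second "rapping" on the line untouched.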
 
Old 10-20-2012, 11:59 AM   #9
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
Quote:
Originally Posted by David the H.
'g' isn't needed either...
Without the 'g', the input line ...
Code:
As of some one gently rapping, rapping at my chamber door.
... is transformed to ...
Code:
As of some one gently knocking, rapping at my chamber window.
The desired transformation is ...
Code:
As of some one gently knocking, knocking at my chamber window.
Daniel B. Martin
 
Old 10-20-2012, 02:02 PM   #10
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Ah yes. Thanks for catching that. It's hard to keep track of what's what when you're using code to generate code.
 
Tags
awk, text processing