LinuxQuestions.org

danielbmartin 10-18-2012 11:16 AM

Replace words and phrases in text
 
Have: InFile2, a file of text which requires word (or phrase) substitutions.
Code:

Once upon a midnight dreary, while I pondered weak and weary,
Over many a quaint and curious volume of forgotten lore,
While I nodded, nearly napping, suddenly there came a tapping,
As of some one gently rapping, rapping at my chamber door.
''Tis some visitor,' I muttered, 'tapping at my chamber door -
Only this, and nothing more.'

Ah, distinctly I remember it was in the bleak December,
And each separate dying ember wrought its ghost upon the floor.
Eagerly I wished the morrow; - vainly I had sought to borrow
From my books surcease of sorrow - sorrow for the lost Lenore -
For the rare and radiant maiden whom the angels named Lenore -
Nameless here for evermore.

Have: InFile1, a file of word (or phrase) substitutions such as:
Code:

door,window
rapping,knocking
Lenore,Annie
bleak December,frigid January

Want:
Code:

Read the text file and make substitutions.
Wherever "door" appears, change it to "window".
Wherever "rapping" appears, change it to "knocking".
Wherever "bleak December" appears, change it to "frigid January".
Etc.

As a learning exercise I coded a solution using sed.
Code:

sed -r 's|(^.*),(.*)|s/\1\/\2/g|' $InFile1 \
|sed -f - $InFile2 > $OutFile

This works.

Now, to continue the learning exercise, I attempted to code a solution using awk, but I'm stumped. I'm struggling with variations on this theme:
Code:

awk -F "," 'NR==FNR{A[$1]=$2;next} FNR<NR  \
  <something goes here>' $InFile1 $InFile2

What "goes here"? gensub?
awk experts, please advise.

Daniel B. Martin

grail 10-18-2012 12:39 PM

Code:

awk -F, 'FNR==NR{c[$1]=$2;next}{for(i in c)gsub(i,c[i])}1' InFile1 InFile2
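Two properties of this one-liner are worth knowing. gsub treats each key as a regular expression, so a "." in a key matches any character. Also, `for (i in c)` visits keys in an unspecified order, which matters only when pairs overlap (say "door" and "doorbell"). Below is a minimal sketch of an alternative, assuming a POSIX awk: it applies the pairs in file order and matches them as literal strings using index(). The function name replace_literal is hypothetical.

```shell
# replace_literal PAIRS TEXT  (hypothetical helper name)
# Applies each "old,new" pair from PAIRS to TEXT, in the order the pairs
# appear in PAIRS, treating "old" as a literal string rather than a regex.
replace_literal() {
    awk -F, '
    FNR == NR {                      # first file: the substitution pairs
        if (NF >= 2) { from[++n] = $1; to[n] = $2 }
        next
    }
    {                                # second file: the text
        for (k = 1; k <= n; k++) {
            if (from[k] == "") continue
            out = ""; rest = $0
            # index() is a plain substring search, so "." or "*" in a
            # key is matched literally instead of as a regex operator
            while ((p = index(rest, from[k])) > 0) {
                out = out substr(rest, 1, p - 1) to[k]
                rest = substr(rest, p + length(from[k]))
            }
            $0 = out rest
        }
        print
    }' "$1" "$2"
}
```

Usage: `replace_literal "$InFile1" "$InFile2" > "$OutFile"`. Because pairs run in file order, listing a longer key such as "doorbell" before "door" lets it take precedence. As with the gsub version, a later pair can still match text produced by an earlier one.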

danielbmartin 10-18-2012 12:58 PM

Quote:

Originally Posted by grail (Post 4809289)
Code:

awk -F, 'FNR==NR{c[$1]=$2;next}{for(i in c)gsub(i,c[i])}1' InFile1 InFile2

Beautiful! Thank you! This thread is SOLVED!

Daniel B. Martin

ntubski 10-18-2012 04:14 PM

You could use the compiler strategy (as opposed to the interpreter strategy) in awk too:
Code:

awk -F, 'BEGIN{printf"{"} {printf("gsub(\"%s\", \"%s\");", $1, $2)} END{print"print}"}' replacements.txt \
| awk -f - the-raven.txt

I think the interpreter strategy would be much more difficult in sed though.

danielbmartin 10-18-2012 06:29 PM

Quote:

Originally Posted by ntubski (Post 4809433)
Code:

awk -F, 'BEGIN{printf"{"} {printf("gsub(\"%s\", \"%s\");", $1, $2)} END{print"print}"}' replacements.txt \
| awk -f - the-raven.txt


Nice!

To better understand your technique, I inserted a tee so the code "spits out" the intermediate file that works the magic.
Code:

echo "Method of LQ Senior Member ntubski, using awk"
awk -F,  \
  'BEGIN{printf"{"}                          \
  {printf("gsub(\"%s\", \"%s\");", $1, $2)}  \
  END{print"print}"}' $InFile1              \
|tee $Work01                                  \
|awk -f - $InFile2

The work file contains this:
Code:

{gsub("door", "window");gsub("rapping", "knocking");gsub("Lenore", "Annie");gsub("bleak December", "frigid January");print}
Daniel B. Martin

danielbmartin 10-18-2012 09:36 PM

Quote:

Originally Posted by ntubski (Post 4809433)
I think the interpreter strategy would be much more difficult in sed though.

The first post in this thread gave a sed solution that is comparable to your awk one. Here I have added a tee so we can see the intermediate file.
Code:

sed -r 's|(^.*),(.*)|s/\1\/\2/g|' $InFile1 \
|tee $Work02                              \
|sed -f - $InFile2 > $OutFile

The work file contains this:
Code:

s/door/window/g
s/rapping/knocking/g
s/Lenore/Annie/g
s/bleak December/frigid January/g

Daniel B. Martin

David the H. 10-20-2012 11:40 AM

Don't forget that you can use bash's process substitution too, to clean up the command a bit:

Code:

sed -f <( sed -r 's|(.*),(.*)|s/\1\/\2/|' "$InFile1" ) "$InFile2" > "$OutFile"
Also never forget to do proper quoting, to be safe.
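The risk is word splitting: if an unquoted $InFile1 holds a name containing a space, sed receives it as two separate arguments. A tiny demonstration, using a hypothetical file name and a throwaway helper, count_args, that only reports how many arguments it received:

```shell
# count_args: hypothetical helper that prints how many arguments it got
count_args() { echo "$#"; }

f='my pairs.txt'             # file name containing a space
unquoted=$(count_args $f)    # unquoted: split into "my" and "pairs.txt"
quoted=$(count_args "$f")    # quoted: passed as a single argument
echo "unquoted: $unquoted, quoted: $quoted"   # prints: unquoted: 2, quoted: 1
```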

'g' isn't needed either, since the expression is applied only once, nor is the initial '^' anchor.

In fact, here's a variation that replaces the complex regex with two simple substitutions:
Code:

sed 's|,|/| ; s|.*|s/&/|'

ntubski 10-20-2012 11:47 AM

Quote:

Originally Posted by David the H. (Post 4810822)
'g' isn't needed either, since the expression is applied only once

Watch out: the g belongs to the generated sed program, not the compiler.

danielbmartin 10-20-2012 11:59 AM

Quote:

Originally Posted by David the H. (Post 4810822)
'g' isn't needed either...

Without the 'g', the input line ...
Code:

As of some one gently rapping, rapping at my chamber door.
... is transformed to ...
Code:

As of some one gently knocking, rapping at my chamber window.
The desired transformation is ...
Code:

As of some one gently knocking, knocking at my chamber window.
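The effect of the g flag can be isolated on that one line by applying only the rapping/knocking pair (a minimal sketch; any POSIX sed should behave the same):

```shell
line='As of some one gently rapping, rapping at my chamber door.'
# Without g, s replaces only the first match on each line
first_only=$(printf '%s\n' "$line" | sed 's/rapping/knocking/')
# With g, s replaces every match on the line
every=$(printf '%s\n' "$line" | sed 's/rapping/knocking/g')
printf '%s\n' "$first_only" "$every"
# As of some one gently knocking, rapping at my chamber door.
# As of some one gently knocking, knocking at my chamber door.
```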
Daniel B. Martin

David the H. 10-20-2012 02:02 PM

Ah yes. Thanks for catching that. It's hard to keep track of what's what when you're using code to generate code.
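Putting the corrections together: David the H.'s simpler two-substitution generator, with the g flag restored in the generated commands. This is a sketch assuming GNU sed, which (as in the earlier posts) reads the generated script from standard input with `-f -`; the function name substitute is hypothetical. In bash, `sed -f <( ... ) "$2"` does the same job without the pipe.

```shell
# substitute PAIRS TEXT  (hypothetical helper name)
# Builds one sed command per "old,new" line of PAIRS and runs them on TEXT.
substitute() {
    # 1st s: change the first comma to a slash  ("door,window" -> "door/window")
    # 2nd s: wrap the line as a global command  ("door/window" -> "s/door/window/g")
    sed 's|,|/| ; s|.*|s/&/g|' "$1" |
        sed -f - "$2"    # GNU sed: "-f -" reads the generated script from stdin
}
```

Usage: `substitute "$InFile1" "$InFile2" > "$OutFile"`. The generated commands use the text of each pair verbatim, so a key or replacement containing "/", "&", or regex metacharacters would need escaping first.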


All times are GMT -5.