Quote:
Originally Posted by Quon
I want to replace everything between "Word4" and "Word8"
I think you'll get best results with GNU awk. (Any awk variant will work, but GNU awk has the ability to retain the type of whitespace between words.)
Code:
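gawk -v startword="Word4" -v endword="Word8" -v replacement="Stuff" '
# Every run of whitespace ends a record, so each word is one record.
# The RT assignment is only a fallback for awks that do not set RT themselves.
BEGIN { RS="[\t\n\v\f\r ]+" ; FS=RS; RT="\n" }
($0 == startword) {
    # Skip records up to and including the end word.
    while (getline > 0)
        if ($0 == endword)
            break
    # Print the replacement, then the separator that followed the end word.
    printf("%s%s", replacement, RT)
    next
}
# Any other word is printed unchanged, followed by its separator.
{ printf("%s%s", $0, RT) }
' input-file > output-file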
If you want to use regular expressions instead of case-sensitive string comparisons, replace both == operators with ~ .
The BEGIN rule runs before any input is processed. It sets the record (and field) separator to any run of consecutive whitespace, including newlines, so each word becomes its own record. GNU awk automatically sets RT to the text that matched RS for each record; other awks do not support RT, so there it simply keeps the "\n" assigned in BEGIN. That is why this works best with GNU awk. If you replace gawk with awk, the script should still run on any awk that accepts a regular expression as RS (POSIX leaves a multi-character RS unspecified), but every word will end up on its own line.
In awk, $0 refers to the entire input record. Here, that is always the current word, including any attached punctuation. If the input record equals the start word, the while loop reads records until the end word is found (or the input runs out). The replacement is then printed, followed by RT, and next tells awk to start over with the next record.
If the input record does not match the start word, the final rule prints it.
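Note that the above also replaces the start and end words. If you want to keep them intact and replace only what is between them, use the following instead. (The replacement now ends with a space, which separates it from the end word.)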
Code:
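gawk -v startword="Word4" -v endword="Word8" -v replacement="Stuff " '
# Every run of whitespace ends a record, so each word is one record.
# The RT assignment is only a fallback for awks that do not set RT themselves.
BEGIN { RS="[\t\n\v\f\r ]+" ; FS=RS; RT="\n" }
($0 == startword) {
    # Keep the start word and the separator that followed it.
    printf("%s%s", $0, RT)
    # Skip records until the end word has been read.
    while (getline > 0)
        if ($0 == endword)
            break
    # Print the replacement, then the end word and its separator.
    printf("%s%s%s", replacement, $0, RT)
    next
}
# Any other word is printed unchanged, followed by its separator.
{ printf("%s%s", $0, RT) }
' input-file > output-file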