How do I replace the text between patterns located on separate lines? (sed, awk, etc)

Quon · 02-12-2012, 12:14 AM

If I have these words:

Word1
Word2 Word3
Word4 Word5
Word6
Word7
Word8
Word9 Word10

and I want to replace everything between "Word4" and "Word8", how do I do that?

Or what if I want to replace everything after "Word4"? (I tried sed 's/Word4*//', but it only works on that one line.)

rgdacosta · 02-12-2012, 12:21 AM

If you want your command to work on lines 3 to 6 then you can use something like this:

Code:

sed '3,6 s/Word/Number/' words

Basically, sed applies the substitution only to lines 3 through 6, replacing the first "Word" on each of those lines with "Number". The result is written to STDOUT, so no changes are made to the file.
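
To see what the line range does, here is a small sketch that rebuilds the sample from the first post in a scratch file (the mktemp file and the variable names are only for illustration). Without the g flag just the first match on each addressed line changes, so Word5 on line 3 is left alone.

```shell
# Recreate the sample input from the first post in a scratch file.
words=$(mktemp)
printf '%s\n' 'Word1' 'Word2 Word3' 'Word4 Word5' 'Word6' 'Word7' 'Word8' 'Word9 Word10' > "$words"

# Substitute only on lines 3 to 6; first match on each line.
first_only=$(sed '3,6 s/Word/Number/' "$words")

# Same range with the g flag: every match on those lines.
all_matches=$(sed '3,6 s/Word/Number/g' "$words")

printf '%s\n---\n%s\n' "$first_only" "$all_matches"
rm -f "$words"
```

Lines 1, 2 and 7 come through untouched in both cases, because the address 3,6 restricts the s command to that range.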

Hope that's what you wanted.

NevemTeve · 02-12-2012, 12:29 AM

Code:

sed -e '/Word4/,/Word8/c\
New content line1\
New content line2' inputfile >outputfile

codemaniac · 02-12-2012, 02:08 AM

Quote:

sed 's/Word[4-8]//g'

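This would remove Word4, Word5, Word6, Word7, and Word8 from your sample input file. Note that it matches those strings wherever they occur; it does not touch any other text that happens to sit between them.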

David the H. · 02-12-2012, 02:53 AM

Sed only operates on single lines by default. You have to use addressing, nested commands, and/or multi-line processing to do anything that spans multiple lines.
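
As a rough sketch of the multi-line route: the usual trick is to append every line to the pattern space with N until the last line is reached, then run one substitution across the embedded newlines. The scratch file, the variable names and the NewText replacement are made up for illustration, and this assumes a sed where "." also matches a newline inside the pattern space (GNU sed does).

```shell
# Recreate the sample input from the first post in a scratch file.
words=$(mktemp)
printf '%s\n' 'Word1' 'Word2 Word3' 'Word4 Word5' 'Word6' 'Word7' 'Word8' 'Word9 Word10' > "$words"

# :a    define label "a"
# N     append the next input line to the pattern space
# $!ba  unless this is the last line, branch back to "a"
# After the loop the pattern space holds the whole file, so a single
# s command can match from Word4 through Word8 across line breaks.
joined=$(sed -e ':a' -e 'N' -e '$!ba' -e 's/Word4.*Word8/Word4 NewText Word8/' "$words")

printf '%s\n' "$joined"
rm -f "$words"
```

The match is greedy, so it runs from the first Word4 to the last Word8 in the file, and the whole file is held in memory, which is fine for small inputs only.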

Check out the sed FAQ, specifically sections 4.23 and 4.24:

http://sed.sourceforge.net/sedfaq4.html#s4.23

The exact solution to use depends a lot on what you want the output to look like. Here's one possibility (basically the same as NevemTeve posted above):

Code:

sed '/Word4/,/Word8/ c Word4\nNewWord\nWord8'

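This simply replaces the entire matched block of lines with the desired text, including re-inserting the parts of the starting and stopping lines you want to keep. Note that writing the text on the same line as the c command is a GNU sed extension; other seds need the backslash-newline form NevemTeve used.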
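
If you would rather not retype the boundary lines, a nested-command variant is one option. This is only a sketch: it assumes you want the start and stop lines kept whole, and the scratch file, variable names and NewWord text are placeholders.

```shell
# Recreate the sample input from the first post in a scratch file.
words=$(mktemp)
printf '%s\n' 'Word1' 'Word2 Word3' 'Word4 Word5' 'Word6' 'Word7' 'Word8' 'Word9 Word10' > "$words"

# Inside the /Word4/,/Word8/ range:
#   /Word4/b   start line: branch to the end of the script, print unchanged
#   /Word8/!d  any line that is not the stop line: delete it
#   i\         stop line: emit the new text just before it
kept=$(sed '/Word4/,/Word8/{
/Word4/b
/Word8/!d
i\
NewWord
}' "$words")

printf '%s\n' "$kept"
rm -f "$words"
```

Here the start line "Word4 Word5" and the stop line "Word8" survive as they are, and only the lines strictly between them are swapped for NewWord.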

Here are a few more useful sed references.
http://www.grymoire.com/Unix/Sed.html
http://sed.sourceforge.net/grabbag/
http://sed.sourceforge.net/sed1line.txt

It's often easier to use awk or another tool with better multi-line ability instead.
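
For the second half of the question (replacing everything after "Word4"), a minimal awk sketch could look like the following. It assumes "everything after" means the rest of that line plus all following lines, and NewText, the scratch file and the variable names are placeholders.

```shell
# Recreate the sample input from the first post in a scratch file.
words=$(mktemp)
printf '%s\n' 'Word1' 'Word2 Word3' 'Word4 Word5' 'Word6' 'Word7' 'Word8' 'Word9 Word10' > "$words"

# Print lines unchanged until one contains Word4. On that line, replace
# Word4 and whatever follows it with the new text, print it, and stop
# reading, so none of the later lines are output.
truncated=$(awk '/Word4/ { sub(/Word4.*/, "Word4 NewText"); print; exit }
                 { print }' "$words")

printf '%s\n' "$truncated"
rm -f "$words"
```

The pattern is a plain regular expression, so it would also fire on a line containing something like Word42; tighten it if that matters for your data.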

Nominal Animal · 02-12-2012, 06:27 AM

Quote:

Originally Posted by Quon

I want to replace everything between "Word4" and "Word8"

I think you'll get the best results with GNU awk. (Other awk variants that accept a regular expression as the record separator will work too, but GNU awk has the ability to retain the type of whitespace between words.)

Code:

gawk -v startword="Word4" -v endword="Word8" -v replacement="Stuff" '
    BEGIN { RS="[\t\n\v\f\r ]+" ; FS=RS; RT="\n" }
    ($0 == startword) {
        while (getline > 0)
            if ($0 == endword)
                break
        printf("%s%s", replacement, RT)
        next
    }
    {   printf("%s%s", $0, RT) }
' input-file > output-file

If you want to use regular expressions instead of case-sensitive string comparisons, replace the two == with ~ .

The BEGIN rule is run before any input is processed. It sets the record (and field) separator to any run of consecutive whitespace, including newlines. Thus, each word is its own record. For GNU awk, RT is automatically set to the text that matched RS, but other awks don't support it. Like I said, this works best with GNU awk. If you replace gawk with awk, it should still work with awk variants that accept a regular expression as RS (POSIX leaves a multi-character RS unspecified), but RT will stay at the "\n" set in BEGIN, so every word ends up on its own line.

In awk, $0 refers to the entire input record. Here, it is always the current word, including any punctuation. If the input record matches the start word, the while loop will read records until the end word is found. The replacement is printed, and awk is told to check the next record.

If the input record does not match the start word, the final rule prints it.

Note that the above also replaces the start and end words. If you want to keep them intact, replacing only what is between them, use

Code:

gawk -v startword="Word4" -v endword="Word8" -v replacement="Stuff " '
    BEGIN { RS="[\t\n\v\f\r ]+" ; FS=RS; RT="\n" }
    ($0 == startword) {
        printf("%s%s", $0, RT)
        while (getline > 0)
            if ($0 == endword)
                break
        printf("%s%s%s", replacement, $0, RT)
        next
    }
    {   printf("%s%s", $0, RT) }
' input-file > output-file
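
For awks without RT, a line-oriented sketch along these lines avoids putting every word on its own line. It is not equivalent to the gawk version: it keeps the start and end words, collapses whitespace inside a line to single spaces, and drops lines that end up empty. The variable names (line, skipping, result) and the scratch file are my own.

```shell
# Recreate the sample input from the first post in a scratch file.
words=$(mktemp)
printf '%s\n' 'Word1' 'Word2 Word3' 'Word4 Word5' 'Word6' 'Word7' 'Word8' 'Word9 Word10' > "$words"

result=$(awk -v startword="Word4" -v endword="Word8" -v replacement="Stuff" '
{
    line = ""
    for (i = 1; i <= NF; i++) {
        if (skipping) {
            # Between the markers: drop words until the end word appears.
            if ($i != endword)
                continue
            skipping = 0
        }
        # Append the current word to the output line.
        line = (line == "" ? $i : line " " $i)
        if ($i == startword) {
            # Emit the replacement right after the start word, then skip.
            line = line " " replacement
            skipping = 1
        }
    }
    # Lines that consisted only of skipped words produce no output.
    if (line != "")
        print line
}' "$words")

printf '%s\n' "$result"
rm -f "$words"
```

Because it walks the default whitespace-separated fields of each line, it only relies on standard awk features, at the cost of not preserving the original spacing.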