LinuxQuestions.org


jkmaster 01-19-2010 05:13 AM

Sed / Replace multiline, multiple instances
 
Hello,

I've written a sed command to match and replace a block of five lines.
So far, the command replaces only one instance of the five lines.

What do I need to change to replace all instances? The instances are not entirely the same, but do match the regular expressions.

Here is the command file (used with sed -n -f).
For better clarity, I do not include the actual regexes.

Code:

/regex to match the first line/ {
H
# append 2nd line
n
H
# append 3rd line
n
H
# append 4th line
n
H
# append 5th line
n
H
# get the contents of the Hold buffer, then replace
g
s/regex to match the five lines/regex to replace the five lines/
# clear the Hold buffer
x;s/.*//;x;
}
p

Using Sed GnuWin32 on Windows XP.

Thanks in advance for your help.

jk

pixellany 01-19-2010 07:19 AM

This strikes me as doing things the hard way, but it should work.

Some observations:
1. If the first movement to the hold buffer is "h" and not "H", then you don't need the code to clear the hold buffer.

2. Normally, when you combine multiple lines, you would strip out the newlines before attempting further actions.

3. I don't understand the "p" outside of the {...} construct.

Have you tried this in a script instead of having sed call a file? (I don't know if this could be relevant.)

The best way to get help on this is to post a sample of the actual file and the specific changes you want to make.
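
Observation 1 can be checked with a small experiment. The following is only a sketch, assuming GNU sed and a made-up four-line input (the "start" marker and the variable names are hypothetical). It collects two lines per match into the hold buffer, once starting with "H" and once with "h", and prints the result with "l" so that embedded newlines show up as \n.

```shell
# Hypothetical input: two two-line blocks, each starting with "start".
input='start 1
x
start 2
y'

# H first, no clearing: the hold buffer keeps growing across matches.
with_H=$(printf '%s\n' "$input" | sed -n '/^start/{H;n;H;g;l;}')

# h first: each match overwrites the hold buffer, so no clearing is needed.
with_h=$(printf '%s\n' "$input" | sed -n '/^start/{h;n;H;g;l;}')

printf 'H first:\n%s\n' "$with_H"
printf 'h first:\n%s\n' "$with_h"
```

With "H" first, the second match still carries the lines of the first match (and a leading newline); with "h" first, each match stands alone.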

jkmaster 01-19-2010 08:39 AM

Thanks a lot for your reply.

I have used sed before for basic find-and-replace tasks, but this one is more ambitious. That's probably why I'm missing certain skills or best practices in sed scripting.

Anyway, here's what I'm trying to do:

Starting with a file generated in a markup language (FrameMaker MIF), I'd like to
- take some string values (highlighted in red), and
- add the same values inside a different markup element to the original file (the whole additions highlighted in blue).

The original MIF file has 30000+ lines, so here's an excerpt with the relevant bits on which you can try out the script:

Code:

<MIFFile 8.00>
 ----- snip -----
  <XRef
  <XRefName `Navigation 4'>
  <XRefSrcText `CHDEEGGJ'>
  <XRefSrcIsElem Yes>
  <XRefSrcFile `<c\>test2.xml'>
  <XRefLastUpdate 1263890843 26000>
  <Unique 1007728>
  <Element
  <Unique 1007731>
  <ETag `xref'>
  <Attributes
  <Attribute
    <AttrName `IDREF'>
    <AttrValue `CHDEEGGJ'>
  > # end of Attribute
  > # end of Attributes
  <Collapsed No>
  <SpecialCase No>
  <AttributeDisplay ReqAndSpec>
  > # end of Element
  > # end of XRef
  ----- snip -----
  <XRef
  <XRefName `Heading & Page'>
  <XRefSrcText `CHDHFFJF: cname: 1.1.1.2 Sample Heading'>
  <XRefSrcIsElem Yes>
  <XRefSrcFile `<c\>test2.xml'>
  <XRefLastUpdate 1263801131 424000>
  <Unique 1011275>
  <Element
  <Unique 1011278>
  <ETag `xref'>
  <Attributes
  <Attribute
    <AttrName `IDREF'>
    <AttrValue `CHDHFFJF'>
  > # end of Attribute
  > # end of Attributes
  <Collapsed No>
  <SpecialCase No>
  <AttributeDisplay ReqAndSpec>
  > # end of Element
  > # end of XRef
  ----- snip -----
  # EOF

The intended result should look like this. However, I get only the first of the two blocks highlighted in blue (the added Marker blocks):

Code:

<MIFFile 8.00>
 ----- snip -----
  <Marker
  <MType 12>
  <MTypeName `UnstructXRef'>
  <MText `;;Navigation 4;;CHDEEGGJ;;test2.xml;;' >
  > # end of Marker

  <XRef
  <XRefName `Navigation 4'>
  <XRefSrcText `CHDEEGGJ'>
  <XRefSrcIsElem Yes>
  <XRefSrcFile `<c\>test2.xml'>
  <XRefLastUpdate 1263890843 26000>
  <Unique 1007728>
  <Element
  <Unique 1007731>
  <ETag `xref'>
  <Attributes
  <Attribute
    <AttrName `IDREF'>
    <AttrValue `CHDEEGGJ'>
  > # end of Attribute
  > # end of Attributes
  <Collapsed No>
  <SpecialCase No>
  <AttributeDisplay ReqAndSpec>
  > # end of Element
  > # end of XRef
  ----- snip -----
  <Marker
  <MType 12>
  <MTypeName `UnstructXRef'>
  <MText `;;Heading & Page;;CHDHFFJF: cname: 1.1.1.2 Sample Heading;;test2.xml;;' >
  > # end of Marker

  <XRef
  <XRefName `Heading & Page'>
  <XRefSrcText `CHDHFFJF: cname: 1.1.1.2 Sample Heading'>
  <XRefSrcIsElem Yes>
  <XRefSrcFile `<c\>test2.xml'>
  <XRefLastUpdate 1263801131 424000>
  <Unique 1011275>
  <Element
  <Unique 1011278>
  <ETag `xref'>
  <Attributes
  <Attribute
    <AttrName `IDREF'>
    <AttrValue `CHDHFFJF'>
  > # end of Attribute
  > # end of Attributes
  <Collapsed No>
  <SpecialCase No>
  <AttributeDisplay ReqAndSpec>
  > # end of Element
  > # end of XRef
  ----- snip -----

Finally, here is the original sed script from my input file:

Code:

/<XRef\s$/ {
h
n
# 2
H
n
# 3
H
n
# 4
H
n
# 5
H
g
s/\(.*\s<XRef.*\s<XRefName\s`\([A-z0-9 ]*\)'>.*\s<XRefSrcText\s`\([A-z0-9 ]*\)'>.*\s<XRefSrcFile\s`\([A-z0-9<>\\ ]*\)'>\)/<Marker\n<MType 12>\n<MTypeName `UnstructXRef'>\n<MText ;;\2;;\3;;\4;; >\n> \# end of Marker\n\1/
}
p

Leading whitespace is not critical in the output.

I find it convenient to use an input file but would be just as happy with a working command-line script. I've already changed the first H to lowercase and removed the line for clearing the hold buffer.

Thanks again for listening. Maybe you could give me some more hints on how to proceed?


jk

pixellany 01-19-2010 10:43 AM

Why not do the changes line by line?

eg:

sed -e 's/old1/new1/' \
    -e 's/old2/new2/' \
    -e 's/old3/new3/'
(one -e expression per line to change: the first for the 1st line, the second for the 2nd line, etc.)

ghostdog74 01-19-2010 09:57 PM

Code:


awk '
$1=="<XRef" {
    o=$0
    for(i=1;i<=4;i++){
        getline
        if(i==3) continue
        gsub(/.* \047|\047>|.*<c\\>/,"")
        s=s $0";;"
    }
    string=s"\047 >"
    printf "<Marker\n<MType 12>\n<MTypeName \047UnstructXRef\047\n"
    print "<MText \047" string
    print " > # end of Marker "
    print o;next
    s=""
}1 ' file


jkmaster 01-20-2010 02:53 AM

Thanks very much for your replies.

Pixellany,
I could assemble the 'blue' blocks using line-by-line operations, but is it possible at the same time to keep the original lines as consecutive lines? How could I proceed?


ghostdog74,
As I have no experience with awk so far, I just tried out your sample script without really understanding what it does in particular. The output gets a bit mixed up (the original lines do not stay in place, and the substrings extracted from the 1st instance are inserted in the "MText" line for the 2nd instance), but I'll try to give it a shot.

jk

ghostdog74 01-20-2010 03:54 AM

Quote:

Originally Posted by jkmaster (Post 3833172)
ghostdog74,
As I have no experience with awk so far, I just tried out your sample script without really understanding what it does in particular. The output gets a bit mixed up (the original lines do not stay in place, and the substrings extracted from the 1st instance are inserted in the "MText" line for the 2nd instance), but I'll try to give it a shot.

jk

You have backticks in your file, like
Code:

<XRefName `Navigation 4'>
Is that correct, or should they really be single quotes? I changed all backticks to single quotes for my testing, so if you don't get the correct results, the backticks are the most likely cause.

jkmaster 01-21-2010 05:56 AM

Quote:

Originally Posted by ghostdog74 (Post 3833225)
You have backticks in your file, like
Code:

<XRefName `Navigation 4'>
Is that correct?

Yes, these are required by the file format. I tried the awk script with a changed sample and it looks better. Still, the original lines from which the substrings are extracted must be retained.

Right now I'm busy with something else, but I'll post again on how far I got with sed or awk.

jk
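
For reference, in the awk script posted above, lines read with getline are never printed, and the s="" reset sits after "next" and so never runs. That would account for both symptoms described (original lines missing, values from the first instance carried into the second). The following is only a sketch of how the same idea might retain the original lines. It assumes LF line endings and that the name, source text and source file always sit on the 2nd, 3rd and 5th lines of an XRef block, as in the excerpt; the file names, the value() helper and the variable names are hypothetical.

```shell
tmp=$(mktemp -d)   # scratch directory for the hypothetical sample files

# One XRef block from the excerpt, cut down to the lines that matter.
cat > "$tmp/sample.mif" <<'EOF'
<XRef
<XRefName `Heading & Page'>
<XRefSrcText `CHDHFFJF: cname: 1.1.1.2 Sample Heading'>
<XRefSrcIsElem Yes>
<XRefSrcFile `<c\>test2.xml'>
<Unique 1011275>
EOF

cat > "$tmp/marker.awk" <<'EOF'
# Return the text between the backtick and the closing '> of a line.
function value(text) {
    sub(/^[^`]*`/, "", text)      # drop everything up to the backtick
    sub(/'>[ \t]*$/, "", text)    # drop the closing quote and bracket
    return text
}
$1 == "<XRef" {
    block = $0; s = ""            # reset per block so values cannot leak
    for (i = 2; i <= 5; i++) {
        if ((getline line) <= 0) break
        block = block "\n" line   # keep every original line
        if (i == 4) continue      # XRefSrcIsElem carries no value
        v = value(line)
        sub(/^<c\\>/, "", v)      # strip the <c\> prefix from the file name
        s = s v ";;"
    }
    print "<Marker"
    print "<MType 12>"
    print "<MTypeName `UnstructXRef'>"
    print "<MText `;;" s "' >"
    print "> # end of Marker"
    print block                   # original lines follow the new Marker
    next
}
{ print }
EOF

result=$(awk -f "$tmp/marker.awk" "$tmp/sample.mif")
printf '%s\n' "$result"
rm -r "$tmp"
```

Because the awk program lives in its own file, the backticks and single quotes required by MIF can be written literally instead of as \047 escapes.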

jkmaster 01-28-2010 09:00 AM

Just wanted to give an update, as I just got it working after staring at the sed command very intensely for a few minutes ...
My multiline script was not wrong at all, except that the search regex didn't work for all instances :doh:.
It works like a charm when I change the following:

Code:

[A-z0-9 ]  <= old search regex
[^\n\r']    <= improved search regex
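
The difference between the two classes can be seen on the two XRefName lines from the excerpt: the second value contains "&", which the old class does not cover (the XRefSrcText value of the second instance similarly contains ":" and "."). The following is only a sketch using grep rather than sed; LC_ALL=C and the variable names are assumptions made for the illustration.

```shell
# The two XRefName lines from the sample, kept in a here-document so the
# backticks and single quotes need no shell escaping.
names=$(cat <<'EOF'
<XRefName `Navigation 4'>
<XRefName `Heading & Page'>
EOF
)

q="'"   # a literal single quote for use inside double-quoted patterns

# Old class: letters, digits and spaces (plus the few punctuation
# characters between Z and a in ASCII), so "&" stops the match.
# LC_ALL=C is assumed here to make the A-z range well defined.
old=$(printf '%s\n' "$names" | LC_ALL=C grep -c "\`[A-z0-9 ]*$q>")

# New class: anything up to the closing quote. grep works line by line,
# so the \n and \r exclusions from the sed version are not needed here.
new=$(printf '%s\n' "$names" | LC_ALL=C grep -c "\`[^$q]*$q>")

printf 'old class matches %s of 2 lines, new class matches %s\n' "$old" "$new"
```

Excluding only the terminating quote (and line breaks, in the multiline sed case) avoids having to list every character a value might contain.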

So this is the sed input file in its entirety:
Code:

/<XRef\s$/ {
h
n
# 2
H
n
# 3
H
n
# 4
H
n
# 5
H
g
s/\(.*\s<XRef.*\s<XRefName\s`\([^\r\n']*\)'>.*\s<XRefSrcText\s`\([^\r\n']*\)'>.*\s<XRefSrcFile\s`\([^\r\n']*\)'>\)/<Marker\n<MType 12>\n<MTypeName `UnstructXRef'>\n<MText ;;\2;;\3;;\4;; >\n> \# end of Marker\n\1/
}
p

Thanks again for your suggestions.

jk


All times are GMT -5.