LinuxQuestions.org - [SOLVED] How to replace newline pattern in file by other newline pattern in a shell script

I have several (vhdl) files containing a pattern with newline characters that I need to replace by another pattern that also contains newline characters.

I start with something like:

Code:

    port (
      A : in std_logic_vector(width - 1 downto 0);
      Z : out std_logic;
    );

I want to replace it by something like:

Code:

    port (
      A : in std_logic_vector(width - 1 downto 0);
      B : in std_logic;
      Z : out std_logic;
    );

(That is, I need to insert some lines.)

As I need to do this (very) often I want to use a shell script.

I tried:

1.

Code:

#!/bin/sh
sed -i 's/      A : in std_logic_vector(width - 1 downto 0);
      Z : out std_logic;/    A : in std_logic_vector(width - 1 downto 0);
      B : in std_logic;
      Z : out std_logic;/g' ./testfile

result:

Code:

sed: -e expression #1, char 5: unterminated `s' command

2.

Code:

#!/bin/sh
sed -i 's/      A : in std_logic_vector(width - 1 downto 0);\n      Z : out std_logic;/    A : in std_logic_vector(width - 1 downto 0);\n      B : in std_logic;\n      Z : out std_logic;/g' ./testfile

result:
File remains unchanged

3.

Code:

#!/bin/sh

awk '{sub(/      A : in std_logic_vector(width - 1 downto 0);
      Z : out std_logic;/,"    A : in std_logic_vector(width - 1 downto 0);
      B : in std_logic;
      Z : out std_logic;")}1' ./testfile

result:

Code:

unexpected newline or end of string

4.

Code:

#!/bin/sh
awk '{sub(/      A : in std_logic_vector(width - 1 downto 0);\n      Z : out std_logic;/,"    A : in std_logic_vector(width - 1 downto 0);\n      B : in std_logic;\n      Z : out std_logic;")}1' ./testfile

result:
Displays the unchanged testfile

Are there any suggestions how I can automate the pattern replacement?

Code:

$ sed --version
GNU sed version 4.1.5

sed reads its input one line at a time, so a pattern containing \n can only match if several lines have first been collected in the pattern space. The following command appends every line to the hold space, copies the accumulated text back into the pattern space, performs the substitution there, and prints the result once the last line has been read:

Code:

sed -n '1h; 1!H; g; s/      A : in std_logic_vector(width - 1 downto 0);\n      Z : out std_logic;/      A : in std_logic_vector(width - 1 downto 0);\n      B : in std_logic;\n      Z : out std_logic;/g; $p' testfile
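
To see why the earlier attempts with \n left the file unchanged, here is a minimal sketch on a two-line input. The sample text and the variable names are made up for illustration. The first command works line by line, so \n never has anything to match; the second pulls the next line into the pattern space with N before substituting.

```shell
#!/bin/sh
# Hypothetical two-line sample, standing in for the real file.
input='foo
bar'

# Default behaviour: each cycle sees a single line, so "\n" cannot match.
per_line=$(printf '%s\n' "$input" | sed 's/foo\nbar/MATCH/')

# N appends the next input line to the pattern space, separated by "\n".
joined=$(printf '%s\n' "$input" | sed 'N; s/foo\nbar/MATCH/')

printf 'line by line:\n%s\n' "$per_line"
printf 'after N:\n%s\n' "$joined"
```

N only joins lines in pairs, so this is a demonstration rather than a general solution; a match that starts on an even-numbered line would be missed.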

Problems may arise if the file is very large, since the whole content is held in memory. This one-liner is also a monster because of the length of the patterns. Depending on the actual task, you may be able to change the logic, for example by appending text after the line containing 'A :'.
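
The idea of adding text after the line containing 'A :' could be sketched with sed's append command. This assumes the line declaring A is unique enough to serve as the anchor; the address pattern and the sample input are my own choices, not something from the question.

```shell
#!/bin/sh
# Hypothetical sample input modelled on the port list in the question.
input='    port (
      A : in std_logic_vector(width - 1 downto 0);
      Z : out std_logic;
    );'

# Append one line after every line that declares port A.
# The text to append follows "a\" on a line of its own.
result=$(printf '%s\n' "$input" | sed '/^ *A : in std_logic_vector/a\
      B : in std_logic;')

printf '%s\n' "$result"
```

As far as I know GNU sed keeps the leading spaces of the appended text, while some other implementations strip them, so check the indentation if you run this elsewhere.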

Wow, thank you. It seems to work. Until now I only tested a small file. The final file really is quite big (more than 50,000 lines). I will see how it works for it.

I have just one remark for others following this thread. To perform the substitution in place (i.e. directly in the file), add the -i option:

Code:

sed -ni '1h; 1!H; g; s/foo\nbar/foo\nfoobar\nbar/g; $p' ./testfile

This will turn the content of ./testfile

Code:

foo
bar

into

Code:

foo
foobar
bar
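
If you want to try the in-place edit without risking the original, GNU sed's -i accepts a suffix and then keeps a backup copy under that name. A small sketch, assuming GNU sed and a throwaway directory from mktemp (the file names are only for illustration):

```shell
#!/bin/sh
# Work in a temporary directory so no real file is touched.
tmpdir=$(mktemp -d)
file="$tmpdir/testfile"
printf 'foo\nbar\n' > "$file"

# -i.bak edits the file in place and keeps the original as testfile.bak.
# -n is given as a separate option so it cannot be mistaken for a suffix.
sed -n -i.bak '1h; 1!H; g; s/foo\nbar/foo\nfoobar\nbar/g; $p' "$file"

edited=$(cat "$file")
backup=$(cat "$file.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"

rm -rf "$tmpdir"
```

Anything written directly after -i is taken as the backup suffix, which is why the order of combined options matters.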

Just to give an update:

I applied the method to the 50,000-line file. It took about 15 minutes on an Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz, but at least it works.

The sed code is slow because it runs the 'g' and 's' commands on every cycle: with 50,000 lines that means 50,000 substitutions, each over an ever larger pattern space.
Either of the following should be faster; both read the whole file into the pattern space first and then run a single 's' command.

Code:

sed -ni '1h; 1!H; ${g; s/foo\nbar/foo\nfoobar\nbar/g;p}' ./testfile

sed -i ':a $!{N;ba}; s/foo\nbar/foo\nfoobar\nbar/g' ./testfile
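
A quick way to convince yourself that the faster forms give the same result as the slow one is to run all three on generated input. This is only a sketch: the 2000-line test data is invented, nothing is edited in place, and the -e spelling of the loop is just a more defensive way of writing the same commands.

```shell
#!/bin/sh
# Generate 1000 "foo"/"bar" pairs (2000 lines) as made-up test data.
input=$(awk 'BEGIN { for (i = 0; i < 1000; i++) print "foo\nbar" }')

# Substitute on every cycle (the slow variant).
every_cycle=$(printf '%s\n' "$input" |
  sed -n '1h; 1!H; g; s/foo\nbar/foo\nfoobar\nbar/g; $p')

# Collect in the hold space, substitute once on the last line.
hold_once=$(printf '%s\n' "$input" |
  sed -n -e '1h; 1!H' -e '${g; s/foo\nbar/foo\nfoobar\nbar/g; p;}')

# Collect in the pattern space with an N loop, substitute once.
n_loop=$(printf '%s\n' "$input" |
  sed -e ':a' -e '$!{N;ba' -e '}' -e 's/foo\nbar/foo\nfoobar\nbar/g')

if [ "$every_cycle" = "$hold_once" ] && [ "$hold_once" = "$n_loop" ]; then
  echo "all three variants agree"
fi
```

Prefixing each pipeline with `time` and raising the loop count should make the difference in run time visible.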

I tried the two proposals and both work nearly in zero time. Thank you.

Hi,

though your problem is solved I'd like to add another solution to address the large file "issue".

Code:

sed -i '/foo/ ! {/bar/ {x; /foo/ i\
foobar
x}};h' /path/to/file

This will not hold the complete file in memory to process it. It will insert the line 'foobar' if the lines 'foo' and 'bar' are in consecutive order.
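
To make the hold-space logic easier to follow, here is the same idea spread over several lines and run on a small made-up input. The sample data is mine, not from the thread; it contains one 'bar' that does not directly follow a 'foo', to show that nothing is inserted there.

```shell
#!/bin/sh
# Made-up input: the first "bar" follows "foo", the second follows "baz".
input='foo
bar
foo
baz
bar'

result=$(printf '%s\n' "$input" | sed '
/foo/!{
  # Current line is not "foo": check whether it is "bar".
  /bar/{
    # Swap in the previous line, which was saved in the hold space.
    x
    # If the previous line was "foo", print "foobar" before the current line.
    /foo/i\
foobar
    # Swap back so the current line is printed as usual.
    x
  }
}
# Remember the current line for the next cycle.
h
')

printf '%s\n' "$result"
```

Only the previous line is ever kept in the hold space, so memory use does not grow with the size of the file. Note that /foo/ and /bar/ match anywhere in a line; anchoring them with ^ and $ may be needed for real data.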

As for the large file "issue":
It is not really an issue. Suppose every line contains 100 characters; with 50,000 lines the file size would be approximately 5 MB, which is still negligible. So there is nothing wrong with the previous solutions. In fact, Kenhelm's is probably the fastest. Consider the code above a complementary "backup" for the highly unlikely case that you have to process a file several hundred MB in size.

PS: You might want to change the single quotes to double quotes and use variables for the patterns 'foo' and 'bar'. This keeps the 'sed' command readable when the patterns are long, as in your case, and you also gain flexibility.

Quote:

Originally Posted by crts (Post 4170050)

You might want to change the single quotes to double quotes and use variables for the patterns 'foo' and 'bar'. This way you keep the 'sed' readable if the pattern is big as in your case. You also gain flexibility.

For a pattern like:

Code:

foo bar
foo
bar
bar foo

Would this look like:

Code:

#!/bin/sh
FILE="./testfile"
START="foo bar
foo"
END="bar
bar foo"
INSERTION="foobar"

sed -i "/$START/ ! {/$END/ {x; /$BEGINNING/ i\$INSERTION x}};h" $FILE

???

No, if you have patterns that spread over a variable number of lines, then it is probably best to stick with the substitution command solution.
BTW, this

Code:

sed -i "/$START/ ! {/$END/ {x; /$BEGINNING/ i\$INSERTION x}};h" $FILE

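is not correct. $BEGINNING is never set (it should be $START), and the 'i' command needs its text on a separate line. Inside double quotes the backslash also has to be doubled, because the shell removes a backslash that is directly followed by a newline. So it has to be in this format:

Code:

sed -i "/$START/ ! {/$END/ {x; /$START/ i\\
$INSERTION
x}};h" $FILE

Otherwise sed will treat 'x' as part of the text to insert, when it is really a command.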
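
One more thing to watch when such a script moves from single to double quotes: the shell itself processes some backslashes inside double quotes before sed ever sees them. A small sketch that shows what sed would actually receive (the variable names single, double, escaped and fixed are only for illustration):

```shell
#!/bin/sh
INSERTION="foobar"

# Single quotes: the backslash and the newline reach sed untouched.
single='i\
text'

# Double quotes: backslash-newline is a line continuation and disappears.
double="i\
text"

# Double quotes: "\$" is a literal dollar sign, so the variable is not expanded.
escaped="i\$INSERTION"

# Doubling the backslash keeps one backslash before the newline,
# and the variable is expanded as intended.
fixed="i\\
$INSERTION"

printf '[%s]\n' "$single" "$double" "$escaped" "$fixed"
```

Printing the script like this before handing it to sed is a cheap way to check the quoting.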

Here is how to use the whole-file-in-memory alternative with multi-line patterns:

Code:

START="foo bar\nfoo"
END="bar\nbar foo"
INSERTION="foobar"

sed -e ":a $ ! {N;ba}; s/$START\n$END/$START\n$INSERTION\n$END/g" /path/to/file

Note that the patterns use '\n' instead of a "real" newline (hitting Enter).
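
Applied to the original VHDL question, the variable-based command might look roughly like this. It is a sketch under a few assumptions: GNU sed (for \n in the replacement), exactly six spaces of indentation as in the question, and a loop spelled with separate -e options, which is my own more defensive way of writing the read-everything loop. The sample input is built from the question's snippet.

```shell
#!/bin/sh
# Lines from the original question. In a basic regular expression
# "(", ")" and "-" are literal, so these strings need no escaping.
START='      A : in std_logic_vector(width - 1 downto 0);'
END='      Z : out std_logic;'
INSERTION='      B : in std_logic;'

# Made-up sample input based on the port list in the question.
input='    port (
      A : in std_logic_vector(width - 1 downto 0);
      Z : out std_logic;
    );'

# Read all lines into the pattern space, then substitute once.
# Double quotes let the shell expand the variables; "\n" is passed through to sed.
result=$(printf '%s\n' "$input" |
  sed -e ':a' -e '$!{N;ba' -e '}' \
      -e "s/$START\n$END/$START\n$INSERTION\n$END/g")

printf '%s\n' "$result"
```

Patterns that contain characters such as '/', '.', '*', '[' or '&' would need escaping before being placed into the sed expression this way.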

Ah - ok thank you. I just misunderstood what you wrote and was already suspicious.