LinuxQuestions.org


XXLRay 11-22-2010 08:53 AM

How to replace newline pattern in file by other newline pattern in a shell script
 
I have several VHDL files containing a multi-line pattern (i.e. one that includes newline characters) that I need to replace with another multi-line pattern.

I start with something like:
Code:

    port (
      A : in std_logic_vector(width - 1 downto 0);
      Z : out std_logic;
    );

I want to replace it with something like:
Code:

    port (
      A : in std_logic_vector(width - 1 downto 0);
      B : in std_logic;
      Z : out std_logic;
    );

(That is, I need to insert some lines.)

As I need to do this (very) often I want to use a shell script.

I tried:

1.
Code:

#!/bin/sh
sed -i 's/      A : in std_logic_vector(width - 1 downto 0);
      Z : out std_logic;/    A : in std_logic_vector(width - 1 downto 0);
      B : in std_logic;
      Z : out std_logic;/g' ./testfile

result:
Code:

sed: -e expression #1, char 5: unterminated `s' command

2.
Code:

#!/bin/sh
sed -i 's/      A : in std_logic_vector(width - 1 downto 0);\n      Z : out std_logic;/    A : in std_logic_vector(width - 1 downto 0);\n      B : in std_logic;\n      Z : out std_logic;/g' ./testfile

result:
File remains unchanged

3.
Code:

#!/bin/sh
awk '{sub(/      A : in std_logic_vector(width - 1 downto 0);
      Z : out std_logic;/,"    A : in std_logic_vector(width - 1 downto 0);
      B : in std_logic;
      Z : out std_logic;")}1' ./testfile

result:
Code:

unexpected newline or end of string

4.
Code:

#!/bin/sh
awk '{sub(/      A : in std_logic_vector(width - 1 downto 0);\n      Z : out std_logic;/,"    A : in std_logic_vector(width - 1 downto 0);\n      B : in std_logic;\n      Z : out std_logic;")}1' ./testfile

result:
Displays the unchanged testfile

Are there any suggestions how I can automate the pattern replacement?

Code:

$ sed --version
GNU sed version 4.1.5


colucix 11-22-2010 09:34 AM

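sed reads its input one line at a time and strips the trailing newline before the script runs, so a pattern containing \n can only match once several lines have been joined in the pattern space. The following sed command appends every line to the hold space, copies the accumulated text (lines separated by newlines) back into the pattern space, performs the substitution, and prints the result once the last line has been reached: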
Code:

sed -n '1h; 1!H; g; s/      A : in std_logic_vector(width - 1 downto 0);\n      Z : out std_logic;/      A : in std_logic_vector(width - 1 downto 0);\n      B : in std_logic;\n      Z : out std_logic;/g; $p' testfile
Problems may arise if the file is very large, since its entire content ends up in memory. Moreover, this one-liner is a monster because of the length of the patterns. Maybe you can change the logic, for example by adding text after the line containing 'A :', but that depends on the actual task.
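
To make the difference concrete, here is a minimal sketch on a throwaway file. The foo/bar contents, the temporary file and the variable names are illustrative assumptions, and GNU sed is assumed because the replacement text uses \n. The last command is only one possible reading of the "add text after the line containing 'A :'" idea: it keys on a single line, so it would also fire on an 'A :' line that is not followed by 'Z :'.

```shell
#!/bin/sh
# Illustrative only: a two-line scratch file stands in for the real VHDL source.
tmp=$(mktemp)
printf 'foo\nbar\n' > "$tmp"

# Line by line: each cycle sees "foo" or "bar" alone, never "foo\nbar",
# so the substitution cannot match and the text comes out unchanged.
line_by_line=$(sed 's/foo\nbar/foo\nfoobar\nbar/' "$tmp")

# Accumulate the lines in the hold space (1h; 1!H), copy them back (g),
# substitute, and print only on the last line ($p).
accumulated=$(sed -n '1h; 1!H; g; s/foo\nbar/foo\nfoobar\nbar/g; $p' "$tmp")

printf 'line by line:\n%s\n\naccumulated:\n%s\n' "$line_by_line" "$accumulated"

# Single-line alternative: match only the 'A :' line and append a line after it.
# & is the whole matched line, \1 its captured indentation.
printf '%s\n' \
    '      A : in std_logic_vector(width - 1 downto 0);' \
    '      Z : out std_logic;' > "$tmp"
appended=$(sed 's/^\( *\)A : in std_logic_vector.*;$/&\n\1B : in std_logic;/' "$tmp")

printf '\nappended:\n%s\n' "$appended"
rm -f "$tmp"
```

The single-line variant never needs more than one line in memory, at the cost of not checking what follows the matched line.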

XXLRay 11-23-2010 01:54 AM

Wow, thank you. It seems to work. Until now I only tested a small file. The final file really is quite big (more than 50,000 lines). I will see how it works for it.

I only have one remark for others following my attempt: to perform the change in place (i.e. directly in the file), add the -i option:
Code:

sed -ni '1h; 1!H; g; s/foo\nbar/foo\nfoobar\nbar/g; $p' ./testfile
This will turn the content of ./testfile
Code:

foo
bar

into
Code:

foo
foobar
bar


XXLRay 11-23-2010 06:15 AM

Just to give an update:

I applied the method to the 50,000-line file. It took about 15 minutes on an Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz, but at least it works.

Kenhelm 11-23-2010 12:16 PM

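The sed code is slow because every cycle copies the whole accumulated text into the pattern space and runs the 's' command on it, so a 50,000-line file means 50,000 substitutions over an ever-growing buffer.
Either one of these should be faster; both collect the whole file first (the first in the hold space, the second directly in the pattern space) and then run just one 's' command on the last line.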
Code:

sed -ni '1h; 1!H; ${g; s/foo\nbar/foo\nfoobar\nbar/g;p}' ./testfile

sed -i ':a $!{N;ba}; s/foo\nbar/foo\nfoobar\nbar/g' ./testfile
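
As a rough check that the two variants behave the same, the sketch below runs both on a generated scratch file and compares the results. The file of 1000 foo/bar pairs and the variable names are assumptions made for illustration; -i is left out so nothing is modified, and the N loop is split into separate -e pieces, which should be equivalent to ':a $!{N;ba}' but does not depend on how a given sed ends a label. GNU sed is assumed for \n in the replacement.

```shell
#!/bin/sh
# Illustrative scratch file: 1000 foo/bar pairs (an assumption, not the real data).
tmp=$(mktemp)
i=0
while [ "$i" -lt 1000 ]; do
    printf 'foo\nbar\n'
    i=$((i + 1))
done > "$tmp"

# Variant 1: collect lines in the hold space, then on the last line ($)
# fetch them (g), substitute once and print.
out_hold=$(sed -n '1h; 1!H; ${g; s/foo\nbar/foo\nfoobar\nbar/g; p}' "$tmp")

# Variant 2: keep appending the next line (N) until the last line is reached,
# then substitute once; the result is auto-printed.
out_loop=$(sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' \
               -e 's/foo\nbar/foo\nfoobar\nbar/g' "$tmp")

if [ "$out_hold" = "$out_loop" ]; then
    echo "both variants produce identical output"
fi
rm -f "$tmp"
```

Both variants run the substitution exactly once, which is why the run time no longer grows with the square of the line count.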


XXLRay 11-24-2010 01:54 AM

I tried the two proposals and both work nearly in zero time. Thank you.

crts 11-24-2010 04:51 PM

large files
 
Hi,

though your problem is solved I'd like to add another solution to address the large file "issue".
Code:

sed -i '/foo/ ! {/bar/ {x; /foo/ i\
foobar
x}};h' /path/to/file

This does not hold the complete file in memory; only the previous line is kept in the hold space. It inserts the line 'foobar' before a line matching 'bar' whenever the line directly before it matches 'foo'.
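
A small sketch of how this behaves, using a made-up six-line file (the contents and variable names are assumptions for illustration). The sed script is the same sequence of commands written one per line, and -i is omitted so the result goes to standard output.

```shell
#!/bin/sh
# Illustrative input: 'bar' follows 'foo' on lines 2 and 6, but not on line 4.
tmp=$(mktemp)
printf 'foo\nbar\nbaz\nbar\nfoo\nbar\n' > "$tmp"

# For a line that matches 'bar' but not 'foo':
#   x        swap in the previous line from the hold space
#   /foo/i\  if that previous line matches 'foo', emit 'foobar' now
#   x        swap the current line back
# h then remembers the current line for the next cycle.
streamed=$(sed '/foo/!{
/bar/{
x
/foo/i\
foobar
x
}
}
h' "$tmp")

printf '%s\n' "$streamed"
rm -f "$tmp"
```

Only two lines are ever held at once (the current one and the previous one), so memory use does not depend on the file size.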

As for the large file "issue":
It is not really an issue. Suppose every line contains 100 characters; with 50,000 lines the file size would be approximately 5 MB, which is still negligible. So there is nothing wrong with the previous solutions. In fact, Kenhelm's is probably the fastest. Consider the above code a complementary "backup" for the highly unlikely case that you have to process a file several hundred MB in size.

PS: You might want to change the single quotes to double quotes and use variables for the patterns 'foo' and 'bar'. This way you keep the 'sed' readable if the pattern is big as in your case. You also gain flexibility.
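
A possible shape for the double-quoted version, as a sketch only: the variable names and sample values below are hypothetical. Two things change once the script is inside double quotes. The backslash after 'i' has to be doubled, because the shell would otherwise treat backslash-newline as a line continuation and remove it. The variables are also pasted into the script as regular expressions, so values containing '/', '.', '*' or '[' would need escaping first.

```shell
#!/bin/sh
# Hypothetical single-line patterns and insertion text.
first='alpha'
second='omega'
insertion='inserted line'

tmp=$(mktemp)
printf 'alpha\nomega\n' > "$tmp"

# Same logic as the single-quoted command, with the patterns interpolated.
# "i\\" reaches sed as "i\" followed by a newline.
with_vars=$(sed "/$first/!{
/$second/{
x
/$first/i\\
$insertion
x
}
}
h" "$tmp")

printf '%s\n' "$with_vars"
rm -f "$tmp"
```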

XXLRay 11-25-2010 03:13 AM

Quote:

Originally Posted by crts (Post 4170050)
You might want to change the single quotes to double quotes and use variables for the patterns 'foo' and 'bar'. This way you keep the 'sed' readable if the pattern is big as in your case. You also gain flexibility.

For a pattern like:
Code:

foo bar
foo
bar
bar foo

Would this look like:
Code:

#!/bin/sh
FILE="./testfile"
START="foo bar
foo"
END="bar
bar foo"
INSERTION="foobar"
sed -i "/$START/ ! {/$END/ {x; /$START/ i\$INSERTION x}};h" $FILE

???

crts 11-26-2010 03:03 AM

Quote:

Originally Posted by XXLRay (Post 4170489)
For a pattern like:
Code:

foo bar
foo
bar
bar foo

Would this look like:
Code:

#!/bin/sh
FILE="./testfile"
START="foo bar
foo"
END="bar
bar foo"
INSERTION="foobar"
sed -i "/$START/ ! {/$END/ {x; /$START/ i\$INSERTION x}};h" $FILE

???

No, if you have patterns that spread over a variable number of lines, then it is probably best to stick with the substitution command solution.
BTW, this
Code:

sed -i "/$START/ ! {/$END/ {x; /$START/ i\$INSERTION x}};h" $FILE
is not correct. It is important to have it in this format:
Code:

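sed -i "/$START/ ! {/$END/ {x; /$START/ i\\
$INSERTION
x}};h" "$FILE"

Otherwise sed will think that 'x' is also part of the text to be inserted, when it is really a command. Because the script is now inside double quotes, the backslash after 'i' is doubled; a single backslash followed by a newline would be removed by the shell as a line continuation.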

Here is how to use the whole-file-in-memory alternative with multi-line patterns:
Code:

START="foo bar\nfoo"
END="bar\nbar foo"
INSERTION="foobar"

sed -e ":a $ ! {N;ba}; s/$START\n$END/$START\n$INSERTION\n$END/g" /path/to/file

Note that you use '\n' instead of a "real" newline (hitting Enter) in your patterns.
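
Applied to the port list from the first post, the same approach might look like the sketch below. The variable names and the scratch file are assumptions for illustration, GNU sed is assumed for \n in the replacement, and the N loop is written as separate -e pieces. None of the characters in these particular lines are special in a basic regular expression, so no escaping is needed here; other patterns may need it.

```shell
#!/bin/sh
# Hypothetical variable names; the line contents come from the original question.
first='      A : in std_logic_vector(width - 1 downto 0);'
second='      Z : out std_logic;'
new='      B : in std_logic;'

tmp=$(mktemp)
printf '%s\n' '    port (' "$first" "$second" '    );' > "$tmp"

# Read the whole file into the pattern space, then insert $new between
# the two adjacent lines. Inside double quotes \n is passed to sed unchanged.
patched=$(sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' \
              -e "s/$first\n$second/$first\n$new\n$second/g" "$tmp")

printf '%s\n' "$patched"
rm -f "$tmp"
```

Keeping each line in its own variable makes the sed call short enough to read, and the indentation of the inserted line is set in one place.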

XXLRay 11-29-2010 07:57 AM

Ah - ok thank you. I just misunderstood what you wrote and was already suspicious.

