I am trying to run a sed expression that will remove the data contained between two patterns. Unfortunately I can't seem to get sed to recognise the two patterns unless they are on the same line. I believe there is a way to concatenate lines, but I am struggling. The data file is a BibTeX file of references, and I want to remove all occurrences of the note field. I therefore want to match:
First Pattern = ^\tnote
Second pattern = \},.$
and anything in between = .*
This works perfectly when there aren't any line breaks between the first and second pattern. The question is: how do I get sed to recognise the two patterns over multiple lines?
Thanks for the links guys. I had been trying to use the N command, but have been failing miserably. Here are some of my attempts (replacing the matched lines with TEST for clarity):
sed -e /^\tnote/N s/^\tnote*/n*,.$/TEST/ Old > New
sed -e 's/^\tnote*/n*'\},.'$/TEST/' Old > New
Am going to try and replace using simpler terms e.g. words, over multiple lines, but would appreciate any other pointers....
SED does all of its pattern matching in the "pattern space". The default is to read in one line to the pattern space, perform a test, then read in the next line.
To look for a pattern crossing more than one line, you would first have to use "N" to append one or more lines to the pattern space, then perform the tests. Here's a crude example (not tested):
cat filename | sed '{/^The/ {N; s/\n//; s/a.*b/_/g}}'
Translation:
Read filename into sed
For each line beginning with "The":
...Append another line
...Remove the newline that N placed between the two lines
...find all occurrences of "a.*b" and replace with "_"
"a.*b" = "the letter a, followed by any # of characters, then the letter b"
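A minimal sketch of the N idea described above, runnable as-is. The three-line sample and the variable names `sample` and `joined` are made up for illustration; only the sed commands come from the post.

```shell
#!/bin/sh
# Hypothetical sample text; not taken from the original thread.
sample='The a1
2b end
last line'

# For each line starting with "The":
#   N        appends the next input line to the pattern space,
#            separated by an embedded newline
#   s/\n//   removes that embedded newline, joining the two lines
#   s/a.*b/_/g  can now match text that began on one line and
#            ended on the next
joined=$(printf '%s\n' "$sample" | sed '/^The/{
N
s/\n//
s/a.*b/_/g
}')

printf '%s\n' "$joined"
```

Here "a1" and "2b" start on different lines, so without N the pattern "a.*b" would never see them together. After the join the pattern space is "The a12b end", and the substitution leaves "The _ end".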
#!/usr/bin/env sh
# @(#) s2 Demonstrate deletion of bounded text, even across lines.
set -o nounset
echo
debug=":"
debug="echo"
## Use local command version for the commands in this demonstration.
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version bash awk
echo
FILE=${1-data2}
# First Pattern = ^\tnote
# Second pattern = \},.$
echo " Input file:"
cat -A $FILE
echo
echo " Results from awk:"
awk '
/^\tnote/,/\},.$/ { print "deleted: " $0; next }
1 { print "kept : " $0 }
' $FILE
exit 0
Producing:
Code:
% ./s2
(Versions displayed with local utility "version")
GNU bash 2.05b.0
GNU Awk 3.1.4
Input file:
beginning of text sample$
note - text that should not be deleted.$
^Inote - text that SHOULD be deleted all on one line },.$
Another line$
^Inote - text that SHOULD be$
deleted and crosses lines (stuff) },.$
More lines - one$
two$
end of text sample$
Results from awk:
kept : beginning of text sample
kept : note - text that should not be deleted.
deleted: note - text that SHOULD be deleted all on one line },.
kept : Another line
deleted: note - text that SHOULD be
deleted: deleted and crosses lines (stuff) },.
kept : More lines - one
kept : two
kept : end of text sample
#!/bin/sh -
# @(#) s1 Demonstrate text deletion over a pattern-pattern # range.
echo
echo " (Versions displayed by local command \"version\")"
version sh sed cat
FILE=${1-data2}
echo
echo " Input file:"
cat -A $FILE
# First Pattern = ^\tnote
# Second pattern = \},.$
echo
echo " Results from sed (simple approach):"
sed '/^\tnote/,/},\.$/d' $FILE
echo
echo " Hold buffer approach:"
sed '{/^\tnote/ {N; s/\n//; /\tnote.*},\./d}}' $FILE
# based on:
# sed '{/^The/ {N; s/\n//; s/a.*b/_/g}}' $FILE
exit 0
Producing:
Code:
% ./s1
(Versions displayed by local command "version")
GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu)
GNU sed version 4.1.2
cat (coreutils) 5.2.1
Input file:
beginning of text sample$
note - text that should not be deleted.$
^Inote - text that SHOULD be deleted all on one line },.$
Another line$
^Inote - text that SHOULD be$
deleted and crosses lines (stuff) },.$
More lines - one$
two$
end of text sample$
Results from sed (simple approach):
beginning of text sample
note - text that should not be deleted.
More lines - one
two
end of text sample
Hold buffer approach:
beginning of text sample
note - text that should not be deleted.
More lines - one
two
end of text sample
In both cases, a section of text was deleted that was probably not desired. I think this is the result of greedy matching. Perhaps someone will drop by with a way around this (or a correction), but I'd go with the awk, or something in perl ... cheers, makyo
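One possible explanation, going by sed's documented range behaviour rather than anything confirmed in this thread: when the second address of a range is a regex, sed only starts looking for it on the line after the one that opened the range. A one-line note entry therefore opens a range that stays open until the next "},." further down, which would account for "Another line" disappearing. awk tests both range patterns against the same line, so the awk script closes the range immediately. A sketch that stays with sed tests the end pattern on the opening line first and pulls in further lines with N only until it matches. The sample mirrors data2 above; `TAB`, `sample` and `result` are names made up for this example.

```shell
#!/bin/sh
# Literal tab, so the script does not depend on sed understanding \t.
TAB=$(printf '\t')

# Sample data mirroring data2 from the posts above.
sample="beginning of text sample
note - text that should not be deleted.
${TAB}note - text that SHOULD be deleted all on one line },.
Another line
${TAB}note - text that SHOULD be
deleted and crosses lines (stuff) },.
More lines - one
two
end of text sample"

# On a line starting with <TAB>note:
#   :a          loop label
#   /},.$/!{    if the pattern space does not yet end with the closing pattern
#     N         append the next input line
#     ba        and test again
#   }
#   d           closing pattern found: delete the whole accumulated entry
result=$(printf '%s\n' "$sample" | sed "/^${TAB}note/{
:a
/},.\$/!{
N
ba
}
d
}")

printf '%s\n' "$result"
```

With this sample, the one-line entry and the two-line entry are both removed while "Another line" is kept. An entry that never reaches a closing "},." before end of input is not handled specially here; what sed does when N runs out of input differs between implementations.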
The results of running the script in post #9 also deleted the text between the 2 note sequences in my test file.
Are you sure that it is working the way you want? ... cheers, makyo
Well spotted makyo - think I need to pay closer attention. OK so I have been fiddling around and the problem as you say is greedy matching when the two patterns are on one line. One (clumsy) workaround is to run sed twice as follows:
I thought that had solved the problem, but this has just shifted the problem on to where the patterns are spread over two lines e.g.
Input:
First Line
\tnote = blah blah blah
blah blah},
Next Line
Last Line
Output:
First Line
Last Line
Perhaps I need to start again with a different tool e.g. Awk or Perl, but the thought of starting again from scratch while learning something else is not very appealing....
Ok I think it is finally solved. This solution will probably offend all the pure programmers, but I just need something that works. Anyway this script seems to work:
Yes, that seems to work. No doubt you would have thought of this minor improvement to combine both sed commands into a pipeline to avoid the extra intermediate file:
Code:
#!/bin/sh -
# @(#) user3 Demonstrate delete across lines with piping.
FILE=${1-data2}
echo
echo " Input file:"
my-nl $FILE
echo
echo " Results from sed (piped):"
sed -e '/^\tnote.*\},./D' $FILE |
sed -e '{/^\tnote/ {N; s/\n//; /\tnote.*},\./d}}' \
-e '/^\tnote.*\},./D' \
-e '/^\tnote/,/\},.$/d'
exit 0
Producing:
Code:
% ./user3
Input file:
==> data2 <==
1 beginning of text sample
2 note - text that should not be deleted.
3 note - text that SHOULD be deleted all on one line },.
4 Another line
5 note - text that SHOULD be
6 deleted and crosses lines (stuff) },.
7 More lines - one
8 two
9 end of text sample
Results from sed (piped):
beginning of text sample
note - text that should not be deleted.
Another line
More lines - one
two
end of text sample
I generally advise people to make it right, then -- if necessary -- make it run faster. The same goes for elegance, beauty, etc ... cheers, makyo