Old 10-29-2007, 05:35 AM   #1
dj_bridges
LQ Newbie
 
Registered: Oct 2007
Posts: 5

Rep: Reputation: 0
Problems defining SED pattern over multiple lines


I am trying to write a SED expression that removes the data contained between two patterns. Unfortunately, I can't get SED to recognise the two patterns unless they are on the same line. I believe there is a way to concatenate lines, but I am struggling with it. The data file is a BibTeX file of references, and I want to remove all occurrences of the note field. I therefore want to match:
First Pattern = ^\tnote
Second pattern = \},.$
and anything in between = .*

This works perfectly when there are no line breaks between the first and second pattern. The question is: how do I get SED to recognise the two patterns over multiple lines?

Any help / tips would be greatly appreciated.....
 
Old 10-29-2007, 05:48 AM   #2
linuxgeek_ch
LQ Newbie
 
Registered: Oct 2007
Posts: 4

Rep: Reputation: 0
You can use the N command to append the next input line to the pattern space, which effectively concatenates lines. For more information, see http://www.unix.org.ua/orelly/unix/sedawk/ch06_01.htm
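As a minimal sketch of what N does (the four-line input is made up for illustration and is not from the thread): N appends the next input line to the pattern space with an embedded newline, and a following s command can then match that newline as \n.

```shell
# N appends the next input line to the pattern space, separated by "\n".
# The s command then replaces that embedded newline with a space,
# so the four made-up input lines come out joined in pairs.
joined=$(printf '%s\n' one two three four | sed -e 'N' -e 's/\n/ /')
printf '%s\n' "$joined"
```

This should print "one two" and then "three four". With an odd number of input lines the last N has nothing to read; as far as I know, GNU sed prints the pattern space before exiting in that case, while POSIX allows sed to quit without printing it, so it is worth testing on your own sed.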

Last edited by linuxgeek_ch; 10-29-2007 at 05:52 AM.
 
Old 10-29-2007, 05:49 AM   #3
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
have a read here
 
Old 10-29-2007, 09:04 AM   #4
dj_bridges
LQ Newbie
 
Registered: Oct 2007
Posts: 5

Original Poster
Rep: Reputation: 0
Thanks for the links guys. I had been trying to use the N command, but have been failing miserably. Here are some of my attempts (replacing the matched lines with TEST for clarity):

sed -e /^\tnote/N s/^\tnote*/n*,.$/TEST/ Old > New
sed -e 's/^\tnote*/n*'\},.'$/TEST/' Old > New

I am going to try replacing simpler terms (e.g. plain words) over multiple lines, but would appreciate any other pointers.
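One likely reason the first attempt fails is shell quoting rather than sed itself. A small sketch (the variable names are hypothetical) of what actually reaches sed when the script is left unquoted:

```shell
# Without quotes the shell strips the backslash, so sed would see "t", not "\t".
unquoted=$(printf '%s' /^\tnote/N)
# Inside single quotes the text is passed through untouched.
quoted=$(printf '%s' '/^\tnote/N')
printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

In the first attempt the s/... word is also a separate shell argument, so sed would treat it as an input file name rather than a command. In the second attempt, as far as I can tell, the s command ends up with one delimiter too many, so TEST/ is read as flags. Note also that note* means "not" followed by zero or more "e" characters, and that a newline in the pattern space is written \n, not /n.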
 
Old 10-29-2007, 09:21 AM   #5
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
It would be best to show a sample file and your expected output.
 
Old 10-29-2007, 10:29 AM   #6
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 729
SED does all of its pattern matching in the "pattern space". The default is to read in one line to the pattern space, perform a test, then read in the next line.

To look for a pattern crossing more than one line, you would have to first use "N" to append one or more lines, then perform the tests. Here's a crude example (not tested):

cat filename | sed '{/^The/ {N; s/\n//; s/a.*b/_/g}}'

Translation:
Read filename into sed
For each line beginning with "The":
...Append another line
...Remove the newline that separates the two lines
...find all occurrences of "a.*b" and replace with "_"

"a.*b" = "the letter a, followed by any # of characters, then the letter b"
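A quick way to see the translation above in action is to run the same commands on a few lines of text. This is only a sketch: the three input lines are invented, and the script is written one command per line so that it does not depend on GNU-specific handling of "}" after a command.

```shell
# Three made-up input lines; only the first begins with "The".
result=$(printf '%s\n' 'The cat sat' 'on a bench' 'other line' |
sed '/^The/{
N
s/\n//
s/a.*b/_/g
}')
printf '%s\n' "$result"
```

This should print "The c_ench" followed by "other line". The first two lines are joined with nothing between them, and "a.*b" then runs from the first "a" (in "cat") to the last "b" (in "bench"), which shows how greedy the match is.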

Here is the best tutorial on SED that I have seen: http://www.grymoire.com/Unix/Sed.html
 
Old 10-29-2007, 11:18 AM   #7
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 719

Rep: Reputation: 72
Hi.

An alternative with awk:
Code:
#!/usr/bin/env sh

# @(#) s2       Demonstrate deletion of bounded text, even across lines.

set -o nounset
echo

debug=":"
debug="echo"

## Use local command version for the commands in this demonstration.

echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version bash awk

echo

FILE=${1-data2}

# First Pattern = ^\tnote
# Second pattern = \},.$

echo " Input file:"
cat -A $FILE

echo
echo " Results from awk:"
awk '
/^\tnote/,/\},.$/       { print "deleted: " $0; next }
1       { print "kept   : " $0 }
' $FILE

exit 0
Producing:
Code:
% ./s2

(Versions displayed with local utility "version")
GNU bash 2.05b.0
GNU Awk 3.1.4

 Input file:
beginning of text sample$
note - text that should not be deleted.$
^Inote - text  that SHOULD be deleted all on one line },.$
Another line$
^Inote - text that SHOULD be$
deleted and crosses lines (stuff) },.$
More lines - one$
two$
end of text sample$

 Results from awk:
kept   : beginning of text sample
kept   : note - text that should not be deleted.
deleted:        note - text  that SHOULD be deleted all on one line },.
kept   : Another line
deleted:        note - text that SHOULD be
deleted: deleted and crosses lines (stuff) },.
kept   : More lines - one
kept   : two
kept   : end of text sample
cheers, makyo
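To delete the lines rather than label them, the same range pattern can be paired with next. A minimal sketch, assuming the sample data from the listing above is recreated inline (the variable names tab, sample and kept are illustrative only):

```shell
# Recreate the sample data shown above; $tab holds a real tab character.
tab=$(printf '\t')
sample="beginning of text sample
note - text that should not be deleted.
${tab}note - text  that SHOULD be deleted all on one line },.
Another line
${tab}note - text that SHOULD be
deleted and crosses lines (stuff) },.
More lines - one
two
end of text sample"

kept=$(printf '%s\n' "$sample" | awk '
/^\tnote/,/},.$/ { next }   # skip every record inside the range
{ print }                   # print everything else
')
printf '%s\n' "$kept"
```

The { print } rule could equally be written as the bare pattern 1, as in the script above.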
 
Old 10-29-2007, 02:00 PM   #8
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 719

Rep: Reputation: 72
Hi.

Here are two approaches with sed:
Code:
#!/bin/sh -

# @(#) s1       Demonstrate text deletion over a pattern-pattern range.

echo
echo " (Versions displayed by local command \"version\")"
version sh sed cat

FILE=${1-data2}

echo
echo " Input file:"
cat -A $FILE

# First Pattern = ^\tnote
# Second pattern = \},.$

echo
echo " Results from sed (simple approach):"
sed '/^\tnote/,/},\.$/d' $FILE

echo
echo " Hold buffer approach:"
sed '{/^\tnote/ {N; s/\n//; /\tnote.*},\./d}}' $FILE
# based on:
# sed '{/^The/ {N; s/\n//; s/a.*b/_/g}}' $FILE

exit 0
Producing:
Code:
% ./s1

 (Versions displayed by local command "version")
GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu)
GNU sed version 4.1.2
cat (coreutils) 5.2.1

 Input file:
beginning of text sample$
note - text that should not be deleted.$
^Inote - text  that SHOULD be deleted all on one line },.$
Another line$
^Inote - text that SHOULD be$
deleted and crosses lines (stuff) },.$
More lines - one$
two$
end of text sample$

 Results from sed (simple approach):
beginning of text sample
note - text that should not be deleted.
More lines - one
two
end of text sample

 Hold buffer approach:
beginning of text sample
note - text that should not be deleted.
More lines - one
two
end of text sample
In both cases, a section of text was deleted that was probably not desired. I think this is the result of greedy matching. Perhaps someone will drop by with a way around this (or a correction), but I'd go with the awk, or something in perl ... cheers, makyo
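A note on the likely cause: the extra deletion looks less like greedy matching and more like the way sed evaluates a two-address range. Once a line matches the start address, sed only begins looking for the end address on the following line, so a note that starts and ends on one line opens a range that stays open until the next line ending in "},.". awk, by contrast, allows a range to begin and end on the same record. The N variant has a related problem, because it joins the tab-indented note line with whatever line follows before testing. A small sketch with made-up START/END markers (not the thread's data) shows the difference:

```shell
# Made-up five-line input: a one-line START...END block, a line to keep,
# a two-line START...END block, and a trailing line.
input='START one-line END
keep me
START two
lines END
tail'

# sed only starts looking for the end address on the line AFTER the one
# that matched the start, so the first block runs on through "lines END".
sed_out=$(printf '%s\n' "$input" | sed '/START/,/END/d')

# awk lets a range begin and end on the same record.
awk_out=$(printf '%s\n' "$input" | awk '/START/,/END/ { next } { print }')

printf 'sed:\n%s\n\nawk:\n%s\n' "$sed_out" "$awk_out"
```

Here sed should keep only "tail", while awk keeps both "keep me" and "tail".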
 
Old 10-30-2007, 07:36 AM   #9
dj_bridges
LQ Newbie
 
Registered: Oct 2007
Posts: 5

Original Poster
Rep: Reputation: 0
Makyo and everyone else,

Thank you so, so much for taking the time out to help me. I managed to get the script to work using the following commands:

sed -e '/^\tnote/,/\},.$/d' \
-e '{/^\tnote/ {N; s/\n//; /\tnote.*},\./d}}' old > new

That was driving me nuts, so thanks for saving my sanity......
 
Old 10-30-2007, 12:14 PM   #10
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 719

Rep: Reputation: 72
Hi.

The results of running the script in post #9 also deleted the text between the 2 note sequences in my test file.

Are you sure that it is working the way you want? ... cheers, makyo
 
Old 10-31-2007, 06:07 AM   #11
dj_bridges
LQ Newbie
 
Registered: Oct 2007
Posts: 5

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by makyo
Hi.

The results of running the script in post #9 also deleted the text between the 2 note sequences in my test file.

Are you sure that it is working the way you want? ... cheers, makyo
Well spotted, makyo. I think I need to pay closer attention. I have been fiddling around, and the problem, as you say, is greedy matching when the two patterns are on one line. One (clumsy) workaround is to run sed twice, as follows:

sed -e '/^\tnote.*\},./D' file > file1

Then again:


sed -e '{/^\tnote/ {N; s/\n//; /\tnote.*},\./d}}' \
-e '/^\tnote/,/\},.$/d' file1 > file2

I thought that had solved the problem, but it has just shifted the problem to the case where the patterns are spread over two lines, e.g.:

Input:

First Line
\tnote = blah blah blah
blah blah},
Next Line
Last Line

Output:

First Line
Last Line

Perhaps I need to start again with a different tool e.g. Awk or Perl, but the thought of starting again from scratch while learning something else is not very appealing....
 
Old 10-31-2007, 06:18 AM   #12
dj_bridges
LQ Newbie
 
Registered: Oct 2007
Posts: 5

Original Poster
Rep: Reputation: 0
Ok I think it is finally solved. This solution will probably offend all the pure programmers, but I just need something that works. Anyway this script seems to work:

sed -e '/^\tnote.*\},./D' FILE1 > FILE2

sed -e '{/^\tnote/ {N; s/\n//; /\tnote.*},\./d}}' \
-e '/^\tnote.*\},./D' \
-e '/^\tnote/,/\},.$/d' FILE2 > FILE3

I have tested it with a large file and it works throughout.

Fingers crossed that I haven't missed anything...
 
Old 10-31-2007, 09:06 AM   #13
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 719

Rep: Reputation: 72
Hi.

Yes, that seems to work. A minor improvement, which you would no doubt have thought of, is to combine both sed commands into a pipeline and so avoid the extra intermediate file:
Code:
#!/bin/sh -

# @(#) user3    Demonstrate delete across lines with piping.

FILE=${1-data2}

echo
echo " Input file:"
my-nl $FILE

echo
echo " Results from sed (piped):"
sed -e '/^\tnote.*\},./D' $FILE |
sed -e '{/^\tnote/ {N; s/\n//; /\tnote.*},\./d}}' \
-e '/^\tnote.*\},./D' \
-e '/^\tnote/,/\},.$/d'

exit 0
Producing:
Code:
% ./user3

 Input file:

==> data2 <==

  1 beginning of text sample
  2 note - text that should not be deleted.
  3     note - text  that SHOULD be deleted all on one line },.
  4 Another line
  5     note - text that SHOULD be
  6 deleted and crosses lines (stuff) },.
  7 More lines - one
  8 two
  9 end of text sample

 Results from sed (piped):
beginning of text sample
note - text that should not be deleted.
Another line
More lines - one
two
end of text sample
I generally advise people to make it right, then -- if necessary -- make it run faster. The same goes for elegance, beauty, etc ... cheers, makyo
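For comparison, here is a shorter single-pass variant. It is a sketch only, not the script from the thread: it assumes, as the thread does, that every note field starts on a tab-indented line and ends with "}," plus one more character, and it has not been tried on a real BibTeX file. The names tab, start, end, sample and cleaned are illustrative. The idea is to delete one-line notes first, so that they never open a range, and then let the range handle notes that span several lines.

```shell
tab=$(printf '\t')
start="^${tab}note"    # a line beginning with a tab and "note"
end='},.$'             # "}," plus one more character at end of line

# The sample data used earlier in the thread.
sample="beginning of text sample
note - text that should not be deleted.
${tab}note - text  that SHOULD be deleted all on one line },.
Another line
${tab}note - text that SHOULD be
deleted and crosses lines (stuff) },.
More lines - one
two
end of text sample"

# First expression: delete notes that start and end on the same line.
# Second expression: delete notes that span two or more lines.
cleaned=$(printf '%s\n' "$sample" |
sed -e "/${start}.*${end}/d" \
    -e "/${start}/,/${end}/d")
printf '%s\n' "$cleaned"
```

On the sample data this should give the same six lines as the piped version above. Because the second expression is a range, it should also cope with notes that run over more than two lines, which a single N does not.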
 
  

