Problems defining SED pattern over multiple lines
I am trying to run a SED expression that will remove data contained within two patterns. Unfortunately I can't seem to get SED to recognise the two unless they are on the same line. I believe that there is a way to concatenate lines, but am struggling. Anyway, the data file is a bibtex file of references, and I want to remove all occurrences of the note field. I therefore want to match:
First pattern = ^\tnote
Second pattern = \},.$
and anything in between = .*
This works perfectly when there aren't any line breaks between the first and second pattern. The question is: how do I get SED to recognise the two patterns over multiple lines? Any help / tips would be greatly appreciated. |
You can use the N command to append the next line to the pattern space. For more info see http://www.unix.org.ua/orelly/unix/sedawk/ch06_01.htm
|
have a read here
|
Thanks for the links guys. I had been trying to use the N command, but have been failing miserably. Here are some of my attempts (replacing the matched lines with TEST for clarity):
sed -e /^\tnote/N s/^\tnote*/n*,.$/TEST/ Old > New
sed -e 's/^\tnote*/n*'\},.'$/TEST/' Old > New
I am going to try replacing simpler terms, e.g. words, over multiple lines, but would appreciate any other pointers. |
Best to show your sample file and your expected output.
|
SED does all of its pattern matching in the "pattern space". The default is to read in one line to the pattern space, perform a test, then read in the next line.
To look for a pattern crossing more than one line, you would have to first use "N" to append one or more lines, then perform the tests. Here's a crude example (not tested):
cat filename | sed '{/^The/ {N; s/\n//; s/a.*b/_/g}}'
Translation:
Read filename into sed
For each line beginning with "The":
...Append another line
...Remove the newline between the two lines
...Find all occurrences of "a.*b" and replace with "_"
"a.*b" = "the letter a, followed by any # of characters, then the letter b"
Here is the best tutorial on SED that I have seen: http://www.grymoire.com/Unix/Sed.html |
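A small sketch of the pattern-space idea described above. The sample text is made up for illustration. The same substitution is tried with and without N: without N the pattern space only ever holds one line, so a regex containing \n cannot match.

```sh
#!/bin/sh
# Made-up two-line input: "The cat" / "sat".

# Without N: sed sees one line at a time, so "cat\nsat" never matches.
without=$(printf 'The cat\nsat\n' | sed 's/cat\nsat/MATCH/')

# With N: on a line ending in "cat", append the next line to the
# pattern space first, then run the same substitution.
with=$(printf 'The cat\nsat\n' | sed '/cat$/{
N
s/cat\nsat/MATCH/
}')

printf 'without N:\n%s\n' "$without"
printf 'with N:\n%s\n' "$with"
```

The script is written on separate lines rather than with semicolons, since one-line forms with braces are not accepted by every sed.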
Hi.
An alternative with awk: Code:
#!/usr/bin/env sh Code:
% ./s2 |
Hi.
Here are two approaches with sed: Code:
#!/bin/sh - Code:
% ./s1 |
Makyo and everyone else,
Thank you so so much for taking the time out to help me. I managed to get the script working using the following commands:
sed -e '/^\tnote/,/\},.$/d' \
    -e '{/^\tnote/ {N; s/\n//; /\tnote.*},\./d}}' old > new
That was driving me nuts, so thanks for saving my sanity. |
Hi.
The results of running the script in post #9 also deleted the text between the 2 note sequences in my test file. Are you sure that it is working the way you want? ... cheers, makyo |
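One possible explanation for the extra deletions, sketched on made-up data: with a range address addr1,addr2, sed only starts looking for addr2 on the line after the one that matched addr1. A note field that starts and ends on the same line therefore opens a range that runs on to the next line ending in "},". The sample entry is invented, and the pattern is simplified to },$ on the assumption that nothing follows the comma. The tab is generated with printf because \t inside a sed regex is a GNU extension.

```sh
#!/bin/sh
TAB=$(printf '\t')   # literal tab, portable across sed implementations

# Invented sample: the note field fits on a single line.
sample=$(printf 'title = {A},\n\tnote = {one-line note},\nyear = {2001},\nauthor = {B},\n')

# Range delete from a note line to the next line ending in "},".
# The end pattern is first tested on the line AFTER the note line,
# so the unrelated "year" line is deleted as well.
out=$(printf '%s\n' "$sample" | sed "/^${TAB}note/,/},\$/d")

printf '%s\n' "$out"
```

Only the title and author lines survive here, even though just the note line should have been removed.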
Quote:
sed -e '/^\tnote.*\},./D' file > file1
Then again:
sed -e '{/^\tnote/ {N; s/\n//; /\tnote.*},\./d}}' \
    -e '/^\tnote/,/\},.$/d' file1 > file2
I thought that had solved the problem, but this has just shifted the problem on to where the patterns are spread over two lines, e.g.
Input: First Line \tnote = blah blah blah blah blah}, Next Line Last Line
Output: First Line Last Line
Perhaps I need to start again with a different tool, e.g. Awk or Perl, but the thought of starting again from scratch while learning something else is not very appealing. |
Ok I think it is finally solved. This solution will probably offend all the pure programmers, but I just need something that works. Anyway this script seems to work:
sed -e '/^\tnote.*\},./D' FILE1 > FILE2
sed -e '{/^\tnote/ {N; s/\n//; /\tnote.*},\./d}}' \
    -e '/^\tnote.*\},./D' \
    -e '/^\tnote/,/\},.$/d' FILE2 > FILE3
I have tested it with a large file and it works throughout. Fingers crossed that I haven't missed anything... |
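For comparison, a sketch of a single-pass alternative (not the commands used above): on a line starting with tab + note, keep appending lines with N until the pattern space ends in "},", then delete the whole pattern space. This treats one-line and multi-line notes the same way. The function name strip_notes and the sample entry are hypothetical, and the end pattern is simplified to },$ on the assumption that nothing follows the comma.

```sh
#!/bin/sh
TAB=$(printf '\t')   # literal tab; \t in a sed regex is GNU-only

# strip_notes (hypothetical helper): filter stdin, dropping note fields.
strip_notes() {
  sed "/^${TAB}note/{
:more
/},\$/!{
N
b more
}
d
}"
}

# Invented sample with a one-line note and a two-line note.
result=$(printf '@article{key,\n\ttitle = {T},\n\tnote = {short},\n\tyear = {2001},\n\tnote = {long note that\ncontinues here},\n\tauthor = {B},\n}\n' | strip_notes)

printf '%s\n' "$result"
```

If a note is never closed before the end of the file, N runs out of input; what sed does with the remaining pattern space at that point differs between implementations, so that case would need separate handling.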
Hi.
Yes, that seems to work. No doubt you would have thought of this minor improvement to combine both sed commands into a pipeline to avoid the extra intermediate file: Code:
#!/bin/sh - Code:
% ./user3 |