sed: delete lines after last occurrence of a pattern in a file

zugvogel · 11-16-2009, 06:40 PM

Hello,

I'm having trouble finding how to delete all lines after the last occurrence of a pattern.

I know this deletes all lines after the first occurrence of PATTERN:
sed '/PATTERN/q' file.in > file.out

but if I have a file like:
qwe
PATTERN
rty
PATTERN
uiop

the result is:
qwe
PATTERN

when in fact I want:
qwe
PATTERN
rty
PATTERN

Can anyone tell me how to achieve this? I found lots of references on Google to replacing the last occurrence of one word with another within a line, but nothing like the case I have.

Thanks!

Telemachos · 11-16-2009, 08:05 PM

I'm not entirely sure that sed is the best tool for this job. The problem is that sed is so line-oriented, and you need to maintain state (at least at some level) to do this.

A quick stab at how I might do this (probably in Perl): go through the file line by line; every time I see PATTERN, note the line number (one variable, $last_seen, is enough); then rewind to the top of the file and print lines back out only up to the line number I was left with.

The general problem is that when a tool sees PATTERN, it has no obvious way of knowing whether another occurrence follows later in the file or whether that was the last one before EOF.

An example, not very fancy:

Code:

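#!/usr/bin/env perl
use strict;
use warnings;

open my $fh, '<', 'file.txt'
	or die "Can't open 'file.txt' for reading: $!";

# First pass: remember the line number of the last line matching PATTERN.
my $last_seen;

while (<$fh>) {
	$last_seen = $. if /PATTERN/;
}

open my $out, '>', 'new_file.txt'
	or die "Can't open 'new_file.txt' for writing: $!";

# Rewind to the top of the file and reset the line counter.
seek($fh, 0, 0);
$. = 0;

# Second pass: copy lines up to and including the last match.
# If PATTERN never matched, $last_seen is undefined and every line is copied.
while (<$fh>) {
	print $out $_;
	last if defined $last_seen && $. == $last_seen;
}

close $out or die "Can't close 'new_file.txt': $!";
close $fh;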
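
For anyone who would rather stay in the shell, the same two-pass idea can be sketched with awk and sed. This is only a rough sketch: the function name keep_through_last is made up for illustration, and passing the file through unchanged when nothing matches is an assumption, not something the original post asks for.

```shell
#!/bin/sh
# Hypothetical helper: print FILE up to and including the last line
# that matches PATTERN (an awk regular expression).
keep_through_last() {
    pattern=$1
    file=$2
    # Pass 1: awk remembers the most recent matching line number;
    # "n + 0" prints 0 when no line matched.
    last=$(awk -v pat="$pattern" '$0 ~ pat { n = NR } END { print n + 0 }' "$file")
    if [ "$last" -gt 0 ]; then
        # Pass 2: sed prints lines 1..$last and then quits.
        sed "${last}q" "$file"
    else
        # Assumption: with no match, pass the file through unchanged.
        cat "$file"
    fi
}

# Usage with the sample data from the first post.
tmp=$(mktemp)
printf 'qwe\nPATTERN\nrty\nPATTERN\nuiop\n' > "$tmp"
keep_through_last PATTERN "$tmp"
rm -f "$tmp"
```

Like the Perl version, this reads the file twice, so it needs a real file rather than a pipe.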

zugvogel · 11-16-2009, 10:18 PM

Hi Telemachos,

I suppose the difficulty of doing this with sed is why I was unable to find an appropriate sed-based solution with Google.

Thank you for your kind help and for demonstrating a Perl-based solution. In the end, taking note of what you said about "rewind", I have written a Fortran program to selectively read in the data, using the "backspace" statement to go back through the file when needed.

Thank you again!

ghostdog74 · 11-16-2009, 11:16 PM

@OP, you should learn how to use gawk instead.

Code:

$ more file
qwe
PATTERN
rty
PATTERN
uiop
blah blah PATTERN
lksf
lasd
PATTERN
end

$ gawk -vRS="PATTERN" 'NR>1{print s} {s=$0 RT}' ORS=""  file
qwe
PATTERN
rty
PATTERN
uiop
blah blah PATTERN
lksf
lasd
PATTERN

$ more file
qwe
PATTERN
rty
PATTERN
uiop
blah blah PATTERN
lksf
lasd
end

$ gawk -vRS="PATTERN" 'NR>1{print s} {s=$0 RT}' ORS=""  file
qwe
PATTERN
rty
PATTERN
uiop
blah blah PATTERN

Kenhelm · 11-17-2009, 01:49 AM

tac can simplify the problem for sed by temporarily reversing the order of the lines.
The '0' address is a GNU sed extension and is needed here in case PATTERN is on line 1 of the reversed file.

Code:

echo 'qwe
PATTERN
rty
PATTERN
uiop' | tac | sed  '0,/PATTERN/{/PATTERN/!d}' | tac

qwe
PATTERN
rty
PATTERN
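
If the sed at hand lacks the GNU '0' address, the same reverse, filter, reverse idea can be sketched with awk in the middle. This is a rough sketch rather than a drop-in replacement: trim_after_last is a made-up name, the pattern is treated as an awk regular expression, and tac is still assumed to be available (it is part of GNU coreutils).

```shell
#!/bin/sh
# Hypothetical helper: read stdin, print everything up to and including
# the last line that matches the regular expression given as $1.
trim_after_last() {
    # In the reversed stream the last PATTERN comes first, so awk starts
    # printing at the first match and keeps printing from there on.
    tac | awk -v pat="$1" '$0 ~ pat { found = 1 } found' | tac
}

printf 'qwe\nPATTERN\nrty\nPATTERN\nuiop\n' | trim_after_last PATTERN
```

As with the sed version above, input that contains no PATTERN at all produces no output.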

This uses the loop ':a N;$!ba' to read all the lines into the sed pattern space, then an 's' command to delete everything after the last PATTERN.

Code:

echo 'qwe
PATTERN
rty
PATTERN
uiop' | sed ':a N;$!ba; s/\(.*PATTERN\).*/\1/'

qwe
PATTERN
rty
PATTERN
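
Reading the whole file into the pattern space can get heavy for large inputs. A single-pass alternative, sketched here in awk rather than sed, is to queue lines and only release the queue when a match shows up, so whatever follows the last PATTERN is never printed. The name flush_on_match is made up for illustration, and the pattern is treated as an awk regular expression.

```shell
#!/bin/sh
# Hypothetical helper: read stdin once, print everything up to and
# including the last line that matches the regular expression in $1.
flush_on_match() {
    awk -v pat="$1" '
        { buf = buf $0 ORS }                     # queue the current line
        $0 ~ pat { printf "%s", buf; buf = "" }  # match: emit the queue, clear it
    '
    # Lines queued after the final match are still in buf at EOF and are dropped.
}

printf 'qwe\nPATTERN\nrty\nPATTERN\nuiop\n' | flush_on_match PATTERN
```

This only holds the lines since the most recent match in memory and works on a pipe, since nothing has to be rewound.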