batch editing multiple lines

statmobile · 09-14-2004, 03:14 PM

Hey, I'm looking for a little help here, and I'm having a little trouble finding the information. Basically, I need to edit a bunch of files in a directory, and all of its subdirectories. While I have done this before using sed, here is my current problem.

All of my HTML pages have a table row that I would like to remove. Here is what it looks like:

Code:

<tr>
<td>left</td>
<td>cell 1>foo1</td>
<td>right</td>
</tr>

<tr>
<td>left</td>
<td>cell 1>foo2</td>
<td>right</td>
</tr>

Now I only want to remove the row containing foo2 and leave all other rows in the table alone. That is where I run into trouble with sed, because it works on a per-line basis. I have no problem removing the line containing foo2, but I also need to delete everything else in that table row, from <tr> to </tr>. I can't have sed remove all lines with <tr>, because that would also affect the table row containing foo1. Any help would be much appreciated here, or maybe some advice on how web maintainers update the links that they have throughout their websites.

druuna · 09-14-2004, 04:31 PM

The following code removes every paragraph (a block of lines delimited by empty lines) that contains an expression (foo2 in the above example); all other paragraphs are printed.

I called it rem-par.sh

Usage: rem-par.sh infile expression

Code:

#!/bin/sh
sed -n '
# if an empty line, check the paragraph
/^$/ b para
# else add it to the hold buffer
H
# at end of file, check paragraph
$ b para
# now branch to end of script
b
# this is where a paragraph is checked for the pattern
: para
# return the entire paragraph
# into the pattern space
x
# look for the pattern, if there - delete, print all other paragraphs
/'"$2"'/d
p
' "$1"

statmobile · 09-14-2004, 08:40 PM

I'm not quite sure I follow your code. How exactly would I get all of those lines into it, including the line breaks? I'm also not following the buffer you are referring to, or how I'm supposed to pipe what's not in the paragraph into it. Could you maybe walk through your script using the example I posted above? Thanks so much for the help!

theYinYeti · 09-15-2004, 02:26 AM

<tr>
<td>left</td>
<td>cell 1>foo1</td>
<td>right</td>
</tr>

<tr>
<td>left</td>
<td>cell 1>foo2</td>
<td>right</td>
</tr>

line 1: not empty so added to hold buffer:
H='<tr>'

line 2: not empty so added to hold buffer:
H='<tr>
<td>left</td>'

and so on until H='<tr>
<td>left</td>
<td>cell 1>foo1</td>
<td>right</td>
</tr>'

line 6: empty so branch to para, where H is put back (x) into the pattern space.
If pattern space contains "$2", it is deleted (d), else it is printed (p).
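
To see the hold buffer in isolation, here is a minimal sketch that is not part of rem-par.sh. It appends three made-up lines with H and swaps them back into the pattern space with x at the end of input. Note that H always puts a newline in front of what it appends, so the buffer really starts with an empty line, which the walkthrough above leaves out for brevity.

```shell
#!/bin/sh
# Illustration only: the input lines a, b, c are made up.
# H appends "newline + current line" to the hold buffer on every line.
# On the last line ($), x swaps the hold buffer into the pattern space
# and p prints it.
result=$(printf 'a\nb\nc\n' | sed -n '
H
$ {
  x
  p
}')

printf '%s\n' "$result"
```

The output is an empty line followed by a, b and c.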

This is repeated for every paragraph in the "$1" file. So usage: put all your HTML lines into a file, and run rem-par.sh file.html "foo2".
This only works if your <tr>...</tr> blocks contain no empty lines, each <tr> starts a line that follows an empty line (or the start of the file), and each </tr> ends a line that is followed by an empty line (or the end of the file).
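
Put together on the sample rows, the whole thing can be sketched as below. The sed program is inlined instead of calling rem-par.sh so that the snippet runs on its own; the variable names rows and pattern are only for illustration.

```shell
#!/bin/sh
# The two sample rows from the first post, separated by an empty line.
rows='<tr>
<td>left</td>
<td>cell 1>foo1</td>
<td>right</td>
</tr>

<tr>
<td>left</td>
<td>cell 1>foo2</td>
<td>right</td>
</tr>'

# Expression to look for (this is what rem-par.sh receives as $2).
pattern=foo2

# Same commands as rem-par.sh, reading standard input instead of a file:
# collect lines with H, and at each empty line or at end of input swap
# the paragraph in with x, delete it if it matches, print it otherwise.
result=$(printf '%s\n' "$rows" | sed -n '
/^$/ b para
H
$ b para
b
: para
x
/'"$pattern"'/d
p
')

printf '%s\n' "$result"
```

Only the foo1 row is printed. It comes out with an empty line in front of it, because H starts every paragraph with a newline.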

Good script, druuna!

Yves.

druuna · 09-15-2004, 03:24 AM

Just read the replies to this thread. theYinYeti gave a good explanation of how this little sed script works.

Just one minor thing about running it: The double quotes around foo2 are not necessary.

statmobile · 09-15-2004, 10:41 AM

I see what you're talking about now, and it's a great idea. Now I just need to implement it. I want to make sure to keep all the regular blank lines in the file, but that's just some tweaking. Thanks so much for your help you two, you're lifesavers.

statmobile · 09-15-2004, 08:01 PM

New problem: at work I'm on a Sun machine, not a Linux one. The version of gsed there is just too old, and the script doesn't work. This code worked beautifully on my Linux machine, but the files are all located on the servers at work. It looks like I'll have to figure out perl to get this done.
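
If the sed on that machine cannot be upgraded, awk may be another option before reaching for perl. What follows is a minimal sketch, not something posted in this thread. The function name drop_rows is made up, and it assumes that rows are not nested, that the opening <tr> and the closing </tr> each sit on a line of their own, and that the awk in use accepts -v (as far as I know, on Solaris that usually means nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk).

```shell
#!/bin/sh
# drop_rows is a hypothetical helper, not something posted in this thread.
# It reads HTML on standard input and drops every <tr>...</tr> block
# that contains the pattern given as $1.
drop_rows() {
    awk -v pat="$1" '
        # Opening tag: start collecting a new row.
        !inrow && /<tr[ >]/ { inrow = 1; buf = ""; hit = 0 }
        inrow {
            buf = buf $0 "\n"
            if ($0 ~ pat) hit = 1
            # Closing tag: print the row unless the pattern was seen.
            if ($0 ~ /<\/tr>/) {
                if (!hit) printf "%s", buf
                inrow = 0
            }
            next
        }
        # Everything outside a row, blank lines included, passes through.
        { print }
    '
}

# Try it on the two sample rows from the first post.
rows='<tr>
<td>left</td>
<td>cell 1>foo1</td>
<td>right</td>
</tr>

<tr>
<td>left</td>
<td>cell 1>foo2</td>
<td>right</td>
</tr>'

result=$(printf '%s\n' "$rows" | drop_rows foo2)
printf '%s\n' "$result"
```

Unlike the paragraph approach, this does not depend on empty lines around each row, so blank lines elsewhere in the file are passed through unchanged.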

theYinYeti · 09-16-2004, 04:08 AM

You may try this script:

Code:

#!/bin/bash
# $1 (mandatory): pattern to look for.
# $2 (optional): input file (else standard input).
# $3 (optional): output file (else standard output).

PAT="$1"

filterTR() {
	local block="$1"
	local tmp=
	local line=
	while [ "$line" != "</tr>" ]; do
		read line || break
		if echo "$line" | grep -i '<tr\([[:space:]][^>]*\)*>' >/dev/null 2>&1; then
			tmp=$(filterTR "$line")
			[ -n "$tmp" ] && block="$block
$tmp"
		elif echo "$line" | grep "$PAT" >/dev/null 2>&1; then
			block=
			while [ "$line" != "</tr>" ]; do
				read line || break
			done
			break
		else
			block="$block
$line"
		fi
	done
	[ -n "$block" ] && echo "$block"
}

if [ -n "$2" ]; then
	exec 0<"$2"
	[ $? -eq 0 ] || exit 1
fi
if [ -n "$3" ]; then
	exec 1>"$3"
	[ $? -eq 0 ] || exit 1
fi

sed -e 's|\(.\)\(</*tr\([[:space:]][^>]*\)*>\)|\1\n\2|gi' \
	-e 's|\(</*tr\([[:space:]][^>]*\)*>\)\(.\)|\1\n\3|gi' \
| while read line; do
	if echo "$line" | grep -i '<tr\([[:space:]][^>]*\)*>' >/dev/null 2>&1; then
		filterTR "$line"
	else
		echo "$line"
	fi
done

I did not test it much, but it can't do any harm.
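
The two sed expressions at the bottom of the script are what put every <tr> and </tr> tag on a line of its own before the loop reads them. A minimal sketch of that step in isolation, on a made-up one-line table, assuming GNU sed (the \n in the replacement and the i flag are GNU extensions):

```shell
#!/bin/sh
# Made-up input: a whole table squeezed onto one line.
html='<table><tr><td>a</td></tr></table>'

# First expression: insert a newline before any <tr ...> or </tr> tag
# that is preceded by another character.
# Second expression: insert a newline after any such tag that is
# followed by another character.
lines=$(printf '%s\n' "$html" | sed \
    -e 's|\(.\)\(</*tr\([[:space:]][^>]*\)*>\)|\1\n\2|gi' \
    -e 's|\(</*tr\([[:space:]][^>]*\)*>\)\(.\)|\1\n\3|gi')

printf '%s\n' "$lines"
```

With GNU sed this prints five lines: <table>, <tr>, <td>a</td>, </tr> and </table>. A sed without those extensions would need a different way to insert the newlines.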

Yves.

statmobile · 09-16-2004, 08:05 PM

I'm going out of town for a wedding this weekend, so I'll try it as soon as I get back. Thanks for keeping up with me.