LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

09-14-2004, 03:14 PM   #1
statmobile
Member
 
Registered: Aug 2003
Location: Chapel Hill, NC
Distribution: Gentoo, Windows 95 2000 & XP
Posts: 160

Rep: Reputation: 30
batch editing multiple lines


Hey, I'm looking for a little help here, since I'm having trouble finding the information. Basically, I need to edit a bunch of files in a directory and all of its subdirectories. I have done this sort of thing before using sed, but here is my current problem.

All of my HTML pages have a table row that I would like to remove. Here is what it looks like:
Code:
<tr>
<td>left</td>
<td>cell 1>foo1</td>
<td>right</td>
</tr>

<tr>
<td>left</td>
<td>cell 1>foo2</td>
<td>right</td>
</tr>
Now I only want to remove the row containing foo2 and leave all other rows in the table alone. Therein lies my problem with sed: it works on a per-line basis. I have no problem removing the line containing foo2, but I also need to delete the rest of that table row, from <tr> to </tr>. I can't have sed remove every line with <tr>, because that would also affect the row containing foo1. Any help would be much appreciated, or maybe some advice on how web maintainers update the links they have throughout their websites.
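To make the per-line problem concrete, here is a minimal sketch (assuming a POSIX shell and sed; sample_rows is just a hypothetical helper that prints the two rows above) of what a plain line-delete does to the sample:

```shell
#!/bin/sh
# Print the two sample rows from the post, separated by a blank line.
sample_rows() {
    printf '%s\n' '<tr>' '<td>left</td>' '<td>cell 1>foo1</td>' '<td>right</td>' '</tr>' \
        '' \
        '<tr>' '<td>left</td>' '<td>cell 1>foo2</td>' '<td>right</td>' '</tr>'
}

# sed works one line at a time, so /foo2/d deletes only the line that
# contains foo2: the <tr>, the other <td> cells and the </tr> of that
# row all survive, leaving a broken row behind.
sample_rows | sed '/foo2/d'
```

The second row comes out as <tr>, <td>left</td>, <td>right</td> and </tr>, which is exactly the half-deleted row the post is trying to avoid.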
 
09-14-2004, 04:31 PM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
The following script removes every paragraph (a block of lines separated by empty lines) that contains an expression (foo2 in the example above); all other paragraphs are printed.

I called it rem-par.sh.

Usage: rem-par.sh infile expression

Code:
#!/bin/sh
# Usage: rem-par.sh infile expression
sed -n '
# if an empty line, check the paragraph
/^$/ b para
# else append it to the hold buffer
H
# at end of file, check the paragraph
$ b para
# otherwise branch to the end of the script (nothing is printed)
b
# this is where a paragraph is checked for the pattern
:para
# swap the entire paragraph from the hold buffer
# into the pattern space
x
# if the pattern is there, delete the paragraph; otherwise print it
/'"$2"'/d
p
' "$1"

Last edited by druuna; 09-14-2004 at 04:33 PM.
 
09-14-2004, 08:40 PM   #3
statmobile
Member
 
Registered: Aug 2003
Location: Chapel Hill, NC
Distribution: Gentoo, Windows 95 2000 & XP
Posts: 160

Original Poster
Rep: Reputation: 30
I'm not sure I follow your code. How exactly would I put the entire three lines into that code, including the line breaks? And I'm not following this buffer you refer to, or how I'm supposed to pipe the lines outside the paragraph into it. Could you maybe use my example above in your explanation? Thanks so much for the help!
 
09-15-2004, 02:26 AM   #4
theYinYeti
Senior Member
 
Registered: Jul 2004
Location: France
Distribution: Arch Linux
Posts: 1,897

Rep: Reputation: 66
<tr>
<td>left</td>
<td>cell 1>foo1</td>
<td>right</td>
</tr>

<tr>
<td>left</td>
<td>cell 1>foo2</td>
<td>right</td>
</tr>

line 1: not empty so added to hold buffer:
H='<tr>'

line 2: not empty so added to hold buffer:
H='<tr>
<td>left</td>'

and so on until H='<tr>
<td>left</td>
<td>cell 1>foo1</td>
<td>right</td>
</tr>'

line 6: empty, so branch to para, where the hold buffer is swapped (x) into the pattern space.
If the pattern space contains "$2", it is deleted (d); otherwise it is printed (p).
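One detail the trace glosses over: H appends a newline and then the line, even when the hold buffer is still empty, so the accumulated paragraph actually starts with an empty line. A minimal sketch using sed's POSIX l command (which lists the pattern space with embedded newlines shown as \n and the end marked by $) makes this visible on the first two sample lines:

```shell
#!/bin/sh
# H appends a newline and then the current line to the hold space; at the
# last line ($) x swaps the accumulated text into the pattern space and l
# lists it with embedded newlines shown as \n and the end marked by $.
printf '%s\n' '<tr>' '<td>left</td>' | sed -n 'H;${x;l;}'
```

This prints \n<tr>\n<td>left</td>$. That leading \n is why every printed paragraph begins with an empty line, which is how the script reproduces the blank line between blocks (and also why the output gains one extra empty line at the very top).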

All this is repeated for the whole file given as "$1". So the usage is: put your HTML into a file and run rem-par.sh file.html "foo2".
This only works if your <tr>...</tr> blocks contain no empty lines, each <tr> is the first thing on its line and comes right after an empty line (or the start of the file), and each </tr> is the last thing on its line and comes right before an empty line (or the end of the file).

Good script, druuna!

Yves.
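The restrictions listed above all come from using empty lines as the block separator. As a hedged alternative (awk swapped in for illustration, not the sed approach from this thread), the sketch below buffers from each line containing <tr> to the next line containing </tr>, so empty lines no longer matter. remove_rows is a hypothetical name; it assumes lowercase <tr> tags without attributes, no nested tables, and PATTERN as an awk extended regular expression. Like the sed script it works on whole lines, so anything sharing a line with <tr> or </tr> goes with the row. It should run under a POSIX awk (on Solaris that is usually nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk), but I have not tried it there.

```shell
#!/bin/sh
# Hypothetical helper: remove_rows PATTERN [FILE...]
# Buffers each row, from a line containing <tr> to the next line containing
# </tr>, and prints it only if the buffered row does not match PATTERN.
# Lines outside rows, including empty lines, are printed unchanged.
# Reads standard input when no FILE is given.
remove_rows() {
    pat=$1
    shift
    awk -v pat="$pat" '
        # A line containing <tr> opens a row (lowercase, no attributes).
        /<tr>/ {
            # Flush a previous row that was never closed, rather than lose it.
            if (inrow) printf "%s", buf
            inrow = 1
            buf = ""
        }
        inrow {
            buf = buf $0 "\n"
            # A line containing </tr> closes the row: keep it or drop it.
            if ($0 ~ /<\/tr>/) {
                inrow = 0
                if (buf !~ pat) printf "%s", buf
            }
            next
        }
        { print }
        # A row still open at end of input is printed as-is.
        END { if (inrow) printf "%s", buf }
    ' "$@"
}

# Demo on the sample rows from the thread: only the foo1 row survives.
printf '%s\n' '<tr>' '<td>left</td>' '<td>cell 1>foo1</td>' '<td>right</td>' '</tr>' \
    '' '<tr>' '<td>left</td>' '<td>cell 1>foo2</td>' '<td>right</td>' '</tr>' |
    remove_rows foo2
```

To edit a real page, write to a new file first (for example remove_rows foo2 page.html > page.new) and only replace the original once the result looks right. Since empty lines are no longer the separator, this version leaves the file's existing blank lines exactly as they are.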
 
09-15-2004, 03:24 AM   #5
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Just read the replies to this thread. theYinYeti gave a good explanation of how this little sed script works.

Just one minor thing about running it: The double quotes around foo2 are not necessary.
 
09-15-2004, 10:41 AM   #6
statmobile
Member
 
Registered: Aug 2003
Location: Chapel Hill, NC
Distribution: Gentoo, Windows 95 2000 & XP
Posts: 160

Original Poster
Rep: Reputation: 30
I see what you're talking about now, and it's a great idea. Now I just need to implement it. I want to make sure to keep all the regular blank lines in the file, but that's just some tweaking. Thanks so much for your help you two, you're lifesavers.
 
09-15-2004, 08:01 PM   #7
statmobile
Member
 
Registered: Aug 2003
Location: Chapel Hill, NC
Distribution: Gentoo, Windows 95 2000 & XP
Posts: 160

Original Poster
Rep: Reputation: 30
New problem: I'm not on a Linux machine but on a Sun machine at work, and the version of gsed there is just too old, so the script doesn't work. This code worked beautifully on my Linux machine, but the files are all located on the servers at work. It looks like I'll have to learn Perl to get this done.
 
09-16-2004, 04:08 AM   #8
theYinYeti
Senior Member
 
Registered: Jul 2004
Location: France
Distribution: Arch Linux
Posts: 1,897

Rep: Reputation: 66
You may try this script:

Code:
#!/bin/bash
# $1 (mandatory): pattern to look for.
# $2 (optional): input file (else standard input).
# $3 (optional): output file (else standard output).

PAT="$1"

# Called with the line holding an opening <tr...> tag; reads the rest of the
# row from standard input up to </tr>, recursing into nested rows, and prints
# the row unless a line inside it matches $PAT.
filterTR() {
	local block="$1"
	local tmp=
	local line=
	while [ "$line" != "</tr>" ]; do
		read -r line || break
		if printf '%s\n' "$line" | grep -i '<tr\([[:space:]][^>]*\)*>' >/dev/null 2>&1; then
			tmp=$(filterTR "$line")
			[ -n "$tmp" ] && block="$block
$tmp"
		elif printf '%s\n' "$line" | grep "$PAT" >/dev/null 2>&1; then
			# Pattern found: drop the block and skip to the closing </tr>.
			block=
			while [ "$line" != "</tr>" ]; do
				read -r line || break
			done
			break
		else
			block="$block
$line"
		fi
	done
	[ -n "$block" ] && printf '%s\n' "$block"
}

if [ -n "$2" ]; then
	exec 0<"$2"
	[ $? -eq 0 ] || exit 1
fi
if [ -n "$3" ]; then
	exec 1>"$3"
	[ $? -eq 0 ] || exit 1
fi

# Put every <tr...> and </tr...> tag on a line of its own, then filter.
sed -e 's|\(.\)\(</*tr\([[:space:]][^>]*\)*>\)|\1\n\2|gi' \
	-e 's|\(</*tr\([[:space:]][^>]*\)*>\)\(.\)|\1\n\3|gi' \
| while read -r line; do
	if printf '%s\n' "$line" | grep -i '<tr\([[:space:]][^>]*\)*>' >/dev/null 2>&1; then
		filterTR "$line"
	else
		printf '%s\n' "$line"
	fi
done
I haven't tested it much, but it can't do any harm.

Yves.
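The two sed expressions at the start of Yves's pipeline are what make the line-by-line loop work: they move every <tr ...> and </tr ...> tag onto a line of its own. A small sketch to see them in isolation, wrapped in a hypothetical split_tr_tags function; it assumes GNU sed, since \n in the replacement and the i (case-insensitive) flag are GNU extensions.

```shell
#!/bin/sh
# The two expressions from the script above (GNU sed only). The first
# inserts a newline before each <tr...> or </tr...> tag that has a character
# in front of it; the second inserts one after each such tag that has a
# character after it. Matching is case-insensitive (i flag).
split_tr_tags() {
    sed -e 's|\(.\)\(</*tr\([[:space:]][^>]*\)*>\)|\1\n\2|gi' \
        -e 's|\(</*tr\([[:space:]][^>]*\)*>\)\(.\)|\1\n\3|gi'
}

printf '%s\n' '<table><tr><td>x</td></tr></table>' | split_tr_tags
```

This prints <table>, <tr>, <td>x</td>, </tr> and </table> on five separate lines, after which the loop can recognise a row boundary just by looking at whole lines.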
 
09-16-2004, 08:05 PM   #9
statmobile
Member
 
Registered: Aug 2003
Location: Chapel Hill, NC
Distribution: Gentoo, Windows 95 2000 & XP
Posts: 160

Original Poster
Rep: Reputation: 30
I'm going out of town for a wedding this weekend; I'll try it as soon as I get back. Thanks for keeping up with me.
 
  


All times are GMT -5.
