LinuxQuestions.org
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
12-28-2008, 12:18 PM   #1
unihiekka
Member
 
Registered: Aug 2005
Distribution: SuSE Linux / Scientific Linux / [K|X]ubuntu
Posts: 273

Rep: Reputation: 32
sed: replace same number of characters between tags


I am having trouble getting a sed command right. I want to replace a series of characters between two tags with spaces. The idea is the following:

Replace
Quote:
<TAG ONE>*****</TAG ONE>
<TAG TWO>**</TAG TWO>
<TAG ONE>********</TAG ONE>
<TAG THREE>*********************</TAG THREE>
<TAG ONE>*</TAG ONE>
with

Quote:
<TAG ONE>xxxxx</TAG ONE>
<TAG TWO>**</TAG TWO>
<TAG ONE>xxxxxxxx</TAG ONE>
<TAG THREE>*********************</TAG THREE>
<TAG ONE>x</TAG ONE>
That is, replace each asterisk * between TAG ONE delimiters with one x (actually a space, but that is not visible in the QUOTE environment here at LQ.com), and leave the other * alone.

I have an HTML ASCII art file, which I would like to "convert" with sed to a simple text file that I could use in other areas without the tags. I'd have to remove all tags and add the appropriate new lines, but with sed that is a piece of cake. The only problem that remains is the "between-tags" thingy.

Thanks!

Last edited by unihiekka; 12-28-2008 at 12:21 PM.
 
12-28-2008, 12:30 PM   #2
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
How about posting what you have tried?

One way to approach this is with addressing. Example:

sed '/TAG ONE/s/\**/ /g'
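Translation: For all lines containing the string "TAG ONE", replace each run of "*"s with a single space, and do this for all occurrences on the same line. Be aware that \** means zero or more "*"s, so it also matches the empty string and sed will insert a space at those empty matches too. Take out the extra "*" (leaving s/\*/ /g) to replace each "*" with a space, which is what you asked for.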
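
A quick way to compare the pattern with and without the extra "*" is to run both on a sample line. This is only a sketch: the function names are made up for illustration, and x is used instead of a space so the result is visible.

```shell
#!/bin/sh
# Hypothetical helper names, for illustration only.

# Replace each * with x, but only on lines that contain "TAG ONE".
each_star() {
    sed '/TAG ONE/s/\*/x/g'
}

# Same address, but with \** (zero or more *). The pattern can match
# the empty string, so extra x characters appear between other characters.
star_runs() {
    sed '/TAG ONE/s/\**/x/g'
}

printf '%s\n' '<TAG ONE>***</TAG ONE>' '<TAG TWO>**</TAG TWO>' | each_star
# prints:
# <TAG ONE>xxx</TAG ONE>
# <TAG TWO>**</TAG TWO>
```

Piping the same two lines through star_runs instead shows the stray insertions on the TAG ONE line.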

Best SED tutorial here: http://www.grymoire.com/Unix/Sed.html
 
12-28-2008, 12:59 PM   #3
mk27
Member
 
Registered: Sep 2008
Distribution: fedora, gentoo, ubuntu
Posts: 148

Rep: Reputation: 23
I think perl is better for this multi-line stuff, but I may be ignorant; I never use sed anyway:

Code:
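#!/usr/bin/perl
use strict;
use warnings;

# Read the sample lines from the __DATA__ section below.
while (<DATA>) {
	# Only touch lines that start with <TAG ONE>; replace every * with x.
	s/\*/x/g if /^<TAG ONE>/;
	print;
}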
	

__DATA__
<TAG ONE>*****</TAG ONE>
<TAG TWO>**</TAG TWO>
<TAG ONE>********</TAG ONE>
<TAG THREE>*********************</TAG THREE>
<TAG ONE>*</TAG ONE>
Output:
Code:
<TAG ONE>xxxxx</TAG ONE>
<TAG TWO>**</TAG TWO>
<TAG ONE>xxxxxxxx</TAG ONE>
<TAG THREE>*********************</TAG THREE>
<TAG ONE>x</TAG ONE>
If that's what you want and you're unfamiliar with perl, I can modify this to accept a file as input. I'm here all afternoon. Mostly.

There's almost certainly a similar way to do it with some shell code (using sed), though. But you won't get it from me.
 
12-28-2008, 02:08 PM   #4
mk27
Member
 
Registered: Sep 2008
Distribution: fedora, gentoo, ubuntu
Posts: 148

Rep: Reputation: 23
Sudden afterthought: if you don't want to use a script, this will work from the command line. It edits yourfile.txt in place and keeps the original as orig_yourfile.txt:

Code:
perl -pi'orig_*' -e 's/\*/x/g if /^<TAG ONE>/' yourfile.txt
 
12-29-2008, 08:40 AM   #5
unihiekka
Member
 
Registered: Aug 2005
Distribution: SuSE Linux / Scientific Linux / [K|X]ubuntu
Posts: 273

Original Poster
Rep: Reputation: 32
OK, thanks. Sometimes there are several different tag environments on one line, and then it changes every * on that line into a space instead of only the ones between <TAG ONE> and </TAG ONE>.
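
This behaviour is easy to reproduce. An address such as /TAG ONE/ only selects whole lines, so the substitution then applies to every * on a selected line, whatever tag it sits in. A minimal sketch, with x standing in for the space and a made-up function name:

```shell
#!/bin/sh
# Hypothetical name: the line-addressed substitution suggested earlier.
per_line() {
    sed '/TAG ONE/s/\*/x/g'
}

# The asterisks inside TAG TWO are replaced too, because the address
# chooses the line, not the part of the line to edit.
printf '%s\n' '<TAG ONE>***</TAG ONE><TAG TWO>***</TAG TWO>' | per_line
# prints: <TAG ONE>xxx</TAG ONE><TAG TWO>xxx</TAG TWO>
```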
 
12-29-2008, 12:32 PM   #6
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 360

Rep: Reputation: 170
Using GNU sed
Code:
sed ':a s/\(<TAG ONE> *\)\*/\1 /;ta' infile > outfile

# Input line
<TAG ONE>***</TAG ONE><TAG TWO>***</TAG TWO><TAG ONE>***</TAG ONE>
# Output line
<TAG ONE>   </TAG ONE><TAG TWO>***</TAG TWO><TAG ONE>   </TAG ONE>

Last edited by Kenhelm; 12-29-2008 at 02:09 PM. Reason: Removed unnecessary g flag:- s///g to s///
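
For anyone wondering why the loop works, here is the same command spelled out as a sketch. The pattern matches <TAG ONE>, any spaces already written after it, and the next *. The substitution turns that one * into a space, and "ta" jumps back to ":a" for as long as a substitution was made. Once every * directly after a <TAG ONE> has become a space, nothing matches and sed prints the line. The function name is made up, and splitting the script into -e parts is only an assumption about what seds other than GNU sed are more likely to accept.

```shell
#!/bin/sh
# Hypothetical wrapper around the loop above, split into -e parts.
blank_tag_one() {
    sed -e ':a' \
        -e 's/\(<TAG ONE> *\)\*/\1 /' \
        -e 'ta'
}

printf '%s\n' '<TAG ONE>***</TAG ONE><TAG TWO>***</TAG TWO><TAG ONE>**</TAG ONE>' |
    blank_tag_one
# prints: <TAG ONE>   </TAG ONE><TAG TWO>***</TAG TWO><TAG ONE>  </TAG ONE>
```

Note that the loop only walks through spaces and asterisks. If a TAG ONE element contains some other character before a *, as in <TAG ONE>a*</TAG ONE>, that * is left alone.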
 
12-30-2008, 03:51 AM   #7
unihiekka
Member
 
Registered: Aug 2005
Distribution: SuSE Linux / Scientific Linux / [K|X]ubuntu
Posts: 273

Original Poster
Rep: Reputation: 32
Thanks, I would never have come up with that last one! Many thanks.
 
  


All times are GMT -5.