LinuxQuestions.org


unihiekka 12-28-2008 12:18 PM

sed: replace same number of characters between tags
 
I am having trouble getting the sed command right for replacing a series of characters between two tags with spaces. The idea is the following:

Replace
Quote:

<TAG ONE>*****</TAG ONE>
<TAG TWO>**</TAG TWO>
<TAG ONE>********</TAG ONE>
<TAG THREE>*********************</TAG THREE>
<TAG ONE>*</TAG ONE>
with

Quote:

<TAG ONE>xxxxx</TAG ONE>
<TAG TWO>**</TAG TWO>
<TAG ONE>xxxxxxxx</TAG ONE>
<TAG THREE>*********************</TAG THREE>
<TAG ONE>x</TAG ONE>
That is, replace each asterisk * between TAG ONE delimiters with one x (actually a space, but that is not visible in the QUOTE environment here at LQ.com), and leave the other * alone.

I have an HTML ASCII art file, which I would like to "convert" with sed to a simple text file that I could use in other areas without the tags. I'd have to remove all tags and add the appropriate new lines, but with sed that is a piece of cake. The only problem that remains is the "between-tags" thingy.

Thanks!

pixellany 12-28-2008 12:30 PM

How about posting what you have tried?

One way to approach this is with addressing. Example:

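sed '/TAG ONE/s/\*\**/ /g'
Translation: For all lines containing the string "TAG ONE", replace each run of one or more "*"s with a single space. Do this for all occurrences on the same line. (A bare \** also matches the empty string, so it would insert spaces between all the other characters as well.) To replace each "*" with its own space, shorten the pattern to a single \*, i.e. s/\*/ /g.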
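
A minimal sketch of the addressing idea in its one-for-one form, assuming every line holds a single tag pair as in the sample from the first post. The function name mask_tag_one_lines is made up for illustration, and "x" stands in for the space so the result is visible:

```shell
#!/bin/sh
# Hypothetical helper: on lines that contain "TAG ONE", turn every "*" into "x".
# The /TAG ONE/ address selects whole lines, so this assumes one tag pair per line.
mask_tag_one_lines() {
    sed '/TAG ONE/s/\*/x/g'
}

# Sample lines taken from the first post.
printf '%s\n' \
    '<TAG ONE>*****</TAG ONE>' \
    '<TAG TWO>**</TAG TWO>' \
    '<TAG ONE>*</TAG ONE>' | mask_tag_one_lines
```

The TAG TWO line passes through untouched because the address does not match it; the two TAG ONE lines keep their length.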

Best SED tutorial here: http://www.grymoire.com/Unix/Sed.html

mk27 12-28-2008 12:59 PM

I think perl is better for this multi-line stuff but I may be ignorant, I never use sed anyway:

Code:

#!/usr/bin/perl -w
use strict;

while (<DATA>) {
        if ($_ =~ /^<TAG ONE>/) {$_ =~ s/\*/x/g}
        print $_;
}
       

__DATA__
<TAG ONE>*****</TAG ONE>
<TAG TWO>**</TAG TWO>
<TAG ONE>********</TAG ONE>
<TAG THREE>*********************</TAG THREE>
<TAG ONE>*</TAG ONE>

Output:
Code:

<TAG ONE>xxxxx</TAG ONE>
<TAG TWO>**</TAG TWO>
<TAG ONE>xxxxxxxx</TAG ONE>
<TAG THREE>*********************</TAG THREE>
<TAG ONE>x</TAG ONE>

If that's what you want and you're unfamiliar with perl, I can modify this to accept a file as input. I'm here all afternoon. Mostly.

There's almost certainly a similar way to do it with some shell code (using sed) tho. But you won't get it from me ;)

mk27 12-28-2008 02:08 PM

Sudden afterthought -- if you don't want to use a script, this will work from the command line:

Code:

perl -pi'orig_*' -e 's/\*/x/g if /^<TAG ONE>/' yourfile.txt

unihiekka 12-29-2008 08:40 AM

OK, thanks. Sometimes there are several different tag environments on one line, and then it changes all the * on that line into spaces instead of only the ones between <TAG ONE> and </TAG ONE>.
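
The effect can be reproduced with one line that holds two tag pairs. This is only a sketch of the line-addressed sed form suggested earlier in the thread; the function name mask_by_line is made up, and "x" is used instead of a space so the change is visible:

```shell
#!/bin/sh
# Hypothetical helper: line-based replacement, as in the earlier suggestions.
# Any line that mentions TAG ONE has ALL of its "*" replaced.
mask_by_line() {
    sed '/TAG ONE/s/\*/x/g'
}

# One line, two different tag environments.
printf '%s\n' '<TAG ONE>**</TAG ONE><TAG TWO>**</TAG TWO>' | mask_by_line
```

Because the address only decides whether the whole line is processed, the asterisks inside TAG TWO are rewritten along with those inside TAG ONE.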

Kenhelm 12-29-2008 12:32 PM

Using GNU sed
Code:

sed ':a s/\(<TAG ONE> *\)\*/\1 /;ta' infile > outfile

# Input line
<TAG ONE>***</TAG ONE><TAG TWO>***</TAG TWO><TAG ONE>***</TAG ONE>
# Output line
<TAG ONE>   </TAG ONE><TAG TWO>***</TAG TWO><TAG ONE>   </TAG ONE>
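
For anyone reading along, here is how the loop works: ":a" defines a label, the "s" command replaces the first "*" that follows "<TAG ONE>" plus any spaces already written, and "ta" jumps back to the label whenever that substitution succeeded. The sketch below is the same loop with each command in its own -e, which should not depend on how a given sed parses a label followed by another command. The function name blank_tag_one is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: replace each "*" between <TAG ONE> and </TAG ONE>
# with one space, leaving "*" inside other tags alone.
blank_tag_one() {
    sed -e ':a' \
        -e 's/\(<TAG ONE> *\)\*/\1 /' \
        -e 'ta'
}

# Each pass turns one "*" into a space; the loop ends when no
# "<TAG ONE>" is followed (after optional spaces) by a "*".
printf '%s\n' '<TAG ONE>***</TAG ONE><TAG TWO>***</TAG TWO><TAG ONE>***</TAG ONE>' |
    blank_tag_one
```

Note that the pattern only reaches asterisks that directly follow the opening tag, possibly with spaces in between. If other characters can appear before the asterisks inside TAG ONE, the expression would need to be extended.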


unihiekka 12-30-2008 03:51 AM

Thanks, I would have never come up with the last one! Many thanks.


All times are GMT -5.