Hi gurus, is there an elegant way to get rid of paired HTML tags together with the text inside them?
Or to remove just the text inside the tags and keep the tags themselves? (I can remove the tags afterwards, so that would not be a problem.)
For example:
Code:
<tag>text to be removed with or without tags</tag>
I tried this regular expression:
Code:
<[^>]*>[^<]*</[^>]*>
That works fine until I have nested tags:
Code:
<tag><nested>text to be removed with or without tags</nested></tag>
That only matches the string inside <nested>, not the whole <tag>.
I think using sed's memory (a back-reference) to remember "<tag>" and then match "</tag>" could be the way. But I am not sure whether a back-reference can be used in the match part of the command or only in the replacement part. Something like this:
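To illustrate the problem, this is how I would expect that expression to behave when used in a sed substitution on the nested example. It is only a sketch: the function name strip_flat is made up for illustration, and the slash is escaped because / is the s command delimiter.

```shell
# Hypothetical helper: apply the simple "open tag, text, close tag" pattern.
strip_flat() {
  sed 's/<[^>]*>[^<]*<\/[^>]*>//g'
}

# Only the innermost pair is matched, so the outer pair is left behind.
printf '%s\n' '<tag><nested>text to be removed with or without tags</nested></tag>' | strip_flat
# prints: <tag></tag>
```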
Code:
sed -n 's/<\([^>]*>\)[^<]*<\/\1//gp'
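For the nested case, one idea would be to repeat the substitution in a loop so that the innermost pair is removed first, then the pair around it, and so on. The following is a minimal sketch, not a proper HTML parser. It assumes that tag names start with a letter, that each element sits entirely on one line, and that the back-reference \1 is allowed in the match part. The function name strip_paired and the sample input are made up for illustration.

```shell
# Hypothetical helper: remove paired tags and everything inside them.
# :a defines a label; "ta" jumps back to it whenever the substitution
# succeeded, so the innermost pair is removed on each pass.
# \1 holds the tag name captured from the opening tag.
strip_paired() {
  sed -e ':a' \
      -e 's/<\([A-Za-z][A-Za-z0-9]*\)[^<>]*>[^<]*<\/\1>//' \
      -e 'ta'
}

# <br /> has no matching closing tag, so it is left alone.
printf '%s\n' 'keep <tag><nested>text</nested></tag> this<br />too' | strip_paired
# prints: keep  this<br />too
```

One known weakness of this sketch: the captured name may be only a prefix of the real tag name, so malformed input such as <bold>x</b> would also be removed.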
PS: Just to be clear, <br /> tags should not be treated as paired tags, because that would remove a lot of text. (I know <br> is not a paired tag; if necessary, <br> could be replaced with a placeholder such as $$$$$ in a first step.)
Sorry, I don't have a Linux box so I can't test it, but I hope you understand what I am looking for. Thank you.