Sorry for bringing up this old topic, but I have a similar problem: I need HTML tags stripped from .html files.
The lynx -dump option is nice and tempting (html2text doesn't suit my purposes), but time after time I run into files it doesn't work on! Unfortunately I'm no HTML expert, and I find it almost impossible to determine what goes wrong with lynx. For those files it just outputs the .html file unaltered.
Is there really no better option than writing my own tag stripper in C++ (I don't know Perl)? Any piece of advice? Please? Anybody?
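
For reference, the kind of home-made stripper I have in mind would look roughly like the sketch below. It is only a naive illustration, not a real HTML parser: the name strip_tags is made up, it assumes reasonably well-formed input, and it does not decode entities such as &amp;amp; or skip the contents of script/style blocks and comments.

```cpp
#include <string>

// Naive tag stripper (hypothetical helper, not from any library).
// Drops everything between '<' and the '>' that closes the tag.
// Assumptions: well-formed input; entities are left undecoded;
// <script>/<style> bodies and comments get no special handling.
std::string strip_tags(const std::string& html) {
    std::string text;
    text.reserve(html.size());

    bool in_tag = false;  // true while we are between '<' and '>'
    char quote = '\0';    // active quote character inside a tag, or '\0'

    for (char c : html) {
        if (in_tag) {
            if (quote != '\0') {
                // Inside a quoted attribute value: only the same quote ends it,
                // so a '>' inside the value does not close the tag.
                if (c == quote) quote = '\0';
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == '>') {
                in_tag = false;
            }
        } else if (c == '<') {
            in_tag = true;
        } else {
            text.push_back(c);
        }
    }
    return text;
}
```

Called as strip_tags(contents_of_file), it returns the text with the markup removed, but whitespace and line breaks are left exactly as they were in the source, so the result is much rougher than what lynx -dump produces when it works.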