But a similar question was asked recently (sorry, I can't find it to give a link). We came up with a few ingenious ways of doing it, and then somebody sanely pointed out that HTML allows a lot of variation in formatting (for example, line ends are only token separators), and that automated editing is much better done with specialist tools written to work with HTML syntax. That made a lot of sense.
A quick netsearch turned up this page. Might be of some use.
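To illustrate the point about line ends, here is a minimal sketch using `html.parser` from Python's standard library. It is only my own example, not whatever tool was suggested in that earlier thread; the names `TagCollector` and `parse_events` and the two sample strings are made up for the demonstration. It shows that the same markup laid out on one line or spread over several gives the parser the same sequence of tags and text, whereas a line-oriented pattern match only finds one of the two layouts.

```python
import re
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records tags and text, ignoring layout whitespace."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Collapse runs of whitespace (including line ends) to single spaces.
        text = " ".join(data.split())
        if text:
            self.events.append(("text", text))


def parse_events(html):
    """Return the list of parse events for an HTML fragment."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.events


# The same fragment, formatted two different ways (made-up sample data).
compact = '<p class="note">Hello <b>world</b></p>'
spread = '<p\n   class="note">Hello\n<b>world</b>\n</p>'

# A parser that understands HTML syntax sees both layouts identically.
same_structure = parse_events(compact) == parse_events(spread)

# A literal text pattern only matches the layout it was written for.
pattern = re.compile(r'<p class="note">')
regex_finds_compact = pattern.search(compact) is not None
regex_finds_spread = pattern.search(spread) is not None
```

Any editing script built on the literal pattern would silently skip the second layout, which is the sort of thing the HTML-aware tools are meant to avoid.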