As has been mentioned,
the correct way to read text from HTML is with an HTML parser.
But a very quick and dirty solution that
might be good enough for a one-off is:
Code:
$ grep -Po '(?<=title=")[^"]+' file.html
Apart from the risk of malformed HTML, the other downside is that HTML entities are not decoded (so a well-formed title containing quotes will show them as &quot;, for example).
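For comparison, here is a rough sketch of the parser route using Python's standard html.parser module. It assumes that what you want is the value of every title="..." attribute (which is what the grep above matches); the names TitleAttrCollector and extract_titles are made up for illustration. The parser decodes entities in attribute values for you, and copes with single-quoted attributes as well.

```python
from html.parser import HTMLParser


class TitleAttrCollector(HTMLParser):
    """Collect the value of every title attribute found in start tags."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased and
        # entities such as &quot; in values are already decoded.
        for name, value in attrs:
            if name == "title" and value:
                self.titles.append(value)


def extract_titles(html_text):
    """Return the decoded title attribute values in document order."""
    parser = TitleAttrCollector()
    parser.feed(html_text)
    parser.close()
    return parser.titles


# Small inline sample (hypothetical markup, not from the original page).
sample = """<a title="Say &quot;hi&quot;" href="#">x</a> <img title='plain' />"""
print(extract_titles(sample))  # ['Say "hi"', 'plain']
```

To run it against a real file, read the file into a string first (for example with open("file.html", encoding="utf-8").read()) and pass that to extract_titles.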
If this isn't a one-off then you should explain the general task you're trying to achieve, because there's probably a simpler solution. (Perhaps involving the site's Atom/RSS feed, for example.)