Ouch. Seven levels of nested pipes is not very efficient. A single well-written awk script could certainly replace all of your separate awk and sed commands. And strings? What do you need that for?
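As a rough sketch (your actual pipeline isn't shown, so the input and pattern here are made up): a chain like `grep ... | sed ... | awk ...` typically collapses into one awk script, since awk can match, substitute, and count all by itself:

```shell
# Hypothetical three-stage pipeline:
#   grep 'href' page.html | sed 's/.*href="//; s/".*//' | awk '{print NR": "$0}'
# The same job in a single awk invocation:
printf '%s\n' \
    '<a href="http://example.com/a">one</a>' \
    '<p>no link here</p>' \
    '<a href="http://example.com/b">two</a>' |
awk '/href/ { sub(/.*href="/, ""); sub(/".*/, ""); print ++n ": " $0 }'
```

Fewer processes, and the whole transformation lives in one place instead of being smeared across seven.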
It might help your parsing to run the file through HTML Tidy first, to clean up any formatting problems before extracting the text.
Another option, depending on your exact needs, may be to use xmlstarlet (or another tool purposely designed for parsing xml/html) instead. One option it has is converting the input into "pyx" format, which is easier for line-based tools like sed and awk to parse. Again, you should run the html source through tidy first to convert it to proper xhtml.
Code:
curl .. | tidy -n -asxml 2>/dev/null | xmlstarlet pyx
This should give you pyx output. It's up to you to decide whether parsing that is useful to you or not.
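To give you an idea of why pyx is friendly to line-based tools: it emits one event per line, prefixed with `(` for a start tag, `)` for an end tag, `A` for an attribute, and `-` for text. The sample lines below are hand-written for illustration (not captured from a real xmlstarlet run), but they follow that shape, and pulling out, say, every href becomes a one-liner:

```shell
# Hand-written pyx-style event stream standing in for `... | xmlstarlet pyx` output:
printf '%s\n' \
    '(html' '(body' \
    '(a' 'Ahref http://example.com/a' '-one' ')a' \
    '(a' 'Ahref http://example.com/b' '-two' ')a' \
    ')body' ')html' |
awk '$1 == "Ahref" { print $2 }'   # attribute lines are "A<name> <value>"
```

No regex gymnastics against raw html tags; you just match on the first field.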