Hi there, I am trying to parse some product titles out of an HTML file that can be obtained from the URL in the command below.
What I am trying to do is grab the titles of the products (such as fina, mannerly, etc.) and print them one by one. I didn't think this would be difficult using regular expressions, but the problem is that some of the titles are in a huge piece of HTML all stuck on a single line.
Usually I would just use sed and replace the line with just the part that I wanted, but that won't work in this case, since a lot of extra junk would be printed as well.
My last guess was to use this command:
curl 'http://www.moen.com/ecatalog/gallery/bathroom-faucets-sink/_/N-67p?Erp=12' | sed -n 's/.*target="_top"><span class="producttitle">\([a-zA-Z]*\)<\/span>.*/\1/pg'
The output is as follows
As you can see by looking at the page in a web browser, this is only the last product on each line.
If you take out the .* on each side of the search part of the regex and go through the output with a fine-tooth comb, you can see that the pattern DOES match every title. It's just that I don't know how to print the titles without all of the extra HTML I don't need.
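To make the problem concrete, here is a minimal sketch on a made-up sample line (the markup and the `extract_titles` helper are assumptions for illustration, not the real page). The leading `.*` is greedy, so sed keeps only the last title on a line. One possible way around this is to have grep print every match on its own line first and then strip the tags. This assumes a grep that supports `-o` (GNU and BSD grep do, but it is not required by POSIX).

```shell
#!/bin/sh
# Hypothetical sample: two products stuck on one line, imitating the page's markup.
sample='<a href="/a" target="_top"><span class="producttitle">fina</span></a><a href="/b" target="_top"><span class="producttitle">mannerly</span></a>'

# Original approach: the greedy leading .* swallows everything up to the
# LAST title on the line, so only that one survives the substitution.
printf '%s\n' "$sample" |
    sed -n 's/.*<span class="producttitle">\([a-zA-Z]*\)<\/span>.*/\1/p'

# Sketch of an alternative: print each title on its own line.
extract_titles() {
    # grep -o emits every match on a separate line (assumes -o is available);
    # sed then removes the surrounding tags, leaving just the title text.
    grep -o '<span class="producttitle">[a-zA-Z]*</span>' |
        sed 's/<[^>]*>//g'
}

printf '%s\n' "$sample" | extract_titles
```

With the real page, the same idea would be to pipe the curl output into `extract_titles` instead of the sample line. Like the original pattern, `[a-zA-Z]*` only matches titles made of letters, so it would need widening if any titles contain spaces or digits.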
If this does not make sense, please feel free to tell me and I will make myself clearer.