Is it XHTML or HTML4? Is it well-formed XML (i.e. matching opening and closing tags, nested correctly)? Do you know about XSLT?
I'm more of a Python or Java man myself, but I'm a little rusty and tired at the moment. It could probably be done with a grep statement, of course.
As you say, if you're writing the file search yourself, you need to:
1. create an empty output buffer
2. list the files
3. open them one at a time
4. start from the beginning, treat the file contents as one long string, and look for the start string
5. if you find it, look from that position onward for the end string
6. if you find that too, what's in between is what to append to the output buffer; alternatively, copy everything after the start string up to the end string, or to the end of the file if the end string never turns up
7. close the input file and move on to the next one until none are left
8. write the output file
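
Since I mentioned Python, here is a rough sketch of the steps above. The names extract_between and collect, the glob pattern and the UTF-8 encoding are my own assumptions for illustration, not anything from your setup, so adjust them to your files. It takes only the first start/end pair in each file, and if the end string is missing it keeps everything after the start string.

```python
from pathlib import Path


def extract_between(text, start, end):
    """Return the text between the first `start` and the next `end`.

    Returns None if `start` is missing. If `end` never appears, everything
    after `start` is returned.
    """
    begin = text.find(start)
    if begin == -1:
        return None
    begin += len(start)              # skip past the start string itself
    finish = text.find(end, begin)   # search only from that point onward
    if finish == -1:
        return text[begin:]
    return text[begin:finish]


def collect(folder, pattern, start, end, out_path):
    """Gather the extracted piece of every matching file into one output file."""
    pieces = []                                       # the empty output buffer
    for path in sorted(Path(folder).glob(pattern)):   # list the files
        text = path.read_text(encoding="utf-8")       # open, read, close
        piece = extract_between(text, start, end)
        if piece is not None:
            pieces.append(piece)
    # Write the output once, with one extracted piece per line break.
    Path(out_path).write_text("\n".join(pieces), encoding="utf-8")
    return pieces
```

You would call it as something like collect("pages", "*.html", "<body>", "</body>", "combined.txt"), where those arguments are placeholders. Keep the output file out of the folder/pattern being searched, or a second run will read its own output.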
You may want to learn an efficient string matching algorithm (Knuth-Morris-Pratt or Boyer-Moore, for example) if you're DIYing this.
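
To give an idea of what such an algorithm looks like, here is a minimal sketch of Boyer-Moore-Horspool, a simplified relative of Boyer-Moore. The function name horspool_find is made up for this example, and it assumes start is not negative. In practice Python's built-in str.find is already fast, so this is only worth writing out if you really are doing it by hand.

```python
def horspool_find(text, pattern, start=0):
    """Return the index of the first occurrence of `pattern` in `text`
    at or after `start`, or -1 if there is none."""
    m, n = len(pattern), len(text)
    if m == 0:
        return start if start <= n else -1
    # For each character of the pattern except the last: how far the search
    # window can jump when that character sits under the window's last slot.
    # Later occurrences overwrite earlier ones, giving the smallest safe jump.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = start
    while pos + m <= n:
        if text[pos:pos + m] == pattern:
            return pos
        # Characters not in the table cannot be part of a match at this
        # alignment, so the window can skip a full pattern length.
        pos += shift.get(text[pos + m - 1], m)
    return -1
```

The point of the shift table is that a mismatch usually lets the search skip several characters at once instead of moving along one position at a time.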