Python: Extract names and values from HTML tags

Dogs · 02-10-2011, 08:33 AM

I'm working on a project at work to automate sending e-mails to customers.

Everything is in place except my ability to extract the useful data from HTML tags to use in the formation of the POST.

Code:

      <td width="25%" bgcolor="bisque"><b><font color="blue">From</font></b></td>
      <td width="25%" bgcolor="bisque"><input type="text" name="TechName" value='MY NAME'></td>
      <td width="25%" bgcolor="bisque"><input type="text" name="TechEmail" style="background-color: #FFFF00" size="30" value='MY EMAIL ADDRESS'></td>
      <td width="25%" bgcolor="bisque"><input type="text" name="TechPhone" size="30" value='MY PHONE NUMBER'></td>

I want to disregard everything except the name and value attributes of each input tag (for example, TechName and 'MY NAME').

So I need to figure out how to copy the quoted text after name= and then the quoted text after value= for each occurrence in the file. Some of these items may not be on the same line (though perhaps, with BeautifulSoup, they would be).

There are 10 or so standard values that I have to collect that show up only once per e-mail.

Then there is a looping section that contains incremented IDs along with their associated content, following the same name="" value='' structure. In some cases, name and value are separated by other attributes such as size and style, which I do not need. These are the cases where a single line may not contain both the name and the value.

How can I do multi-line searching in Python, and what is a suitable way to tackle this problem?
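On the multi-line question, here is a minimal sketch with the standard `re` module, assuming Python 3. The sample markup is adapted from the snippet above, with one tag deliberately split across two lines for illustration. It also assumes that name always appears before value inside a tag and that attribute values contain no quote characters; real pages may break either assumption.

```python
import re

# Illustrative sample: the second <input> tag is split across two lines.
html = """
<td><input type="text" name="TechName" value='MY NAME'></td>
<td><input type="text" name="TechEmail" style="background-color: #FFFF00"
     size="30" value='MY EMAIL ADDRESS'></td>
"""

# [^>]* matches any character except '>', newlines included, so a tag that
# spans several lines is matched without needing re.DOTALL.
# Assumption: name comes before value within each tag.
INPUT_RE = re.compile(
    r"""<input\b[^>]*?\bname\s*=\s*["']([^"']*)["'][^>]*?\bvalue\s*=\s*["']([^"']*)["']""",
    re.IGNORECASE,
)

# findall returns (name, value) tuples, which dict() turns into a mapping.
fields = dict(INPUT_RE.findall(html))
print(fields)
```

A regular expression like this is fragile against unusual markup, so it is probably best treated as a quick check rather than a full solution.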

My current idea is to accept that the values are always in the same order and do string.find("value="), then step forward in the string to just after the =' and assign everything up to the next ' to the "name" field that represents the actual variable in the POST. But this is a C-ish way of doing it, with arrays and indexes and whatnot, and it still doesn't address the multi-line issue. I'd rather be good at Python than good at making Python behave like C.
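An alternative to stepping through the string by index is to let an HTML parser hand over the attributes. The sketch below uses the standard library's `html.parser.HTMLParser` and assumes Python 3; the class name `InputCollector` and the sample markup are made up for illustration. The parser reads each whole tag regardless of line breaks or attribute order.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name/value pairs from every <input> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs; the parser lowercases
        # tag and attribute names before calling this method.
        if tag != "input":
            return
        attr_map = dict(attrs)
        if "name" in attr_map:
            # An input without a value attribute is stored as an empty string.
            self.fields[attr_map["name"]] = attr_map.get("value") or ""


# Illustrative sample: value precedes name in the second tag, which also
# spans two lines.
sample = """
<td><input type="text" name="TechName" value='MY NAME'></td>
<td><input type="text" size="30" value='MY PHONE NUMBER'
     name="TechPhone"></td>
"""

collector = InputCollector()
collector.feed(sample)
collector.close()
print(collector.fields)
```

The resulting dictionary maps each field name to its value, which is close to the shape needed for building POST data.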

kurumi · 02-10-2011, 08:45 AM

And why aren't you using BeautifulSoup?

Dogs · 02-10-2011, 08:56 AM

OK, I got acquainted with BeautifulSoup, but of a document that is approximately 20 KB, only the first 1 KB or so gets parsed...

Is there a tag or something in there that makes BeautifulSoup stop?
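One way to narrow this down, sketched here under assumptions: compare the input names the parser reported against a raw scan of the text, and look at the first tag it missed. The thread does not establish the cause, so this only locates where the two diverge. The function name `first_missed_input` and the sample string are hypothetical, and the code assumes Python 3.

```python
import re

# Raw scan for <input ... name="..."> that does not rely on an HTML parser.
NAME_RE = re.compile(
    r"""<input\b[^>]*?\bname\s*=\s*["']([^"']*)["']""",
    re.IGNORECASE,
)


def first_missed_input(html, parsed_names):
    """Return (offset, name) of the first raw <input> whose name is not in
    parsed_names, or None if every raw name was reported by the parser."""
    seen = set(parsed_names)
    for match in NAME_RE.finditer(html):
        if match.group(1) not in seen:
            return match.start(), match.group(1)
    return None


# Illustrative sample: an unterminated comment hides the second input from
# a parser, while the raw scan still sees it.
html = "<input name='A' value='1'><!-- oops <input name='B' value='2'>"
missed = first_missed_input(html, ["A"])
print(missed)
```

With BeautifulSoup, the list of parsed names could come from something like `[tag.get("name") for tag in soup.findAll("input")]`. The markup just before the reported offset is then the place to look for whatever made the parser stop.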