LinuxQuestions.org
Old 02-10-2011, 08:33 AM   #1
Dogs
Member
 
Registered: Aug 2009
Location: Houston
Distribution: Slackware 13.37 x64
Posts: 105

Rep: Reputation: 25
Python: Extract names and values from HTML tags


I'm working on a project at work to automate sending e-mails to customers.

Everything is in place except my ability to extract the useful data from HTML tags to use in the formation of the POST.


Code:
      <td width="25%" bgcolor="bisque"><b><font color="blue">From</font></b></td>
      <td width="25%" bgcolor="bisque"><input type="text" name="TechName" value='MY NAME'></td>
      <td width="25%" bgcolor="bisque"><input type="text" name="TechEmail" style="background-color: #FFFF00" size="30" value='MY EMAIL ADDRESS'></td>
      <td width="25%" bgcolor="bisque"><input type="text" name="TechPhone" size="30" value='MY PHONE NUMBER'></td>

I want to disregard everything but the bolded portions (the name and value attributes of each input tag).

So I need to figure out how to copy the quoted text after name=, and then the quoted text after value=, for each occurrence in the file. Some of these items may not be on the same line (though perhaps, with Beautiful Soup, they would be).

There are 10 or so standard values that I have to collect that show up only once per e-mail.

Then there is a looping section which contains incremented IDs along with associated content, following that same name="" value='' structure. In some cases name and value are separated by other attributes such as size and style, which I do not need; these are the cases where one line may not contain both the name and the value.


How can I do multi-line searching in Python, and what is a suitable way to tackle this problem?
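
On the multi-line part of the question, here is a minimal sketch, assuming Python 3 and that the whole page has been read into a single string rather than processed line by line. The sample markup and the names html, PATTERN, and fields are made up for illustration. It relies on the fact that a negated character class such as [^>] also matches newlines, so one pattern can span a tag that is split across lines. It assumes name comes before value inside each tag, as in the snippet above, and that the values contain no quote characters.

```python
import re

# Sample markup modelled on the post; the second tag is split across lines.
html = """
<td><input type="text" name="TechName" value='MY NAME'></td>
<td><input type="text" name="TechEmail"
     style="background-color: #FFFF00" size="30"
     value='MY EMAIL ADDRESS'></td>
"""

# [^>]*? matches any character except '>', newlines included, so the
# search can cross line breaks while staying inside a single tag.
PATTERN = re.compile(
    r"""<input\b[^>]*?\bname=["']([^"']*)["'][^>]*?\bvalue=["']([^"']*)["']""",
    re.IGNORECASE,
)

# findall returns one (name, value) tuple per matching tag.
fields = dict(PATTERN.findall(html))
print(fields)
```

Regular expressions are fragile on real-world HTML, so this is best treated as an answer to "how do I search across lines" rather than as a robust extractor.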

My current idea is to assume that the values always appear in the same order, call string.find("value="), step forward in the string to just after the =', and assign the text up to the next ' to the "name" field that represents the actual variable in the POST. But this is a C-ish way of doing it, with arrays and indexes and whatnot, and it still doesn't address the multi-line issue. I'd rather be good at Python than good at making Python behave like C.
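
As an alternative to stepping through the string by index, here is a rough sketch using the standard library's html.parser module, assuming Python 3. The class name InputCollector and the sample markup are hypothetical. The parser hands each tag's attributes to handle_starttag as a list of (attribute, value) pairs, so attribute order, quoting style, and line breaks inside a tag should not matter.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name/value pairs from every <input> tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) tuples for this tag.
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is not None:
            # Inputs without a value attribute are stored as empty strings.
            self.fields[name] = attributes.get("value", "")


# Sample markup modelled on the post; the second tag spans several lines.
sample = """
<td><input type="text" name="TechName" value='MY NAME'></td>
<td><input type="text" name="TechEmail"
     style="background-color: #FFFF00" size="30"
     value='MY EMAIL ADDRESS'></td>
<td><input type="text" name="TechPhone" size="30" value='MY PHONE NUMBER'></td>
"""

collector = InputCollector()
collector.feed(sample)
collector.close()
print(collector.fields)
```

The resulting dictionary maps each field name to its value, which is roughly the shape needed to build the POST data.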

Last edited by Dogs; 02-10-2011 at 08:38 AM.
 
Old 02-10-2011, 08:45 AM   #2
kurumi
Member
 
Registered: Apr 2010
Posts: 228

Rep: Reputation: 53
And why are you not using BeautifulSoup?
 
Old 02-10-2011, 08:56 AM   #3
Dogs
Member
 
Registered: Aug 2009
Location: Houston
Distribution: Slackware 13.37 x64
Posts: 105

Original Poster
Rep: Reputation: 25
OK, I got acquainted with BeautifulSoup, but of a document that is approximately 20 KB, only the first 1 KB or so is utilized...

Is there a tag or something in there that makes BeautifulSoup stop?

Last edited by Dogs; 02-10-2011 at 02:19 PM.
 
  


All times are GMT -5. The time now is 10:53 AM.
