Programming: This forum is for all programming questions. The question does not have to be directly related to Linux, and any language is fair game.
I am trying to do the same as Baix, but my code is not working. On the site I'm trying to log into, the login pane has a "Login" tab and a "Quick Access" tab; I usually use the "Quick Access" tab.
After logging in, I want to go to the "All In One" page (gallinone.aspx) and parse the tables to stuff some of the data into a spreadsheet.
The site is:
www_dot_poolexpert_dot_com
(because I'm new, the board won't let me post URLs...)
I can tell you my login (it's only a hockey pool!):
Pool Name: TNHP
password: tropic
(This isn't the admin login, just to view results...)
The code I have so far is this:
Code:
#COOKIEFILE = 'c:/local_documents/python/cookies.lwp'
import os.path
import urllib
import urllib2
import cookielib
import HTMLParser

class MyParser(HTMLParser.HTMLParser):
    def __init__(self):
        HTMLParser.HTMLParser.__init__(self)
        self.data_type = ""

    def handle_data(self, data):
        if not self.data_type:
            if data.lower() == "point balance":
                self.data_type = "balance"
            elif data.lower() == "points available to redeem":
                self.data_type = "points available to redeem"
            elif data.lower() == "pending points":
                self.data_type = "pending points"
        else:
            print "%s: %s" % (self.data_type, data)
            self.data_type = ""

# Create an empty cookie jar.
cj = cookielib.LWPCookieJar()
#cj.load(COOKIEFILE)

# Install the cookie handler for urllib2.
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)

login_url = 'http://www1.poolexpert.com/gallinone.aspx/'
# txheaders = {'User-agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

# Create the initial request -- this is like when you first browse to the
# page. Since the cookie jar was empty, it is as if you had cleared the
# cookies from your browser. Cookies may be set at this point.
request = urllib2.Request("http://www1.poolexpert.com", None)
page_handle = urllib2.urlopen(request)
page_handle.close()

# Now make a request as if you had submitted the form on the page.
# Notice that two hidden fields plus the email and password fields are
# sent to the form-processing page.
txdata = urllib.urlencode({'poolname': 'TNHP', 'poolpwd': 'tropic'})
request = urllib2.Request(login_url, txdata)
page_handle = urllib2.urlopen(request)

print 'Here are the headers of the page:'
print page_handle.info()
# page_handle.read() returns the page; page_handle.geturl() returns the
# true URL of the fetched page (in case urlopen followed any redirects,
# which it sometimes does).
print "The true URL of the page is:",
print page_handle.geturl()
print "The page is:"
page_html = page_handle.read()
print page_html
page_handle.close()

parser = MyParser()
parser.feed(page_html)
This seems to be the same as what Baix ended up with, but I can't get mine to work. I did look at the page source with DOM Inspector, but I don't really know what to look for. I think the fact that there are really four inputs, two for the "Login" tab and two for the "Quick Access" tab, may be what is messing me up, but I can't get it to work through several permutations.
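Since the end goal is pulling table data into a spreadsheet, the same HTMLParser module can do that half of the job once the login works. A rough sketch (the markup in the feed() call is made up for illustration; the real page's table structure will differ, and the try/except import just lets the sketch run on either Python 2 or 3):

```python
try:
    from HTMLParser import HTMLParser     # Python 2
except ImportError:
    from html.parser import HTMLParser    # Python 3

class TableParser(HTMLParser):
    """Collect the text of every <td> cell, one list per <tr> row."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.rows = []
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])       # start a new row
        elif tag == 'td':
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and self.rows:
            self.rows[-1].append(data.strip())

parser = TableParser()
parser.feed('<table><tr><td>Team</td><td>Points</td></tr>'
            '<tr><td>TNHP</td><td>42</td></tr></table>')
# parser.rows is now [['Team', 'Points'], ['TNHP', '42']]
```

Each row list can then be written out with the csv module for the spreadsheet.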
I tried the login, but it didn't seem to work, so I didn't get too far. Do you have any experience using a packet sniffer like Ethereal or tcpdump (WinDump on Windows)? They could help you see what is really being sent back and forth if you filter on just the HTTP stream.
What kind of output are you getting? Can you tell whether you are making it past the login page, or are you just being redirected back?
When the script tries to log in, I get a page that says:
"You tried to use a PoolExpert feature that require to be loged in a PoolExpert account"
If you already own your PoolExpert account, use the form
on the left to login into your account...."
Here's the HTTP trace from running the script:
Code:
GET / HTTP/1.1
Accept-Encoding: identity
Host: www1.poolexpert.com
Connection: close
User-agent: Python-urllib/2.4
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Tue, 14 Mar 2006 13:59:56 GMT
X-Powered-By: ASP.NET
Connection: close
Pragma: no-cache
Content-Length: 16476
Content-Type: text/html
Expires: Tue, 14 Mar 2006 13:59:56 GMT
Set-Cookie: ce=1; domain=.poolexpert.com; path=/
Set-Cookie: lang=en; expires=Thu, 13-Apr-2006 12:59:56 GMT; domain=.poolexpert.com; path=/
Cache-control: no-cache
POST /gallinone.aspx/ HTTP/1.1
Accept-Encoding: identity
Content-length: 28
Host: www1.poolexpert.com
User-agent: Python-urllib/2.4
Connection: close
Cookie: lang=en; ce=1
Content-type: application/x-www-form-urlencoded
HTTP/1.1 100 Continue
Server: Microsoft-IIS/5.0
Date: Tue, 14 Mar 2006 13:59:56 GMT
X-Powered-By: ASP.NET
GET /userinfo.asp?rd=gallinone.aspx%3f HTTP/1.1
Accept-Encoding: identity
Host: www1.poolexpert.com
Cookie: lang=en; ce=1
Connection: close
User-agent: Python-urllib/2.4
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Tue, 14 Mar 2006 13:59:56 GMT
X-Powered-By: ASP.NET
Connection: close
Pragma: no-cache
Content-Length: 23446
Content-Type: text/html
Expires: Tue, 14 Mar 2006 13:59:57 GMT
Set-Cookie: ce=1; domain=.poolexpert.com; path=/
Set-Cookie: lang=en; expires=Thu, 13-Apr-2006 12:59:56 GMT; domain=.poolexpert.com; path=/
Cache-control: no-cache
I notice that the browser login does not use POST, only GET. Is a POST the same as a GET with the data inserted in the URL?
Perhaps I need to try opening the "/userlogin" page before trying to go to the "/ginforcenter" or "/gallinone" pages.
Your help is much appreciated.
(By the way, your concise explanation earlier in this thread of HTTP and cookies was one of the best I've seen for beginners.)
I finally realized you were logging in on the "Quick Access" tab. You are correct about an HTTP GET. Your research shows that all the form data is being sent on the URL query string, including your username and password. These lines:
Code:
GET /userlogin.aspx?co=1&ui=pc3&redirect=default.asp&rs=gpoollogin.aspx%3Fpoolname%3DTNHP%26pwd%3Dtropic&rf=default.asp&email=support%40poolexpert.com&pwd=test&poolname=TNHP&poolpwd=tropic&x=35&y=13 HTTP/1.1
Host: www1.poolexpert.com
Your pool name and password appear right in the URL (the poolname=TNHP&poolpwd=tropic parameters near the end). You should just need to change your code to do a GET instead of a POST.
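Assuming the rest of the script stays the same, the change might look like this (the try/except import is just so the sketch runs on either Python 2 or 3; the parameter names are the ones visible in the trace, and the other query-string fields from the trace may also need to be included):

```python
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3

# Build the query string and append it to the URL, instead of passing
# the encoded data as the second (POST) argument to Request.
params = urlencode({'poolname': 'TNHP', 'poolpwd': 'tropic'})
login_url = 'http://www1.poolexpert.com/userlogin.aspx' + '?' + params

# With urllib2, a Request built with no data argument is sent as a GET:
#   request = urllib2.Request(login_url)
#   page_handle = urllib2.urlopen(request)
```

That answers the earlier question, too: a POST is not the same as a GET with the data in the URL; the server-side form handler decides which one it reads, and this site only reads the query string.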
Here is a simple shell script using curl that can log in and get the page:
For the first command, I use curl's -c option to accept cookies and write them to a cookie jar file I called "cj.txt". In the second command, I use curl's -b option to send back the cookies in the cookie jar file, and the -L option to follow the redirect (I think urllib will follow the 302 redirect without you having to specify that explicitly).
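A minimal reconstruction of those two curl commands from the description above (the login URL is trimmed down to the pool-name fields from the GET line in the trace; the other query parameters may also be required):

```shell
#!/bin/sh
# First request: fetch the front page so the site can set its cookies;
# -c writes any cookies it sends into the cookie-jar file cj.txt.
curl -c cj.txt 'http://www1.poolexpert.com/'

# Second request: log in via the "Quick Access" form. -b sends the saved
# cookies back, -c keeps the jar updated, and -L follows the 302 redirect
# that the site issues after a successful login.
curl -b cj.txt -c cj.txt -L \
    'http://www1.poolexpert.com/userlogin.aspx?poolname=TNHP&poolpwd=tropic'
```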
After having my script work happily for over a year, we had an IT infrastructure change, and now all external web access goes through a proxy server.
Every time I start a browser session manually, I have to login to the proxy server with a password.
I'm wondering how to make this work in my script (see previous posts).
Any help would be greatly appreciated.
ws
What kind of authentication is required? If it is NTLM (Windows) authentication, there is an intermediate proxy server you can set up. If it is a simple proxy, I think the urllib2 module might be able to handle it.
It requires Basic Proxy authentication. I did a bit more searching, and it seems that the urllib2 proxy auth handling has some bugs (at least in Python 2.4 and earlier).
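One common way to sidestep those bugs is to embed the credentials directly in the proxy URL handed to ProxyHandler, rather than relying on the auth handler's challenge/response round trip. A sketch (the proxy host, port, user, and password are placeholders; the try/except import lets it run on Python 2 or 3):

```python
try:
    import urllib2 as urlreq              # Python 2
except ImportError:
    import urllib.request as urlreq       # Python 3

# Hypothetical proxy address and credentials -- substitute your own.
proxy = urlreq.ProxyHandler(
    {'http': 'http://user:secret@proxy.example.com:8080'})

# ProxyBasicAuthHandler is included in case the proxy still challenges;
# with the credentials in the URL it often is not needed at all.
opener = urlreq.build_opener(proxy, urlreq.ProxyBasicAuthHandler())
urlreq.install_opener(opener)

# From here on, urlreq.urlopen('http://www1.poolexpert.com/') is routed
# through the proxy, with the Basic credentials attached.
```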