Programming: This forum is for all programming questions. The question does not have to be directly related to Linux, and any language is fair game.
I am trying to do the same as Baix, but my code is not working. On the site I'm trying to log into, the login pane has a "Login" tab and a "Quick Access" tab; I usually use the "Quick Access" tab.
After logging in, I want to go to the "All In One" page (gallinone.aspx) and parse the tables to stuff some of the data into a spreadsheet.
The site is:
www_dot_poolexpert_dot_com
(because I'm new, the board won't let me post URLs...)
I can tell you my login (it's only a hockey pool!):
Pool Name: TNHP
password: tropic
(This isn't the admin login, just to view results...)
The code I have so far is this:
Code:
#COOKIEFILE = 'c:/local_documents/python/cookies.lwp'
import os.path
import urllib
import urllib2
import cookielib
import HTMLParser

class MyParser(HTMLParser.HTMLParser):
    def __init__(self):
        HTMLParser.HTMLParser.__init__(self)
        self.data_type = ""

    def handle_data(self, data):
        if not self.data_type:
            if data.lower() == "point balance":
                self.data_type = "balance"
            elif data.lower() == "points available to redeem":
                self.data_type = "points available to redeem"
            elif data.lower() == "pending points":
                self.data_type = "pending points"
        else:
            print "%s: %s" % (self.data_type, data)
            self.data_type = ""

# Create an empty cookie jar.
cj = cookielib.LWPCookieJar()
#cj.load(COOKIEFILE)

# Install the cookie handler for urllib2.
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)

login_url = 'http://www1.poolexpert.com/gallinone.aspx/'
# txheaders = {'User-agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

# Create the initial request -- this is like when you first browse to the
# page. Since the cookie jar was empty, it is as if you had cleared the
# cookies from your browser. Cookies may be set at this point.
request = urllib2.Request("http://www1.poolexpert.com", None)
page_handle = urllib2.urlopen(request)
page_handle.close()

# Now make a request as if you had submitted the form on the page.
# Notice that two hidden fields plus the email and password fields are
# sent to the form-processing page.
txdata = urllib.urlencode({'poolname': 'TNHP', 'poolpwd': 'tropic'})
request = urllib2.Request(login_url, txdata)
page_handle = urllib2.urlopen(request)

print 'Here are the headers of the page:'
print page_handle.info()
# page_handle.read() returns the page; page_handle.geturl() returns the
# true URL of the fetched page (in case urlopen followed any redirects,
# which it sometimes does).
print "The true URL of the page is:",
print page_handle.geturl()
print "The page is:"
page_html = page_handle.read()
print page_html
page_handle.close()

parser = MyParser()
parser.feed(page_html)
This seems to be the same as what Baix ended up with, but I can't get mine to work. I did look at the page source with DOM Inspector, but I don't really know what to look for. I think the fact that there are really four inputs, two for the "Login" tab and two for the "Quick Access" tab, may be what is messing me up, but I can't get it to work through several permutations.
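Since the end goal is pulling table data into a spreadsheet, the same HTMLParser module can do that half of the job once the login works. A rough sketch (the markup in the feed() call is made up for illustration; the real page's table structure will differ, and the try/except import just lets the sketch run on either Python 2 or 3):

```python
try:
    from HTMLParser import HTMLParser     # Python 2
except ImportError:
    from html.parser import HTMLParser    # Python 3

class TableParser(HTMLParser):
    """Collect the text of every <td> cell, one list per <tr> row."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.rows = []
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])       # start a new row
        elif tag == 'td':
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and self.rows:
            self.rows[-1].append(data.strip())

parser = TableParser()
parser.feed('<table><tr><td>Team</td><td>Points</td></tr>'
            '<tr><td>TNHP</td><td>42</td></tr></table>')
# parser.rows is now [['Team', 'Points'], ['TNHP', '42']]
```

Each row list can then be written out with the csv module for the spreadsheet.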
I tried the login, but it didn't seem to work, so I didn't get too far. Do you have any experience using a packet sniffer like Ethereal or tcpdump (WinDump on Windows)? They could help you see what is really being sent back and forth if you filter on just the HTTP stream.
What kind of output are you getting? Can you tell whether you are making it past the login page, or are you just being redirected back?
When the script tries to log in, I get a page that says:
"You tried to use a PoolExpert feature that require to be loged in a PoolExpert account"
If you already own your PoolExpert account, use the form
on the left to login into your account...."
Here's the HTTP trace from running the script:
Code:
GET / HTTP/1.1
Accept-Encoding: identity
Host: www1.poolexpert.com
Connection: close
User-agent: Python-urllib/2.4
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Tue, 14 Mar 2006 13:59:56 GMT
X-Powered-By: ASP.NET
Connection: close
Pragma: no-cache
Content-Length: 16476
Content-Type: text/html
Expires: Tue, 14 Mar 2006 13:59:56 GMT
Set-Cookie: ce=1; domain=.poolexpert.com; path=/
Set-Cookie: lang=en; expires=Thu, 13-Apr-2006 12:59:56 GMT; domain=.poolexpert.com; path=/
Cache-control: no-cache
POST /gallinone.aspx/ HTTP/1.1
Accept-Encoding: identity
Content-length: 28
Host: www1.poolexpert.com
User-agent: Python-urllib/2.4
Connection: close
Cookie: lang=en; ce=1
Content-type: application/x-www-form-urlencoded
HTTP/1.1 100 Continue
Server: Microsoft-IIS/5.0
Date: Tue, 14 Mar 2006 13:59:56 GMT
X-Powered-By: ASP.NET
GET /userinfo.asp?rd=gallinone.aspx%3f HTTP/1.1
Accept-Encoding: identity
Host: www1.poolexpert.com
Cookie: lang=en; ce=1
Connection: close
User-agent: Python-urllib/2.4
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Tue, 14 Mar 2006 13:59:56 GMT
X-Powered-By: ASP.NET
Connection: close
Pragma: no-cache
Content-Length: 23446
Content-Type: text/html
Expires: Tue, 14 Mar 2006 13:59:57 GMT
Set-Cookie: ce=1; domain=.poolexpert.com; path=/
Set-Cookie: lang=en; expires=Thu, 13-Apr-2006 12:59:56 GMT; domain=.poolexpert.com; path=/
Cache-control: no-cache
I notice that the browser login does not use POST, only GET. Is a POST the same as a GET with the data inserted in the URL?
Perhaps I need to try opening the "/userlogin" page before trying to go to the "/ginforcenter" or "/gallinone" pages.
Your help is much appreciated.
(By the way, your concise explanation earlier in this thread of HTTP and cookies was one of the best I've seen for beginners.)
I finally realized you were logging in on the "Quick Access" tab. You are correct about an HTTP GET. Your research shows that all the form data is being sent on the URL query string, including your username and password. These lines:
Code:
GET /userlogin.aspx?co=1&ui=pc3&redirect=default.asp&rs=gpoollogin.aspx%3Fpoolname%3DTNHP%26pwd%3Dtropic&rf=default.asp&email=support%40poolexpert.com&pwd=test&poolname=TNHP&poolpwd=tropic&x=35&y=13 HTTP/1.1
Host: www1.poolexpert.com
Your pool name and password appear right in the URL (the poolname=TNHP&poolpwd=tropic parameters near the end). You should just need to change your code to do a GET instead of a POST.
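Assuming the rest of the script stays the same, the change might look like this (the try/except import is just so the sketch runs on either Python 2 or 3; the parameter names are the ones visible in the trace, and the other query-string fields from the trace may also need to be included):

```python
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3

# Build the query string and append it to the URL, instead of passing
# the encoded data as the second (POST) argument to Request.
params = urlencode({'poolname': 'TNHP', 'poolpwd': 'tropic'})
login_url = 'http://www1.poolexpert.com/userlogin.aspx' + '?' + params

# With urllib2, a Request built with no data argument is sent as a GET:
#   request = urllib2.Request(login_url)
#   page_handle = urllib2.urlopen(request)
```

That answers the earlier question, too: a POST is not the same as a GET with the data in the URL; the server-side form handler decides which one it reads, and this site only reads the query string.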
Here is a simple shell script using curl that can log in and get the page:
For the first command, I use curl's -c option to accept cookies and write them to a cookie jar file I called "cj.txt". In the second command, I use curl's -b option to send back the cookies in the cookie jar file, and the -L option to follow the redirect (I think urllib will follow the 302 redirect without you having to specify that explicitly).
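A minimal reconstruction of those two curl commands from the description above (the login URL is trimmed down to the pool-name fields from the GET line in the trace; the other query parameters may also be required):

```shell
#!/bin/sh
# First request: fetch the front page so the site can set its cookies;
# -c writes any cookies it sends into the cookie-jar file cj.txt.
curl -c cj.txt 'http://www1.poolexpert.com/'

# Second request: log in via the "Quick Access" form. -b sends the saved
# cookies back, -c keeps the jar updated, and -L follows the 302 redirect
# that the site issues after a successful login.
curl -b cj.txt -c cj.txt -L \
    'http://www1.poolexpert.com/userlogin.aspx?poolname=TNHP&poolpwd=tropic'
```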
After having my script work happily for over a year, we had an IT infrastructure change, and now all external web access goes through a proxy server.
Every time I start a browser session manually, I have to login to the proxy server with a password.
I'm wondering how to make this work in my script (see previous posts).
Any help would be greatly appreciated.
ws
What kind of authentication is required? If it is NTLM (Windows) authentication, there is an intermediate proxy server you can set up. If it is a simple proxy, I think the urllib2 module might be able to handle it.
It requires Basic Proxy authentication. I did a bit more searching, and it seems that the urllib2 proxy auth handling has some bugs (at least in Python 2.4 and earlier).
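One common way to sidestep those bugs is to embed the credentials directly in the proxy URL handed to ProxyHandler, rather than relying on the auth handler's challenge/response round trip. A sketch (the proxy host, port, user, and password are placeholders; the try/except import lets it run on Python 2 or 3):

```python
try:
    import urllib2 as urlreq              # Python 2
except ImportError:
    import urllib.request as urlreq       # Python 3

# Hypothetical proxy address and credentials -- substitute your own.
proxy = urlreq.ProxyHandler(
    {'http': 'http://user:secret@proxy.example.com:8080'})

# ProxyBasicAuthHandler is included in case the proxy still challenges;
# with the credentials in the URL it often is not needed at all.
opener = urlreq.build_opener(proxy, urlreq.ProxyBasicAuthHandler())
urlreq.install_opener(opener)

# From here on, urlreq.urlopen('http://www1.poolexpert.com/') is routed
# through the proxy, with the Basic credentials attached.
```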