Carl,
Thanks for that. I think the site uses form-based authentication :-
Code:
<h1>Helpdesk Login</h1> Authorised xxxxx staff and clients may login here.<br><br> <form target="_self" action="index.php" method="POST" id="form_form" > <input type='hidden' name='node' value='578'> <input type='hidden' name='form_refresh' value='0'>
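In case it matters, the form's action and hidden fields can be pulled out programmatically instead of by eye. Below is a rough Python 3 sketch using the standard-library html.parser. It lists the action, method and hidden inputs of the first form in the snippet above. The class name LoginFormParser is my own, and it only looks at hidden inputs; the visible login/password inputs aren't in the snippet.

```python
from html.parser import HTMLParser


class LoginFormParser(HTMLParser):
    """Collect the action, method and hidden inputs of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = None
        self.hidden = {}
        self._in_form = False
        self._seen_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "form" and not self._seen_form:
            self._in_form = True
            self._seen_form = True
            self.action = attrs.get("action")
            self.method = (attrs.get("method") or "GET").upper()
        elif tag == "input" and self._in_form:
            # Keep only hidden inputs that have a name.
            if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
                self.hidden[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


# The login-page snippet quoted above.
SNIPPET = """<h1>Helpdesk Login</h1> Authorised xxxxx staff and clients may login here.<br><br>
<form target="_self" action="index.php" method="POST" id="form_form" >
<input type='hidden' name='node' value='578'>
<input type='hidden' name='form_refresh' value='0'>"""

parser = LoginFormParser()
parser.feed(SNIPPET)
parser.close()
print(parser.action, parser.method, parser.hidden)
# index.php POST {'node': '578', 'form_refresh': '0'}
```

So a browser submitting this form would POST to index.php and include node=578 and form_refresh=0 alongside whatever the visible fields are.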
I tried a script that you posted somewhere else :-
import urllib
import urllib2
import cookielib
#Create empty cookie jar.
cj = cookielib.LWPCookieJar()
#Install cookie handler for urllib2.
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
#For ClientCookie module(?)
# opener = ClientCookie.build_opener(ClientCookie.HTTPCookieProcessor(cj))
# ClientCookie.install_opener(opener)
#Create initial request -- this is like when you first browse to the page. Since the cookie jar is empty, it will
#be as if you had just cleared the cookies from your browser.
#Cookies may be set at this point.
request = urllib2.Request("http://intranetatwork/", None)
f = urllib2.urlopen(request)
f.close()
#Now you have to make a request like you submitted the form on the page.
#ClientForm would be good for this, but I don't have the docs handy. I will just do it the hard way. Assume
#the form action is "http://www.mypoints.com/login.cgi" and the method is POST.
#Further assume the names of the login and password fields are "login" and "password".
data = urllib.urlencode({"login": "loginname", "password" : "password"})
request = urllib2.Request("http://intranetatwork/index.php?node=2371&pagetree=&mode=ticket_view&objectid=26121", data)
f = urllib2.urlopen(request)
#I am assuming that at this point you log into the screen you want to scrape.
#If not, you will have to request the page you want to scrape at this point.
#Read the page.
html = f.read()
f.close()
newfile = open("newfile.html",'w')
newfile.write(html)
newfile.close()
#Parse the html here (html contains the page markup)
print 'finished'
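For comparison, here is a rough Python 3 version of the same flow. In it, http.cookiejar and urllib.request take the place of cookielib and urllib2. It differs from the script in one respect: the login data is POSTed to the form's own action (index.php) together with the two hidden fields from the login form. Only after that is the ticket page requested with the same cookie-carrying opener. The visible field names "login" and "password" are assumptions carried over from the script (the real names aren't in the snippet). The function names are hypothetical, and the UTF-8 decoding is a guess.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Host and ticket URL as given in the post.
BASE = "http://intranetatwork/"
TICKET_URL = BASE + "index.php?node=2371&pagetree=&mode=ticket_view&objectid=26121"


def build_opener_with_cookies():
    """Return (opener, jar): requests made through the opener store and
    resend cookies via the jar, like a browser session."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(username, password):
    """Build the POST the login form would send to its action, index.php.

    ASSUMPTION: the visible inputs are named "login" and "password" (carried
    over from the script); check the real name="..." values in the page source.
    """
    fields = {
        "node": "578",         # hidden input from the login form
        "form_refresh": "0",   # hidden input from the login form
        "login": username,     # hypothetical field name
        "password": password,  # hypothetical field name
    }
    data = urllib.parse.urlencode(fields).encode("ascii")
    action = urllib.parse.urljoin(BASE, "index.php")  # form action is relative
    return urllib.request.Request(action, data=data)  # data present -> POST


def fetch_ticket(username, password):
    """Log in through the form, then fetch the ticket page in the same session."""
    opener, _jar = build_opener_with_cookies()
    opener.open(BASE).close()  # initial visit; may set a session cookie
    opener.open(build_login_request(username, password)).close()  # submit login form
    with opener.open(TICKET_URL) as resp:  # same opener, so cookies are resent
        # ASSUMPTION: the page is UTF-8; undecodable bytes are replaced.
        return resp.read().decode("utf-8", errors="replace")
```

Calling fetch_ticket("loginname", "password") and writing the result to newfile.html would mirror what the original script does.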
And I just downloaded the welcome page, not the page at :-
http://intranetatwork/index.php?node=2371&pagetree=&mode=ticket_view&objectid=26121
Any ideas ?
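One quick check is whether the saved page still contains the login form. If it does, the server did not treat the session as logged in by the time the page was fetched. Below is a minimal heuristic sketch. It assumes the login page can be recognised by the "Helpdesk Login" heading and the id="form_form" form from the snippet above. looks_like_login_page is a hypothetical helper, not part of any library.

```python
from pathlib import Path


def looks_like_login_page(html: str) -> bool:
    """Heuristic: True if the page still shows the helpdesk login form.

    ASSUMPTION: the login page is identified by its heading and form id.
    """
    return "Helpdesk Login" in html and 'id="form_form"' in html


if __name__ == "__main__":
    # newfile.html is the file the script above writes.
    page = Path("newfile.html")
    if page.exists():
        html = page.read_text(errors="replace")
        print("still the login page" if looks_like_login_page(html) else "got past the login")
```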