LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 03-13-2014, 05:30 PM   #1
methodtwo
Member
 
Registered: May 2007
Posts: 146

Rep: Reputation: 18
python mechanize scraping questions


Hi
I'm using Python mechanize to log in to a site and make a booking. I can log in over SSL, follow links, and get a few pages into the process of making a booking. The "link" that I need to select/"click on" to proceed is, I believe, in a form. Here it is:
Code:
<table class="tableResult" cellspacing="0" rules="all" summary etc etc
                        <input type="submit" name="ctl00$MainContent$etc etc etc" value="Squash" id="ctl00_MainContentetc etc etc" class="BookingLinkButton" />
The one that needs "clicking" is the one with the value of "Squash".
The form that needs selecting opens like this:
Code:
<form name="Form" method="post" action="file.aspx" onsubmit="javascript:return Submit();" id="Form">
I know that Python mechanize has no JavaScript engine, but the login form, which worked perfectly well, opened with the same XHTML.
That's what made me think it is possible to grab this form with mechanize and then somehow "submit" a value of Squash. I've tried:
Code:
self.br.form = list(self.br.forms())[0]	
self.resp2 = self.br.submit(name="ctl00$MainContent$etc etc etc", label='Squash')
And all the normal variants like:
Code:
self.br.select_form(name="Form")  # select_form() sets br.form itself and returns None
# OR...
self.br.form = self.br.forms().next()  # first form from the forms() iterator (Python 2)
I always get this error:
Code:
if self.value is None: self.value = ""
  File "/Library/Python/2.7/site-packages/mechanize/_form.py", line 1221, in __setattr__
    raise AttributeError("control '%s' is disabled" % self.name)
AttributeError: control 'ctl00$Search1$_etc_etc$_searchBtn' is disabled
I'm really at a loss. I don't know if this is the fix, but if it is, I don't know how to apply the "monkey patch". Do I just paste it into _form.py? And which class's __init__ would it replace?
Thank you very much for any help
 
Old 03-13-2014, 05:58 PM   #2
Sydney
Member
 
Registered: Mar 2012
Distribution: Scientific Linux
Posts: 147

Rep: Reputation: 36
Since you said any help: you may want to try posting the form data yourself once you are logged in, without using the button or the JavaScript. The nice thing about JavaScript is that it runs client side, so you may be able to read it, find out what it does, and then send the form data to the end destination yourself. One way to do this is to build an offline form that you navigate to and post from once you are logged in; alternatively, you can just send a straight POST. Unless you need the page that the submit returns, this often works to get around problems like this.
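A rough sketch of the "straight POST" idea, using only the Python 3 standard library. It assumes the page is a typical ASP.NET form whose hidden inputs (for example __VIEWSTATE) have to be sent back along with the button's name and value. The class name, the helper name, and the field names in the sample HTML are all made up for illustration; the real names have to come from the actual page source.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class InputCollector(HTMLParser):
    """Collect <input> name/value pairs, keeping submit buttons separate."""

    def __init__(self):
        super().__init__()
        self.fields = []    # hidden/text inputs, in document order
        self.buttons = {}   # submit label -> submit name

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        # Browsers never send unnamed or disabled controls, so skip them.
        if not name or "disabled" in attr:
            return
        kind = (attr.get("type") or "text").lower()
        if kind == "submit":
            self.buttons[attr.get("value") or ""] = name
        elif kind in ("hidden", "text"):
            self.fields.append((name, attr.get("value") or ""))


def build_post_body(html, button_label):
    """Return the urlencoded body a browser would send for that button."""
    collector = InputCollector()
    collector.feed(html)
    pairs = list(collector.fields)
    # Only the button that was "clicked" is included in the submission.
    pairs.append((collector.buttons[button_label], button_label))
    return urlencode(pairs).encode("ascii")


# Hypothetical page fragment; the names are placeholders, not the real ones.
sample_html = (
    '<form name="Form" method="post" action="file.aspx">'
    '<input type="hidden" name="__VIEWSTATE" value="abc" />'
    '<input type="submit" name="a$search" value="Search" disabled="disabled" />'
    '<input type="submit" name="a$book" value="Squash" />'
    '</form>'
)
body = build_post_body(sample_html, "Squash")
print(body)
```

With a browser object that is already logged in, the body could then probably be sent with something like br.open(action_url, body), so the session cookies go along with it. Whether the server accepts it depends on what the page's Submit() JavaScript does before posting, which is not shown in the thread.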
 
Old 03-13-2014, 06:25 PM   #3
methodtwo
Member
 
Registered: May 2007
Posts: 146

Original Poster
Rep: Reputation: 18
Thank you very much for the reply. I'm not sure I understand, sorry. So I just look at the form that would have been sent normally, then post a 'fake' one, without getting a 'real' one with something like:
Code:
self.br.form = list(self.br.forms())[0]
Right? So it would be something like:
Code:
self.home.made.form['ctletc etc'] = 'make selection with assignment'
self.home.made.form['ctletc etc'] ='other values'

Last edited by methodtwo; 03-13-2014 at 06:37 PM.
 
Old 03-13-2014, 09:39 PM   #4
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,520

Rep: Reputation: 1820
Quote:
Originally Posted by methodtwo
i don't know how to apply the "monkey patch" do i just paste it in to _form.py? But replacing which class' __init__?
The way a "monkey patch" works is that you evaluate some code which updates the definition of a broken function/class in memory. So just call the monkeypatch_mechanize() function before you use the library.
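To make that concrete, here is a toy illustration of the mechanism. SubmitControl and monkeypatch_library() below are stand-ins invented for this example; they are not mechanize's real code and not the actual patch from the link. The point is only that the replacement happens in memory at run time, so nothing gets pasted into _form.py.

```python
class SubmitControl:
    """Stand-in for a library class with a bug (NOT mechanize's real code)."""

    def __init__(self, name, disabled=False):
        self.name = name
        self.disabled = disabled
        self.value = None
        if self.value is None:
            self.set_value("")  # blows up for disabled controls

    def set_value(self, value):
        if self.disabled:
            raise AttributeError("control '%s' is disabled" % self.name)
        self.value = value


def monkeypatch_library():
    """Swap in a fixed __init__ in memory; the file on disk is untouched."""
    original_init = SubmitControl.__init__

    def patched_init(self, name, disabled=False):
        # Build the control as enabled so the default value can be set,
        # then restore the real disabled flag afterwards.
        original_init(self, name, disabled=False)
        self.disabled = disabled

    SubmitControl.__init__ = patched_init


monkeypatch_library()  # call once, before using the library
control = SubmitControl("searchBtn", disabled=True)
print(repr(control.value), control.disabled)
```

Without the monkeypatch_library() call, the last SubmitControl(...) line raises the AttributeError; with it, the control is created normally. A real patch would do the same kind of assignment on the library's own class after importing it.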

It appears from the github page that the library has many bugs and hasn't seen any updates for 2 years; you might think about looking for a different one.

Last edited by ntubski; 03-13-2014 at 09:40 PM. Reason: missed word
 
Old 03-14-2014, 11:57 AM   #5
methodtwo
Member
 
Registered: May 2007
Posts: 146

Original Poster
Rep: Reputation: 18
Silly me. The login worked because it didn't use JavaScript. Selecting the right links for further navigation didn't work because mechanize doesn't have a JavaScript engine. I just assumed it would because it was awesome in so many other ways. Shame.
 
  

