LinuxQuestions.org
Old 03-05-2011, 05:43 PM   #1
tbgclark
LQ Newbie
 
Registered: Nov 2010
Location: Warwickshire, UK.
Distribution: Fedora
Posts: 3

Rep: Reputation: 0
Automating access to interactive sites which use HTML and javascript


Is there a modern equivalent of "Expect", which was developed for the UNIX environment to automate access to programs which had been designed to interact only with a human?

Now with many facilities only accessible by HTML and javascript, what solutions are there to access such websites from a Perl script for example?

The need to access interactive systems by automated agents is the same now as when Don Libes originally wrote Expect, described well in his paper to the Summer 1990 Usenix Conference, Expect: Curing Those Uncontrollable Fits of Interaction.

I want my computer to access the web - I don't believe I should have to be chained to it by mouse and keyboard before it can interact with web sites. Surely there are some tools for achieving this. I've searched, but haven't been able to find any. Although I'd like to do it from a Perl script, I'd consider any language: C, PHP, anything.

Tim Clark
(How often am I on the Internet? I'm never on it. My computers are though, and they're on it 24/7)
 
Old 03-06-2011, 03:41 AM   #2
j-ray
Senior Member
 
Registered: Jan 2002
Location: germany
Distribution: ubuntu, mint, suse
Posts: 1,591

Rep: Reputation: 145Reputation: 145
Actually, I don't get what Expect does or was intended to do.

With Perl's LWP you can create user agents and crawl the web with them. You can fetch content in any format, e.g. HTML, XML or whatever. There are also modules for working with XML and with web services, such as SOAP::WSDL...

http://search.cpan.org

If you tell us in a bit more detail what you intend to do, we can probably help you better.
 
Old 03-06-2011, 03:07 PM   #3
tbgclark
LQ Newbie
 
Registered: Nov 2010
Location: Warwickshire, UK.
Distribution: Fedora
Posts: 3

Original Poster
Rep: Reputation: 0
OK, let's take a simple example. To download my statement from my bank I have to go to its website, go through the security checks, choose which of my accounts I want the details of, enter dates I want the statement to cover, tell it what format I want it in (CSV), and then download it. I would far prefer to have my computer do all that without me having to actually be sat at it and then doing it all myself.

Expect was a UNIX tool designed to interact with interactive applications in the days when interaction was by fairly simple character streams. The idea was that one could get a program to do the work, making appropriate responses depending on what output the interactive application produced. I used the Perl version of it (and its cousin "chat") extensively.
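To make that concrete, here is a minimal sketch of the expect-style loop in Python, using only the standard library. It drives a made-up child program over plain pipes rather than a pseudo-tty, so it illustrates the idea and is not how Expect itself is implemented. The prompts and the names `expect`, `send` and `run_dialogue` are invented for the example.

```python
import subprocess
import sys

# Stand-in for an interactive program: it prompts, then waits for typed input.
CHILD_PROGRAM = r'''
import sys
sys.stdout.write("login: "); sys.stdout.flush()
user = sys.stdin.readline().strip()
sys.stdout.write("password: "); sys.stdout.flush()
sys.stdin.readline()
sys.stdout.write("welcome " + user + "\n"); sys.stdout.flush()
'''


def expect(proc, pattern):
    """Read the child's output byte by byte until it ends with `pattern`."""
    seen = b""
    while not seen.endswith(pattern):
        byte = proc.stdout.read(1)
        if not byte:
            raise EOFError("child exited before sending %r" % pattern)
        seen += byte
    return seen


def send(proc, line):
    """Type one line of input to the child."""
    proc.stdin.write(line.encode() + b"\n")
    proc.stdin.flush()


def run_dialogue(user, password):
    """Answer the child's prompts and return its final line of output."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_PROGRAM],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        bufsize=0,  # unbuffered, so prompts arrive as soon as they are written
    )
    try:
        expect(proc, b"login: ")
        send(proc, user)
        expect(proc, b"password: ")
        send(proc, password)
        reply = expect(proc, b"\n")
    finally:
        proc.stdin.close()
        proc.stdout.close()
        proc.wait()
    return reply.decode().strip()
```

Calling `run_dialogue("tim", "secret")` plays both sides of the conversation without anyone at the keyboard. A real program that insists on a terminal would need a pseudo-tty instead of pipes.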

With the move to the web, "wget" and "lynx" could be used. Then, as web sites got more complex and started using forms, it required more ingenious use: sending the posted information via HTTP, saving cookies, and parsing the HTML which came back. With some effort it was achievable. The final nail in the coffin was when websites started using JavaScript extensively. It seems that having a human chained to the computer is the only way to use such sites.
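The forms-and-cookies stage can be sketched roughly as follows, in Python with only the standard library. This is an illustration under assumptions: the URL, the form markup and the names `FormParser`, `build_form_post` and `make_session` are all made up, and it does nothing about pages that build their forms in JavaScript.

```python
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.parse import urlencode, urljoin
from urllib.request import HTTPCookieProcessor, Request, build_opener


class FormParser(HTMLParser):
    """Collect the action URL and named inputs of the first <form> on a page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Keep pre-filled values (e.g. hidden tokens); empty string otherwise.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_form_post(page_url, html, values):
    """Build (but do not send) a POST request that submits the page's first form."""
    parser = FormParser()
    parser.feed(html)
    fields = dict(parser.fields)
    fields.update(values)  # our answers override the form's defaults
    body = urlencode(fields).encode("ascii")
    return Request(urljoin(page_url, parser.action or ""), data=body)


def make_session():
    """Return an opener that remembers cookies between requests."""
    return build_opener(HTTPCookieProcessor(CookieJar()))
```

In use, one would fetch the page with `make_session().open(...)`, pass its HTML to `build_form_post`, and open the resulting request with the same session so the cookies go back with it.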

There are numerous other cases where I need to fetch data regularly, and the modern trend is to make the data available only via the web, and increasingly with Javascript in the way. But let me use another illustration, this time with data going in the other direction, a task I perform at work. I (or rather programs I've written, running automatically) regularly calculate some values, as the result of data gathered. These must be added to graphs on the company's website. The company has what its website providers laughingly call a "content management system", apparently designed to get in the way and stop data being loaded onto the website, or being managed in any way. It is accessed via the web, and to add new values to the graph, a human has to negotiate the Javascript-ridden site and eventually type in the values to be added to the graph. To have a human needlessly inserted in the process is not only mind-numbingly boring for them, but an obvious source of error. Since the website providers won't allow the data to be provided in any other way, it cries out for having it done by a computer program rather than a human.
 
Old 03-07-2011, 12:59 AM   #4
j-ray
Senior Member
 
Registered: Jan 2002
Location: germany
Distribution: ubuntu, mint, suse
Posts: 1,591

Rep: Reputation: 145Reputation: 145
It will probably be quite difficult to write a script that can log on to your bank account, because login and password fields nowadays are not named "login" and "password" or anything else you could easily guess. The names are generated dynamically for security reasons, so they change with every session, hmm.
Are you allowed to connect to the database your CMS is working with? If not, why don't you change providers?
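On the changing field names: one possible workaround, sketched here in Python with the standard library, is to find the fields by their type instead of their name. The heuristic (the username is the last text input before the first password input) and the names `CredentialFieldFinder` and `find_credential_fields` are assumptions for illustration, and it will not help if the form is assembled by JavaScript after the page loads.

```python
from html.parser import HTMLParser


class CredentialFieldFinder(HTMLParser):
    """Locate login fields by input type, ignoring whatever they are named."""

    def __init__(self):
        super().__init__()
        self.username_field = None
        self.password_field = None
        self._last_text_field = None

    def handle_starttag(self, tag, attrs):
        if tag != "input" or self.password_field is not None:
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if not name:
            return
        # An <input> with no type attribute is a text field by default.
        kind = (attrs.get("type") or "text").lower()
        if kind in ("text", "email"):
            self._last_text_field = name
        elif kind == "password":
            self.password_field = name
            # Assumption: the username box is the text field just before it.
            self.username_field = self._last_text_field


def find_credential_fields(html):
    """Return (username_field_name, password_field_name) for this session's page."""
    finder = CredentialFieldFinder()
    finder.feed(html)
    return finder.username_field, finder.password_field
```

Run against each freshly fetched login page, this returns whatever names that session happens to use, which can then be filled in and posted.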
 
Old 03-19-2011, 04:50 AM   #5
gnuweenie
Member
 
Registered: Oct 2010
Posts: 35

Rep: Reputation: Disabled
It's a good question. When it comes to banking, it's quite annoying for technical users to have to point-click-point-click-point-click-wait-point-click-point-click-type-point-click for each bank statement (which is why I started a thread on pgp-aware banks).

I wonder if it would work well to use a tool like Xrunner. It's automation, but does not cut out the GUI, so the bank can change the field names all they want, as long as they don't move the location of the fields.

Xrunner also supports context sensitive instructions, so it could look for the fields by object names, and if the names have changed it could revert to playing back a click at a particular screen coordinate. Xrunner is designed for cases where all the GUI widgets are part of the application. It would be interesting to find out if it can handle a browser, where the GUI is constructed from html, not the application itself.

If you like the GUI test tool approach, check out the lists of tools here and here. It appears Selenium may be a good option from that list (seems to be web-centric).

Another possibility is Beautiful Soup. So far I'm avoiding it because Python makes me cringe.

Be aware that with some of the automation approaches, you could be vulnerable to a MITM attack, because you are not visually checking the SSL cert. Someone using sslstrip could see all your packets.
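A script can at least do the certificate check itself. The following is a minimal sketch in Python (standard library only) of a client that insists on HTTPS and on a valid certificate for the right hostname. The function names are invented for the example, and it only checks the URL you start from, not any redirects the server sends afterwards.

```python
import ssl
from urllib.parse import urlsplit
from urllib.request import HTTPSHandler, build_opener


def make_verifying_context():
    """Return a TLS context that rejects bad certificates and hostname mismatches."""
    context = ssl.create_default_context()
    # These are already the defaults; set explicitly so the intent is visible.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def require_https(url):
    """Refuse plain-HTTP URLs, which is what an sslstrip-style downgrade produces."""
    if urlsplit(url).scheme.lower() != "https":
        raise ValueError("refusing non-HTTPS URL: %s" % url)
    return url


def make_opener():
    """Return a URL opener whose HTTPS connections use the verifying context."""
    return build_opener(HTTPSHandler(context=make_verifying_context()))
```

Passing every URL through `require_https` before handing it to `make_opener().open(...)` means a link that has been quietly rewritten to `http://` fails loudly instead of being fetched in the clear.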

Last edited by gnuweenie; 03-19-2011 at 05:20 AM.
 
Old 03-20-2011, 02:39 PM   #6
tbgclark
LQ Newbie
 
Registered: Nov 2010
Location: Warwickshire, UK.
Distribution: Fedora
Posts: 3

Original Poster
Rep: Reputation: 0
Thanks for those useful pointers, gnuweenie. I looked for Xrunner, and from what I read about it, it seems a very good basis for the right generic solution to the problem.

When UNIX programs used character stream input and output for interaction, Expect was the solution, as described by its author, Don Libes of NIST, in his presentation Expect: Curing Those Uncontrollable Fits of Interaction, given at the Summer 1990 USENIX Conference:
Quote:
UNIX programs used to be designed so that they could be connected with pipes created by a shell. This paradigm is insufficient when dealing with many modern programs that demand to be used interactively. Expect is a program designed to control interactive programs.
The way that the original Expect could work with all forms of UNIX interactive programs was to get them running on a pseudo-tty so that it could insert itself into both the input (keyboard to interactive program) and output (interactive program to screen) streams. Roll forward 20 years, and in place of a pseudo-tty something which operates as an X server is the obvious way to achieve the same sort of functionality. So Xrunner seems just the right approach.

There are three big downsides. Firstly, Xrunner seems to have disappeared. Fragments of what I found Googling it:
Quote:
Mercury Interactive's XRunner test suite ... was a really good tool for testing X-Windows applications. ... HP bought Mercury Interactive in 2006. There is no mention what they did XRunner. It appears to have disappeared from sight, which is sad because it was a really effective tool.
Secondly, although the X server seems absolutely the right place to put something which is going to insert itself into the computer - human interaction path, I shudder to imagine the work involved in handling even some of the simpler such tasks.

Finally, the writing seems to be on the wall for X, due to Wayland.

So, I'll turn away from the purist way of doing it, and investigate some of the useful alternatives you give, which might be the more pragmatic way of getting a solution, even if I do have to tear myself away from writing Perl, PHP, and the occasional bit of C to get tangled up in Python.

Although I don't want to solve the problem anew each time I come across an instance of interactive-itis getting in the way, a pragmatic approach is what's called for. After all, at work my complaints that it's difficult to upload data into our web service provider's CMS are not likely to have a great influence and cause a change of providers. Even where it's totally my choice, as with the bank I use, I put other things ahead of the ease with which I can download statements.

My aim is to build up an armoury of appropriate tools I can use when I want my computer to do the work instead of me. I'm tired of being a slave to the modern-day luddites who insist that a human be chained to the computer by mouse and keyboard before it can do any useful work! gnuweenie's post was most useful, thanks.
 
  

