Old 07-31-2015, 03:31 PM   #16
wh33t
Member
 
Registered: Oct 2003
Location: Canada
Posts: 893

Original Poster
Rep: Reputation: 61

Quote:
Originally Posted by syg00 View Post
The other side is that the OP's supplier shouldn't be so bloody anal about this; give them the data.
And if the supplier is happy with the trawling of the pages, what can be the (general) objection?
For them it's more about not wanting to invest resources to fill one of their clients' needs. They are old school and feel that manual data entry is where it's at, I guess, lol.

As for SlimerJS and PhantomJS, it's a budget thing: solve the problem with the least resources. I just need something that will process JavaScript; a browser already does that, and there is already a whole toolkit we've designed to isolate string values based upon a pattern, insert them into our database, etc.

Thanks everyone for your help!
 
Old 07-31-2015, 07:21 PM   #17
rigor
Member
 
Registered: Sep 2003
Location: 19th moon ................. ................Planet Covid ................Another Galaxy;............. ................Not Yours
Posts: 705

Rep: Reputation: Disabled
WRT using the least resources, AFAIK slimerjs is free. If the resource you're concerned about is the time it would take someone to learn to use slimerjs, then there might be another way.

If the toolkit that's been designed to isolate string values based upon a pattern is intended to work with a file on disk, and that's why you talked about "dump out the html source that it has rendered to a text file", then it sounds like you're headed into "The Kludge Zone". If so, and since you mentioned AJAX, if you don't have a cleaner way to do what you need readily available to you, you might want to consider using the Linux xdotool command. The xdotool command can manipulate windows: move them, resize them, provide them with text input, send mouse events, etc.

It sounds as if it might be used like this in your situation (a rough sketch of the whole script follows the notes after this list):

1) Get a list of item ID's for which you wish to get new data, put the ID's into a file.
2) Use the file as input to a shell script which behaves as described in the following steps.
3) Start a browser telling it to bring up a particular web page. With a Firefox profile named 'simple' that could look like this:
Code:
firefox  -p simple  'http://www.your_providers_domain.com/item_query.html' &
4) Use the sleep command to wait plenty of seconds expecting that the browser will be finished loading by then.
5) Run the xdotool command with options to find the browser's window which is open to the web page, then have it position and size the browser's window exactly.
6) With the browser window given a fixed size and position on the screen, you can use a "pixel ruler" such as KRuler to determine the exact X and Y pixel coordinates on the screen of the web page's input fields.
7) Read an item ID from the file containing a list of item ID's.
8) Run the xdotool command with options to enter the item ID at the exact X and Y pixel coordinates on the screen of the input for an item ID, and to press the on-screen button to submit the request to the web site.
9) Use the sleep command to wait plenty of seconds expecting that the browser will have the data by then.
10) Run the xdotool command with options to save the web page content to a file with a file name based on the item ID.
11) Loop back to number 7 until there are no more item ID's.
12) Run your toolkit to grab the new data from the saved files and update your database.

WRT item 4, to determine what number of seconds should be given to the sleep command, manually run the browser a few times to get an idea of a reasonable number of seconds to sleep inside the shell script.

WRT item 9, manually request data a few times to get an idea of a reasonable number of seconds to sleep inside the shell script.
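
Putting those steps together, and strictly as a sketch (I haven't tried this against your provider's site; the URL, window title, pixel coordinates, file names and sleep times below are all placeholders you'd measure and adjust, e.g. with KRuler and a few manual timing runs), the shell script might look roughly like this:
Code:
#!/bin/bash
# Rough sketch only -- every site-specific value here is a placeholder.

URL='http://www.your_providers_domain.com/item_query.html'
ID_FILE='item_ids.txt'            # step 1: one item ID per line
WIN_X=0;      WIN_Y=0             # where to park the browser window
WIN_W=1280;   WIN_H=1024          # fixed window size
FIELD_X=400;  FIELD_Y=300         # screen coordinates of the item-ID input (from KRuler)
SUBMIT_X=400; SUBMIT_Y=340        # screen coordinates of the submit button (from KRuler)
LOAD_SLEEP=15                     # step 4: measured by loading the page manually
DATA_SLEEP=20                     # step 9: measured by submitting a query manually

firefox -p simple "$URL" &        # step 3
sleep "$LOAD_SLEEP"               # step 4

# Step 5: find the browser window by title, then pin its position and size.
WID=$(xdotool search --name 'Mozilla Firefox' | head -n 1)
xdotool windowactivate "$WID"
xdotool windowmove "$WID" "$WIN_X" "$WIN_Y"
xdotool windowsize "$WID" "$WIN_W" "$WIN_H"

while read -r ITEM_ID; do                          # steps 7 and 11
    xdotool mousemove "$FIELD_X" "$FIELD_Y"        # step 8: click the input field...
    xdotool click 1
    xdotool key ctrl+a                             # ...clear any previous ID...
    xdotool type --delay 100 "$ITEM_ID"            # ...type the new ID...
    xdotool mousemove "$SUBMIT_X" "$SUBMIT_Y"      # ...and press the submit button
    xdotool click 1
    sleep "$DATA_SLEEP"                            # step 9

    # Step 10: Ctrl+S, replace the suggested file name with one based on the
    # item ID, press Return.  Driving the save dialog this way is approximate
    # and may need extra sleeps or keystrokes on your setup.
    xdotool key ctrl+s
    sleep 2
    xdotool key ctrl+a
    xdotool type --delay 100 "page_${ITEM_ID}.html"
    xdotool key Return
    sleep 2
done < "$ID_FILE"

# Step 12: run your existing toolkit against the saved page_*.html files here.
The save-dialog keystrokes are the most fragile part of the whole thing and are the first place to expect to have to tweak.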

If that makes sense for what you're doing but you'd like to make it cleaner, and if you're good with Javascript, use a browser extension such as Scriptish, which can run your own Javascript as if it were part of the web page that was loaded. I use Scriptish because when I tried GreaseMonkey (on which Scriptish is based), GM didn't seem to work very well.

Have Scriptish run Javascript as if it were part of your Provider's web page, access the real document inside the wrapper ( the concept is discussed in the docs ), and for example, change the title of the web page once the requested data has been loaded; the new title can be almost the same as the old, but contain the item ID.

Instead of using a long fixed sleep while waiting for data, modify the shell script to loop with a much shorter sleep; but each time through the loop, xdotool searches for the browser window title with the desired item ID in it. If it's found, it's known that the new data has been loaded into the web page.
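
Again only a sketch, and it assumes the Scriptish userscript appends the item ID to document.title once the AJAX data has arrived (something along the lines of document.title = document.title + ' ' + itemId inside the userscript); the shell side of that polling loop could then look something like this:
Code:
# Poll for a window whose title contains the item ID; the Scriptish userscript
# is assumed to have added the ID to document.title once the data is loaded.
wait_for_item () {
    local item_id="$1"
    local tries=0
    # Checking xdotool's output rather than its exit code keeps this simple;
    # note that a bare item ID could in principle match some other window title.
    until [ -n "$(xdotool search --name "$item_id" 2>/dev/null)" ]; do
        sleep 2                        # much shorter than one long fixed sleep
        tries=$((tries + 1))
        if [ "$tries" -ge 60 ]; then   # give up after roughly two minutes
            echo "Timed out waiting for item $item_id" >&2
            return 1
        fi
    done
}

# Used in place of the long sleep in step 9 of the loop above:
#   wait_for_item "$ITEM_ID" || continue
If the bare item ID turns out to match other window titles, have the userscript add a distinctive prefix along with the ID and search for that instead.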

A HUGE KLUDGE to be sure, but it's also an approach that can be thrown together very quickly and gives you an automated way to update the database with the new info. YES, it DOES depend on the appearance of your provider's web page. So, at least in this regard, follow good non-kludge programming practices and put the various X and Y pixel coordinates to be passed to the various uses of xdotool into shell script variables, so they can easily be changed near the beginning of the shell script if your provider's web page structure changes.

Only a simple manual check of the provider's web page structure is needed before subsequent automated updates are done.

I've actually built one or two things this way using a "state machine" pattern matching engine I wrote to find data in text files based on a sequence of patterns. I've given the result to people, it's worked for them, and they liked it.

I would normally prefer a much more connected/cleaner way of doing things, but if time is short and it doesn't have to be polished/fancy, just work...

I hope this is in the spirit of how you're trying to accomplish your goal, makes sense the way I've explained it, and helps you; or if not, gives you some ideas that might be useful to you!

Last edited by rigor; 07-31-2015 at 07:24 PM.
 
1 member found this post helpful.
Old 07-31-2015, 07:42 PM   #18
wh33t
Member
 
Registered: Oct 2003
Location: Canada
Posts: 893

Original Poster
Rep: Reputation: 61
Quote:
Originally Posted by rigor View Post
If the toolkit that's been designed to isolate string values based upon a pattern is intended to work with a file on disk, and that's why you talked about "dump out the html source that it has rendered to a text file", then it sounds like you're headed into "The Kludge Zone". If so, and since you mentioned AJAX, if you don't have a cleaner way to do what you need readily available to you, you might want to consider using the Linux xdotool command.
Never heard of Kludge! Thanks for that, man. I'll definitely take all of that into account. I had no idea half that stuff existed.
 
Old 08-01-2015, 02:49 PM   #19
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 10,963

Rep: Reputation: 5216
Quote:
Originally Posted by wh33t View Post
Never heard of Kludge! Thanks for man. I'll definitely take all of that into account. I had no idea half that stuff existed.
I think you misunderstood. "Kludge" isn't the name of a software project. It just means "something jury-rigged quickly to solve a particular problem."

http://www.catb.org/jargon/html/K/kluge.html

Last edited by dugan; 08-01-2015 at 02:50 PM.
 
Old 08-01-2015, 02:52 PM   #20
wh33t
Member
 
Registered: Oct 2003
Location: Canada
Posts: 893

Original Poster
Rep: Reputation: 61
Quote:
Originally Posted by dugan View Post
I think you misunderstood. "Kludge" isn't the name of a software project. It just means "something jury-rigged quickly to solve a particular problem."

http://www.catb.org/jargon/html/K/kluge.html
Haha, I knew what you meant: kludge some kind of solution together.
 
  

