Retrieve web site data automatically
My objective: automatically retrieve stock prices or stock charts from web sites (such as finance.yahoo). I understand that cron and wget can bring in a web page and even get past the password requirement. But how can I automatically navigate the site to find the data on separate pages and enter the required selections?
Is it possible to do that without knowing the details of how the web site is built? Would a program such as httpanalyzer make it possible? Or is it possible to create a macro that records all the keystrokes I use in iceweasel/firefox to find the data?
Thanks, that's very helpful. I realize my question was somewhat naive: the URL of the page I want can simply be read from the browser's navigation bar.
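Because the selections you make on such a page usually end up in the URL's query string, a script can copy one URL from the navigation bar and swap in different values instead of clicking through the site. A minimal sketch, assuming the site encodes its choices as ordinary query parameters; the URL and the parameter names `symbol` and `range` below are hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_params(url, **changes):
    """Return url with the given query parameters replaced or added."""
    parts = urlsplit(url)
    # Keep the existing parameters (in their original order) and override some.
    query = dict(parse_qsl(parts.query))
    query.update(changes)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URL copied from the navigation bar.
template = "https://example.com/quote?symbol=IBM&range=1y"
for sym in ["AAPL", "MSFT"]:
    # Each generated URL could then be handed to wget from a cron job.
    print(with_params(template, symbol=sym))
```

This only works when the page is reachable by a plain GET request; pages that depend on cookies, POSTed forms or JavaScript need more than URL rewriting.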
I use the Apache Commons libraries for Java to parse HTML pages. Back in the good old days, Yahoo used to provide a CSV download URL for stocks; now you have to put the tickers you want in the URL to get the CSV file. So I wrote a program that generates the URL and downloads the CSV, then parses the CSV file into a database of stock prices. It works pretty well; it's written in Java and run by cron.
Good luck, it's possible and fun. :) Phil
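The same generate-URL, download, parse-into-database pipeline can be sketched in a few lines of Python. This is not Phil's Java program: the endpoint `https://example.com/quotes.csv`, the `s` parameter and the two-column `symbol,price` layout are all hypothetical stand-ins, since the thread does not give the real Yahoo URL format. SQLite stands in for "a database".

```python
import csv
import io
import sqlite3
import urllib.request
from urllib.parse import urlencode

# Hypothetical endpoint; substitute whatever your data source actually uses.
BASE_URL = "https://example.com/quotes.csv"


def quote_url(tickers):
    """Build a download URL that lists the tickers in the query string."""
    return BASE_URL + "?" + urlencode({"s": ",".join(tickers)})


def download(url):
    """Fetch the CSV body as text (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def store_prices(db, csv_text):
    """Parse 'symbol,price' rows and insert them into a prices table."""
    db.execute("CREATE TABLE IF NOT EXISTS prices (symbol TEXT, price REAL)")
    rows = [(sym, float(price)) for sym, price in csv.reader(io.StringIO(csv_text))]
    db.executemany("INSERT INTO prices VALUES (?, ?)", rows)
    db.commit()
    return len(rows)
```

For example, `store_prices(sqlite3.connect("prices.db"), download(quote_url(["IBM", "AAPL"])))` fetches and stores one batch; a script that does this can be scheduled from cron just like the Java version.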
I've done this in Perl. I'll try to look up my old code and, if it still works, post it here.
You could try writing it in Ruby with the httpclient gem. It's pretty easy once you get used to it.
I've had to rearrange this a little because the page layout has changed since I used this a few years ago.
Obtaining the timestamp and share prices gets you on the way to plotting a graph, if you want one. The script expects a file called INPUT containing a share symbol and a quantity on each line. Code:
#!/usr/bin/perl -Tw
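Only the first line of the Perl script survives here, so what follows is not a reconstruction of it. It is a rough Python sketch of the part the post describes: reading INPUT (assumed to be whitespace-separated `SYMBOL QUANTITY` lines) and pairing each holding with a price and a timestamp so the rows can later be plotted. The `prices` dictionary is a hypothetical stand-in for whatever the scraping step returns.

```python
import time


def read_holdings(lines):
    """Parse 'SYMBOL QUANTITY' lines (whitespace-separated), skipping blank ones."""
    holdings = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        holdings.append((fields[0], float(fields[1])))
    return holdings


def valuation(holdings, prices, now=None):
    """Return (timestamp, symbol, quantity, price, value) rows for plotting.

    prices maps symbol -> price; now is seconds since the epoch
    (None means the current time).
    """
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return [(stamp, sym, qty, prices[sym], qty * prices[sym]) for sym, qty in holdings]
```

Typical use would be `with open("INPUT") as fh: rows = valuation(read_holdings(fh), prices)`, appending the rows to a log file each time cron runs the script.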
Nice one, thanks for sharing.
It might be worth checking whether XBMC has a Python add-on for scraping stock tickers.
I know it has an RSS feed aggregator, as well as scrapers for most video sites, including YouTube.
In fact, share prices can contain commas (e.g. 1,583.50), so I'm editing the script above to handle that and a few other details.
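The comma problem is easy to hit: in Python, `float("1,583.50")` raises `ValueError`. A minimal sketch of the fix, assuming the comma is always a thousands separator (true for English-language pages, but not for locales that use the comma as the decimal point). `Decimal` is used instead of `float` so the scraped value is stored exactly as shown on the page.

```python
from decimal import Decimal


def parse_price(text):
    """Convert a scraped price such as '1,583.50' to a Decimal.

    Assumes ',' is a thousands separator and '.' the decimal point.
    """
    return Decimal(text.strip().replace(",", ""))
```

So `parse_price("1,583.50")` gives `Decimal("1583.50")`.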