LinuxQuestions.org


hilltownboy 04-16-2013 03:30 PM

retrieve web site data automatically
 
My objective: automatically retrieve stock prices or stock charts from web sites (such as finance.yahoo). I understand that cron and wget can bring in a web page and even get past the password requirement. But how can I automatically navigate the site to find the data on separate pages and enter the required selections?

Is it possible to do that without being privy to the details of the web site's construction? Will a program such as httpanalyzer enable it?

Or is it possible to create a macro that will record all the keystrokes used in iceweasel/firefox to find the data?

TB0ne 04-16-2013 03:39 PM

Quote:

Originally Posted by hilltownboy (Post 4932686)
My objective: automatically retrieve stock prices or stock charts from web sites (such as finance.yahoo). I understand that cron and wget can bring in a web page and even get past the password requirement. But how can I automatically navigate the site to find the data on separate pages and enter the required selections?

Is it possible to do that without being privy to the details of the web site's construction? Will a program such as httpanalyzer enable it? Or is it possible to create a macro that will record all the keystrokes used in iceweasel/firefox to find the data?

You can use curl or wget to download the pages. The simplest way to figure out WHAT pages to download is to just visit them manually and note the URL. From there, you have two options, depending on what is on the page. Either:
  1. The data is just text on the webpage. Simple: download the page, write a script to parse out the relevant sections, and do whatever you want with them.
  2. The data is in a table or frame ON the page. A little harder, but view the page source to figure out which element(s) you want. Depending on the type, it may be a chore. Once you find it, download/parse as before.
That said, there are some sites where you can build a personalized stock list. Some even have download options, which may be the easiest way to go.
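
The two cases above can be sketched roughly as follows. This is only an illustration, not what any poster in this thread actually ran: it uses Python's standard library, and the names ClassTextExtractor, extract_by_class and fetch, plus the CSS class "price", are made up for the example. You would replace the class name with whatever you find when viewing the real page source.

```python
import urllib.request
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text inside every element that carries a given CSS class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0        # > 0 while we are inside a matching element
        self.matches = []     # accumulated text, one entry per match

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1               # nested tag inside a match
        elif self.wanted_class in classes:
            self.depth = 1                # start of a new match
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data


def fetch(url):
    """Download a page, much as wget or curl would (URL supplied by the caller)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_by_class(html, wanted_class):
    """Return the stripped text of each element whose class list includes wanted_class."""
    parser = ClassTextExtractor(wanted_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.matches]
```

Something like extract_by_class(fetch(url), "price") would then return the text of every matching element, where url is the address you noted in the browser. Run the script from cron to collect the values on a schedule.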

hilltownboy 04-17-2013 08:13 AM

Thanks, that's very helpful. I realize that my question was kind of stupid in that the URL of the page being sought can be readily obtained in the browser navigation bar.

TB0ne 04-17-2013 09:06 AM

Quote:

Originally Posted by hilltownboy (Post 4933165)
Thanks, that's very helpful. I realize that my question was kind of stupid in that the URL of the page being sought can be readily obtained in the browser navigation bar.

Well, both curl and wget are fairly complex, so don't thank me until you realize how much of a hairball it can be to script for them. :) That's why I suggested a stock site with a download feature: it is much easier to grab one downloadable file and parse it.

sniff 04-17-2013 10:09 AM

I use the Apache Commons libraries for Java to parse HTML pages. Back in the good old days Yahoo used to provide a CSV download URL for stocks. Now you have to provide the tickers you want in the URL to get the CSV file. So I wrote a program to generate the URL and then download the CSV. I then parse the CSV file into a database of stock prices. It works pretty well; it is written in Java and run by cron.

Good luck, it's possible and fun. :)
Phil
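
For readers who want to try the same idea (generate a URL from a ticker list, then load the downloaded CSV into a database), here is a rough Python sketch. It is not Phil's Java program. The endpoint BASE_URL, the query parameter "s", the two-column "symbol,price" CSV layout and the table name prices are all assumptions made up for illustration, so check the real site's download format before relying on any of it.

```python
import csv
import io
import sqlite3
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the download URL your stock site provides.
BASE_URL = "https://example.com/quotes.csv"


def build_url(tickers):
    """Build the download URL for a list of ticker symbols (assumed 's' parameter)."""
    return BASE_URL + "?" + urlencode({"s": ",".join(tickers)})


def store_quotes(conn, csv_text):
    """Parse 'symbol,price' CSV rows and insert them into a prices table.

    Returns the number of rows stored. Rows that are too short or whose
    price is not numeric are skipped.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices ("
        " symbol TEXT,"
        " price REAL,"
        " fetched_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if len(record) < 2:
            continue
        try:
            # Prices may contain thousands separators, e.g. "1,018.50".
            price = float(record[1].replace(",", ""))
        except ValueError:
            continue
        rows.append((record[0].strip(), price))
    conn.executemany("INSERT INTO prices (symbol, price) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)
```

A cron job could call build_url, download the result with wget, curl or urllib, and pass the text to store_quotes with a connection from sqlite3.connect("prices.db").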

linosaurusroot 04-17-2013 10:34 AM

I've done this in Perl ... I will try to look up my old code and if it still works post it here.

konsolebox 04-17-2013 10:42 AM

You could try coding it in Ruby with its httpclient module. It's pretty easy once you get used to it.

linosaurusroot 04-17-2013 05:21 PM

I've had to rearrange this a little because the page layout has changed since I used this a few years ago.
Obtaining the timestamp and share prices gets you on the way to plotting a graph if you want to.
This expects a file called INPUT with a share symbol and a quantity on each line.

Code:

#!/usr/bin/perl -Tw
# Read share prices from Yahoo Finance and value a list of holdings.

use Socket;
use IO::Handle;
my $EOL = "\015\012";    # HTTP line terminator (CRLF)

# Read the holdings from INPUT: one share symbol and quantity per line.
open(STDIN, "<INPUT") or die("open $!");
while (<>) {
    chomp();
    next if (/^\s*#/);    # skip comment lines
    next if (/^\s*$/);    # skip blank lines
    if (/^(\S+)\s+(\S+)$/) {
        # yahoo share ticker and quantity e.g.
        #HSBA.L    1018
        $shares{$1} += $2;    # repeated symbols accumulate
    }
}
close(STDIN);

# Fetch the quote page for each symbol through the proxy.
foreach $symbol (sort keys %shares) {
    undef $price;
    undef $curr;
    $iaddr = inet_aton("192.168.0.8"); $port = 3128;  # address of squid proxy
    $proto = getprotobyname('tcp');
    $paddr = sockaddr_in($port, $iaddr);
    socket(SOCK, PF_INET, SOCK_STREAM, $proto) || die "socket: $!";
    connect(SOCK, $paddr) || die "connect: $!";

    SOCK->autoflush(1);
    # Proxy-style request: the full URL goes in the request line.
    printf(SOCK "GET http://uk.finance.yahoo.com/q?s=%s HTTP/1.0%s", $symbol, $EOL);
    print SOCK "Host: uk.finance.yahoo.com$EOL";
    print SOCK $EOL;
    shutdown(SOCK, 1); # close outbound stream now

    # Scan the response line by line until both price and currency are found.
    READPAGE: while (defined($line = <SOCK>)) {
        if ($line =~ /class="time_rtq_ticker"><span id="[\w.]+">([\d,.]+)<\/span><\/span> /) {
            printf("For %s found price %s\n", $symbol, $1);
            $_ = $1;
            s/,//g;    # prices can contain commas, e.g. 1,583.50
            $price = $price{$symbol} = $_;
        }
        if ($line =~ /"ticker_currency_sym" : "([^"]+)" /) {
            # Also need to find the currency in this page
            $curr = $1;
        }
        if (defined($price) && defined($curr)) {
            if ("\$" eq $curr) {
                $curr = "GBP";
                $price{$symbol} /= 2;  # XXX needs to get exchange rate
            }
            if ("GBp" eq $curr) {
                # quoted in pence: convert to pounds
                $curr = "GBP";
                $price{$symbol} /= 100;
            }
            printf("%d %s %9.2f  %9.2f %s\n",
                    $^T, $symbol, $price{$symbol}, ($shares{$symbol} * $price{$symbol}), $curr);
            last READPAGE;
        }
    }
    close(SOCK) || die "close: $!";
}

# Summary: value of each holding and the total.
printf("\nT=%d\n", $^T);
$sum = 0;
foreach $symbol (sort keys %shares) {
    if (!defined($price{$symbol})) {
        printf("        No price known for %s\n", $symbol);
        next;
    }
    printf("%s=%9.2f\n", $symbol, ($shares{$symbol} * $price{$symbol}));
    $sum += $shares{$symbol} * $price{$symbol};
}
printf("\nSUM=%9.2f\n", $sum);
exit(0);


TB0ne 04-18-2013 10:47 AM

Nice one...thanks for sharing.

schneidz 04-18-2013 11:47 AM

Might be worth checking whether xbmc has a Python add-on for scraping stock tickers.

I know they have an RSS feed aggregator as well as scrapers for most video sites, including YouTube.

linosaurusroot 04-18-2013 08:23 PM

In fact share prices can contain commas like 1,583.50 so for that and some other details I'm editing the above.
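
The comma issue, along with the pence-to-pounds step in the script, can be illustrated with a small Python sketch. This is not a translation of the Perl above: the function names parse_price and to_pounds are invented for the example, and treating the currency code "GBp" as pence is taken from the Perl code rather than from any documented page format.

```python
import re
from decimal import Decimal

# A price starts with a digit, may contain thousands separators,
# and may end with a decimal part, e.g. "1,583.50".
PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text):
    """Return the first price found in text as a Decimal, commas removed."""
    match = PRICE_RE.search(text)
    if match is None:
        raise ValueError("no price found in %r" % text)
    return Decimal(match.group(0).replace(",", ""))


def to_pounds(price, currency):
    """Convert a quote in pence ("GBp") to pounds ("GBP"); leave others alone."""
    if currency == "GBp":
        return price / 100, "GBP"
    return price, currency
```

Using Decimal instead of float keeps the pence-to-pounds division exact, which matters once the values are multiplied by share quantities and summed.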


All times are GMT -5.