LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
Linux - Newbie: This Linux forum is for members that are new to Linux. Just starting out and have a question? If it is not in the man pages or the how-to's, this is the place!
Old 04-16-2013, 03:30 PM   #1
hilltownboy
Member
 
Registered: Jan 2008
Location: Ashfield, MA
Distribution: Debian 9 "Stretch", Arch
Posts: 104

Rep: Reputation: 15
retrieve web site data automatically


My objective: automatically retrieve stock prices or stock charts from web sites (such as finance.yahoo). I understand that cron and wget can bring in a web page and even get past a password requirement. But how do I automatically navigate the site to find the data on separate pages and enter the required selections?

Is it possible to do that without being privy to the details of the web site's construction? Will a program such as httpanalyzer enable it?

Or is it possible to create a macro that will record all the keystrokes used in iceweasel/firefox to find the data?
 
Old 04-16-2013, 03:39 PM   #2
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 19,004

Rep: Reputation: 4333
Quote:
Originally Posted by hilltownboy View Post
My objective: automatically retrieve stock prices or stock charts from web sites (such as finance.yahoo). I understand that cron and wget can bring in a web page and even get past the password requirement. But how to automatically navigate the web page site to find the data on separate pages and enter the required selections?

Is it possible to do that? Without being privy to the details of the web site construction? Will a program such as httpanalyzer enable it? Or is it possible to create a macro that will record all the keystrokes used in iceweasel/firefox to find the data?
You can use curl and wget to download the pages. The simplest way to figure out WHAT pages to download is to just visit them manually and note the URL. From there, you have two options, depending on what is on the page. Either:
  1. The data is just text on the webpage. Simple: download the page, write a script to parse the relevant sections out, and do whatever you want with them.
  2. The data is in a table or frame ON the page. A little harder, but view the page source to figure out what element(s) you want. Depending on the type, it may be a chore. Once you find it, download/parse as before.
That said, there are some sites where you can build a personalized stock list. Some even have download options, which may be the easiest way to go.
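To make option 1 concrete, here is a minimal Python sketch. The span id and the markup pattern are hypothetical, and the URL in the comment is just an example; you would adapt the regular expression to whatever the real page source shows:

```python
import re
import urllib.request

def extract_price(html, symbol):
    """Pull a quote out of downloaded page text, assuming the price sits
    in a span whose id contains the ticker symbol (hypothetical markup)."""
    pattern = r'<span id="[^"]*%s[^"]*">([\d,.]+)</span>' % re.escape(symbol)
    m = re.search(pattern, html)
    if not m:
        return None
    # strip thousands separators like the comma in 1,583.50
    return float(m.group(1).replace(",", ""))

if __name__ == "__main__":
    # In real use you would download the page first, e.g.:
    #   html = urllib.request.urlopen("http://finance.yahoo.com/q?s=IBM").read().decode()
    # Here a canned snippet stands in for the downloaded page:
    html = '<b>IBM</b> <span id="quote-IBM">1,583.50</span>'
    print(extract_price(html, "IBM"))
```

Regex scraping like this is brittle: the pattern breaks whenever the site changes its markup, so for anything long-lived a real HTML parser (the stdlib html.parser, for instance) is the safer route.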
 
Old 04-17-2013, 08:13 AM   #3
hilltownboy
Member
 
Registered: Jan 2008
Location: Ashfield, MA
Distribution: Debian 9 "Stretch", Arch
Posts: 104

Original Poster
Rep: Reputation: 15
Retrieve web site data automatically

Thanks, that's very helpful. I realize that my question was kind of stupid in that the URL of the page being sought can be readily obtained from the browser's address bar.
 
Old 04-17-2013, 09:06 AM   #4
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 19,004

Rep: Reputation: 4333
Quote:
Originally Posted by hilltownboy View Post
Thanks, that's very helpful. I realize that my question was kind of stupid in that the URL of the page being sought can be readily obtained in the browser navigation bar.
Well, both curl and wget are fairly complex, so don't thank me until you realize how much of a hairball it can be to script for them. That's why I suggested a stock site with a download feature... much easier to grab one downloadable file and parse it.
 
Old 04-17-2013, 10:09 AM   #5
sniff
Member
 
Registered: Jan 2003
Location: Durham UK
Distribution: openSUSE/Debian
Posts: 328

Rep: Reputation: 42
I use the Apache Commons libraries for Java to parse HTML pages. Back in the good old days Yahoo used to provide a CSV download URL for stocks. Now you have to provide the tickers you want in the URL to get the CSV file. So I wrote a program to generate the URL and then download the CSV, which I parse into a database of stock prices. It works pretty well: written in Java, run by cron.

Good luck, it's possible and fun.
Phil
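That ticker-list-to-CSV flow can be sketched in Python as well as Java. The quote-service URL below is a placeholder, not the real Yahoo format, and the `symbol,price` CSV layout is assumed for illustration:

```python
import csv
import io

def build_url(tickers):
    # Hypothetical quote-download URL; substitute the real query format
    # of whatever service you use.
    return "http://example.com/quotes.csv?s=" + "+".join(tickers)

def parse_quotes(csv_text):
    """Turn 'symbol,price' CSV rows into a dict of symbol -> float price."""
    return {row[0]: float(row[1])
            for row in csv.reader(io.StringIO(csv_text)) if row}

if __name__ == "__main__":
    print(build_url(["IBM", "MSFT"]))
    # A canned response stands in for the downloaded file:
    sample = "IBM,191.23\nMSFT,29.76\n"
    print(parse_quotes(sample))
```

From there the parsed dict can be inserted into a database and the whole script driven by cron, exactly as described above.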
 
Old 04-17-2013, 10:34 AM   #6
linosaurusroot
Member
 
Registered: Oct 2012
Distribution: OpenSuSE,RHEL,Fedora,OpenBSD
Posts: 982
Blog Entries: 2

Rep: Reputation: 244
I've done this in Perl ... I will try to look up my old code and, if it still works, post it here.
 
1 members found this post helpful.
Old 04-17-2013, 10:42 AM   #7
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
You could try coding it with Ruby and its httpclient module. It's pretty easy once you get used to it.
 
1 members found this post helpful.
Old 04-17-2013, 05:21 PM   #8
linosaurusroot
Member
 
Registered: Oct 2012
Distribution: OpenSuSE,RHEL,Fedora,OpenBSD
Posts: 982
Blog Entries: 2

Rep: Reputation: 244
I've had to rearrange this a little because the page layout has changed since I used this a few years ago.
Obtaining the timestamp and share prices gets you well on the way to plotting a graph if you want to.
This expects a file called INPUT with a share symbol and a quantity on each line.

Code:
#!/usr/bin/perl -Tw
# read share prices from Yahoo finance pages via a local squid proxy

use Socket;
use IO::Handle;
use POSIX ":sys_wait_h";
use Carp;
my $EOL = "\015\012";   # CRLF, the line ending HTTP expects

# define how many shares of each type, ideally read from input source
open(STDIN, "<INPUT") or die("open INPUT: $!");
while (<>) {
    chomp();
    next if (/^\s*#/);      # skip comments
    next if (/^\s*$/);      # skip blank lines
    if (/^(\S+)\s+(\S+)$/) {
        # yahoo share ticker and quantity e.g.
        # HSBA.L    1018
        if (defined($shares{$1})) {
            $shares{$1} += $2;
        } else {
            $shares{$1} = $2;
        }
    }
}
close(STDIN);

foreach $symbol (sort keys %shares) {
    undef $price;
    undef $curr;
    $iaddr = inet_aton("192.168.0.8"); $port = 3128;   # address of squid proxy
    $proto = getprotobyname('tcp');
    $paddr = sockaddr_in($port, $iaddr);
    socket(SOCK, PF_INET, SOCK_STREAM, $proto) || die "socket: $!";
    connect(SOCK, $paddr) || die "connect: $!";

    SOCK->autoflush(1);
    printf(SOCK "GET http://uk.finance.yahoo.com/q?s=%s HTTP/1.0$EOL", $symbol);
    print SOCK "Host: uk.finance.yahoo.com$EOL";
    print SOCK $EOL;
    shutdown(SOCK, 1);   # close outbound stream now

    READPAGE: while (defined($line = <SOCK>)) {
        chomp($line);
        if ($line =~ /class="time_rtq_ticker"><span id="[\w.]+">([\d,.]+)<\/span><\/span>/) {
            printf("For %s found price %s\n", $symbol, $1);
            $_ = $1;
            s/,//g;          # prices can contain commas, e.g. 1,583.50
            $price = $price{$symbol} = $_;
        }
        if ($line =~ /"ticker_currency_sym" : "([^"]+)"/) {
            # Also need to find the currency in this page
            $curr = $1;
        }
        if (defined($price) && defined($curr)) {
            if ("\$" eq $curr) {
                $curr = "GBP";
                $price{$symbol} /= 2;    # XXX needs to get exchange rate
            }
            if ("GBp" eq $curr) {        # pence -> pounds
                $curr = "GBP";
                $price{$symbol} /= 100;
            }
            printf("%d %s %9.2f  %9.2f %s\n",
                   $^T, $symbol, $price{$symbol}, ($shares{$symbol} * $price{$symbol}), $curr);
            last READPAGE;
        }
    }
    close(SOCK) || die "close: $!";
}

printf("\nT=%d\n", $^T);
$sum = 0;
foreach $symbol (sort keys %shares) {
    if (!defined($price{$symbol})) {
        printf("         No price known for %s\n", $symbol);
        next;
    }
    printf("%s=%9.2f\n", $symbol, ($shares{$symbol} * $price{$symbol}));
    $sum += $shares{$symbol} * $price{$symbol};
}
printf("\nSUM=%9.2f\n", $sum);
exit(0);

Last edited by linosaurusroot; 04-18-2013 at 08:26 PM.
 
1 members found this post helpful.
Old 04-18-2013, 10:47 AM   #9
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 19,004

Rep: Reputation: 4333
Nice one...thanks for sharing.
 
Old 04-18-2013, 11:47 AM   #10
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 5,135

Rep: Reputation: 876
might be worth checking whether xbmc has a python add-on for scraping stock tickers.

i know they have an rss feed aggregator as well as scrapers for most video sites, including youtube.
 
Old 04-18-2013, 08:23 PM   #11
linosaurusroot
Member
 
Registered: Oct 2012
Distribution: OpenSuSE,RHEL,Fedora,OpenBSD
Posts: 982
Blog Entries: 2

Rep: Reputation: 244
In fact share prices can contain commas, like 1,583.50, so for that and some other details I'm editing the above.
 
  

