CLI command to pull Stock Price and volume from Google

dmchess · 11-06-2017, 09:39 AM

I have an app that has been pulling stock data from yahoo. The part that pulls it is a
Pascal program that is run by cron and executes the following CLI command:

curl -s "http://download.finance.yahoo.com/d/quotes.csv?s=' + TickerStr + '&f=svl1c6j2s6f6"

This was executed as an external process to the Pascal program and then Pascal Progam would then capture the data and insert it into a mysql table. Then the data could be viewed through a GUI App.

Unfortunately yahoo has shut down the ability to pull data with scripts. The last day it worked was 10/31/2017. I am looking at replacing the Yahoo command with a Google command. I can get a stock quote with a:

wget -q -O - http://www.google.com/finance?q=F | grep ref_ -m 1 | sed 's|<[^>]*>||g'

But I don't know how to pull the volume. Does anyone have this information?

Terry

dannybpng · 11-06-2017, 12:04 PM

This would get you both values (if Google does not change its HTML style codes)

Quote:

wget -q -O - http://www.google.com/finance?q=F | \
awk 'BEGIN{first=1;vol=0}
/ref_/ { if(first) { gsub( /<[^>]*>/, "");print;first=0}}
/data-snapfield="vol_and_avg"/ { vol=1}
/class="val"/ && vol == 1 {sub( ".*>","");sub( "/.*","");print;exit}'

It grabs the first "ref_" id line and removes the markup. Then it finds the "vol_and_avg" data-snapfield and grabs the "val" information line and removes the markup and the average value. This is a quick hack. There is most likely a better way to do this.

schneidz · 11-06-2017, 12:32 PM

does this give the needful:

Code:

[schneidz@hyper stock]$ wget -q http://www.nasdaq.com/symbol/aet -O - | egrep "qwidget_pageheader|qwidget-symbol|qwidget-dollar|qwidget_netchange" 
			    <div id="qwidget_pageheader"><h1>Aetna Inc. Common Stock Quote & Summary Data</h1> <b></b></div>
			        <div class="qwidget-symbol">AET&nbsp;</div>
				    <div id="qwidget_lastsale" class="qwidget-dollar">$176.94</div>
	                <div class="qwidget-dollar"><div>*&nbsp;&nbsp;</div></div>
				    <div id="qwidget_netchange" class="qwidget-cents qwidget-Red">0.05</div>

rknichols · 11-06-2017, 12:42 PM

Did you try just having curl use a user-agent string from your regular browser ("-A" or "--user-agent" option)? I've been doing that for a couple of years now, since nasdaq.com started rejecting the default lynx user agent.

To see your browser's current user agent, visit https://www.whatismybrowser.com/dete...-my-user-agent

Turbocapitalist · 11-06-2017, 12:46 PM

It's rather easy in Perl if you use the LWP::UserAgent and HTML::TreeBuilder::XPath modules. It'd be just a few lines to fetch the page and then a few more to scrape the volume.

The only hard part would be guessing the right XPath formula. One way to express the location of the data is like this:

Code:

//tr[td[@data-snapfield="vol_and_avg"]]/td[2]

However, anything with XPath capabilities would do, not just perl.

Edit: that is for the Google source

ondoho · 11-06-2017, 12:52 PM

Quote:

Originally Posted by rknichols

Did you try just having curl use a user-agent string from your regular browser ("-A" or "--user-agent" option)? I've been doing that for a couple of years now, since nasdaq.com started rejecting the default lynx user agent.

To see your browser's current user agent, visit https://www.whatismybrowser.com/dete...-my-user-agent

opening the yahoo link in a mozilla browser displays a clear message.
i don't see how adding a useragentstring to curl would help.

dmchess · 11-06-2017, 01:23 PM

Thankyou, that seems to have worked. I have my program working again.

rknichols · 11-06-2017, 05:19 PM

Quote:

Originally Posted by ondoho

opening the yahoo link in a mozilla browser displays a clear message.
i don't see how adding a useragentstring to curl would help.

The original post indicated a problem downloading from a script and did not mention any issue with using a browser. Thus, my belief that the user-agent might be the problem, as was indeed the case on nasdaq.com.

Turbocapitalist · 11-06-2017, 11:46 PM

Quote:

Originally Posted by rknichols

The original post indicated a problem downloading from a script and did not mention any issue with using a browser. Thus, my belief that the user-agent might be the problem, as was indeed the case on nasdaq.com.

Hmm. I tried a few browsers and all were served this by Yahoo:

"It has come to our attention that this service is being used in violation of the Yahoo Terms of Service. As such, the service is being discontinued. For all future markets and equities data research, please refer to finance.yahoo.com."

So I figure using Yahoo as a data source is a lost cause. The HTML in the corresponding Google page is quite easy to work with using the tools perl has for XPath and fetching web pages.

Code:

#!/usr/bin/perl

use LWP::UserAgent;
use HTML::TreeBuilder::XPath;

use strict;
use warnings;

my $ua = LWP::UserAgent->new;
$ua->agent('dmchess-scraper/0.1 ');                                             
$ua->timeout(10);
$ua->env_proxy;

my $response = $ua->get('http://finance.google.com/finance?q=F');

die("Failed : $response->status_line\n")
    if ( ! $response->is_success);

my $contents = $response->decoded_content;

my $xhtml = HTML::TreeBuilder->new;

$xhtml->parse( $contents )
    or die( "Could not parse HTML : $! \n");

my ( $volume )
    = ( $xhtml->findnodes( '//tr[td[@data-snapfield="vol_and_avg"]]/td[2]' ) );

my ( $vol, $avg ) = split(/\//,$volume->as_trimmed_text());

print qq(OK\t$vol, $avg\n);

exit(0);

ondoho · 11-07-2017, 12:27 AM

Quote:

Originally Posted by rknichols

The original post indicated a problem downloading from a script and did not mention any issue with using a browser. Thus, my belief that the user-agent might be the problem, as was indeed the case on nasdaq.com.

had you clicked on the link, you'd have found this:

Quote:

Originally Posted by Turbocapitalist

Hmm. I tried a few browsers and all were served this by Yahoo:

"It has come to our attention that this service is being used in violation of the Yahoo Terms of Service. As such, the service is being discontinued. For all future markets and equities data research, please refer to finance.yahoo.com."

that was my point.

rknichols · 11-07-2017, 08:43 AM

Quote:

Originally Posted by rknichols

The original post indicated a problem downloading from a script and did not mention any issue with using a browser. Thus, my belief that the user-agent might be the problem, as was indeed the case on nasdaq.com.

Quote:

Originally Posted by ondoho

had you clicked on the link, you'd have found this:

Quote:

Originally Posted by Turbocapitalist

Hmm. I tried a few browsers and all were served this by Yahoo:

"It has come to our attention that this service is being used in violation of the Yahoo Terms of Service. As such, the service is being discontinued. For all future markets and equities data research, please refer to finance.yahoo.com."

that was my point.

Change "clicked on the link" to "copied and pasted the link into a browser".
The only clickable link in that original post was to google.com. That curl command did not appear to be complete as it did not request any particular ticker symbol(s).

ondoho · 11-07-2017, 12:35 PM

Quote:

Originally Posted by rknichols

Change "clicked on the link" to "copied and pasted the link into a browser".
The only clickable link in that original post was to google.com. That curl command did not appear to be complete as it did not request any particular ticker symbol(s).

conceded.

dmchess · 12-10-2017, 09:46 AM

I am not getting your script to run:

./Stock.sh F
Can't locate HTML/TreeBuilder/XPath.pm in @INC (you may need to install the HTML::TreeBuilder::XPath module) (@INC contains: /etc/perl /usr/local/lib/perl/5.18.2 /usr/local/share/perl/5.18.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.18 /usr/share/perl/5.18 /usr/local/lib/site_perl .) at ./Stock.sh line 4.
BEGIN failed--compilation aborted at ./Stock.sh line 4.

I first tried installing XPath
sudo apt-get install libxml-xpath-perl

That didn't work so I googled around and some one said I had to run cpan command, so I tried that
sudo cpan XML::XPath

And I am still getting:

./Stock.sh F
Can't locate HTML/TreeBuilder/XPath.pm in @INC (you may need to install the HTML::TreeBuilder::XPath module) (@INC contains: /etc/perl /usr/local/lib/perl/5.18.2 /usr/local/share/perl/5.18.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.18 /usr/share/perl/5.18 /usr/local/lib/site_perl .) at ./Stock.sh line 4.
BEGIN failed--compilation aborted at ./Stock.sh line 4.

I am not knowledgeable on perl. I don't normally use it. I am ok with awk, pascal or c, but not perl.

Turbocapitalist · 12-10-2017, 09:51 AM

Quote:

Originally Posted by dmchess

Can't locate HTML/TreeBuilder/XPath.pm in @INC (you may need to install the HTML::TreeBuilder::XPath module)

That would be libhtml-treebuilder-xpath-perl in your repository. The names do matter a lot.

There is also a generic XML Xpath module, but it's not relevant here. Sine you are dealing with HTML you'll need one that can handle all the mistakes found even today in HTML. Thank M$ for that and Netscape for playing along with M$ in their 'browser war' that went on for so long starting over 20 years ago. It ended but we're still paying the cost for that with lots of non-standard, hard-to-parse, pseudo HTML.

dmchess · 12-10-2017, 10:06 AM

Ok, we are getting some where. Now how do I add ending stock value to the script?

Terry H.