Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Your first problem is that Google can tell from the headers you send that you're not a normal web browser, so you won't get it to work without sending a User-Agent header.
Your second problem is parsing the response. My solution is horrible and hackish because it uses regular expressions. If you want something better (and even if you don't), I'd read http://www.codinghorror.com/blog/200...hulhu-way.html. Without further ado:
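A rough sketch of both points, not the poster's original script, assuming curl, grep and sed are available. `fetch_page` and `extract_img_srcs` are hypothetical helper names: the first sends a browser-like User-Agent via curl's `-A` option, and the second does the same kind of hackish regex extraction, pulling `src` values out of `<img>` tags. Whether fetching a given site this way is allowed depends on its terms of service, which is what the rest of this thread argues about.

```shell
#!/usr/bin/env bash

# Hypothetical helper: fetch a page while sending a browser-like User-Agent.
# The UA string is an arbitrary example, not a value any site requires.
fetch_page() {
    curl -s -A "Mozilla/5.0 (X11; Linux x86_64)" "$1"
}

# Hypothetical helper: regex-based extraction of <img src="..."> values
# from HTML read on stdin, one URL per output line.
# Fragile by design: single-quoted or unquoted src attributes are missed.
extract_img_srcs() {
    grep -oE '<img[^>]*src="[^"]*"' | sed -E 's/.*src="([^"]*)"/\1/'
}
```

Usage would look like `fetch_page "http://example.com/" | extract_img_srcs`. A real HTML parser is the more robust choice; the regex only works as long as the markup keeps exactly the shape it expects.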
Great script in the penultimate post for Google Images. It works and lets me wget the images. Nice.
Would you also know how to do the same for a regular (non-image) Google search, for example to get 10 to 25 results?
Code:
QUERY="strings to search"
URL="http://www.google.com/search?q=${QUERY// /+}"   # spaces encoded as '+'
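For the question above, a slightly more robust way to put the search terms into the URL is to percent-encode every character that is not URL-safe, so queries containing `&`, `?` or quotes do not break the URL. This is a minimal pure-bash sketch; `urlencode` is a hypothetical helper name, and it assumes ASCII input (multibyte UTF-8 characters would need byte-wise handling). It only builds the URL string; fetching it is where the TOS concerns raised in this thread come in.

```shell
#!/usr/bin/env bash

# Hypothetical helper: percent-encode a string for use in a URL query.
# Unreserved characters (letters, digits, . ~ _ -) pass through unchanged;
# everything else becomes %XX. Assumes ASCII input.
urlencode() {
    local s="$1" out="" c i
    for (( i = 0; i < ${#s}; i++ )); do
        c="${s:i:1}"
        case "$c" in
            [a-zA-Z0-9.~_-]) out+="$c" ;;
            # printf "'X" yields the numeric code of character X
            *) out+=$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s' "$out"
}

query="strings to search"
url="http://www.google.com/search?q=$(urlencode "$query")"
printf '%s\n' "$url"
```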
I'm sorry, I don't quite understand your question... You've put the search URL in your post. Could you perhaps give an example of what you want the code to do?
5.3 You agree not to access (or attempt to access) any of the Services by any means other than through the interface that is provided by Google, unless you have been specifically allowed to do so in a separate agreement with Google. You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers) and shall ensure that you comply with the instructions set out in any robots.txt file present on the Services.
As far as I know, if Google detects access via "automated means", you'll be banned very quickly (although only temporarily).
Lots of developers of programs and web interfaces use Google in other ways. For example, what about those web monitoring tools that keep you up to date on various topics every day? OK, you buy that software, and you have a licence that protects the users.
Let's get philosophical. By the way, what is the difference between doing the same thing with Firefox and doing it with a script? Isn't forbidding it against human rights, or against the freedom to use whatever tool you like? You can use Firefox, Internet Explorer, and other browsers, right? I'm exaggerating, but in some way, why not?
He is definitely right.
Quote:
So Google can web crawl web sites to gather data, but not their users
I regularly see Google, Yahoo, and other crawlers on my website. And what does Google do? It crawls the web, automatically.
The difference is that it is against the TOS. Human rights do not cover software, and they do not grant you access to Google services. The same kind of reasoning is frequently used by people who pirate software, by the way. No offense.
Quote:
Originally Posted by frenchn00b
You can use Firefox, Internet Explorer, and other browsers, right? I'm exaggerating, but in some way, why not?
Yes, you exaggerate. You can use Firefox and other browsers because their makers allow you to, as long as you honor the license agreement. Think about it this way: Google generates revenue from advertising, which is the only reason its service is free rather than subscription-based. When you use a script, nobody reads the ads that somebody paid to show (even though the script requests them). This is why scripts are forbidden in the TOS.
It is possible that another search engine exists that explicitly allows you to use scripts. It is also possible that Google provides some kind of API for extracting the search results you want. You should research the subject a bit.
I agree with you.
Well, what does a TOS really mean for a website? If, for instance, I write that I do not allow robots and crawlers on my website, does my TOS protect me from robots and misuse? I can give you their IPs, and it really annoys me to track them and see how much access happens on any website anyway. Is that normal?
There are so many robots that even logging does not protect you. Sometimes it is even hard to tell real attacks apart from the robots, crawlers, and automated scripts of web providers and search engines. The only things that protect you are the robustness of Apache and IP trackers (i.e. IP banning tools). I had an FTP server, and guess what? Have you ever tried leaving an FTP service unattended? It can be risky, so I preferred to remove it.
Pff. In my opinion the Internet is a mess, a jungle. Luckily, services and high security standards exist for most OSes to protect data. Back before I got XP, I was the victim of a powerful virus that killed my hard disk (defective clusters), and I had no backup at the time. Pff. It was sad.
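On telling crawlers apart in the logs: assuming Apache's standard "combined" log format (where the User-Agent is the last quoted field), a rough awk filter can count requests per client IP whose User-Agent mentions bot, crawl or spider. `bot_ips` is a hypothetical helper name and the log path in the comment is only a common Debian-style location. Because the User-Agent header is set by the client, this only catches crawlers that identify themselves honestly, and request lines containing escaped quotes can throw the field split off.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count requests per client IP whose User-Agent looks
# like a crawler. Splitting on '"' makes field 6 the User-Agent in Apache's
# combined format; field 1 starts with the client IP.
# Reads the given log files, or stdin if none are given.
bot_ips() {
    awk -F'"' 'tolower($6) ~ /bot|crawl|spider/ { split($1, a, " "); n[a[1]]++ }
               END { for (ip in n) print n[ip], ip }' "$@" | sort -rn
}

# Example usage (path is an assumption; it varies by distro and config):
# bot_ips /var/log/apache2/access.log | head
```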
Last edited by frenchn00b; 11-03-2011 at 12:49 PM.
Well... no. Your "TOS" would be your robots.txt file, because robots can't understand your actual TOS. Google certainly respects robots.txt files, so you should really respect their TOS in turn (I hadn't read through it; thanks for pointing it out, SigTerm).
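For reference, a robots.txt that asks every compliant crawler to stay out of the whole site is only two lines. The sketch below writes one with a shell here-document. The file has to be served from the site root (e.g. http://example.com/robots.txt), and it is purely advisory: well-behaved crawlers honor it, but it does not block any request by itself.

```shell
#!/usr/bin/env bash

# Write a robots.txt asking all compliant crawlers to skip the entire site.
# Copy it to your web server's document root afterwards (location varies).
cat > robots.txt <<'EOF'
User-agent: *
Disallow: /
EOF
```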
Google does indeed have an images API, but unfortunately it has been deprecated and may not work for much longer.