Get the URL of Google search

frenchn00b · 10-05-2011, 12:46 AM

Hi,

Would it be possible to retrieve the URLs (the real ones, not Google's) of the first 5 Google results?

Code:

# read takes the variable name without a leading $
read -r SEARCHSTRING
wget "http://www.google.com/search?hl=en&client=iceweasel-a&rls=org.mozilla:en-US:unofficial&q=$SEARCHSTRING&um=1&ie=UTF-8&tbm=isch&source=og&sa=N&tab=wi"

Thank you!

Snark1994 · 10-05-2011, 11:33 AM

Your first problem with that is Google detects you're not a normal web browser because of the headers you've sent - so you're not going to get it to work without sending a User-Agent header.

Your second problem is parsing the response - my solution is horrible and hackish, using regular expressions. If you want something better (and even if you don't) I'd read http://www.codinghorror.com/blog/200...hulhu-way.html. Without further ado:

Code:

#!/bin/bash

SEARCHSTRING="Search"
wget --header='User-Agent: Mozilla/5.0 X11 Linux x86_64 rv 7.0.1 Gecko/20100101 Firefox/7.0.1' \
              "http://www.google.co.uk/search?tbm=isch&hl=en&source=hp&biw=&bih=&q=$SEARCHSTRING&btnG=Search+Images&gbv=1" -O out.html -o /dev/null

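# Pull out the imgres links, keep the imgurl field, then strip the
# "imgurl=" prefix and the "&amp;imgrefurl" suffix.
grep -o "\"/imgres?[^\"]*\?\"" out.html | \
      grep -o "imgurl=.*&amp;imgrefurl" | \
      sed 's/^imgurl=//' | \
      sed 's/&amp;imgrefurl$//' | \
      head -n 5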

rm out.html
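
To see what the grep/sed chain does without hitting Google at all, here is a minimal sketch that runs the same kind of steps on a made-up line of HTML. The sample markup and the example.com URLs are assumptions for illustration only (the real result page may look different), and extract_image_urls is just a hypothetical name for the pipeline.

```shell
#!/bin/bash

# Hypothetical fragment of a result page; the real markup may differ.
sample='<a href="/imgres?imgurl=http://example.com/cat.jpg&amp;imgrefurl=http://example.com/page&amp;h=10">'

# Isolate the quoted imgres link, cut out the imgurl field, then strip
# the literal "imgurl=" prefix and "&amp;imgrefurl" suffix.
extract_image_urls() {
    grep -o '"/imgres?[^"]*"' |
        grep -o 'imgurl=.*&amp;imgrefurl' |
        sed -e 's/^imgurl=//' -e 's/&amp;imgrefurl$//' |
        head -n 5
}

result=$(printf '%s\n' "$sample" | extract_image_urls)
echo "$result"
```

Reading from stdin instead of out.html makes it easy to try the regular expressions on a saved page before pointing them at anything live.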

Hope this helps,

Proud · 10-05-2011, 11:53 AM

http://code.google.com/apis/customse.../overview.html
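
Assuming the link above refers to Google's Custom Search JSON API, a request could be built from bash roughly like this. This is only a sketch: the endpoint and the key, cx, q and num parameters are as I understand the documentation, API_KEY and ENGINE_ID are placeholders you would have to obtain yourself, and urlencode and build_search_url are hypothetical helper names. The encoder is meant for plain ASCII queries.

```shell
#!/bin/bash

# Percent-encode a string so it is safe inside a URL query parameter.
urlencode() {
    local LC_ALL=C
    local s=$1 out='' c i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case $c in
            [a-zA-Z0-9.~_-]) out+=$c ;;
            *) printf -v c '%%%02X' "'$c"; out+=$c ;;
        esac
    done
    printf '%s\n' "$out"
}

# Build the request URL; key and cx are placeholders, not real values.
build_search_url() {
    local key=$1 cx=$2 query=$3
    printf 'https://www.googleapis.com/customsearch/v1?key=%s&cx=%s&num=10&q=%s\n' \
        "$key" "$cx" "$(urlencode "$query")"
}

url=$(build_search_url "API_KEY" "ENGINE_ID" "strings to search")
echo "$url"
# To actually send it (needs a real key and engine ID):
#   wget -q -O results.json "$url"
```

As far as I know, the response is JSON and each result carries its URL in a "link" field, so no HTML scraping is needed.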

frenchn00b · 11-01-2011, 08:07 PM

Gorgeous post (the second-to-last one) for Google Images. It works, and the image URLs it prints can be fetched with wget.

Nice.
Would you happen to know how to do the same for a regular Google search (not images), for example to list the links of the first 10-25 results?

Code:

URL="strings to search"
#"http://www.google.com/search?q=$URL"

Snark1994 · 11-02-2011, 05:56 PM

I'm sorry, I don't quite understand your question... You've put the search URL in your post. Could you perhaps give an example of what you want the code to do?

SigTerm · 11-02-2011, 06:40 PM

Quote:

Originally Posted by frenchn00b

Hi,

Would it be possible to retrieve the URLs (the real ones, not Google's) of the first 5 Google results?

That's against their terms of service.

Quote:

5.3 You agree not to access (or attempt to access) any of the Services by any means other than through the interface that is provided by Google, unless you have been specifically allowed to do so in a separate agreement with Google. You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers) and shall ensure that you comply with the instructions set out in any robots.txt file present on the Services.

As far as I know, if google detects an access via "automated means", you'll be banned (although temporarily) very quickly.

Cedrik · 11-02-2011, 08:24 PM

So Google can web crawl web sites to gather data, but not their users

frenchn00b · 11-03-2011, 12:51 AM

Quote:

Originally Posted by SigTerm

That's against their terms of service.

As far as I know, if google detects an access via "automated means", you'll be banned (although temporarily) very quickly.

A lot of developers of programs and web interfaces use Google in other ways. For example, what about those web monitoring tools that keep you up to date daily on various topics? OK, you buy the software, so you have a licence that protects the users.

Let's philosophize. By the way, what is the difference between Firefox and a script that does the same thing? Isn't forbidding it against human rights, or at least against the freedom to use the tool you like? You can use Firefox, Internet Explorer, and other browsers, right? I exaggerate, but in some way, why not?

He is definitely right:

Quote:

So Google can web crawl web sites to gather data, but not their users

I regularly see the crawlers of Google, Yahoo, and others on my website. And what is that? Google crawling the web, automatically.

And tell me, why is this allowed? https://addons.mozilla.org/en-US/firefox/addon/unplug/

I did not know that it was against Google's rules, so the idea of writing such a program is not feasible.

@Moderator: Since it is stated in Google's terms of use, please close this thread.

SigTerm · 11-03-2011, 04:35 AM

Quote:

Originally Posted by frenchn00b

Let's philosophize. By the way, what is the difference between Firefox and a script that does the same thing? Isn't forbidding it against human rights, or at least against the freedom to use the tool you like?

The difference is that it is against the TOS.
Human rights do not cover software, and human rights do not grant you access to Google services. The same kind of reasoning is frequently used by people who pirate software, by the way. No offense.

Quote:

Originally Posted by frenchn00b

You can use Firefox, Internet Explorer, and other browsers, right? I exaggerate, but in some way, why not?

Yes, you exaggerate. You can use Firefox and other browsers because their makers allow you to do so as long as you honor the license agreement. Think about it this way: Google generates revenue from advertising, which is the only reason why their service is free and not subscription-based. When you use a script, nobody reads the ads that somebody paid to show (although the script still requests them). This is why scripts are forbidden in the TOS.

It is possible that another search engine exists that explicitly allows you to use scripts. Also, it is possible that Google provides some kind of API to extract the search results you want. You should research the subject a bit.

frenchn00b · 11-03-2011, 12:39 PM

Quote:

Originally Posted by SigTerm

The difference is that it is against the TOS.
Human rights do not cover software, and human rights do not grant you access to Google services. The same kind of reasoning is frequently used by people who pirate software, by the way. No offense.

Yes, you exaggerate. You can use Firefox and other browsers because their makers allow you to do so as long as you honor the license agreement. Think about it this way: Google generates revenue from advertising, which is the only reason why their service is free and not subscription-based. When you use a script, nobody reads the ads that somebody paid to show (although the script still requests them). This is why scripts are forbidden in the TOS.

It is possible that another search engine exists that explicitly allows you to use scripts. Also, it is possible that Google provides some kind of API to extract the search results you want. You should research the subject a bit.

I agree with you.

Well, what does a TOS really mean for a website if, for instance, I write that I do not allow robots and crawlers on my website? Does my TOS protect me from robots and misuse? I mean, I can give you their IPs, and it really annoys me to track them and see how much of this access occurs on any website anyhow. Is that normal?

There are so many robots that even logging does not protect you. Sometimes it is even difficult to distinguish real hacking attempts from the robots, crawlers, and automatic scripts of web providers and search engines. Well, the only things that protect you are the strength of Apache and the IP trackers (i.e. banners). I had an FTP server, and guess what? Have you ever tried to leave an FTP service unattended? It might be risky. I preferred to remove it.
Pff. The Internet is a mess, or a jungle, in my opinion. Luckily, services and high security standards exist for most OSes to protect data. Back when I had XP, I was the victim of a powerful virus that killed my HDD (defective clusters), and I had no backup at that time. Pff. It was sad.

SigTerm · 11-03-2011, 02:09 PM

Quote:

Originally Posted by frenchn00b

Well, what does a TOS really mean for a website if, for instance, I write that I do not allow robots and crawlers on my website?

It is a legal issue. Ask a lawyer.

Snark1994 · 11-04-2011, 05:36 AM

Quote:

Well, what does a TOS really mean for a website if, for instance, I write that I do not allow robots and crawlers on my website? Does my TOS protect me from robots and misuse? I mean, I can give you their IPs, and it really annoys me to track them and see how much of this access occurs on any website anyhow. Is that normal?

Well... No. Your "TOS" would be your robots.txt file, as the robots can't understand your actual TOS. Google certainly respects the robots.txt file, and as such you should really be respecting their TOS (I hadn't read through it; thanks for pointing it out, SigTerm).
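
To make the robots.txt point concrete, here is a minimal sketch of checking whether a path is disallowed for all robots. The rules below are made up for illustration (they are not a copy of any real site's file), is_disallowed is a hypothetical helper, and it is a simplification: it only handles plain prefix rules in a group that starts with a single "User-agent: *" line, with no wildcards or Allow lines.

```shell
#!/bin/bash

# Made-up robots.txt content for illustration.
robots='User-agent: *
Disallow: /search
Disallow: /private/

User-agent: examplebot
Disallow: /'

# Reads robots.txt on stdin. Succeeds (exit 0) if the given path starts
# with a Disallow prefix listed in the "User-agent: *" group.
is_disallowed() {
    awk -v path="$1" '
        tolower($1) == "user-agent:" { in_star = ($2 == "*"); next }
        in_star && tolower($1) == "disallow:" && $2 != "" {
            if (index(path, $2) == 1) found = 1
        }
        END { exit !found }
    '
}

if printf '%s\n' "$robots" | is_disallowed "/search?q=test"; then
    echo "disallowed"
else
    echo "allowed"
fi
```

A polite script would fetch the site's real robots.txt, run a check like this before each request, and skip anything that comes back disallowed.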

Google does indeed have an images API but unfortunately it has been deprecated and may not work for much longer.