General — This forum is for non-technical general discussion, which can include both Linux and non-Linux topics. Have fun!
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
Would I be able to block googlebot from indexing my site just by using the robots.txt and/or page headers?
I think the answer is yes (for the rest of the spiders) but not sure for googlebot...
I mean, with all the copyright activists bitching about Google indexing books, journals, blogs, and whatnot...
I'm sure companies have thought about using the right robots.txt / page headers, but for some reason they still complain about Google indexing their pages.
Do the blocking entries in robots.txt actually work and the activists are complaining just because? Or is googlebot just unstoppable?
Why do you want to stop Googlebot again? Last time I checked, Google provided a nice stream of traffic, to my site anyway. However, should a moment of insanity grip you: Googlebot is a non-malicious bot, so there is no reason it would disobey robots.txt.
I exclude a couple of directories in my robots.txt, and I can say that I have never seen a bot disobey it in my logs. I can assure you that Google does respect the file.
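For concreteness, blocking directives of that kind look like this (an illustrative robots.txt, not the poster's actual file; these are standard Robots Exclusion Protocol directives, which well-behaved crawlers honor voluntarily):

```
# Illustrative robots.txt, served from the site root as /robots.txt

# Block Googlebot from the entire site
User-agent: Googlebot
Disallow: /

# Block all other crawlers from one directory only
User-agent: *
Disallow: /private/
```

To block indexing via "page headers" instead, the standard mechanisms are a `<meta name="robots" content="noindex">` tag in the page's `<head>`, or an `X-Robots-Tag: noindex` HTTP response header.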
If you are seriously paranoid, this site describes how to trap and ban bots that ignore robots.txt: http://www.fleiner.com/bots/
PS: keep in mind that googlebot and all the other bots will not know about your robots.txt file until the next time they crawl your site, which may actually take a while. So anything on your site previously indexed will still be available until then...
Originally posted by randomx They claim they do. But still not sure if I should trust them.
I, for one, welcome our new web-indexing overlords!
But seriously, what have you got against GoogleBots? If the information on your site is intended to be public, then you want Google and every other search engine to index it so that people can find it. And if it's private, then the Internet was probably a bad place to put it in the first place.
Originally posted by AlexV I, for one, welcome our new web-indexing overlords!
But seriously, what have you got against GoogleBots? If the information on your site is intended to be public, then you want Google and every other search engine to index it so that people can find it. And if it's private, then the Internet was probably a bad place to put it in the first place.
Or if it's private, you could wrap the page in some PHP that checks if the visitor is a googlebot, and if so, return a blank page. :-P
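That PHP idea can be sketched as follows (shown here in Python for illustration; `handle_request` and the dict-of-headers interface are hypothetical stand-ins for whatever your web framework actually provides):

```python
def handle_request(headers, page_html):
    """Return a blank page to Googlebot, the real page to everyone else.

    `headers` is a plain dict of request headers; a real application
    would get these from its framework. Note that matching on the
    User-Agent string alone is trivial to spoof in either direction.
    """
    user_agent = headers.get("User-Agent", "")
    if "Googlebot" in user_agent:
        return ""          # blank page for the bot
    return page_html       # real content for ordinary visitors

# Example usage:
print(handle_request({"User-Agent": "Mozilla/5.0"}, "<html>hi</html>"))
```

Of course, a bot that lies about its User-Agent sails right through this, which is why robots.txt plus the trap-and-ban approach linked above is the more robust combination.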