LinuxQuestions.org
General This forum is for non-technical general discussion which can include both Linux and non-Linux topics. Have fun!

Old 11-12-2005, 02:28 PM   #1
randomx
Member
 
Registered: Feb 2003
Location: Hawaii
Distribution: Debian
Posts: 130

Rep: Reputation: 16
googlebot unstoppable?


Hi,

Would I be able to block googlebot from indexing my site just by using the robots.txt and/or page headers?

I think the answer is yes for the rest of the spiders, but I'm not sure about googlebot...

I mean, with all the copyright activists bitching about google indexing books, journals, blogs, and whatnot...

I'm sure companies have thought about using the right robots.txt entries and page headers, but for some reason they still complain about google indexing their pages.

Do the blocking entries in robots.txt work, and the activists are complaining just because? Or is googlebot just unstoppable?
 
Old 11-12-2005, 02:49 PM   #2
guideweb
Member
 
Registered: Mar 2004
Location: /planet/earth
Posts: 110

Rep: Reputation: 15
I don't remember the robots.txt syntax, but I know you can block indexing of a page with this:

<META NAME="robots" CONTENT="noindex, nofollow">

(insert it in the head section of the HTML page)
 
Old 11-12-2005, 04:37 PM   #3
cs-cam
Senior Member
 
Registered: May 2004
Location: Australia
Distribution: Gentoo
Posts: 3,545

Rep: Reputation: 57
Why do you want to stop Googlebot again? Last time I checked, Google provided a nice stream of traffic, to my site anyway. However, should a moment of insanity grip you: Googlebot is a non-malicious bot, so there is no reason it would disobey robots.txt.
 
Old 11-12-2005, 04:58 PM   #4
Matir
LQ Guru
 
Registered: Nov 2004
Location: San Jose, CA
Distribution: Debian, Arch
Posts: 8,507

Rep: Reputation: 128
http://www.google.com/webmasters/bot.html#robotsinfo

Google it.
 
Old 11-14-2005, 10:01 PM   #5
randomx
Member
 
Registered: Feb 2003
Location: Hawaii
Distribution: Debian
Posts: 130

Original Poster
Rep: Reputation: 16
so...

So does googlebot obey the robots.txt and the headers on the pages?

They claim they do. But I'm still not sure if I should trust them.

Yeah, I had read the link http://www.google.com/webmasters/bot.html#robotsinfo a while back.

I just wanted to see if somebody else had any experience regarding the issue.

Randomx

Last edited by randomx; 11-14-2005 at 10:03 PM.
 
Old 11-14-2005, 11:15 PM   #6
bulliver
Senior Member
 
Registered: Nov 2002
Location: Edmonton AB, Canada
Distribution: Gentoo x86_64; Gentoo PPC; FreeBSD; OS X 10.9.4
Posts: 3,760
Blog Entries: 4

Rep: Reputation: 78
I exclude a couple of directories in my robots.txt, and I have never seen a bot disobey it in my logs. I can assure you that google does respect the file.
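
For anyone who wants to check what a given robots.txt actually tells a well-behaved crawler, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the /private/ directory, and the example.com URLs are made-up assumptions for illustration, not taken from anyone's real site, and this only shows what the file says, not what any particular bot really does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of one directory only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "http://example.com/private/page.html"),
        ("Googlebot", "http://example.com/index.html"),
        ("SomeOtherBot", "http://example.com/private/page.html"),
    ]:
        print(agent, url, is_allowed(agent, url))
```

Note that the third case is allowed because the sample file only has a group for Googlebot; blocking every compliant crawler would need a "User-agent: *" group.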

If you are seriously paranoid, this site describes how to trap and ban bots that ignore robots.txt:
http://www.fleiner.com/bots/
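
A rough sketch of the general idea, not necessarily the method the linked site uses: list a trap path under Disallow in robots.txt, never link to it for humans, and ban any client that requests it anyway. The /bot-trap/ path, the BotTrap class, and the in-memory ban set are all hypothetical names picked for this example; a real setup would hook into the web server or firewall.

```python
from typing import Set

# Hypothetical trap path; it would be listed under "Disallow:" in robots.txt,
# so a compliant crawler never requests it.
TRAP_PREFIX = "/bot-trap/"


class BotTrap:
    """Remembers clients that fetched the trap path and refuses them afterwards."""

    def __init__(self) -> None:
        self.banned: Set[str] = set()

    def handle(self, client_ip: str, path: str) -> int:
        """Return the HTTP status code to send for this request."""
        if client_ip in self.banned:
            return 403  # already caught earlier
        if path.startswith(TRAP_PREFIX):
            self.banned.add(client_ip)  # ignored robots.txt: ban from now on
            return 403
        return 200


if __name__ == "__main__":
    trap = BotTrap()
    print(trap.handle("203.0.113.5", "/index.html"))
    print(trap.handle("203.0.113.5", "/bot-trap/secret.html"))
    print(trap.handle("203.0.113.5", "/index.html"))
```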

PS: keep in mind that googlebot and all the other bots will not know about your robots.txt file until the next time they index your site, which may actually take a while. So anything on your site previously indexed will still be available until then...

Last edited by bulliver; 11-14-2005 at 11:19 PM.
 
Old 11-15-2005, 10:32 PM   #7
AlexV
Member
 
Registered: May 2004
Location: New Lenox, IL
Distribution: Fedora Core 4; Ubuntu 5.10 (Breezy Preview); CentOS 4
Posts: 81

Rep: Reputation: 15
Re: so...

Quote:
Originally posted by randomx
They claim they do. But still not sure if I should trust them.
I, for one, welcome our new web-indexing overlords!

But seriously, what have you got against GoogleBots? If the information on your site is intended to be public, then you want Google and every other search engine to index it so that people can find it. And if it's private, then the Internet was probably a bad place to put it in the first place.
 
Old 11-16-2005, 06:22 AM   #8
Matir
LQ Guru
 
Registered: Nov 2004
Location: San Jose, CA
Distribution: Debian, Arch
Posts: 8,507

Rep: Reputation: 128
Re: Re: so...

Quote:
Originally posted by AlexV
I, for one, welcome our new web-indexing overlords!

But seriously, what have you got against GoogleBots? If the information on your site is intended to be public, then you want Google and every other search engine to index it so that people can find it. And if it's private, then the Internet was probably a bad place to put it in the first place.
Or if it's private, you could wrap the page in some PHP that checks if the visitor is a googlebot and, if so, returns a blank page. :-P
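
Something along these lines, sketched in Python rather than PHP. The respond function and its arguments are made up for illustration. Keep in mind the User-Agent header is whatever the client chooses to send, so this only affects visitors that identify themselves as Googlebot; it is no protection for genuinely private data.

```python
def respond(user_agent: str, page_body: str) -> str:
    """Return an empty body to clients whose User-Agent mentions Googlebot."""
    # Case-insensitive substring match on the self-reported User-Agent string.
    if "googlebot" in user_agent.lower():
        return ""
    return page_body


if __name__ == "__main__":
    bot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(repr(respond(bot_ua, "<html>secret</html>")))
    print(repr(respond("Mozilla/5.0 (X11; Linux i686)", "<html>secret</html>")))
```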
 
  


All times are GMT -5.
