General — This forum is for non-technical general discussion, which can include both Linux and non-Linux topics. Have fun!
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
Would I be able to block googlebot from indexing my site just by using the robots.txt and/or page headers?
I think the answer is yes (for the rest of the spiders) but not sure for googlebot...
I mean, with all the copyright activists bitching about Google indexing books, journals, blogs, and whatnot...
I'm sure companies have thought about using the right robots.txt / page headers, but for some reason they still complain about Google indexing their pages.
Do the blocking entries in robots.txt actually work and the activists are complaining just because? Or is googlebot just unstoppable?
Why do you want to stop Googlebot again? Last time I checked, Google provided a nice stream of traffic, to my site anyway. However, should a moment of insanity grip you: Googlebot is a non-malicious bot, so there is no reason it would disobey robots.txt.
I exclude a couple of directories in my robots.txt, and I can say that I have never seen a bot disobey it in my logs. I can assure you that Google does respect the file.
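For concreteness, blocking directives of that kind look like this (an illustrative robots.txt, not the poster's actual file; these are standard Robots Exclusion Protocol directives, which well-behaved crawlers honor voluntarily):

```
# Illustrative robots.txt, served from the site root as /robots.txt

# Block Googlebot from the entire site
User-agent: Googlebot
Disallow: /

# Block all other crawlers from one directory only
User-agent: *
Disallow: /private/
```

To block indexing via "page headers" instead, the standard mechanisms are a `<meta name="robots" content="noindex">` tag in the page's `<head>`, or an `X-Robots-Tag: noindex` HTTP response header.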
If you are seriously paranoid, this site describes how to trap and ban bots that ignore robots.txt: http://www.fleiner.com/bots/
PS: keep in mind that googlebot and all the other bots will not know about your robots.txt file until the next time they crawl your site, which may actually take a while. So anything on your site previously indexed will still be available until then...
Originally posted by randomx They claim they do. But still not sure if I should trust them.
I, for one, welcome our new web-indexing overlords!
But seriously, what have you got against GoogleBots? If the information on your site is intended to be public, then you want Google and every other search engine to index it so that people can find it. And if it's private, then the Internet was probably a bad place to put it in the first place.
Originally posted by AlexV I, for one, welcome our new web-indexing overlords!
But seriously, what have you got against GoogleBots? If the information on your site is intended to be public, then you want Google and every other search engine to index it so that people can find it. And if it's private, then the Internet was probably a bad place to put it in the first place.
Or if it's private, you could wrap the page in some PHP that checks if the visitor is a googlebot, and if so, return a blank page. :-P
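That PHP idea can be sketched as follows (shown here in Python for illustration; `handle_request` and the dict-of-headers interface are hypothetical stand-ins for whatever your web framework actually provides):

```python
def handle_request(headers, page_html):
    """Return a blank page to Googlebot, the real page to everyone else.

    `headers` is a plain dict of request headers; a real application
    would get these from its framework. Note that matching on the
    User-Agent string alone is trivial to spoof in either direction.
    """
    user_agent = headers.get("User-Agent", "")
    if "Googlebot" in user_agent:
        return ""          # blank page for the bot
    return page_html       # real content for ordinary visitors

# Example usage:
print(handle_request({"User-Agent": "Mozilla/5.0"}, "<html>hi</html>"))
```

Of course, a bot that lies about its User-Agent sails right through this, which is why robots.txt plus the trap-and-ban approach linked above is the more robust combination.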