Fail2ban noscript jail is banning googlebot...should I make an exception?
Thanks to my noscript jail, fail2ban apparently bans Googlebot every now and then for attempting to access non-existent web pages.
I can't help but wonder *why* Googlebot would come looking for scripts that do not exist. I'm concerned about my search engine ranking, but I'm also unsure how to handle a bot that requests a non-existent script. I've tried returning 400/401/403/404/410 responses, but this doesn't seem to help. Any advice on sending a more assertive "don't ask for this page again" would be quite welcome. I know that I could remove the rule to give Google full access, but that would also let bad guys probe my server. Is it possible to add exceptions to this particular jail, or is there another way to deal with this? I'm also wondering whether such an exception can safely allow Googlebot (or other well-behaved bots).
Quote:
Code:
ignoreregex = ^<HOST> -.*"GET.*HTTP.*Googlebot/2\.1.*"$
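To see what the ignoreregex above would and would not exempt, here is a rough Python sketch. It assumes Apache's combined log format, and it replaces fail2ban's `<HOST>` tag with a simplified IPv4 pattern (`HOST_PATTERN` is my stand-in, not fail2ban's real expansion). The log lines and IP addresses are made-up examples. The `fail2ban-regex` tool that ships with fail2ban is the better way to check against a real log.

```python
import re

# The ignoreregex quoted above, copied verbatim.
IGNOREREGEX = r'^<HOST> -.*"GET.*HTTP.*Googlebot/2\.1.*"$'

# Assumption: a simplified IPv4-only stand-in for fail2ban's <HOST> tag.
HOST_PATTERN = r'(?P<host>\d{1,3}(?:\.\d{1,3}){3})'


def is_ignored(log_line):
    """Return True if the line would be exempted by the ignoreregex."""
    pattern = IGNOREREGEX.replace("<HOST>", HOST_PATTERN)
    return re.search(pattern, log_line) is not None


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Hypothetical log lines for illustration only.
crawler_line = (
    '66.249.66.1 - - [10/Oct/2010:13:55:36 -0500] '
    '"GET /missing.php HTTP/1.1" 404 209 "-" "%s"' % GOOGLEBOT_UA
)
browser_line = (
    '198.51.100.7 - - [10/Oct/2010:13:56:01 -0500] '
    '"GET /missing.php HTTP/1.1" 404 209 "-" "Mozilla/5.0 (X11; Linux x86_64)"'
)
# Any client can send the Googlebot User-Agent string.
spoofed_line = (
    '203.0.113.5 - - [10/Oct/2010:13:57:12 -0500] '
    '"GET /admin.php HTTP/1.1" 404 209 "-" "%s"' % GOOGLEBOT_UA
)

if __name__ == "__main__":
    for name, line in [("crawler", crawler_line),
                       ("browser", browser_line),
                       ("spoofed", spoofed_line)]:
        print(name, is_ignored(line))
```

Note that the spoofed line is exempted too: the rule only looks at the User-Agent text, so on its own it cannot tell the real Googlebot from anyone who copies its User-Agent string.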
I'm sure it's the noscript jail. This is the content of the ban email that I receive.
Code:
Hi, As for scanning for errors and adding files to a robots.txt, I understand how robots.txt works and I could easily write a PHP script to add more detail to the robots.txt file, but I'm concerned (a) about how complex it would be to scan Apache logs efficiently (a very large amount of data) and (b) about my robots.txt file growing without bound due to varying query strings or unique-but-non-existent URLs.
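For what it's worth, the scanning part may be less heavy than it sounds. Below is a minimal Python sketch (Python rather than PHP, purely for illustration) that assumes Apache's combined log format. It reads the log one line at a time, so memory use stays small, and it drops query strings so that `/old.php?id=1` and `/old.php?id=2` collapse into one entry. `LOG_RE` and `missing_paths` are hypothetical names of mine, and the sample lines are invented.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumption: Apache "combined" (or "common") log format.
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" (?P<status>\d{3}) '
)


def missing_paths(lines, statuses=("404", "410")):
    """Count requested paths that returned a 'missing' status.

    `lines` can be any iterable of log lines, e.g. an open file handle,
    so the log is streamed rather than loaded into memory.
    """
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match and match.group("status") in statuses:
            # Keep only the path so varying query strings do not
            # create an unbounded number of distinct entries.
            counts[urlsplit(match.group("target")).path] += 1
    return counts


# Invented sample lines for illustration only.
sample = [
    '66.249.66.1 - - [10/Oct/2010:13:55:36 -0500] "GET /old.php?id=1 HTTP/1.1" 404 209 "-" "UA"',
    '66.249.66.1 - - [10/Oct/2010:13:55:40 -0500] "GET /old.php?id=2 HTTP/1.1" 404 209 "-" "UA"',
    '66.249.66.1 - - [10/Oct/2010:13:55:44 -0500] "GET /index.php HTTP/1.1" 200 5120 "-" "UA"',
]

if __name__ == "__main__":
    for path, hits in missing_paths(sample).most_common():
        print(path, hits)
```

This only bounds growth from query strings. Unique-but-non-existent paths would still each get their own entry, so some threshold on the hit count would probably be needed before writing anything to robots.txt.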
bump, as I would like to see a coherent answer for this one as well.
I suspect, but don't know for sure, that attackers are using the Google search engine to query those URLs, and Googlebot is just a "dumb" middleman. I say this because some of the URLs being requested are just too specific and suspicious. Anyway, if anyone has a suggestion for how to protect our servers with fail2ban without getting delisted by Google, I would love to hear it as well.
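One possible approach, offered as a sketch rather than a tested fail2ban setup: Google's published way to verify its crawler is a reverse DNS lookup on the client IP, a check that the hostname ends in `googlebot.com` or `google.com`, and then a forward lookup to confirm that the hostname resolves back to the same IP. The Python below illustrates that check. `is_verified_googlebot` and its injectable `reverse_lookup` / `forward_lookup` parameters are hypothetical names of mine, and how to hook the result into a jail (for example, to decide which addresses to exempt) is left open.

```python
import socket

# Hostname suffixes used for the check described above.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP.

    The two lookup parameters exist only so the logic can be exercised
    without network access; by default the system resolver is used.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex is IPv4-only; IPv6 would need getaddrinfo.
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]
    try:
        hostname = reverse_lookup(ip).lower().rstrip(".")
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # The hostname must resolve back to the original address.
        return ip in forward_lookup(hostname)
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return False


if __name__ == "__main__":
    # Example address only; the result depends on live DNS.
    print(is_verified_googlebot("66.249.66.1"))
```

Unlike a User-Agent match, this cannot be passed just by copying a header, at the cost of two DNS lookups per checked address, so caching the results would be sensible.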