Old 09-24-2012, 03:52 PM   #1
sneakyimp
Member
 
Registered: Dec 2004
Posts: 795

Rep: Reputation: 50
Fail2ban noscript jail is banning googlebot...should I make an exception?


My fail2ban noscript jail apparently bans Googlebot every now and then for attempting to access non-existent web pages.

I can't help but wonder *why* Googlebot would come looking for scripts that do not exist. I'm concerned about my search engine ranking, but at the same time I wonder how to handle a bot that requests a non-existent script. I've made an effort to send 400/401/403/404/410 responses, but this doesn't seem to help. Any advice on sending a more assertive "don't ask for this page again" signal would be quite welcome.
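For reference, the 410s are coming from Apache mod_alias directives roughly like this (the path below is just a placeholder, not a real URL on my site):
Code:
# in the relevant <VirtualHost>/server config; requires mod_alias
# answers any request for this path with "410 Gone"
Redirect gone /some-removed-page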

I know that I could remove the rule and allow Google full access, but this would also allow bad guys to probe my server. I'm wondering whether it's possible to add exceptions to this particular jail, or how else I might deal with this. I'm also wondering whether such an exception can safely allow Googlebot (or other well-behaved bots).
 
Old 09-24-2012, 09:30 PM   #2
unSpawn
Moderator
 
Registered: May 2001
Posts: 27,766
Blog Entries: 54

Rep: Reputation: 2976
Quote:
Originally Posted by sneakyimp View Post
fail2ban apparently bans the googlebot every now and then for attempting to access non-existent web pages (..) I'm wondering if it's possible to add exceptions to this particular jail or how I might be able to deal with this. I'm also wondering if this exception can safely allow googlebot (or other well-behaved bots).
Are you sure it's the noscript jail that blocks Googlebot and not the jail.conf "[apache-badbots]" entry? In any case, adding a line to apache-noscript.conf (add one to apache-badbots.conf too if unsure):
Code:
ignoreregex = ^<HOST> -.*"GET.*HTTP.*Googlebot/2\.1.*"$
and then reloading the configuration with 'fail2ban-client reload' should keep it from blocking, but do note other User-Agent versions exist: http://support.google.com/webmasters...answer=1061943. Also, Googlebot originates from Google's AS15169 AFAIK (66.249.65.0/24), so any evasion should be easy to spot. Wrt pages it shouldn't visit or look for, maybe also look at what it errors out on and put the ones with the most hits in a robots.txt? (See google.com/webmasters/ for more, as it's not a security issue.)
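E.g. if the most-hit missing paths turned out to be /old-cgi/ and /retired.php (placeholder names, obviously), the robots.txt at the site root would be:
Code:
# robots.txt; tells compliant crawlers to skip these paths
User-agent: *
Disallow: /old-cgi/
Disallow: /retired.php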
 
1 member found this post helpful.
Old 10-09-2012, 04:06 PM   #3
sneakyimp
Member
 
Registered: Dec 2004
Posts: 795

Original Poster
Rep: Reputation: 50
I'm sure it's the noscript jail. This is the content of the ban email that I receive:
Code:
Hi,

The IP 66.249.71.112 has just been banned by Fail2Ban after
6 attempts against apache-noscript.


Here are more information about 66.249.71.112:

#
# Query terms are ambiguous.  The query is assumed to be:
#     "n 66.249.71.112"
#
# Use "?" to get help.
#

#
# The following results may also be obtained via:
# http://whois.arin.net/rest/nets;q=66.249.71.112?showDetails=true&showARIN=false&ext=netref2
#

NetRange:       66.249.64.0 - 66.249.95.255
CIDR:           66.249.64.0/19
OriginAS:       
NetName:        GOOGLE
NetHandle:      NET-66-249-64-0-1
Parent:         NET-66-0-0-0-0
NetType:        Direct Allocation
RegDate:        2004-03-05
Updated:        2012-02-24
Ref:            http://whois.arin.net/rest/net/NET-66-249-64-0-1


OrgName:        Google Inc.
OrgId:          GOGL
Address:        1600 Amphitheatre Parkway
City:           Mountain View
StateProv:      CA
PostalCode:     94043
Country:        US
RegDate:        2000-03-30
Updated:        2011-09-24
Ref:            http://whois.arin.net/rest/org/GOGL

OrgAbuseHandle: ZG39-ARIN
OrgAbuseName:   Google Inc
OrgAbusePhone:  +1-650-253-0000 
OrgAbuseEmail:  arin-contact@google.com
OrgAbuseRef:    http://whois.arin.net/rest/poc/ZG39-ARIN

OrgTechHandle: ZG39-ARIN
OrgTechName:   Google Inc
OrgTechPhone:  +1-650-253-0000 
OrgTechEmail:  arin-contact@google.com
OrgTechRef:    http://whois.arin.net/rest/poc/ZG39-ARIN

#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#

Regards,

Fail2Ban
I'd prefer not to add exceptions based on the User-Agent alone because this information is easily spoofed. I would like to provide an exception to the noscript jail based on remote addresses that can be reliably attributed to Google's bots.
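Google does document one attribution method that seems reliable: a reverse DNS lookup on the crawler IP should return a name under googlebot.com, and a forward lookup on that name should return the original IP. Checking the banned IP from the email above with the standard 'host' tool would look roughly like this (the crawl-* name shown is the typical form, not guaranteed):
Code:
# reverse lookup: should yield something like crawl-66-249-71-112.googlebot.com
host 66.249.71.112
# forward-confirm: the returned name should resolve back to 66.249.71.112
host crawl-66-249-71-112.googlebot.com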

As for scanning for errors and adding files to a robots.txt: I understand how robots.txt works, and I could easily write a PHP script to add more detail to the robots.txt file, but I'm concerned a) about how complex it would be to efficiently scan the Apache logs (a very large amount of data) and b) about my robots.txt file growing without bound due to varying query strings, unique-but-non-existent URLs, etc.
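(That said, for the log-scanning part a rough first pass may be enough; assuming the default combined log format, something like this awk pipeline tallies the most-requested 404 paths:)
Code:
# field 9 = status code, field 7 = request path in the combined log format
awk '$9 == 404 {print $7}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -20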
 
Old 12-08-2012, 11:53 AM   #4
linuxtester
LQ Newbie
 
Registered: Jan 2008
Posts: 4

Rep: Reputation: 0
Bump, as I would like to see a coherent answer for this one as well.

I suspect, but don't know for sure, that attackers are using the Google search engine to query those URLs ... the GoogleBot is just a "dumb" middleman. I say this because some of the URLs being requested are just too specific and suspicious.

Anyway, if anyone has a suggestion, so that we don't get delisted by Google while trying to protect our servers using Fail2ban, I would love to hear it as well.
 
Old 12-08-2012, 02:01 PM   #5
unSpawn
Moderator
 
Registered: May 2001
Posts: 27,766
Blog Entries: 54

Rep: Reputation: 2976
Quote:
Originally Posted by linuxtester View Post
I suspect, but don't know for sure, that attackers are using the Google search engine to query those URLs ... the GoogleBot is just a "dumb" middleman. I say this because some of the URLs being requested are just too specific and suspicious.
Details please.


Quote:
Originally Posted by linuxtester View Post
(..) if anyone has a suggestion, so that we don't get delisted by Google (..)
As I already stated, Googlebot operates out of AS15169. Correct me if I'm wrong, but AFAIK fail2ban only has a global ignore list, so apart from mucking with per-service ignoreregexes or using custom scripts to add offending IP addresses to the chain, IMO the easiest way to avoid Googlebot being rejected would be an -j ACCEPT rule for --state NEW to TCP/80 from 66.249.65.0/24 in the fail2ban chain, above the other rules.
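In concrete terms, something like the rule below (the chain name is a guess based on common fail2ban defaults; check 'iptables -L -n' for the actual chain the jail created):
Code:
# insert at position 1 so it matches before the jail's REJECT/DROP rules
iptables -I fail2ban-apache-noscript 1 -p tcp --dport 80 -m state --state NEW -s 66.249.65.0/24 -j ACCEPT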
 
1 member found this post helpful.
  

