We just found that there are many HTTP connections to one of our web servers: over 200,000 hits per day from the same subnet of IP addresses.
I know this may be a crawler, but if I block these IPs, my web site may disappear from the search engine (I guess).
I know there is a tool, fail2ban, that can protect Apache. Is it good to use such a tool? Is there any other method to reduce the duplicated HTTP connections?
You can look up who owns the netblock using whois and complain to the owner about abuse. That might cause a more reasonable traffic load once they adjust their machines. If it is a search engine, I can't see how or why it would be so misconfigured unless the owner is M$.
Another option, which is not mutually exclusive, is to throttle outgoing traffic to that network. Slow it way down and it should somewhat lighten the incoming requests if it is a normal spider.
SSHGuard is another option besides Fail2ban, but you'd probably have better results with the above two approaches.
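For reference, looking up the netblock owner and its abuse contact is a one-liner; a minimal sketch (203.0.113.25 is a placeholder, substitute an address from your logs):
Code:
# Show the owning organization, network name and abuse contact for an IP
whois 203.0.113.25 | grep -iE 'orgname|netname|abuse'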
You can look up who owns the netblock using whois and complain to the owner about abuse. That might cause a more reasonable traffic load once they adjust their machines. If it is a search engine, I can't see how or why it would be so misconfigured unless the owner is M$. <<== Complain to the owner? I know these IPs are from an ISP in the U.S., and I guess it is a search engine. Do you mean I can do nothing if it is from a search engine?
Another option, which is not mutually exclusive, is to throttle outgoing traffic to that network. Slow it way down and it should somewhat lighten the incoming requests if it is a normal spider. <<== Does that mean reducing the traffic allowed to that network (the subnet of these incoming IPs)? If yes, which Linux tool can do that? Does the firewall have such a function?
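(For reference, one common way to implement that kind of outgoing throttle on Linux is the tc traffic shaper rather than the firewall itself. A minimal sketch, assuming the interface is eth0 and the noisy subnet is 198.51.100.0/24, both placeholders; the rates are arbitrary:)
Code:
# Default traffic goes to class 1:10; responses toward the noisy subnet
# are squeezed into class 1:20 at 256 kbit/s.
tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 100mbit
tc class add dev eth0 parent 1: classid 1:20 htb rate 256kbit ceil 256kbit
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
   match ip dst 198.51.100.0/24 flowid 1:20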
I know these IPs are from an ISP in the U.S.; I guess it should be a search engine. You mean I can do nothing if it is from a search engine?
Just because an IP is from an ISP in the US, that does not mean it's from a search engine. As mentioned by Turbocapitalist, you can use a whois lookup to find the owner of the IP.
If you wish to block that IP, you can also set up allow/deny rules in Apache; see: https://httpd.apache.org/docs/2.4/howto/access.html
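(A minimal sketch of the Apache 2.4 syntax from that page, assuming the traffic comes from 198.51.100.0/24 and the document root is /var/www/html; both are placeholders:)
Code:
<Directory "/var/www/html">
    <RequireAll>
        # allow everyone except the noisy subnet
        Require all granted
        Require not ip 198.51.100.0/24
    </RequireAll>
</Directory>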
I have used whois and checked that this IP is from the US; I think it is a search engine.
I do not want to block it, as I worry that if the IP is blocked, my web site will disappear from the search engine.
I haven't contacted the abuse address yet. If I tell them about the issue and they stop crawling, is there any impact to my web site?
How do you know they will stop crawling? How do you know this is actually a search engine and not a garden-variety M$ botnet? Do you want them to stop, or to slow down?
Make a short list of the actual facts that you have on hand, along with supporting data. Ignore any opinions, guesses, conjectures, or worries. Decide what you want, and then use the contact information to deal with the misbehaving address range, informing them of what you want and of the facts.
Below is the whois result:
Code:
Organization: Google LLC (GOGL)
Do you want them to stop or to slow down? <<== I do not know what they will do if I report the issue to them. I worry that if they stop it, our web site will no longer show up in Google searches.
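(If you want to confirm the hits really come from Googlebot rather than something pretending to be it, Google's documented check is a reverse DNS lookup followed by a forward lookup of the returned name. A sketch; the address is only an example taken from your logs:)
Code:
# Reverse lookup of one of the crawling IPs:
host 66.249.66.1
# A genuine Googlebot source resolves to a name ending in googlebot.com
# or google.com. A forward lookup of whatever name was returned should
# give back the same IP address:
host crawl-66-249-66-1.googlebot.com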
So you want to limit HTTP access by IP, but not permanently block it?
For this, fail2ban will work. You can set it up to block for some time (configurable) after some number of hits (also configurable)... For example, after 1,000 hits, block the IP for 24 hours?
You can perhaps also reach out to the technical contact (should be listed in whois) and simply ask...
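(A minimal jail sketch along those lines, assuming Apache's combined log format; the filter name, log path and thresholds below are only illustrative:)
Code:
# /etc/fail2ban/filter.d/http-flood.conf
[Definition]
failregex = ^<HOST> -.*"(GET|POST|HEAD)
ignoreregex =

# /etc/fail2ban/jail.local
[http-flood]
enabled  = true
port     = http,https
filter   = http-flood
logpath  = /var/log/apache2/access.log
maxretry = 1000
findtime = 3600
bantime  = 86400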
100 lines of actual entries from the Apache access.log would tell us enough.
Sanitize your server's IP if you choose to share them.
I have found that most of Google's crawlers are using 66.249.x.x
There are good crawlers and bad ones.
Good crawlers honor robots.txt; bad ones don't GAF.
Restricting abusive netblocks is all part of "the game".
An email to abuse@ is usually sufficient to start the "documentation" process.
If it is Google-owned, you have "resources" at https://www.google.com/webmasters/tools
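(If it does turn out to be a well-behaved crawler, a Crawl-delay line in robots.txt is worth trying; note that Googlebot itself ignores Crawl-delay and its rate is managed through the webmaster tools linked above, but many other crawlers honor it. A sketch with an arbitrary value:)
Code:
# robots.txt at the site root
User-agent: *
Crawl-delay: 10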