We just found that there are many HTTP connections to one of our web servers: over 200,000 hits per day from the same subnet of IP addresses.
I know this may be a crawler, but if I block these IPs, my web site may disappear from the search engine (I guess).
I know there is a tool, fail2ban, that can protect Apache. Is it good to use such a tool? Is there any other method to reduce the duplicated HTTP connections?
You can look up who owns the netblock using whois and complain to the owner about abuse. That might cause a more reasonable traffic load once they adjust their machines. If it is a search engine, I can't see how or why it would be so misconfigured unless the owner is M$.
Another option, which is not mutually exclusive, is to throttle outgoing traffic to that network. Slow it way down and it should somewhat lighten the incoming requests if it is a normal spider.
SSHGuard is another option besides Fail2ban, but you'd probably have better results with the above two approaches.
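For reference, looking up the netblock owner and its abuse contact is a one-liner; a minimal sketch (203.0.113.25 is a placeholder, substitute an address from your logs):
Code:
# Show the owning organization, network name and abuse contact for an IP
whois 203.0.113.25 | grep -iE 'orgname|netname|abuse'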
You can look up who owns the netblock using whois and complain to the owner about abuse. That might cause a more reasonable traffic load once they adjust their machines. If it is a search engine, I can't see how or why it would be so misconfigured unless the owner is M$. <<== Complain to the owner? I know these IPs are from an ISP in the U.S., and I guess it is a search engine. Do you mean I can do nothing if it is from a search engine?
Another option, which is not mutually exclusive, is to throttle outgoing traffic to that network. Slow it way down and it should somewhat lighten the incoming requests if it is a normal spider. <<== Does that mean reducing the traffic allowed to that network (the subnet of these incoming IPs)? If yes, which Linux tool can do that? Does the firewall have such a function?
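(For reference, one common way to implement that kind of outgoing throttle on Linux is the tc traffic shaper rather than the firewall itself. A minimal sketch, assuming the interface is eth0 and the noisy subnet is 198.51.100.0/24, both placeholders; the rates are arbitrary:)
Code:
# Default traffic goes to class 1:10; responses toward the noisy subnet
# are squeezed into class 1:20 at 256 kbit/s.
tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 100mbit
tc class add dev eth0 parent 1: classid 1:20 htb rate 256kbit ceil 256kbit
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
   match ip dst 198.51.100.0/24 flowid 1:20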
I know these IPs are from an ISP in the U.S.; I guess it should be a search engine. You mean I can do nothing if it is from a search engine?
Just because an IP is from an ISP in the US, that does not mean it's from a search engine. As mentioned by Turbocapitalist, you can use a whois lookup to find the owner of the IP.
If you wish to block that IP, you can also set up allow/deny rules in Apache; see: https://httpd.apache.org/docs/2.4/howto/access.html
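(A minimal sketch of the Apache 2.4 syntax from that page, assuming the traffic comes from 198.51.100.0/24 and the document root is /var/www/html; both are placeholders:)
Code:
<Directory "/var/www/html">
    <RequireAll>
        # allow everyone except the noisy subnet
        Require all granted
        Require not ip 198.51.100.0/24
    </RequireAll>
</Directory>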
I have used whois and checked that this IP is from the US; I think it is a search engine.
I do not want to block it, as I worry that if the IP is blocked, my web site will disappear from the search engine.
I haven't contacted the abuse address yet. If I tell them about the issue and they stop crawling, is there any impact to my web site?
How do you know they will stop crawling? How do you know this is actually a search engine and not a garden-variety M$ botnet? Do you want them to stop, or to slow down?
Make a short list of the actual facts that you have on hand, along with supporting data. Ignore any opinions, guesses, conjectures, or worries. Decide what you want, and then use the contact information to deal with the misbehaving address range, informing them of what you want and of the facts.
Below is the whois result:
Code:
Organization: Google LLC (GOGL)
Do you want them to stop or to slow down? <<== I do not know what they will do if I report the issue to them. I worry that if they stop it, our web site will no longer show up in Google searches.
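(If you want to confirm the hits really come from Googlebot rather than something pretending to be it, Google's documented check is a reverse DNS lookup followed by a forward lookup of the returned name. A sketch; the address is only an example taken from your logs:)
Code:
# Reverse lookup of one of the crawling IPs:
host 66.249.66.1
# A genuine Googlebot source resolves to a name ending in googlebot.com
# or google.com. A forward lookup of whatever name was returned should
# give back the same IP address:
host crawl-66-249-66-1.googlebot.com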
So you want to limit HTTP access by IP, but not permanently block it?
For this, fail2ban will work. You can set it up to block for some time (configurable) after some number of hits (also configurable)... For example, after 1,000 hits, block the IP for 24 hours?
You can perhaps also reach out to the technical contact (should be listed in whois) and simply ask...
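(A minimal jail sketch along those lines, assuming Apache's combined log format; the filter name, log path and thresholds below are only illustrative:)
Code:
# /etc/fail2ban/filter.d/http-flood.conf
[Definition]
failregex = ^<HOST> -.*"(GET|POST|HEAD)
ignoreregex =

# /etc/fail2ban/jail.local
[http-flood]
enabled  = true
port     = http,https
filter   = http-flood
logpath  = /var/log/apache2/access.log
maxretry = 1000
findtime = 3600
bantime  = 86400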
100 lines of actual entries from the Apache access.log would tell us enough.
Sanitize your server's IP if you choose to share them.
I have found that most of Google's crawlers are using 66.249.x.x
There are good crawlers and bad ones.
Good crawlers honor robots.txt; bad ones don't GAF.
Restricting abusive netblocks is all part of "the game".
An email to abuse@ is usually sufficient to start the "documentation" process.
If it is Google-owned, you have "resources" at https://www.google.com/webmasters/tools
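(If it does turn out to be a well-behaved crawler, a Crawl-delay line in robots.txt is worth trying; note that Googlebot itself ignores Crawl-delay and its rate is managed through the webmaster tools linked above, but many other crawlers honor it. A sketch with an arbitrary value:)
Code:
# robots.txt at the site root
User-agent: *
Crawl-delay: 10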