LinuxQuestions.org
Old 10-18-2018, 10:10 AM   #1
catiewong
Member
 
Registered: Aug 2018
Posts: 190

Rep: Reputation: Disabled
Many HTTP connections to Apache server


We use CentOS 7 and Apache 2.4.

We just found that one of our web servers is receiving a large number of HTTP connections: over 200,000 hits per day from the same IP subnet.

I know this may be a crawler, but if I block these IPs, my website may disappear from search engines (I guess).

I know there is a tool, fail2ban, that can protect Apache. Is it good to use such a tool? Is there any other method that may reduce the duplicated HTTP connections?
 
Old 10-18-2018, 10:22 AM   #2
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,307
Blog Entries: 3

Rep: Reputation: 3721
You can look up who owns the netblock using whois and complain to the owner about abuse. That might lead to a more reasonable traffic load once they adjust their machines. If it is a search engine, I can't see how or why it would be so misconfigured unless the owner is M$.

Another option, which is not mutually exclusive, is to throttle outgoing traffic to that network. Slow it way down and it should somewhat lighten the incoming requests if it is a normal spider.

SSHGuard is an alternative to Fail2ban, but you'd probably have better results with the above two approaches.
 
1 members found this post helpful.
Old 10-18-2018, 10:37 AM   #3
catiewong
Member
 
Registered: Aug 2018
Posts: 190

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Turbocapitalist
You can look up who owns the netblock using whois and complain to the owner about abuse. That might lead to a more reasonable traffic load once they adjust their machines. If it is a search engine, I can't see how or why it would be so misconfigured unless the owner is M$.

Another option, which is not mutually exclusive, is to throttle outgoing traffic to that network. Slow it way down and it should somewhat lighten the incoming requests if it is a normal spider.

SSHGuard is an alternative to Fail2ban, but you'd probably have better results with the above two approaches.

Complain to the owner? I know these IPs are from an ISP in the U.S., and I guess they belong to a search engine. Do you mean I can do nothing if it is a search engine?

Throttle outgoing traffic? Does that mean reducing the traffic allowed to that network (the subnet of these incoming IPs)? If so, which Linux tool can do that? Does the firewall have such a function?
 
Old 10-19-2018, 06:43 AM   #4
dc.901
Senior Member
 
Registered: Aug 2018
Location: Atlanta, GA - USA
Distribution: CentOS/RHEL, openSuSE/SLES, Ubuntu
Posts: 1,005

Rep: Reputation: 370
Quote:
Originally Posted by catiewong
I know these IPs are from an ISP in the U.S., and I guess they belong to a search engine. Do you mean I can do nothing if it is a search engine?
Just because an IP is from an ISP in the US does not mean it's from a search engine. As mentioned by Turbocapitalist, you can use a whois site to look up the owner of the IP.
If you wish to block that IP, you can also set up allow/deny rules in Apache; see: https://httpd.apache.org/docs/2.4/howto/access.html
 
1 members found this post helpful.
Old 10-21-2018, 11:34 PM   #5
catiewong
Member
 
Registered: Aug 2018
Posts: 190

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by dc.901
Just because an IP is from an ISP in the US does not mean it's from a search engine. As mentioned by Turbocapitalist, you can use a whois site to look up the owner of the IP.
If you wish to block that IP, you can also set up allow/deny rules in Apache; see: https://httpd.apache.org/docs/2.4/howto/access.html
I used whois and confirmed that this IP is from the US. I think it is a search engine.

I do not want to block it, because I worry that if the IP is blocked, my website will disappear from search engines.

Is there another way to fix it?

Thanks
 
Old 10-21-2018, 11:36 PM   #6
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,307
Blog Entries: 3

Rep: Reputation: 3721
What was the reply when you sent an e-mail to the abuse address or called their phone?
 
1 members found this post helpful.
Old 10-21-2018, 11:39 PM   #7
catiewong
Member
 
Registered: Aug 2018
Posts: 190

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Turbocapitalist
What was the reply when you sent an e-mail to the abuse address or called their phone?
I didn't contact the abuse address. If I tell them about the issue, they will stop the searching. Would that have any impact on my website?
 
Old 10-21-2018, 11:43 PM   #8
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,307
Blog Entries: 3

Rep: Reputation: 3721
Quote:
Originally Posted by catiewong
I didn't contact the abuse address. If I tell them about the issue, they will stop the searching. Would that have any impact on my website?
How do you know they will stop searching? How do you know this is actually a search engine and not a garden variety M$ botnet? Do you want them to stop or to slow down?

Make a short list of the actual facts that you have on hand along with supporting data. Ignore any opinions, guesses, conjectures, or worries. Decide what you want and then use the contact information to deal with the misbehaving address range, informing them of what you want and of the facts.
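
One fact that can be gathered before contacting anyone is whether an address really belongs to Google's crawler. The following is a minimal sketch of a reverse-then-forward DNS check, not something posted in this thread. It assumes that genuine Google crawlers reverse-resolve to hostnames under googlebot.com or google.com; the function name and the injectable resolver parameters are hypothetical, and `socket.gethostbyname_ex` only covers IPv4.

```python
import socket

# Assumption: genuine Google crawlers reverse-resolve to these domains.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_crawler(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Return True if ip passes a reverse-then-forward DNS check.

    reverse and forward are injectable so the logic can be tried offline.
    """
    try:
        hostname = reverse(ip)[0]          # PTR lookup: IP -> hostname
    except OSError:
        return False                       # no reverse record at all
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False                       # hostname is not under a Google domain
    try:
        addresses = forward(hostname)[2]   # A lookup: hostname -> IPv4 list
    except OSError:
        return False
    # The forward lookup must lead back to the original address,
    # otherwise the PTR record could simply be forged.
    return ip in addresses
```

Calling `is_google_crawler()` with an address taken from the access log needs working DNS on the machine where it runs. A `False` result only means the check did not pass; it says nothing about who actually operates the address.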
 
2 members found this post helpful.
Old 10-22-2018, 01:25 AM   #9
catiewong
Member
 
Registered: Aug 2018
Posts: 190

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Turbocapitalist
How do you know they will stop searching? How do you know this is actually a search engine and not a garden variety M$ botnet? Do you want them to stop or to slow down?

Make a short list of the actual facts that you have on hand along with supporting data. Ignore any opinions, guesses, conjectures, or worries. Decide what you want and then use the contact information to deal with the misbehaving address range, informing them of what you want and of the facts.
Below is the whois result:

Code:
Organization:   Google LLC (GOGL)
As for whether I want them to stop or to slow down: I do not know what they will do if I report the issue to them. I am worried that if they stop, our website will no longer show up in Google searches.

Last edited by catiewong; 10-22-2018 at 01:28 AM.
 
Old 10-22-2018, 08:40 AM   #10
dc.901
Senior Member
 
Registered: Aug 2018
Location: Atlanta, GA - USA
Distribution: CentOS/RHEL, openSuSE/SLES, Ubuntu
Posts: 1,005

Rep: Reputation: 370
So you want to limit HTTP access by IP, but not block permanently?

For this, fail2ban will work. You can set it up to block for a configurable length of time after a configurable number of tries. For example, after 1,000 hits, block the IP for 24 hours?

You can perhaps also reach out to the technical contact (should be listed in whois) and simply ask...
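
The "after 1,000 hits, block for 24 hours" idea above can be sketched as follows. This is only an illustration of the threshold logic, not fail2ban's implementation. The function and parameter names are hypothetical; they loosely mirror fail2ban's `maxretry`, `findtime`, and `bantime` settings, and the input is assumed to be time-sorted (timestamp, IP) pairs already extracted from the access log.

```python
from collections import defaultdict, deque


def find_bans(hits, max_hits=1000, window=86400, ban_time=86400):
    """Yield (ip, ban_start, ban_end) when an IP reaches max_hits within window.

    hits is an iterable of (timestamp_seconds, ip) pairs sorted by time.
    """
    recent = defaultdict(deque)   # ip -> timestamps still inside the window
    banned_until = {}             # ip -> time at which the current ban expires
    for ts, ip in hits:
        if banned_until.get(ip, 0) > ts:
            continue              # still banned; a firewall would drop this hit
        queue = recent[ip]
        queue.append(ts)
        # Forget hits that have fallen out of the sliding window.
        while queue and queue[0] <= ts - window:
            queue.popleft()
        if len(queue) >= max_hits:
            banned_until[ip] = ts + ban_time
            queue.clear()         # start counting afresh once the ban ends
            yield ip, ts, ts + ban_time
```

With the defaults, an address is reported on the hit that brings it to 1,000 requests inside any 24-hour span and is then ignored for the next 24 hours.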
 
Old 10-22-2018, 09:07 AM   #11
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,727

Rep: Reputation: 2211
Two thoughts:
If Google is hitting your site 200,000 times per day, something is (might be) wrong.
How many of those are 404 hits?
 
Old 10-31-2018, 11:34 PM   #12
catiewong
Member
 
Registered: Aug 2018
Posts: 190

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by scasey
Two thoughts:
If Google is hitting your site 200,000 times per day, something is (might be) wrong.
How many of those are 404 hits?
Yes, there are also many 404 hits.
 
Old 11-01-2018, 02:40 AM   #13
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,307
Blog Entries: 3

Rep: Reputation: 3721
Quote:
Originally Posted by catiewong
Yes, there are also many 404 hits.
Are the same missing pages getting hit repeatedly, or just once?

Did you perhaps rearrange your site completely in the recent past and forget to configure 301 redirection?
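
To see whether the same missing pages are hit repeatedly, the 404 responses can be tallied per path. The following is a rough sketch, assuming the access log uses Apache's common or combined log format; the function names are hypothetical, and the log location is whatever your configuration uses (on a stock CentOS 7 install that is usually /var/log/httpd/access_log).

```python
import re
from collections import Counter

# Picks the request path and status code out of a common/combined log line,
# e.g. ... "GET /old/page.html HTTP/1.1" 404 209 ...
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')


def count_404s(lines):
    """Return a Counter mapping each requested path to its number of 404s."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts


def report(log_path, top=20):
    """Print the most frequently requested missing paths in log_path."""
    with open(log_path, errors="replace") as log:
        for path, hits in count_404s(log).most_common(top):
            print(hits, path)
```

Running `report()` on the log file lists the worst offenders first. A handful of paths with very high counts would point toward stale links or a missing redirect rather than random probing.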
 
Old 11-01-2018, 04:46 AM   #14
TenTenths
Senior Member
 
Registered: Aug 2011
Location: Dublin
Distribution: Centos 5 / 6 / 7
Posts: 3,475

Rep: Reputation: 1553
For Google, you can change the crawl rate in Search Console for sites you have verified that you own:

https://support.google.com/webmaster...er/48620?hl=en
 
Old 11-02-2018, 10:16 AM   #15
Habitual
LQ Veteran
 
Registered: Jan 2011
Location: Abingdon, VA
Distribution: Catalina
Posts: 9,374
Blog Entries: 37

Rep: Reputation: Disabled
100 actual lines from the Apache access.log would tell us enough.
Sanitize your server's IP if you choose to share.

I have found that most of Google's crawlers are using 66.249.x.x

There are good crawlers and bad ones.
Good crawlers honor robots.txt, bad ones don't GAF.
Restricting abusive netblocks is all part of "the game".
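
As an illustration of what honoring robots.txt means in practice, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, and the `build_parser` helper are made up for the example. Note also that not every well-behaved crawler acts on `Crawl-delay`; as far as I know Googlebot does not, so for Google the crawl-rate setting mentioned earlier in the thread is the relevant control.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask all crawlers to wait 10 seconds between
# requests and to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def build_parser(text):
    """Parse robots.txt content the way a polite crawler would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def may_fetch(parser, user_agent, path):
    """Return True if the rules allow user_agent to request path."""
    return parser.can_fetch(user_agent, path)
```

A crawler that respects the file would call something like `may_fetch(build_parser(ROBOTS_TXT), "ExampleBot", "/private/report.html")` before each request and read `parser.crawl_delay("ExampleBot")` to pace itself. A bad crawler simply skips both steps, which is why netblock restrictions remain necessary.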

Email to abuse@ is usually sufficient to start the "Documentation" process.
If it is Google-owned, you have "resources" at https://www.google.com/webmasters/tools

You need an Admin!
 
  

