LinuxQuestions.org


transient 05-18-2012 04:30 PM

Can you help decipher Apache2 server-status output? Are KeepAlives Misconfigured?
 
Hi all-

I have Apache 2.2.14 running on Ubuntu 10.04. The server has been running for almost a year without incident. It is a front-end web server that passes requests for dynamic content (Java) to Tomcat on other servers using mod_jk. On April 24th the web server started responding sluggishly to client requests (as reported by clients) before ultimately becoming unresponsive. It had to be hard rebooted to bring it back up. I found no evidence of what caused the crash after the server came back up, but I had noticed that there were a lot of Apache processes running (~146) right before everything crapped the bed.

The MaxClients directive was set to the default of 150, and this has been sufficient all this time. I changed it to 200 and saw the number of child processes jump almost immediately, like it couldn't wait to launch more. I didn't see anything in the error log about hitting the MaxClients limit, so I'm not sure if I'm chasing squirrels here. In a nutshell, I'm looking for help figuring out whether my MaxClients/KeepAlive settings are potential performance problems in my config, or whether I might be looking at some outside influence (a compromised system).

The server is running mod_php so it's using the prefork MPM. Here are the relevant lines from apache2.conf:

Code:

<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers      5
    MaxSpareServers      10
    MaxClients          200
    MaxRequestsPerChild  0
</IfModule>


KeepAlive On
MaxKeepAliveRequests 100

KeepAliveTimeout 15

I enabled the server-status page and briefly turned ExtendedStatus On (didn't want to leave it on if it could decrease performance). Here's what I saw:

Code:

KWKKKKKKKKKKKKKKKKKKKKKKKKKCKKKKKKWKCKKKKKKKKCKKKWSSSSS.........
.....K...................K.....W.......K........................
..K.....K....K..........W...G...KK...................W..........
.C.....K........................................................

I had a lot of KeepAlives that I recognized, like this:

Code:

3-3 886 6/6/6 K 0.01 3 1 7.1 0.01 0.01 9.9.9.9 clientname.com POST /RequestHandler HTTP/1.1

That seems like a lot of open connections that aren't necessarily doing anything. I assume that each one has a corresponding child process hanging around. Should I decrease the KeepAliveTimeout value?
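
One way to put numbers on that impression is to tally the scoreboard string itself. The following is a minimal Python sketch, assuming the standard mod_status scoreboard key (K = keepalive, W = sending reply, C = closing, S = starting up, G = gracefully finishing, _ = waiting, . = open slot with no process). The function name `tally` is made up for illustration.

```python
from collections import Counter

# Scoreboard string copied from the server-status output quoted in this post.
SCOREBOARD = (
    "KWKKKKKKKKKKKKKKKKKKKKKKKKKCKKKKKKWKCKKKKKKKKCKKKWSSSSS........."
    ".....K...................K.....W.......K........................"
    "..K.....K....K..........W...G...KK...................W.........."
    ".C.....K........................................................"
)


def tally(scoreboard):
    """Return (count per state character, number of slots holding a process)."""
    counts = Counter(scoreboard)
    # A "." is an open slot with no process; every other character is a process.
    with_process = len(scoreboard) - counts["."]
    return counts, with_process


counts, with_process = tally(SCOREBOARD)
print("slots with a process:", with_process)
print("in keepalive (K):    ", counts["K"])
print("keepalive share:      {:.0%}".format(counts["K"] / with_process))
```

With prefork, each K is a whole child process waiting on an idle connection, so the keepalive share gives a rough idea of how much of MaxClients is being spent on waiting.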

The other bit is that along with the above lines in server-status I am seeing things like:

Code:

136-2 747 0/0/0 K 0.00 1335452011 0 0.0 0.00 0.00   
141-2 752 0/0/0 K 0.00 1335452011 0 0.0 0.00 0.00   
152-2 764 0/0/0 W 0.00 1335452011 0 0.0 0.00 0.00   
156-2 768 0/0/0 G 0.00 1335452011 0 0.0 0.00 0.00   
160-2 772 0/0/0 K 0.00 1335452011 0 0.0 0.00 0.00   
161-2 773 0/0/0 K 0.00 1335452011 0 0.0 0.00 0.00   
181-2 793 0/0/0 W 0.00 1335452011 0 0.0 0.00 0.00   
193-2 805 0/0/0 C 0.00 1335452011 0 0.0 0.00 0.00   
199-2 811 0/0/0 K 0.00 1335452011 0 0.0 0.00 0.00

If I understand the scoreboard key correctly, this means that there are child processes that have been in either a KeepAlive (mostly), closing, or replying state that have no connection on the other end (since there are no IP addresses and all other stats are 0). It's essentially like they're stuck or hanging. Is this an accurate understanding? I wouldn't think this is a by-product of too high of a KeepAlive value, which is what makes me wonder if this is something else malicious, or a memory leak or something like that.

I also converted the seconds in the SS column to get an idea of how long they've been that way, and it comes out to something like 15,000 days! That's not right obviously. Am I doing something wrong in that calculation, or misunderstanding what it really means?

Thanks in advance,
SC

tronayne 05-19-2012 10:31 AM

My configuration looks like this (these are default values):
Code:

#
# Timeout: The number of seconds before receives and sends time out.
#
Timeout 300

#
# KeepAlive: Whether or not to allow persistent connections (more than
# one request per connection). Set to "Off" to deactivate.
#
KeepAlive On

#
# MaxKeepAliveRequests: The maximum number of requests to allow
# during a persistent connection. Set to 0 to allow an unlimited amount.
# We recommend you leave this number high, for maximum performance.
#
MaxKeepAliveRequests 100

#
# KeepAliveTimeout: Number of seconds to wait for the next request from the
# same client on the same connection.
#
KeepAliveTimeout 5

And
Code:

# prefork MPM
# StartServers: number of server processes to start
# MinSpareServers: minimum number of server processes which are kept spare
# MaxSpareServers: maximum number of server processes which are kept spare
# MaxClients: maximum number of server processes allowed to start
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers      5
    MaxSpareServers      10
    MaxClients          150
    MaxRequestsPerChild  0
</IfModule>

The system has only been up for 42 days (usually a lot longer); this is what ps shows:
Code:

root      2215    1  0 Apr06 ?        00:01:01 /usr/sbin/httpd -k start
apache    2244  2215  0 Apr06 ?        00:00:03 /usr/sbin/httpd -k start
apache    2571  2215  0 Apr06 ?        00:00:04 /usr/sbin/httpd -k start
apache  25830  2215  0 Apr18 ?        00:00:21 /usr/sbin/httpd -k start
apache  27192  2215  0 Apr18 ?        00:00:01 /usr/sbin/httpd -k start
apache  27222  2215  0 Apr18 ?        00:00:02 /usr/sbin/httpd -k start
apache  30932  2215  0 Apr19 ?        00:00:03 /usr/sbin/httpd -k start
apache  30937  2215  0 Apr19 ?        00:00:02 /usr/sbin/httpd -k start
apache  30962  2215  0 Apr19 ?        00:00:01 /usr/sbin/httpd -k start
apache  31133  2215  0 Apr19 ?        00:00:00 /usr/sbin/httpd -k start
apache  31138  2215  0 Apr19 ?        00:00:00 /usr/sbin/httpd -k start

The system was rebooted 06 April.

BTW, there are 86,400 seconds in a day (60 * 60 * 24) if that helps.
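
Applying that conversion to the SS value from the rows quoted earlier gives the same figure of roughly 15,000 days, so the arithmetic itself is fine. A minimal Python sketch of the check follows. The second half rests on an assumption that is not confirmed anywhere in this thread: that SS is computed as "now minus the slot's last-used time", so a slot whose last-used time was never set would show the current Unix timestamp instead of a real duration.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 60 * 60 * 24  # 86,400

ss = 1335452011  # SS value from the server-status rows with no client address

# Straight conversion, as described above.
days = ss / SECONDS_PER_DAY
print(f"{days:.1f} days")  # 15456.6 days

# Assumption: read the same number as seconds since 1970-01-01 UTC.
as_date = datetime.fromtimestamp(ss, tz=timezone.utc)
print(as_date.isoformat())  # 2012-04-26T14:53:31+00:00
```

The date lands in late April 2012, right around the time of the incident, which fits the idea that those slots had never served a request rather than having been stuck for decades.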

There have been a couple of updates to httpd over the past few months; httpd should be at least httpd-2.2.22 and PHP at php-5.3.10 (if you're using PHP). If you are not up to that/those levels, you might want to upgrade.

Do you have ntop installed? Have you watched it? You might be getting whacked from China, Korea or some other lawless land; have a look-see in syslog for break-in attempts. If you're interested, go take a look at ntop (http://www.ntop.org).

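Enabling the server-status page has no noticeable impact on performance. ExtendedStatus On does add a few extra time calls per request, but the cost is small on most servers.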

Your configuration looks all right (well, what you showed anyway). You might want to take it back to the defaults (which work just fine in almost every case). Might be worth your time to upgrade httpd and, perhaps, install ntop to see what's what. Also wouldn't hurt to take a look at Tomcat settings (don't know much about Tomcat) and at your web page source, perhaps using Bluefish which can point out problems with a plug-in or two installed in the program.

Hope this helps some.

transient 05-22-2012 08:09 AM

Thanks for the input tronayne. I already see that your default configuration has a smaller keepalive interval than mine. I didn't edit that value so I wonder if there are different defaults based on the distribution or depending on whether or not you install from source vs. repository. I'm gonna try reducing my timeout interval to 5. I will also take a look at ntop, which is not something I've used before. Thanks for the suggestion; I feel like there are a ton of utilities out there and I'll never know them all. :)
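
To get a feel for how much difference the timeout could make, here is a rough back-of-the-envelope sketch in Python. It uses Little's law (processes held = arrival rate x time each connection holds a process) and assumes the worst case, where every client sits idle for the full KeepAliveTimeout after its last request. The rate and busy-time figures are hypothetical, not measurements from this server, and `workers_held` is a made-up helper name.

```python
def workers_held(new_conns_per_sec, busy_secs, keepalive_timeout):
    """Worst-case number of prefork processes tied up at steady state.

    Each connection holds one process while it is being served (busy_secs)
    and then for up to keepalive_timeout seconds while it sits idle.
    """
    return new_conns_per_sec * (busy_secs + keepalive_timeout)


RATE = 10.0  # hypothetical: new connections per second
BUSY = 0.1   # hypothetical: seconds spent actually serving each connection

for timeout in (15, 5):
    held = workers_held(RATE, BUSY, timeout)
    print(f"KeepAliveTimeout {timeout:>2}: about {held:.0f} processes held")
```

Under those made-up numbers, a 15 second timeout alone would be enough to bump into a MaxClients of 150, while a 5 second timeout would leave plenty of headroom. Real clients often close connections sooner, so actual usage should be lower than this worst case.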

Also, I have definitely seen attempts from China and Japan to access our site (lots of fishing for phpMyAdmin and various files under /var/www that I assume exist by default on some systems). I've been blocking those ranges in chunks at the firewall level (ASA). Does each of those requests open an Apache process (and keep it open for the keepalive time), even if no data is being served? Could that also explain the server-status entries with no IP addresses and the long hold times (15456 days, which is clearly incorrect since the server hasn't been running that long)?

tronayne 05-23-2012 09:34 AM

It usually turns out that the default values are pretty good at keeping things under control (which is why I don't ever mess with them unless there's a darned good reason for doing so, eh?). Can't hurt.

One thing that I've had running for years is DenyHosts (http://denyhosts.sourceforge.net/), which describes itself this way: "DenyHosts is a script intended to be run by Linux system administrators to help thwart SSH server attacks (also known as dictionary based attacks and brute force attacks)." There is truly a massive number of those happening all the time, many of which are aimed at things like phpMyAdmin.

DenyHosts is dynamic -- meaning you don't have to fool with it. It looks at your logs and detects break-in attempts, making either IPTABLES or /etc/hosts.deny entries that will refuse any further connections from a "bad" IP address. It can also, if you want, share addresses with other DenyHosts sites around the world and record the addresses they report in your /etc/hosts.deny file, which is an easy and effective way to benefit from other users' experience.

Where country blocks are effective at blocking an entire country, DenyHosts is effective at blocking the bad actors themselves, whether they come from an address in one of those countries or from a compromised Windows machine they're hiding behind (which is how some dodo brain's PC in Spokane ends up being used in attacks). You can also set it up to send you mail about what's going on.

Might be worth your time to have a look-see.

When you're blocking at /etc/hosts.deny or IPTABLES the attacker doesn't get to Apache; check your access_log and error_log files (probably in /var/log/httpd?). If you see attacks in there, take a look at managing access with htaccess, but you're really better off with IPTABLES or /etc/hosts.deny, which work at the network interface rather than the Apache interface.
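
As a rough illustration of checking the access log, here is a minimal Python sketch that counts phpMyAdmin-style probes per client address. It assumes the log uses the Common or Combined Log Format. The marker strings, the sample lines (which use documentation-only IP ranges) and the function name `count_probes` are all made up for the example.

```python
import re
from collections import Counter

# Leading part of a Common/Combined Log Format line:
# client ident user [timestamp] "METHOD path ..."
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:\S+) (\S+)')

# Hypothetical substrings that suggest someone is probing for phpMyAdmin.
PROBE_MARKERS = ("phpmyadmin", "/pma/")


def count_probes(lines):
    """Count probe-looking requests per client address."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        client, path = match.group(1), match.group(2).lower()
        if any(marker in path for marker in PROBE_MARKERS):
            hits[client] += 1
    return hits


# Made-up sample lines standing in for a real access log.
SAMPLE = [
    '203.0.113.7 - - [22/May/2012:08:01:02 -0500] "GET /phpMyAdmin/scripts/setup.php HTTP/1.1" 404 345',
    '203.0.113.7 - - [22/May/2012:08:01:03 -0500] "GET /pma/index.php HTTP/1.1" 404 345',
    '198.51.100.9 - - [22/May/2012:08:02:00 -0500] "POST /RequestHandler HTTP/1.1" 200 1234',
]

for client, count in count_probes(SAMPLE).most_common():
    print(client, count)
```

To run it against a real log, pass an open file object to `count_probes` instead of `SAMPLE`; the log location varies by distribution, so check where your Apache writes it.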

Also, get the update for HTTPD -- that's one you really need to do.

And, take a look at your traffic analysis with NTOP, as well as the access_log and the error_log which will help identify problem areas.

Hope this helps some.


All times are GMT -5.