LinuxQuestions.org - Trying to troubleshoot sudden server slowdown

Trying to troubleshoot sudden server slowdown

I have a VPS, running Ubuntu 10.04.02 LTS, and ever since I applied some of the security updates several months ago, the server will eventually run out of resources and force me to do a hard reset.

Since it happens so quickly, and once it happens, I can't do anything via the command line, I have not figured out yet what process is causing this. I am guessing it is either Apache or MySQL since there isn't much else running on that server.

free -m output:

Code:

Mem:          1712      1692        20          0        56        396
-/+ buffers/cache:      1238        473
Swap:        1913          0      1913

What would be the best approach to figure this out? Thanks for your input!
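For readers following along, here is a rough sketch of how the free -m output above can be read. On the older procps format shown here, the "Mem:" line counts buffers and page cache as used, while the "-/+ buffers/cache:" line shows what applications actually hold. The function name app_memory is made up for illustration, and newer versions of free print an "available" column instead of this line, so treat it as an assumption tied to this output format.

```shell
#!/bin/sh
# app_memory: read old-style `free -m` output on stdin and report the
# "-/+ buffers/cache:" line, i.e. memory held by applications versus
# memory that is free or could be reclaimed from buffers/cache.
# (Hypothetical helper; assumes the legacy procps layout shown in the post.)
app_memory() {
    awk '$1 == "-/+" {
        printf "%d MB used by applications, %d MB free or reclaimable (%d%% used)\n",
               $3, $4, $3 * 100 / ($3 + $4)
    }'
}
```

Usage would be something like `free -m | app_memory`. Fed the three lines posted above, it reports 1238 MB used and 473 MB free or reclaimable, which suggests the snapshot was taken while the server was still healthy.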

Have you checked your Apache and MySQL logs for more info? I don't run Ubuntu so I'm not familiar with its file-system layout, but you could check /var/log and /usr/local/apache2/logs as a start.

When you say it runs out of resources, do you mean it runs out of RAM and starts swapping? Do you mean the disk I/O is too high? A CPU load spike? What I have seen on quite a few VPS/cloud servers is a default Apache config with MaxSpareServers set too high, so the idle children end up eating all of the available RAM. This then causes the VPS to swap, and it's all downhill from there. This is just a shot in the dark, as countless things could be causing issues for you.

I'm pretty sure it is memory or CPU because I did configure Webmin to alert me when load values increase rapidly. The problem is, I receive a few e-mails, and by the time I get to the system (usually within a minute), it's in such bad shape that I can't do anything. If I am logged in already, I can try to type a command, but whatever I type appears very slowly, which also makes me believe it is a CPU/memory issue.

The good ol' top(1) command gives you a quick look at what's happening WRT swap, CPU usage, and memory usage (among others). I normally go through the following steps to quickly locate an offender:

Fire up top(1), a la

Code:

# top

Type 'x', followed by 'b' (just turns on some highlighting of the sort column).
Use the '<' and '>' keys to move the sort column left or right, thereby sorting by CPU %, mem %, or whatever.

Hopefully that provides some clues. If not, and/or if this is a regular problem for you, it may be time to install sysstat.

The problem is that it happens pretty quickly, and when it happens, I can't run top or any other commands.
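One way around not being able to run top at the moment of failure is to record snapshots continuously while the box is still healthy, so the last entries before the stall survive a reboot. The following is only a sketch: the function names and the log path are invented for illustration, and it assumes a ps that accepts -e and repeated -o options (as the procps version on Ubuntu does).

```shell
#!/bin/sh
# top_rss N: read "pid rss_kb command" lines on stdin and print the
# N lines with the largest RSS (second column), biggest first.
top_rss() {
    sort -k2,2nr | head -n "${1:-5}"
}

# log_snapshot FILE: append a timestamp plus the five largest processes
# by resident memory to FILE. The empty "=" headers suppress the ps
# header line. (Hypothetical helper, not something from the thread.)
log_snapshot() {
    {
        date
        ps -e -o pid= -o rss= -o comm= | top_rss 5
    } >> "$1"
}
```

It could be driven by a loop such as `while true; do log_snapshot /var/log/memwatch.log; sleep 15; done` (the path is an assumption). Once the machine is deep into swap, the logger itself may stall, so the useful data is usually in the minutes leading up to the hang.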

Sounds like a fairly normal "no more memory, time to swap" issue. I would work on optimizing Apache, as it is likely your culprit if you are just running a basic LAMP stack. If you are using WordPress for your sites, I would also recommend installing some caching plugins and maybe XCache for PHP to lighten the load. In my experience, running a default WordPress install on your standard 'yum install httpd' Apache configs will start eating up memory fast. This is mostly due to the MaxSpareServers setting being too high for a small VPS.

Check the amount of space consumed by your log files and make sure logrotate is working correctly.

It's a fairly optimized Apache/MySQL install, running Joomla, eAccelerator and various other security components. It's the same configuration that has been running ok for years, but I do need to brush up on my Apache/MySQL performance tuning skills since it has been a while.

Thanks for the sysstat pointer, looks pretty interesting, will check it out.

I can't believe anyone would run a "critical" server without a logging monitor.
sysstat is good for overall trends; if you want to be able to pinpoint particular processes over a (much) more granular record, have a look at collectl (in daemon mode).

I have been running collectl, but I am fairly sure that I am dealing with an Apache memory problem, since restarting Apache (when it allows me to) does fix the problem for 24 hours or so.

Code:

root    16835  0.0  7.4 435072 131024 ?      Ss  03:47  0:00 /usr/sbin/apache2 -k start
www-data 16837  0.1  8.6 444188 151988 ?      S    03:47  0:05  \_ /usr/sbin/apache2 -k start
www-data 16838  0.2  9.1 448284 160236 ?      S    03:47  0:07  \_ /usr/sbin/apache2 -k start
www-data 16840  0.2  9.9 462472 173840 ?      S    03:47  0:07  \_ /usr/sbin/apache2 -k start
www-data 16841  0.2 10.0 462632 175524 ?      S    03:47  0:07  \_ /usr/sbin/apache2 -k start
www-data 16842  0.2 10.2 473028 180516 ?      S    03:47  0:10  \_ /usr/sbin/apache2 -k start
www-data 16843  0.2  9.7 462260 171492 ?      S    03:47  0:07  \_ /usr/sbin/apache2 -k start
www-data 16844  0.2  9.5 457216 167820 ?      S    03:47  0:07  \_ /usr/sbin/apache2 -k start
www-data 16845  0.2 10.3 471272 181680 ?      S    03:47  0:07  \_ /usr/sbin/apache2 -k start
www-data 16846  0.1  8.9 448288 157476 ?      S    03:47  0:05  \_ /usr/sbin/apache2 -k start
www-data 16847  0.1  8.9 450052 157768 ?      S    03:47  0:05  \_ /usr/sbin/apache2 -k start
www-data 16848  0.2  9.7 461132 171224 ?      S    03:47  0:09  \_ /usr/sbin/apache2 -k start
www-data 16850  0.1  9.9 466468 174984 ?      S    03:47  0:06  \_ /usr/sbin/apache2 -k start
www-data 16851  0.2  9.3 452928 163956 ?      S    03:47  0:07  \_ /usr/sbin/apache2 -k start
www-data 16852  0.2 10.0 464896 175556 ?      S    03:47  0:09  \_ /usr/sbin/apache2 -k start
www-data 16853  0.2 10.2 471528 180416 ?      S    03:47  0:08  \_ /usr/sbin/apache2 -k start
www-data 16854  0.1  9.6 461636 170112 ?      S    03:47  0:06  \_ /usr/sbin/apache2 -k start
www-data 16855  0.2  9.6 461924 170024 ?      S    03:47  0:07  \_ /usr/sbin/apache2 -k start
www-data 16856  0.1  8.9 448316 156980 ?      S    03:47  0:05  \_ /usr/sbin/apache2 -k start
www-data 16857  0.1  9.8 464004 172108 ?      S    03:47  0:06  \_ /usr/sbin/apache2 -k start
www-data 16858  0.1  8.7 445244 153660 ?      S    03:47  0:06  \_ /usr/sbin/apache2 -k start
www-data 16859  0.3  9.9 464304 174820 ?      S    03:47  0:10  \_ /usr/sbin/apache2 -k start
www-data 16865  0.2 10.0 463920 175512 ?      S    03:47  0:07  \_ /usr/sbin/apache2 -k start
www-data 16869  0.1  9.4 455088 165688 ?      S    03:47  0:06  \_ /usr/sbin/apache2 -k start
www-data 16871  0.2  9.7 460896 170940 ?      S    03:47  0:07  \_ /usr/sbin/apache2 -k start
www-data 16872  0.1  9.6 454892 168448 ?      S    03:47  0:06  \_ /usr/sbin/apache2 -k start
www-data 16873  0.2  9.6 457004 168944 ?      S    03:47  0:08  \_ /usr/sbin/apache2 -k start
www-data 16874  0.1  9.0 448540 158160 ?      S    03:47  0:06  \_ /usr/sbin/apache2 -k start
www-data 16875  0.1  9.1 453932 160020 ?      S    03:47  0:05  \_ /usr/sbin/apache2 -k start
www-data 17426  0.2  8.5 443148 149400 ?      S    04:31  0:01  \_ /usr/sbin/apache2 -k start
www-data 17427  0.3  9.6 460564 168892 ?      S    04:32  0:02  \_ /usr/sbin/apache2 -k start
www-data 17428  0.2  9.1 451880 159632 ?      S    04:32  0:01  \_ /usr/sbin/apache2 -k start
www-data 17429  0.5  9.4 457716 166112 ?      S    04:32  0:04  \_ /usr/sbin/apache2 -k start
www-data 17430  0.1  8.5 445964 149948 ?      S    04:32  0:01  \_ /usr/sbin/apache2 -k start
www-data 17431  0.1  8.0 439336 140856 ?      S    04:32  0:01  \_ /usr/sbin/apache2 -k start
www-data 17432  0.2  9.4 458736 165736 ?      S    04:32  0:02  \_ /usr/sbin/apache2 -k start

If I am reading this right (and 'top' seems to confirm this), Apache is using way too much memory, so I am guessing it's a configuration issue. I am using the prefork MPM, and PHP5 is loaded as a module. Here is the output of vmstat 10 10 (run while the server was mostly inactive; I will post again once things get busier, since that's when it seems to crash):

Code:

vmstat 10 10

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b  swpd  free  buff  cache  si  so    bi    bo  in  cs us sy id wa
 0  0  46644 119696  60320 353272    0    1  168    61  113  179 10  0 88  2
 0  0  46644 119928  60336 353272    0    0    0    33  35  34  1  0 99  0
 0  0  46644 120432  60356 353348    0    0    5    38  62  58  7  0 92  1
 0  0  46644 120432  60356 353404    0    0    5    0  20  24  0  0 100  0
 0  0  46644 120432  60372 353408    0    0    0    14  34  37  1  0 98  0
 0  0  46644 120432  60380 353492    0    0    8    18  90  46  1  0 98  0
 0  0  46644 120300  60388 353492    0    0    0    8  28  273  3  1 96  0
 0  0  46644 120276  60408 353584    0    0    9    39  70  101  4  0 95  1
 0  0  46644 119804  60424 353672    0    0    8    24  88  120  5  0 95  0
 0  0  46644 116092  60452 353708    0    0    1    20  51  182 14  1 85  0

I guess it could even be a PHP script issue, but since it's loaded as a module, I can't figure out how to tell. ANY input would be appreciated, as I have to force reboot my server on a daily basis.
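To put a number on the ps listing above, the resident set sizes can be added up. This is a minimal sketch that assumes `ps aux`-style input, where RSS in KB is the sixth column; the function name sum_rss is hypothetical. Note that RSS counts shared pages once per process, so the total overstates what Apache really occupies, but it is still a useful upper bound.

```shell
#!/bin/sh
# sum_rss: read `ps aux`-style lines on stdin and print the number of
# processes, their combined RSS, and the average RSS per process.
# Column 6 is RSS in KB. Shared pages are counted once per process,
# so the total is an overestimate. (Hypothetical helper.)
sum_rss() {
    awk '{ n++; kb += $6 }
         END {
             if (n)
                 printf "%d processes, %d MB RSS total, %d MB average\n",
                        n, kb / 1024, kb / n / 1024
         }'
}
```

A typical invocation might be `ps aux | grep '[a]pache2' | sum_rss`; the bracket in the pattern keeps grep from matching its own command line.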

So months later, I still haven't figured this out. I started logging the output of vmstat to a file while the server was in good shape, hoping to capture what vmstat reports as the server dies. Here is a snippet from when the server was running OK:

Code:

Thu Dec 8 10:14:57 EST 2011 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----. 
Thu Dec 8 10:14:57 EST 2011 r b swpd free buff cache si so bi bo in cs us sy id wa. 
Thu Dec 8 10:14:57 EST 2011 1 0 239380 43664 26892 100524 11 16 309 99 133 181 14 1 82 3. 
Thu Dec 8 10:15:12 EST 2011 0 0 241500 46448 25588 98668 76 152 363 218 146 687 8 1 83 7. 
Thu Dec 8 10:15:27 EST 2011 1 0 240968 42388 25616 98844 66 0 79 24 58 79 8 0 89 3. 
Thu Dec 8 10:15:42 EST 2011 0 0 240680 44372 25636 99508 25 0 66 61 73 79 8 0 89 2. 
Thu Dec 8 10:15:57 EST 2011 0 0 240248 37304 25688 106020 13 0 449 30 112 146 10 0 83 6. 
Thu Dec 8 10:16:12 EST 2011 0 0 239612 30484 25764 107860 14 0 135 61 263 319 29 1 65 4. 
Thu Dec 8 10:16:27 EST 2011 0 0 239596 30492 25780 108048 0 0 13 34 51 63 3 0 96 0. 
Thu Dec 8 10:16:42 EST 2011 0 0 239428 33020 25792 108040 4 0 71 45 83 147 10 0 87 3.

And here are the last few lines of the log file as the server died (vmstat was supposed to log the output every 15 seconds):

Code:

Sun Dec 11 09:06:31 EST 2011 0 43 1898212 12256 1816 21944 1357 1266 1879 1308 586 321 1 0 0 98. 
Sun Dec 11 09:06:31 EST 2011 0 52 1910868 12516 1932 22384 969 1058 1254 1070 545 234 1 1 0 98. 
Sun Dec 11 09:07:00 EST 2011 0 45 1922788 12304 2012 21996 732 957 923 978 512 225 1 0 0 99. 
Sun Dec 11 09:07:05 EST 2011 0 69 1935644 12148 2016 23264 2276 1440 2765 1481 611 415 2 1 0 97. 
Sun Dec 11 09:07:32 EST 2011 0 80 1950728 12164 2208 25068 1210 1313 1565 1342 544 280 1 1 0 99. 
Sun Dec 11 09:08:24 EST 2011 0 80 1959204 12092 2172 23348 1137 810 1309 821 493 229 0 1 0 99. 
Sun Dec 11 09:10:05 EST 2011 0 70 1959788 14128 200 4248 3229 834 4759 877 975 639 1 21 0 78. 
Sun Dec 11 09:17:30 EST 2011 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----. 
Sun Dec 11 09:19:54 EST 2011 r b swpd free buff cache si so bi bo in cs us sy id wa. 
Sun Dec 11 09:26:56 EST 2011 0 76 1959820 12052 232 6808 2721 704 6315 717 1065 736 0 23 0 77.
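In the lines above, the b column (blocked processes) sits between 43 and 80, wa is 77 to 99, and si/so are in the hundreds or thousands, which is consistent with the machine thrashing in swap. A small filter can pull such lines out of a long log. This sketch assumes the log format shown in this post (a six-field date prefix followed by the 16 vmstat columns); the function name and the default threshold are made up.

```shell
#!/bin/sh
# flag_swapping [LIMIT]: read a timestamped vmstat log on stdin and print
# the samples where swap-in plus swap-out exceeds LIMIT (default 100).
# Field layout assumed: $1-$6 date, $7 r, $8 b, $13 si, $14 so, $22 wa.
# Header lines are skipped because their 7th field is not a number.
flag_swapping() {
    awk -v limit="${1:-100}" '
        $7 ~ /^[0-9]+$/ && ($13 + $14) > limit {
            printf "%s %s %s: b=%s si=%s so=%s wa=%d\n",
                   $2, $3, $4, $8, $13, $14, $22
        }'
}
```

Something like `flag_swapping 100 < vmstat.log` (file name assumed) would then list only the samples where swapping was heavy, which makes it easier to see when the trouble started.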

The web server receives 10,000 page visits a day, nothing special really. The system has 2 GB of memory (1.7 GB usable), so I was hoping you guys could suggest some good Apache configuration parameters.

What really does stand out is the memory usage of my apache child processes. It looks pretty excessive:

Code:

 4131 www-data  20   0  436m 165m 6112 S  0.0  9.7   0:53.21 apache2
 4132 www-data  20   0  426m 155m 6216 S  0.0  9.1   0:45.99 apache2
 4133 www-data  20   0  431m 160m 6156 S  0.0  9.4   0:52.18 apache2
 4134 www-data  20   0  434m 163m 5892 S  0.0  9.5   0:52.59 apache2
 4135 www-data  20   0  430m 161m 5676 S  0.0  9.4   0:53.88 apache2
 4136 www-data  20   0  424m 155m 5704 S  0.0  9.1   0:51.70 apache2
 4137 www-data  20   0  431m 161m 6584 S  0.0  9.4   0:45.08 apache2
 4138 www-data  20   0  433m 162m 6104 S  0.0  9.5   0:49.69 apache2
 4139 www-data  20   0  404m 134m 6444 S  0.0  7.8   0:55.98 apache2
 4140 www-data  20   0  434m 163m 6844 S  0.0  9.5   0:50.30 apache2

Apache is configured to use the prefork MPM, with the following parameters:

Code:

<IfModule mpm_prefork_module>
    StartServers          10
    MinSpareServers       15
    MaxSpareServers       35
    MaxClients            10
    MaxRequestsPerChild 3000
</IfModule>

MaxClients used to be much higher; I just dropped it again to see if this will make a difference.
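A common rule of thumb for prefork is to size MaxClients from memory: take total RAM, subtract what the OS, MySQL, and everything else need, and divide by the per-child footprint. The sketch below just does that arithmetic. The function name is hypothetical, and the 512 MB reserve in the usage note is an assumption, not a measured value. Also note that prefork never runs more children than MaxClients, so with MaxClients at 10 the MinSpareServers 15 and MaxSpareServers 35 settings above cannot take effect as written.

```shell
#!/bin/sh
# max_clients TOTAL_MB RESERVED_MB PER_CHILD_MB
# Rough upper bound for MaxClients under the prefork MPM:
# (total RAM - RAM reserved for everything else) / RAM per Apache child.
# Integer division rounds down, which errs on the safe side.
# (Hypothetical helper; all inputs are estimates supplied by the caller.)
max_clients() {
    echo $(( ($1 - $2) / $3 ))
}
```

With the 1712 MB reported by free, an assumed 512 MB reserve, and roughly 165 MB per child as seen in top, `max_clients 1712 512 165` prints 7. Because RSS includes shared pages, the real per-child cost is probably lower, but shrinking the children (fewer PHP extensions, lower PHP memory_limit) is likely to help more than raising the limit.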

Modules loaded:

Code:

Loaded Modules: 
 core_module (static) 
 log_config_module (static) 
 logio_module (static) 
 mpm_prefork_module (static) 
 http_module (static) 
 so_module (static) 
 alias_module (shared) 
 auth_basic_module (shared) 
 authn_file_module (shared) 
 authz_default_module (shared) 
 authz_groupfile_module (shared) 
 authz_host_module (shared) 
 authz_user_module (shared) 
 autoindex_module (shared) 
 cgi_module (shared) 
 deflate_module (shared) 
 dir_module (shared) 
 env_module (shared) 
 mime_module (shared) 
 security2_module (shared) 
 negotiation_module (shared) 
 php5_module (shared) 
 reqtimeout_module (shared) 
 rewrite_module (shared) 
 setenvif_module (shared) 
 ssl_module (shared) 
 status_module (shared) 
 unique_id_module (shared)

I'm hoping this is enough info (and not too overwhelming), as I really would appreciate some helpful pointers here. Thanks!