Hi,
I am having a headache with finding out what is going on my server. It keeps getting random load spikes and apache becoming irresponsible. My system is build with nginx in front and apache in background. Clean reverse proxy with a backup upstream configured on other machine. This works fine until ... yeah this is the point which i can't figure out.
System: CentoOS (64bit) as a VMWare Virtual Machine
Kernel: 2.6.18-194.32.1.el5.centos.plus
File system: XFS
CPU 4x
RAM 6GB
Server version: Apache/2.2.3 (Server built: Aug 30 2010 12:28:40)
nginx version: nginx/0.8.53
PHP 5.1.6 (cli)
There is a webshop application written in PHP (mySQL is hosted on separate server) running on it but also i got mirror on other server as well. NGinx as default servers static files from localhost and using local apache to provide dynamic content. I had to create a mirror (via nginx upstream settings) in case of local apache timeing out ... which is where the problem is.
Most of the time everything works fine but there are these weird load spikes i can't explain. Load goes up to 15-25 and localhost apache stops responding.
I started to collect a lot of stats on this server but i still can't see where the problem is. Please take a look at some stats being dumped during such moments. Maybe you can see something i don't
We gonna user two time point where problem appeared. Take a look at 16:10 and then between 20:00 - 20:30
Here we go:
SAR
Linux 2.6.18-194.32.1.el5.centos.plus (********) 2011-04-13
00:00:01 proc/s
16:00:01 1,22
16:10:01 1,43
16:20:01 1,32
20:00:01 0,84
20:10:01 1,17
20:20:01 0,93
20:30:01 0,93
00:00:01 cswch/s
16:00:01 2383,38
16:10:01 2092,80
16:20:01 1763,54
16:30:01 1725,95
20:00:01 3081,22
20:10:01 2996,59
20:20:01 2602,10
20:30:01 3037,61
00:00:01 CPU %user %nice %system %iowait %steal %idle
16:00:01 all 21,04 0,00 8,20 0,01 0,00 70,75
16:00:01 0 15,99 0,00 4,73 0,01 0,00 79,27
16:00:01 1 16,10 0,00 5,09 0,01 0,00 78,80
16:00:01 2 16,33 0,00 4,89 0,02 0,00 78,76
16:00:01 3 35,74 0,00 18,07 0,00 0,00 46,18
16:10:01 all 18,02 0,00 7,42 0,01 0,00 74,55
16:10:01 0 14,50 0,00 4,52 0,01 0,00 80,97
16:10:01 1 13,82 0,00 4,68 0,00 0,00 81,50
16:10:01 2 13,85 0,00 4,44 0,01 0,00 81,70
16:10:01 3 29,93 0,00 16,04 0,01 0,00 54,02
16:20:01 all 14,81 0,00 6,89 0,05 0,00 78,26
16:20:01 0 11,05 0,00 3,85 0,01 0,00 85,08
16:20:01 1 10,76 0,00 4,09 0,01 0,00 85,15
16:20:01 2 10,88 0,00 4,00 0,15 0,00 84,97
16:20:01 3 26,53 0,00 15,62 0,01 0,00 57,84
20:00:01 all 26,65 0,00 8,32 0,01 0,00 65,02
20:00:01 0 20,57 0,00 4,47 0,01 0,00 74,95
20:00:01 1 20,57 0,00 4,68 0,00 0,00 74,75
20:00:01 2 20,24 0,00 4,54 0,01 0,00 75,20
20:00:01 3 45,21 0,00 19,59 0,00 0,00 35,19
20:10:01 all 28,65 0,00 10,63 0,10 0,00 60,63
20:10:01 0 23,69 0,00 6,88 0,06 0,00 69,37
20:10:01 1 23,26 0,00 7,20 0,03 0,00 69,51
20:10:01 2 23,18 0,00 7,10 0,29 0,00 69,43
20:10:01 3 44,40 0,00 21,28 0,01 0,00 34,30
20:20:01 all 23,68 0,00 8,17 0,17 0,00 67,98
20:20:01 0 17,53 0,00 4,61 0,06 0,00 77,80
20:20:01 1 17,52 0,00 4,62 0,08 0,00 77,79
20:20:01 2 17,62 0,00 4,98 0,52 0,00 76,88
20:20:01 3 42,05 0,00 18,48 0,02 0,00 39,45
20:30:01 all 27,21 0,00 8,79 0,01 0,00 64,00
20:30:01 0 21,45 0,00 5,23 0,01 0,00 73,32
20:30:01 1 21,24 0,00 5,37 0,00 0,00 73,39
20:30:01 2 21,42 0,00 5,09 0,02 0,00 73,48
20:30:01 3 44,71 0,00 19,47 0,00 0,00 35,82
00:00:01 INTR intr/s
16:00:01 sum 2897,75
16:10:01 sum 2654,63
16:20:01 sum 2522,15
20:00:01 sum 3218,31
20:10:01 sum 2776,08
20:20:01 sum 2743,67
20:30:01 sum 2855,21
00:00:01 tps rtps wtps bread/s bwrtn/s
16:00:01 17,04 0,15 16,89 7,68 460,67
16:10:01 16,71 0,08 16,63 6,45 482,71
16:20:01 21,43 1,73 19,70 84,90 614,03
16:30:01 12,14 0,06 12,08 3,10 259,70
20:00:01 15,56 0,13 15,43 12,12 346,32
20:10:01 25,74 3,71 22,03 638,55 431,66
20:20:01 51,41 31,67 19,74 3809,10 371,26
20:30:01 18,05 0,18 17,87 19,48 373,91
00:00:01 frmpg/s bufpg/s campg/s
16:00:01 -96,44 0,06 9,52
16:10:01 118,87 0,04 8,44
16:20:01 -280,55 0,06 13,78
16:30:01 220,07 0,06 7,08
20:00:01 -52,40 0,06 12,92
20:10:01 19,53 0,15 13,20
20:20:01 -9,89 0,23 10,71
20:30:01 -43,41 0,05 12,48
00:00:01 IFACE rxpck/s txpck/s rxbyt/s txbyt/s rxcmp/s txcmp/s rxmcst/s
16:00:01 lo 222,22 222,22 415134,50 415134,50 0,00 0,00 0,00
16:00:01 eth0 2009,81 2284,46 847467,03 1536557,96 0,00 0,00 0,00
16:00:01 sit0 0,00 0,00 0,00 0,00 0,00 0,00 0,00
16:10:01 lo 176,01 176,01 335443,42 335443,42 0,00 0,00 0,00
16:10:01 eth0 1753,58 1886,29 734979,73 1224188,08 0,00 0,00 0,00
16:10:01 sit0 0,00 0,00 0,00 0,00 0,00 0,00 0,00
16:20:01 lo 130,54 130,54 224751,39 224751,39 0,00 0,00 0,00
16:20:01 eth0 1472,17 1657,99 558715,35 1147156,41 0,00 0,00 0,00
16:20:01 sit0 0,00 0,00 0,00 0,00 0,00 0,00 0,00
16:30:01 lo 139,64 139,64 248102,76 248102,76 0,00 0,00 0,00
16:30:01 eth0 1422,17 1674,51 584657,81 1258328,53 0,00 0,00 0,00
16:30:01 sit0 0,00 0,00 0,00 0,00 0,00 0,00 0,00
20:00:01 lo 305,68 305,68 593854,92 593854,92 0,00 0,00 0,00
20:00:01 eth0 2790,26 3340,79 1199849,69 2452070,71 0,00 0,00 0,00
20:00:01 sit0 0,00 0,00 0,00 0,00 0,00 0,00 0,00
20:10:01 lo 310,87 310,87 613079,93 613079,93 0,00 0,00 0,00
20:10:01 eth0 2882,77 3485,29 1229454,51 2618356,31 0,00 0,00 0,00
20:10:01 sit0 0,00 0,00 0,00 0,00 0,00 0,00 0,00
20:20:01 lo 272,95 272,95 521194,48 521194,48 0,00 0,00 0,00
20:20:01 eth0 2309,35 2712,83 1001607,87 1951056,64 0,00 0,00 0,00
20:20:01 sit0 0,00 0,00 0,00 0,00 0,00 0,00 0,00
20:30:01 lo 295,53 295,53 584662,85 584662,85 0,00 0,00 0,00
20:30:01 eth0 2635,89 2986,98 1143017,97 2047382,43 0,00 0,00 0,00
20:30:01 sit0 0,00 0,00 0,00 0,00 0,00 0,00 0,00
00:00:01 pgpgin/s pgpgout/s fault/s majflt/s
16:00:01 1,85 103,68 1905,90 0,00
16:10:01 1,61 110,69 2236,18 0,00
16:20:01 21,23 143,92 2134,28 0,00
16:30:01 0,78 55,49 1334,28 0,00
20:00:01 3,03 74,03 516,50 0,00
20:10:01 159,37 88,73 911,49 0,01
20:20:01 952,27 75,20 487,53 0,00
20:30:01 4,87 78,11 540,69 0,00
00:00:01 kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
16:00:01 2994036 3112756 50,97 39280 2187416 6094732 108 0,00 0
16:10:01 3278072 2828720 46,32 39380 2207588 6094732 108 0,00 0
16:20:01 2607052 3499740 57,31 39524 2240536 6094732 108 0,00 0
16:30:01 3133432 2973360 48,69 39656 2257472 6094732 108 0,00 0
20:00:01 2713056 3393736 55,57 42432 2712312 6094732 108 0,00 0
20:10:01 2755240 3351552 54,88 42756 2740824 6094732 108 0,00 0
20:20:01 2732140 3374652 55,26 43296 2765860 6094732 108 0,00 0
20:30:01 2628996 3477796 56,95 43408 2795520 6094732 108 0,00 0
00:00:01 dentunusd file-sz inode-sz super-sz %super-sz dquot-sz %dquot-sz rtsig-sz %rtsig-sz
16:00:01 287680 3060 137318 0 0,00 0 0,00 0 0,00
16:10:01 287819 2550 136591 0 0,00 0 0,00 0 0,00
16:20:01 288394 2040 137309 0 0,00 0 0,00 0 0,00
16:30:01 288605 2550 136745 0 0,00 0 0,00 0 0,00
20:00:01 297651 3570 137214 0 0,00 0 0,00 0 0,00
20:10:01 298491 3570 137689 0 0,00 0 0,00 0 0,00
20:20:01 299280 2550 137937 0 0,00 0 0,00 0 0,00
20:30:01 300222 3060 138322 0 0,00 0 0,00 0 0,00
00:00:01 totsck tcpsck udpsck rawsck ip-frag
16:00:01 988 820 6 0 0
16:10:01 662 544 6 0 0
16:20:01 920 738 6 0 0
20:00:01 897 793 6 0 0
20:10:01 892 778 6 0 0
20:20:01 807 700 6 0 0
20:30:01 1064 954 6 0 0
00:00:01 runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
16:00:01 4 230 1,46 1,52 1,33
16:10:01 3 164 1,55 1,44 1,36
16:20:01 2 227 0,60 0,97 1,20
16:30:01 5 159 1,08 0,83 0,96
20:00:01 4 161 1,39 1,74 1,63
20:10:01 6 167 7,28 5,35 3,16
20:20:01 6 159 1,18 1,92 2,34
20:30:01 4 160 1,95 1,95 2,09
This is a load dump (my little script) output which is launched whenever load goes above 5 but also this time apache gone bad:
TOP 20 PROCESESS:
apache 15423 23.0 0.5 288004 31120 ? S 20:08 0:00 \_ /usr/sbin/httpd
apache 15419 7.8 0.5 289016 32156 ? R 20:08 0:00 \_ /usr/sbin/httpd
apache 15418 7.6 0.5 289040 32068 ? R 20:08 0:00 \_ /usr/sbin/httpd
apache 15396 7.1 0.5 289052 32124 ? S 20:07 0:01 \_ /usr/sbin/httpd
apache 15412 5.6 0.5 289040 32576 ? R 20:07 0:00 \_ /usr/sbin/httpd
apache 15395 5.5 0.5 289040 33068 ? S 20:07 0:01 \_ /usr/sbin/httpd
apache 15378 5.2 0.5 344212 33120 ? S 20:07 0:01 \_ /usr/sbin/httpd
apache 15375 5.2 0.5 289044 32684 ? S 20:07 0:01 \_ /usr/sbin/httpd
apache 15373 5.2 0.5 289052 33024 ? S 20:07 0:02 \_ /usr/sbin/httpd
apache 15391 5.1 0.5 344172 32232 ? S 20:07 0:01 \_ /usr/sbin/httpd
apache 14879 5.1 0.5 345208 34116 ? S 19:57 0:32 \_ /usr/sbin/httpd
apache 14837 5.1 0.6 349332 38084 ? S 19:57 0:32 \_ /usr/sbin/httpd
apache 15309 4.9 0.5 290076 33328 ? R 20:05 0:06 \_ /usr/sbin/httpd
apache 14881 4.9 0.5 349364 34900 ? S 19:57 0:31 \_ /usr/sbin/httpd
apache 14876 4.9 0.5 345208 34204 ? S 19:57 0:31 \_ /usr/sbin/httpd
apache 14878 4.8 0.5 345208 33432 ? S 19:57 0:30 \_ /usr/sbin/httpd
apache 14861 4.8 0.6 353460 38684 ? S 19:57 0:30 \_ /usr/sbin/httpd
apache 15145 4.7 0.5 345208 33920 ? S 20:02 0:14 \_ /usr/sbin/httpd
apache 14882 4.7 0.5 344184 33312 ? S 19:57 0:29 \_ /usr/sbin/httpd
apache 14880 4.7 0.6 349304 37768 ? R 19:57 0:30 \_ /usr/sbin/httpd
IOSTAT:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.89 2.11 0.10 1.82 3.14 15.71 19.58 0.01 4.00 0.66 0.13
sda1 0.87 0.00 0.01 0.00 0.88 0.00 186.85 0.00 6.15 2.80 0.00
sda2 0.02 2.11 0.09 1.82 2.25 15.71 18.75 0.01 3.99 0.65 0.13
sdb 19.27 0.04 0.96 0.37 46.99 28.17 113.34 0.01 7.31 1.99 0.26
sdb1 19.27 0.04 0.96 0.37 46.99 28.17 113.35 0.01 7.31 1.99 0.26
dm-0 0.00 0.00 0.11 3.93 2.25 15.71 8.90 0.15 37.62 0.31 0.13
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 8.00 0.00 6.13 0.23 0.00
NGINX:
Active connections: 743
server accepts handled requests
1716567 1716567 20129839
Reading: 0 Writing: 65 Waiting: 678
APACHE:
Parent Server Generation: 10
Server uptime: 16 hours 7 minutes 24 seconds
Total accesses: 946844 - Total Traffic: 8.2 GB
CPU Usage: u1459.93 s334.5 cu0 cs0 - 3.09% CPU load
16.3 requests/sec - 147.4 kB/second - 9.0 kB/request
38 requests currently being processed, 0 idle workers
WWWWWWWWWWWWWWCWWWWWWWWWWWW.W..W..W.W..W..W.WW.W...W............
................................................................
W...............................................................
................................................................
............................................
There is no pattern i could find when it happens. It happens at the time where there is almost no ppl browsing products but also happens when site load is medium or high. It looks very random but where is the problem?
There are no cronjobs running during these times (backup is being run late at night). Why this darn Apache stops responding?
During these apache problems server and system remains stable and nginx is working fine using backup apache server on other machine.
Apache is currently processing 16.3 requests/sec but other times it works fine with much more req/s.