LinuxQuestions.org - [SOLVED] HELP: system load on server is very high


HELP: system load on server is very high

My system responds very slowly to Apache requests. As far as I can tell from top, it has something to do with high system CPU usage, but how do I analyze this further?

Here is the top output during a standard (small and simple) HTTP request. On another server this completes within 2 s, but on the problematic ones it takes minutes:

Code:

top - 01:06:10 up 3 days,  9:54,  3 users,  load average: 2.02, 1.89, 2.28
Tasks:  55 total,  2 running,  53 sleeping,  0 stopped,  0 zombie
Cpu(s): 13.6%us, 83.4%sy,  0.0%ni,  0.2%id,  2.5%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:    61540k total,    58832k used,    2708k free,        0k buffers
Swap:        0k total,        0k used,        0k free,    17632k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
28536 www-data  20  0 46956  11m 5128 R 12.6 19.2  0:18.61 apache2
31953 root      20  0  2820  752  664 S  8.7  1.2  0:00.42 wget
 9370 root      20  0  2536  692  476 R  6.4  1.1  2:32.23 top
 6412 root      20  0  8920  800  304 S  4.3  1.3  0:15.25 sshd
31951 root      20  0  1664  492  436 S  2.9  0.8  0:00.20 sh

The server is a tiny embedded single-core ARM, but this shouldn't matter: on some other servers with identical hardware, the same request runs fine.
The distribution is Debian Squeeze.

I have a gut feeling that the high system load is somehow caused by Apache, but I can't back this up. The Apache error.log shows nothing...
How do I find the process causing this?

Edit: The best explanation I found for a high %sy in top is:
"having higher numbers here may indicate a problem with kernel configs, a driver issue, or any number of other things"
But this doesn't help...
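One way to see which process is accumulating the kernel time behind a high %sy is to read the stime counter (field 15 of /proc/<pid>/stat, in clock ticks) for every process twice and compare. This is a minimal sketch, assuming a Python 3.6+ interpreter is available on the box (not a given on a small embedded Debian install); the function names are made up for illustration and are not part of any tool.

```python
import os
import time


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, utime, stime, nice).

    comm is wrapped in parentheses and may itself contain spaces, so the
    line is split at the last ')' instead of on whitespace alone.
    """
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    fields = tail.split()
    # fields[0] is field 3 (state) in proc(5), so field N is fields[N - 3]:
    # utime = field 14, stime = field 15 (clock ticks), nice = field 19.
    utime, stime, nice = int(fields[11]), int(fields[12]), int(fields[16])
    return int(pid_str), comm, utime, stime, nice


def snapshot():
    """Return {pid: (comm, stime, nice)} for every process we can read."""
    procs = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                pid, comm, _, stime, nice = parse_stat(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        procs[pid] = (comm, stime, nice)
    return procs


def top_system_time(interval=5.0, limit=5):
    """Processes that gained the most system (kernel) ticks during `interval`."""
    before = snapshot()
    time.sleep(interval)
    after = snapshot()
    deltas = [
        (stime - before[pid][1], pid, comm, nice)
        for pid, (comm, stime, nice) in after.items()
        if pid in before
    ]
    deltas.sort(reverse=True)
    return deltas[:limit]
```

Calling top_system_time(interval=5) while a slow request is in flight should list the heaviest kernel-time consumers first, together with their nice values. Accounting is sampled at the timer tick, so on a busy single-core box the numbers are approximate, and kernel work done for interrupts can be charged to whichever process happened to be running.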

Thanks
Achim

Is the load average consistently high?

Can you post the output of some other performance tools, e.g. vmstat, iostat, free?

Quote:

Originally Posted by AchimRS (Post 4794390)

I have the gut-feeling that the high system-load is somehow caused by Apache, but can't strengthen this thesis. The apache error.log shows nothing...

Unlikely. My guess would be a driver issue.
Very hard to track down, and I have never looked at embedded systems. Have a look at /proc/interrupts for hints on what may be playing up.
vmstat might show an abnormal number of context switches, which would also point to a driver/interrupt problem.
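The counts in /proc/interrupts are totals since boot, so what matters is how fast each line grows while a request is slow. A minimal sketch, assuming Python 3 on the box (the helper names are hypothetical), samples the file twice and reports interrupts per second for each IRQ, busiest first:

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq: (count, description)}.

    Counts are summed over all CPU columns; the description is whatever
    follows them (interrupt controller and device name).
    """
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        irq, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for tok in fields[:ncpu]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        table[irq.strip()] = (sum(counts), " ".join(fields[len(counts):]))
    return table


def interrupt_rates(interval=5.0, path="/proc/interrupts"):
    """Interrupts per second for each IRQ over `interval` seconds, busiest first."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    rates = []
    for irq, (count, desc) in after.items():
        previous = before.get(irq, (count, desc))[0]
        rates.append(((count - previous) / interval, irq, desc))
    rates.sort(reverse=True)
    return rates
```

Running it on a healthy unit and on the slow one during the same request makes any IRQ whose rate differs sharply stand out. vmstat's in and cs columns give the system-wide totals for interrupts and context switches per second.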

Hi all,
the load is not consistently high; it alternates between 2%sy and 90%sy, and it seems to depend somehow on the Apache process.
After a restart, %sy is very low for a while. After some hours of requests to Apache it rises more and more with each request, so the response becomes much slower until the system is useless. A simple Apache restart then makes the system responsive again... so I guess it has something to do with PHP or Apache.

Thanks for the good hint about vmstat; here is some output:

The first output was taken during a single request that took about 100 s, after the Apache processes had been running for several days:

Code:

~>vmstat -a 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b  swpd  free  inact active  si  so    bi    bo  in  cs us sy id wa
 2  1      0  11348  20692  19116    0    0    0    0  34  48 21 15 64  0
 2  1      0  10484  20904  19748    0    0    0    0 2066 4051 24 75  0  2
 2  1      0  11600  19856  19788    0    0    0    0 1025 1929 54 46  0  1
 3  1      0  11420  20952  18720    0    0    0    0 2125 4429 19 80  0  0
 3  1      0  12336  19192  19212    0    0    0    0 2692 5784 12 89  0  0
 2  0      0  11924  20196  18936    0    0    0    0 3155 6638 13 88  0  0
 2  2      0  12464  19136  19136    0    0    0    0 3489 7050 16 84  0  0
 3  0      0  11412  20532  19040    0    0    0    0 3010 5988  8 89  0  3
 3  2      0  11696  19636  19608    0    0    0    0 3328 6629 16 77  7  2
 4  1      0  11536  19728  19472    0    0    0    0 2401 5039 11 89  0  0
 4  1      0  10804  19908  20040    0    0    0    0 2479 5168 15 85  0  0
 4  1      0  9112  20804  20668    0    0    0    0 2563 5601  8 92  0  0
 6  1      0  8200  21236  21164    0    0    0    0 2589 5649 11 90  0  0
 5  0      0  8512  22340  19872    0    0    0    0 2501 5506  7 92  0  2
 3  1      0  8180  21696  20968    0    0    0    0 2665 5737 23 72  2  3
 3  2      0  7980  21304  21304    0    0    0    0 2407 5054 14 85  0  0
 3  3      0  9792  21340  19500    0    0    0    0 2674 5802 11 85  0  4
 1  1      0  9660  21008  19968    0    0    0    0 2640 5655  9 89  0  3
 2  1      0  8476  21100  20956    0    0    0    0 2764 5816  7 90  0  3
 3  1      0  7660  21448  21524    0    0    0    0 2524 5506 13 85  0  3
 2  1      0  7924  21192  21404    0    0    0    0 2461 5371 11 90  0  0
 5  1      0  9284  20644  20776    0    0    0    0 2321 4805 22 75  0  2
 2  1      0  9860  20492  20424    0    0    0    0 2787 5648 14 86  0  0
 5  1      0  10984  19928  19844    0    0    0    0 2929 5948 18 83  0  0
 4  1      0  11184  19856  19832    0    0    0    0 2924 5830 15 82  0  3
 2  2      0  11356  19748  19528    0    0    0    0 2618 5492  8 88  0  4
 3  1      0  11780  19704  19424    0    0    0    0 3391 6886  5 89  0  6
 2  2      0  11620  19524  19540    0    0    0    0 4273 8572 10 87  0  3
 1  1      0  12184  19356  19096    0    0    0    0 3911 8299  8 89  0  4
 3  1      0  10060  21092  19384    0    0    0    0 2725 5856  2 96  0  2
 3  1      0  8344  22336  19932    0    0    0    0 2725 5785  6 91  0  4
 4  2      0  7652  22336  20652    0    0    0    0 2683 6072  8 91  0  1
 6  1      0  6668  22480  21480    0    0    0    0 2732 5345  8 91  0  0
 6  1      0  5960  22420  22180    0    0    0    0 2480 5363 13 87  0  0
 5  1      0  7332  21800  21500    0    0    0    0 2676 5764 11 89  0  0
 4  2      0  6548  21980  21944    0    0    0    0 2595 5609 11 88  0  2
 4  1      0  6792  21916  21708    0    0    0    0 2716 5954  4 93  0  3
 4  1      0  6624  22180  21908    0    0    0    0 2782 6115  4 95  0  2
 4  1      0  6060  22444  22188    0    0    0    0 2639 5541 13 87  0  0
 3  1      0  7584  22016  21152    0    0    0    0 2531 5328 15 86  0  0
 5  1      0  6756  21964  21820    0    0    0    0 2578 5334 14 86  0  0
 5  1      0  6072  22336  22260    0    0    0    0 2457 5209 13 87  0  0
 4  1      0  5136  22852  22732    0    0    0    0 2420 5178 15 85  0  0
 4  1      0  10992  19920  19872    0    0    0    0 2258 4723 17 84  0  0
 4  1      0  9828  20384  20392    0    0    0    0 1851 3851 32 67  0  1
 4  1      0  11388  19488  19820    0    0    0    0 2316 4901 18 83  0  0
 3  1      0  12276  19012  19440    0    0    0    0 2908 6043 16 81  0  3
 4  1      0  13388  18392  18948    0    0    0    0 2870 5809 14 86  0  1
 2  1      0  15400  17708  17748    0    0    0    0 3771 7527 11 89  0  0
 2  1      0  14328  18548  18060    0    0    0    0 2673 5669 10 90  0  1
 2  0      0  14692  18080  17944    0    0    0    0 2653 5407 18 82  0  0

The next output was taken immediately after restarting Apache, with exactly the same request running between lines 5 and 10. It took about 10 s, which is much better:

Code:

~> vmstat -a 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b  swpd  free  inact active  si  so    bi    bo  in  cs us sy id wa
 0  0      0  17648  18612  15128    0    0    0    0  39  58 21 15 64  0
 1  0      0  16660  18612  15972    0    0    0    0 1434 2755 40 19 41  0
 0  0      0  17600  18612  15136    0    0    0    0  745 1388 51 25 25  0
 0  0      0  17608  18612  15136    0    0    0    0  460  900  8  9 84  0
 1  0      0  11248  19736  20364    0    0    0    0 1679 3191 52 26 22  0
 2  0      0  4304  20768  26204    0    0    0    0 1904 3622 57 35  8  0
 2  0      0  2560  21572  27124    0    0    0    0 1656 3183 59 41  1  0
 2  0      0  1708  23952  25532    0    0    0    0  525  923 74 27  0  0
 2  0      0  3584  26312  21284    0    0    0    0 3812 7528 18 69 10  4
 0  0      0  2236  26124  22756    0    0    0    0  814 1504 49 22 29  0
 0  0      0  2248  26160  22792    0    0    0    0  837 1625  2  3 95  0
 6  1      0  3264  25412  22460    0    0    0    0 2057 3989 23 30 35 11
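Averaging the columns makes runs like these easier to compare. The sketch below is a hypothetical helper (not part of procps) that averages in, cs, us and sy over the sampled rows of vmstat output. It skips the first data row, because vmstat reports averages since boot there rather than values for the interval.

```python
def summarize_vmstat(text, columns=("in", "cs", "us", "sy")):
    """Average selected columns over the sampled rows of vmstat output.

    The first data row is skipped: vmstat prints averages since boot there,
    not values for the sampling interval.
    """
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":  # column-name row: r b swpd free ...
            header = fields
        elif header and fields[0].isdigit():
            rows.append(dict(zip(header, (int(x) for x in fields))))
    samples = rows[1:]
    if not samples:
        return {}
    return {c: sum(r[c] for r in samples) / len(samples) for c in columns}
```

Applied to the two runs above, this gives an sy average of about 85 for the slow run and about 28 for the run right after the restart, although the second run also contains idle samples before and after the request.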

Now a memory analysis during a running request, which should also cover what the "free" tool would show:

Code:

~>vmstat -s
        61540 K total memory
        57356 K used memory
        21888 K active memory
        25180 K inactive memory
        4184 K free memory
            0 K buffer memory
        21096 K swap cache
            0 K total swap
            0 K used swap
            0 K free swap
      9706553 non-nice user cpu ticks
            0 nice user cpu ticks
      6785115 system cpu ticks
    29369798 idle cpu ticks
      122582 IO-wait cpu ticks
          474 IRQ cpu ticks
        40820 softirq cpu ticks
            0 stolen cpu ticks
            0 pages paged in
            0 pages paged out
            0 pages swapped in
            0 pages swapped out
    405451122 interrupts
    801755817 CPU context switches
  1348837902 boot time
      1930240 forks
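The cpu tick counters are cumulative since boot, which is also what the first row of every vmstat run reports. Turning them into percentages, under the assumption that vmstat folds nice into us and irq/softirq into sy, roughly reproduces the "21 15 64 0" first row of both vmstat runs on this machine. That long-run average of about 15% system time hides the bursts of 80 to 90% seen while a slow request is running.

```python
# Cumulative CPU tick counters copied from the vmstat -s output above.
ticks = {
    "user": 9706553,
    "nice": 0,
    "system": 6785115,
    "idle": 29369798,
    "iowait": 122582,
    "irq": 474,
    "softirq": 40820,
    "steal": 0,
}


def since_boot_split(t):
    """Since-boot percentages in vmstat's us/sy/id/wa buckets.

    Assumption: us = user + nice and sy = system + irq + softirq, which is
    how procps vmstat is believed to group them.
    """
    total = sum(t.values())
    return {
        "us": 100.0 * (t["user"] + t["nice"]) / total,
        "sy": 100.0 * (t["system"] + t["irq"] + t["softirq"]) / total,
        "id": 100.0 * t["idle"] / total,
        "wa": 100.0 * t["iowait"] / total,
    }


print({k: round(v) for k, v in since_boot_split(ticks).items()})
# → {'us': 21, 'sy': 15, 'id': 64, 'wa': 0}
```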

Running "vmstat -d" during a request returns only zeros, and I can't find "iostat" for my Debian distribution (it doesn't seem to be available, not even in sysstat), but hopefully the output above covers it.

/proc/interrupts looks OK; the imx-i2c count comes from a sensor that is polled regularly via I²C:

Code:

~> cat /proc/interrupts
          CPU0
  3:  305541991          -  imx-i2c
  4:          2          -  imx-i2c
  9:        11          -  sdhci
 24:          0          -  imx-keypad
 25:          0          -  rtc-mx25
 32:          2          -  IMX-uart
 33:  44406920          -  mxc_nd
 34:          1          -  mxc-sdma
 35:          0          -  ehci_hcd:usb1
 37:          0          -  fsl-usb2-udc
 40:          2          -  IMX-uart
 45:        71          -  IMX-uart
 54:  56444439          -  i.MX Timer Tick
 57:    1060771          -  fec
164:          0          -  ESDHCI card 0 detect
168:          1          -  phy_interrupt
Err:          0
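Because these counters are totals since boot, expressing them relative to the timer tick puts them in proportion without knowing the uptime. The timer rate (HZ) of this kernel is not shown in the thread, so this sketch only gives relative numbers; the dictionary keys are labels chosen for illustration.

```python
# Counts since boot, copied from the /proc/interrupts snapshot above.
counts = {
    "3 imx-i2c": 305541991,
    "33 mxc_nd": 44406920,
    "54 i.MX Timer Tick": 56444439,
    "57 fec": 1060771,
}


def per_timer_tick(counts, tick_key="54 i.MX Timer Tick"):
    """Express each interrupt count as a multiple of the timer-tick count."""
    tick = counts[tick_key]
    return {name: n / tick for name, n in counts.items()}


for name, ratio in sorted(per_timer_tick(counts).items(), key=lambda kv: -kv[1]):
    print(f"{name:<20} {ratio:6.2f} per timer tick")
```

By this measure imx-i2c fires roughly 5.4 times per timer tick. Whether that is normal for the polled sensor depends on the driver and the polling interval, which the thread doesn't state; the same ratio taken on a healthy unit would say more than the absolute count.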

Can anybody see the reason for the problem in the output above?
What is also suspicious: I have several of these systems running, with the same hardware and the same processes, hopefully installed identically (I'm never 100% sure, because it is done manually), but this one is so slow, with such a high %sy load...

Thanks a lot
Achim

As per the top output, the load is not high but %sy utilization is very high. If it is possible for you to install packages on that box, get a sar report; that will give you all the essential data to track down the problem.

# iostat -kdx (disk read/write performance; check the service time and utilization)

# sar -q (gives you the load average over the intervals set in cron for the sar logs)

More:
http://www.linuxquestions.org/questi...ck-4175427965/

You may also need to fine-tune Apache:

http://httpd.apache.org/docs/2.2/misc/perf-tuning.html

http://www.supportsages.com/blog/201...r-performance/

Meanwhile I had the idea of comparing an OK system with the NOK one, since they really have the same hardware and run the same software, just on a different machine. Here is the output from the OK system, which answered exactly the same request within 6 s (lines 4 to 6, marked in red):

Code:

~> vmstat -a 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b  swpd  free  inact active  si  so    bi    bo  in  cs us sy id wa
 1  0      0  2168  24504  22700    0    0    0    0    2    1 21 14 66  0
 0  0      0  2504  24508  22428    0    0    0    0 1274 2526 14 26 61  0
 1  0      0  2360  24508  22612    0    0    0    0  301  552  9  2 89  0
 1  0      0  1596  23896  23972    0    0    0    0 2673 5244 52 45  4  0
 2  0      0  2060  24520  22736    0    0    0    0 1813 3464 50 49  1  0
 3  0      0  5132  22968  21328    0    0    0    0  473  794 67 33  0  0
 1  0      0  4232  23032  22068    0    0    0    0 1284 2399 17 41 42  0
 0  0      0  7412  21340  20764    0    0    0    0 1761 3439 23 47 30  0
 0  0      0  7404  21356  20764    0    0    0    0 3494 6750  2 21 78  0

I was also able to install iostat, but without result. It seems the kernel does not maintain statistics for the flash devices, so "iostat -kdx" shows nothing, and even "iostat -kdx ALL" shows only zeros.

sar ran successfully, but I don't see the high load there. In the output below, the times marked in red are during a slow access:

Code:

~> sar -q 2
Linux 2.6.31 (MyServer-1)        10/04/12        _armv5tejl_    (1 CPU)

15:04:42      runq-sz  plist-sz  ldavg-1  ldavg-5  ldavg-15
15:04:44            4        66      1.66      1.45      1.28
15:04:46            1        64      1.69      1.46      1.28
15:04:48            3        64      1.69      1.46      1.28
15:04:50            3        64      1.69      1.46      1.28
15:04:52            4        64      1.87      1.50      1.29
15:04:54            2        66      1.87      1.50      1.29
15:04:56            6        64      2.28      1.60      1.32
15:04:58            3        64      2.28      1.60      1.32
15:05:00            2        64      2.28      1.60      1.32
15:05:02            5        66      2.58      1.67      1.35
15:05:04            6        70      2.58      1.67      1.35
15:05:06            3        64      2.61      1.69      1.36
15:05:08            3        64      2.61      1.69      1.36
15:05:10            2        64      2.61      1.69      1.36
15:05:12            1        64      2.73      1.73      1.37
15:05:14            3        63      2.73      1.73      1.37
15:05:16            2        64      2.75      1.75      1.38
15:05:18            0        62      2.75      1.75      1.38
15:05:20            1        64      2.75      1.75      1.38
15:05:22            1        64      2.69      1.75      1.38
15:05:24            3        63      2.69      1.75      1.38
15:05:26            1        64      2.63      1.76      1.39
15:05:28            2        64      2.63      1.76      1.39
15:05:30            1        64      2.63      1.76      1.39
15:05:32            1        63      2.42      1.73      1.38
15:05:34            1        64      2.42      1.73      1.38
15:05:36            3        64      2.47      1.75      1.39
15:05:38            0        63      2.47      1.75      1.39
15:05:40            1        64      2.47      1.75      1.39
15:05:42            0        63      2.35      1.74      1.38
15:05:44            1        63      2.35      1.74      1.38
15:05:46            1        64      2.24      1.72      1.38
15:05:48            3        64      2.24      1.72      1.38
15:05:50            1        62      2.24      1.72      1.38
15:05:52            2        65      2.14      1.71      1.38
15:05:54            3        66      2.14      1.71      1.38
15:05:56            2        67      2.14      1.71      1.38
15:05:58            2        64      2.13      1.72      1.38
15:06:00            2        64      2.13      1.72      1.38
15:06:02            4        66      2.28      1.75      1.40
15:06:04            1        66      2.28      1.75      1.40
15:06:06            1        66      2.18      1.74      1.39

So I'm still completely stuck...
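Part of the confusion is that sar -q reports load, not CPU time. On Linux the load average counts tasks that are runnable or in uninterruptible sleep, and the kernel folds that count into exponentially damped averages roughly every 5 seconds, which is why ldavg-1 only changes every two or three of the 2-second samples in the sar output. A single task burning 90% of the CPU in kernel mode adds only about 1 to the load. The sketch below is a floating-point approximation of the kernel's fixed-point update; the function names and the constant "3 active tasks" scenario are illustrative assumptions.

```python
import math


def update_load(load, active_tasks, window_s=60.0, tick_s=5.0):
    """One load-average update: decay the old value toward the current
    number of runnable + uninterruptible tasks."""
    decay = math.exp(-tick_s / window_s)
    return load * decay + active_tasks * (1.0 - decay)


def simulate(start, active_tasks, seconds, window_s=60.0, tick_s=5.0):
    """Apply update_load every tick_s seconds for `seconds` seconds."""
    load = start
    for _ in range(int(seconds // tick_s)):
        load = update_load(load, active_tasks, window_s, tick_s)
    return load


# Starting from the 1.66 seen at 15:04:44 and holding 3 active tasks for a
# minute moves the 1-minute average only about 63% of the way toward 3.
print(round(simulate(1.66, 3, 60), 2))  # → 2.51
```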

I'm not sure I would recommend running Apache with PHP on 64 MB of RAM, and certainly not with no swap available!

Hmm, it runs well (or at least much faster) on some other systems with 64 MB. Looking at vmstat, I see plenty of inactive and free memory...

Usually only one user is logged in. MaxClients is reduced from the default of 150 to just 4, and StartServers is reduced to 2. Apache is what I know best, which is why I also started with it on this small embedded machine (400 MHz, 64 MB RAM). You are right, maybe it is time to switch to a more lightweight server like lighttpd.
But I have a gut feeling that there is another problem underneath, because it runs well on other systems. With lighttpd's lower resource needs, the problem might only be delayed by a few days and end up in the same state as now :-(
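A rough upper bound on what Apache alone could occupy follows from the numbers in this thread: MaxClients 4, the 11m RES that top showed for one apache2 child, and 61540 KB of total RAM. RES counts shared pages (libraries, the PHP module) once per child, so the real figure is lower; this is back-of-the-envelope arithmetic under those assumptions, not a measurement.

```python
# Figures from this thread: MaxClients 4, RES of one apache2 child in top
# ("11m", i.e. about 11 MiB) and total RAM from top/vmstat.
max_clients = 4
res_per_child_kb = 11 * 1024
total_ram_kb = 61540

# Upper bound: RES counts shared pages (libc, the PHP module, ...) again
# for every child, so the real combined footprint is smaller.
worst_case_kb = max_clients * res_per_child_kb
share = 100 * worst_case_kb / total_ram_kb
print(worst_case_kb, round(share))  # → 45056 73
```

With no swap, whatever is left has to cover the kernel, the page cache and every other process, which is why MaxClients matters so much on a 64 MB box.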

Edit: It seems the problem is solved. By accident I saw three processes in top running with nice -6 (NI = -6).
These self-written processes run regularly and often need a lot of CPU. After changing them back to -1, so that they are still prioritized slightly higher, everything works fine. %sy is now down to about 10% and the system is much more responsive again.
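Since the culprit turned out to be processes running at a negative nice value, a quick scan for them is worth keeping around. A minimal sketch, assuming Python 3.6 or later on the box (os.getpriority needs 3.3+, the f-string 3.6+); the function name is made up for illustration:

```python
import os


def negative_nice_processes():
    """Return sorted [(nice, pid, name)] for processes with a nice value below 0.

    Negative nice values give a task a larger CPU share than normal (nice 0)
    tasks such as default Apache workers.
    """
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        try:
            nice = os.getpriority(os.PRIO_PROCESS, pid)
            with open(f"/proc/{pid}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        if nice < 0:
            name = stat[stat.find("(") + 1:stat.rfind(")")]
            found.append((nice, pid, name))
    return sorted(found)


if __name__ == "__main__":
    for nice, pid, name in negative_nice_processes():
        print(f"{pid:>6}  nice={nice:>3}  {name}")
```

os.setpriority(os.PRIO_PROCESS, pid, -1) would then apply the same change as the fix described above; in top, the same information is in the NI column.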