Linux - Server: This forum is for the discussion of Linux Software used in a server related context.
10-01-2012, 06:14 PM
#1
Member
Registered: Mar 2005
Posts: 36
HELP: system load on server is very high
My system responds very slowly to Apache requests. As far as I can tell from top, it has something to do with high system CPU usage, but how can I analyze this further?
Here is the top output during a standard (small and simple) HTTP request. The same request completes within 2 s on another server, but on the problematic one it takes minutes:
Code:
top - 01:06:10 up 3 days, 9:54, 3 users, load average: 2.02, 1.89, 2.28
Tasks: 55 total, 2 running, 53 sleeping, 0 stopped, 0 zombie
Cpu(s): 13.6%us, 83.4%sy, 0.0%ni, 0.2%id, 2.5%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 61540k total, 58832k used, 2708k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 17632k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28536 www-data 20 0 46956 11m 5128 R 12.6 19.2 0:18.61 apache2
31953 root 20 0 2820 752 664 S 8.7 1.2 0:00.42 wget
9370 root 20 0 2536 692 476 R 6.4 1.1 2:32.23 top
6412 root 20 0 8920 800 304 S 4.3 1.3 0:15.25 sshd
31951 root 20 0 1664 492 436 S 2.9 0.8 0:00.20 sh
The server is a tiny embedded single-core ARM, but this shouldn't matter. On some other servers with identical hardware, the same request runs fine.
The distribution is Debian Squeeze.
I have a gut feeling that the high system load is somehow caused by Apache, but I can't substantiate it. The Apache error.log shows nothing...
How can I find the process causing this?
Edit: The best explanation for high %sy in top that I found is:
"having higher numbers here may indicate a problem with kernel configs, a driver issue, or any number of other things"
But this doesn't help...
Thanks
Achim
Last edited by AchimRS; 10-01-2012 at 07:44 PM.
|
|
|
|
10-02-2012, 04:02 AM
#2
Member
Registered: Dec 2006
Distribution: RHEL Debian
Posts: 41
Is the load average consistently high?
Can you post the output of some other performance tools (vmstat, iostat, free, etc.)?
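Since top only shows one combined %CPU figure per process, another way to see where the kernel time goes is to read the per-process counters from /proc directly. The following is only a minimal sketch and not something from the original posts: the function names parse_stat and by_stime are made up for illustration, and it assumes the usual Linux /proc/PID/stat layout, where field 14 is utime, field 15 is stime and field 19 is the nice value.

```shell
# parse_stat: read one /proc/PID/stat line on stdin and print
# "stime utime nice". Everything up to the last ") " is removed first,
# because the command name in parentheses may itself contain spaces.
parse_stat() {
    awk '{ sub(/^.*\) /, ""); print $13, $12, $17 }'
}

# by_stime [N]: list the N processes (default 5) with the most
# accumulated kernel-mode ticks, as "stime utime nice pid".
by_stime() {
    for f in /proc/[0-9]*/stat; do
        t=$(cat "$f" 2>/dev/null | parse_stat)
        [ -n "$t" ] || continue   # process exited in the meantime
        pid=${f#/proc/}
        printf '%s %s\n' "$t" "${pid%/stat}"
    done | sort -rn | head -n "${1:-5}"
}
```

Running by_stime twice, a few seconds apart during a slow request, and comparing the first column should show which process is collecting the system time. The counters are totals since each process started, so only the difference between two runs is meaningful.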
|
|
|
|
10-02-2012, 04:24 AM
#3
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 11,288
Quote:
Originally Posted by AchimRS
I have a gut feeling that the high system load is somehow caused by Apache, but I can't substantiate it. The Apache error.log shows nothing...
|
Unlikely. I'd be guessing a driver issue.
Very hard to track down - never looked at embedded. Have a look at /proc/interrupts for hints on what may be playing up.
vmstat might indicate abnormal context switches - also a clue that driver/interrupt is the problem.
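The raw totals in /proc/interrupts are counts since boot, so what matters for spotting a misbehaving driver is how fast each line grows. A rough sketch of that comparison follows; the function name irq_delta is made up for illustration, and it assumes the single-CPU layout where the count is in the second column and the name starts in the fourth.

```shell
# irq_delta BEFORE AFTER: compare two saved copies of /proc/interrupts
# and print "increase irq: name" for every interrupt whose count grew,
# largest increase first.
irq_delta() {
    awk '
        $1 !~ /^[0-9]+:$/ { next }            # skip the CPU0 header and Err: line
        FNR == NR { before[$1] = $2; next }   # first file: remember the counts
        {
            d = $2 - before[$1]
            if (d <= 0) next
            name = ""
            for (i = 4; i <= NF; i++) name = name " " $i
            printf "%d %s%s\n", d, $1, name
        }
    ' "$1" "$2" | sort -rn
}

# Possible usage (the file names are only examples):
#   cat /proc/interrupts > /tmp/irq.before; sleep 10
#   cat /proc/interrupts > /tmp/irq.after
#   irq_delta /tmp/irq.before /tmp/irq.after
```

Taking one pair of snapshots while the box is idle and another during a slow request makes it easier to see whether any interrupt source speeds up together with %sy.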
|
|
|
|
10-03-2012, 04:46 PM
#4
Member
Registered: Mar 2005
Posts: 36
Original Poster
Hi all,
The load is not consistently high; %sy alternates between 2 and 90, and it seems to depend somehow on the Apache process.
If Apache is restarted, %sy is very low for a while. After some hours of requests to Apache it climbs higher with each request, and the responses become slower and slower until the server is useless. A simple Apache restart then makes the system responsive again, so I guess it has something to do with PHP or Apache.
Thanks for the good hint about vmstat; here is some output:
The first output was taken after the Apache processes had been running for several days, during a single request that took about 100 s:
Code:
~>vmstat -a 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free inact active si so bi bo in cs us sy id wa
2 1 0 11348 20692 19116 0 0 0 0 34 48 21 15 64 0
2 1 0 10484 20904 19748 0 0 0 0 2066 4051 24 75 0 2
2 1 0 11600 19856 19788 0 0 0 0 1025 1929 54 46 0 1
3 1 0 11420 20952 18720 0 0 0 0 2125 4429 19 80 0 0
3 1 0 12336 19192 19212 0 0 0 0 2692 5784 12 89 0 0
2 0 0 11924 20196 18936 0 0 0 0 3155 6638 13 88 0 0
2 2 0 12464 19136 19136 0 0 0 0 3489 7050 16 84 0 0
3 0 0 11412 20532 19040 0 0 0 0 3010 5988 8 89 0 3
3 2 0 11696 19636 19608 0 0 0 0 3328 6629 16 77 7 2
4 1 0 11536 19728 19472 0 0 0 0 2401 5039 11 89 0 0
4 1 0 10804 19908 20040 0 0 0 0 2479 5168 15 85 0 0
4 1 0 9112 20804 20668 0 0 0 0 2563 5601 8 92 0 0
6 1 0 8200 21236 21164 0 0 0 0 2589 5649 11 90 0 0
5 0 0 8512 22340 19872 0 0 0 0 2501 5506 7 92 0 2
3 1 0 8180 21696 20968 0 0 0 0 2665 5737 23 72 2 3
3 2 0 7980 21304 21304 0 0 0 0 2407 5054 14 85 0 0
3 3 0 9792 21340 19500 0 0 0 0 2674 5802 11 85 0 4
1 1 0 9660 21008 19968 0 0 0 0 2640 5655 9 89 0 3
2 1 0 8476 21100 20956 0 0 0 0 2764 5816 7 90 0 3
3 1 0 7660 21448 21524 0 0 0 0 2524 5506 13 85 0 3
2 1 0 7924 21192 21404 0 0 0 0 2461 5371 11 90 0 0
5 1 0 9284 20644 20776 0 0 0 0 2321 4805 22 75 0 2
2 1 0 9860 20492 20424 0 0 0 0 2787 5648 14 86 0 0
5 1 0 10984 19928 19844 0 0 0 0 2929 5948 18 83 0 0
4 1 0 11184 19856 19832 0 0 0 0 2924 5830 15 82 0 3
2 2 0 11356 19748 19528 0 0 0 0 2618 5492 8 88 0 4
3 1 0 11780 19704 19424 0 0 0 0 3391 6886 5 89 0 6
2 2 0 11620 19524 19540 0 0 0 0 4273 8572 10 87 0 3
1 1 0 12184 19356 19096 0 0 0 0 3911 8299 8 89 0 4
3 1 0 10060 21092 19384 0 0 0 0 2725 5856 2 96 0 2
3 1 0 8344 22336 19932 0 0 0 0 2725 5785 6 91 0 4
4 2 0 7652 22336 20652 0 0 0 0 2683 6072 8 91 0 1
6 1 0 6668 22480 21480 0 0 0 0 2732 5345 8 91 0 0
6 1 0 5960 22420 22180 0 0 0 0 2480 5363 13 87 0 0
5 1 0 7332 21800 21500 0 0 0 0 2676 5764 11 89 0 0
4 2 0 6548 21980 21944 0 0 0 0 2595 5609 11 88 0 2
4 1 0 6792 21916 21708 0 0 0 0 2716 5954 4 93 0 3
4 1 0 6624 22180 21908 0 0 0 0 2782 6115 4 95 0 2
4 1 0 6060 22444 22188 0 0 0 0 2639 5541 13 87 0 0
3 1 0 7584 22016 21152 0 0 0 0 2531 5328 15 86 0 0
5 1 0 6756 21964 21820 0 0 0 0 2578 5334 14 86 0 0
5 1 0 6072 22336 22260 0 0 0 0 2457 5209 13 87 0 0
4 1 0 5136 22852 22732 0 0 0 0 2420 5178 15 85 0 0
4 1 0 10992 19920 19872 0 0 0 0 2258 4723 17 84 0 0
4 1 0 9828 20384 20392 0 0 0 0 1851 3851 32 67 0 1
4 1 0 11388 19488 19820 0 0 0 0 2316 4901 18 83 0 0
3 1 0 12276 19012 19440 0 0 0 0 2908 6043 16 81 0 3
4 1 0 13388 18392 18948 0 0 0 0 2870 5809 14 86 0 1
2 1 0 15400 17708 17748 0 0 0 0 3771 7527 11 89 0 0
2 1 0 14328 18548 18060 0 0 0 0 2673 5669 10 90 0 1
2 0 0 14692 18080 17944 0 0 0 0 2653 5407 18 82 0 0
The next output was taken immediately after an Apache restart, with exactly the same request running between lines 5 and 10 of the output. It took about 10 s, which is much better:
Code:
~> vmstat -a 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free inact active si so bi bo in cs us sy id wa
0 0 0 17648 18612 15128 0 0 0 0 39 58 21 15 64 0
1 0 0 16660 18612 15972 0 0 0 0 1434 2755 40 19 41 0
0 0 0 17600 18612 15136 0 0 0 0 745 1388 51 25 25 0
0 0 0 17608 18612 15136 0 0 0 0 460 900 8 9 84 0
1 0 0 11248 19736 20364 0 0 0 0 1679 3191 52 26 22 0
2 0 0 4304 20768 26204 0 0 0 0 1904 3622 57 35 8 0
2 0 0 2560 21572 27124 0 0 0 0 1656 3183 59 41 1 0
2 0 0 1708 23952 25532 0 0 0 0 525 923 74 27 0 0
2 0 0 3584 26312 21284 0 0 0 0 3812 7528 18 69 10 4
0 0 0 2236 26124 22756 0 0 0 0 814 1504 49 22 29 0
0 0 0 2248 26160 22792 0 0 0 0 837 1625 2 3 95 0
6 1 0 3264 25412 22460 0 0 0 0 2057 3989 23 30 35 11
Now a memory analysis during a running request, which should also cover what the "free" tool would show:
Code:
~>vmstat -s
61540 K total memory
57356 K used memory
21888 K active memory
25180 K inactive memory
4184 K free memory
0 K buffer memory
21096 K swap cache
0 K total swap
0 K used swap
0 K free swap
9706553 non-nice user cpu ticks
0 nice user cpu ticks
6785115 system cpu ticks
29369798 idle cpu ticks
122582 IO-wait cpu ticks
474 IRQ cpu ticks
40820 softirq cpu ticks
0 stolen cpu ticks
0 pages paged in
0 pages paged out
0 pages swapped in
0 pages swapped out
405451122 interrupts
801755817 CPU context switches
1348837902 boot time
1930240 forks
Running "vmstat -d" during a request returns only zeros.
I can't find "iostat" for my Debian distribution. It does not seem to be available, not even in sysstat, but hopefully it's covered by the output above.
/proc/interrupts looks OK; the imx-i2c count is due to a sensor that is polled regularly via I²C:
Code:
~> cat /proc/interrupts
CPU0
3: 305541991 - imx-i2c
4: 2 - imx-i2c
9: 11 - sdhci
24: 0 - imx-keypad
25: 0 - rtc-mx25
32: 2 - IMX-uart
33: 44406920 - mxc_nd
34: 1 - mxc-sdma
35: 0 - ehci_hcd:usb1
37: 0 - fsl-usb2-udc
40: 2 - IMX-uart
45: 71 - IMX-uart
54: 56444439 - i.MX Timer Tick
57: 1060771 - fec
164: 0 - ESDHCI card 0 detect
168: 1 - phy_interrupt
Err: 0
Can anybody see the reason for the problem in the output above?
What is also suspicious: I have several of these systems running with the same hardware and the same processes, hopefully installed identically (I can never be 100% sure, because it is done manually), but only this one is so slow and shows such a high %sy load...
Thanks a lot
Achim
|
|
|
|
10-04-2012, 08:13 AM
#6
Member
Registered: Mar 2005
Posts: 36
Original Poster
In the meantime I had the idea of comparing a healthy system with the slow one, since they really have the same hardware and run the same software; only the machine differs. Here is the output from the healthy system answering exactly the same request within 6 s (lines 4 to 6 of the output, marked in red in the original post):
Code:
~> vmstat -a 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free inact active si so bi bo in cs us sy id wa
1 0 0 2168 24504 22700 0 0 0 0 2 1 21 14 66 0
0 0 0 2504 24508 22428 0 0 0 0 1274 2526 14 26 61 0
1 0 0 2360 24508 22612 0 0 0 0 301 552 9 2 89 0
1 0 0 1596 23896 23972 0 0 0 0 2673 5244 52 45 4 0
2 0 0 2060 24520 22736 0 0 0 0 1813 3464 50 49 1 0
3 0 0 5132 22968 21328 0 0 0 0 473 794 67 33 0 0
1 0 0 4232 23032 22068 0 0 0 0 1284 2399 17 41 42 0
0 0 0 7412 21340 20764 0 0 0 0 1761 3439 23 47 30 0
0 0 0 7404 21356 20764 0 0 0 0 3494 6750 2 21 78 0
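To compare runs like the ones above by number rather than by eye, the context-switch and %sy columns can be averaged. This is only a sketch: the function name vmstat_avg is made up, and it assumes the column order shown in this thread (cs in column 12, sy in column 14). The first sample is skipped because vmstat reports averages since boot there.

```shell
# vmstat_avg: read "vmstat 2" or "vmstat -a 2" output on stdin and
# print the average context switches and %sy over all samples except
# the first one.
vmstat_avg() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 16 {     # data rows only, no header lines
            n++
            if (n == 1) next              # since-boot averages, not a live sample
            cs += $12
            sy += $14
        }
        END {
            if (n > 1)
                printf "samples=%d avg_cs=%.0f avg_sy=%.0f\n", n - 1, cs / (n - 1), sy / (n - 1)
        }
    '
}

# Possible usage: 30 samples at 2 s intervals during a request
#   vmstat -a 2 31 | vmstat_avg
```

Running the same command on the slow and the healthy machine during the identical request gives two lines that can be compared directly.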
I was also able to install iostat, but it gave no result. It seems that the kernel does not maintain statistics for the flash devices, so "iostat -kdx" shows nothing and even "iostat -kdx ALL" shows only zeros.
sar ran successfully, but I don't see the high load there. The output below was taken during a slow access (the affected times were colored red in the original post):
Code:
~> sar -q 2
Linux 2.6.31 (MyServer-1) 10/04/12 _armv5tejl_ (1 CPU)
15:04:42 runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
15:04:44 4 66 1.66 1.45 1.28
15:04:46 1 64 1.69 1.46 1.28
15:04:48 3 64 1.69 1.46 1.28
15:04:50 3 64 1.69 1.46 1.28
15:04:52 4 64 1.87 1.50 1.29
15:04:54 2 66 1.87 1.50 1.29
15:04:56 6 64 2.28 1.60 1.32
15:04:58 3 64 2.28 1.60 1.32
15:05:00 2 64 2.28 1.60 1.32
15:05:02 5 66 2.58 1.67 1.35
15:05:04 6 70 2.58 1.67 1.35
15:05:06 3 64 2.61 1.69 1.36
15:05:08 3 64 2.61 1.69 1.36
15:05:10 2 64 2.61 1.69 1.36
15:05:12 1 64 2.73 1.73 1.37
15:05:14 3 63 2.73 1.73 1.37
15:05:16 2 64 2.75 1.75 1.38
15:05:18 0 62 2.75 1.75 1.38
15:05:20 1 64 2.75 1.75 1.38
15:05:22 1 64 2.69 1.75 1.38
15:05:24 3 63 2.69 1.75 1.38
15:05:26 1 64 2.63 1.76 1.39
15:05:28 2 64 2.63 1.76 1.39
15:05:30 1 64 2.63 1.76 1.39
15:05:32 1 63 2.42 1.73 1.38
15:05:34 1 64 2.42 1.73 1.38
15:05:36 3 64 2.47 1.75 1.39
15:05:38 0 63 2.47 1.75 1.39
15:05:40 1 64 2.47 1.75 1.39
15:05:42 0 63 2.35 1.74 1.38
15:05:44 1 63 2.35 1.74 1.38
15:05:46 1 64 2.24 1.72 1.38
15:05:48 3 64 2.24 1.72 1.38
15:05:50 1 62 2.24 1.72 1.38
15:05:52 2 65 2.14 1.71 1.38
15:05:54 3 66 2.14 1.71 1.38
15:05:56 2 67 2.14 1.71 1.38
15:05:58 2 64 2.13 1.72 1.38
15:06:00 2 64 2.13 1.72 1.38
15:06:02 4 66 2.28 1.75 1.40
15:06:04 1 66 2.28 1.75 1.40
15:06:06 1 66 2.18 1.74 1.39
So I'm still completely stuck...
Last edited by AchimRS; 10-04-2012 at 08:20 AM.
|
|
|
|
10-04-2012, 08:17 AM
#7
Member
Registered: Dec 2006
Distribution: RHEL Debian
Posts: 41
I'm not sure I would recommend running Apache with PHP on 64 MB of RAM, certainly not with no swap available!
|
|
|
|
10-04-2012, 09:03 AM
#8
Member
Registered: Mar 2005
Posts: 36
Original Poster
Hmmm, it runs well on some other systems with 64 MB (or at least much faster). Looking at vmstat, I see plenty of inactive and free memory...
Usually only one user is logged in. MaxClients is reduced from the default of 150 to just 4, and StartServers is reduced to 2. Apache is what I know best, which is why I started with it on this small embedded machine too (400 MHz, 64 MB RAM). You are right, maybe it is time to switch to a more lightweight server like lighttpd.
But I have a gut feeling that there is another problem underneath, because it runs well on the other systems. Maybe the lower resource needs of lighttpd would only delay the problem by a few days, and in the end I'd be in the same state as now :-(
Edit: It seems the problem is solved. By accident I saw three processes in top running with a nice value of -6.
These self-made processes run regularly and often need a lot of CPU. After changing them back to -1, so that they are still prioritized a little, everything works fine. Now %sy is down to about 10 and the system is much more responsive again.
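For anyone checking their own box for the same thing, processes with a raised priority can be listed by filtering the NI column of ps. A minimal sketch, where the function name negative_nice is made up for illustration:

```shell
# negative_nice: read "ps -eo pid,ni,comm" output on stdin and print
# "pid nice command" for every process with a negative nice value.
# The pattern match (rather than a numeric comparison) also skips rows
# where ps prints "-" instead of a number.
negative_nice() {
    awk 'NR > 1 && $2 ~ /^-[0-9]+$/ { print $1, $2, $3 }'
}

# Possible usage:
#   ps -eo pid,ni,comm | negative_nice
```

The nice value of a running process can then be changed with renice; the exact option syntax differs between versions, so it is worth checking the man page on the target system first.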
Last edited by AchimRS; 10-06-2012 at 05:34 PM.
Reason: Solved
|
|
|
|
All times are GMT -5.