high load average, low CPU usage!
Hi,
Recently one of my CentOS 4.4 servers got slow. It has a high load average but low CPU usage, and I didn't find any specific process eating CPU or memory. Is there any way to find out what's going on with this server?

top - 11:43:30 up 399 days, 16:37, 4 users, load average: 7.25, 7.26, 7.27
Tasks: 116 total, 2 running, 113 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.2% us, 0.7% sy, 0.0% ni, 97.0% id, 2.2% wa, 0.0% hi, 0.0% si
Mem: 1034728k total, 993380k used, 41348k free, 36172k buffers
Swap: 1204864k total, 696488k used, 508376k free, 261064k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7389 root      17   0  2952  960  744 R  0.7  0.1   0:01.09 top
    1 root      15   0  2792  440  408 S  0.0  0.0  11:56.27 init
    2 root      RT   0     0    0    0 S  0.0  0.0   2:54.24 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   1:05.87 ksoftirqd/0
    4 root      RT   0     0    0    0 S  0.0  0.0  83:42.93 migration/1
    5 root      34  19     0    0    0 S  0.0  0.0   1:07.21 ksoftirqd/1
    6 root       5 -10     0    0    0 S  0.0  0.0   0:03.50 events/0
    7 root       5 -10     0    0    0 S  0.0  0.0   0:01.87 events/1
    8 root       5 -10     0    0    0 S  0.0  0.0   0:00.01 khelper
    9 root      15 -10     0    0    0 S  0.0  0.0   0:00.00 kacpid

[root@localhost ~]# vmstat 1 10
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy  id wa
 0  0 696488  40636  36332 261424    0    0     0     1    0     0  4  1  93  2
 0  0 696488  40636  36336 261420    0    0     0    12 1021  1221  0  0  99  1
 0  0 696488  40780  36336 261420    0    0     0     0 1027  1234  0  1 100  0
 0  1 696488  40780  36344 261412    0    0     0    24 1171  1304  1  0  97  3
 0  0 696488  40780  36344 261412    0    0     0     0 1028  1179  0  0 100  0
 1  1 696488  40780  36352 261404    0    0     0    44 1012  1193  0  0 100  1
 0  0 696488  40780  36352 261404    0    0     0    76 1030  1259  0  0  96  4
 0  0 696488  40780  36352 261404    0    0     0    20 1014  1214  0  0 100  0
 0  0 696488  40780  36352 261404    0    0     0     0 1177  1340  0  0 100  0
 0  0 696488  40780  36352 261404    0    0     0     0 1008  1206  0  0 100  0

[root@localhost ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          1010        970         39          0         35        255
-/+ buffers/cache:        679        330
Swap:         1176        680        496

Thanks in advance!
jimmy |
What is the output of df -h on your drive(s)?
Kind of curious what that zombied process is, er was? |
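Probably tasks waiting for I/O - on Linux, tasks in uninterruptible sleep (state D) count toward the load average even though they use no CPU. That's also quite a bit of swap in use, which interferes with (normal) I/O when it has to go to disk. Try this to find out how many tasks you have (probably) waiting on I/O: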
Code:
top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: "count}' |
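A possible alternative, as a rough sketch only: ask ps for the state column directly instead of relying on top's column positions. The function name list_dstate is made up for illustration, and it assumes a procps ps that accepts -eo pid,stat,wchan,comm. The WCHAN column shows the kernel function a task is sleeping in, which can hint at what it is waiting for.

```shell
# list_dstate: hypothetical helper (not from the thread).
# Reads "PID STAT WCHAN COMMAND" lines on stdin, keeps the header,
# prints every task whose state starts with D, then prints a count.
list_dstate() {
    awk 'NR == 1 { print; next }      # keep the header line
         $2 ~ /^D/ { print; n++ }     # STAT may carry suffixes such as "D+" or "D<"
         END { print "Total status D: " (n + 0) }'
}

# Usage (assumes procps ps):
#   ps -eo pid,stat,wchan,comm | list_dstate
```

The (n + 0) forces a numeric 0 to be printed when nothing is in state D, rather than an empty string.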
[root@localhost ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             4.9G  3.9G  725M  85% /
none                  506M     0  506M   0% /dev/shm
/dev/sdb1             9.7G  6.2G  3.0G  68% /disk01
/dev/sdb3             3.2G  2.9G  191M  94% /disk02
/dev/sda2             2.5G  1.8G  507M  79% /home
/dev/sdb2             3.9G  3.2G  469M  88% /var
192.168.1.102:/opt/backup/jira/backup/110
                       27G  4.5G   22G  18% /var/backup/jira

[root@localhost ~]# top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: "count}'
top - 16:52:18 up 399 days, 21:46, 4 users, load average: 7.43, 8.11, 8.36
Tasks: 126 total, 1 running, 124 sleeping, 0 stopped, 1 zombie
Cpu(s): 4.1% us, 1.2% sy, 0.0% ni, 92.8% id, 1.7% wa, 0.0% hi, 0.1% si
Mem: 1034728k total, 955824k used, 78904k free, 20300k buffers
Swap: 1204864k total, 714892k used, 489972k free, 262452k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8424 root      18   0  4080  376  372 D  0.0  0.0   0:00.05 du
 9303 root      18   0  3608  376  372 D  0.0  0.0   0:00.04 du
17062 root      18   0  4808  376  372 D  0.0  0.0   0:00.10 du
28209 root      18   0  5388  412  408 D  0.0  0.0   0:00.08 find
11535 nmai      17   0  4920  428  424 D  0.0  0.0   0:00.06 find
20033 root      18   0  5572  452  448 D  0.0  0.0   0:00.05 find
 1363 root      18   0  4760  552  552 D  0.0  0.1   0:00.05 find
Total status D: 7 |
[root@localhost ~]# kill -9 1363
[root@localhost ~]# kill -9 1363
[root@localhost ~]# top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: "count}'
top - 17:04:19 up 399 days, 21:58, 4 users, load average: 7.19, 7.39, 7.79
Tasks: 126 total, 1 running, 124 sleeping, 0 stopped, 1 zombie
Cpu(s): 4.1% us, 1.2% sy, 0.0% ni, 92.8% id, 1.7% wa, 0.0% hi, 0.1% si
Mem: 1034728k total, 971024k used, 63704k free, 25532k buffers
Swap: 1204864k total, 713696k used, 491168k free, 265760k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8424 root      18   0  4080  376  372 D  0.0  0.0   0:00.05 du
 9303 root      18   0  3608  376  372 D  0.0  0.0   0:00.04 du
17062 root      18   0  4808  376  372 D  0.0  0.0   0:00.10 du
28209 root      18   0  5388  412  408 D  0.0  0.0   0:00.08 find
11535 nmai      17   0  4920  428  424 D  0.0  0.0   0:00.06 find
20033 root      18   0  5572  452  448 D  0.0  0.0   0:00.05 find
 1363 root      18   0  4760  552  552 D  0.0  0.1   0:00.05 find
Total status D: 7

I can't even kill them. What can I do? I don't want to restart the server. |
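A task in state D is in uninterruptible sleep, so the kernel does not act on signals (kill -9 included) until the call it is blocked in returns. That is why the processes are still there. One thing that might help narrow it down, offered only as a sketch: look at which directory each stuck du/find was working in. The helper name show_cwd and the PROC_ROOT variable are made up for illustration; it only reads the /proc/PID/cwd symlink and assumes readlink from coreutils is installed.

```shell
# show_cwd: hypothetical helper (not from the thread).
# For each PID argument, print the PID and the target of its /proc/PID/cwd link.
# PROC_ROOT is an illustrative override so the function can be aimed at a test tree.
show_cwd() {
    root=${PROC_ROOT:-/proc}
    for pid in "$@"; do
        if [ -L "$root/$pid/cwd" ]; then
            printf '%s %s\n' "$pid" "$(readlink "$root/$pid/cwd")"
        else
            printf '%s (no such process)\n' "$pid"
        fi
    done
}

# Usage with the PIDs from the listing above:
#   show_cwd 8424 9303 17062 28209 11535 20033 1363
# The kernel wait channel of a single task can be shown with:
#   ps -o pid,stat,wchan,comm -p 1363
```

If the directories all turn out to sit on the same filesystem (for example the NFS mount under /var/backup/jira), that filesystem or its server would be the first thing to check. That is a guess based on the df output, not something the thread confirms.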
If you have "sysstat" installed you can see all this info in almost one go. For more details:
# man sar |
[root@localhost ~]# sar
Linux 2.6.9-5.0.3.ELsmp (localhost.localdomain)   01/23/2008

12:00:02 AM       CPU     %user     %nice   %system   %iowait     %idle
12:10:02 AM       all      3.50      0.01      1.94      1.54     93.01
12:20:02 AM       all      3.57      0.00      1.94      1.44     93.05
12:30:01 AM       all      3.22      0.00      1.94      1.43     93.41
12:40:01 AM       all      3.21      0.00      1.93      1.41     93.45
12:50:01 AM       all      3.57      0.00      1.93      1.41     93.09
01:00:01 AM       all      3.48      0.01      1.89      1.40     93.22
01:10:01 AM       all      3.46      0.00      1.94      1.32     93.27
01:20:01 AM       all      3.58      0.00      1.92      1.41     93.08
01:30:01 AM       all      3.52      0.01      1.97      1.41     93.10
01:40:01 AM       all      3.59      0.01      1.94      1.50     92.96
01:50:01 AM       all      3.47      0.00      2.00      1.36     93.17
02:00:01 AM       all      3.33      0.00      1.92      1.37     93.38
02:10:01 AM       all      3.24      0.01      1.95      1.42     93.38 |
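The sar CPU report above stays flat at roughly 1.3 to 1.5 %iowait, so the CPU columns on their own do not explain a load average above 7. If it helps to summarise a long sar listing, here is a rough sketch that averages one column, located by its header name. The function name avg_col is made up for illustration, and it assumes the header and data rows have the same number of fields, as in the output above.

```shell
# avg_col: hypothetical helper (not from the thread).
# Reads sar CPU output on stdin, locates the "CPU" column and the column
# named by $1 in the header row, then averages that column over the "all" rows.
avg_col() {
    awk -v name="$1" '
        !col {                               # header not found yet
            for (i = 1; i <= NF; i++) {
                if ($i == "CPU") cpu = i
                if ($i == name)  col = i
            }
            next
        }
        $cpu == "all" { sum += $col; n++ }   # skips the trailing "Average:" row
        END { if (n) printf "%.2f\n", sum / n }'
}

# Usage:
#   sar | avg_col %iowait
# sar -q (also part of sysstat) reports run-queue length and load averages instead.
```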
As syg00 pointed out, over half your swap is being used... that's not good. If it's always at that sort of level, adding more RAM is indicated. Functionally, swap is just a temporary extension of RAM; for decent performance it should normally be mostly unused.
From your posts there, swap usage is slowly increasing. You've also got several disks there that are nearly full; you need to purge unwanted stuff and/or back up and remove some stuff before they fill up. Alternatively, add more disks. |
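To keep an eye on the nearly full filesystems, something along these lines could work. It is only a sketch: the function name fs_over and the 85 percent threshold are arbitrary choices for illustration. It reads df -P output (the POSIX format keeps each filesystem on one line, even the long NFS device name) and assumes mount points without spaces.

```shell
# fs_over: hypothetical helper (not from the thread).
# Reads `df -P` output on stdin and prints the mount point and usage of
# every filesystem at or above the percentage given as $1.
fs_over() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                    # "85%" -> "85"
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Usage:
#   df -P | fs_over 85
```

Going by the Use% figures posted earlier, a threshold of 85 would flag /, /disk02 and /var.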