LinuxQuestions.org - high load average, low cpu usage !

Hi,
Recently one of my CentOS 4.4 servers got slow. It has a high load average but low CPU usage, and I didn't find any specific process eating CPU or memory. Is there any way to find out what's going on with this server?

top - 11:43:30 up 399 days, 16:37, 4 users, load average: 7.25, 7.26, 7.27
Tasks: 116 total, 2 running, 113 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.2% us, 0.7% sy, 0.0% ni, 97.0% id, 2.2% wa, 0.0% hi, 0.0% si
Mem: 1034728k total, 993380k used, 41348k free, 36172k buffers
Swap: 1204864k total, 696488k used, 508376k free, 261064k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7389 root 17 0 2952 960 744 R 0.7 0.1 0:01.09 top
1 root 15 0 2792 440 408 S 0.0 0.0 11:56.27 init
2 root RT 0 0 0 0 S 0.0 0.0 2:54.24 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 1:05.87 ksoftirqd/0
4 root RT 0 0 0 0 S 0.0 0.0 83:42.93 migration/1
5 root 34 19 0 0 0 S 0.0 0.0 1:07.21 ksoftirqd/1
6 root 5 -10 0 0 0 S 0.0 0.0 0:03.50 events/0
7 root 5 -10 0 0 0 S 0.0 0.0 0:01.87 events/1
8 root 5 -10 0 0 0 S 0.0 0.0 0:00.01 khelper
9 root 15 -10 0 0 0 S 0.0 0.0 0:00.00 kacpid

[root@localhost ~]# vmstat 1 10
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0 696488  40636  36332 261424    0    0     0     1    0     0  4  1 93  2
 0  0 696488  40636  36336 261420    0    0     0    12 1021  1221  0  0 99  1
 0  0 696488  40780  36336 261420    0    0     0     0 1027  1234  0  1 100  0
 0  1 696488  40780  36344 261412    0    0     0    24 1171  1304  1  0 97  3
 0  0 696488  40780  36344 261412    0    0     0     0 1028  1179  0  0 100  0
 1  1 696488  40780  36352 261404    0    0     0    44 1012  1193  0  0 100  1
 0  0 696488  40780  36352 261404    0    0     0    76 1030  1259  0  0 96  4
 0  0 696488  40780  36352 261404    0    0     0    20 1014  1214  0  0 100  0
 0  0 696488  40780  36352 261404    0    0     0     0 1177  1340  0  0 100  0
 0  0 696488  40780  36352 261404    0    0     0     0 1008  1206  0  0 100  0

[root@localhost ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          1010        970         39          0         35        255
-/+ buffers/cache:        679        330
Swap:         1176        680        496

Thanks in advance!
jimmy

What is the output of df -h on your drive(s)?

Kind of curious what that zombied process is, er was?

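Probably tasks waiting for I/O. On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, so it can sit high while the CPU is nearly idle. That's also quite a bit of swap in use, which interferes with normal I/O when it has to go to disk. Try this to find out how many tasks you (probably) have waiting on I/O: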

Code:

top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: " (count+0)}'
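
A complementary sketch, assuming a procps-style ps that accepts the state, pid, wchan and comm output fields: it lists the same D-state tasks but also shows the kernel function each one is sleeping in, which can hint at what they are waiting for. The function name filter_d_state is made up for illustration.

```shell
#!/bin/sh
# filter_d_state: hypothetical helper, not part of any package.
# Reads "state pid wchan comm" rows on stdin, prints the rows that are in
# uninterruptible sleep (state D), then a final count.
filter_d_state() {
    awk '$1 == "D" { print; n++ }
         END { printf "Total status D: %d\n", n }'
}

# Typical use (the empty "=" headers suppress the ps header line):
#   ps -e -o state= -o pid= -o wchan= -o comm= | filter_d_state
```

Unlike the top version, this does not depend on the state being the 8th column, and it prints 0 rather than an empty count when nothing is in D state.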

[root@localhost ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             4.9G  3.9G  725M  85% /
none                  506M     0  506M   0% /dev/shm
/dev/sdb1             9.7G  6.2G  3.0G  68% /disk01
/dev/sdb3             3.2G  2.9G  191M  94% /disk02
/dev/sda2             2.5G  1.8G  507M  79% /home
/dev/sdb2             3.9G  3.2G  469M  88% /var
192.168.1.102:/opt/backup/jira/backup/110
                       27G  4.5G   22G  18% /var/backup/jira

[root@localhost ~]# top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: "count}'
top - 16:52:18 up 399 days, 21:46, 4 users, load average: 7.43, 8.11, 8.36
Tasks: 126 total, 1 running, 124 sleeping, 0 stopped, 1 zombie
Cpu(s): 4.1% us, 1.2% sy, 0.0% ni, 92.8% id, 1.7% wa, 0.0% hi, 0.1% si
Mem: 1034728k total, 955824k used, 78904k free, 20300k buffers
Swap: 1204864k total, 714892k used, 489972k free, 262452k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8424 root 18 0 4080 376 372 D 0.0 0.0 0:00.05 du
9303 root 18 0 3608 376 372 D 0.0 0.0 0:00.04 du
17062 root 18 0 4808 376 372 D 0.0 0.0 0:00.10 du
28209 root 18 0 5388 412 408 D 0.0 0.0 0:00.08 find
11535 nmai 17 0 4920 428 424 D 0.0 0.0 0:00.06 find
20033 root 18 0 5572 452 448 D 0.0 0.0 0:00.05 find
1363 root 18 0 4760 552 552 D 0.0 0.1 0:00.05 find
Total status D: 7

[root@localhost ~]# kill -9 1363
[root@localhost ~]# kill -9 1363
[root@localhost ~]# top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: "count}'
top - 17:04:19 up 399 days, 21:58, 4 users, load average: 7.19, 7.39, 7.79
Tasks: 126 total, 1 running, 124 sleeping, 0 stopped, 1 zombie
Cpu(s): 4.1% us, 1.2% sy, 0.0% ni, 92.8% id, 1.7% wa, 0.0% hi, 0.1% si
Mem: 1034728k total, 971024k used, 63704k free, 25532k buffers
Swap: 1204864k total, 713696k used, 491168k free, 265760k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8424 root 18 0 4080 376 372 D 0.0 0.0 0:00.05 du
9303 root 18 0 3608 376 372 D 0.0 0.0 0:00.04 du
17062 root 18 0 4808 376 372 D 0.0 0.0 0:00.10 du
28209 root 18 0 5388 412 408 D 0.0 0.0 0:00.08 find
11535 nmai 17 0 4920 428 424 D 0.0 0.0 0:00.06 find
20033 root 18 0 5572 452 448 D 0.0 0.0 0:00.05 find
1363 root 18 0 4760 552 552 D 0.0 0.1 0:00.05 find
Total status D: 7

I can't even kill them. What can I do? I don't want to restart the server.

If you have "sysstat" installed, you can see all this info in almost one go. For more details:

# man sar

[root@localhost ~]# sar
Linux 2.6.9-5.0.3.ELsmp (localhost.localdomain) 01/23/2008

12:00:02 AM CPU %user %nice %system %iowait %idle
12:10:02 AM all 3.50 0.01 1.94 1.54 93.01
12:20:02 AM all 3.57 0.00 1.94 1.44 93.05
12:30:01 AM all 3.22 0.00 1.94 1.43 93.41
12:40:01 AM all 3.21 0.00 1.93 1.41 93.45
12:50:01 AM all 3.57 0.00 1.93 1.41 93.09
01:00:01 AM all 3.48 0.01 1.89 1.40 93.22
01:10:01 AM all 3.46 0.00 1.94 1.32 93.27
01:20:01 AM all 3.58 0.00 1.92 1.41 93.08
01:30:01 AM all 3.52 0.01 1.97 1.41 93.10
01:40:01 AM all 3.59 0.01 1.94 1.50 92.96
01:50:01 AM all 3.47 0.00 2.00 1.36 93.17
02:00:01 AM all 3.33 0.00 1.92 1.37 93.38
02:10:01 AM all 3.24 0.01 1.95 1.42 93.38

As syg00 pointed out, over half your swap is being used, and that's not good. If it's always at that sort of level, adding more RAM is indicated. Functionally, swap is just a temporary extension of RAM; for decent performance it should normally be mostly unused.
From your posts there, swap usage is slowly increasing.
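
To put a number on that, here is a rough sketch that turns the SwapTotal and SwapFree lines of /proc/meminfo into a percentage. The helper name swap_used_pct is made up for illustration. With the figures from the first top output (1204864k total, 508376k free) it works out to about 58% in use.

```shell
#!/bin/sh
# swap_used_pct: hypothetical helper, not part of any package.
# Reads /proc/meminfo-style text on stdin and prints the percentage of
# swap currently in use, to one decimal place.
swap_used_pct() {
    awk '$1 == "SwapTotal:" { total = $2 }
         $1 == "SwapFree:"  { free  = $2 }
         END {
             if (total > 0) printf "%.1f\n", (total - free) * 100 / total
             else print "no swap configured"
         }'
}

# Typical use:
#   swap_used_pct < /proc/meminfo
```

Running it from cron every few minutes and logging the result would show whether usage really keeps creeping up or just sits at a plateau.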

You've got several disks there that are nearly full; you need to purge unwanted stuff and/or back up and remove some stuff before they fill up.
Alternately, add more disks.
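
A small sketch for keeping an eye on that, assuming the POSIX output format of df -P (one line per filesystem, so long device names such as the NFS export do not wrap onto a second line). The helper name full_filesystems and the default threshold of 85 are arbitrary choices for illustration.

```shell
#!/bin/sh
# full_filesystems: hypothetical helper, not part of any package.
# Reads "df -P" output on stdin and prints the mount point and Use% of
# every filesystem at or above a threshold (first argument, default 85).
# Mount points containing spaces are not handled.
full_filesystems() {
    awk -v limit="${1:-85}" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 >= limit + 0) print $6, $5
        }'
}

# Typical use:
#   df -P | full_filesystems 85
```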

Quote:

Originally Posted by jimmyjiang (Post 3031790)

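Unfortunately, if the process is stuck in the "D" state (uninterruptible sleep, usually waiting on I/O; it is not "dead"), no kill command will remove it, because the signal is not acted on until the blocking call returns. If that never happens, you have to reboot the machine :(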
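
Before rebooting, it may be worth checking what the stuck tasks are blocked on. The sketch below is only an illustration: the helper name show_blocked is made up, and it assumes a Linux /proc that provides status, wchan and cwd entries for each PID. If the working directory turned out to be under the NFS mount shown in the df output, that would point at the NFS server, but that is only a guess.

```shell
#!/bin/sh
# show_blocked: hypothetical helper, not part of any package.
# For each PID given, print its state letter, the kernel wait channel and
# its working directory, all as recorded in /proc.
show_blocked() {
    for pid in "$@"; do
        [ -d "/proc/$pid" ] || { echo "$pid: no such process"; continue; }
        state=$(awk '$1 == "State:" { print $2 }' "/proc/$pid/status")
        wchan=$(cat "/proc/$pid/wchan" 2>/dev/null)
        cwd=$(readlink "/proc/$pid/cwd" 2>/dev/null)
        echo "$pid: state=$state wchan=${wchan:-?} cwd=${cwd:-?}"
    done
}

# Typical use, with the PIDs from the top listing in this thread:
#   show_blocked 8424 9303 17062 28209 11535 20033 1363
```

Reading the cwd link does not touch the filesystem the process is stuck on, so the check itself should not hang.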