Code:
top - 14:14:52 up 322 days, 20:24, 2 users, load average: 46.25, 27.48, 13.28
Tasks: 341 total, 1 running, 340 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.7%us, 0.4%sy, 0.0%ni, 49.6%id, 48.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4146852k total, 3987340k used, 159512k free, 103432k buffers
Swap: 2096472k total, 34224k used, 2062248k free, 3147252k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29654 popuser 20 0 5888 2304 684 S 5 0.1 0:00.14 pop3d
25595 popuser 20 0 9556 3536 924 D 3 0.1 0:20.66 imapd
25562 popuser 20 0 9556 3532 928 D 2 0.1 0:19.05 imapd
24225 popuser 20 0 11424 5480 928 D 1 0.1 0:24.71 imapd
8956 root 20 0 2404 1188 796 R 0 0.0 0:06.32 top
1 root 20 0 2112 632 544 S 0 0.0 1:45.37 init
2 root 15 -5 0 0 0 S 0 0.0 0:00.07 kthreadd
3 root RT -5 0 0 0 S 0 0.0 0:09.84 migration/0
4 root 15 -5 0 0 0 S 0 0.0 14:34.70 ksoftirqd/0
5 root RT -5 0 0 0 S 0 0.0 0:03.75 watchdog/0
6 root RT -5 0 0 0 S 0 0.0 0:47.06 migration/1
7 root 15 -5 0 0 0 S 0 0.0 14:10.59 ksoftirqd/1
At the time of the above top, netstat reported:
16 Established :25 connections
16 Established :993 connections
5 Established :110 connections
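For reference, per-port counts like the ones above can be derived from `netstat -tn` output along these lines. This is only a minimal sketch, not necessarily how the numbers above were gathered: the function name `count_established`, the sample addresses, and the sample text are made up for illustration, and it assumes the usual Linux column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State).

```python
from collections import Counter


def count_established(netstat_output, ports=(25, 110, 993)):
    """Count ESTABLISHED TCP connections per local port in `netstat -tn` style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner/header lines and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "ESTABLISHED":
            continue
        # Local address looks like 192.0.2.10:993 or :::993; the port follows the last colon.
        port_text = fields[3].rsplit(":", 1)[-1]
        if port_text.isdigit() and int(port_text) in ports:
            counts[int(port_text)] += 1
    return counts


# Hypothetical sample input (documentation addresses, not from the real server).
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.10:25           198.51.100.7:51234      ESTABLISHED
tcp        0      0 192.0.2.10:993          198.51.100.8:40022      ESTABLISHED
tcp        0      0 192.0.2.10:993          198.51.100.9:40023      TIME_WAIT
tcp        0      0 192.0.2.10:110          198.51.100.9:40100      ESTABLISHED
"""

if __name__ == "__main__":
    for port, n in sorted(count_established(SAMPLE).items()):
        print(f"{n} Established :{port} connections")
```

On a live box the text could be fed in from `subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout`, assuming `netstat` is installed.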
The server is a dual Xeon 8-core with 4 GB of RAM.
We've got a new problem with our mail server: load spikes that reach 50-70 (usual load is 2-3).
During those spikes the server stops responding to SNMP requests and times out on any new mail connections, and our users have been complaining about it.
How would you determine what is causing this high load? Where should I start?