Hi everyone,
Just wanted to start off with a hello. I'm new to the community and looking for help, but I'm hoping to contribute wherever I can.
The problem in short:
Our IMAP server is having some serious iowait issues, and I am completely stumped as to what may be causing them.
The detailed version:
I help maintain an IMAP server on a medium-sized campus. We have around 12,000 active inboxes, and we also have 2 load-balanced web servers that provide a webmail frontend. The IMAP server is a quad-Xeon 2.83 GHz with 8 GB of RAM (it's quite a beast) running RHEL 3 with the 2.4.21-32.0.1.ELsmp kernel.
We use OpenLDAP for authentication, which resides on a separate box, and we also have a SpamAssassin box located in front of our mail server. Neither of these boxes seems to have any significant issues; they've been purring away for over a year with barely any load.
The issue (I'm guessing) lies with our iSCSI SAN. All of our e-mail is stored on a 4 TB LeftHand iSCSI SAN, which is about 70% full (mostly because of snapshots used for recent backups, etc.). For the life of me, I can't think of anything that has changed recently, but over the past few days we've seen the load average yo-yo between the teens and over 300.
This iowait concerns me quite a bit:
Code:
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    2.5%    0.0%    6.3%   0.0%     2.4%   78.6%   10.0%
           cpu00    0.5%    0.0%    7.7%   0.0%     1.9%   89.6%    0.0%
           cpu01    1.1%    0.0%    5.0%   0.0%     0.5%   93.3%    0.0%
           cpu02    3.3%    0.0%    4.7%   0.0%     3.6%   88.2%    0.0%
           cpu03    1.1%    0.0%    8.4%   0.0%     3.0%   87.3%    0.0%
           cpu04    5.3%    0.0%    6.4%   0.0%     7.8%   80.1%    0.2%
           cpu05    1.4%    0.0%    3.3%   0.0%     1.6%   92.7%    0.8%
           cpu06    2.2%    0.0%    5.2%   0.0%     0.5%   52.0%   39.8%
           cpu07    5.5%    0.0%    9.7%   0.0%     0.0%   45.6%   38.9%
Mem:   8202488k av, 8184952k used,    17536k free,       0k shrd,   13636k buff
                    6384168k actv,  1225644k in_d,  136724k in_c
Swap: 16386292k av,  195568k used, 16190724k free                  6888660k cached
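For anyone who wants to slice the snapshot above differently, here is a minimal Python sketch that parses the per-CPU lines (pasted in as a string) and reports the mean iowait and which CPUs are at or above 80%. It assumes the column order shown in the top header (user, nice, system, irq, softirq, iowait, idle); the 80% cutoff and the function names are just illustrative choices.

```python
# Parse the per-CPU lines of the "CPU states" block from the top snapshot.
# Assumes the column order shown in the header:
#   user nice system irq softirq iowait idle

SNAPSHOT = """\
CPU states: cpu user nice system irq softirq iowait idle
total 2.5% 0.0% 6.3% 0.0% 2.4% 78.6% 10.0%
cpu00 0.5% 0.0% 7.7% 0.0% 1.9% 89.6% 0.0%
cpu01 1.1% 0.0% 5.0% 0.0% 0.5% 93.3% 0.0%
cpu02 3.3% 0.0% 4.7% 0.0% 3.6% 88.2% 0.0%
cpu03 1.1% 0.0% 8.4% 0.0% 3.0% 87.3% 0.0%
cpu04 5.3% 0.0% 6.4% 0.0% 7.8% 80.1% 0.2%
cpu05 1.4% 0.0% 3.3% 0.0% 1.6% 92.7% 0.8%
cpu06 2.2% 0.0% 5.2% 0.0% 0.5% 52.0% 39.8%
cpu07 5.5% 0.0% 9.7% 0.0% 0.0% 45.6% 38.9%
"""

COLUMNS = ["user", "nice", "system", "irq", "softirq", "iowait", "idle"]


def parse_cpu_lines(text):
    """Return {cpu_name: {column: percent}} for each cpuNN line.

    The header line and the "total" line are skipped because their
    first field does not start with "cpu" (or the field count differs).
    """
    cpus = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) + 1 or not fields[0].startswith("cpu"):
            continue
        values = [float(f.rstrip("%")) for f in fields[1:]]
        cpus[fields[0]] = dict(zip(COLUMNS, values))
    return cpus


def summarize_iowait(cpus, threshold=80.0):
    """Return (mean iowait, sorted list of CPUs at or above threshold)."""
    iowait = {name: stats["iowait"] for name, stats in cpus.items()}
    mean = sum(iowait.values()) / len(iowait)
    busy = sorted(name for name, value in iowait.items() if value >= threshold)
    return mean, busy


if __name__ == "__main__":
    cpus = parse_cpu_lines(SNAPSHOT)
    mean, busy = summarize_iowait(cpus)
    print(f"mean iowait across {len(cpus)} CPUs: {mean:.1f}%")
    print(f"CPUs at or above 80% iowait: {', '.join(busy)}")
```

The computed mean comes out to 78.6%, the same as the "total" line, which is a handy check that the columns were read in the right order. Six CPUs (cpu00 through cpu05) land at or above the 80% cutoff.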
As you can see, iowait is at 80% or higher on six of the eight CPUs, and around 45-52% on the other two. The system also seems to be swapping out quite a few of the iSCSI initiator processes, despite them running with a nice value of -20.
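To put the memory and swap figures from the snapshot in proportion, here is a small Python sketch that turns the Mem and Swap lines into percentages. The values are copied from the top output (all in kB); treating free + buffers + cached as roughly reclaimable memory is a common rule of thumb, not an exact measure, and the names used here are only illustrative.

```python
# Reduce the Mem/Swap lines of the top snapshot to percentages.
# Values are copied from the snapshot; all figures are in kB.

MEM_TOTAL_KB = 8202488
MEM_FREE_KB = 17536
BUFFERS_KB = 13636
CACHED_KB = 6888660
SWAP_TOTAL_KB = 16386292
SWAP_USED_KB = 195568


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def memory_summary():
    """Return a few ratios derived from the Mem/Swap lines."""
    return {
        "free_ram_pct": pct(MEM_FREE_KB, MEM_TOTAL_KB),
        "cached_pct": pct(CACHED_KB, MEM_TOTAL_KB),
        # Rule-of-thumb approximation of memory the kernel could reclaim.
        "free_plus_buff_cache_pct": pct(
            MEM_FREE_KB + BUFFERS_KB + CACHED_KB, MEM_TOTAL_KB
        ),
        "swap_used_pct": pct(SWAP_USED_KB, SWAP_TOTAL_KB),
    }


if __name__ == "__main__":
    for name, value in memory_summary().items():
        print(f"{name}: {value:.1f}%")
```

With these numbers it prints roughly 0.2% free RAM, 84.0% page cache, 84.4% free plus buffers plus cache, and 1.2% of swap in use.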
I'm at my wit's end, so any insight would be much appreciated. Let me know if I've left out any information. Thanks!