Freezing kernel 220.127.116.11 on HP DL380 G3 (high kswapd load)
I have an HP DL380 G3 (P4 with HT) with 1 GB of RAM and 3x72 GB SCSI disks in RAID 5 on a Smart Array 5i controller. I am running vanilla kernel 18.104.22.168 compiled with SMP, HT, IO-APIC, ACPI, NOHIGHMEM and IRQBALANCE. The server freezes about twice a week and then responds to neither the keyboard nor ping. After three days of uptime, iowait is high (50-70%) and idle is low (20-50%). Whenever I run top, kswapd0 is at the top of the list with a large TIME+. I also tried the boot option acpi=noidle, with no effect. Here are a few lines from top:
top - 18:14:33 up 3 days, 5:18, 1 user, load average: 1.35, 0.74, 0.64
Tasks: 88 total, 1 running, 87 sleeping, 0 stopped, 0 zombie
Cpu(s): 5.2% us, 16.0% sy, 0.0% ni, 22.3% id, 56.5% wa, 0.0% hi, 0.2% si
Mem: 905912k total, 902148k used, 3764k free, 4764k buffers
Swap: 1954296k total, 27112k used, 1927184k free, 26064k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
47 root 15 0 0 0 0 S 21.4 0.0 137:38.66 kswapd0
11720 postgres 15 0 16216 9988 14m S 7.1 1.1 0:59.01 postmaster
21727 postgres 18 0 18328 11m 14m D 2.9 1.3 0:00.09 postmaster
11429 nobody 15 0 79456 2264 73m S 1.0 0.2 0:11.28 httpd
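The `wa` (iowait) and `id` figures in the top output above are shares of CPU time between two refreshes. As a sketch of how to sample them outside top, for example to log them while waiting for the next freeze, the Python below reads the aggregate `cpu` line of /proc/stat twice and computes the same kind of percentages. It assumes the 2.6-style field order `user nice system idle iowait irq softirq`; later kernels append more fields (such as steal), which this sketch ignores, so its numbers can differ slightly from top's. The function names are illustrative, not part of any tool.

```python
import os
import time

# 2.6-era field order of the aggregate "cpu" line in /proc/stat (units: jiffies).
# Later kernels append more fields (e.g. steal); this sketch ignores them.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_cpu_line(line):
    """Return the first seven counters of the aggregate 'cpu' line as a dict."""
    parts = line.split()
    if not parts or parts[0] != "cpu" or len(parts) < 1 + len(FIELDS):
        raise ValueError("expected an aggregate 'cpu' line with iowait fields")
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))


def cpu_percentages(before, after):
    """Percentage of the elapsed jiffies spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * value / total for name, value in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    with open(path) as f:
        return f.readline()  # the first line is the aggregate over all CPUs


def main(interval=1.0):
    first = parse_cpu_line(read_cpu_line())
    time.sleep(interval)
    second = parse_cpu_line(read_cpu_line())
    pct = cpu_percentages(first, second)
    print("wa %.1f%%  id %.1f%%  sy %.1f%%  us %.1f%%"
          % (pct["iowait"], pct["idle"], pct["system"], pct["user"]))


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    main()
```

Sampling over a fixed interval, rather than reading the counters once, matters because the counters are cumulative since boot; a single read would give the average over the whole uptime.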
The problem is that the server is in production use, so thanks for any help.
Server load looks a little rough, but it sounds like you're locking up hard. Have you run any tools like memtest?
What happens when you try a prebuilt Slackware kernel?
I ran memtest and badblocks, and all tests passed. It looks like an interrupt problem, because while badblocks was running, iowait was at 70-90%. Maybe the problem is SMP + Hyper-Threading combined with bad IRQ handling.
I do not use the default Slackware 10 kernel because I want to use a 2.6 kernel, and other people have reported the same problem on 2.4 kernels. The hardware seems to be OK.
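One way to look into the interrupt theory above is to check how each IRQ's count is spread across the logical CPUs in /proc/interrupts, whose first line names the CPU columns and whose later lines each hold one IRQ or counter. The sketch below is a hypothetical helper, not an existing tool: it parses that table and prints each IRQ's per-CPU share, so an IRQ that lands almost entirely on one CPU stands out. It makes no claim about which distribution is correct for this hardware.

```python
import os


def parse_interrupts(text):
    """Parse /proc/interrupts into (cpu_names, {irq_label: [count per CPU]})."""
    lines = text.splitlines()
    cpus = lines[0].split()  # header, e.g. ["CPU0", "CPU1"]
    table = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        counts = []
        for token in rest.split()[:len(cpus)]:
            if not token.isdigit():
                break  # stop at the first non-numeric token (the controller/device text)
            counts.append(int(token))
        table[label.strip()] = counts
    return cpus, table


def cpu_share(counts):
    """Percentage of an IRQ's interrupts handled by each CPU."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [100.0 * c / total for c in counts]


def main(path="/proc/interrupts"):
    with open(path) as f:
        cpus, table = parse_interrupts(f.read())
    for label, counts in table.items():
        if len(counts) != len(cpus):
            continue  # rows such as ERR/MIS carry a single total, not one per CPU
        shares = "  ".join("%s %5.1f%%" % (c, s) for c, s in zip(cpus, cpu_share(counts)))
        print("%6s: %s" % (label, shares))


if __name__ == "__main__" and os.path.exists("/proc/interrupts"):
    main()
```

Running it a few times while badblocks or a heavy disk load is active shows whether the disk controller's IRQ moves between CPUs or stays pinned to one.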
Just a few thoughts. How is your swap organized? Do you have a small swap partition on each hardware disk? I would recommend arranging it that way rather than swapping to a RAID device (if you run software RAID, of course).
Do you see any oops or panic messages on the screen when it crashes?
Also, check that you do not have the preemptible kernel (CONFIG_PREEMPT) enabled. This could cause trouble as well.
There are 3x72 GB SCSI disks in hardware RAID 5, and I do not have any other disk to move swap to. When the server crashes, it does not respond to ping or to the keyboard (not even SysRq); there is only a blank screen. I do not have the preemptible kernel (CONFIG_PREEMPT) enabled. In my kernel I have enabled SMP, HT and NOHIGHMEM.
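Whether an option such as CONFIG_PREEMPT is really enabled can be checked in the kernel's .config file in the source tree, or in /proc/config.gz if the running kernel was built with CONFIG_IKCONFIG_PROC. A minimal sketch follows; the option names in `OPTIONS` are 2.6-era i386 spellings picked for this thread and are an assumption to verify against your own tree, and the helper names are illustrative.

```python
import gzip
import os

# Assumed 2.6-era i386 option names; adjust for your kernel tree.
OPTIONS = ("CONFIG_SMP", "CONFIG_PREEMPT", "CONFIG_X86_IO_APIC",
           "CONFIG_IRQBALANCE", "CONFIG_NOHIGHMEM", "CONFIG_ACPI")


def parse_config(text):
    """Map CONFIG_* symbols to their values; '# CONFIG_X is not set' maps to 'n'."""
    suffix = " is not set"
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def load_config(path):
    """Read a plain .config or a gzip-compressed one such as /proc/config.gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        return f.read()


def main(candidates=("/proc/config.gz", ".config")):
    for path in candidates:
        if os.path.exists(path):
            options = parse_config(load_config(path))
            for name in OPTIONS:
                # "absent" means the symbol does not appear in this config at all
                print("%-22s %s" % (name, options.get(name, "absent")))
            return
    print("no kernel config found in", candidates)


if __name__ == "__main__":
    main()
```

Checking /proc/config.gz first, when it exists, describes the kernel that is actually running, which may differ from whatever .config happens to be in the source directory.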
Freezing solved, but huge iowait is still there
After upgrading the system BIOS (version 2004.06.23) and flashing the Smart Array 5i controller firmware (version 2.58 B), there are no more freezes, but the huge iowait is still there. I think it has something to do with the performance of the SA 5i controller. I have found some forum tips, but they require buying a new array controller or the "+" upgrade for the SA 5i.
I am getting nearly the same behaviour. I have a very similarly configured kernel (SMP, HT, IO-APIC, ACPI, HIGHMEM, IRQBALANCE) and two mirrored SATA disks using Linux software RAID.
I rebooted because of a failed RAID disk, and when I hot-added the second drive, responsiveness went completely downhill. The situation is very curious:
On boot: 100% idle CPU (both CPUs), load ~1.00!!
Heavy disk access: ~100% idle CPU (both CPUs), load ~3.0-5.0 (and climbing over time!)
Heavy CPU usage (e.g. bzipping hundreds of MB): ~50% idle CPU, load < 1.00!!!
The odd thing is that when the CPU is inactive, the load is always above 1.00; the more disk access there is, the higher the load goes (while the CPUs still claim to be idle), and when you add MORE CPU usage, the load goes down!
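One likely contributor to the disk-access part of this pattern: the Linux load average counts not only runnable tasks (state R) but also tasks in uninterruptible sleep (state D), which is usually what a process waiting on disk I/O looks like. Heavy disk access can therefore push the load up while the CPUs report idle. The sketch below lists which processes are in R or D at the moment it runs, using the state field of /proc/<pid>/stat. It only looks at each process's main thread, and the helper names are illustrative.

```python
import os


def parse_stat_state(stat_line):
    """Return (pid, comm, state) from a /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the *last* ')' rather than on whitespace.
    """
    left, right = stat_line.rsplit(")", 1)
    pid_str, comm = left.split(" (", 1)
    state = right.split()[0]
    return int(pid_str), comm, state


def tasks_in_states(stat_lines, states=("R", "D")):
    """Group (pid, comm) pairs by state, for the states that feed the load average."""
    found = {s: [] for s in states}
    for line in stat_lines:
        pid, comm, state = parse_stat_state(line)
        if state in found:
            found[state].append((pid, comm))
    return found


def read_stat_lines(proc="/proc"):
    lines = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                lines.append(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
    return lines


def main():
    with open("/proc/loadavg") as f:
        print("loadavg:", f.read().strip())
    for state, tasks in tasks_in_states(read_stat_lines()).items():
        names = ", ".join("%s(%d)" % (comm, pid) for pid, comm in tasks)
        print("%s: %d  %s" % (state, len(tasks), names))


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    main()
```

If the D list during a resync or heavy disk access is consistently non-empty, that alone is enough to keep the load at or above 1 with idle CPUs; it does not by itself explain why adding CPU work lowers the load.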
The more CPU usage there is on the server, the more responsive it seems to be, too! For example, when simply rebuilding one mirror from the other, it was transferring at ~60 MB/s, and running:
time cat /proc/mdstat
took about 25 seconds of real time but only 0.01 s of sys and user time!
However, when the CPU was really busy and the mirroring was forced to slow down to ~1 MB/s, cat /proc/mdstat responded instantly!
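To repeat the `time cat /proc/mdstat` measurement under different CPU loads and record the rebuild speed at the same moment, a small script can time the read and pull out the `speed=...K/sec` value that md prints while a resync or recovery is running. The regular expression assumes that field format, and the helper names are hypothetical.

```python
import os
import re
import time

# Assumed format of the speed field md prints during resync/recovery.
SPEED_RE = re.compile(r"speed=(\d+)K/sec")


def resync_speeds_kb(mdstat_text):
    """Return every resync/recovery speed (in KB/s) reported in mdstat text."""
    return [int(m) for m in SPEED_RE.findall(mdstat_text)]


def timed_read(path="/proc/mdstat"):
    """Read a file and return (contents, wall-clock seconds), like `time cat`."""
    start = time.monotonic()
    with open(path) as f:
        text = f.read()
    return text, time.monotonic() - start


def main():
    text, elapsed = timed_read()
    speeds = resync_speeds_kb(text)
    print("read took %.3f s; resync speeds (KB/s): %s" % (elapsed, speeds or "none"))


if __name__ == "__main__" and os.path.exists("/proc/mdstat"):
    main()
```

Running it once with the CPUs idle and once during a CPU-heavy job such as the bzip run gives the two data points (read latency and rebuild speed) side by side.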
I suspect that something changed in the IO-APIC handling in the new kernels that makes it behave badly with Hyper-Threading. I'm going to try recompiling without IO-APIC support.