Extremely high slab cache usage
I am running RHAS 3.0 (Update 5), kernel 2.4.21-32.0.1.ELhugemem, with Oracle 10g RAC. The machine is part of a 7-node cluster, and this particular node is where we run our Oracle backups via rman (the Oracle process that coordinates the backup). In case you're familiar with Oracle 10g, it has a new feature called ASM (Automatic Storage Management), which is kind of a replacement for volume management and filesystems. Interestingly enough, there is a particular process that spins up during our backup run. I've run strace on it and found that it is performing statfs() calls on the ASM disks presented to Oracle. Funny thing is, the f_type in the returned struct is "EXT2_SUPER_MAGIC", which makes me wonder whether Oracle is simply implementing an EXT2 filesystem behind the scenes?
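For anyone who wants to check the same thing without strace, here is a minimal Python sketch. It assumes GNU coreutils `stat` is installed (its `-f -c %t` options print the filesystem type in hex). The function names and the small lookup table are only illustrative. 0xEF53 is the value strace prints as EXT2_SUPER_MAGIC; ext2, ext3 and ext4 all share it, so the value alone does not tell them apart.

```python
import subprocess

# A few well-known statfs f_type magic numbers (illustrative subset only).
KNOWN_MAGIC = {
    0xEF53: "EXT2_SUPER_MAGIC (shared by ext2/ext3/ext4)",
    0x6969: "NFS_SUPER_MAGIC",
    0x9FA0: "PROC_SUPER_MAGIC",
    0x01021994: "TMPFS_MAGIC",
}


def decode_fs_type(hex_text):
    """Turn the hex string printed by `stat -f -c %t` into a readable name."""
    value = int(hex_text.strip(), 16)
    return KNOWN_MAGIC.get(value, "unknown (0x%X)" % value)


def fs_type_of(path):
    """Ask `stat` for the filesystem type of `path` and decode it."""
    out = subprocess.check_output(["stat", "-f", "-c", "%t", path])
    return decode_fs_type(out.decode())
```

Calling `fs_type_of()` with a path of interest returns the decoded name, or "unknown" plus the raw hex value for anything not in the table.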
What I am observing is that the buffer_head cache grows by about 2000k on each refresh of the slabtop utility. It grows from an initial size of ~5000k to over 100000k shortly after the backup is kicked off. During that time, we see high IOWAIT times. Once the cache growth slows down, our CPUs shift to running almost entirely in SYSTEM time. The machine has 4 x 3.0GHz hyperthreaded processors, and all of them are sitting at about 95% SYSTEM time! My theory is that the buffer_head cache is being used far too heavily, and once that much has been allocated, the kernel is left working with fragmented low memory and has to work really hard to keep allocating more slabs to the buffer_head cache.
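To put numbers on that growth outside of slabtop, a small Python sketch along these lines could sample /proc/slabinfo directly. It assumes the first four fields of each line are the cache name, active objects, total objects and object size, and it approximates the cache size as total objects times object size, which will not exactly match slabtop's figure. The function names, the 3-second interval and the sample count are all my own choices; reading /proc/slabinfo may require root.

```python
import time


def cache_kib(slabinfo_text, cache="buffer_head"):
    """Approximate size of one slab cache in KiB (num_objs * objsize)."""
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == cache:
            num_objs = int(fields[2])   # total objects allocated to the cache
            objsize = int(fields[3])    # size of one object in bytes
            return num_objs * objsize // 1024
    return None  # cache name not present in this snapshot


def watch(interval=3, samples=5, path="/proc/slabinfo"):
    """Print the cache size and its change between consecutive samples."""
    previous = None
    for _ in range(samples):
        with open(path) as f:
            current = cache_kib(f.read())
        if current is None:
            print("buffer_head cache not found in %s" % path)
            return
        if previous is None:
            print("buffer_head: %d KiB" % current)
        else:
            print("buffer_head: %d KiB (%+d KiB)" % (current, current - previous))
        previous = current
        time.sleep(interval)
```

Running `watch()` while the backup starts would log the per-interval growth, which is easier to paste into a post or compare across runs than watching slabtop refresh.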
My research has shown me that buffer_head seems to be used primarily in filesystem operations. So, I'm wondering whether there are parameters I could tweak to tune the box so that it does not allocate so much to the buffer_head cache? Any ideas?