Here's what I've been able to find so far:
Packages on both servers are identical, as is the hardware; Veritas DMP is set up with the min-q-length ruleset on all DMP devices, and that appears to be working as expected.
Kernel settings are identical on both nodes, too.
An strace of the du command shows a large delay after getdents is called, but only on the *slow* node:
Code:
0.000089 lstat("94436.xml", {st_mode=S_IFREG|0664, st_size=9001, ...}) = 0
0.000083 lstat("94437.xml", {st_mode=S_IFREG|0664, st_size=9001, ...}) = 0
0.000167 lstat("94438.xml", {st_mode=S_IFREG|0664, st_size=9001, ...}) = 0
0.000131 getdents(4, /* 1024 entries */, 32768) = 32768
0.004166 lstat("94439.xml", {st_mode=S_IFREG|0664, st_size=9001, ...}) = 0
0.000091 lstat("94440.xml", {st_mode=S_IFREG|0664, st_size=9001, ...}) = 0
And that is very consistent, but it happens on both nodes even if I swap the CFS primary server!
I have a feeling that some metadata update occurs after getdents is called, and that it requires some communication/handshake. The delay shows up under lstat, probably because lstat is waiting on something to happen within the volume/filesystem.
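One caveat on reading the trace: assuming it was taken with strace -r (relative timestamps recorded at syscall entry), the 0.004166 printed on the lstat line is the time since getdents started, so it could include time spent inside getdents itself rather than in lstat. A rough way to tell the two apart is to time the directory reads and the lstat calls separately. Below is a minimal Python sketch of that idea; the function name time_dir_walk is made up for illustration and this is not what produced the trace above.

```python
import os
import sys
import time


def time_dir_walk(path):
    """Walk one directory and return (readdir_seconds, lstat_seconds, entry_count).

    readdir_seconds covers fetching entries from the scandir iterator, which
    refills its buffer from the kernel whenever it runs empty.
    lstat_seconds covers the explicit os.lstat() call made for each entry.
    """
    readdir_s = 0.0
    lstat_s = 0.0
    count = 0
    with os.scandir(path) as it:
        while True:
            t0 = time.perf_counter()
            entry = next(it, None)  # None once the directory is exhausted
            t1 = time.perf_counter()
            readdir_s += t1 - t0
            if entry is None:
                break
            os.lstat(entry.path)  # same call du issues per file
            lstat_s += time.perf_counter() - t1
            count += 1
    return readdir_s, lstat_s, count


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    rd, ls, n = time_dir_walk(target)
    print(f"{n} entries: directory reads {rd:.6f}s, lstat {ls:.6f}s")
```

Running it against the same directory on the fast and the slow node should show whether the extra time accumulates in the directory reads or in the lstat calls. File system caches will affect the numbers, so compare like with like (for example, first run after mount on each node).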
Sound familiar to anyone?