You could be on to something. Depending on how memory shares are allocated, I suppose it's possible: if the vmware-tools balloon driver is reclaiming memory from this guest so that it can be given to a guest with higher shares that needs more physical RAM, that could induce this. (Robbing Peter to pay Paul, or something like that.) I've not personally seen this as a problem myself...
There are some things you could consider implementing to help track down what may be going on (I'm sure others have more robust implementations and/or recommendations than these).
The first is to make sure that the sysstat package is installed on RHEL4 and that its cron job is in place.
Code:
rpm -q sysstat
cat /etc/cron.d/sysstat
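The sysstat cron job will collect system activity (a snapshot in time every 10 minutes by default) and generate a text report nightly (at whatever time /etc/cron.d/sysstat specifies; as far as I recall the stock file runs it at 23:53 rather than around 4am). Both the data files and the reports may be found under /var/log/sa/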
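To read the collected data back, something along these lines might help. It is only a sketch: the function names sa_file_for_day and mem_report_for_day are made up for illustration, and it assumes the usual /var/log/sa/saDD naming for the daily data files. sar -r reports memory and swap utilization and sar -W reports swapping activity.

```shell
#!/bin/sh
# Hypothetical helper: build the path of the sysstat data file for a given
# day of the month. Pass the day without a leading zero (7, not 07).
sa_file_for_day() {
    printf '/var/log/sa/sa%02d\n' "$1"
}

# Hypothetical helper: print the memory and swapping reports for one day.
mem_report_for_day() {
    f=$(sa_file_for_day "$1")
    if [ -r "$f" ]; then
        sar -r -f "$f"    # memory and swap utilization
        sar -W -f "$f"    # pages swapped in/out per second
    else
        echo "no sysstat data file at $f" >&2
        return 1
    fi
}
```

For example, mem_report_for_day 15 would show the figures recorded on the 15th, assuming that file is still there (sysstat only keeps a limited number of days).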
Also, you could configure top to sort by memory usage and dump its output to a file every X minutes via cron.
Start top (as root, since the cron entry runs as root and picks up root's ~/.toprc), press M (this should sort by memory), then W (this should pop a quick confirmation, just beneath the memory output and above the process list, that says it wrote ~/.toprc), then q (to quit).
Then add something to cron.d to capture the output every X minutes.
Every 5 minutes, for example:
Code:
*/5 * * * * root /usr/bin/top -d 1 -n 1 -b >> /root/top.out 2>/dev/null
Then, after the problem has happened, you can refer back to this file for a timeline of process activity sorted by memory usage to see what the big hitters were.
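Since /root/top.out will get long, a small filter can boil it down to the top few processes per snapshot. This is a rough sketch, not a polished tool: top_hitters is a made-up name, and it assumes the default top column layout (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND) and that each snapshot begins with a "top - HH:MM:SS" line. If your .toprc changes the displayed fields, the field numbers will need adjusting.

```shell
#!/bin/sh
# Hypothetical helper: for each snapshot in a top batch-mode log, print the
# snapshot time followed by the first N process rows (already sorted by
# memory if the saved .toprc sorts that way).
# $1 = log file, $2 = rows per snapshot (default 3)
top_hitters() {
    awk -v n="${2:-3}" '
        /^top - /   { print $3; inproc = 0; next }      # snapshot header, $3 is the time
        $1 == "PID" { inproc = 1; count = 0; next }     # column header, process rows follow
        inproc && NF > 0 && count < n {
            # assumed fields: $1 PID, $6 RES, $10 %MEM, $12 COMMAND
            printf "  pid=%s res=%s mem=%s%% cmd=%s\n", $1, $6, $10, $12
            count++
        }
    ' "$1"
}
```

Running top_hitters /root/top.out 5 would then list the five largest memory users at each 5-minute mark, which makes it easier to spot when something started growing.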