We have two physical machines at the company. Both have the same hardware configuration and run the same CPU:
Intel(R) Xeon(R) CPU E5420 @ 2.50GHz
One runs a regular Linux; on the other we run ESX, version 4.
Inside ESX we have a Linux guest that should be almost identical to the Linux on the first machine.
The kernel version (a bit old these days, but required because of an old project) is:
Linux x 2.4.21-53.ELhugemem #1 SMP Wed Nov 14 03:46:17 EST 2007 i686 i686 i386 GNU/Linux
The problem is that the virtualized Linux runs slower. I have read that the overhead should be ~8%, which is something I could live with. But here the performance hit is visible to the naked eye.
I made two test programs:
The first just does extensive work in userspace (e.g. a giant loop counting numbers). Here the performance hit is around 8-10%, which is fine.
The second program does syscalls - "close(0);" in a loop. And this is where things aren't pretty anymore:
Linux running on real HW:
Code:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
99.65 0.963257 10 100002 99999 close
0.15 0.001403 33 43 41 open
0.14 0.001368 34 40 36 stat64
0.06 0.000566 566 1 execve
0.00 0.000027 5 5 old_mmap
0.00 0.000007 4 2 fstat64
0.00 0.000006 6 1 read
0.00 0.000006 6 1 munmap
0.00 0.000004 4 1 uname
0.00 0.000003 3 1 brk
------ ----------- ----------- --------- --------- ----------------
100.00 0.966647 100097 100076 total
real 0m4.613s
user 0m0.760s
sys 0m3.730s
Linux running on ESX:
Code:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
77.76 18.206772 182 100002 99999 close
3.01 0.703602 703602 1 execve
2.99 0.700382 700382 1 set_thread_area
2.99 0.700337 700337 1 munmap
2.99 0.700328 700328 1 uname
2.99 0.700123 700123 1 read
2.99 0.700108 700108 1 brk
2.14 0.500571 100114 5 old_mmap
1.71 0.400229 200115 2 fstat64
0.43 0.100360 33453 3 1 open
------ ----------- ----------- --------- --------- ----------------
100.00 23.412812 100018 100000 total
real 0m48.434s
user 0m5.410s
sys 0m40.610s
The machine running on ESX spent roughly ten times as long (48.4s vs 4.6s real time) doing the same thing.
Any ideas why this is happening? It seems that entering the kernel (the syscall/context-switch path) is very expensive for some reason.