We have two physical machines at the company. Both have the same hardware configuration and run the same CPU:
Intel(R) Xeon(R) CPU E5420 @ 2.50GHz
One runs a regular Linux; on the other we run ESX, version 4.
Inside ESX we have a Linux guest that should be almost identical to the Linux on the first machine.
The kernel version (a bit old these days, but required because of an old project) is:
Linux x 2.4.21-53.ELhugemem #1 SMP Wed Nov 14 03:46:17 EST 2007 i686 i686 i386 GNU/Linux
The problem is that the virtualized Linux runs slower. I have read that the overhead should be ~8%, which is something I could live with. But here the performance hit is visible to the naked eye.
I made two test programs:
The first just does extensive work in userspace (e.g. a giant loop counting numbers). Here the performance hit is around 8-10%, which is fine.
The second program does syscalls - "close(0);" in a loop. And this is where things aren't pretty anymore:
Linux running on real HW:
Code:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
99.65 0.963257 10 100002 99999 close
0.15 0.001403 33 43 41 open
0.14 0.001368 34 40 36 stat64
0.06 0.000566 566 1 execve
0.00 0.000027 5 5 old_mmap
0.00 0.000007 4 2 fstat64
0.00 0.000006 6 1 read
0.00 0.000006 6 1 munmap
0.00 0.000004 4 1 uname
0.00 0.000003 3 1 brk
------ ----------- ----------- --------- --------- ----------------
100.00 0.966647 100097 100076 total
real 0m4.613s
user 0m0.760s
sys 0m3.730s
Linux running on ESX:
Code:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
77.76 18.206772 182 100002 99999 close
3.01 0.703602 703602 1 execve
2.99 0.700382 700382 1 set_thread_area
2.99 0.700337 700337 1 munmap
2.99 0.700328 700328 1 uname
2.99 0.700123 700123 1 read
2.99 0.700108 700108 1 brk
2.14 0.500571 100114 5 old_mmap
1.71 0.400229 200115 2 fstat64
0.43 0.100360 33453 3 1 open
------ ----------- ----------- --------- --------- ----------------
100.00 23.412812 100018 100000 total
real 0m48.434s
user 0m5.410s
sys 0m40.610s
The machine running on ESX spent roughly ten times as long (48.4s vs 4.6s real time) doing the same thing.
Any ideas why this is happening? It seems that entering the kernel (the syscall/context-switch path) is very expensive for some reason.