advice on how to troubleshoot performance issues - HP DL360G5 + ESXi w/ Centos6x64
Hi all,
Well, I am having a special kind of hell. I moved from bare metal on my HP ProLiant DL360 G5 (22 GB RAM, 2x quad-core E5420) to VMware ESXi (installed on a RAID 1 mirror pair of 72 GB 10K SAS drives), hosting a dedicated CentOS 6 x64 VM with cPanel/WHM, essentially a LAMP server.
The sites and payload are all the same as I ran on bare metal, and the server is the same; it just got shipped to a new DC, was spun up with ESXi on the mirror pair of 72 GB 10K SAS drives, and then the VM was built on a mirror pair of 300 GB SATA drives.
I seem to be getting diabolical server loads in the VM. The other thing I am being told is that the disk IO is shot to hell.
This is apparently being measured at the physical level, so I am assured that VM IO is not to blame.
I was told the following:
Quote:
If we copy data from SATA datastore 3 to SATA datastore 2 we get speeds of around 10,000KBps
If we copy data from SATA datastore 3 to SATA datastore 3 we get speeds of around 10,000KBps
If we copy data from SATA datastore 3 to SAS datastore 1 we get speeds of around 10,000KBps
If we copy data from SAS datastore 1 to SAS datastore 1 we get speeds of around 25,000KBps
If we copy data from SATA datastore via 1000mbit network from one of our servers to SAS datastore 1 we get speeds of around 25,000KBps
If we copy data from SATA datastore via 1000mbit network from one of our servers to SAS datastore 1 we get speeds of around 10,000KBps
If we copy data from SATA datastore on one of our servers to SATA datastore on the same server we get speeds of around 125,000KBps
If we copy data from SATA datastore on one of our servers to SATA datastore on a different server via 1000mbit network we get speeds of around 125,000KBps
If we copy data from SAS datastore on one of our servers to SAS datastore on the same server we get speeds of around 200,000KBps
So without a doubt there is an IO issue with your server; putting in new SAS or SATA drives will not fix anything, as all 3 independent RAID 1 arrays on your server are performing extremely badly.
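To double-check their figures from inside the guest, a simple dd run should do. A sketch - the target path below is just a placeholder, point it at whichever filesystem sits on the datastore you want to measure:

```shell
# Hypothetical path -- adjust to a directory on the datastore under test.
TARGET=/home/iotest.bin

# Sequential write, 1 GiB in 1 MiB blocks. oflag=direct bypasses the
# page cache so RAM can't flatter the result (supported by the dd in
# CentOS 6's coreutils, needs a filesystem that allows O_DIRECT).
dd if=/dev/zero of="$TARGET" bs=1M count=1024 oflag=direct

# Sequential read of the same file, again uncached.
dd if="$TARGET" of=/dev/null bs=1M iflag=direct

rm -f "$TARGET"
```

dd prints the MB/s figure on stderr at the end of each run, which can be compared directly against the KBps numbers the provider quoted.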
Now I did a UnixBench and got the following results:
The last benchmark results I got on the same server bare metal were 669 and 2267.
I don't really know where to go with troubleshooting this.
I am told that if I go back to bare metal it won't make a difference because the IO is sluggish at the physical layer.
They say that it could be anything from BIOS to backplane to mixing SAS with SATA on the same hardware.
This server never ever gave me a hint of trouble.
In iLO 2 there are no warnings. All drives appear normal.
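Since iLO looks clean, I was going to query the Smart Array controller directly. A sketch assuming HP's hpacucli utility is installed (it ships with HP's ProLiant Support Pack, not with stock CentOS). One known performance killer on these controllers is a dead cache battery, which silently disables write-back caching:

```shell
# Controller and cache status -- look for "Cache Status: OK" and
# "Battery/Capacitor Status: OK"; a failed battery disables
# write-back caching on Smart Array controllers.
hpacucli ctrl all show status

# Full layout: arrays, logical drives, physical drives, and any
# rebuild or degraded states that iLO might not surface.
hpacucli ctrl all show config detail
```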
hdparm shows this:
Quote:
root@solaris [~]# hdparm -tT /dev/sda
/dev/sda:
Timing cached reads: 6636 MB in 2.00 seconds = 3321.44 MB/sec
Timing buffered disk reads: 574 MB in 3.00 seconds = 191.31 MB/sec
I don't have the experience to troubleshoot this efficiently, guys, and the server is also remote to me, so they want to charge to troubleshoot, guess, and fiddle around - which I don't want them to do.
I would like to work out how to run some solid tests to see where and what the problem may be.
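For solid numbers, one approach is to watch per-device latency while the box is under its normal load. A sketch, assuming the sysstat package is installed (yum install sysstat on CentOS 6):

```shell
# Extended device stats every 5 seconds, 12 samples.
# Columns to watch: await (average ms per I/O) and %util (how busy
# the device is). await in the hundreds of ms, or %util pinned near
# 100, points at a saturated disk rather than the VM layer.
iostat -x 5 12

# Cross-check with vmstat: the "wa" column is the percentage of time
# the CPUs sit idle waiting for I/O to complete.
vmstat 5 12
```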
The server load for what it is doing always seems abnormally high, like 12 and 15...
I hope I can get some help diagnosing this. Some people tell me this is all down to virtualisation and that IO will be crap, but the results given above show that the slow performance is at the server level anyway.
Best regards,
W
Last edited by stardotstar; 03-29-2012 at 06:23 PM.
Quote:
I seem to be getting diabolical server loads in the VM. The other thing I am being told is that the disk IO is shot to hell.
The latter will be causing the former, in all likelihood. Do you see any %wa in top (or sar ...)?
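If top isn't handy, the same figure can be pulled straight from /proc/stat with no extra packages; a quick sketch (this is cumulative since boot, so a live top or vmstat is still better for spotting spikes):

```shell
# Fields on the "cpu" line of /proc/stat are:
# user nice system idle iowait irq softirq ...
# so $6 is iowait; divide by the sum of all fields for a percentage.
awk '/^cpu /{total=0; for(i=2;i<=NF;i++) total+=$i;
     printf "iowait: %.1f%% of CPU time since boot\n", 100*$6/total}' /proc/stat
```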
Presuming you are really talking about "loadavg".
It's possible your provider is correct. *And* your mates as well. Maybe something got bumped in the move - or maybe it was always like that, and you weren't pushing the kit hard enough to find out.
Everything (especially the I/O) being virtualized might be enough to bump it over the (performance) cliff.
Did you ever run hdparm when you had it as bare metal? Do you have historical sar data to confirm the I/O loads are comparable?
I'd be inclined to get a liveCD booted and see some numbers from there. But then I don't have to justify the cost.
Quote:
I am told that if I go back to bare metal it won't make a difference because the IO is sluggish at the physical layer.
If your provider said that, I'd make them go back to bare-metal and prove it.
As I said, they may be right, but it may also prove your case that it was o.k. in the past. Maybe they'd get some upgrade business out of it, so they may be inclined to agree to the test.
Hi syg00 - thanks for the replies, and sorry for being slow getting back to you.
What you say makes perfect sense.
No, I don't have hdparm results or anything other than the unixbench data from the old install.
The possibility is that there is a problem just as you say - but somehow it just doesn't tally.
I am thinking it's RAID 0 in combination with slower disks (SATA vs SAS) - I was using SAS 10K drives in RAID 1+0 previously, and now RAID 0 on SATA.
I just wish I could get the flexibility to do some other testing - I may need to get my provider to move my VM onto their hardware and do some testing on mine in the meantime - maybe add a pair of SAS drives for the OS and then put the data and db volumes on a pair of 90 GB SSDs; even in RAID 0 they would be significantly faster all round...
BTW I have attached screen grabs of the bare-metal UnixBench results (all I have) and the current, typically representative top: