advice on how to troubleshoot performance issues - HP DL360G5 + ESXi w/ Centos6x64
Hi all,
Well, I am having a special kind of hell. I moved from bare metal on my HP ProLiant DL360 G5 (22 GB RAM, 2x quad-core E5420) to VMware ESXi (installed on a RAID 1 mirror pair of 72 GB 10K SAS drives), hosting a dedicated CentOS 6 x64 VM with cPanel/WHM, essentially a LAMP server.
The sites and payload are all the same as I ran on bare metal, and the server is the same; it just got shipped to a new DC, was spun up with ESXi on the mirror pair of 72 GB 10K SAS drives, and then the VM was built on a mirror pair of 300 GB SATA drives.
I seem to be getting diabolical server loads in the VM. The other thing I am being told is that the disk IO is shot to hell.
This is apparently being measured at the physical level, so I am assured that VM IO is not to blame.
I was told the following:
Quote:
If we copy data from SATA datastore 3 to SATA datastore 2 we get speeds of around 10,000KBps
If we copy data from SATA datastore 3 to SATA datastore 3 we get speeds of around 10,000KBps
If we copy data from SATA datastore 3 to SAS datastore 1 we get speeds of around 10,000KBps
If we copy data from SAS datastore 1 to SAS datastore 1 we get speeds of around 25,000KBps
If we copy data from SATA datastore via 1000mbit network from one of our servers to SAS datastore 1 we get speeds of around 25,000KBps
If we copy data from SATA datastore via 1000mbit network from one of our servers to SAS datastore 1 we get speeds of around 10,000KBps
If we copy data from SATA datastore on one of our servers to SATA datastore on the same server we get speeds of around 125,000KBps
If we copy data from SATA datastore on one of our servers to SATA datastore on a different server via 1000mbit network we get speeds of around 125,000KBps
If we copy data from SAS datastore on one of our servers to SAS datastore on the same server we get speeds of around 200,000KBps
So without a doubt there is an IO issue with your server; putting in new SAS or SATA drives will not fix anything, as all 3 independent RAID 1 arrays on your server are performing extremely badly.
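To double-check their figures from inside the guest, a simple dd run should do. A sketch - the target path below is just a placeholder, point it at whichever filesystem sits on the datastore you want to measure:

```shell
# Hypothetical path -- adjust to a directory on the datastore under test.
TARGET=/home/iotest.bin

# Sequential write, 1 GiB in 1 MiB blocks. oflag=direct bypasses the
# page cache so RAM can't flatter the result (supported by the dd in
# CentOS 6's coreutils, needs a filesystem that allows O_DIRECT).
dd if=/dev/zero of="$TARGET" bs=1M count=1024 oflag=direct

# Sequential read of the same file, again uncached.
dd if="$TARGET" of=/dev/null bs=1M iflag=direct

rm -f "$TARGET"
```

dd prints the MB/s figure on stderr at the end of each run, which can be compared directly against the KBps numbers the provider quoted.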
Now I did a UnixBench and got the following results:
The last benchmark results I got on the same server bare metal were 669 and 2267.
I don't really know where to go with troubleshooting this.
I am told that if I go back to bare metal it won't make a difference because the IO is sluggish at the physical layer.
They say that it could be anything from BIOS to backplane to mixing SAS with SATA on the same hardware.
This server never ever gave me a hint of trouble.
In iLO 2 there are no warnings. All drives appear normal.
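Since iLO looks clean, I was going to query the Smart Array controller directly. A sketch assuming HP's hpacucli utility is installed (it ships with HP's ProLiant Support Pack, not with stock CentOS). One known performance killer on these controllers is a dead cache battery, which silently disables write-back caching:

```shell
# Controller and cache status -- look for "Cache Status: OK" and
# "Battery/Capacitor Status: OK"; a failed battery disables
# write-back caching on Smart Array controllers.
hpacucli ctrl all show status

# Full layout: arrays, logical drives, physical drives, and any
# rebuild or degraded states that iLO might not surface.
hpacucli ctrl all show config detail
```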
hdparm shows this:
Quote:
root@solaris [~]# hdparm -tT /dev/sda
/dev/sda:
Timing cached reads: 6636 MB in 2.00 seconds = 3321.44 MB/sec
Timing buffered disk reads: 574 MB in 3.00 seconds = 191.31 MB/sec
I don't have the experience to troubleshoot this efficiently, guys, and the server is also remote to me, so they want to charge to troubleshoot, guess, and fiddle around - which I don't want them to do.
I would like to work out how to run some solid tests to see where and what the problem may be.
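For solid numbers, one approach is to watch per-device latency while the box is under its normal load. A sketch, assuming the sysstat package is installed (yum install sysstat on CentOS 6):

```shell
# Extended device stats every 5 seconds, 12 samples.
# Columns to watch: await (average ms per I/O) and %util (how busy
# the device is). await in the hundreds of ms, or %util pinned near
# 100, points at a saturated disk rather than the VM layer.
iostat -x 5 12

# Cross-check with vmstat: the "wa" column is the percentage of time
# the CPUs sit idle waiting for I/O to complete.
vmstat 5 12
```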
The server load for what it is doing always seems abnormally high, like 12 and 15...
I hope I can get some help diagnosing this. Some people tell me this is all down to virtualisation and that IO will be crap, but the results given above show that the slow performance is at the server level anyway.
Best regards,
W
Last edited by stardotstar; 03-29-2012 at 06:23 PM.
Quote:
I seem to be getting diabolical server loads in the VM. The other thing I am being told is that the disk IO is shot to hell.
The latter will be causing the former, in all likelihood. Do you see any %wa in top (or sar ...)?
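If top isn't handy, the same figure can be pulled straight from /proc/stat with no extra packages; a quick sketch (this is cumulative since boot, so a live top or vmstat is still better for spotting spikes):

```shell
# Fields on the "cpu" line of /proc/stat are:
# user nice system idle iowait irq softirq ...
# so $6 is iowait; divide by the sum of all fields for a percentage.
awk '/^cpu /{total=0; for(i=2;i<=NF;i++) total+=$i;
     printf "iowait: %.1f%% of CPU time since boot\n", 100*$6/total}' /proc/stat
```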
Presuming you are really talking about "loadavg".
It's possible your provider is correct. *And* your mates as well. Maybe something got bumped in the move - or maybe it was always like that, and you weren't pushing the kit hard enough to find out.
Everything (especially the I/O) being virtualized might be enough to bump it over the (performance) cliff.
Did you ever run hdparm when you had it as bare metal? Do you have historical sar data to confirm the I/O loads are comparable?
I'd be inclined to get a liveCD booted and see some numbers from there. But then I don't have to justify the cost.
Quote:
I am told that if I go back to bare metal it won't make a difference because the IO is sluggish at the physical layer.
If your provider said that, I'd make them go back to bare-metal and prove it.
As I said, they may be right, but it may also prove your case that it was o.k. in the past. Maybe they'd get some upgrade business out of it, so they may be inclined to agree to the test.
Hi syg00 - thanks for the replies, and sorry for being slow getting back to you.
What you say makes perfect sense.
No, I don't have hdparm results or anything other than the unixbench data from the old install.
The possibility is that there is a problem just as you say - but somehow it just doesn't tally.
I am thinking it's RAID 0 in combination with slower disks (SATA vs SAS) - I was using SAS 10K drives in RAID 1+0 previously, and now RAID 0 on SATA.
I just wish I could get the flexibility to do some other testing - I may need to get my provider to move my VM onto their hardware and do some testing on mine in the meantime - maybe add a pair of SAS drives for the OS and then put the data and db volumes on a pair of 90 GB SSDs; even in RAID 0 they would be significantly faster all round...
BTW I have attached screen grabs of the bare-metal UnixBench results (all I have) and the current, typically representative top: