EXT3 Performance 30% of EXT2 when moving from RH6.2 to RHEL4U7
Situation
----------
The existing environment consists of about 400 RH6.2 servers running kernel 2.4.34.5-1-i686-HUGEMEM, which are migrating to RHEL4 running kernel 2.6.9-78.0.17.ELsmp.
When we move the workload from a RH6.2 server to one running RHEL4 on the exact same hardware configuration, our ext3 write performance drops by a factor of 3 and the boxes cannot sustain the I/O: 100% disk utilisation, threads blocking on writes, and the application becoming unresponsive.
In our environment, the relative performance numbers are:
RH6.2 ext2 = 100% (baseline)
RH6.2 ext3 = 95%
RHEL4 ext2 = 60% of RH6.2 ext2
RHEL4 ext3 = 30% of RH6.2 ext2
RHEL5 = equivalent to RHEL4
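For anyone wanting to reproduce numbers like these, a crude sequential-write comparison is enough to show the gap between boxes. This is only a sketch; TESTDIR is a placeholder you should point at the filesystem under test (e.g. /x):

```shell
# Crude sequential-write test; run identically on each box and compare.
# TESTDIR is a placeholder -- point it at the filesystem under test (e.g. /x).
TESTDIR=${TESTDIR:-/tmp}

# The trailing sync matters: without it you mostly measure the page cache,
# not the disk, the journal, or the controller.
time sh -c "dd if=/dev/zero of=$TESTDIR/ddtest bs=1M count=64 && sync"

rm -f "$TESTDIR/ddtest"
```

It won't reproduce an application's small-write log pattern, but it is the same on every box, which is what matters for a relative comparison.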
Hardware
----------
HP DL360 G5
HP Smart Array P400i RAID controller
no Cache enabled for Writes
10GB RAM
146GB Drives, 10k RPM
2 Drives set up as a single Mirrored device
Software configuration
----------------------
We have a single filesystem, /x, that contains approximately 90 subdirectories of binaries. Each of these binaries has 1-30 children that write logs within its /x subdirectory (the detail is not important right now).
The standard enterprise tools (Tivoli, Big Brother, etc.) are installed in the production environment; nothing in the lab.
Question
---------
In your enterprise environment, have you seen performance drop by a factor of 3 when moving from RH6.2 to RHEL4 and from ext2 to ext3?
We've tried every tuning parameter of the LVM, tried all the different elevators, and followed recommendations from HP and Red Hat. We've even tried it with and without LVM.
Is anyone else out there experiencing this problem going from RH6.2 to EL4/EL5?
My first thought was that they might have changed the default setup for the filesystems. Have you checked /etc/fstab? (For example, noatime or nodiratime set in the highest-performing version and not in the lowest-performing one.)
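This is quick to rule out; /x below stands in for whichever filesystem you care about. /proc/mounts shows what the kernel actually did, which is more trustworthy than fstab:

```shell
# What options is the filesystem actually mounted with right now?
# (/proc/mounts reflects reality; /etc/fstab only reflects intent.)
grep ' / ' /proc/mounts     # substitute ' /x ' for the filesystem in question

# What was requested at boot?
[ -f /etc/fstab ] && grep -v '^#' /etc/fstab
```

Diffing this output between the fast box and the slow box takes a minute and eliminates a whole class of explanations.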
That said, there is probably more margin for changing the performance of an ext4 volume by filesystem tuning than an ext3 one, and less still for ext2. If I remember correctly, extents are beneficial to ext4 performance, but as they aren't available under ext3 or ext2, they can't explain the precipitous decline here (although they might matter if the question were 'how do I get some more performance' rather than 'why has this happened').
(see http://lwn.net/Articles/187321/ and don't trust anything I wrote in that previous para....my memory has gone, at least as far as things that I wasn't really all that interested in are concerned!)
When you say "the exact hardware configuration", did you mean that the partitions are set up in the same order (if you have multiple partitions on the disk)? OTOH, I would have expected that to be worth 2x at worst, you're seeing 3x, and you may well only have a single partition anyway.
Quote:
running kernel 2.6.9-78.0.17.ELsmp
There was a long period during which there was a problem in the SATA subsystem that only afflicted 64-bit systems, but I think that started at 2.6.10 (so actually later than your 2.6.9 kernel), and it was a 'drop dead' rather than a 'limp along' problem.
I take it that RHEL5, when you tried it, had a much later kernel? (Was this 'latest' or 'default'?...you probably tried both, right?)
Beyond those, it does sound like something really nasty (and subtle) hiding in the kernel...failing to take advantage of the lower latency of the mirrored array, perhaps? But I'm not sure how you get from here to a more acceptable place.
Quote:
no Cache enabled for Writes
I take it that you feel this is a necessity, for integrity reasons? (And there isn't an alternative option, e.g. battery back-up on the HP card, if power going down is your concern?) Not even worth trying as an experiment to see if it makes a difference?
I'd be looking for a 2.4 vs. 2.6 kernel issue rather than strictly filesystem.
Is the ELsmp kernel a hugemem equivalent, i.e. is it seeing all the memory? Swap usage may be getting in the way of the user I/O - easy to check.
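A quick sanity check along those lines, no root needed, using only /proc (vmstat is the usual live view but the static numbers come first):

```shell
# Does the kernel actually see all 10GB?
grep MemTotal /proc/meminfo

# Is any swap in use? (compare SwapTotal and SwapFree)
grep -E '^Swap(Total|Free)' /proc/meminfo

# While the workload runs, watch si/so (swap in/out) and wa (I/O wait):
# vmstat 5
```

If MemTotal is well short of 10GB on the ELsmp kernel, that alone is worth chasing before any filesystem tuning.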
Quote:
tried all the different elevator
Including noop?
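For reference, here is how to see and change the elevator; device names are placeholders, and note that per-disk runtime switching via sysfs arrived in later 2.6 kernels, so on RHEL4's 2.6.9 you may be limited to the boot-time parameter:

```shell
# Show available schedulers per disk; the active one is in [brackets]
for q in /sys/block/*/queue/scheduler; do
    [ -e "$q" ] || continue
    echo "$q: $(cat "$q")"
done

# Switch one disk at runtime (root required; kernel support permitting):
# echo noop > /sys/block/sda/queue/scheduler

# Global setting at boot, on the kernel line in grub.conf:
# elevator=noop
```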
Migrating users (sites) dependent on 2.4 I/O profiles usually find the swappiness default introduced in 2.6 messes with their numbers. Try setting it to 0 (zero) to get the 2.4 behaviour - easy to change to test, easy to revert.
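Checking and changing it really is a one-liner each way (root is needed to write; both forms below are standard /proc and sysctl usage):

```shell
# Current value (the 2.6 default is 60)
cat /proc/sys/vm/swappiness

# Set it to 0 to approximate 2.4 behaviour (root required):
# echo 0 > /proc/sys/vm/swappiness
# or: sysctl -w vm.swappiness=0

# To persist across reboots, add to /etc/sysctl.conf:
# vm.swappiness = 0
```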
Journalling filesystems add overheads just because of the journalling - as you can see from your own 2.4 numbers in the first post. There are several papers on performance options - be aware that some may have integrity drawbacks for production sites. Have a look at noatime and nodiratime as mount options - I use them everywhere, more especially on heavy read environments.
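As a sketch of those options (device path and mount point are placeholders; note that ext3 refuses to change its data= journal mode on a remount, so that one needs a full unmount/mount cycle):

```shell
# Turn off atime updates on a live filesystem -- safe, easily reverted:
# mount -o remount,noatime,nodiratime /x

# Corresponding /etc/fstab entry (device path is a placeholder):
# /dev/cciss/c0d0p2  /x  ext3  defaults,noatime,nodiratime  1 2

# Faster, but with the integrity trade-off mentioned above:
# metadata-only journalling. ext3 will not switch data= on a remount:
# umount /x && mount -o data=writeback /x

# Confirm what actually took effect:
grep ext3 /proc/mounts || echo "no ext3 filesystems mounted here"
```

data=writeback is the option most likely to claw back write throughput, and also the one with real crash-consistency drawbacks (stale data can appear in recently-written files), so it belongs in the lab first.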
Swap usage may be getting in the way of the user I/O - easy to check.
Excellent point; for some reason, when I wrote my reply I was thinking that excess swap had already been eliminated and re-reading this thread it is clear that that isn't the case.
Quote:
swappiness default introduced in 2.6 messes with their numbers. Try setting it to 0 (zero) to get the 2.4 behaviour - easy to change to test, easy to revert.
My experience of tweaking swappiness is that it has more impact on the relative priority of different classes of tasks than on overall throughput. But, as you say, easy enough to check, and you should get all of the 'easy to check' stuff out of the way.
BTW, you may want to have a look over at Phoronix to see if there is something there that strikes a chord. I can't remember anything directly relevant, but they are the only people I can think of that do performance testing on distro A with kernel X against distro B with kernel Y and version x.x of distro A against version y.y of the same distro. Bit of a long shot, but you never know.
Also, you might want to say what you primarily want to get out of this process: is it first and foremost a working system (and you don't care how you get there or about understanding it further), or do you want to understand it, on the assumption that once you do, the rest becomes easy, or at least possible?
Quote:
Journalling filesystems add overheads just because of the journalling
Journalling is clearly more involved, and involves more code, but that isn't necessarily the same as slower, when comparing filesystem A with filesystem B. In the case in which disk accesses can be reduced, because disk accesses are so much slower than anything that happens in ram, there can still be a speed gain in particular use cases. Agreed that a slow down is more common than a speed up, though.
I'd also re-emphasise the point that, as a general rule of thumb, the more sophisticated the filesystem, the more its performance is susceptible to modification by changing the setup parameters. So while accepting 'the defaults' (which presumably means 'this distro's defaults') in every case seems like a level playing field for filesystem performance testing, it isn't really; it disadvantages the filesystems with more to gain from tuning.
Quote:
Have a look at noatime and nodiratime as mount options - I use them everywhere, more especially on heavy read environments.
...which is exactly why noatime is helpful (reduces disk accesses); I would say always use noatime unless there is a particular reason not to. I believe that noatime implies nodiratime; is that wrong?