RAID/LVM advice needed

abisko00 · 09-04-2023, 10:24 AM

Hi there!

Haven't been around for a while, but I guess this is the right crowd to ask this question:

I have a JBOD of 24 disks with 8 TB each which I want to set-up as redundant RAID volume for the storage of predominantly rather large files (~500 GB). Earlier, I was using a single 95 TB RAID-5 array on a PERC H840 controller. Which was probably not very smart, as I now learned, due to the need to sync many drives and the danger of non-recoverable data. It was recommended to split the 24 disks into 3-4 smaller arrays. I'd like to do that, but I need to combine them into one logical volume.

So the idea is to combine three hardware RAID-5 arrays to one linear logical volume in LVM. Again, I heard this is what LVM was developed for, but I find no examples from people who did that on such large arrays.

How does LVM affect hardware RAID-5 performance?
I am concerned about the two levels of combining the volumes. Adding level of complexity to my system appears dangerous to me. It seems more straightforward to have only one partition that does not require any configuration.

Comments highly appreciated.

Cheers
abisko00

Using Ubuntu 20.04

wpeckham · 09-04-2023, 12:01 PM

The technical meaning of JBOD is a bunch of disks that can be accessed as one volume. Since you are looking to RAID them, I assume that one volume thing does not yet apply?

I have never done a raid LVM volume of raid arrays. I have used MDADM on top of LVM volumes, but this is backword (or inside out) compared to that. I am not sure the result will be what you intend. I prefer MDADM over raid controllers because that makes migration, upgrades, or controller replacements far less painful.

What file system have you considered using? I ask because this can directly impact what you want to accomplish and how it will perform.

Speaking of that, what is your performance priority? Speed, capacity, security, all have different optimal profiles.

You also have to consider backups. RAID is not a backup, and you must consider the recovery path if all of your drives and controllers get taken out by a lightning strike or fire. What is your backup media and schedule like?

syg00 · 09-04-2023, 08:56 PM

LVM doesn't affect the hardware RAID other than by throwing I/O at it. Doesn't really matter whether LVM gets to see one volume or 3 exposed by the RAID card. Where it might matter is how it (LVM) maps its I/O to what it sees as separate physical volumes.

Never having had the pleasure of a true hardware card, I'm in the dark, but if I had to trust my data to a large re-sync, I'd want it being handled in the background by something that was designed to do just that. I like the idea of breaking the pool up and dropping LVM on top - it adds flexibility with only minor addition complexity.

abisko00 · 09-05-2023, 02:02 AM

Great, thanks for your helpful input.

Having a RAID controller at hand, I prefer using it and like syg00, I appreciate the controller doing its job in the background.

The intended filesystem is ext4, simply for the reason I hadn't had time to look into xfs or anything fancy, yet. I am using btrfs on a different array, but for performance reasons, I prefer not to use it here.

My performance priority is difficult to decide on. Long term storage with high accessibility is the major intention, but we also need to use the array for calculations with more smaller I/O processes. Maybe some time, I will add a RAID10 for the later purpose.

I certainly do backups.

Every night, we sync everything with a tape drive in a computing centre.

wpeckham · 09-05-2023, 11:34 AM

Quote:

Originally Posted by abisko00

Great, thanks for your helpful input.

Having a RAID controller at hand, I prefer using it and like syg00, I appreciate the controller doing its job in the background.

The intended filesystem is ext4, simply for the reason I hadn't had time to look into xfs or anything fancy, yet. I am using btrfs on a different array, but for performance reasons, I prefer not to use it here.

My performance priority is difficult to decide on. Long term storage with high accessibility is the major intention, but we also need to use the array for calculations with more smaller I/O processes. Maybe some time, I will add a RAID10 for the later purpose.

I certainly do backups.

Every night, we sync everything with a tape drive in a computing centre.

Okay. The best speed for I/O I have ever used on a modern system was EXT-4/Raid-5 on about 16 drives. (EXT-2 was faster, but has insufficient modern features and tighter limits.) XFS, JFS, RaiserFS were all slower. Raid-6 was slightly slower, but not very much. BTRFS is still a moving target, but generally slower than EXT-4. An enterprise SAN was faster, but not a whole lot.
I used LVM, with MDADM providing RAID.

My testing involved massive numbers of database files in use at the same time, and large file transfers.

Although it is non-intuitive, EXT-4 on RAID-5 was faster than anything RAID-10 based.

My controllers were all enterprise grade FW-SCSI in an enterprise environment installed in 64 core machines with 512GB ram.

I hope that if you do complete this configuration of MDADM raid on top of LVM over HWRAID you will share the configuration and test results.

abisko00 · 09-05-2023, 12:12 PM

Quote:

Originally Posted by wpeckham

Okay. The best speed for I/O I have ever used on a modern system was EXT-4/Raid-5 on about 16 drives.

That's cool, so I made the right choice. Would it make sense to ask you for RAID specific ext4 parameters? I have a strip element size (chunk?) of 256kb, 4096b block size resulting in 64 stride size, and 21 disks, so stride width should be 1344. Really confusing is the naming of these parameters. "Strip/stride/stripe/chunk/block", damn I am still not sure if I got them all right.

Quote:

Although it is non-intuitive, EXT-4 on RAID-5 was faster than anything RAID-10 based.

This is indeed not intuitive after I learned that RAID10 won't need to write the parity information in addition to the data. I'll need to do some experimentation myself.

Quote:

My controllers were all enterprise grade FW-SCSI in an enterprise environment installed in 64 core machines with 512GB ram.

Do I understand correctly that even though you have a RAID controller, you still prefer software RAID? Or do I misunderstand the use of MDADM as software RAID? My machine is a little less powerful with only 16 cores and 32GB RAM.

Quote:

I hope that if you do complete this configuration of MDADM raid on top of LVM over HWRAID you will share the configuration and test results.

Sure, I'll try to share my experiences here. But just to be on the same page, using MDADM was never my plan. I was only referring to linear LVM volumes, not striped or mirrored. LVM should just make one large disk out of several RAID volumes. Interesting question that appeared in the meantime: What would happen if I combine, let's say a RAID-5 and a RAID-10 into one large LVM volume? Is that even possible? But how should LVM know what is provided by the hw controller. To LVM they should appear all the same, correct? Sorry, I digress.

wpeckham · 09-07-2023, 01:29 AM

Quote:

Originally Posted by abisko00

TDo I understand correctly that even though you have a RAID controller, you still prefer software RAID? Or do I misunderstand the use of MDADM as software RAID?

A reasonable question.

I discovered that the replacement controller for the FW-SCSI controllers I started with would not read the RAID formatting on the drives formatted with the original controllers. That meant in failure mode or hardware update I would be building new RAID and restoring from backup. For storage sizes of that size that implied a significant delay before being back in production. This needed to be mitigated in multiple ways.

#1 both controllers would recognize MDADM raid devices using MDADM, so a direct replacement had a clear chance to succeed and be far faster than a restore.

#2 I went to a failover test server with EXACTLY the same software/configuration/data server that could become production quickly and with minimal downtime. This positioned originally in a different rack and room of the same data center, eventually migrated to a different building.

This provided a clear DR advantage. Failover gave me a clear uptime advantage in the case of a failure or disaster, MDADM gave me a quicker recovery in the case that my drives survived.

While benchmark testing the hardware RAID against MDADM showed a slight advantage to the hardware RAID, there was a slight advantage to MDADM during actually database benchmarking. Something about the characteristics of your database operation was clearly handled better in software RAID, but I was unable to pin down the deciding factor. Note that this may have been very hardware and software specific, and you should perform your own tests if this might be important.

Getting your specific desired configuration (two levels of raid) to work might not be reasonable using only MDADM. Taking advantage of the hardware raid for your operation makes sense even if there is a slight performance penalty.

Since my situation was significantly different, that means my choices may not be the best for your operation.
Does that make sense to you?

abisko00 · 09-07-2023, 02:29 AM

Thanks a lot for the insight.

As soon as I receive new hardware, I will certainly to some experimenting. A failover server seems like a great thing to have. Currently (off topic) I am dealing with restoring 74 TB of data from backup after my experimenting with RAID enlargement has led to a complete data loss. Unfortunately, I wasn't aware that our departmental firewall (outside my sphere of influence) limits all transfers to 1 Gbit/s....

syg00 · 09-07-2023, 04:36 AM

If you don't laugh, you'll cry.

If it's above your pay grade, go kick whichever line manager is sufficiently recompensed. Backups are pointless unless they are usable (!) in a timely (!) manner.

chrism01 · 09-08-2023, 12:31 AM

Can just recommend RAID 6 instead of RAID 5 ?

Large disks take a long time (relatively speaking) to re-sync as a new replacement for a failed disk.
RAID 6 allows for 2 failed disks and to keep on running - RAID 5 only allows for one ...

Guttorm · 09-08-2023, 08:40 AM

Yes I recommend RAID 6 too. The main reason is that when the disks get old and one fails. Raid 5 will make a lot of stress on all the disks while rebuilding - lots of reading and writing on all of them. It can happen that another disk fails during the rebuild.

Also, avoid SMR disks in RAID. It has terrible speed when rebuilding a RAID.

https://www.howtogeek.com/803276/cmr...he-difference/