LinuxQuestions.org

LVM on top RAID: Disadvantages? (https://www.linuxquestions.org/questions/slackware-14/lvm-on-top-raid-disadvantages-4175677892/)

Razziatore 06-30-2020 07:40 AM

LVM on top RAID: Disadvantages?
 
Hi All,
I have used RAID on my Slackware box for a long time (about 10 years), but I have never used LVM because I don't see the advantage in it. I have tried it several times without finding any real use for it.

At the moment I have an array of six 6TB Red drives in RAID6, formatted with JFS, and it has worked fine for many years (the array is almost 5 years old). But I want to take snapshots so I can be sure that when I start a backup all the data is consistent.

This is the plan:
  • Make two partitions on each drive:
    1. The first, 99% of the disk, for data. This gives me a safety margin in the unfortunate case that I have to replace a drive after a failure: mdadm does not accept a drive smaller than the original one. For this reason I currently leave 50 MB free at the end of each 6TB drive, but I think that's not enough.
    2. The second, 1% of the disk: I will enlarge the unused space from the current 50 MB up to 1% of the disk, which lets me create a second partition to use for snapshots.
  • Create two mdadm RAIDs: one for the real data and one to store the changes for the snapshot.
  • Create a VG from the two md devices in linear mode.
  • Create an LV the size of the data RAID.
  • Mount/use the created LV.

When I do a backup I will create a snapshot using the free space.
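
Roughly, the commands I have in mind (device, VG and LV names are just examples; I haven't tested this yet):
Code:

# two arrays: a big one for the data, a small one to hold snapshot space
mdadm --create /dev/md10 --level=6 --raid-devices=6 /dev/sd[abcdef]1
mdadm --create /dev/md11 --level=6 --raid-devices=6 /dev/sd[abcdef]2

# both arrays go into one VG; the data LV is pinned to the big PV
pvcreate /dev/md10 /dev/md11
vgcreate datavg /dev/md10 /dev/md11
lvcreate -n srvlv -l 100%PVS datavg /dev/md10
mkfs.jfs /dev/datavg/srvlv
mount /dev/datavg/srvlv /srv

# at backup time the snapshot can only land on the small PV, the only one with free extents
lvcreate -s -l 100%FREE -n srvsnap datavg/srvlv
# ... run the backup from the snapshot ...
lvremove -f datavg/srvsnap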

Questions:
  1. Is there any downside to what I want to do?
  2. I currently use JFS... is it a good FS, or should I migrate to another one (I am evaluating XFS)?
  3. I want to use RAID/LVM only for data, not for boot. Do I need any special configuration?
  4. Any suggestions?

Best regards,
Razziatore

Labinnah 06-30-2020 10:16 AM

I've used LVM on RAID some time ago.

There are probably many disadvantages, but I know of one: some performance loss, since there is one more abstraction layer.

But there are advantages in the freedom it gives with partition sizes, the physical disks used, etc. Once I was forced to change an 8-disk RAID 10 to a 6-disk RAID 6 online, and thanks to LVM I was able to do it:
- degrade the 8-disk RAID 10 to 4 disks;
- make a degraded RAID 6 from the 4 freed disks;
- create a PV on the RAID 6 and add it to the VG;
- move the data from the RAID 10 PV to the RAID 6 PV;
- remove the RAID 10 PV from the VG and delete it;
- use 2 of the 4 disks left over from the RAID 10 to recover the RAID 6.
This wasn't particularly safe but was doable.
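
From memory, the sequence was roughly this (device and VG names are made up, don't copy it blindly):
Code:

# drop one disk from each RAID 10 mirror pair, the array keeps running degraded
mdadm /dev/md0 --fail /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1
mdadm /dev/md0 --remove /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1

# build a RAID 6 on the freed disks with two members missing (mdadm warns about starting degraded)
mdadm --create /dev/md1 --level=6 --raid-devices=6 \
      /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 missing missing

# move everything over online, then retire the old array
pvcreate /dev/md1
vgextend datavg /dev/md1
pvmove /dev/md0 /dev/md1
vgreduce datavg /dev/md0
pvremove /dev/md0
mdadm --stop /dev/md0

# complete the RAID 6 with two of the disks freed from the RAID 10
mdadm /dev/md1 --add /dev/sda1 /dev/sdb1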

And some of my thoughts about this.
1. Use only one partition per drive as the RAID member for the PV. Use LVs as partitions instead. Creating multiple RAIDs on the same drive is pointless and adds unnecessary complexity.
2. Don't use the whole VG space for LVs. Resize volumes as needed. This gives you more freedom with partition sizes, data movement between PVs, etc.
3. You can boot from RAID 1 when the metadata is stored at the end of the drive (versions 0.9, 1.0).
4. Create some test loop devices, put LVM on them and play with it to get familiar with the commands and the capabilities it has (extend, shrink, move, add, remove volumes) - see the sketch at the end of this post. This may give you a clue how to better design your system.
5. My design for you:
One PV on the RAID6.
One VG on the PV.
Two LVs (JFS, snapshot) on the VG, with the sizes required for today. You can extend them later.
You can, however, create more LVs as separate partitions for '/', '/home' or '/var' - all of them with a flexible size that can be changed in the future.
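
To get a feel for point 4, something like this throw-away setup is enough (paths and sizes are just examples):
Code:

# three 1 GB files attached as loop devices make a disposable LVM playground
for i in 1 2 3; do
    truncate -s 1G /tmp/pv$i.img
    losetup -f --show /tmp/pv$i.img    # prints the loop device it picked, e.g. /dev/loop0
done
pvcreate /dev/loop0 /dev/loop1 /dev/loop2      # adjust to whatever losetup printed
vgcreate testvg /dev/loop0 /dev/loop1 /dev/loop2
lvcreate -n testlv -L 512M testvg
# now play: lvextend, lvreduce, pvmove, lvcreate -s, vgreduce ...

# clean up when you are done
vgremove -f testvg
losetup -d /dev/loop0 /dev/loop1 /dev/loop2
rm /tmp/pv?.img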

Razziatore 06-30-2020 01:52 PM

Hi Labinnah,
Thanks for your reply, but I don't agree with you on some points and I would like to analyze them together.

Quote:

Originally Posted by Labinnah (Post 6139703)
There are probably many disadvantages, but I know of one: some performance loss, since there is one more abstraction layer.

I suppose it depends on what kind of performance loss I get. Right now I'm really close to the theoretical maximum read speed of n-2 drives, so in the range 400-600 MiB/s, more than enough to saturate my gigabit network. So will this performance loss be a few percentage points, or will I lose half the speed?

Quote:

Once I was forced to change an 8-disk RAID 10 to a 6-disk RAID 6 online, and thanks to LVM I was able to do it:
Cool. But this is why I'm always cold about LVM: yes, what you did is cool (I repeat myself) but, correct me if I'm wrong, you can get the exact same result by making a new array and copying the files from the old one to the new one. I mean, the number of disks used is the same, the time spent is the same. I don't see much advantage. Did I miss something?

I did this when I migrated from my 6x2 TB array to my "new" 6x6 TB array. I degraded the old array, created the new one degraded, and then copied the files. At the end I disconnected the old array and plugged in the new disks to recover the new RAID. (Unfortunately I didn't have enough ports on the HBA/PSU to do this without degrading the arrays.)

But now we come to your suggestions:
Quote:

1. Use only one partition per drive as the RAID member for the PV. Use LVs as partitions instead. Creating multiple RAIDs on the same drive is pointless and adds unnecessary complexity.
The main reason for the second partition, and for the second array, is simply to reuse the space left free at the end of the disk. I WANT to leave that free space to be on the safe side. Reusing it is just a way not to waste it completely.

Maybe I could leave less than the planned 1%, but I want to be safe. At the beginning I had planned to lose 1% as a RAID safety margin and 1% for SNAPSHOT data, but then I thought of making them share the space, reusing what would otherwise be thrown away.

Quote:

2. Don't use the whole VG space for LVs.
I want a big space to store my files. I don't want to manage partition sizes. I want to create one big /srv mount point and forget about it.

Quote:

3. You can boot from RAID 1 when the metadata is stored at the end of the drive (versions 0.9, 1.0).
Thank you for the suggestion. In fact I already have RAID 1 v0.90 on the root partition (with boot inside) and also RAID 1 v1.2 on /home and /root (I use the v1.2 partitioning capability, but I'm not happy with that... next time maybe I'll do it a different way).

Quote:

Two LVs (JFS, snapshot)
Do you use JFS? What do you think about this FS?

Labinnah 06-30-2020 03:35 PM

Quote:

Originally Posted by Razziatore (Post 6139813)
So will this performance loss be a few percentage points, or will I lose half the speed?

I have no idea; I never tested it. But it's probably more noticeable in random access to different places on the disk than in linear speed.

Quote:

Originally Posted by Razziatore (Post 6139813)
Cool. But this is why I'm always cold about LVM: yes, what you did is cool (I repeat myself) but, correct me if I'm wrong, you can get the exact same result by making a new array and copying the files from the old one to the new one. I mean, the number of disks used is the same, the time spent is the same. I don't see much advantage. Did I miss something?

First, the RAID 10 used 8 disks and the RAID 6 uses 6 disks. This freed 2 disks for other purposes.
Second, RAID 10 can survive losing half its drives, but only specific ones, not any of them. So 2 disk failures may kill a RAID 10, while a RAID 6 can lose any 2 drives safely. And let's say the drives I had to use were not reliable at all...


Quote:

Originally Posted by Razziatore (Post 6139813)
I did this when I migrated from my 6x2 TB array to my "new" 6x6 TB array. I degraded the old array, created the new one degraded, and then copied the files. At the end I disconnected the old array and plugged in the new disks to recover the new RAID. (Unfortunately I didn't have enough ports on the HBA/PSU to do this without degrading the arrays.)

I did this online. The server was up and running. The disks were the same disks. All the data was copied invisibly to userspace - no stops to sync data between the old and new RAID, not even a stop for remounting.


Quote:

Originally Posted by Razziatore (Post 6139813)
The main reason for the second partition, and for the second array, is simply to reuse the space left free at the end of the disk. I WANT to leave that free space to be on the safe side. Reusing it is just a way not to waste it completely.

I don't think it makes sense to leave so much space that you could create a reasonable data partition out of it. IMHO drives differ in size by single MBs, not tens of GB; leaving 1 GB, as for a swap partition, is IMHO safe enough. (BTW, don't use RAID for swap - it's too slow.)

Quote:

Originally Posted by Razziatore (Post 6139813)
Maybe I could leave less than the planned 1%, but I want to be safe. At the beginning I had planned to lose 1% as a RAID safety margin and 1% for SNAPSHOT data, but then I thought of making them share the space, reusing what would otherwise be thrown away.

I want a big space to store my files. I don't want to manage partition sizes. I want to create one big /srv mount point and forget about it.

If you want to do it this way, you need to create a separate VG for the last 1%. That gives you 1 LV on 1 VG on 1 PV, twice, both using all the available space. In this case using LVM is pointless, because you get none of the LVM benefits and all of its drawbacks. (It could be done as 2 LVs on 1 VG on 2 PVs, but that is tricky, as you must force one LV onto one PV, and it still has no benefit when each LV uses all of its PV's free space.)
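
For the record, forcing an LV onto one specific PV is just a matter of naming the PV at the end of lvcreate (names here are made up):
Code:

lvcreate -n datalv -l 100%PVS bigvg /dev/md10   # extents allocated only from /dev/md10
pvs -o pv_name,vg_name,pv_size,pv_free          # confirm the other PV still has all its space free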



Quote:

Originally Posted by Razziatore (Post 6139813)
Do you use JFS? What do you think about this FS?

It was too exotic for me; I've always used extX or reiser. I think you won't be keen on partition resizing, since AFAIK JFS is poor in that respect.

Maybe you should look at BTRFS? It has some kind of volume management implemented at the filesystem level.

Richard Cranium 06-30-2020 03:35 PM

I've used LVM on top of raid for over a decade, I believe.

(As it so happens, LVM can also provide RAID functionality but I have not explored that at all.)

In my upgrades, I've created new RAID arrays, created new PVs out of them, added the new PV to the volume group, and then told LVM to move everything off the old PV.

All of that happens while the system is running; no down time at all and no worries about missing information from the old PV because people were modifying the logical volumes while all of this was going on.

Once the move is complete, remove the old PV from the volume group, and do whatever you want to do with the old array.
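
In command form, the whole swap is roughly this (the old and new array names here are just placeholders):
Code:

pvcreate /dev/md3
vgextend mdgroup /dev/md3
pvmove /dev/md2          # runs online; the logical volumes stay mounted and writable
vgreduce mdgroup /dev/md2
pvremove /dev/md2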

Not to mention that if you are making backups, you can create a snapshot LV and take your backup from *that*. Just get rid of the snapshot LV when you are done, because it takes some resources to maintain it.
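
For example (the snapshot size and paths here are only illustrative):
Code:

lvcreate -s -n homelv_snap -L 20G mdgroup/homelv   # big enough to absorb writes during the backup
mount -o ro /dev/mdgroup/homelv_snap /mnt/snap
rsync -a /mnt/snap/ /path/to/backup/
umount /mnt/snap
lvremove -f mdgroup/homelv_snap                    # drop it as soon as the backup is done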

Since it is very easy to extend a logical volume, there's no need to allocate all of your space in a volume group in the beginning. You may want a new mount point in the future; you may want a given logical volume to max out at a given size to simplify your backup strategy.

In my case...
Code:

0 ✓ cranium ~ # pvs
  PV        VG      Fmt  Attr PSize  PFree 
  /dev/md127 mdgroup lvm2 a--  929.91g 665.94g
  /dev/md2  mdgroup lvm2 a--  929.81g      0
0 ✓ cranium ~ # vgs
  VG      #PV #LV #SN Attr  VSize VFree 
  mdgroup  2  37  1 wz--n- 1.82t 665.94g
0 ✓ cranium ~ # lvs
  LV          VG      Attr      LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  archstore  mdgroup -wn-ao----  4.00g                                                   
  dockerlv    mdgroup -wi-ao----  6.00g                                                   
  extras      mdgroup -wn-a-----  4.31g                                                   
  extraslv    mdgroup -wi-ao----  4.50g                                                   
  homelv      mdgroup -wi-ao---- 195.00g                                                   
  java        mdgroup -wi-ao----  12.50g                                                   
  junk        mdgroup -wi-ao----  58.00g                                                   
  lawsuitlv  mdgroup -wi-a-----  40.00g                                                   
  libvirtlv  mdgroup -wi-ao---- 291.00g                                                   
  lubuntulv  mdgroup -wi-a-----  20.00g                                                   
  lllllv      mdgroup -wi-a-----  32.00g                                                   
  lxcvol      mdgroup owi-aos--- 100.00g                                                   
  lxcvol_snap mdgroup swi-a-s---  1.00g      lxcvol 18.86                                 
  mcplv      mdgroup -wn-a-----  3.47g                                                   
  mongostore  mdgroup -wi-ao----  80.00g                                                   
  musiclv    mdgroup -wn-a-----  19.00g                                                   
  newhome    mdgroup -wn-a-----  9.50g                                                   
  newhomelv  mdgroup -wi-ao----  10.00g                                                   
  newmusiclv  mdgroup -wi-ao----  20.00g                                                   
  newroot    mdgroup -wi-ao----  30.00g                                                   
  opt        mdgroup -wn-ao----  17.50g                                                   
  pgsqllv    mdgroup -wi-ao----  2.00g                                                   
  qemulv      mdgroup -wi-ao----  47.00g                                                   
  rootlv      mdgroup -wi-ao----  7.00g                                                   
  sbolv      mdgroup -wi-a----- 512.00m                                                   
  slackroll  mdgroup -wi-ao----  4.00g                                                   
  slaptlv    mdgroup -wn-ao----  7.47g                                                   
  sourcelv    mdgroup -wi-ao----  28.00g                                                   
  swaplv      mdgroup -wi-ao----  32.00g                                                   
  tacllv      mdgroup -wi-ao----  2.00g                                                   
  testroot    mdgroup -wi-a-----  5.00g                                                   
  tmplv      mdgroup -wi-ao----  8.00g                                                   
  usr        mdgroup -wi-ao----  30.62g                                                   
  usrlocal    mdgroup -wi-ao----  12.41g                                                   
  varlv      mdgroup -wi-ao----  16.00g                                                   
  winimage    mdgroup -wi-ao----  30.00g                                                   
  xconqlv    mdgroup -wi-ao----  4.00g

On a different machine, I've got (some logical volumes renamed to prevent any personal information leaking)...
Code:

root@gateway:~# pvs
  PV        VG    Fmt  Attr PSize  PFree 
  /dev/md124 mdvg  lvm2 a--    1.81t  1.77t
  /dev/md126 mdvg  lvm2 a--    2.73t  2.31t
  /dev/md127 ssdvg lvm2 a--  465.52g 425.52g
root@gateway:~# vgs
  VG    #PV #LV #SN Attr  VSize  VFree 
  mdvg    2  35  0 wz--n-  4.54t  4.08t
  ssdvg  1  2  0 wz--n- 465.52g 425.52g
root@gateway:~# lvs
  LV            VG    Attr      LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  meeblelv      mdvg  -wi-a-----  23.00g                                                   
  backuplv      mdvg  -wi-a-----  55.00g                                                   
  burnlv        mdvg  -wn-a-----  15.50g                                                   
  cachelv      mdvg  -wn-ao----  3.00g                                                   
  homelv        mdvg  -wn-ao----  14.47g                                                   
  irclv        mdvg  -wi-a-----  1.00g                                                   
  meepleslv    mdvg  -wi-ao----  20.00g                                                   
  log64lv      mdvg  -wi-ao----  2.00g                                                   
  loglv        mdvg  -wn-a-----  1.00g                                                   
  xxxxmaillv    mdvg  -wn-a-----  12.00g                                                   
  yyyymaildirlv mdvg  -wi-a-----  11.00g                                                   
  yyyymaillv    mdvg  -wn-ao----  2.00g                                                   
  namedlv      mdvg  -wn-ao---- 256.00m                                                   
  newRaidLv    mdvg  -wi-ao----  41.00g                                                   
  nexuslv      mdvg  -wi-a-----  1.00g                                                   
  opt64lv      mdvg  -wi-ao----  3.00g                                                   
  optlv        mdvg  -wn-a-----  3.00g                                                   
  photolv      mdvg  -wi-ao----  12.00g                                                   
  postgreslv    mdvg  -wi-ao----  90.00g                                                   
  raidBackup    mdvg  -wi-a-----  31.00g                                                   
  raidlv        mdvg  -wn-a-----  25.00g                                                   
  root64lv      mdvg  -wi-ao----  15.00g                                                   
  rootlv        mdvg  -wn-a-----  5.00g                                                   
  sambalv      mdvg  -wn-ao----  44.00g                                                   
  sbolv        mdvg  -wi-a----- 256.00m                                                   
  slackrolllv  mdvg  -wi-a-----  1.00g                                                   
  slaptlv      mdvg  -wn-a-----  1.47g                                                   
  spoollv      mdvg  -wn-ao----  2.00g                                                   
  statelv      mdvg  -wn-ao---- 256.00m                                                   
  testlv        mdvg  -wn-a-----  6.00g                                                   
  tmplv        mdvg  -wi-ao----  2.00g                                                   
  usr64lv      mdvg  -wi-a-----  9.00g                                                   
  usrlv        mdvg  -wn-a-----  8.00g                                                   
  var64lv      mdvg  -wi-ao----  6.00g                                                   
  varlv        mdvg  -wn-a-----  2.00g                                                   
  xxxxmaillv    ssdvg -wi-ao----  20.00g                                                   
  xxxxmaildirlv ssdvg -wi-ao----  20.00g

lvextend will make your life simple.
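
For example, growing one of the volumes above together with its filesystem in one step (this assumes a filesystem fsadm knows how to grow; AFAIK JFS isn't one of them and would need its own resize afterwards):
Code:

lvextend -L +10G --resizefs mdgroup/junk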

Olek 06-30-2020 04:29 PM

Quote:

Originally Posted by Razziatore (Post 6139813)
Hi Labinnah,

Maybe I could leave less than the planned 1%, but I want to be safe. At the beginning I had planned to lose 1% as a RAID safety margin and 1% for SNAPSHOT data, but then I thought of making them share the space, reusing what would otherwise be thrown away.

AFAIR, when you create a snapshot of an LV, it must be created in the same VG as that LV. So making a separate RAID for the snapshot is pointless when you finally have to add it to the same VG as the source LV anyway.
I just used a single small SSD for the snapshot LV. And my /home is on an LV made from a PV which is a RAID1.

Razziatore 07-01-2020 01:30 AM

Quote:

Originally Posted by Labinnah (Post 6139873)
If you want to do it this way, you need to create a separate VG for the last 1%. That gives you 1 LV on 1 VG on 1 PV, twice, both using all the available space. In this case using LVM is pointless, because you get none of the LVM benefits and all of its drawbacks. (It could be done as 2 LVs on 1 VG on 2 PVs, but that is tricky, as you must force one LV onto one PV, and it still has no benefit when each LV uses all of its PV's free space.)

My plan is to use 1 LV on 1 VG on 1 PV. I will add the 2nd PV only for the snapshot. I need it because I can't do a snapshot outside the VG (unfortunately). I don't like the dynamism of LVM; it's the reason I never used it, I always found it useless. The only functionality I want is snapshots, to be able to make backups.

Quote:

Originally Posted by Labinnah (Post 6139873)
It was too exotic for me; I've always used extX or reiser. I think you won't be keen on partition resizing, since AFAIK JFS is poor in that respect.

In the past I used ReiserFS, but after I learned of the fate of Hans Reiser I preferred to change.

Quote:

Originally Posted by Labinnah (Post 6139873)
Maybe you should look at BTRFS? It has some kind of volume management implemented at the filesystem level.

My heart belongs to ZFS, but since Oracle no longer develops it I feel it is a dead FS. I know there is the open-source version, but... not to mention the licensing problems.

As for BTRFS... it is not production ready. And RHEL removed it in version 8 after initially supporting it in version 6. That isn't a good sign.

Razziatore 07-01-2020 01:42 AM

Quote:

Originally Posted by Richard Cranium (Post 6139874)
I've used LVM on top of raid for over a decade, I believe.

Hi Richard, thanks for your reply :)

Quote:

Originally Posted by Richard Cranium (Post 6139874)
(As it so happens, LVM can also provide RAID functionality but I have not explored that at all.)

Yes, I know, but my research suggested that it is buggy (I don't know if that's true or not).

Quote:

Originally Posted by Richard Cranium (Post 6139874)
Not to mention that if you are making backups, you can create a snapshot LV and take your backup from *that*. Just get rid of the snapshot LV when you are done, because it takes some resources to maintain it.

This is exactly what I'm looking for!

Quote:

Originally Posted by Richard Cranium (Post 6139874)
Since it is very easy to extend a logical volume, there's no need to allocate all of your space in a volume group in the beginning. You may want a new mount point in the future; you may want a given logical volume to max out at a given size to simplify your backup strategy.

I know, but I don't want that effort. At work in the past I have several times encountered virtual machines with, I don't know..., let's say 60 GB of which only 40% was usable, with the philosophy "in case of need, extend the LV a little bit".

As I said, I prefer to have one big partition (at the moment 24TB) to store my files. I don't want the extra work of saying "oh, I filled my space here... extend it" and a bit later "oh, I filled my space there... extend it". I prefer to use directories instead of partitions.

What is the advantage of having so many LVs?

Razziatore 07-01-2020 01:48 AM

Quote:

Originally Posted by Olek (Post 6139887)
AFAIR, when you create a snapshot of an LV, it must be created in the same VG as that LV. So making a separate RAID for the snapshot is pointless when you finally have to add it to the same VG as the source LV anyway.

I just used a single small SSD for the snapshot LV. And my /home is on an LV made from a PV which is a RAID1.

Hi Olek :) Yes, I know it must be in the same VG, but I want a separate RAID so that I can move it from one VG to another.

Right now I have 2 main RAIDs for data: md1 for /home and md2 for /srv. With a third RAID (say md3) I can add it to the VG of md1, do the backup of /home, remove it, then add it to the VG of md2 and do the backup of /srv.

I don't think there is a downside.
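
Roughly, each backup run would be (VG and LV names are just examples; I haven't tried this yet):
Code:

vgextend homevg /dev/md3
lvcreate -s -l 100%PVS -n homesnap homevg/homelv /dev/md3   # snapshot space comes only from md3
# ... back up /home from the snapshot ...
lvremove -f homevg/homesnap
vgreduce homevg /dev/md3
# then the same again with the VG that holds /srv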

Labinnah 07-01-2020 02:57 AM

Quote:

Originally Posted by Razziatore (Post 6139994)
My plan is to use 1 LV on 1 VG on 1 PV. I will add the 2nd PV only for the snapshot. I need it because I can't do a snapshot outside the VG (unfortunately). I don't like the dynamism of LVM; it's the reason I never used it, I always found it useless. The only functionality I want is snapshots, to be able to make backups.

I thought that you wanted to use some kind of JFS snapshot capability (is there one?), not the LVM one. Now it makes sense to me. However, adding the single partitions as PVs instead of a RAID is more logical - you get more space out of it. And since this data is not crucial for you, more space means you can make more snapshots, or a single bigger one.


Quote:

Originally Posted by Razziatore (Post 6139994)
In the past I used ReiserFS, but after I learned of the fate of Hans Reiser I preferred to change.

Most do it for that reason. I gave up on it due to the lack of active development, and the major drawback of the "rebuild tree" fsck (lots of orphaned trees from previous "rebuilds" appear in lost+found).

Razziatore 07-01-2020 04:45 AM

Quote:

Originally Posted by Labinnah (Post 6140017)
I thought that you wanted to use some kind of JFS snapshot capability (is there one?), not the LVM one. Now it makes sense to me.

Unfortunately not. From what I know (Comparison_of_file_systems (Wikipedia)) there are only two mainstream FSes that have snapshots built in. One is ZFS, but as I said that is not a valid route. The other is BTRFS, which is not a valid route either.

Quote:

Originally Posted by Labinnah (Post 6140017)
However, adding the single partitions as PVs instead of a RAID is more logical

Can you explain that a bit more? I don't think I understand what you mean...

Quote:

I gave up on it due to the lack of active development, and the major drawback of the "rebuild tree" fsck (lots of orphaned trees from previous "rebuilds" appear in lost+found).
Theoretically there is Reiser4, but it doesn't give me much confidence either... it too is no longer developed, after the failure of Namesys (2008).

Labinnah 07-01-2020 04:53 AM

Quote:

Originally Posted by Razziatore (Post 6140063)
Can you explain that a bit more? I don't think I understand what you mean...

Instead of adding mdXXX (created from sdaY, sdbY, ...) as the PV, add all the individual sdaY, sdbY, ... partitions as PVs.

Richard Cranium 07-01-2020 10:28 AM

Quote:

Originally Posted by Razziatore (Post 6139997)
What is the advantage of having so many LVs?

Well, for one thing, I may not want whatever writes to the LV to eat up all of my disk space. I can (and have) used different file systems on different LVs depending upon the needs of whatever uses that space.

BTW, I'll point out that you need enough free space in your PVs to handle the expected updates when you create a snapshot volume. (More precisely, the old version of the block that was changed needs to stick around for the snapshot to access. That's the other reason to remove the snapshot volume when you are done; that behind the scenes block duplication will continue while the snapshot is active.)
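
While a snapshot exists you can watch how much of that copy-on-write space it has used; the lxcvol_snap volume in my listing above, for instance (the comment lines show the kind of output you get):
Code:

lvs mdgroup/lxcvol_snap
#  LV          VG      Attr       LSize Pool Origin Data%
#  lxcvol_snap mdgroup swi-a-s--- 1.00g      lxcvol 18.86
# if Data% ever hits 100 the snapshot becomes invalid, so size it for the writes you expect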

Razziatore 07-02-2020 12:50 PM

Quote:

Originally Posted by Labinnah (Post 6140067)
Instead of adding mdXXX (created from sdaY, sdbY, ...) as the PV, add all the individual sdaY, sdbY, ... partitions as PVs.

Isn't this a little dangerous? What happens to the snapshot if a disk dies?

Razziatore 07-02-2020 01:03 PM

Quote:

Originally Posted by Richard Cranium (Post 6140175)
Well, for one thing, I may not want whatever writes to the LV to eat up all of my disk space. I can (and have) used different file systems on different LVs depending upon the needs of whatever uses that space.

Okay, yes, I can see some of the benefits of this approach but... I just can't fall in love with this lifestyle. :(

It seems to me too much work and too much effort for a small advantage. For example, I saw an article that said to create one LV for each VM and give the VMs direct control over it. Okay, I can see the benefits, but it seems to me an excessive optimization...

My fault.

Quote:

BTW, I'll point out that you need enough free space in your PVs to handle the expected updates when you create a snapshot volume.
Yes, for this I want to reserve about 1% for the snapshot. When I do the backup I will add this space to the VG, and when I'm done I will remove it.


Quote:

(More precisely, the old version of the block that was changed needs to stick around for the snapshot to access. That's the other reason to remove the snapshot volume when you are done; that behind the scenes block duplication will continue while the snapshot is active.)
The old block? I thought that the data in the original LV was not changed and that the changes were saved in the snapshot. Is it exactly the opposite? Is new data written straight to the LV, with the original data copied into the snapshot before it is modified?

Interesting.

