Linux LVM & Bad superBlock(s)
Hi Guys,
This isn't so much a question about resolving a specific issue as an attempt to understand LVM better. I understand what LVM is, what it does, and generally how it works and how to use it. I had a server with a volume group (LVG_STORE) containing three physical volumes and a single logical volume (LV_OPT). Recently one of the three disks failed, which led to a number of issues. Some more background information:

- The server had a total of 4 disks
- 1 disk -- sda -- two partitions, one for root and one for swap
- 3 disks -- sdb, sdc, and sdd (143GB 3Gb/s SAS disks) in the volume group
- sdb failed
- The RAID disk controller reported all disks OK
- Commenting the LV out of /etc/fstab let me boot, mount it manually, and read certain parts, but a specific directory (probably sitting on the bad block on the disk) would always cause the disk to throw an error, after which the filesystem could no longer be mounted

While I had it mounted manually, doing some work on it and copying some data off, I got this error Code:
Feb 28 10:43:18 demo1 kernel: Aborting journal on device dm-0. Code:
Feb 28 10:43:18 demo1 kernel: sd 0:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK

Now, I know it's pretty apparent that my drive died, but here is my question: it was clear that sdb was the issue, yet the entire logical volume became unmountable once the disk threw an error. I tried everything from finding the backup superblocks and trying e2fsck -b on /dev/mapper/LVG_STORE-LV_OPT, to backing up as much data as I could before I hit "X" amount of reads or hit that specific block on the disk and the error was thrown, leaving me unable to do anything else. Why does a single disk failing cause the entire LVM volume to fail? This seems a little backwards, unless I am missing something? Any information would help. Thank you. |
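For anyone landing on this thread later, the usual way to find the backup superblocks the poster mentions and point e2fsck at one looks roughly like this. This is a sketch using the device name from this thread; the superblock offset 32768 is only an example and depends on the filesystem's block size, so use a value your own device actually reports:

```shell
# Dry run of mke2fs (-n creates nothing) prints the filesystem layout,
# including where the backup superblocks would be for this device.
mke2fs -n /dev/mapper/LVG_STORE-LV_OPT

# Alternatively, dumpe2fs lists the existing superblock backups directly.
dumpe2fs /dev/mapper/LVG_STORE-LV_OPT | grep -i superblock

# Then point e2fsck at one of the reported backups, e.g. 32768.
# The LV must be unmounted first.
e2fsck -b 32768 /dev/mapper/LVG_STORE-LV_OPT
```

Note that this only helps when the primary superblock itself is damaged; it can't recover data that lived on physical sectors the dead disk can no longer read.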
Forget about the pv's - from a filesystem perspective, the lv is the equivalent of a partition in the non-LVM world.
Once it's broken, it's broken. There ain't any substitute for backups. |
Quote:
So is that the equivalent of saying that if a single disk goes bad in a volume group of three disks, just about the entire LV is unusable because of that one block? Also, the only way I was able to use the other two disks again was to delete the entire VG (LVG_STORE) and then re-create it with just the two good disks -- was this incorrect, or was there another way that I may have missed? |
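For what it's worth, LVM does have a path for dropping a failed PV without deleting and re-creating the whole VG. A sketch using the names from this thread; any extents that lived on the dead disk are still lost either way:

```shell
# Scan physical volumes; the failed sdb shows up as missing/unknown.
pvs

# Strip the missing PV's entry out of the volume group metadata.
# --force is needed when LVs still have extents on the missing disk;
# those extents (and the data on them) are gone regardless.
vgreduce --removemissing --force LVG_STORE

# Reactivate what remains of the volume group.
vgchange -ay LVG_STORE
```

The result is the same two-good-disk VG the poster ended up with, just without rebuilding it from scratch.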
Theoretically you're supposed to be able to replace the failed drive(s) if you have access to the meta-data (UUID primarily).
Mirrored RAID would be a possible solution. I don't like LVM and don't use it except under orders. Others will hopefully give you better answers - I'll see what I can find as well. |
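The replacement procedure hinted at above (re-creating the failed PV from its metadata/UUID) usually looks something like this. A hedged sketch: the UUID comes from the LVM metadata backup on the affected machine, and /dev/sdb here stands for the replacement disk:

```shell
# List the archived metadata versions for the VG; the backup files
# under /etc/lvm/ record each PV's UUID.
vgcfgrestore --list LVG_STORE

# Re-create the PV on the new disk, reusing the failed PV's old UUID
# so the restored metadata matches. <UUID-of-failed-PV> is a placeholder.
pvcreate --uuid "<UUID-of-failed-PV>" \
         --restorefile /etc/lvm/backup/LVG_STORE /dev/sdb

# Restore the VG metadata and bring the volume group back up.
vgcfgrestore LVG_STORE
vgchange -ay LVG_STORE
```

Even after this, the LV only comes back structurally; file data that lived on the dead disk still needs an fsck and, realistically, restoring from backup.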
Hello,
In my opinion you're confusing or mixing up LVM with RAID. By default LVM doesn't provide any data redundancy or fault tolerance, as far as I know. It does offer functionality like snapshots and mirroring to provide the necessary 'backup' or redundancy. The way you set it up -- three disks in one VG, and one LV in that VG holding all the space, without any RAID or mirroring to other physical devices -- I think you're out of luck. I sincerely hope for you that I'm wrong and that someone will come along with a solution. I for one would copy that solution into my personal wiki for future use. Kind regards, Eric |
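The mirroring mentioned above is requested at LV-creation time. A minimal sketch, with size and names as examples (the VG needs enough free space on distinct PVs to hold both copies):

```shell
# -m 1 keeps one extra copy of every extent on a different PV,
# so the LV survives the loss of any single disk.
lvcreate -m 1 -L 100G -n LV_OPT LVG_STORE

# Check mirror health; the Cpy%Sync column shows sync progress.
lvs -a -o +devices LVG_STORE
```

With a setup like this, losing sdb would have degraded the mirror instead of taking the whole LV down.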
Quote:
I didn't have a spare SAS disk so I didn't even try (you could be completely right) -- the box was just a demo box and it was a good 'test' place to have an issue with LVM. Quote:
I am not confusing them -- I know the difference between RAID (hardware & software) and LVM. The journal was recovered when the disk first failed and I restarted the server; the data/journal was about 3 months old but still completely usable. It wasn't until recently that the disk became worse and would error out much quicker.

I do believe you are correct: since I did not put any RAID behind the LVM (ideal would have been 6 disks in RAID 1 -- resulting in 3 usable disks served to the VG), I am out of luck. Yes, I have stored this in my memory bank for future use :) -- if one physical disk fails in a VG, the entire LV becomes unusable, which seems correct now because the entire LV is presented as a single block device to the OS (which explains why hitting a specific file/directory made the disk error out and unmount).

Thanks for all the responses; if anything is wrong/incorrect or I have mis-worded something, it would be great for someone to correct me. Thanks!

EDIT: Guess I should have done this -- http://www.google.com/search?sourcei...=LVM+snapshots |
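Since the thread ends on LVM snapshots: a minimal sketch of taking and using one for backup, with names, sizes, and mount points as examples. Worth noting that a snapshot lives in the same VG as its origin, so it guards against logical mistakes (bad rm, botched upgrade), not against a dying disk:

```shell
# Reserve 5G of free VG space to hold copy-on-write changes
# while the snapshot exists.
lvcreate -s -L 5G -n LV_OPT_snap /dev/LVG_STORE/LV_OPT

# Mount the snapshot read-only and copy a consistent view of the data off.
mount -o ro /dev/LVG_STORE/LV_OPT_snap /mnt/snap
rsync -a /mnt/snap/ /backup/opt/

# Remove the snapshot when done; it fills up as the origin changes
# and becomes invalid if it overflows.
umount /mnt/snap
lvremove /dev/LVG_STORE/LV_OPT_snap
```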