root filesystem LV disappeared during power failure - server will not boot
This system has two drives in a software RAID-1 mirror. Boot is on /dev/md0 and the root filesystem is an LV in /dev/Volume00. After an extended power outage that outlasted the UPS, the machine crashed. On bootup everything looks normal until it's time to mount the root filesystem. The error message is something like "failed to mount root filesystem /dev/Volume00/RootVol on /mnt. No such filesystem." It drops down to a command prompt, and I can see and mount any of the other filesystems in /dev/Volume00... there are 4 other filesystems. It's as if /dev/Volume00/RootVol disappeared; it doesn't show up in /dev/Volume00 or /dev/mapper at all. Where does the OS get the list of mountable LVM volumes from at bootup - is it metadata on the disk?
There are no LVM tools in my /boot partition, and this is an old machine which cannot boot via USB. I'm currently downloading an ISO of Knoppix so I can have access to tools. It is a testament to the stability of Linux that it's been doing its thing virtually unmaintained and unattended for the past 5 years without a single issue (it serves multiuser accounting and reporting software).
What I'm thinking of doing is retrieving /etc/lvm/backup/Volume00 off the most recent tape backup onto one of the mountable volumes in Volume00 (I have no non-RAID, non-LVM volumes available) and using vgcfgrestore from the Knoppix distro... hopefully the hard drives all come up under Knoppix with the correct device numbers/names.
I'm wondering if anyone can comment on this strategy. I have full tape backups of the root filesystem but am unsure how exactly I'd create the MD/PV/LV structure I need to restore onto from scratch... would vgcfgrestore do everything I need? I don't really want to go this route if I don't have to...
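As I understand it, the restore itself would be something along these lines once I'm booted into Knoppix - assuming it sees the VG under the same name and that I've copied the backup file somewhere reachable like /tmp, both of which are assumptions on my part:
Code:
# restore the VG metadata from the saved backup file, then activate and check
vgcfgrestore -f /tmp/Volume00 Volume00
vgchange -ay Volume00
lvs Volume00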
The list of volume groups to activate and the identity of the root filesystem is passed by GRUB in the kernel command line. The exact format varies somewhat among releases (it's all processed by scripts in the initrd, so anything is possible), but if you look at the GRUB menu it should be apparent.
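From the emergency shell you can also check what was actually passed on the last boot, and what the bootloader has configured - the menu.lst path below assumes GRUB legacy, so adjust for your setup:
Code:
cat /proc/cmdline
grep -i 'root=' /boot/grub/menu.lst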
Restoring the LVM configuration for that VG should be a sound strategy, but you might want to save a vgcfgbackup file so that you can get back to what you have now if something goes awry.
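Something like this before you change anything (the output file name here is only an example):
Code:
# snapshot the current metadata so you can roll back if the restore makes things worse
vgcfgbackup -f /root/Volume00-before-restore.vg Volume00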
I have only one volume group; it's being activated, since I can access other logical volumes within the group, and the kernel is receiving the proper device to mount (/dev/Volume00/RootVol) but is unable to find/mount it - so it sounds like everything else is working as it should. I'll try restoring the volume group config and see if the root LV turns up.
If restoring the volume group config does not work, is there a "how-to" for disaster recovery of a RAID/LVM machine? I am thinking if it's only my root filesystem LV that's missing, I can manually re-create an LV with the same name, mount it, restore my "/" filesystem to it, and all should work... the boot stuff is the hard part and that part seems to be working...
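If I do end up recreating it by hand, I'm picturing something along these lines - the size and filesystem type below are guesses on my part, and I'd pull the real values out of the LVM backup file first:
Code:
# recreate the root LV, put a filesystem on it, then restore "/" from tape into /mnt
lvcreate -L 20G -n RootVol Volume00
mkfs -t ext3 /dev/Volume00/RootVol
mount /dev/Volume00/RootVol /mnt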
And, will the knoppix distro recognize the md/lvm volumes without manual intervention? Guess I'll find out.
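If it doesn't pick them up on its own, I'm assuming I can do it manually with something like:
Code:
# assemble any arrays it can find, then scan for and activate the volume group
mdadm --assemble --scan
vgscan
vgchange -ay Volume00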
More detail in this (closed) thread.
That array is degraded - one device is missing. Presumably missing metadata if fdisk sees both disks - let's see
Code:
lsblk -f
I would expect some messages. You can force the array to assemble and activate degraded by using "--run" - presumably SystemRescueCd does this, and your initrd doesn't. This probably should be in the Slackware forum, as they will be better aware of what is in the initrd. Hit the "Request" button on your initial post and ask to have it moved.
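Something along these lines from the rescue environment ought to start it degraded - the member partitions below are a guess, so use whatever lsblk/fdisk actually show:
Code:
# stop any partial assembly, then force the array to start even though it's degraded
mdadm --stop /dev/md0
mdadm --assemble --run /dev/md0 /dev/sda1 /dev/sdb1
vgchange -ay Volume00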
I booted a Live USB system recovery CD. Right away I could see that RootVol showed up (the logical volume that does not exist when I try to boot normally), was mountable and looks fine. So I started looking at the raid array.
Not what I expected. My two hard drives are /dev/sda and /dev/sdb, no errors in /var/log/messages about them although I have no ability to tweak loglevels in the Live CD version I am running. Why does /proc/mdstat not show actual devices? What are /dev/dm-# devices?
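I figure the dm-# names are just device-mapper nodes, and I can see what they map to with:
Code:
ls -l /dev/mapper
dmsetup ls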
Output of mdadm --detail /dev/md0 is:
Code:
root@sysresccd /mnt/rootvol/etc % dmadm -D /dev/md0
zsh: correct 'dmadm' to 'mdadm' [nyae]? y
/dev/md0:
Version : 0.90
Creation Time : Thu Dec 3 11:53:48 2009
Raid Level : raid1
Array Size : 488287488 (465.67 GiB 500.01 GB)
Used Dev Size : 488287488 (465.67 GiB 500.01 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Sun Jul 10 12:00:57 2016
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Number Major Minor RaidDevice State
0 253 1 0 active sync /dev/dm-1
2 0 0 2 removed
I am guessing that either a) I have a failed disk or b) the array /dev/md0 is not synched, maybe thinks a disk has failed?
At any rate, the machine definitely will not boot from this state, and I can't figure out which, if any, of my hard disks is the problem, nor how to fix this mess. This is a production server with full backups... I could rebuild it, but really would rather not as it's a pretty tedious process... there's nothing wrong with the data nor, I'm guessing, either of the disks.
There is no mdadm.conf.
fdisk -l shows both disks as Linux Raid Autodetect, everything looks normal.
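Before I do anything drastic, I figure I can compare the superblocks on the members and, if one of them simply dropped out of the array, re-add it and let it resync. The partition names below are my guess - I still need to confirm which devices are really the md members:
Code:
mdadm --examine /dev/sda1 /dev/sdb1
# if, say, sdb1 is the one that fell out of the array:
mdadm /dev/md0 --add /dev/sdb1
cat /proc/mdstat    # watch the resync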