Software RAID: replacing a failed disk (mdadm, grub and lvm)?
Hello Admins,
One of our SuSE SLES 12 Linux servers has reported a disk failure. Fortunately the database server uses software RAID, so the system is still up and running.
As recommended, we would like to replace the failed disk with a new one and rebuild the software RAID onto it.
The arrays md0, md1 and md2 have failed member devices, namely sda1, sda2 and sda3.
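A quick way to confirm which members have actually failed before touching anything (standard mdadm commands; the exact output will of course differ per system):
# overall array state; failed members show up with an (F) flag
cat /proc/mdstat
# detailed state of one array, including which member is marked faulty
mdadm --detail /dev/md0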
Please note that it also has two VGs defined, as shown below:
1. VG system (on /dev/md2)
2. VG ora_db (on /dev/md3)
Quote:
# pvdisplay
--- Physical volume ---
PV Name /dev/md3
VG Name ora_db
PV Size 931.51 GiB / not usable 3.81 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 238466
Free PE 84866
Allocated PE 153600
PV UUID vgPdWQ-x6CW-vvdF-moxh-FKyb-wpSU-NdJqSm
--- Physical volume ---
PV Name /dev/md2
VG Name system
PV Size 912.51 GiB / not usable 2.81 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 233601
Free PE 182401
Allocated PE 51200
PV UUID rdff2n-ztxd-lcBY-nAqk-8O9u-fnFG-BVI91v
The relevant part of grub.cfg shows:
Quote:
if [ x$feature_default_font_path = xy ] ; then
font=unicode
else
insmod part_msdos
insmod part_msdos
insmod diskfilter
insmod mdraid1x
insmod lvm
insmod ext2
set root='lvmid/m7AEp0-79EG-D2Vi-ELzE-BTzh-C8mN-CLxrpz/S0eZEl-PlBX-E1ZL-oCwL-SmUx-4Qe4-Mz9NHX'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint='lvmid/m7AEp0-79EG-D2Vi-ELzE-BTzh-C8mN-CLxrpz/S0eZEl-PlBX-E1ZL-oCwL-SmUx-4Qe4-Mz9NHX' 7c2e3a9c-5f5b-47e3-8a0a-d1e66f12747c
else
search --no-floppy --fs-uuid --set=root 7c2e3a9c-5f5b-47e3-8a0a-d1e66f12747c
fi
font="/share/grub2/unicode.pf2"
fi
if loadfont $font ; then
set gfxmode=auto
load_video
insmod gfxterm
set locale_dir=$prefix/locale
set lang=POSIX
insmod gettext
fi
terminal_output gfxterm
insmod part_msdos
insmod part_msdos
insmod diskfilter
insmod mdraid1x
insmod ext2
set root='mduuid/531cd341e2c7d5a71c542ad04d9ea589'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint='mduuid/531cd341e2c7d5a71c542ad04d9ea589' 96c11697-c3b7-4f11-90fc-3aef207db526
else
search --no-floppy --fs-uuid --set=root 96c11697-c3b7-4f11-90fc-3aef207db526
fi
-------------------------------------
Quote:
The procedure to follow should go like this:
1. First we mark /dev/sda1 as failed:
mdadm --manage /dev/md0 --fail /dev/sda1
2. Then we remove /dev/sda1 from /dev/md0:
mdadm --manage /dev/md0 --remove /dev/sda1
3. Now we do the same steps again for /dev/sda2 and /dev/sda3 (which are part of /dev/md1 and /dev/md2).
4. Then power down the system:
shutdown -h now
and replace the old /dev/sda hard drive with a new one.
5. After inserting the new SATA disk as /dev/sda, boot the system.
6. Then we create the exact same partitioning as on /dev/sdb. We can do this with one simple command:
sfdisk -d /dev/sdb | sfdisk /dev/sda
7. Check that both disks have the same partitions (fdisk -l).
8. Next we add /dev/sda1 to /dev/md0, /dev/sda2 to /dev/md1 and /dev/sda3 to /dev/md2 (the full command sequence is sketched after this list).
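Putting steps 1-3 and 8 together, the command sequence would look like this (standard mdadm syntax; double-check the device names on the live system before running anything):
# mark the failed members faulty and pull them out of their arrays
mdadm --manage /dev/md0 --fail /dev/sda1
mdadm --manage /dev/md0 --remove /dev/sda1
mdadm --manage /dev/md1 --fail /dev/sda2
mdadm --manage /dev/md1 --remove /dev/sda2
mdadm --manage /dev/md2 --fail /dev/sda3
mdadm --manage /dev/md2 --remove /dev/sda3
# after the disk swap and the sfdisk partition copy, re-add the new partitions
mdadm --manage /dev/md0 --add /dev/sda1
mdadm --manage /dev/md1 --add /dev/sda2
mdadm --manage /dev/md2 --add /dev/sda3
# watch the resync progress
watch cat /proc/mdstat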
Please let me know if I have missed something. I guess the two important points are how I should take care of LVM and GRUB in this case.
1. Do I have to do something extra to take care of LVM, or will the command sfdisk -d /dev/sdb | sfdisk /dev/sda take care of it as well?
2. How should I take care of GRUB in this case? The grub.cfg shows entries pertaining to LVM as well as mdadm. Do I have to change anything here before I shut down the system?
I understand the system has two layers to take care of (mdadm + LVM), which complicates things. Or would it be easier to set up a completely new system?
Kindly guide me to the relevant information.
Thanks.
After rebooting, the names of the disks can change depending on the order in which they are found by the system. Make sure you copy from the correct disk. The rest looks good.
Typically mdadm is pretty robust - the initrd can be a different matter; the (re-)boot may fail if the initrd hasn't been built to handle RAID in degraded mode. Only a boot will tell in all likelihood.
As for LVM, it won't care.
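One way to reduce that risk is to regenerate the initrd before the reboot so the RAID modules are definitely included. SLES 12 builds its initrd with dracut; the following is a sketch of the usual invocation, so check the dracut man page on the actual system first:
# rebuild the initrd for the running kernel, explicitly pulling in the mdraid module
dracut --force --add mdraid /boot/initrd-$(uname -r) $(uname -r)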
Quote:
After rebooting, the names of the disks can change depending on the order in which they are found by the system. Make sure you copy from the correct disk. The rest looks good.
That is one point where I too am unsure, because /etc/fstab refers to the filesystems by UUID.
So my concern is: after the reboot, will the system still correctly recognize the boot, swap and root partitions?
------------------------------------------
Thanks syg00.
Quote:
Typically mdadm is pretty robust - the initrd can be a different matter; the (re-)boot may fail if the initrd hasn't been built to handle RAID in degraded mode. Only a boot will tell in all likelihood.
As for LVM, it won't care.
How can I make sure that the initrd will not create a problem in this case?
LVM is now missing the sda disk altogether, so why do you think that would not create a problem?
Quote:
# pvdisplay
/dev/sda: read failed after 0 of 4096 at 0: Input/output error
/dev/sda: read failed after 0 of 4096 at 1000204795904: Input/output error
/dev/sda: read failed after 0 of 4096 at 1000204877824: Input/output error
/dev/sda: read failed after 0 of 4096 at 4096: Input/output error
/dev/sda1: read failed after 0 of 4096 at 1076822016: Input/output error
/dev/sda1: read failed after 0 of 4096 at 1076879360: Input/output error
/dev/sda1: read failed after 0 of 4096 at 0: Input/output error
/dev/sda1: read failed after 0 of 4096 at 4096: Input/output error
/dev/sda2: read failed after 0 of 4096 at 19329384448: Input/output error
/dev/sda2: read failed after 0 of 4096 at 19329441792: Input/output error
/dev/sda2: read failed after 0 of 4096 at 0: Input/output error
/dev/sda2: read failed after 0 of 4096 at 4096: Input/output error
/dev/sda3: read failed after 0 of 4096 at 979796688896: Input/output error
/dev/sda3: read failed after 0 of 4096 at 979796746240: Input/output error
/dev/sda3: read failed after 0 of 4096 at 0: Input/output error
/dev/sda3: read failed after 0 of 4096 at 4096: Input/output error
--- Physical volume ---
PV Name /dev/md3
VG Name ora_db
PV Size 931.51 GiB / not usable 3.81 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 238466
Free PE 84866
Allocated PE 153600
PV UUID vgPdWQ-x6CW-vvdF-moxh-FKyb-wpSU-NdJqSm
--- Physical volume ---
PV Name /dev/md2
VG Name system
PV Size 912.51 GiB / not usable 2.81 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 233601
Free PE 182401
Allocated PE 51200
PV UUID rdff2n-ztxd-lcBY-nAqk-8O9u-fnFG-BVI91v
Your LVM is built on top of mdadm - LVM neither knows nor cares about the physical devices; it cares about the md arrays (here /dev/md2 and /dev/md3). Those messages are from layers below mdadm.
The PV size (of a RAID1) is unchanged by losing one device, so LVM carries on regardless. Likewise, the filesystem UUIDs do not depend on adding new device(s) to the array.
fstab having UUIDs is the right way to do it. The device names can change but the UUID won't.
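If you want to verify this before the reboot, you can list the filesystem UUIDs and compare them against /etc/fstab (standard blkid usage):
# print the UUID of every block device, including the md arrays and the LVs
blkid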
Thanks for the clarification.
I have one more query: is it required to install grub on the non-failed disk? One of the instructions states:
Quote:
GRUB should be installed on another hard disk so the system can still boot with the primary boot device removed. If this is not done, you will need to have recovery media available to boot from.
Launch grub as the root user.
From the grub shell run the following commands to install grub onto the disk /dev/sdb
grub> device (hd0) /dev/sdb # maps /dev/sdb onto the "hd0" label (temporary, lasts until you quit GRUB)
grub> root (hd0,0) # tell GRUB where to find the root of the filesystem (and thus /boot)
grub> setup (hd0) # installs GRUB to hd0 aka. /dev/sdb
grub> quit
You now have grub installed onto /dev/sdb. If you need to boot off this disk, you will need to set it as the primary boot drive in the BIOS or at the boot menu prompt.
Is it required in our case or not?
Thanks in advance.
I ran the command (# dd bs=512 count=1 if=/dev/sdb 2>/dev/null | strings) to check whether grub is installed on the disk /dev/sdb, but it did not give any results. So it looks like grub is not installed on sdb - or am I missing something? If it is really not installed, then as per your recommendations I should run grub-install /dev/sdb before rebooting this host. Is that a safe command? The system is currently up and running, and I do not wish to disturb it until I have a disk or system replacement plan.
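For what it's worth, the quoted instructions use the legacy GRUB shell. SLES 12 ships GRUB2, so assuming nothing unusual about the setup, the equivalent command there would be the one below; it only writes the boot code and the GRUB files under /boot and does not touch the data on the disk:
# install the GRUB2 boot loader onto the MBR of the healthy disk
grub2-install /dev/sdb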