Old 01-09-2021, 06:19 AM   #1
LinuGeek
Member
 
SuSE Linux RAID Faulty disk replacement


Hello Experts,

We have an important database server running SUSE Linux Enterprise Server 12. The previous admin set it up as follows.

4 internal disks:

1+1 -- RAID-1 software RAID --> root partitions
1+1 -- RAID-1 software RAID --> data partitions with the database

The root RAID additionally has LVM on top of it, which is then sliced into logical volumes for /usr, /boot, etc.

So there are two volume groups: 1. the System VG and 2. the Data VG.

There are four disks: sda+sdb and sdc+sdd.

Recently we noticed that one of the disks in the System software RAID group has gone bad, and the server continued to work without any problem (thanks to RAID 1 mirroring).
As shown below, three software RAID partitions (md0, md1 and md2, which are the system partitions) are marked as failed/degraded; md3 is for the database.
So the failed members are sda1, sda2 and sda3.

Code:
#cat /proc/mdstat
Personalities : [raid1]

md0 : active raid1 sdb1[1] sda1[0](F)	<<<<<-------------
      1051584 blocks super 1.0 [2/1] [_U]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md1 : active raid1 sdb2[1] sda2[0](F)	<<<<<-------------
      18876288 blocks super 1.0 [2/1] [_U]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md2 : active raid1 sdb3[1] sda3[0](F)	<<<<<-------------
      956832576 blocks super 1.0 [2/1] [_U]
      bitmap: 2/8 pages [8KB], 65536KB chunk

md3 : active raid1 sdc1[0] sdd1[1]
      976760640 blocks super 1.0 [2/2] [UU]
      bitmap: 2/8 pages [8KB], 65536KB chunk

unused devices: <none>

We have to replace the faulty disk (sda) so that the arrays rebuild back to the original structure.

I have come up with the following plan. Please suggest modifications.

1. Shut down the server, which will also take down the database.
2. Take out the faulty disk.
3. Replace it with a new one.
4. Restart the server.
5. The automatic rebuild, mirroring the new disk from the existing one, should start.

This sounds like a mostly automated process.


If this does not work, then we can manually do a few more steps.
Quote:
Question: can we do this at the current runlevel without any problem?
1. Mark the partitions as failed if they are not already marked (F) by the system.

Code:
# mdadm --manage /dev/md0 --fail /dev/sda1
# mdadm --manage /dev/md1 --fail /dev/sda2
# mdadm --manage /dev/md2 --fail /dev/sda3
To verify that the partitions are marked as failed, check /proc/mdstat.

2. Remove the failed partitions from the arrays with mdadm:
Code:
# mdadm --manage /dev/md0 --remove /dev/sda1
# mdadm --manage /dev/md1 --remove /dev/sda2
# mdadm --manage /dev/md2 --remove /dev/sda3
3. Replace the disk.
Quote:
Question: how do we identify the faulty disk?
4. Copy the partition table to the new disk.
(Caution: this sfdisk command will replace the entire partition table on the target disk with that of the source disk; use an alternative approach if you need to preserve other partition information.)

Code:
# sfdisk -d /dev/sdb | sfdisk /dev/sda
5. Add the new partitions back to re-create the mirrors:

Code:
# mdadm --manage /dev/md0 --add /dev/sda1
# mdadm --manage /dev/md1 --add /dev/sda2
# mdadm --manage /dev/md2 --add /dev/sda3
6. To check the array status, enter the command below:

Code:
# /sbin/mdadm --detail /dev/md0
7. The following command will show the current progress of the mirror rebuild:

Code:
# cat /proc/mdstat
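(To keep an eye on the rebuild while it runs, something like the following could be used; just a convenience sketch, assuming the watch utility is available:)

Code:
# Refresh the rebuild status every 5 seconds
watch -n 5 cat /proc/mdstat
# Or query the detailed state of a single array
/sbin/mdadm --detail /dev/md2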
A system backup is in place.
Please give your valuable inputs.
Quote:
Is there any better option?

Thank you in advance.


Regards,
Admin

 
Old 01-09-2021, 08:22 AM   #2
Ser Olmy
Senior Member
 
Quote:
Originally Posted by LinuGeek View Post
1. Shut down the server, which will also take down the database.
2. Take out the faulty disk.
3. Replace it with a new one.
4. Restart the server.
If you have to. Most SATA controllers/drivers support hotplugging, but unless the drive is in a hotplug tray, you'd better shut down the server first.

Also, I'd add this step from your alternative routine:

0. Remove the failed partitions with:
Code:
mdadm --manage /dev/md0 --remove /dev/sda1
mdadm --manage /dev/md1 --remove /dev/sda2
mdadm --manage /dev/md2 --remove /dev/sda2
You should definitely do this before powering down the server.

And you will obviously also have to find a way to identify the failed drive before powering down. If it isn't obvious which drive is which, run smartctl on the working drives and make a note of the serial numbers.
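If it helps, a minimal sketch of that check (assuming the smartmontools package is installed; the drive names here are just examples):

Code:
# Record make, model and serial number of each remaining healthy drive
smartctl -i /dev/sdb | grep -E 'Device Model|Serial Number'
smartctl -i /dev/sdc | grep -E 'Device Model|Serial Number'
smartctl -i /dev/sdd | grep -E 'Device Model|Serial Number'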
Quote:
Originally Posted by LinuGeek View Post
5. The automatic rebuild, mirroring the new disk from the existing one, should start.
Nope. This isn't hardware RAID.

Since the RAID components are partitions rather than drives, you'll have to create the partitions manually and then run mdadm --manage /dev/mdx --add /dev/sday for each RAID device and partition. That will start the rebuild process.
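As a rough sketch (assuming the replacement drive comes up as /dev/sda again and /dev/sdb is the surviving mirror, as in your layout):

Code:
# Copy the partition table from the surviving disk to the replacement
sfdisk -d /dev/sdb | sfdisk /dev/sda
# Re-add the new partitions; each array starts rebuilding as soon as its member is added
mdadm --manage /dev/md0 --add /dev/sda1
mdadm --manage /dev/md1 --add /dev/sda2
mdadm --manage /dev/md2 --add /dev/sda3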

 
Old 01-10-2021, 03:29 AM   #3
LinuGeek
Member
 
Original Poster
Quote:
Originally Posted by Ser Olmy View Post
Also, I'd add this step from your alternative routine:

0. Remove the failed partitions with:
Code:
mdadm --manage /dev/md0 --remove /dev/sda1
mdadm --manage /dev/md1 --remove /dev/sda2
mdadm --manage /dev/md2 --remove /dev/sda2
You should definitely do this before powering down the server.
You meant
Code:
mdadm --manage /dev/md2 --remove /dev/sda3
?
And isn't that the same as step 2 in my post? Or is the order not correct?
 
Old 01-10-2021, 03:32 AM   #4
LinuGeek
Member
 
Original Poster
One more question,

Since I have to reboot the system a couple of times while one of the system disks (sda) is not in place, will that create any problem when booting the OS?

My GRUB configuration is as follows:

cat /boot/grub2/grub.cfg (relevant part only):

Quote:
insmod part_msdos msdos
insmod diskfilter mdraid1x lvm
insmod ext2
set root='lvmid/m7AEp0-79EG-D2Vi-ELzE-BTzh-C8mN-CLxrpz/S0eZEl-PlBX-E1ZL-oCwL-SmUx-4Qe4-Mz9NHX'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint='lvmid/m7AEp0-79EG-D2Vi-ELzE-BTzh-C8mN-CLxrpz/S0eZEl-PlBX-E1ZL-oCwL-SmUx-4Qe4-Mz9NHX' 7c2e3a9c-5f5b-47e3-8a0a-d1e66f12747c
else
search --no-floppy --fs-uuid --set=root 7c2e3a9c-5f5b-47e3-8a0a-d1e66f12747c
fi
font="/share/grub2/unicode.pf2"
fi

if loadfont $font ; then
set gfxmode=auto
load_video
insmod gfxterm
set locale_dir=$prefix/locale
set lang=POSIX
insmod gettext
fi
terminal_output gfxterm
insmod part_msdos msdos
insmod diskfilter mdraid1x
insmod ext2
set root='mduuid/531cd341e2c7d5a71c542ad04d9ea589'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint='mduuid/531cd341e2c7d5a71c542ad04d9ea589' 96c11697-c3b7-4f11-90fc-3aef207db526
else
search --no-floppy --fs-uuid --set=root 96c11697-c3b7-4f11-90fc-3aef207db526
fi
insmod gfxmenu
loadfont ($root)/grub2/themes/SLE/DejaVuSans-Bold14.pf2
loadfont ($root)/grub2/themes/SLE/DejaVuSans10.pf2
loadfont ($root)/grub2/themes/SLE/DejaVuSans12.pf2
loadfont ($root)/grub2/themes/SLE/ascii.pf2
insmod png
set theme=($root)/grub2/themes/SLE/theme.txt
export theme
if [ x${boot_once} = xtrue ]; then
set timeout=0
elif [ x$feature_timeout_style = xy ] ; then
set timeout_style=menu
set timeout=8
# Fallback normal timeout code in case the timeout_style feature is
# unavailable.
else
set timeout=8
fi
### END /etc/grub.d/00_header ###

### BEGIN /etc/grub.d/10_linux ###
menuentry 'SLES12' --class sles12 --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-690785da-f0f0-4250-b693-5a008acbba10' {
load_video
set gfxpayload=keep
insmod gzio
insmod part_msdos msdos
insmod diskfilter mdraid1x
insmod ext2
set root='mduuid/531cd341e2c7d5a71c542ad04d9ea589'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint='mduuid/531cd341e2c7d5a71c542ad04d9ea589' 96c11697-c3b7-4f11-90fc-3aef207db526
else
search --no-floppy --fs-uuid --set=root 96c11697-c3b7-4f11-90fc-3aef207db526
fi
echo 'Loading Linux 3.12.28-4-default ...'
linux /vmlinuz-3.12.28-4-default root=UUID=690785da-f0f0-4250-b693-5a008acbba10 resume=/dev/md1 splash=silent quiet crashkernel=232M-:116M showopts
echo 'Loading initial ramdisk ...'
initrd /initrd-3.12.28-4-default
}
submenu 'Advanced options for SLES12' --hotkey=1 $menuentry_id_option 'gnulinux-advanced-690785da-f0f0-4250-b693-5a008acbba10' {
menuentry 'SLES12, with Linux 3.12.28-4-default' --hotkey=2 --class sles12 --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-3.12.28-4-default-advanced-690785da-f0f0-4250-b693-5a008acbba10' {
load_video
set gfxpayload=keep
insmod gzio
insmod part_msdos msdos
insmod diskfilter mdraid1x
insmod ext2
set root='mduuid/531cd341e2c7d5a71c542ad04d9ea589'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint='mduuid/531cd341e2c7d5a71c542ad04d9ea589' 96c11697-c3b7-4f11-90fc-3aef207db526
else
search --no-floppy --fs-uuid --set=root 96c11697-c3b7-4f11-90fc-3aef207db526
fi
echo 'Loading Linux 3.12.28-4-default ...'
linux /vmlinuz-3.12.28-4-default root=UUID=690785da-f0f0-4250-b693-5a008acbba10 resume=/dev/md1 splash=silent quiet crashkernel=232M-:116M showopts
echo 'Loading initial ramdisk ...'
initrd /initrd-3.12.28-4-default
}
menuentry 'SLES12, with Linux 3.12.28-4-default (recovery mode)' --hotkey=3 --class sles12 --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-3.12.28-4-default-recovery-690785da-f0f0-4250-b693-5a008acbba10' {
load_video
set gfxpayload=keep
insmod gzio
insmod part_msdos msdos
insmod diskfilter mdraid1x
insmod ext2
set root='mduuid/531cd341e2c7d5a71c542ad04d9ea589'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint='mduuid/531cd341e2c7d5a71c542ad04d9ea589' 96c11697-c3b7-4f11-90fc-3aef207db526
else
search --no-floppy --fs-uuid --set=root 96c11697-c3b7-4f11-90fc-3aef207db526
fi
echo 'Loading Linux 3.12.28-4-default ...'
linux /vmlinuz-3.12.28-4-default root=UUID=690785da-f0f0-4250-b693-5a008acbba10 showopts apm=off noresume edd=off powersaved=off nohz=off highres=off processor.max_cstate=1 nomodeset x11failsafe crashkernel=232M-:116M
echo 'Loading initial ramdisk ...'
initrd /initrd-3.12.28-4-default
}
}

Secondly, I think it is also necessary to install GRUB onto the new drive, as shown below.

For GRUB 2, running grub-install on the new drive should be enough. For example:

Quote:
grub-install /dev/sda
Will it be enough then? Or am I missing something?
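Note that on SLES 12 the tool is usually named grub2-install rather than grub-install. A minimal sketch, assuming the replacement disk comes back as /dev/sda and the command is run from the normally booted system (or from a chroot):

Code:
# Reinstall the boot loader into the MBR of the replacement disk
grub2-install /dev/sda
# Regenerate the configuration if anything changed
grub2-mkconfig -o /boot/grub2/grub.cfg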

Thanks in advance.

 
Old 01-10-2021, 04:37 AM   #5
Ser Olmy
Senior Member
 
Quote:
Originally Posted by LinuGeek View Post
You meant
Code:
mdadm --manage /dev/md2 --remove /dev/sda3
?
Indeed. A typo on my part.
Quote:
Originally Posted by LinuGeek View Post
And is it not the same as Step no. 2 in my comment? Or is the order not correct?
You should do it before powering down and adding the new drive, that's all.

Quote:
Originally Posted by LinuGeek View Post
Since I have to reboot the system a couple of times while one of the system disks (sda) is not in place, will that create any problem when booting the OS?
Good point. GRUB may or may not handle booting from a mirror set, but the BIOS will default to booting from the first hard drive regardless.

If GRUB has duplicated the boot sector to both drives (and that's a big "if"), you could possibly boot directly from the second drive via the server's boot menu, or by booting from removable media with a boot loader that allows booting from other drives.

Otherwise, it will be necessary to manually install GRUB to the new drive before you can boot. And you will definitely have to re-run the GRUB installer afterwards anyway.

(BTW, all this would have been unnecessary if the drives had been mounted in hotplug trays, or if this had been a hardware RAID setup or even a so-called "fakeRAID" volume.)
 
Old 01-10-2021, 04:54 AM   #6
LinuGeek
Member
 
Original Poster
Another thing, regarding swap:

RAID device md1 is actually swap.

Quote:
#cat /proc/swaps

Filename Type Size Used Priority
/dev/md1 partition 18876284 146068 -1
So before step 1, I should be doing:


Quote:
#swapoff /dev/md1
And then proceed with step 1 of marking the RAID partitions as failed, etc.

Later, after the last step, when the rebuild is complete, swap should be enabled on the md1 device once again:

Quote:
#mkswap /dev/md1
#swapon -a
Is this okay?
 
Old 01-11-2021, 02:59 AM   #7
Ser Olmy
Senior Member
 
/dev/md1 isn't going to disappear during this procedure, so I see no reason to deactivate the swap partition.
 
Old 01-11-2021, 04:11 AM   #8
LinuGeek
Member
 
Original Poster
Thanks for the reply Ser Olmy.

A couple of questions come to my mind:

1. Sometimes, when there are two disks, let's say sda and sdb, and one of them is non-functional (sda in this case), after a reboot the disk that was sdb can be identified as sda in the absence of the original sda. This will definitely affect the set of commands I have prepared.

2. As said in the first post, there is an LVM layer sitting on top of the software RAID.


The filesystem layout (per fstab) looks like this:

Quote:
Filesystem Mounted on
devtmpfs /dev
tmpfs /dev/shm
tmpfs /run
tmpfs /sys/fs/cgroup

/dev/md0 /boot
/dev/mapper/system-root /
/dev/mapper/system-usr /usr
/dev/mapper/system-var /var
/dev/mapper/system-opt /opt
So "system" is one of the VGs present on sdb and the faulty disk sda. Does that make any difference to the set of commands?
After the RAID is rebuilt, the LVM volumes should be back as expected?

Your thoughts on these questions please.

Thanks in advance.
 
Old 01-11-2021, 04:30 AM   #9
Ser Olmy
Senior Member
 
Quote:
Originally Posted by LinuGeek View Post
1. Sometimes, when there are two disks, let's say sda and sdb, and one of them is non-functional (sda in this case), after a reboot the disk that was sdb can be identified as sda in the absence of the original sda. This will definitely affect the set of commands I have prepared.
Yes, that is absolutely the case if you boot the server without a replacement disk.

The order in which the Linux kernel enumerates drives (and devices in general) is determined by when the driver for the controller is loaded and how that driver then accesses the devices attached to said controller.

For SATA/SAS drives, the ports are always enumerated in the same order (which may or may not be the same order used by the motherboard BIOS), and the first drive found is assigned /dev/sda. So yes, if you simply remove the first drive and then boot the server, what used to be /dev/sdb is likely to appear as /dev/sda.

However, if you install a replacement drive and connect it to the same port, that drive will become the new /dev/sda.

To make absolutely sure you're operating on the right drive, check the make, model, and serial number with smartctl.
Quote:
Originally Posted by LinuGeek View Post
2. As said in the first post, there is an LVM layer sitting on top of the software RAID.
That's actually an advantage.

LVM volumes are identified by metadata on the LVM partitions. Drives may appear with different device node names, but as long as pvscan finds the physical volumes at boot, everything will be identified properly.

The same goes for md devices, at least as long as /etc/mdadm.conf is accessible and hasn't been edited to contain hardcoded references to device nodes.
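For example, a quick post-rebuild sanity check of both layers could look like this (read-only commands, just a sketch):

Code:
# md layer: assembled arrays and their states
cat /proc/mdstat
mdadm --detail --scan
# LVM layer: physical volumes, volume groups and logical volumes
pvs
vgs
lvs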
 
Old 01-11-2021, 04:34 AM   #10
LinuGeek
Member
 
Original Poster
Very well explained, thanks.

/etc/mdadm.conf is:


DEVICE containers partitions
ARRAY /dev/md0 UUID=531cd341:e2c7d5a7:1c542ad0:4d9ea589
ARRAY /dev/md1 UUID=fa4682d2:61901280:67e70eb9:c0335a53
ARRAY /dev/md2 UUID=885a178f:328855d9:beb12cf1:193904d1
ARRAY /dev/md3 UUID=a443e415:28f75b00:2fd4779f:44fdd524

I guess it should be OK then?
 
Old 01-11-2021, 04:38 AM   #11
Ser Olmy
Senior Member
 
Quote:
Originally Posted by LinuGeek View Post
/etc/mdadm.conf is:


DEVICE containers partitions
ARRAY /dev/md0 UUID=531cd341:e2c7d5a7:1c542ad0:4d9ea589
ARRAY /dev/md1 UUID=fa4682d2:61901280:67e70eb9:c0335a53
ARRAY /dev/md2 UUID=885a178f:328855d9:beb12cf1:193904d1
ARRAY /dev/md3 UUID=a443e415:28f75b00:2fd4779f:44fdd524

I guess, it should be ok then?
Definitely. No matter what the devices or partitions may end up being called, the UUIDs in the RAID metadata will remain the same.
 
Old 01-11-2021, 04:40 AM   #12
LinuGeek
Member
 
Original Poster
Thanks a ton. Will get back for further questions if needed.
 
Old 01-11-2021, 06:23 AM   #13
LinuGeek
Member
 
Original Poster
Just a small query: what happens if we restart the server without doing anything? Will it come up without any problems?
 
Old 01-11-2021, 07:09 AM   #14
Ser Olmy
Senior Member
 
Unlikely, as the defective drive is almost certainly the boot device.

If the drive is totally dead, the next drive on the controller (currently seen as /dev/sdb by the OS) will become the boot device. It probably lacks the GRUB bootloader, so the boot process will fail or hang.

If the drive has multiple bad sectors but is still running, the server will attempt to boot from it. If, by pure coincidence, none of the sectors holding the GRUB loader are bad, you may be able to successfully boot. But I wouldn't count on it.
 
Old 01-11-2021, 08:02 AM   #15
LinuGeek
Member
 
Original Poster
Once again good point.

Quote:
GRUB may or may not handle booting from a mirror set, but the BIOS will default to booting from the first hard drive regardless.
If we consider this, then for the system to boot I first need to recreate the boot loader on the new disk. In that case the order should be like this:

1. Note the serial numbers of the disks using smartctl.
2. Mark the faulty disk's partitions as failed and remove them from the mdadm RAID.
3. Shut down the server.
4. Take out the faulty disk and replace it with the new disk.
5. Boot the server? This will fail, as the new disk does not have any boot information on it, so I guess I will have to use a boot CD to get into recovery mode.
The obvious question is: will I be able to execute the next set of commands in recovery mode, or will a chroot be required (see the rough sketch after this list)? It's getting complicated now, I guess.
6. Once the RAID has been reconstructed, I would boot the system normally in the hope that it boots.

Am I missing something here?
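(A rough sketch of the rescue/chroot route, assuming the rescue system can assemble the arrays and activate the VGs, and that the new blank disk shows up as /dev/sda while /dev/sdb is the surviving mirror; device and LV names are taken from the layout above but this is only an illustration:)

Code:
# From the rescue shell: assemble the arrays and activate the volume groups
mdadm --assemble --scan
vgchange -ay
# Mount the root LV and /boot, bind-mount the pseudo filesystems, then chroot
mount /dev/mapper/system-root /mnt
mount /dev/md0 /mnt/boot
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
chroot /mnt
# Inside the chroot: partition the new disk, re-add it to the arrays, reinstall GRUB
sfdisk -d /dev/sdb | sfdisk /dev/sda
mdadm --manage /dev/md0 --add /dev/sda1
mdadm --manage /dev/md1 --add /dev/sda2
mdadm --manage /dev/md2 --add /dev/sda3
grub2-install /dev/sda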
 
  

