Linux - General: This Linux forum is for general Linux questions and discussion. If it is Linux related and doesn't seem to fit in any other forum, then this is the place.
We have an important database server running SUSE Linux Enterprise Server 12. The previous admin set it up as follows.
4 internal disks:
1+1 -- RAID-1 software RAID --> root partitions
1+1 -- RAID-1 software RAID --> data partitions with the database
The root partitions have LVM on top, sliced into logical volumes for /usr, /boot, etc.
So there are two volume groups: 1. a system VG and 2. a data VG.
The four disks are sda+sdb and sdc+sdd.
Recently we noticed that one of the disks in the "system" software RAID group has gone bad, but the server continued to work without any problem (thanks to RAID-1 mirroring).
Three software RAID partitions are marked as failed/degraded: md0, md1 and md2, which are the system partitions (so sda1, sda2 and sda3). md3 is for the database.
We have to replace the faulty disk (sda) so that the original structure is rebuilt.
I have come up with the following plan. Please suggest modifications.
1. Shut down the server, which will also take down the database.
2. Take out the faulty disk.
3. Replace it with a new one.
4. Restart the server.
5. The auto-build process of mirroring the new disk from the existing one should start.
This sounds like a mostly automated process.
If this does not work, we can do a few more steps manually.
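Before touching any hardware, it's worth confirming exactly which members are failed. A minimal sketch (the mdstat text below is an illustrative sample so the parsing can be shown anywhere; on the real server just read /proc/mdstat directly and run mdadm --detail on each array):

```shell
# Illustrative /proc/mdstat excerpt with one failed member, written to a
# temp file for demonstration; on the server: cat /proc/mdstat
cat <<'EOF' > /tmp/mdstat.sample
md0 : active raid1 sda1[0](F) sdb1[1]
      4194240 blocks super 1.0 [2/1] [_U]
EOF
# Members flagged (F) are the failed ones; [_U] means one of the two
# mirror halves is missing/down while the other is up.
grep '(F)' /tmp/mdstat.sample
```

On the live system, `mdadm --detail /dev/md0` (and md1, md2) gives the same information with the member device names spelled out.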
Quote:
Question: can we do this at the existing runlevel without any problem?
1. Mark the disk as failed if it is not already marked (F) by the system.
4. Copy the partition table to the new disk.
(Caution: this sfdisk command replaces the entire partition table on the target disk with that of the source disk; use a different approach if you need to preserve any existing partition information on the target.)
You should definitely mark the disk as failed and remove it from the arrays before powering down the server.
And you will obviously also have to find a way to identify the failed drive before powering down. If it isn't obvious which drive is which, run smartctl on the working drives and make a note of the serial numbers.
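Extracting the serial numbers is a one-liner per drive. The sample text below mimics the format of `smartctl -i` output (the model and serial values are made up); on the server you would pipe the real output instead:

```shell
# Parse the serial number out of smartctl-style output. Sample values
# are invented; on the server run:  smartctl -i /dev/sdb
sample='Device Model:     ST1000DM003-1CH162
Serial Number:    Z1D4ABCD'
serial=$(printf '%s\n' "$sample" | awk -F': *' '/Serial Number/ {print $2}')
echo "surviving drive serial: $serial"
```

Note the serial of every *working* drive this way; whatever physical disk is not on that list is the one to pull.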
Quote:
Originally Posted by LinuGeek
5. Auto-Build process of mirroring the new disk from the existing one should start.
Nope. This isn't hardware RAID.
Since the RAID components are partitions rather than whole drives, you'll have to create the partitions manually and then run mdadm --manage /dev/mdX --add /dev/sdaY for each RAID device and its corresponding partition. That will start the rebuild process.
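Putting the partition-table copy and the per-array re-add together, here is a dry-run sketch that only *prints* the commands for review rather than executing them. Device names follow this thread (sdb is the survivor, sda the replacement, md0-md2 the degraded arrays); verify everything before actually running any of the printed lines:

```shell
# Generate (not execute) the rebuild commands so they can be reviewed.
# sdb = surviving mirror, sda = new blank disk, per this thread.
src=/dev/sdb
dst=/dev/sda
{
  # Copy the MBR partition table from survivor to replacement.
  # Caution: this overwrites the whole partition table on $dst.
  echo "sfdisk -d $src | sfdisk $dst"
  # Re-add each partition to its degraded array; the rebuild of each
  # array starts as soon as its member is added.
  n=1
  for md in md0 md1 md2; do
    echo "mdadm --manage /dev/$md --add ${dst}$n"
    n=$((n+1))
  done
  # Watch rebuild progress:
  echo "cat /proc/mdstat"
} > /tmp/rebuild-commands.txt
cat /tmp/rebuild-commands.txt
```

The partition-number-to-md mapping (sda1→md0, sda2→md1, sda3→md2) is assumed from the post above; confirm it against `mdadm --detail` output before using it.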
Since I have to reboot the system a couple of times while one of the system disks (sda) is not in place, will that create any problem while booting the OS?
My GRUB config is as follows:
cat /boot/grub2/grub.cfg (relevant part only)
Quote:
insmod part_msdos
insmod msdos
insmod diskfilter
insmod mdraid1x
insmod lvm
insmod ext2
set root='lvmid/m7AEp0-79EG-D2Vi-ELzE-BTzh-C8mN-CLxrpz/S0eZEl-PlBX-E1ZL-oCwL-SmUx-4Qe4-Mz9NHX'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint='lvmid/m7AEp0-79EG-D2Vi-ELzE-BTzh-C8mN-CLxrpz/S0eZEl-PlBX-E1ZL-oCwL-SmUx-4Qe4-Mz9NHX' 7c2e3a9c-5f5b-47e3-8a0a-d1e66f12747c
else
search --no-floppy --fs-uuid --set=root 7c2e3a9c-5f5b-47e3-8a0a-d1e66f12747c
fi
font="/share/grub2/unicode.pf2"
fi
if loadfont $font ; then
set gfxmode=auto
load_video
insmod gfxterm
set locale_dir=$prefix/locale
set lang=POSIX
insmod gettext
fi
terminal_output gfxterm
insmod part_msdos
insmod msdos
insmod diskfilter
insmod mdraid1x
insmod ext2
set root='mduuid/531cd341e2c7d5a71c542ad04d9ea589'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint='mduuid/531cd341e2c7d5a71c542ad04d9ea589' 96c11697-c3b7-4f11-90fc-3aef207db526
else
search --no-floppy --fs-uuid --set=root 96c11697-c3b7-4f11-90fc-3aef207db526
fi
insmod gfxmenu
loadfont ($root)/grub2/themes/SLE/DejaVuSans-Bold14.pf2
loadfont ($root)/grub2/themes/SLE/DejaVuSans10.pf2
loadfont ($root)/grub2/themes/SLE/DejaVuSans12.pf2
loadfont ($root)/grub2/themes/SLE/ascii.pf2
insmod png
set theme=($root)/grub2/themes/SLE/theme.txt
export theme
if [ x${boot_once} = xtrue ]; then
set timeout=0
elif [ x$feature_timeout_style = xy ] ; then
set timeout_style=menu
set timeout=8
# Fallback normal timeout code in case the timeout_style feature is
# unavailable.
else
set timeout=8
fi
### END /etc/grub.d/00_header ###
### BEGIN /etc/grub.d/10_linux ###
menuentry 'SLES12' --class sles12 --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-690785da-f0f0-4250-b693-5a008acbba10' {
load_video
set gfxpayload=keep
insmod gzio
insmod part_msdos
insmod msdos
insmod diskfilter
insmod mdraid1x
insmod ext2
set root='mduuid/531cd341e2c7d5a71c542ad04d9ea589'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint='mduuid/531cd341e2c7d5a71c542ad04d9ea589' 96c11697-c3b7-4f11-90fc-3aef207db526
else
search --no-floppy --fs-uuid --set=root 96c11697-c3b7-4f11-90fc-3aef207db526
fi
echo 'Loading Linux 3.12.28-4-default ...'
linux /vmlinuz-3.12.28-4-default root=UUID=690785da-f0f0-4250-b693-5a008acbba10 resume=/dev/md1 splash=silent quiet crashkernel=232M-:116M showopts
echo 'Loading initial ramdisk ...'
initrd /initrd-3.12.28-4-default
}
submenu 'Advanced options for SLES12' --hotkey=1 $menuentry_id_option 'gnulinux-advanced-690785da-f0f0-4250-b693-5a008acbba10' {
menuentry 'SLES12, with Linux 3.12.28-4-default' --hotkey=2 --class sles12 --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-3.12.28-4-default-advanced-690785da-f0f0-4250-b693-5a008acbba10' {
load_video
set gfxpayload=keep
insmod gzio
insmod part_msdos
insmod msdos
insmod diskfilter
insmod mdraid1x
insmod ext2
set root='mduuid/531cd341e2c7d5a71c542ad04d9ea589'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint='mduuid/531cd341e2c7d5a71c542ad04d9ea589' 96c11697-c3b7-4f11-90fc-3aef207db526
else
search --no-floppy --fs-uuid --set=root 96c11697-c3b7-4f11-90fc-3aef207db526
fi
echo 'Loading Linux 3.12.28-4-default ...'
linux /vmlinuz-3.12.28-4-default root=UUID=690785da-f0f0-4250-b693-5a008acbba10 resume=/dev/md1 splash=silent quiet crashkernel=232M-:116M showopts
echo 'Loading initial ramdisk ...'
initrd /initrd-3.12.28-4-default
}
menuentry 'SLES12, with Linux 3.12.28-4-default (recovery mode)' --hotkey=3 --class sles12 --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-3.12.28-4-default-recovery-690785da-f0f0-4250-b693-5a008acbba10' {
load_video
set gfxpayload=keep
insmod gzio
insmod part_msdos
insmod msdos
insmod diskfilter
insmod mdraid1x
insmod ext2
set root='mduuid/531cd341e2c7d5a71c542ad04d9ea589'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint='mduuid/531cd341e2c7d5a71c542ad04d9ea589' 96c11697-c3b7-4f11-90fc-3aef207db526
else
search --no-floppy --fs-uuid --set=root 96c11697-c3b7-4f11-90fc-3aef207db526
fi
echo 'Loading Linux 3.12.28-4-default ...'
linux /vmlinuz-3.12.28-4-default root=UUID=690785da-f0f0-4250-b693-5a008acbba10 showopts apm=off noresume edd=off powersaved=off nohz=off highres=off processor.max_cstate=1 nomodeset x11failsafe crashkernel=232M-:116M
echo 'Loading initial ramdisk ...'
initrd /initrd-3.12.28-4-default
}
}
Secondly, I think it is also necessary to install GRUB onto the new drive, as shown below.
For GRUB2 running grub-install on the new drive is enough. For example:
Quote:
grub-install /dev/sda
Will that be enough then? Or am I missing something?
And isn't this the same as step no. 2 in my comment? Or is the order not correct?
You should do it before powering down and adding the new drive, that's all.
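To avoid landing in this situation again, GRUB can be installed on both mirror members so either disk remains bootable. A dry-run sketch that only prints the commands (note that on SLES 12 the binary is typically named grub2-install rather than grub-install; confirm which name your system uses before running anything):

```shell
# Print (not run) a GRUB install for each mirror member so that either
# disk can boot on its own after a future failure. SLES 12 typically
# ships the command as grub2-install.
for d in /dev/sda /dev/sdb; do
  echo "grub2-install $d"
done > /tmp/grub-commands.txt
cat /tmp/grub-commands.txt
```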
Quote:
Originally Posted by LinuGeek
Since I have to reboot the system a couple of times while one of the system disks (sda) is not in place, will that create any problem while booting the OS?
Good point. GRUB may or may not handle booting from a mirror set, but the BIOS will default to booting from the first hard drive regardless.
If GRUB has duplicated the boot sector to both drives (and that's a big "if"), you could possibly boot directly from the second drive via the server's boot menu, or by booting from media with a boot loader that allows booting from other drives.
Otherwise, it will be necessary to manually install GRUB on the new drive before you can boot. And you will definitely have to re-run the GRUB installer afterwards anyway.
(BTW, all this would have been unnecessary if the drives had been mounted in hotplug trays, or if this had been a hardware RAID setup or even a so-called "fakeRAID" volume.)
1. Sometimes when there are two disks, let's say sda and sdb, and one of them is non-functional (sda in this case), then after a reboot the disk that was sdb can be identified as sda in the absence of the original sda. This will definitely affect the set of commands I have prepared.
2. As said in the first post, there is an LVM layer sitting on top of the software RAID.
So "system" is one of the VGs present on sdb and the faulty disk sda. Does that make any difference to the set of commands?
After the RAID is rebuilt, the logical volumes should be back as expected?
Quote:
Originally Posted by LinuGeek
1. Sometimes when there are two disks, let's say sda and sdb, and one of them is non-functional (sda in this case), then after a reboot the disk that was sdb can be identified as sda in the absence of the original sda. This will definitely affect the set of commands I have prepared.
Yes, that is absolutely the case if you boot the server without a replacement disk.
The order in which the Linux kernel enumerates drives (and devices in general) is determined by when the driver for the controller is loaded and how that driver then accesses the devices attached to said controller.
For SATA/SAS drives, the ports are always enumerated in the same order (which may or may not be the same order used by the motherboard BIOS), and the first drive found is assigned /dev/sda. So yes, if you simply remove the first drive and then boot the server, what used to be /dev/sdb is likely to appear as /dev/sda.
However, if you install a replacement drive and connect it to the same port, that drive will become the new /dev/sda.
To make absolutely sure you're operating on the right drive, check the make, model, and serial number with smartctl.
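Persistent /dev/disk/by-id names sidestep the sda/sdb renumbering problem entirely, since they embed the model and serial number and so follow the physical drive. A small parsing sketch using a made-up symlink entry (on the server, the real listing comes from `ls -l /dev/disk/by-id/`):

```shell
# One simplified line of "ls -l /dev/disk/by-id" output; the by-id name
# encodes model_serial, and the symlink target is the current kernel name.
sample='ata-ST1000DM003-1CH162_Z1D4ABCD -> ../../sda'
byid=${sample%% -> *}    # stable name: ata-<model>_<serial>
dev=${sample##*/}        # current kernel name: sda
echo "$byid is currently $dev"
```

The by-id name stays the same across reboots and drive removals; only the sda/sdb target moves.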
Quote:
Originally Posted by LinuGeek
2. As said in the first post, there is an LVM layer sitting on top of the software RAID.
That's actually an advantage.
LVM volumes are identified by metadata on the LVM partitions. Drives may appear with different device node names, but as long as pvscan finds the physical volumes at boot, everything will be identified properly.
The same goes for md devices, at least as long as /etc/mdadm.conf is accessible and hasn't been edited to contain hardcoded references to device nodes.
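A quick way to confirm that array assembly is UUID-driven rather than device-name-driven is to look at the ARRAY lines. Sketch with a sample config file (the UUID here is the md UUID visible in the grub.cfg above, reformatted into mdadm's colon-separated style; on the server, inspect /etc/mdadm.conf itself):

```shell
# Sample mdadm.conf ARRAY line in UUID style; arrays identified like
# this assemble correctly no matter how the kernel renumbers sda/sdb.
cat <<'EOF' > /tmp/mdadm.conf.sample
ARRAY /dev/md0 metadata=1.0 UUID=531cd341:e2c7d5a7:1c542ad0:4d9ea589
EOF
# Flag any ARRAY line that instead hardcodes member device nodes:
if grep -E 'ARRAY.*devices=/dev/sd' /tmp/mdadm.conf.sample; then
  echo "hardcoded device nodes found -- review before swapping disks"
else
  echo "arrays identified by UUID -- safe across device renaming"
fi
```

The same check for LVM is simply `pvscan` and `vgs`: as long as both VGs (system and data) show up, LVM has found its metadata regardless of device names.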
Unlikely, as the defective drive is almost certainly the boot device.
If the drive is totally dead, the next drive on the controller (currently seen as /dev/sdb by the OS) will become the boot device. It probably lacks the GRUB bootloader, so the boot process will fail or hang.
If the drive has multiple bad sectors but is still running, the server will attempt to boot from it. If, by pure coincidence, none of the sectors holding the GRUB loader are bad, you may be able to successfully boot. But I wouldn't count on it.
Quote:
GRUB may or may not handle booting from a mirror set, but the BIOS will default to booting from the first hard drive regardless.
If we consider this, then for the system to boot, I first need to recreate the boot loader. In that case the order should be:
1. Note the serial numbers of the disks using smartctl.
2. Mark the faulty disk as failed and remove it from the mdadm RAID.
3. Shut down the server.
4. Take out the faulty disk and replace it with the new disk.
5. Boot the server??
This will fail, as the new disk does not have any boot information on it. So I guess I will have to use a boot CD to get into recovery mode.
The obvious question is: will I be able to execute the next set of commands in recovery mode, or will a chroot be required? It's getting complicated now, I guess.
6. Once the RAID has been reconstructed, I would boot the system normally in the hope that it boots.
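If the server does end up bootable only from rescue media, the recovery-mode work roughly follows this shape. A dry-run sketch that only prints the commands; the LV path and /boot device named here are hypothetical guesses about this setup and must be checked against `vgs`/`lvs` and `/etc/fstab` before use:

```shell
# Print (not run) a typical rescue-media sequence: assemble arrays,
# activate LVM, chroot, reinstall GRUB. Names marked HYPOTHETICAL must
# be verified on the actual system first.
{
  echo "mdadm --assemble --scan"
  echo "vgchange -ay"
  echo "mount /dev/system/root /mnt"   # HYPOTHETICAL LV path
  echo "mount /dev/md0 /mnt/boot"      # HYPOTHETICAL /boot device
  for fs in dev proc sys; do
    echo "mount --bind /$fs /mnt/$fs"
  done
  echo "chroot /mnt grub2-install /dev/sda"
} > /tmp/rescue-steps.txt
cat /tmp/rescue-steps.txt
```

With GRUB reinstalled from the chroot, the reboot in step 6 should then find a bootable first disk.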