LinuxQuestions.org
11-11-2014, 11:31 PM   #1
ncdave
LQ Newbie
 
Registered: Nov 2005
Location: Cary, NC USA
Distribution: Scientific Linux 6.1, Scientific Linux 7, Ubuntu, Parted Magic; formerly Mandrake 10.1, etc.
Posts: 7

How can I coax RHEL/CentOS/SL 7 into booting normally with degraded software RAID?


I set up a new server (my first with this version of Linux). I installed a pair of 160 GB blank SATA HDDs (one Seagate and one WDC, but with exactly the same number of LBA sectors) in an old machine, and set out to install Scientific Linux 7.0 (rebranded RHEL) in a RAID 1 (software mirrored) configuration.

The first hiccup was that I couldn't figure out how to get the SL/RHEL installer (Anaconda) to set up the two drives for RAID1. So I booted from a PartedMagic CD and used it to do the partitioning.

I partitioned the two drives identically. Each drive has a big partition for RAID1+ext4 to be mounted at /, a small (currently unused) partition for RAID1+ext3 to be mounted at /safe, and a 3GB Linux Swap partition. I used fdisk to change the types of the RAID partitions on each drive to FD, and mdadm to build the RAID arrays:

mdadm --create --verbose /dev/md0 --raid-devices=2 --level=1 /dev/sda1 /dev/sdb1
mdadm --create --verbose /dev/md1 --raid-devices=2 --level=1 /dev/sda2 /dev/sdb2

Then I shut down, booted the SL DVD, and tried the install again. This time the installer recognized the RAID1 arrays, formatted them for ext4 & ext3, respectively, and installed smoothly.

At this point, everything seemed okay. I shut it down, started it again, and it booted fine. So far so good.

So then I tested the RAID1 functionality: I shut down the computer, removed one of the drives, and tried to boot it. I was expecting it to display some error messages about the RAID array being degraded, and then come up to the normal login screen. But it didn't work. Instead I got:

Welcome to emergency mode! After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" to try again
to boot into default mode.
Give root password for maintenance
(or type Control-D to continue):

The same thing happens regardless of which drive is missing.

That's no good! The purpose of the mirrored drives is to ensure that the server will keep on running if one of the drives fails.

Ctrl-D just gets me back to a repeat of the same "Welcome to emergency mode" screen. So does entering my root password and then "systemctl default".

So then I tried an experiment. At the boot menu I pressed "e" to edit the kernel boot parameters, and changed "rhgb quiet" to "bootdegraded=true" and then booted. No joy.

That let me see more status messages flying by, but it didn't enable the machine to boot normally when a drive was missing. It still stopped at the same "Welcome to emergency mode" screen. The following is what I saw with the Seagate drive removed, and the WDC drive remaining. The last few lines look like the following (except that "...." denotes where I got tired of typing):

[ OK ] Started Activation of DM RAID sets.
[ OK ] Reached target Encrypted Volumes.
[ 14.855860] md: bind<sda2>
[ OK ] Found device WDC_WD1600BEVT-00A23T0.
Activating swap /dev/disk/by-uuid/add41844....
[ 15.190432] Adding 3144700k swap on /dev/sda3. Priority:-1 extents:1 across:3144700k FS
[ OK ] Activated swap /dev/disk/by-uuid/add41844....
[ TIME ] Timed out waiting for device dev-disk-by\x2duuid-a65962d\x2dbf07....
[DEPEND] Dependency failed for /safe.
[DEPEND] Dependency failed for Local File Systems.
[DEPEND] Dependency failed for Mark the need to relabel after reboot.
[DEPEND] Dependency failed for Relabel all file systems, if necessary.
[ 99.299068] systemd-journald[452]: Received request to flush runtime journal from PID 1
[ 99.3298059] type=1305 audit(1415512815.286:4): audit_pid=588 old=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
Welcome to emergency mode! After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" to try again
to boot into default mode.
Give root password for maintenance
(or type Control-D to continue):

So it appears that installing on RAID1 mirrored drives will just double the chance of a drive failure bringing down the server (since there are two drives instead of one). That is not what I was hoping to achieve with mirrored drives.

Does anyone know how to make it boot & run "normally" (with a degraded RAID1 array) when a hard disk drive fails?


Two other notes:

1. I'm new to RHEL/SL/CentOS 7, so at the "Software Selection" screen, during the SL installation, I had to do some guessing. I chose:
"General Purpose System" +
FTP Server,
File and Storage Server,
Office Suite and Productivity,
Virtualization Hypervisor,
Virtualization Tools, and
Development Tools

2. I'm seeing some apparently-innocuous errors:
ATAx: softreset failed (device not ready)
The "x" depends on which drives are installed. I get more of those errors with two drives installed than with only one.
 
11-13-2014, 07:55 AM   #2
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,195

Did you do grub-install to both drives? Is your BIOS set up to try both drives?
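
For reference, on RHEL/CentOS/SL 7 the command is named grub2-install rather than grub-install. The following is only a sketch, assuming a BIOS (non-UEFI) boot and that the two mirror members live on /dev/sda and /dev/sdb, as in the mdadm commands in the first post. The helper name install_grub_on_mirrors and its "dry-run" mode are made up for illustration; by default it only prints the commands so they can be checked first.

```shell
# Print (or run) grub2-install for each disk that holds a RAID1 member,
# so that either disk can boot on its own. Device names are assumptions.
install_grub_on_mirrors() {
    # $1 = "run" to execute for real; anything else only prints the commands.
    mode=$1; shift
    for disk in "$@"; do
        if [ "$mode" = "run" ]; then
            grub2-install "$disk"
        else
            echo "grub2-install $disk"
        fi
    done
}

install_grub_on_mirrors dry-run /dev/sda /dev/sdb
```

Running it with "run" instead of "dry-run" (as root) would actually write the boot loader to both disks.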
 
11-13-2014, 09:55 AM   #3
nbritton
Member
 
Registered: Jun 2013
Location: Dubuque, IA
Distribution: Red Hat Enterprise Linux, Mac OS X, Ubuntu, Fedora, FreeBSD
Posts: 89

Remove the swap entry from /etc/fstab and try again. It's just a guess, but maybe fstab is referencing the disk directly instead of the md device?
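
A quick way to see which swap entries fstab currently has might look like the sketch below. The function name list_swap_entries is hypothetical; it only reads the file and prints the device field of each uncommented swap line.

```shell
# List the device field of every active (uncommented) swap entry in an
# fstab file. Defaults to /etc/fstab; pass another path to inspect a copy.
list_swap_entries() {
    awk '$1 !~ /^#/ && $3 == "swap" { print $1 }' "${1:-/etc/fstab}"
}
```

Calling list_swap_entries with no argument reads /etc/fstab. If an entry is a UUID, blkid can show which device it belongs to; a UUID that resolves to a plain partition such as /dev/sda3 rather than an md device disappears along with that disk.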

Why did you set up the disk layout this way? Personally, I would have set up something like this:

Code:
mdadm --create /dev/md0 --raid-devices=2 --level=1 /dev/sda /dev/sdb

pvcreate /dev/md0
vgcreate vg_00 /dev/md0
lvcreate --size 80G --name lv_root vg_00
lvcreate --size 10G --name lv_safe vg_00
lvcreate --size 3G --name lv_swap vg_00
mkfs.ext4 /dev/vg_00/lv_root
mkfs.ext3 /dev/vg_00/lv_safe
mkswap /dev/vg_00/lv_swap
Also, if it was me personally, I would create a swap file rather than a swap partition...

Code:
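# Allocate a ~3 GB file of zeros to hold the swap space
dd if=/dev/zero of=/var/swap bs=1M count=3000
# Restrict access to root before enabling it
chmod 600 /var/swap
# Write the swap signature, then activate it
mkswap /var/swap
swapon /var/swap
# Make it permanent across reboots
echo -e "/var/swap\tswap\tswap\tdefaults\t0\t0" >> /etc/fstab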

Last edited by nbritton; 11-13-2014 at 10:48 AM.
 
11-13-2014, 01:38 PM   #4
smallpond
Whole disk RAID means you have to be careful. Since the individual disks aren't labeled they look blank to most software. It's better to do software RAID on partitions.
 
11-14-2014, 01:43 AM   #5
nbritton
Quote:
Originally Posted by smallpond
Whole disk RAID means you have to be careful. Since the individual disks aren't labeled they look blank to most software. It's better to do software RAID on partitions.
Careful how? You don't need to use partitioning software on partition-less disks, so I think it is a moot point. On the plus side, one of the benefits of a partition-less disk is the ability to easily resize on the fly. For example, to grow the root logical volume in the previous example you can simply run:

Code:
lvextend -L+10G vg_00/lv_root && resize2fs /dev/vg_00/lv_root;
You don't even need an outage window to do it, because it's an online, on-the-fly resize, and any junior-level Linux admin is capable of running that command. There is also less risk of human error; the last thing you want is a junior admin manually mucking around with the partition tables.


Another example: let's say you want to migrate your system to larger LUNs. This is easily accomplished with:

Code:
# --size=max grows the array to fill its (now larger) member devices
mdadm --grow /dev/md0 --size=max && pvresize /dev/md0

Last edited by nbritton; 11-14-2014 at 02:10 AM.
 
11-20-2014, 07:49 PM   #6
ncdave
Original Poster
Victory!

Quote:
Originally Posted by nbritton
...if it was me personally, I would create a swap file rather than a swap partition...

Code:
dd if=/dev/zero of=/var/swap bs=1M count=3000
chmod 600 /var/swap
mkswap /var/swap
swapon /var/swap
echo -e "/var/swap\tswap\tswap\tdefaults\t0\t0" >> /etc/fstab
Thank you, that's an excellent thought. Even if what I did had worked, it still might make the system vulnerable to a failure if a disk drive developed a bad block within the swap partition. It's obviously better to use a swap file on the RAID1 partition; I don't know what I was thinking.

So I deleted the swap partition entries from /etc/fstab and added a 3 GB swap file on the ext4 file system, on md0.

That worked fine when both drives were present. But when I removed one drive, I was right back at the Emergency Mode prompt again.

However, when I checked the log with "journalctl -xb" I noticed that it was complaining about my 2nd array (/dev/md1, with the ext3 filesystem), not the main ext4 filesystem. So I commented out that line in /etc/fstab and tried again, and this time it booted properly from the degraded RAID1 array!
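
For anyone landing here with the same symptom, a less drastic option than commenting the line out may be the "nofail" mount option, which systemd.mount(5) describes as making a mount wanted rather than required by local-fs.target. This is a sketch only, not something tested on this machine: the helper name add_nofail is hypothetical, and it prints the modified fstab to stdout instead of editing the file in place.

```shell
# Print an fstab with "nofail" appended to the options of one mount point.
# $1 = mount point (e.g. /safe), $2 = fstab path (defaults to /etc/fstab).
# Output goes to stdout so it can be reviewed before replacing the real file.
# Note: a modified line is re-joined with single spaces between fields.
add_nofail() {
    awk -v mp="$1" '
        $1 !~ /^#/ && $2 == mp && $4 !~ /(^|,)nofail(,|$)/ { $4 = $4 ",nofail" }
        { print }
    ' "${2:-/etc/fstab}"
}
```

Something like "add_nofail /safe > /tmp/fstab.new" followed by a diff against /etc/fstab shows exactly what would change before anything is copied into place.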

Apparently the problem booting wasn't due to my main /dev/md0 RAID1 array; it was because of the other partitions: the two swap partitions and the /dev/md1 RAID1 array.

After I shut down and reinstalled the missing drive, I started the machine again, and it was still running with just one drive. But I did "mdadm --add" to add the missing drive back, and its state went to "spare rebuilding" for a while, and then to "active."
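
A rough sketch of that check-and-re-add step follows. The function name degraded_arrays is hypothetical, and the device names in the comments are assumptions based on the layout described in the first post. It looks for an underscore in the [UU] member map that /proc/mdstat prints for each array.

```shell
# Print the name of every md array whose status line shows a missing member
# (an underscore inside the [UU] map). Reads mdstat-format text from a file;
# defaults to /proc/mdstat.
degraded_arrays() {
    awk '
        /^md[^ ]* :/ { name = $1 }
        /\[[U_]*_[U_]*\]/ { if (name != "") print name }
    ' "${1:-/proc/mdstat}"
}

# Re-adding a replaced member (run as root; device names are assumed):
#   mdadm /dev/md0 --add /dev/sdb1
#   cat /proc/mdstat    # shows recovery progress until the map reads [UU]
```

Once the rebuild finishes, degraded_arrays should print nothing.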

In other words, it's working perfectly.

I thank you both very much for your helpful advice!
 
  



Tags
boot failure, grub2, raid1, rhel, software raid 1

