Frustrating Linux Raid Issues (kinda long)
We have a Sun X4270 with 16 disks. I am having an odd problem where RAID devices are not being ignored by LVM and are getting hosed after reboots.

This system is running kernel 2.6.18-194.3.1.0.2.el5 (64-bit). /dev/sda is the system disk and is cold-mirrored to /dev/sdf. The remaining devices are used to create a 14-device RAID5 array, /dev/md0. Each of the disks used in the array is partitioned like so:

Disk /dev/sdb: 36472 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls     #blocks   Id  System
/dev/sdb1          0+  36471   36472-  292961308+  fd  Linux raid autodetect
/dev/sdb2          0       -       0           0    0  Empty
/dev/sdb3          0       -       0           0    0  Empty
/dev/sdb4          0       -       0           0    0  Empty
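For completeness, that layout can be produced with something along these lines (a sketch of the partitioning step, not the exact commands used here):

# one partition spanning the whole disk, type fd (Linux raid autodetect), on each member
for d in sdb sdc sdd sde sdg sdh sdi sdj sdk sdl sdm sdn sdo sdp; do
    sfdisk /dev/$d <<EOF
,,fd
EOF
done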
The array is created with:

mdadm -v --create /dev/md0 --level=raid5 --raid-devices=14 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 /dev/sdn1 /dev/sdo1 /dev/sdp1

At this point I monitor /proc/mdstat until the device has finished syncing. I now have this:

/dev/md0:
        Version : 0.90
  Creation Time : Wed Jan 25 18:37:09 2012
     Raid Level : raid5
     Array Size : 3808495808 (3632.06 GiB 3899.90 GB)
  Used Dev Size : 292961216 (279.39 GiB 299.99 GB)
   Raid Devices : 14
  Total Devices : 14
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Jan 25 19:45:22 2012
          State : clean
 Active Devices : 14
Working Devices : 14
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : cfa84985:544af1f9:fba81980:7651a084
         Events : 0.2

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1
       4       8       97        4      active sync   /dev/sdg1
       5       8      113        5      active sync   /dev/sdh1
       6       8      129        6      active sync   /dev/sdi1
       7       8      145        7      active sync   /dev/sdj1
       8       8      161        8      active sync   /dev/sdk1
       9       8      177        9      active sync   /dev/sdl1
      10       8      193       10      active sync   /dev/sdm1
      11       8      209       11      active sync   /dev/sdn1
      12       8      225       12      active sync   /dev/sdo1
      13       8      241       13      active sync   /dev/sdp1

I then do:

mdadm --detail --scan > /etc/mdadm.conf
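For reference, the resulting /etc/mdadm.conf contains roughly the following (exact wording depends on the mdadm version; the UUID is the one reported above, and a "DEVICE partitions" line can be added on top if you prefer to be explicit):

ARRAY /dev/md0 level=raid5 num-devices=14 metadata=0.90 UUID=cfa84985:544af1f9:fba81980:7651a084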
At this point I can create a filesystem on /dev/md0, mount it, and read/write data to it. However, after I reboot things go pear-shaped. During boot, I see the following output:

Loading mptscsih.ko module
Fusion MPT SAS Host driver 3.04.13rh
Loading mptsas.ko module
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.11.5-ioctl (2007-12-12) initialised: dm-devel@redhat.com
Loading dm-mod.ko module
device-mapper: dm-raid45: initialized v0.2594l
Loading dm-log.ko module
Loading dm-mirror.ko module
Loading dm-zero.ko module
Loading dm-snapshot.ko module
Loading dm-mem-cache.ko module
Loading dm-region_hash.ko module
Loading dm-message.ko module
Loading dm-raid45.ko module
Waiting for driver initialization.
Scanning and configuring dmraid supported devices
RAID set "ddf1_4c5349202020202010000079100092634711471102c38907" was activated
device-mapper: table: device /dev/mapper/ddf1_4c5349202020202010000079100092634711471102c38907 too small for target
device-mapper: table: 253:1: linear: dm-linear: Device lookup failed
device-mapper: ioctl: error adding target to table
device-mapper: reload ioctl failed: Invalid argument
RAID set "ddf1_4c5349202020202010000079100092634711471118f8e4d0" was activated
device-mapper: table: device /dev/mapper/ddf1_4c5349202020202010000079100092634711471118f8e4d0 too small for target
device-mapper: table: 253:2: linear: dm-linear: Device lookup failed
device-mapper: ioctl: error adding target to table
<SNIP>

I see this stanza of errors for every device in that array except one, which is very odd. Further down the boot process I get this output as well:

raid5: failed to run raid set md0
mdadm: /dev/md0 assembled from 1 drive - not enough to start the array.
device-mapper: reload ioctl failed: Invalid argument
device-mapper: reload ioctl failed: Invalid argument
device-mapper: reload ioctl failed: Invalid argument
device-mapper: reload ioctl failed: Invalid argument
device-mapper: reload ioctl failed: Invalid argument
device-mapper: reload ioctl failed: Invalid argument
device-mapper: reload ioctl failed: Invalid argument
device-mapper: reload ioctl failed: Invalid argument
device-mapper: reload ioctl failed: Invalid argument
device-mapper: reload ioctl failed: Invalid argument
device-mapper: reload ioctl failed: Invalid argument
device-mapper: reload ioctl failed: Invalid argument
Setting up Logical Volume Management: [ OK ]
Checking filesystems
Checking all file systems.

Once the system has booted, my /dev/md0 device is hosed:

# mdadm --misc -D /dev/md0
mdadm: md device /dev/md0 does not appear to be active.
# mdadm --manage --run /dev/md0
mdadm: failed to run array /dev/md0: Input/output error
# mdadm --misc -D /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Wed Jan 25 18:37:09 2012
     Raid Level : raid5
  Used Dev Size : 292961216 (279.39 GiB 299.99 GB)
   Raid Devices : 14
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Jan 25 19:45:22 2012
          State : active, degraded, Not Started
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : cfa84985:544af1f9:fba81980:7651a084
         Events : 0.2

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       0        0        1      removed
       2       0        0        2      removed
       3       8       65        3      active sync   /dev/sde1
       4       0        0        4      removed
       5       0        0        5      removed
       6       0        0        6      removed
       7       0        0        7      removed
       8       0        0        8      removed
       9       0        0        9      removed
      10       0        0       10      removed
      11       0        0       11      removed
      12       0        0       12      removed
      13       0        0       13      removed

I've run the gamut of zeroing the disks, using dd to overwrite the disk headers, re-partitioning the drives, zeroing the superblocks, etc. At this point I can't online disks, replace disks, or start/stop the array; the only thing I can do is simply destroy/remove the raid device.
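In case it matters, the teardown between attempts looks roughly like this (a sketch, not a verbatim history; exact devices and byte counts varied from run to run):

# stop the broken array
mdadm --stop /dev/md0

# wipe the md superblock from every member partition
mdadm --zero-superblock /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdg1 /dev/sdh1 \
    /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 /dev/sdn1 /dev/sdo1 /dev/sdp1

# clobber the start of each disk (partition table and headers), then re-partition
for d in sdb sdc sdd sde sdg sdh sdi sdj sdk sdl sdm sdn sdo sdp; do
    dd if=/dev/zero of=/dev/$d bs=512 count=2048
done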
We can't for the life of us determine what is causing this, and we have an identically configured X4270 that is not behaving like this. If you drill down into the LVM guts, it is supposed to ignore raid devices. From lvm.conf:

# By default, LVM2 will ignore devices used as components of
# software RAID (md) devices by looking for md superblocks.
# 1 enables; 0 disables.
md_component_detection = 1

However, after reboot I have the following devices:

[ /dev/mapper]# ls -la
total 0
drwxr-xr-x  2 root root     300 Jan 25 19:58 .
drwxr-xr-x 11 root root    4940 Jan 25 20:03 ..
crw-------  1 root root  10, 63 Jan 25 19:58 control
brw-rw----  1 root disk 253,  0 Jan 25 19:58 ddf1_4c5349202020202010000079100092634711471102c38907
brw-rw----  1 root disk 253,  1 Jan 25 19:58 ddf1_4c5349202020202010000079100092634711471118f8e4d0
brw-rw----  1 root disk 253,  2 Jan 25 19:58 ddf1_4c534920202020201000007910009263471147111da173b8
brw-rw----  1 root disk 253,  3 Jan 25 19:58 ddf1_4c534920202020201000007910009263471147114762ab8e
brw-rw----  1 root disk 253,  4 Jan 25 19:58 ddf1_4c534920202020201000007910009263471147114ed6d9d6
brw-rw----  1 root disk 253,  5 Jan 25 19:58 ddf1_4c53492020202020100000791000926347114711afdf2d8d
brw-rw----  1 root disk 253,  6 Jan 25 19:58 ddf1_4c53492020202020100000791000926347114711b1a09a14
brw-rw----  1 root disk 253,  7 Jan 25 19:58 ddf1_4c53492020202020100000791000926347114711b505f25d
brw-rw----  1 root disk 253,  8 Jan 25 19:58 ddf1_4c53492020202020100000791000926347114711bd1d8b39
brw-rw----  1 root disk 253,  9 Jan 25 19:58 ddf1_4c53492020202020100000791000926347114711df3cfa83
brw-rw----  1 root disk 253, 10 Jan 25 19:58 ddf1_4c53492020202020100000791000926347114711fa780cfa
brw-rw----  1 root disk 253, 11 Jan 25 19:58 ddf1_4c53492020202020100000791000926347114711fb7550bd

pvdisplay, vgdisplay, and lvdisplay all report nothing.

This is a head scratcher and eludes several of us. My Linux is rather rusty, having been doing heavy Solaris work for the last 5 years, but I'm at a loss here. Thx for any info.
The problem is that you're using partition ID fd for your RAID partitions and you're loading device mapper at boot. Device mapper and md volumes are two very different things.
Seeing partitions with ID fd, the boot scripts go looking for RAID/LVM metadata on your volumes. Of course, you're not actually using device mapper/LVM, so this cannot possibly work. Unfortunately, it seems device mapper finds something it interprets as RAID metadata (those ddf1_* names in your boot log are dmraid "fake RAID"/DDF sets) and tries to set up device-mapper volumes on the same disks your md array needs.

Possible solutions: don't use partition ID fd, and/or don't load the device mapper kernel modules at boot (or at all).
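If that is what's happening, something along these lines should confirm it and clear the stale metadata. Treat it as a sketch: the device names are examples, and the erase step rewrites the fake-RAID metadata areas, so double-check before running it.

# show any fake-RAID (DDF, isw, etc.) metadata dmraid has discovered
dmraid -r

# deactivate the ddf1_* sets it has already set up
dmraid -an

# erase the stale DDF metadata from a member disk (prompts first); repeat per disk
dmraid -rE /dev/sdb

# change the partition type away from fd, e.g. to da (non-FS data), so nothing
# autodetects it; repeat for each member disk and rely on /etc/mdadm.conf instead
sfdisk --change-id /dev/sdb 1 da

# if your initscripts honor it, booting with the "nodmraid" kernel option
# (or removing the dmraid package) keeps the boot-time dmraid scan from running at all

Once the DDF metadata is gone, nothing should claim the disks before mdadm gets a chance to assemble /dev/md0 from its own superblocks.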