LinuxQuestions.org


kirby9 08-31-2010 10:31 AM

RAID 5 array not assembling all 3 devices on boot using MDADM, one is degraded.
 
I have been having this problem for the past couple days and have done my best to solve it, but to no avail.

I am using mdadm, which I'm not very experienced with, to run a RAID 5 array on three whole disks (/dev/sda, /dev/sdc, /dev/sdd). For some reason not all three drives are assembled at boot. I can add the missing drive later without any problems, but the resync takes hours.

Here is some information:

Code:

sudo mdadm -D /dev/md0

/dev/md0:
        Version : 0.90
  Creation Time : Sun Apr 26 21:31:31 2009
    Raid Level : raid5
    Array Size : 1953524992 (1863.03 GiB 2000.41 GB)
  Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
  Raid Devices : 3
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Aug 31 08:17:07 2010
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

        Layout : left-symmetric
    Chunk Size : 64K

          UUID : b200b6bf:b812907b:c3510a1f:f4a0fa6e
        Events : 0.11604

    Number  Major  Minor  RaidDevice State
      0      0        0        0      removed
      1      8      32        1      active sync  /dev/sdc
      2      8      48        2      active sync  /dev/sdd

As you can see, it is clean but degraded.
After running:
sudo mdadm --add /dev/md0 /dev/sda
the array rebuilds and ends up clean with all three drives active. The problem is that after a reboot /dev/sda is missing again and I have to repeat the whole process.
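
A quick way to confirm after each reboot whether the array came up short is to compare the "Raid Devices" and "Active Devices" counts in the mdadm -D report. The sketch below is only an illustration: the function name md_missing is made up here (it is not part of mdadm), and it reads the report on standard input so it can also be tried on saved output.

```shell
# md_missing: hypothetical helper, not part of mdadm.
# Reads `mdadm -D` output on stdin and prints how many member
# devices are missing (Raid Devices minus Active Devices).
md_missing() {
    awk -F' : ' '
        /^ *Raid Devices/   { raid = $2 }
        /^ *Active Devices/ { active = $2 }
        END { print raid - active }
    '
}

# Usage (needs root and a real array):
#   sudo mdadm -D /dev/md0 | md_missing
```

With the report shown above (3 raid devices, 2 active) this would print 1.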

Here is my /etc/mdadm/mdadm.conf
Code:

# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
#DEVICE partitions
DEVICE /dev/sda /dev/sdc /dev/sdd
#ARRAY  /dev/md0 devices=/dev/sda,/dev/sdc,/dev/sdd
ARRAY /dev/md0 UUID=b200b6bf:b812907b:c3510a1f:f4a0fa6e

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

Also, there is another config file at /etc/default/mdadm, and I noticed that /etc/init.d/mdadm-raid has a variable called DEBIANCONFIG that points to it. I tried changing this "default" config to "my" config, but that didn't work either.

Anyway, here is that config file just in case
/etc/default/mdadm

Code:

# mdadm Debian configuration
#
# You can run 'dpkg-reconfigure mdadm' to modify the values in this file, if
# you want. You can also change the values here and changes will be preserved.
# Do note that only the values are preserved; the rest of the file is
# rewritten.
#

# INITRDSTART:
#  list of arrays (or 'all') to start automatically when the initial ramdisk
#  loads. This list *must* include the array holding your root filesystem. Use
#  'none' to prevent any array from being started from the initial ramdisk.
INITRDSTART='none'

# AUTOSTART:
#  should mdadm start arrays listed in /etc/mdadm/mdadm.conf automatically
#  during boot?
AUTOSTART=true

# AUTOCHECK:
#  should mdadm run periodic redundancy checks over your arrays? See
#  /etc/cron.d/mdadm.
AUTOCHECK=true

# START_DAEMON:
#  should mdadm start the MD monitoring daemon during boot?
START_DAEMON=true

# DAEMON_OPTIONS:
#  additional options to pass to the daemon.
DAEMON_OPTIONS="--syslog"

# VERBOSE:
#  if this variable is set to true, mdadm will be a little more verbose e.g.
#  when creating the initramfs.
VERBOSE=false

# MAIL_TO:
#  this variable is now managed in /etc/mdadm/mdadm.conf (MAILADDR).
#  Please see mdadm.conf(5).

And finally, the most important part, the dmesg. Hopefully you guys can make sense of it.

DMESG(Abridged, kind of)
Code:

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.32-5-amd64 (Debian 2.6.32-20) (ben@decadent.org.uk) (gcc version 4.3.5 (Debian 4.3.5-2) ) #1 SMP Thu Aug 12 13:01:50 UTC 2010
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-2.6.32-5-amd64 root=UUID=b502040c-bf19-4798-90d3-37f29da58fd5 ro quiet
[    1.509072] scsi 1:0:1:0: Direct-Access    ATA      WDC WD10EADS-00L 01.0 PQ: 0 ANSI: 5
[    1.509173] scsi 2:0:0:0: Direct-Access    ATA      WDC WD10EADS-00L 01.0 PQ: 0 ANSI: 5
[    1.513427] sd 0:0:1:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[    1.513462] sd 0:0:1:0: [sda] Write Protect is off
[    1.513464] sd 0:0:1:0: [sda] Mode Sense: 00 3a 00 00
[    1.513478] sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    1.513565]  sda:
[    1.515074] sd 1:0:0:0: [sdb] 488397168 512-byte logical blocks: (250 GB/232 GiB)
[    1.515108] sd 1:0:0:0: [sdb] Write Protect is off
[    1.515110] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    1.515124] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    1.515203]  sdb:
[    1.516765] sd 2:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[    1.516798] sd 2:0:0:0: [sdd] Write Protect is off
[    1.516800] sd 2:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[    1.516815] sd 2:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    1.516894]  sdd: sdb1 sdb2 sdb3 sdb4 < sdb5
[    1.555933] sd 1:0:1:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[    1.574039]  sdb6 >
[    1.574086] sd 1:0:1:0: [sdc] Write Protect is off
[    1.574089] sd 1:0:1:0: [sdc] Mode Sense: 00 3a 00 00
[    1.574111] sd 1:0:1:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    1.574264]  sdc: unknown partition table
[    1.582085] sd 1:0:0:0: [sdb] Attached SCSI disk
[    1.582194] sd 1:0:1:0: [sdc] Attached SCSI disk
[    1.601944] usb 2-6.1: New USB device found, idVendor=0510, idProduct=0033
[    1.601947] usb 2-6.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    1.601949] usb 2-6.1: Product: Joint Keyboard
[    1.601951] usb 2-6.1: Manufacturer: SEJIN
[    1.602014] usb 2-6.1: configuration #1 chosen from 1 choice
[    1.607974] usbcore: registered new interface driver hiddev
[    1.622454] input: BTC USB Multimedia Keyboard as /devices/pci0000:00/0000:00:1a.2/usb5/5-2/5-2:1.0/input/input1
[    1.622484] generic-usb 0003:046D:C312.0001: input,hidraw0: USB HID v1.10 Keyboard [BTC USB Multimedia Keyboard] on usb-0000:00:1a.2-2/input0
[    1.666284] input: BTC USB Multimedia Keyboard as /devices/pci0000:00/0000:00:1a.2/usb5/5-2/5-2:1.1/input/input2
[    1.666351] generic-usb 0003:046D:C312.0002: input,hiddev0,hidraw1: USB HID v1.10 Device [BTC USB Multimedia Keyboard] on usb-0000:00:1a.2-2/input1
[    1.669265] input: SEJIN Joint Keyboard as /devices/pci0000:00/0000:00:1d.7/usb2/2-6/2-6.1/2-6.1:1.0/input/input3
[    1.669299] generic-usb 0003:0510:0033.0003: input,hidraw2: USB HID v1.10 Keyboard [SEJIN Joint Keyboard] on usb-0000:00:1d.7-6.1/input0
[    1.669312] usbcore: registered new interface driver usbhid
[    1.669314] usbhid: v2.6:USB HID core driver
[    1.964137] usb 2-6.2: new low speed USB device using ehci_hcd and address 4
[    1.980748]  sdd1 sdd3
[    1.980752] sdd: p1 size 2113799111 exceeds device capacity, enabling native capacity
[    1.980866]  sdd: sdd1 sdd3
[    1.981039] sdd: p1 size 2113799111 exceeds device capacity, limited to end of disk
[    1.981075] sdd: p3 ignored, start 2147615687 is behind the end of the disk
[    1.981338] sd 2:0:0:0: [sdd] Attached SCSI disk
[    2.016087]
[    2.016260] sd 0:0:1:0: [sda] Attached SCSI disk
[    2.021878] sr0: scsi3-mmc drive: 48x/48x writer dvd-ram cd/rw xa/form2 cdda tray
[    2.021881] Uniform CD-ROM driver Revision: 3.20
[    2.021962] sr 0:0:0:0: Attached scsi CD-ROM sr0
[    2.024568] sr 0:0:0:0: Attached scsi generic sg0 type 5
[    2.024588] sd 0:0:1:0: Attached scsi generic sg1 type 0
[    2.024610] sd 1:0:0:0: Attached scsi generic sg2 type 0
[    2.024630] sd 1:0:1:0: Attached scsi generic sg3 type 0
[    2.024929] sd 2:0:0:0: Attached scsi generic sg4 type 0
[    2.078619] usb 2-6.2: New USB device found, idVendor=1a7c, idProduct=0068
[    2.078622] usb 2-6.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    2.078624] usb 2-6.2: Product: Evoluent VerticalMouse 3
[    2.078627] usb 2-6.2: Manufacturer: Kingsis Peripherals
[    2.078694] usb 2-6.2: configuration #1 chosen from 1 choice
[    2.082809] input: Kingsis Peripherals  Evoluent VerticalMouse 3  as /devices/pci0000:00/0000:00:1d.7/usb2/2-6/2-6.2/2-6.2:1.0/input/input4
[    2.082849] generic-usb 0003:1A7C:0068.0004: input,hidraw3: USB HID v1.10 Mouse [Kingsis Peripherals  Evoluent VerticalMouse 3 ] on usb-0000:00:1d.7-6.2/input0
[    2.200719] async_tx: api initialized (async)
[    2.201031] xor: automatically using best checksumming function: generic_sse
[    2.220002]    generic_sse: 11501.000 MB/sec
[    2.220003] xor: using function: generic_sse (11501.000 MB/sec)
[    2.288015] raid6: int64x1  2503 MB/s
[    2.356007] raid6: int64x2  3350 MB/s
[    2.424023] raid6: int64x4  2652 MB/s
[    2.492009] raid6: int64x8  2208 MB/s
[    2.560002] raid6: sse2x1    5357 MB/s
[    2.628003] raid6: sse2x2    5740 MB/s
[    2.696004] raid6: sse2x4    8908 MB/s
[    2.696006] raid6: using algorithm sse2x4 (8908 MB/s)
[    2.701595] md: raid6 personality registered for level 6
[    2.701596] md: raid5 personality registered for level 5
[    2.701598] md: raid4 personality registered for level 4
[    2.704349] md: md0 stopped.
[    2.705248] md: bind<sdd>
[    2.705388] md: bind<sdc>
[    2.706242] raid5: device sdc operational as raid disk 1
[    2.706243] raid5: device sdd operational as raid disk 2
[    2.706470] raid5: allocated 3230kB for md0
[    2.706535] 1: w=1 pa=0 pr=3 m=1 a=2 r=3 op1=0 op2=0
[    2.706537] 2: w=2 pa=0 pr=3 m=1 a=2 r=3 op1=0 op2=0
[    2.706539] raid5: raid level 5 set md0 active with 2 out of 3 devices, algorithm 2
[    2.706579] RAID5 conf printout:
[    2.706580]  --- rd:3 wd:2
[    2.706582]  disk 1, o:1, dev:sdc
[    2.706583]  disk 2, o:1, dev:sdd
[    2.706597] md0: detected capacity change from 0 to 2000409591808
[    2.707526]  md0:
[  12.763502] PM: Starting manual resume from disk
[  12.763505] PM: Resume from partition 8:22
[  12.763506] PM: Checking hibernation image.
[  12.789243] PM: Error -22 checking image file
[  12.789245] PM: Resume from disk failed.
[  12.838019] EXT4-fs (sdb3): mounted filesystem with ordered data mode
[  14.405517] udev: starting version 160
[  14.615961] processor LNXCPU:00: registered as cooling_device0
[  14.615993] processor LNXCPU:01: registered as cooling_device1
[  14.629786] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input5
[  14.629793] ACPI: Power Button [PWRB]
[  14.629829] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input6
[  14.629831] ACPI: Power Button [PWRF]
[  15.043297] input: PC Speaker as /devices/platform/pcspkr/input/input7
[  15.232865] i801_smbus 0000:00:1f.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
[  15.232869] ACPI: I/O resource 0000:00:1f.3 [0x400-0x41f] conflicts with ACPI region SMRG [0x400-0x40f]
[  15.232911] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[  15.972461] nvidia: module license 'NVIDIA' taints kernel.
[  15.972464] Disabling lock debugging due to kernel taint
[  16.454842] nvidia 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[  16.454850] nvidia 0000:01:00.0: setting latency timer to 64
[  16.454854] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=none:owns=io+mem
[  16.455050] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  256.44  Thu Jul 29 01:22:44 PDT 2010
[  16.524776] cfg80211: Using static regulatory domain info
[  16.524778] cfg80211: Regulatory domain: US
[  16.524779]        (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[  16.524781]        (2402000 KHz - 2472000 KHz @ 40000 KHz), (600 mBi, 2700 mBm)
[  16.524783]        (5170000 KHz - 5190000 KHz @ 40000 KHz), (600 mBi, 2300 mBm)
[  16.524785]        (5190000 KHz - 5210000 KHz @ 40000 KHz), (600 mBi, 2300 mBm)
[  16.524786]        (5210000 KHz - 5230000 KHz @ 40000 KHz), (600 mBi, 2300 mBm)
[  16.524788]        (5230000 KHz - 5330000 KHz @ 40000 KHz), (600 mBi, 2300 mBm)
[  16.524790]        (5735000 KHz - 5835000 KHz @ 40000 KHz), (600 mBi, 3000 mBm)
[  16.524968] cfg80211: Calling CRDA for country: US
[  16.997038] ath9k 0000:04:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[  17.421577] ath: EEPROM regdomain: 0x10
[  17.421578] ath: EEPROM indicates we should expect a direct regpair map
[  17.421580] ath: Country alpha2 being used: CO
[  17.421581] ath: Regpair used: 0x10
[  17.459559]  alloc irq_desc for 22 on node -1
[  17.459561]  alloc kstat_irqs on node -1
[  17.459567] HDA Intel 0000:00:1b.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[  17.459593] HDA Intel 0000:00:1b.0: setting latency timer to 64
[  17.509702] phy0: Selected rate control algorithm 'ath9k_rate_control'
[  17.510141] Registered led device: ath9k-phy0::radio
[  17.510152] Registered led device: ath9k-phy0::assoc
[  17.510162] Registered led device: ath9k-phy0::tx
[  17.510172] Registered led device: ath9k-phy0::rx
[  17.510177] phy0: Atheros AR5416 MAC/BB Rev:2 AR2133 RF Rev:81: mem=0xffffc90011240000, irq=16
[  17.510428] cfg80211: Calling CRDA for country: CO
[  17.538799] input: HDA Digital PCBeep as /devices/pci0000:00/0000:00:1b.0/input/input8
[  18.086952] Adding 4072436k swap on /dev/sdb6.  Priority:-1 extents:1 across:4072436k
[  18.601728] loop: module loaded
[  19.289442] EXT4-fs (sdb2): mounted filesystem with ordered data mode
[  19.354586] EXT4-fs (sdb5): mounted filesystem with ordered data mode
[  20.096078] fuse init (API version 7.13)
[  20.870082]  alloc irq_desc for 27 on node -1
[  20.870084]  alloc kstat_irqs on node -1
[  20.870097] ATL1E 0000:02:00.0: irq 27 for MSI/MSI-X
[  20.870644] ADDRCONF(NETDEV_UP): eth0: link is not ready
[  24.752827] Bluetooth: Core ver 2.15
[  24.752859] NET: Registered protocol family 31
[  24.752860] Bluetooth: HCI device and connection manager initialized
[  24.752862] Bluetooth: HCI socket layer initialized
[  24.838626] Bluetooth: L2CAP ver 2.14
[  24.838628] Bluetooth: L2CAP socket layer initialized
[  24.872113] Bluetooth: RFCOMM TTY layer initialized
[  24.872115] Bluetooth: RFCOMM socket layer initialized
[  24.872116] Bluetooth: RFCOMM ver 1.11
[  25.042298] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[  25.042300] Bluetooth: BNEP filters: protocol multicast
[  25.071808] Bridge firewalling registered
[  25.444831] Bluetooth: SCO (Voice Link) ver 0.6
[  25.444833] Bluetooth: SCO socket layer initialized
[  27.297367] lp: driver loaded but no devices found
[  27.436221] ppdev: user-space parallel port driver
[  28.308389] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[  35.072213] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[  35.154711] ATL1E 0000:02:00.0: irq 27 for MSI/MSI-X
[  35.155236] ADDRCONF(NETDEV_UP): eth0: link is not ready
[  35.285216] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[  38.164019] ath9k: Two wiphys trying to scan at the same time
[  38.349934] wlan0: direct probe to AP 00:1a:70:da:b8:c1 (try 1)
[  38.352531] wlan0: direct probe responded
[  38.352534] wlan0: authenticate with AP 00:1a:70:da:b8:c1 (try 1)
[  38.354585] wlan0: authenticated
[  38.354596] wlan0: associate with AP 00:1a:70:da:b8:c1 (try 1)
[  38.358035] wlan0: RX AssocResp from 00:1a:70:da:b8:c1 (capab=0x411 status=0 aid=6)
[  38.358037] wlan0: associated
[  38.358559] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[  49.304005] wlan0: no IPv6 routers present
[  51.768829] CPU0 attaching NULL sched-domain.
[  51.768832] CPU1 attaching NULL sched-domain.
[  51.792070] CPU0 attaching sched-domain:
[  51.792073]  domain 0: span 0-1 level MC
[  51.792075]  groups: 0 1
[  51.792080] CPU1 attaching sched-domain:
[  51.792082]  domain 0: span 0-1 level MC
[  51.792084]  groups: 1 0
[  92.788035] lo: Disabled Privacy Extensions

all help is much appreciated
Eric

mostlyharmless 08-31-2010 03:10 PM

Just some ideas, not sure how helpful:

Could you post the output of (sudo) fdisk -l?

You're using whole device raid, rather than partitions, which is great, but I would've thought that the partitions would therefore look identical on sda, sdc and sdd. Yet the output from dmesg doesn't appear to reflect that.

What is the physical hardware of the 4 disks; I see only the first two SATA disks...

kirby9 08-31-2010 03:51 PM

There are four disks, as follows:
sdb - 250 GB Caviar Green
sda, sdc, sdd - 1000 GB Caviar Green

Here is
fdisk -l

Code:

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x4e1e5bb1

  Device Boot      Start        End      Blocks  Id  System

Disk /dev/sdb: 250.1 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000f1697

  Device Boot      Start        End      Blocks  Id  System
/dev/sdb1              1      10199    81923436    7  HPFS/NTFS
/dev/sdb2  *      10200      10442    1951897+  83  Linux
/dev/sdb3          10443      15305    39062047+  83  Linux
/dev/sdb4          15306      30401  121258620    5  Extended
/dev/sdb5          15306      29894  117186111  83  Linux
/dev/sdb6          29895      30401    4072446  82  Linux swap / Solaris

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x4e1e5bb1

  Device Boot      Start        End      Blocks  Id  System
/dev/sdd1              9      131587  1056899555+  c7  Syrinx
Partition 1 does not end on cylinder boundary.
/dev/sdd2              1          1          0    4  FAT16 <32M
Partition 2 does not end on cylinder boundary.
/dev/sdd3          133683      265261  1056899555+  c7  Syrinx
Partition 3 does not end on cylinder boundary.
/dev/sdd4              1          1          0    4  FAT16 <32M
Partition 4 does not end on cylinder boundary.

Partition table entries are not in disk order

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sdc doesn't contain a valid partition table

Disk /dev/md0: 2000.4 GB, 2000409591808 bytes
2 heads, 4 sectors/track, 488381248 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 131072 bytes
Disk identifier: 0x4e1e5bb1

    Device Boot      Start        End      Blocks  Id  System

Hope this gives you some ideas.

purevw 09-01-2010 02:33 AM

Have you tried changing your config file so that mdadm does not start during boot? Western Digital makes it very clear that their "normal" drives do not support RAID. They want you to spend considerably more money on their "RAID Edition" drives. The problem is that some of the drives have a lag time, and RAID hardware engines, and possibly mdadm, see this as a drive failure.
It may be possible to start mdadm manually after your system is up and the drives have had a few minutes to stabilize. Also, it may be helpful to use the "verbose" option so that more detailed info is presented. It could help with troubleshooting.

mostlyharmless 09-01-2010 09:06 AM

Putting a delay such as "sleep 20" in the startup file that starts the RAID would address the "drive not ready" issue mentioned above; it might be worth trying... I have a delay in rc.local; not sure if that's the same in Debian. I think you'll need to put it in /etc/init.d/mdadm-raid.
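
Instead of a fixed sleep, the startup script could poll until the member disks exist and only then assemble. This is a rough sketch, not something taken from the Debian init scripts, and it assumes the problem really is a disk showing up late: wait_for_devices is a hypothetical helper, and the 20-second limit in the usage line just mirrors the delay suggested above.

```shell
# wait_for_devices: hypothetical helper.
# Usage: wait_for_devices TIMEOUT_SECONDS PATH...
# Returns 0 once every PATH exists, or 1 if TIMEOUT_SECONDS pass first.
wait_for_devices() {
    timeout=$1
    shift
    waited=0
    while :; do
        missing=0
        for dev in "$@"; do
            # -e also matches regular files; use -b to insist on block devices
            [ -e "$dev" ] || missing=1
        done
        [ "$missing" -eq 0 ] && return 0
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}

# Usage (as root, in the script that starts the array):
#   wait_for_devices 20 /dev/sda /dev/sdc /dev/sdd && mdadm --assemble --scan
```

Polling returns as soon as the disks are present, so a healthy boot is not slowed down by the full delay.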

If your RAID is started by your initrd, which is likely only if you boot off it, then you need to remake your initrd. Look at
http://wiki.xtronics.com/index.php/Raid , especially at the sections labeled "Regenerate initrd" and immediately below, as well as the end section in "Notes from others"
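
On Debian the initial ramdisk normally carries its own copy of /etc/mdadm/mdadm.conf, so an edit to that file only reaches early boot once the initramfs has been rebuilt. A minimal sketch of the two steps follows. The run wrapper and the function name rebuild_initrd are made up for illustration, and by default the commands are only printed; set DRY_RUN=0 and run as root to execute them.

```shell
# Dry-run wrapper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# rebuild_initrd: hypothetical name for the two steps.
rebuild_initrd() {
    # Show the ARRAY line mdadm derives from the running array, to compare
    # by eye with the one in /etc/mdadm/mdadm.conf
    run mdadm --detail --scan
    # Update the existing initramfs so it picks up the current mdadm.conf
    run update-initramfs -u
}

rebuild_initrd
```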

If all else fails, back up your data and recreate the RAID from scratch, carefully taking note of the commands used. I still think it's weird that your three RAID disks don't have the same partition table setup, but since sdd is the outlier, not sda, I can't see how that's the problem.

kirby9 09-01-2010 12:28 PM

After removing mdadm from booting, still problems
 
Very strange???

I removed mdadm and mdadm-raid from running at boot by using
Code:

update-rc.d -f mdadm remove
update-rc.d -f mdadm-raid remove
checking
Code:

ls -l /etc/rc?.d/*mdadm*
finds nothing, so mdadm shouldn't start, right?

Here is the dmesg again, same results (algorithm 2, etc)
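
One way to tell whether the array is being put together before the init scripts even run is to compare kernel timestamps. If the first "md: bind" line comes before the root filesystem is mounted, the assembly is happening in the initial ramdisk, and removing the init scripts would not be expected to change it. Below is a small sketch that reads dmesg output on standard input. The function name is made up, and it assumes the first "mounted filesystem" line reported is the root filesystem.

```shell
# md_before_root: hypothetical helper.
# Reads dmesg output on stdin; prints the timestamp of the first
# "md: bind" line and of the first "mounted filesystem" line, then
# says which came first. Assumes the first mount reported is the root fs.
md_before_root() {
    awk '
        # Strip the "[  12.345678] " prefix and keep the number in ts
        {
            ts = $0
            sub(/^\[ */, "", ts)
            sub(/\].*/, "", ts)
        }
        /md: bind</ && md == ""            { md = ts }
        /mounted filesystem/ && root == "" { root = ts }
        END {
            if (md == "" || root == "") { print "not enough data"; exit }
            print "md bind at " md ", first mount at " root
            if (md + 0 < root + 0) print "assembled before root mount"
            else                   print "assembled after root mount"
        }
    '
}

# Usage:
#   dmesg | md_before_root
```

In the dmesg from the first post, the bind lines are at about 2.7 s and the first EXT4 mount at about 12.8 s.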

kirby9 09-01-2010 12:49 PM

I'm starting to get frustrated, and since it seems that my drives are messed up anyhow (partition table, etc), I'm thinking about just recreating the array.

As we speak I am copying off the data from the array to an external drive.

Just one question: is it bad to do RAID on whole disks?
Should I instead use partitions? (ext3?)

Any advice is always appreciated.

Eric

markie83 09-01-2010 03:15 PM

for what its worth.....
 
I am fairly green with RAID, but I set up our server with mdadm with two RAID 1 arrays. I have a partition on each disk (type fd, "Linux raid autodetect"). After the arrays were constructed, I formatted the array /dev/md0 with mkfs.ext3.

This video helped me a bunch, even though I am running Debian.

http://www.youtube.com/watch?v=CoeRiwOS76M

mostlyharmless 09-01-2010 03:17 PM

I prefer partition tables, even if the whole drive is one partition. You don't make a filesystem (ext3 or anything else) on the partitions; you just make the partition (e.g. with fdisk), then use mdadm to make the array from the partitions
(e.g. mdadm -C (other options) /dev/sda1 /dev/sdc1 etc.; n.b. the numerals), then make the filesystem on the array
(e.g. mkfs -t ext3 /dev/md0 or something like that).
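
Spelled out for the three 1 TB disks in this thread, that sequence might look like the following. The partition names /dev/sda1, /dev/sdc1 and /dev/sdd1 are assumed to have been created with fdisk beforehand, and make_raid5 and run are made-up names. The wrapper only prints the commands unless DRY_RUN=0 is set, since creating the array and the filesystem overwrites what is on the disks.

```shell
# Dry-run wrapper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# make_raid5: hypothetical name for the two steps described above.
make_raid5() {
    # Build a 3-disk RAID 5 from one partition per disk (partitions assumed to exist)
    run mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sda1 /dev/sdc1 /dev/sdd1
    # Put the filesystem on the array, not on the member partitions
    run mkfs -t ext3 /dev/md0
}

make_raid5
```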

Not to say there's anything wrong with using the whole device, just my preference.

makyo 09-01-2010 05:13 PM

Hi.

I ran into a timing, "settling" issue with 4 SCSI disks in a RAID10. I used this advice:
Quote:

Boot

Boot with rootdelay=9 to shorten the time of waiting for the root device to come up. Also it adds time to scsi device to settle before calling mdadm or lvm thus excluding potential races.

-- excerpt from http://wiki.debian.org/InitramfsDebug
and it seemed to work for me ( about a year ago, based on notes in my log, memory fuzzy beyond that :) )
Good luck ... cheers, makyo

kirby9 09-02-2010 11:42 AM

Hooray!!

After a bunch of crazy headaches, it seems to work. There was this crazy hiccup that I ran into where fdisk was claiming that some of my identical drives were not identical (different number of sectors). After reformatting and restarting that problem magically disappeared.

Thanks for all your help.
Eric

voinageo 11-20-2010 10:32 AM

I was having the same problem with my 3-disk RAID 5 setup after upgrading to Fedora 14 x64. My RAID was always starting with 2 drives active and one removed. It was also sometimes unable to start at all, when 2 drives were not active.
It seems that the system boots too fast and at least one of the drives does not have time to settle, so mdadm sees it as not ready/failed and removes it from the array.

I added rootdelay=9 to my boot line and now everything works like a charm.

title Fedora (2.6.35.6-48.fc14.x86_64)
root (hd0,1)
kernel /vmlinuz-2.6.35.6-48.fc14.x86_64 ro root=/dev/mapper/VolGroup00-LogVol00 rd_LVM_LV=VolGroup00/LogVol00 rd_LVM_LV=VolGroup00/LogVol01 rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us rhgb quiet rootdelay=9
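
For anyone scripting the same change, here is a rough sketch of a filter that appends rootdelay=9 to every kernel line of a GRUB legacy config that does not already carry a rootdelay setting. The function name is made up, and it only transforms text on standard input. The path /boot/grub/grub.conf in the usage comment is an assumption for Fedora 14.

```shell
# add_rootdelay: hypothetical helper.
# Reads a GRUB legacy config on stdin and appends " rootdelay=9" to each
# "kernel" line that has no rootdelay= option yet. Other lines pass through.
add_rootdelay() {
    awk '
        /^[ \t]*kernel[ \t]/ && !/rootdelay=/ { $0 = $0 " rootdelay=9" }
        { print }
    '
}

# Usage (keep a backup; path assumed):
#   cp /boot/grub/grub.conf /boot/grub/grub.conf.bak
#   add_rootdelay < /boot/grub/grub.conf.bak > /boot/grub/grub.conf
```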

Thank you all for the solution.


All times are GMT -5.