netfoot
11-22-2012 12:56 PM
Cannot get RAID set to start
After a reboot, my RAID set won't start. Here is /proc/mdstat:
Code:
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md0 : inactive sdc3[2] sdd3[3] sda3[0] sde3[4]
7781050112 blocks
unused devices: <none>
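If I read this correctly, "inactive" means the kernel gathered four of the five members at boot but refused to actually start the array. My understanding (from the mdadm man page, so please correct me if I am wrong) is that this leftover inactive device has to be stopped before any manual assembly attempt:
Code:
# release the half-assembled, inactive remnant so its member
# partitions can be used again in a manual assembly
mdadm --stop /dev/md0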
There are five drives in the array, but /dev/sdb3 is missing from the listing above. All five drives are partitioned identically:
Code:
# sfdisk -l /dev/sdb
Disk /dev/sdb: 243201 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
Device Boot Start End #cyls #blocks Id System
/dev/sdb1 0+ 30 31- 248976 82 Linux swap
/dev/sdb2 31 1026 996 8000370 83 Linux
/dev/sdb3 * 1027 243200 242174 1945262655 fd Linux raid autodetect
/dev/sdb4 0 - 0 0 0 Empty
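For completeness, the partition tables can be compared mechanically rather than by eye; sfdisk -d dumps each table in a diffable form, and the sed below (my own normalization, nothing standard) masks the device name so any real difference between the five drives stands out:
Code:
# dump each drive's partition table with the device name masked out,
# so the five dumps can be compared line by line
for d in a b c d e; do
    echo "== /dev/sd$d =="
    sfdisk -d /dev/sd$d | sed "s,/dev/sd$d,DISK,g"
done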
All swap partitions (including /dev/sdb1) appear to be functional, and I can mount, read, and write /dev/sdb2 without problems. Querying the member partitions gives me:
Code:
# mdadm --misc -Q /dev/sda3
/dev/sda3: is not an md array
/dev/sda3: device 0 in 5 device active raid5 /dev/md0. Use mdadm --examine for more detail.
# mdadm --misc -Q /dev/sdb3
/dev/sdb3: is not an md array
/dev/sdb3: device 1 in 5 device mismatch raid5 /dev/md0. Use mdadm --examine for more detail.
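The -Q output already hints at a mismatch on /dev/sdb3. A quick way to compare every member's superblock at once is to pull out just the distinguishing fields (as far as I understand, the event counter and update time are what mdadm uses to judge which superblocks are fresh):
Code:
# one line of context per member: the event counter and last-update
# stamp show which superblocks agree and which one is stale
mdadm --examine /dev/sd[abcde]3 | egrep '/dev/|Update Time|Events'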
Examining the drives in full gives conflicting results for /dev/sdb3 as opposed to the other four. Here, for example, is /dev/sda3, which shows four active/working drives and one failed:
Code:
# mdadm --misc --examine /dev/sda3
/dev/sda3:
Magic : a92b4efc
Version : 0.90.00
UUID : 4e77808f:197dbdcf:413393e8:3b8beff3
Creation Time : Sat May 22 18:38:19 2010
Raid Level : raid5
Used Dev Size : 1945262528 (1855.15 GiB 1991.95 GB)
Array Size : 7781050112 (7420.59 GiB 7967.80 GB)
Raid Devices : 5
Total Devices : 4
Preferred Minor : 0
Update Time : Wed Nov 21 17:16:58 2012
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 1
Spare Devices : 0
Checksum : 9eb7a564 - correct
Events : 4101971
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 3 0 active sync /dev/sda3
0 0 8 3 0 active sync /dev/sda3
1 1 0 0 1 faulty removed
2 2 8 35 2 active sync /dev/sdc3
3 3 8 51 3 active sync /dev/sdd3
4 4 8 67 4 active sync /dev/sde3
Note that the second drive (RaidDevice 1, which should be /dev/sdb3) is marked faulty/removed. Compare with what /dev/sdb3's own superblock says:
Code:
# mdadm --misc --examine /dev/sdb3
/dev/sdb3:
Magic : a92b4efc
Version : 0.90.00
UUID : 4e77808f:197dbdcf:413393e8:3b8beff3
Creation Time : Sat May 22 18:38:19 2010
Raid Level : raid5
Used Dev Size : 1945262528 (1855.15 GiB 1991.95 GB)
Array Size : 7781050112 (7420.59 GiB 7967.80 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Update Time : Mon Jul 23 00:56:48 2012
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : 9e27a008 - correct
Events : 2588290
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 19 1 active sync /dev/sdb3
0 0 8 3 0 active sync /dev/sda3
1 1 8 19 1 active sync /dev/sdb3
2 2 8 35 2 active sync /dev/sdc3
3 3 8 51 3 active sync /dev/sdd3
4 4 8 67 4 active sync /dev/sde3
Note five active/working drives with none failed and all listed as active sync, but also that its Update Time (July, versus November on the others) and Events count (2588290 versus 4101971) are far behind, as if the superblock on /dev/sdb3 stopped being updated months ago. syslog confirms it is being rejected as stale:
Code:
Nov 21 19:54:16 triphod kernel: md: kicking non-fresh sdb3 from array!
Nov 21 19:54:16 triphod kernel: 2: w=1 pa=0 pr=5 m=1 a=2 r=5 op1=0 op2=0
Nov 21 19:54:16 triphod kernel: 3: w=2 pa=0 pr=5 m=1 a=2 r=5 op1=0 op2=0
Nov 21 19:54:16 triphod kernel: 0: w=3 pa=0 pr=5 m=1 a=2 r=5 op1=0 op2=0
Nov 21 19:54:16 triphod kernel: 4: w=4 pa=0 pr=5 m=1 a=2 r=5 op1=0 op2=0
Nov 21 19:54:16 triphod kernel: raid5: cannot start dirty degraded array for md0
Nov 21 19:54:16 triphod kernel: RAID5 conf printout:
Nov 21 19:54:16 triphod kernel: --- rd:5 wd:4
Nov 21 19:54:16 triphod kernel: disk 0, o:1, dev:sda3
Nov 21 19:54:16 triphod kernel: disk 2, o:1, dev:sdc3
Nov 21 19:54:16 triphod kernel: disk 3, o:1, dev:sdd3
Nov 21 19:54:16 triphod kernel: disk 4, o:1, dev:sde3
Nov 21 19:54:16 triphod kernel: raid5: failed to run raid set md0
Nov 21 19:54:16 triphod kernel: md: pers->run() failed ...
Nov 21 19:54:16 triphod kernel: md: do_md_run() returned -5
Nov 21 19:54:16 triphod kernel: md: md0 still in use.
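Searching on the "cannot start dirty degraded array" message turns up a kernel module parameter that force-starts such an array at boot. I have not dared to try it, since parity may be inconsistent on a dirty array, but for reference (this is from the kernel's Documentation/md.txt, not something I have tested):
Code:
# appended to the kernel command line; tells md to start an array that is
# both dirty (unclean shutdown) and degraded (missing a member) anyway
md-mod.start_dirty_degraded=1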
I am at a loss as to what went wrong here and, more importantly, how to recover. Any constructive suggestions would be greatly appreciated!
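For what it is worth, the plan I have pieced together from the mdadm man page and similar threads is below. Before I run it, I would be grateful for a sanity check, since as I understand it --force rewrites event counters and there is no undo:
Code:
# 1. release the inactive remnant
mdadm --stop /dev/md0
# 2. force-assemble from the four fresh members, leaving out the stale sdb3
mdadm --assemble --force /dev/md0 /dev/sda3 /dev/sdc3 /dev/sdd3 /dev/sde3
# 3. if md0 comes up (degraded), add sdb3 back so it rebuilds from parity
mdadm --add /dev/md0 /dev/sdb3
# 4. watch the resync progress
cat /proc/mdstat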