Hi,
So I have a RAID6 array of 22 drives. A drive went bad a couple of days ago, and then this morning, when I was about to remove that drive, another one failed. In my system it's very hard to tell which physical drive is the bad one (design flaw), so I removed the one I thought it was. It was the wrong one, so I put it back in (yes, I know, that was dumb...). At that point the array was showing up as failed and the drive was showing up as a spare. I rebooted the system in hopes that it would properly reassemble the array. Each time, it would simply start up the array, but it wouldn't run because it was missing a drive. The missing drive was /dev/sdc, which I could re-add with mdadm --add /dev/md0 /dev/sdc, but whenever I did that it always came back as a spare.
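Roughly the cycle I went through on each boot looked like this (reconstructed from memory, so the exact invocations may be slightly off):
Code:
mdadm --stop /dev/md0              # stop the partially assembled array
mdadm --assemble /dev/md0          # assembles, but won't run: one member short
mdadm --add /dev/md0 /dev/sdc      # sdc always comes back as a spare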
I noticed that the event count on /dev/sdc was off, and that sdc's superblock showed a total of 23 drives (it was the spare), while the other drives had newer superblocks showing 19 of 22 drives. Every time I tried to assemble, I got this:
Code:
gigantor:~# mdadm --assemble --force --scan --verbose
mdadm: looking for devices for /dev/md0
mdadm: cannot open device /dev/sdu: Device or resource busy
mdadm: /dev/sdu has wrong uuid.
mdadm: /dev/sdt is identified as a member of /dev/md0, slot 21.
mdadm: /dev/sds is identified as a member of /dev/md0, slot 20.
mdadm: /dev/sdr is identified as a member of /dev/md0, slot 18.
mdadm: /dev/sdq is identified as a member of /dev/md0, slot 17.
mdadm: /dev/sdp is identified as a member of /dev/md0, slot 16.
mdadm: /dev/sdo is identified as a member of /dev/md0, slot 13.
mdadm: /dev/sdn is identified as a member of /dev/md0, slot 12.
mdadm: /dev/sdm is identified as a member of /dev/md0, slot 10.
mdadm: /dev/sdl is identified as a member of /dev/md0, slot 9.
mdadm: /dev/sdk is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sdj is identified as a member of /dev/md0, slot 15.
mdadm: /dev/sdi is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdh is identified as a member of /dev/md0, slot 6.
mdadm: /dev/sdg is identified as a member of /dev/md0, slot 19.
mdadm: /dev/sdf is identified as a member of /dev/md0, slot 7.
mdadm: /dev/sde is identified as a member of /dev/md0, slot 8.
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 14.
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 22.
mdadm: /dev/sdb is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sda is identified as a member of /dev/md0, slot 0.
mdadm: added /dev/sdb to /dev/md0 as 1
mdadm: no uptodate device for slot 2 of /dev/md0
mdadm: no uptodate device for slot 3 of /dev/md0
mdadm: added /dev/sdk to /dev/md0 as 4
mdadm: added /dev/sdi to /dev/md0 as 5
mdadm: added /dev/sdh to /dev/md0 as 6
mdadm: added /dev/sdf to /dev/md0 as 7
mdadm: added /dev/sde to /dev/md0 as 8
mdadm: added /dev/sdl to /dev/md0 as 9
mdadm: added /dev/sdm to /dev/md0 as 10
mdadm: no uptodate device for slot 11 of /dev/md0
mdadm: added /dev/sdn to /dev/md0 as 12
mdadm: added /dev/sdo to /dev/md0 as 13
mdadm: added /dev/sdd to /dev/md0 as 14
mdadm: added /dev/sdj to /dev/md0 as 15
mdadm: added /dev/sdp to /dev/md0 as 16
mdadm: added /dev/sdq to /dev/md0 as 17
mdadm: added /dev/sdr to /dev/md0 as 18
mdadm: added /dev/sdg to /dev/md0 as 19
mdadm: added /dev/sds to /dev/md0 as 20
mdadm: added /dev/sdt to /dev/md0 as 21
mdadm: added /dev/sdc to /dev/md0 as 22
mdadm: added /dev/sda to /dev/md0 as 0
mdadm: /dev/md0 assembled from 19 drives and 1 spare - not enough to start the array.
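For reference, this is roughly how I compared the superblocks across the members (from memory; I can post the full output if it would help):
Code:
# compare event counts and device counts across all 20 members
mdadm --examine /dev/sd[a-t] | egrep '/dev/sd|Events|Raid Devices'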
It appears all my superblocks are OK; it's just that I can't get the array to recognize that sdc belongs in it at the proper slot. After much googling and experimentation, being careful not to try things like zeroing superblocks, I finally tried to recreate the array.
Code:
gigantor:/tmp/src/mdadm-3.2.5# mdadm --create /dev/md0 --assume-clean --level=6 --chunk=16 --metadata=0.90 --uuid=3cd93aff:18032678:261503f8:d1eb9e65 --raid-devices=22 /dev/sd[a-t] missing missing
mdadm: /dev/sda appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: /dev/sdb appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: /dev/sdc appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: /dev/sdd appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: /dev/sde appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: partition table exists on /dev/sde but will be lost or
meaningless after creating array
mdadm: /dev/sdf appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: /dev/sdg appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: /dev/sdh appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: /dev/sdi appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: /dev/sdj appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: /dev/sdk appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: /dev/sdl appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: /dev/sdm appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: /dev/sdn appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: /dev/sdo appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: /dev/sdp appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: /dev/sdq appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: /dev/sdr appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: /dev/sds appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
mdadm: /dev/sdt appears to be part of a raid array:
level=raid6 devices=22 ctime=Sun Dec 6 22:25:51 2009
Continue creating array? y
mdadm: array /dev/md0 started.
gigantor:/tmp/src/mdadm-3.2.5# mdadm -D /dev/md0
/dev/md0:
Version : 0.90
Creation Time : Mon Oct 8 21:15:58 2012
Raid Level : raid6
Array Size : 29302769920 (27945.30 GiB 30006.04 GB)
Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
Raid Devices : 22
Total Devices : 20
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Mon Oct 8 21:15:58 2012
State : clean, degraded
Active Devices : 20
Working Devices : 20
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 16K
UUID : 3cd93aff:18032678:6154c110:0f190746 (local to host gigantor)
Events : 0.1
Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/sda
1 8 16 1 active sync /dev/sdb
2 8 32 2 active sync /dev/sdc
3 8 48 3 active sync /dev/sdd
4 8 64 4 active sync /dev/sde
5 8 80 5 active sync /dev/sdf
6 8 96 6 active sync /dev/sdg
7 8 112 7 active sync /dev/sdh
8 8 128 8 active sync /dev/sdi
9 8 144 9 active sync /dev/sdj
10 8 160 10 active sync /dev/sdk
11 8 176 11 active sync /dev/sdl
12 8 192 12 active sync /dev/sdm
13 8 208 13 active sync /dev/sdn
14 8 224 14 active sync /dev/sdo
15 8 240 15 active sync /dev/sdp
16 65 0 16 active sync /dev/sdq
17 65 16 17 active sync /dev/sdr
18 65 32 18 active sync /dev/sds
19 65 48 19 active sync /dev/sdt
20 0 0 20 removed
21 0 0 21 removed
When I tried to mount the array, I got:
Code:
gigantor:/etc# mount -a
mount: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
gigantor:/etc# dmesg | tail
[14304.524311] md0: detected capacity change from 0 to 30006036398080
[14304.525335] md0: unknown partition table
[14347.702810] FAT: utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!
[14347.704070] FAT: bogus number of reserved sectors
[14347.705203] VFS: Can't find a valid FAT filesystem on dev md0.
[14347.705368] qnx4: wrong fsid in superblock.
[14443.038299] FAT: utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!
[14443.039588] FAT: bogus number of reserved sectors
[14443.040778] VFS: Can't find a valid FAT filesystem on dev md0.
[14443.041002] qnx4: wrong fsid in superblock.
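If it would help, I can also run some non-destructive probes of whatever signature is actually on the device and post the output (neither of these writes anything):
Code:
blkid /dev/md0       # report any filesystem signature blkid recognizes
file -s /dev/md0     # inspect the first blocks for a known magic number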
When I run fsck /dev/md0:
Code:
gigantor:/etc# fsck /dev/md0
fsck from util-linux-ng 2.17.2
fsck.jfs version 1.1.12, 24-Aug-2007
processing started: 10/8/2012 21.37.25
Using default parameter: -p
The current device is: /dev/md0
The superblock does not describe a correct jfs file system.
If device /dev/md0 is valid and contains a jfs file system,
then both the primary and secondary superblocks are corrupt
and cannot be repaired, and fsck cannot continue.
Otherwise, make sure the entered device /dev/md0 is correct.
fdisk -l /dev/md0 shows:
Code:
gigantor:/tmp/src/mdadm-3.1.1# fdisk -l /dev/md0
Disk /dev/md0: 30006.0 GB, 30006036398080 bytes
2 heads, 4 sectors/track, -1 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 16384 bytes / 327680 bytes
Disk identifier: 0x00000000
Disk /dev/md0 doesn't contain a valid partition table
Although... I think that's probably correct, since as far as I remember the filesystem sits directly on /dev/md0 rather than in a partition.
cat /proc/mdstat shows:
Code:
gigantor:/etc# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active (auto-read-only) raid6 sdt[19] sds[18] sdr[17] sdq[16] sdp[15] sdo[14] sdn[13] sdm[12] sdl[11] sdk[10] sdj[9] sdi[8] sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1] sda[0]
29302769920 blocks level 6, 16k chunk, algorithm 2 [22/20] [UUUUUUUUUUUUUUUUUUUU__]
unused devices: <none>
Can anyone help point me in the right direction? I'm thinking that my data is still there, but perhaps the array isn't being assembled in the correct order. I don't have a backup of this array since it's over 20TB, and the drive failures had some unfortunate timing... but I know that I still have 20 good drives. If I can get the array running again, I can add 2 new drives and rebuild the redundancy.
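One thing I noticed while writing this up: the --assemble --verbose output near the top lists each drive's original slot number, and that order does not match the alphabetical /dev/sd[a-t] order I used in my --create. Based on those slot numbers, I'm wondering whether something like the following would be closer to correct. This is just a sketch, not something I've run: the output doesn't show sdc's original slot, so it would have to be tried in each of the three empty slots (2, 3, and 11), keeping everything read-only until the filesystem checks out:
Code:
mdadm --stop /dev/md0
# device order taken from the slot numbers in the --assemble --verbose
# output above; slots 2, 3, and 11 had no up-to-date member, so sdc
# (original slot unknown) goes in one of them and the other two stay missing
mdadm --create /dev/md0 --assume-clean --level=6 --chunk=16 --metadata=0.90 \
    --raid-devices=22 \
    /dev/sda /dev/sdb /dev/sdc missing /dev/sdk /dev/sdi /dev/sdh /dev/sdf \
    /dev/sde /dev/sdl /dev/sdm missing /dev/sdn /dev/sdo /dev/sdd /dev/sdj \
    /dev/sdp /dev/sdq /dev/sdr /dev/sdg /dev/sds /dev/sdt
fsck.jfs -n /dev/md0   # read-only check; repeat with sdc in slot 3 or 11 if this fails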
Thanks for any help.
-Dan