LinuxQuestions.org

BTG308 03-03-2011 09:02 AM

RAID-6 issues
 
I have a 12-disk RAID-6 array set up on commodity hardware. It had been running fine for a few weeks until yesterday, when one of the disks failed. I suspected a faulty cable, so I replaced it. While I was doing that, I noticed that I had put the cables in the "wrong" order when installing, so I swapped them around, since I wanted to know which disk was connected to which interface. I thought the RAID would use only the disks' UUIDs and not really care which port they were on.

When I brought the array back up, it found 10 disks and one spare (the faulty one), with one disk out of the array. I tried adding the lone disk and let it run for a while. When I next looked at the reconstruction, the time remaining was counting up. I retried, with the same result. Around here, my old Windows roots took over and made me reboot; I guess I thought the kernel was confused and needed to re-read the disks or something. When it came back up, it found 8 disks, two spares, and none missing. I went to bed.

Today, I swapped the two disks whose cables had been swapped and tried again. Now it finds 10 of the disks, all as spares:

Code:

root@baloo:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sdl1[12](S) sdg1[5](S) sdh1[8](S) sdb1[1](S) sdk1[9](S) sdd1[7](S) sdc1[2](S) sdm1[13](S) sdf1[4](S) sdi1[6](S)
      4883864320 blocks
     
unused devices: <none>
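
To make the gaps in that list easier to see, the members can be sorted by the number in square brackets. This is only a sketch: mdstat_members is a hypothetical helper name, and it assumes the name[number](flags) layout shown in the output above.

```shell
# Read /proc/mdstat-style text on stdin and print "number device" for each
# member of the given array (default: md0), sorted by the bracketed number.
mdstat_members() {
    grep "^${1:-md0} :" |
        tr ' ' '\n' |
        sed -n 's/^\([a-z0-9]*\)\[\([0-9]*\)\].*$/\2 \1/p' |
        sort -n
}
```

Run it as: mdstat_members md0 < /proc/mdstat. On the listing above it prints ten lines, from "1 sdb1" to "13 sdm1"; the numbers 0, 3, 10 and 11 do not appear.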

I figured I'd try pushing a little harder to see what happened:

Code:

root@baloo:~# mdadm --assemble --force /dev/md0
mdadm: forcing event count in /dev/sdd1(7) from 68545 upto 68574
mdadm: Cannot open /dev/sdj1: Device or resource busy

dmsetup table was clean, so I thought maybe sdj1 needed an even harder nudge, and I zeroed its superblock:

Code:

root@baloo:~# mdadm --misc --zero-superblock /dev/sdj1
root@baloo:~# mdadm --assemble --force /dev/md0
mdadm: clearing FAULTY flag for device 7 in /dev/md0 for /dev/sdd1
mdadm: SET_ARRAY_INFO failed for /dev/md0: Device or resource busy

Oh dear. (sdd is the previously faulty disk, which may or may not have been a cable error.) Right about now, I finally realized that I was trying very hard to dig myself out of a hole. So, let's see where we're at right now:

Code:

root@baloo:~# mdadm --assemble --force /dev/md0  --update=summaries --verbose
mdadm: looking for devices for /dev/md0
mdadm: no RAID superblock on /dev/sdj1
mdadm: /dev/sdj1 has wrong raid level.
mdadm: /dev/dm-3 is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/dm-2 is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/dm-1 is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/dm-0 is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/sdm is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/sdl is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/sdk is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: no RAID superblock on /dev/sdj1
mdadm: /dev/sdj1 has wrong raid level.
mdadm: /dev/sdj is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/sdi is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/sdh is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/sdg is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/sdf is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/sde1 is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/sde is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/sdd is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/sdc is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/sdb is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/sda5 is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/sda2 is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/sda1 is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/sda is not one of /dev/sdd1,/dev/sdh1,/dev/sdi1,/dev/sdj1,/dev/sdl1,/dev/sdb1,/dev/sdc1,/dev/sdf1,/dev/sdg1,/dev/sdg1,/dev/sdm1,/dev/sdk1,/dev/sdn1
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdm1 is identified as a member of /dev/md0, slot 13.
mdadm: /dev/sdk1 is identified as a member of /dev/md0, slot 9.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 7.
mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 8.
mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 6.
mdadm: /dev/sdl1 is identified as a member of /dev/md0, slot 12.
Segmentation fault

Code:

root@baloo:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sdl1[12](S) sdi1[6](S) sdh1[8](S) sdd1[7](S) sdc1[2](S) sdm1[13](S) sdb1[1](S) sdk1[9](S) sdg1[5](S) sdf1[4](S)
      4883864320 blocks
     
unused devices: <none>
root@baloo:~# uname -a
Linux baloo 2.6.35-27-server #48-Ubuntu SMP Tue Feb 22 21:53:16 UTC 2011 x86_64 GNU/Linux
root@baloo:~# mdadm --version
mdadm - v2.6.7.1 - 15th October 2008

Code:

[ 2897.030447] md: bind<sdf1>
[ 2897.055035] md: bind<sdg1>
[ 2897.101455] md: bind<sdk1>
[ 2897.118990] md: bind<sdb1>
[ 2897.148076] md: bind<sdm1>
[ 2897.333941] md: bind<sdc1>
[ 2897.525613] md: bind<sdd1>
[ 2897.573990] md: bind<sdh1>
[ 2898.036870] mdadm[3389]: segfault at 4 ip 000000000041823d sp 00007fff1f1c7ed0 error 4 in mdadm[400000+2a000]
[ 2898.044518] md: bind<sdi1>
[ 2898.246323] md: bind<sdl1>


What I would like to do is force an assembly of all disks without risking a re-sync, since I'm pretty sure at least 11 of the 12 disks have good data. Any ideas?
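
One read-only check that could be run before forcing anything further is to compare the event counter stored in each member's superblock, since that is the value --force rewrote for sdd1 above. A minimal sketch, assuming mdadm --examine prints an "Events :" line for these superblocks; events_of and show_events are hypothetical helper names:

```shell
# Extract the value of the first "Events :" line from `mdadm --examine`
# output supplied on stdin.
events_of() {
    awk -F' : ' '/^[ \t]*Events :/ { print $2; exit }'
}

# Print "device events" for each device given as an argument.
# Only reads the superblocks; nothing is written to the disks.
show_events() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(mdadm --examine "$dev" 2>/dev/null | events_of)"
    done
}
```

For example: show_events /dev/sd[b-m]1. Members whose counters agree with the highest value are the ones most likely to hold consistent data; a member that prints no value has no superblock that mdadm can read.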

