LinuxQuestions.org - Recovering a Raid 5 array, mdadm mess-up

I've been searching through this site for some raid answers, but found nothing specific to my problem. This is my first post, so here goes :)

I have a debian Etch server, and my /home partition is set up as a RAID 5 array, with four SATA 250GB disks (750GB total). I recently returned from vacation and found that the machine was locked. After rebooting, /home did not mount. Here's what showed in syslog:

Code:

Sep 12 05:18:42 workshop kernel: md: bind<sdb1>

Sep 12 05:18:42 workshop kernel: md: bind<sda1>

Sep 12 05:18:42 workshop kernel: md: bind<sdd1>

Sep 12 05:18:42 workshop kernel: md: bind<sdc1>

Sep 12 05:18:42 workshop kernel: md: kicking non-fresh sda1 from array!

Sep 12 05:18:42 workshop kernel: md: unbind<sda1>

Sep 12 05:18:42 workshop kernel: md: export_rdev(sda1)

Sep 12 05:18:42 workshop kernel: md: kicking non-fresh sdb1 from array!

Sep 12 05:18:42 workshop kernel: md: unbind<sdb1>

Sep 12 05:18:42 workshop kernel: md: export_rdev(sdb1)

Sep 12 05:18:42 workshop kernel: md: md0: raid array is not clean -- starting backgro

und reconstruction

Sep 12 05:18:42 workshop kernel: raid5: device sdc1 operational as raid disk 2

Sep 12 05:18:42 workshop kernel: raid5: device sdd1 operational as raid disk 3

Sep 12 05:18:42 workshop kernel: raid5: not enough operational devices for md0 (2/4 f

ailed)

Sep 12 05:18:42 workshop kernel: RAID5 conf printout:

Sep 12 05:18:42 workshop kernel:  --- rd:4 wd:2 fd:2

Sep 12 05:18:42 workshop kernel:  disk 2, o:1, dev:sdc1

Sep 12 05:18:42 workshop kernel:  disk 3, o:1, dev:sdd1

Sep 12 05:18:42 workshop kernel: raid5: failed to run raid set md0

Sep 12 05:18:42 workshop kernel: md: pers->run() failed ...

Sep 12 05:18:42 workshop kernel: Attempting manual resume

Sep 12 05:18:42 workshop kernel: EXT3-fs: INFO: recovery required on readonly filesys

tem.

Sep 12 05:18:42 workshop kernel: EXT3-fs: write access will be enabled during recover

y.

So, it seemed that two out of the four disks were failed. I was hoping that the drives overheated, perhaps the machine was not cleanly rebooted, etc. Two drives out of four drive raid5 set is not good.

I captured the output of mdadm --examine for all the disks:

Code:

/dev/sda1:

          Magic : a92b4efc

        Version : 00.90.03

          UUID : 43e20969:a2d1e5ba:94f7c737:27a0793c

  Creation Time : Sat Apr 22 22:55:01 2006

    Raid Level : raid5

    Device Size : 244195904 (232.88 GiB 250.06 GB)

    Array Size : 732587712 (698.65 GiB 750.17 GB)

  Raid Devices : 4

  Total Devices : 4

Preferred Minor : 0



    Update Time : Mon Sep  3 13:00:35 2007

          State : clean

 Active Devices : 4

Working Devices : 4

 Failed Devices : 0

  Spare Devices : 0

      Checksum : e679baca - correct

        Events : 0.2488136



        Layout : left-symmetric

    Chunk Size : 64K



      Number  Major  Minor  RaidDevice State

this    1      8        1        1      active sync  /dev/sda1



  0    0      8      17        0      active sync  /dev/sdb1

  1    1      8        1        1      active sync  /dev/sda1

  2    2      8      33        2      active sync  /dev/sdc1

  3    3      8      49        3      active sync  /dev/sdd1

/dev/sdb1:

          Magic : a92b4efc

        Version : 00.90.03

          UUID : 43e20969:a2d1e5ba:94f7c737:27a0793c

  Creation Time : Sat Apr 22 22:55:01 2006

    Raid Level : raid5

    Device Size : 244195904 (232.88 GiB 250.06 GB)

    Array Size : 732587712 (698.65 GiB 750.17 GB)

  Raid Devices : 4

  Total Devices : 4

Preferred Minor : 0



    Update Time : Mon Sep  3 13:00:35 2007

          State : clean

 Active Devices : 4

Working Devices : 4

 Failed Devices : 0

  Spare Devices : 0

      Checksum : e679bad8 - correct

        Events : 0.2488136



        Layout : left-symmetric

    Chunk Size : 64K



      Number  Major  Minor  RaidDevice State

this    0      8      17        0      active sync  /dev/sdb1



  0    0      8      17        0      active sync  /dev/sdb1

  1    1      8        1        1      active sync  /dev/sda1

  2    2      8      33        2      active sync  /dev/sdc1

  3    3      8      49        3      active sync  /dev/sdd1

/dev/sdc1:

          Magic : a92b4efc

        Version : 00.90.03

          UUID : 43e20969:a2d1e5ba:94f7c737:27a0793c

  Creation Time : Sat Apr 22 22:55:01 2006

    Raid Level : raid5

    Device Size : 244195904 (232.88 GiB 250.06 GB)

    Array Size : 732587712 (698.65 GiB 750.17 GB)

  Raid Devices : 4

  Total Devices : 4

Preferred Minor : 0



    Update Time : Mon Sep  3 13:02:51 2007

          State : active

 Active Devices : 2

Working Devices : 2

 Failed Devices : 1

  Spare Devices : 0

      Checksum : e653c444 - correct

        Events : 0.2488139



        Layout : left-symmetric

    Chunk Size : 64K



      Number  Major  Minor  RaidDevice State

this    2      8      33        2      active sync  /dev/sdc1



  0    0      0        0        0      removed

  1    1      0        0        1      faulty removed

  2    2      8      33        2      active sync  /dev/sdc1

  3    3      8      49        3      active sync  /dev/sdd1

/dev/sdd1:

          Magic : a92b4efc

        Version : 00.90.03

          UUID : 43e20969:a2d1e5ba:94f7c737:27a0793c

  Creation Time : Sat Apr 22 22:55:01 2006

    Raid Level : raid5

    Device Size : 244195904 (232.88 GiB 250.06 GB)

    Array Size : 732587712 (698.65 GiB 750.17 GB)

  Raid Devices : 4

  Total Devices : 4

Preferred Minor : 0



    Update Time : Mon Sep  3 13:02:51 2007

          State : active

 Active Devices : 2

Working Devices : 2

 Failed Devices : 1

  Spare Devices : 0

      Checksum : e653c456 - correct

        Events : 0.2488139



        Layout : left-symmetric

    Chunk Size : 64K



      Number  Major  Minor  RaidDevice State

this    3      8      49        3      active sync  /dev/sdd1



  0    0      0        0        0      removed

  1    1      0        0        1      faulty removed

  2    2      8      33        2      active sync  /dev/sdc1

  3    3      8      49        3      active sync  /dev/sdd1

Notice that the different disks had a different idea of what the state of the array was. I hoped that at worst, there was only one faulty disk.

I decided from the above output that I should try to reassemble the array. In the past, mdadm was pretty smart about trying to resync the disks. However, I made a big mistake. I typed the following command:

Code:

# mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[a-d]

So, madam took a long time to rebuild the array, and then I could not mount it. I tried to reboot, no help. Here's the error from mount:

Code:

# mount /home

mount: wrong fs type, bad option, bad superblock on /dev/md0,

      missing codepage or other error

      In some cases useful info is found in syslog - try

      dmesg | tail  or so

Looking at /proc/mdstat:

Code:

# cat /proc/mdstat 

Personalities : [raid6] [raid5] [raid4] 

md0 : active raid5 sda[0] sdd[3] sdc[2] sdb[1]

      732595392 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

      

unused devices: <none>

In horror, I realized that mdadm had built the array using the whole disks, instead of partitions. I wanted /dev/sda1, /dev/sdb1, etc ... NOT /dev/sda, /dev/sdb, etc!

Here's where I get really confused. If I look at the disks with fdisk, the partitions are still there, but two of them are just regular linux partitions (not raid autodetect):

Code:

 $ fdisk -l /dev/sda



Disk /dev/sda: 250.0 GB, 250059350016 bytes

255 heads, 63 sectors/track, 30401 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes



  Device Boot      Start        End      Blocks  Id  System

/dev/sda1              1      30401  244196001  fd  Linux raid autodetect



 $ fdisk -l /dev/sdb



Disk /dev/sdb: 250.0 GB, 250059350016 bytes

255 heads, 63 sectors/track, 30401 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes



  Device Boot      Start        End      Blocks  Id  System

/dev/sdb1              1      30401  244196032  83  Linux



$ fdisk -l /dev/sdc



Disk /dev/sdc: 250.0 GB, 250059350016 bytes

255 heads, 63 sectors/track, 30401 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes



  Device Boot      Start        End      Blocks  Id  System

/dev/sdc1              1      30401  244196001  fd  Linux raid autodetect



$ fdisk -l /dev/sdd



Disk /dev/sdd: 250.0 GB, 250059350016 bytes

255 heads, 63 sectors/track, 30401 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes



  Device Boot      Start        End      Blocks  Id  System

/dev/sdd1              1      30401  244196032  83  Linux

But it gets even stranger... I no longer see the partitions in /dev:

Code:

$ ls /dev/sd*

/dev/sda  /dev/sdb  /dev/sdc  /dev/sdd

And when I try to assemble the array now, mdadm can't find those old partitions:

Code:

$ mdadm --assemble /dev/md0 --verbose /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

mdadm: looking for devices for /dev/md0

mdadm: cannot open device /dev/sda1: No such file or directory

mdadm: /dev/sda1 has no superblock - assembly aborted

So, I'm in a real bind. I don't know if my data is still on the drives (and of course, I REALLY want to recover it, only some of it is backed up). I can't see the old partitions on the drives, despite the fact that fdisk does see something.

Is it possible that my mdadm --create command wiped my disks somehow? I though mdadm was careful to check for existing raid partitions!

Any help would be greatly appreciated!