I've been searching through this site for RAID answers, but found nothing specific to my problem. This is my first post, so here goes :)
I have a Debian Etch server, and my /home partition is a RAID 5 array of four 250GB SATA disks (roughly 750GB usable). I recently returned from vacation and found that the machine had locked up. After rebooting, /home did not mount. Here's what showed up in syslog:
Code:
Sep 12 05:18:42 workshop kernel: md: bind<sdb1>
Sep 12 05:18:42 workshop kernel: md: bind<sda1>
Sep 12 05:18:42 workshop kernel: md: bind<sdd1>
Sep 12 05:18:42 workshop kernel: md: bind<sdc1>
Sep 12 05:18:42 workshop kernel: md: kicking non-fresh sda1 from array!
Sep 12 05:18:42 workshop kernel: md: unbind<sda1>
Sep 12 05:18:42 workshop kernel: md: export_rdev(sda1)
Sep 12 05:18:42 workshop kernel: md: kicking non-fresh sdb1 from array!
Sep 12 05:18:42 workshop kernel: md: unbind<sdb1>
Sep 12 05:18:42 workshop kernel: md: export_rdev(sdb1)
Sep 12 05:18:42 workshop kernel: md: md0: raid array is not clean -- starting background reconstruction
Sep 12 05:18:42 workshop kernel: raid5: device sdc1 operational as raid disk 2
Sep 12 05:18:42 workshop kernel: raid5: device sdd1 operational as raid disk 3
Sep 12 05:18:42 workshop kernel: raid5: not enough operational devices for md0 (2/4 failed)
Sep 12 05:18:42 workshop kernel: RAID5 conf printout:
Sep 12 05:18:42 workshop kernel: --- rd:4 wd:2 fd:2
Sep 12 05:18:42 workshop kernel: disk 2, o:1, dev:sdc1
Sep 12 05:18:42 workshop kernel: disk 3, o:1, dev:sdd1
Sep 12 05:18:42 workshop kernel: raid5: failed to run raid set md0
Sep 12 05:18:42 workshop kernel: md: pers->run() failed ...
Sep 12 05:18:42 workshop kernel: Attempting manual resume
Sep 12 05:18:42 workshop kernel: EXT3-fs: INFO: recovery required on readonly filesystem.
Sep 12 05:18:42 workshop kernel: EXT3-fs: write access will be enabled during recovery.
So it seemed that two of the four disks had failed. I was hoping the drives had merely overheated, or that the machine had not been cleanly rebooted. Losing two drives out of a four-drive RAID 5 set is not good, since RAID 5 can only survive a single disk failure.
I captured the output of mdadm --examine for all the disks:
Code:
/dev/sda1:
Magic : a92b4efc
Version : 00.90.03
UUID : 43e20969:a2d1e5ba:94f7c737:27a0793c
Creation Time : Sat Apr 22 22:55:01 2006
Raid Level : raid5
Device Size : 244195904 (232.88 GiB 250.06 GB)
Array Size : 732587712 (698.65 GiB 750.17 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Mon Sep 3 13:00:35 2007
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : e679baca - correct
Events : 0.2488136
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 1 1 active sync /dev/sda1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 1 1 active sync /dev/sda1
2 2 8 33 2 active sync /dev/sdc1
3 3 8 49 3 active sync /dev/sdd1
/dev/sdb1:
Magic : a92b4efc
Version : 00.90.03
UUID : 43e20969:a2d1e5ba:94f7c737:27a0793c
Creation Time : Sat Apr 22 22:55:01 2006
Raid Level : raid5
Device Size : 244195904 (232.88 GiB 250.06 GB)
Array Size : 732587712 (698.65 GiB 750.17 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Mon Sep 3 13:00:35 2007
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : e679bad8 - correct
Events : 0.2488136
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 17 0 active sync /dev/sdb1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 1 1 active sync /dev/sda1
2 2 8 33 2 active sync /dev/sdc1
3 3 8 49 3 active sync /dev/sdd1
/dev/sdc1:
Magic : a92b4efc
Version : 00.90.03
UUID : 43e20969:a2d1e5ba:94f7c737:27a0793c
Creation Time : Sat Apr 22 22:55:01 2006
Raid Level : raid5
Device Size : 244195904 (232.88 GiB 250.06 GB)
Array Size : 732587712 (698.65 GiB 750.17 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Mon Sep 3 13:02:51 2007
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 1
Spare Devices : 0
Checksum : e653c444 - correct
Events : 0.2488139
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 8 33 2 active sync /dev/sdc1
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 8 33 2 active sync /dev/sdc1
3 3 8 49 3 active sync /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 00.90.03
UUID : 43e20969:a2d1e5ba:94f7c737:27a0793c
Creation Time : Sat Apr 22 22:55:01 2006
Raid Level : raid5
Device Size : 244195904 (232.88 GiB 250.06 GB)
Array Size : 732587712 (698.65 GiB 750.17 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Mon Sep 3 13:02:51 2007
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 1
Spare Devices : 0
Checksum : e653c456 - correct
Events : 0.2488139
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 8 49 3 active sync /dev/sdd1
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 8 33 2 active sync /dev/sdc1
3 3 8 49 3 active sync /dev/sdd1
Notice that the disks disagree about the state of the array: sda1 and sdb1 were last updated at 13:00:35 (events 0.2488136) and still describe a clean four-disk array, while sdc1 and sdd1 were updated later, at 13:02:51 (events 0.2488139), and mark the first two disks as removed/faulty. I was hoping that at worst only one disk was actually faulty.
I decided from the above output that I should try to reassemble the array. In the past, mdadm has been pretty smart about resyncing the disks.
However, I made a big mistake. I typed the following command:
Code:
# mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[a-d]
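In hindsight, I believe what I should have tried first was a forced assemble of the existing member partitions rather than a create — something along these lines, since sda1/sdb1 had only fallen behind on their event counts (this is just my understanding of the correct approach, not something I ran):
Code:
# mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1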
So mdadm took a long time to build the array, and then I could not mount it. Rebooting didn't help. Here's the error from mount:
Code:
# mount /home
mount: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
Looking at /proc/mdstat:
Code:
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda[0] sdd[3] sdc[2] sdb[1]
732595392 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>
In horror, I realized that mdadm had built the array using the whole disks, instead of partitions. I wanted /dev/sda1, /dev/sdb1, etc ...
NOT /dev/sda, /dev/sdb, etc!
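For what it's worth, I assume the new (wrong) superblocks could be confirmed with something like the following — I'd expect --examine to now succeed on the whole-disk device and show the newly created array rather than the original one:
Code:
# mdadm --detail /dev/md0
# mdadm --examine /dev/sda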
Here's where I get really confused. If I look at the disks with fdisk, the partitions are still there, but two of them now show up as plain Linux partitions (type 83) instead of Linux raid autodetect (type fd):
Code:
$ fdisk -l /dev/sda
Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 1 30401 244196001 fd Linux raid autodetect
$ fdisk -l /dev/sdb
Disk /dev/sdb: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 1 30401 244196032 83 Linux
$ fdisk -l /dev/sdc
Disk /dev/sdc: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdc1 1 30401 244196001 fd Linux raid autodetect
$ fdisk -l /dev/sdd
Disk /dev/sdd: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdd1 1 30401 244196032 83 Linux
But it gets even stranger... I no longer see the partitions in /dev:
Code:
$ ls /dev/sd*
/dev/sda /dev/sdb /dev/sdc /dev/sdd
And when I try to assemble the array now, mdadm can't find those old partitions:
Code:
$ mdadm --assemble /dev/md0 --verbose /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: looking for devices for /dev/md0
mdadm: cannot open device /dev/sda1: No such file or directory
mdadm: /dev/sda1 has no superblock - assembly aborted
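My guess is that the partition device nodes are missing because the kernel can't re-read the partition tables while the new md0 array is holding the whole disks. If that's right, stopping the array and forcing a re-read might bring /dev/sda1 etc. back — something like the sketch below, which I haven't dared run yet in case it makes things worse:
Code:
# mdadm --stop /dev/md0
# for d in /dev/sd[abcd]; do blockdev --rereadpt $d; done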
So I'm in a real bind. I don't know whether my data is still on the drives (and of course I REALLY want to recover it; only some of it is backed up). I can't see the old partitions on the drives, even though fdisk still shows a partition table.
Is it possible that my mdadm --create command wiped my disks somehow? I thought mdadm was careful to check for existing RAID partitions!
Any help would be greatly appreciated!