Hey guys, I'd like some help recovering from a failed software RAID-5 setup. The array is on an embedded Linux NAS (the Buffalo TeraStation Pro, if anyone's familiar with it), so I can't give many details about the distro, version, setup, etc.; all of that is hidden and proprietary. Tech support told me that all I can do is scrap my data, but this is stupid... they're manufacturing a redundant data server; they should know better.
Anyway, a hacked firmware does let me telnet into the device as root (and probably voids my warranty, but whatever), so if any pertinent information is discoverable, I can attempt to reverse-engineer this thing if you tell me what to do. My Linux experience is about a few months' worth: enough to get by, but lacking a deeper understanding of things. Google has been surprisingly unhelpful in finding a comprehensive tutorial on troubleshooting a RAID configuration, so I'm hoping someone here can help me.
Here's what I do know about the setup: it uses four 500 GB hard drives in a RAID-5 configuration, and the RAID arrays show up as md devices. [edit]The file system is XFS.[/edit] There are two main arrays of interest: /dev/md0 is a system partition and /dev/md1 is the data partition that I'm trying to recover. I suspect the problem is a corrupted superblock, but I'm not sure how to recover from that.
Here's what I've discovered by poking around with mdadm. Looking at the system partition...
Code:
root@HAXD_HELPER:/etc# mdadm --examine /dev/md0
mdadm: No super block found on /dev/md0 (Expected magic a92b4efc, got 00000000)
root@HAXD_HELPER:/etc# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.02
Creation Time : Sat Jan 14 12:32:49 2006
Raid Level : raid1
Array Size : 385408 (376.38 MiB 394.66 MB)
Device Size : 385408 (376.38 MiB 394.66 MB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Wed Jun 6 21:26:53 2007
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
UUID : e87531ac:9fe1f96a:121f55a1:1220867e
Events : 0.110
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 33 1 active sync /dev/sdc1
2 8 49 2 active sync /dev/sdd1
3 8 17 3 active sync /dev/sdb1
I may not be understanding it correctly (or I just don't know what a good working config looks like), but it seems that everything in the --detail output is fine while --examine says uh-oh. This is also weird since this is supposed to be the system partition (and the system works since, well, I'm in it and running commands), yet it supposedly has a bad superblock.
Anyway, there's probably some implementation magic that makes things happen. That's not too important. I'm really just concerned about my data, which is on /dev/md1.
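A possible explanation for the mismatch, offered as a sketch rather than a diagnosis: as far as the mdadm documentation describes it, --examine prints the metadata stored on a component device (such as /dev/sda1), while --detail reports on an assembled md array (such as /dev/md0). If that is what is going on, "No super block found" on /dev/md0 would be expected and not a sign of damage. The helper name list_members below is hypothetical; it only extracts the member partitions from --detail output so that --examine can be pointed at them instead.

```shell
# Hypothetical helper: read `mdadm --detail` output on stdin and print
# the member partitions named in its device table, one per line.
# It matches any line whose last field looks like /dev/<letters><digits>.
list_members() {
    awk '$NF ~ /^\/dev\/[a-z]+[0-9]+$/ { print $NF }'
}

# Usage on the NAS (left as a comment so nothing runs by accident):
#   mdadm --detail /dev/md0 | list_members | while read -r part; do
#       mdadm --examine "$part"
#   done
```

The array header line ends in a colon ("/dev/md0:"), so it is not mistaken for a member.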
Code:
root@HAXD_HELPER:/etc# mdadm --examine /dev/md1
mdadm: No super block found on /dev/md1 (Expected magic a92b4efc, got 7d7d7d7d)
root@HAXD_HELPER:/etc# mdadm --detail /dev/md1
/dev/md1:
Version : 00.90.02
Creation Time : Tue Dec 27 16:09:40 2005
Raid Level : raid5
Array Size : 1462862592 (1395.09 GiB 1497.97 GB)
Device Size : 487620864 (465.03 GiB 499.32 GB)
Raid Devices : 4
Total Devices : 1
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Wed Jun 6 22:22:04 2007
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 37d97fb5:083ede07:8d3e9c16:0f299b85
Events : 0.300
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 8 19 1 active sync /dev/sdb3
2 8 35 2 active sync /dev/sdc3
3 8 51 3 active sync /dev/sdd3
What concerns me here are the lines that say there are 4 RAID devices but only 1 total device. The md device doesn't have a good superblock, but when I --examine the individual sd*3 partitions, they do appear to have good superblocks, so this makes me think that all hope is not yet lost...
Code:
root@HAXD_HELPER:/etc# mdadm -E /dev/sd[abcd]3
/dev/sda3:
Magic : a92b4efc
Version : 00.90.02
UUID : 37d97fb5:083ede07:8d3e9c16:0f299b85
Creation Time : Tue Dec 27 16:09:40 2005
Raid Level : raid5
Raid Devices : 4
Total Devices : 1
Preferred Minor : 1
Update Time : Wed Jun 6 22:22:04 2007
State : active
Active Devices : 4
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Checksum : 2cd505c9 - correct
Events : 0.300
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 3 0 active sync /dev/sda3
0 0 8 3 0 active sync /dev/sda3
1 1 8 19 1 active sync /dev/sdb3
2 2 8 35 2 active sync /dev/sdc3
3 3 8 51 3 active sync /dev/sdd3
/dev/sdb3:
Magic : a92b4efc
Version : 00.90.02
UUID : 37d97fb5:083ede07:8d3e9c16:0f299b85
Creation Time : Tue Dec 27 16:09:40 2005
Raid Level : raid5
Raid Devices : 4
Total Devices : 1
Preferred Minor : 1
Update Time : Wed Jun 6 22:22:04 2007
State : active
Active Devices : 4
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Checksum : 2cd505db - correct
Events : 0.300
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 19 1 active sync /dev/sdb3
0 0 8 3 0 active sync /dev/sda3
1 1 8 19 1 active sync /dev/sdb3
2 2 8 35 2 active sync /dev/sdc3
3 3 8 51 3 active sync /dev/sdd3
/dev/sdc3:
Magic : a92b4efc
Version : 00.90.02
UUID : 37d97fb5:083ede07:8d3e9c16:0f299b85
Creation Time : Tue Dec 27 16:09:40 2005
Raid Level : raid5
Raid Devices : 4
Total Devices : 1
Preferred Minor : 1
Update Time : Wed Jun 6 22:22:04 2007
State : active
Active Devices : 4
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Checksum : 2cd505ed - correct
Events : 0.300
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 8 35 2 active sync /dev/sdc3
0 0 8 3 0 active sync /dev/sda3
1 1 8 19 1 active sync /dev/sdb3
2 2 8 35 2 active sync /dev/sdc3
3 3 8 51 3 active sync /dev/sdd3
/dev/sdd3:
Magic : a92b4efc
Version : 00.90.02
UUID : 37d97fb5:083ede07:8d3e9c16:0f299b85
Creation Time : Tue Dec 27 16:09:40 2005
Raid Level : raid5
Raid Devices : 4
Total Devices : 1
Preferred Minor : 1
Update Time : Wed Jun 6 22:22:04 2007
State : active
Active Devices : 4
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Checksum : 2cd505ff - correct
Events : 0.300
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 8 51 3 active sync /dev/sdd3
0 0 8 3 0 active sync /dev/sda3
1 1 8 19 1 active sync /dev/sdb3
2 2 8 35 2 active sync /dev/sdc3
3 3 8 51 3 active sync /dev/sdd3
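Comparing four superblock dumps by eye is error-prone, so here is a minimal sketch that boils the `mdadm -E` output down to one line per member with its UUID and Events counter. The function name summarize_members is hypothetical, and the parsing assumes the 0.90-style layout shown above, where each device section starts with a "/dev/...:" line and UUID appears before Events.

```shell
# Hypothetical helper: read `mdadm -E` output on stdin and print
# "<device> <uuid> <events>" for each member section.
summarize_members() {
    awk '
        /^\/dev\//      { dev = $1; sub(/:$/, "", dev) }  # section header
        $1 == "UUID"    { uuid = $3 }                     # "UUID : <value>"
        $1 == "Events"  { print dev, uuid, $3 }           # "Events : <value>"
    '
}

# Usage on the NAS (left as a comment so nothing runs by accident):
#   mdadm -E /dev/sd[abcd]3 | summarize_members
```

If every line shows the same UUID and the same Events value, the members at least agree with each other about which array they belong to and how recently they were updated.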
So... it seems to me like the individual sd*3 devices have the right superblock info, but the superblock info on the md1 device is broken. Is there any way I can tell the md1 device to look at the individual sd*3 devices for its superblock? I'm not sure how to phrase this in proper RAID/mdadm terminology (or whether I even have the right idea).
Finally, it may help to figure out how these devices are scripted to be set up at boot time. Again, this is an embedded Linux NAS device, so all of this is hidden and would have to be reverse-engineered. I've been told that creating an /initrd directory un-hides all of the boot-time scripts/ramdisk (and indeed this is true for my device), but I have no idea what to look for in there.
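In mdadm terms, "telling md1 to look at the sd*3 devices" seems to correspond to assembling the array from its members. Below is a minimal sketch of what that might look like, under several assumptions: that /dev/md1 is not mounted, that stopping the currently degraded array is safe on this firmware, and that the four partitions named above are the right members. The function names are hypothetical, and DRY_RUN defaults to 1 so the commands are only printed, not executed. Nothing here confirms that this recovers the data.

```shell
# Hypothetical helper: print a command when DRY_RUN is 1 (the default),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: stop the half-assembled array, then ask mdadm to
# assemble it again from the four member partitions listed in the post.
reassemble_md1() {
    members="/dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3"
    run mdadm --stop /dev/md1
    # $members is intentionally unquoted so it splits into four arguments.
    run mdadm --assemble /dev/md1 $members
}
```

Calling `reassemble_md1` prints the two commands for review; `DRY_RUN=0 reassemble_md1` would actually run them. Printing first is a deliberate choice here, since a wrong mdadm invocation on the only copy of the data is hard to undo.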
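One starting point for the /initrd search, as a sketch: list every file that mentions a RAID tool or an md device. The function name find_raid_scripts is hypothetical, and the search terms (mdadm, raidstart, raidautorun, /dev/mdN) are guesses at what a boot script might call; the firmware may well use something else. It also assumes the grep on the device supports -r, -l and -E.

```shell
# Hypothetical helper: print the files under a directory (default /initrd)
# that mention common software-RAID commands or md device nodes.
find_raid_scripts() {
    grep -rlE 'mdadm|raidstart|raidautorun|/dev/md[0-9]' "${1:-/initrd}" 2>/dev/null | sort
}

# Usage on the NAS (left as a comment so nothing runs by accident):
#   find_raid_scripts /initrd
```

Any script it turns up should show which command and arguments the firmware uses to bring up md1 at boot.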
Any help from a RAID guru would be hugely appreciated.