mdadm question: 3 out of 5 disks failed.
I set up a RAID 5 or 6 (not sure which)
for the five disks that I have, using mdadm on Red Hat. I just found out that the disks have been marked failed for quite some time. I guess the data are probably lost, but I want to check with any expert here who may be able to help me retrieve the data. I am fairly sure the disks themselves have not gone bad, since this has happened before: if I take a "failed" disk and reformat it, I can use it again. I think the problem is that mdadm fails the disks somehow. Code:
[root@eh3 /]# /sbin/mdadm --assemble /dev/md1 /dev/sd[g-k]1
Are they sdh1[3], sdi1[4], and sdg1[1]? Is there any way I can get the data off these disks? |
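Before forcing anything, it helps to capture the current state with read-only commands first; a minimal sketch, assuming the array is still /dev/md1 with members /dev/sd[g-k]1: Code:
# Read-only checks: none of these modify the array or the superblocks.
cat /proc/mdstat                  # does the kernel see md1 at all, and in what state?
/sbin/mdadm --detail /dev/md1     # per-device status, if the array is (partially) assembled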
what does "fdisk -l" and "for i in `ls /dev/sd[g-k]`; do smartctl -a $i; done"
show? Sounds like the drives aren't even plugged in... |
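A shorter pass that just prints each drive's overall SMART verdict might also help; a sketch, assuming the drives are /dev/sd[g-k]: Code:
for i in /dev/sd[g-k]; do
    echo "== $i"
    smartctl -H "$i"    # overall health self-assessment: PASSED or FAILED
done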
sd[g-k]1 used to be the five disks for md1. After the reboot, I used "fdisk -l" to check the disks. I can see all the disks, but mdadm refuses to assemble them.
|
What does mdadm --detail show?
|
Quote:
Code:
[root@eh3 ~]# /sbin/mdadm --detail |
Try this:
mdadm --examine /dev/sd[g-k]1
You might be able to force it to start with:
/sbin/mdadm --assemble --force /dev/md1 /dev/sd[g-k]1
or
mdadm --assemble --scan --force
The --run option might do something, not sure:
/sbin/mdadm --assemble --run --force /dev/md1 /dev/sd[g-k]1
There are a lot of options for assemble mode; --force should do everything, though. http://man-wiki.net/index.php/8:mdadm Post back results please :) |
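For what it's worth, mdadm normally refuses to assemble when the Events counters in the member superblocks disagree, and --force tells it to go ahead with the most up-to-date members anyway. A quick read-only comparison of the counters first (a sketch, assuming the members are still /dev/sd[g-k]1): Code:
# Pull just the fields that matter for assembly out of each member's superblock.
for d in /dev/sd[g-k]1; do
    echo "== $d"
    /sbin/mdadm --examine "$d" | egrep 'Update Time|State :|Events'
done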
Force assemble does not work. It said 3 disks are not sufficient to assemble.
Here is the result with --examine. I have no idea what it means. Code:
/sbin/mdadm --examine /dev/sd[g-k]1
/dev/sdg1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
     Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 1

    Update Time : Thu Jun  4 08:04:34 2009
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 5774d686 - correct
         Events : 0.391714

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8      129        1      active sync   /dev/sdi1

   0     0       8       97        0      active sync   /dev/sdg1
   1     1       8      129        1      active sync   /dev/sdi1
   2     2       0        0        2      faulty removed
   3     3       8      145        3      active sync
   4     4       8      161        4      active sync
/dev/sdh1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
     Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 1

    Update Time : Mon Jun 15 13:22:36 2009
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 3
  Spare Devices : 0
       Checksum : 5783a4d7 - correct
         Events : 0.391714

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8      145        3      active sync

   0     0       8       97        0      active sync   /dev/sdg1
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       8      145        3      active sync
   4     4       8      161        4      active sync
/dev/sdi1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
     Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 1

    Update Time : Mon Jun 15 13:22:36 2009
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 3
  Spare Devices : 0
       Checksum : 5783a4e9 - correct
         Events : 0.391714

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8      161        4      active sync

   0     0       8       97        0      active sync   /dev/sdg1
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       8      145        3      active sync
   4     4       8      161        4      active sync
[root@eh3 /]# |
That is only showing 3 disks; you are missing /dev/sdj1 and /dev/sdk1. Are the others even plugged in?
What does "smartctl -a /dev/sdj && smartctl -a /dev/sdk" say? Also post the output of "cat /proc/partitions". |
Quote:
Here is what I did further, after rebooting the machine and making sure all the disks are seen by fdisk. Code:
[root@eh3]# /sbin/mdadm --examine /dev/sd[b-f]1
or sdc1 and sdf1? I have some spare disks that I can add to the RAID. Code:
[root@eh3]# cat /proc/mdstat |
Looks like the array changed from /dev/md1 /dev/sd[g-k]1
to /dev/md0 /dev/sd[b-f]1? Yes: Quote:
mdadm --add /dev/md1 /dev/sdd1
I take it that this is a 4-disk array with one spare? |
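If the array is assembled but degraded, re-adding the dropped member and watching the resync would look roughly like this (a sketch, assuming /dev/md0 and /dev/sdd1 as in this thread): Code:
# Drop the member the kernel has marked faulty, then add it back;
# it comes in as a spare and a resync onto it should start automatically.
/sbin/mdadm /dev/md0 --remove /dev/sdd1
/sbin/mdadm /dev/md0 --add /dev/sdd1
cat /proc/mdstat    # watch the recovery progress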
Quote:
I set up 2 RAID arrays, md0 and md1. Both of them are currently failed, and I am trying to fix md0 first. I did --remove /dev/sdd1 and then --add /dev/sdd1 back, but it still shows [U__UU]. However, the "sdd1[6]F" is gone; it just shows "sdd1[5]" now, without the "F". How can we tell which disks are failed? Code:
# cat /proc/mdstat |
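For reference, the usual ways to see which members the kernel considers failed: mdadm --detail lists the per-device state at the bottom, /proc/mdstat flags failed members with an (F) after the device name, and --examine shows the last recorded state for members of a stopped array. A sketch, assuming the array is /dev/md0: Code:
/sbin/mdadm --detail /dev/md0      # per-device state: active sync / faulty / spare
cat /proc/mdstat                   # failed members show up like "sdd1[5](F)"
/sbin/mdadm --examine /dev/sd[b-f]1 | egrep '^/dev/|State :|Events'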
Quote:
I rebooted the machine again and tried to reassemble it. Code:
# /sbin/mdadm --assemble --update=summaries --force /dev/md0 /dev/sd[b-f]1 |
Yes, I think spares are handled automatically. If the array is working right and you add another drive to it with --add, it will be added as a spare. I don't know why you are now showing two spares. The only way to see the spares is with /sbin/mdadm --examine /dev/sd[b-f]1. In your last post sdc was the spare. I guess you could try to add sdc and sdd back and try to reassemble. Any reason for the "--update=super-minor"? That updates the superblock of each drive. The superblock is the only place where the array info is stored, so if that gets messed up....
|
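Since the superblock is the only place the array metadata lives, it may be worth saving a copy of what --examine reports for every member before trying any more --update (or --zero-superblock) operations; a sketch, with the output path being an arbitrary choice: Code:
# Read-only: records UUID, device order and event counts for later reference.
/sbin/mdadm --examine /dev/sd[b-f]1 > /root/md0-superblocks.txt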
Quote:
I wasn't sure what I was doing; I just tried different options, including "--update=super-minor". Now, after rebooting a couple of times, it does not seem to assemble at all. I checked with fdisk and saw that all the disks are there. I tried different things: taking out one disk at a time, or taking sdc and sdd out and putting them back. Nothing works. Code:
# /sbin/mdadm --assemble --force /dev/md0 /dev/sd[b-f]1 |
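When assembly fails like this, running the same command with --verbose usually says exactly which member mdadm rejects and why, which is the most useful thing to post back; a sketch, assuming the same device names: Code:
# Same forced assemble, with mdadm describing each member it examines;
# tee keeps a copy of the messages for posting.
/sbin/mdadm --assemble --force --verbose /dev/md0 /dev/sd[b-f]1 2>&1 | tee /root/md0-assemble.log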