LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   mdadm question. Disk failed 3 out of 5. (https://www.linuxquestions.org/questions/linux-newbie-8/mdadm-question-disk-failed-3-out-of-5-a-742807/)

ufmale 07-25-2009 07:51 PM

mdadm question. Disk failed 3 out of 5.
 
I set up a RAID 5 or 6 array (I am not sure which)
across five disks using mdadm on Red Hat.

I just found out that the disks have been marked failed for quite some time.
I guess the data is probably lost, but I want to check with any experts here who may be able to help me retrieve it.

I am sure the disks themselves have not gone bad, since this has happened before:
if I take a "failed" disk and reformat it, I am sure I can still use it again. I think the problem is that mdadm marked the disks as failed somehow.

Code:

[root@eh3 /]# /sbin/mdadm --assemble /dev/md1 /dev/sd[g-k]1
mdadm: /dev/md1 assembled from 2 drives - not enough to start the array.
[root@eh3 /]# cat /proc/mdstat
Personalities :
md1 : inactive sdh1[3] sdi1[4] sdg1[1]
      1465151808 blocks
unused devices: <none>

Above is what is shown after I rebooted the machine and tried to assemble the array. Can someone tell me which disks have failed?
Are they sdh1[3], sdi1[4], and sdg1[1]?

Is there any way I can get the data off these disks?

esaym 07-26-2009 01:47 PM

what does "fdisk -l" and "for i in `ls /dev/sd[g-k]`; do smartctl -a $i; done"
show? Sounds like the drives aren't even plugged in...
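
For a quicker first pass, something like this prints just the overall health verdict for each drive (a sketch, assuming smartmontools is installed and those device nodes exist):

Code:

# Print only the overall SMART health assessment for each drive
for i in /dev/sd[g-k]; do
    echo "=== $i ==="
    smartctl -H $i
done

A drive that doesn't answer smartctl at all has probably dropped off the bus entirely, which would be a cabling or controller problem rather than an mdadm one.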

ufmale 07-27-2009 11:51 AM

sd[g-k]1 used to be the five disks for md1. After rebooting, I used "fdisk -l" to check the disks. I can see all of them, but mdadm refuses to assemble the array.

dxangel 07-27-2009 01:25 PM

what does mdadm --detail show?

ufmale 07-28-2009 08:18 AM

Quote:

Originally Posted by dxangel (Post 3621983)
what does mdadm --detail show?

It would not show any info, since mdadm refused to assemble the array:

Code:

[root@eh3 ~]# /sbin/mdadm --detail
mdadm: No devices given.
[root@eh3 ~]# /sbin/mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active.
[root@eh3 ~]# /sbin/mdadm --detail /dev/md1
mdadm: md device /dev/md1 does not appear to be active.
[root@eh3 ~]#


esaym 07-30-2009 07:38 AM

Try this:
mdadm --examine /dev/sd[g-k]1

You might be able to force it to start with:

/sbin/mdadm --assemble --force /dev/md1 /dev/sd[g-k]1
or
mdadm --assemble --scan --force

The --run option might do something, not sure:
/sbin/mdadm --assemble --run --force /dev/md1 /dev/sd[g-k]1


There are a lot of options for assemble mode; --force should do everything you need, though.
http://man-wiki.net/index.php/8:mdadm
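
Before forcing anything, it may be worth comparing what each member's superblock says, since forced assembly works from the members whose superblocks carry the most recent event counts. A sketch (the grep pattern assumes the 0.90-format output mdadm prints on this system):

Code:

# Compare event counts and update times across all members
/sbin/mdadm --examine /dev/sd[g-k]1 | egrep 'Update Time|Events|this'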

Post back results please :)

ufmale 08-03-2009 05:03 PM

Forced assembly does not work. It says 3 disks are not sufficient to start the array.

Here is the result with --examine. I have no idea what it means.

Code:

/sbin/mdadm --examine /dev/sd[g-k]1
/dev/sdg1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
     Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 1

    Update Time : Thu Jun 4 08:04:34 2009
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 5774d686 - correct
         Events : 0.391714

         Layout : left-symmetric
     Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    1      8      129        1      active sync  /dev/sdi1
  0     0      8       97        0      active sync  /dev/sdg1
  1     1      8      129        1      active sync  /dev/sdi1
  2     2      0        0        2      faulty removed
  3     3      8      145        3      active sync
  4     4      8      161        4      active sync
/dev/sdh1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
     Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 1

    Update Time : Mon Jun 15 13:22:36 2009
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 3
  Spare Devices : 0
       Checksum : 5783a4d7 - correct
         Events : 0.391714

         Layout : left-symmetric
     Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    3      8      145        3      active sync
  0     0      8       97        0      active sync  /dev/sdg1
  1     1      0        0        1      faulty removed
  2     2      0        0        2      faulty removed
  3     3      8      145        3      active sync
  4     4      8      161        4      active sync
/dev/sdi1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
     Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 1

    Update Time : Mon Jun 15 13:22:36 2009
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 3
  Spare Devices : 0
       Checksum : 5783a4e9 - correct
         Events : 0.391714

         Layout : left-symmetric
     Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    4      8      161        4      active sync
  0     0      8       97        0      active sync  /dev/sdg1
  1     1      0        0        1      faulty removed
  2     2      0        0        2      faulty removed
  3     3      8      145        3      active sync
  4     4      8      161        4      active sync
[root@eh3 /]#

esaym 08-03-2009 05:19 PM

That is only showing 3 disks; you are missing /dev/sdj1 and /dev/sdk1. Are the others even plugged in?

What does smartctl -a /dev/sdj && smartctl -a /dev/sdk say?
Also check cat /proc/partitions.
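
If some of those device nodes are missing entirely, the kernel never saw the drives at boot. A quick sketch to confirm what is actually attached (assuming smartmontools again):

Code:

# Every partition the kernel currently knows about
cat /proc/partitions
# Identity line for each member that should exist
for i in /dev/sd[g-k]; do smartctl -i $i | grep -i model; done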

ufmale 08-10-2009 08:55 AM

Quote:

Originally Posted by esaym (Post 3630136)
That is only showing 3 disks, you are missing /dev/sdj1 and /dev/sdk1. Are the others even plugged in?

What does smartctl -a /dev/sdj && smartctl -a /dev/sdk say?
Also cat /proc/partitions

You are right... I can't believe I missed that.
Here is what I did after rebooting the machine and making sure
all the disks are seen by fdisk.


Code:

[root@eh3]# /sbin/mdadm --examine /dev/sd[b-f]1
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.00
          UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
    Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
  Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Mon Aug 10 09:50:50 2009
          State : clean
 Active Devices : 3
Working Devices : 4
 Failed Devices : 3
  Spare Devices : 1
      Checksum : 5f428037 - correct
        Events : 0.62953324

        Layout : left-symmetric
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    0      8      17        0      active sync  /dev/sdb1
  0    0      8      17        0      active sync  /dev/sdb1
  1    1      0        0        1      faulty removed
  2    2      0        0        2      faulty removed
  3    3      8      65        3      active sync  /dev/sde1
  4    4      8      81        4      active sync  /dev/sdf1
  5    5      8      33        2      spare  /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.00
          UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
    Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
  Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Mon Aug 10 09:50:50 2009
          State : clean
 Active Devices : 3
Working Devices : 4
 Failed Devices : 3
  Spare Devices : 1
      Checksum : 5f42804e - correct
        Events : 0.62953324

        Layout : left-symmetric
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    5      8      33        5      spare  /dev/sdc1
  0    0      8      17        0      active sync  /dev/sdb1
  1    1      0        0        1      faulty removed
  2    2      0        0        2      faulty removed
  3    3      8      65        3      active sync  /dev/sde1
  4    4      8      81        4      active sync  /dev/sdf1
  5    5      8      33        5      spare  /dev/sdc1
/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.00
          UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
    Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
  Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Sat Aug  8 19:51:12 2009
          State : clean
 Active Devices : 4
Working Devices : 5
 Failed Devices : 1
  Spare Devices : 1
      Checksum : 57d0a1b8 - correct
        Events : 0.748650

        Layout : left-symmetric
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    1      8      49        1      active sync  /dev/sdd1
  0    0      8      17        0      active sync  /dev/sdb1
  1    1      8      49        1      active sync  /dev/sdd1
  2    2      0        0        2      faulty removed
  3    3      8      65        3      active sync  /dev/sde1
  4    4      8      81        4      active sync  /dev/sdf1
  5    5      8      33        5      spare  /dev/sdc1
/dev/sde1:
          Magic : a92b4efc
        Version : 00.90.00
          UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
    Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
  Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Mon Aug 10 09:50:50 2009
          State : clean
 Active Devices : 3
Working Devices : 4
 Failed Devices : 3
  Spare Devices : 1
      Checksum : 5f42806f - correct
        Events : 0.62953325

        Layout : left-symmetric
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    3      8      65        3      active sync  /dev/sde1
  0    0      8      17        0      active sync  /dev/sdb1
  1    1      0        0        1      faulty removed
  2    2      0        0        2      faulty removed
  3    3      8      65        3      active sync  /dev/sde1
  4    4      8      81        4      active sync  /dev/sdf1
  5    5      8      33        2      spare  /dev/sdc1
/dev/sdf1:
          Magic : a92b4efc
        Version : 00.90.00
          UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
    Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
  Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Mon Aug 10 09:50:50 2009
          State : clean
 Active Devices : 3
Working Devices : 4
 Failed Devices : 3
  Spare Devices : 1
      Checksum : 5f428083 - correct
        Events : 0.62953326

        Layout : left-symmetric
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    4      8      81        4      active sync  /dev/sdf1
  0    0      8      17        0      active sync  /dev/sdb1
  1    1      0        0        1      faulty removed
  2    2      0        0        2      faulty removed
  3    3      8      65        3      active sync  /dev/sde1
  4    4      8      81        4      active sync  /dev/sdf1
  5    5      8      33        2      spare  /dev/sdc1

Also, which disks have actually failed here? Is it sdd1?
Or sdc1 and sdf1? I have some spare disks that I can add to the RAID.
Code:

[root@eh3]# cat /proc/mdstat
Personalities : [raid5]
md0 : active raid5 sdb1[0] sdc1[5] sdf1[4] sde1[3] sdd1[6](F)
      1953535744 blocks level 5, 64k chunk, algorithm 2 [5/3] [U__UU]

unused devices: <none>
[root@eh3]#


esaym 08-10-2009 01:01 PM

Looks like the array changed from /dev/md1 /dev/sd[g-k]1
to /dev/md0 /dev/sd[b-f]1?

Yes:

Quote:

md0 : active raid5 sdb1[0] sdc1[5] sdf1[4] sde1[3] sdd1[6](F)
1953535744 blocks level 5, 64k chunk, algorithm 2 [5/3] [U__UU]
That shows 5 devices with 2 missing ("[5/3] [U__UU]"), and member 6, "sdd1[6](F)", is marked failed. Re-add it with:


mdadm --add /dev/md0 /dev/sdd1

I take it this is a 4-disk array with one spare?
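
For anyone reading along, that mdstat line decodes as follows (annotation only, restating the output already posted):

Code:

# md0 : active raid5 sdb1[0] sdc1[5] sdf1[4] sde1[3] sdd1[6](F)
#       1953535744 blocks level 5, 64k chunk, algorithm 2 [5/3] [U__UU]
#
# sdX1[n]  -> member n of the array; (F) marks that member faulty
# [5/3]    -> 5 devices expected, only 3 currently active
# [U__UU]  -> slots 0, 3 and 4 are up (U); slots 1 and 2 are down (_)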

ufmale 08-10-2009 01:43 PM

Quote:

Originally Posted by esaym (Post 3638172)
Looks like the array changed from /dev/md1 /dev/sd[g-k]1
to /dev/md0 /dev/sd[b-f]1?

Yes:



That shows 5 devices with 2 missing ("[5/3] [U__UU]"), and member 6, "sdd1[6](F)", is marked failed. Re-add it with:


mdadm --add /dev/md0 /dev/sdd1

I take it this is a 4-disk array with one spare?


I set up 2 RAID arrays, md0 and md1. Both of them are currently failed,
and I am trying to fix md0 first.

I ran --remove /dev/sdd1 and then --add /dev/sdd1 to put it back, but it still shows
[U__UU]. However, the "sdd1[6](F)" is gone; it now shows "sdd1[5]" with no "F". How can we tell which disks are failed?
Code:

# cat /proc/mdstat
Personalities : [raid5]
md0 : active raid5 sdd1[5] sdb1[0] sdc1[6] sdf1[4] sde1[3]
      1953535744 blocks level 5, 64k chunk, algorithm 2 [5/3] [U__UU]

By the way, this is a 5-disk array set up with RAID 5.
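
For reference, the per-device state is easier to read from mdadm's own reports than from /proc/mdstat; a sketch using only commands already seen in this thread:

Code:

# Per-device table (works once the array is assembled, even degraded)
/sbin/mdadm --detail /dev/md0
# Each member's own idea of its slot and state
/sbin/mdadm --examine /dev/sd[b-f]1 | grep -E '^/dev/|^this'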

ufmale 08-10-2009 02:21 PM

Quote:

Originally Posted by ufmale (Post 3638217)
I set up 2 RAID arrays, md0 and md1. Both of them are currently failed,
and I am trying to fix md0 first.

I ran --remove /dev/sdd1 and then --add /dev/sdd1 to put it back, but it still shows
[U__UU]. However, the "sdd1[6](F)" is gone; it now shows "sdd1[5]" with no "F". How can we tell which disks are failed?
Code:

# cat /proc/mdstat
Personalities : [raid5]
md0 : active raid5 sdd1[5] sdb1[0] sdc1[6] sdf1[4] sde1[3]
      1953535744 blocks level 5, 64k chunk, algorithm 2 [5/3] [U__UU]

By the way, this is a 5-disk array set up with RAID 5.


I rebooted the machine again and tried to reassemble it:


Code:

# /sbin/mdadm --assemble --update=summaries  --force /dev/md0 /dev/sd[b-f]1
mdadm: /dev/md0 assembled from 3 drives and 2 spares - not enough to start the array.
# /sbin/mdadm --stop /dev/md0
[root@evvspeech3 charoe]# /sbin/mdadm --assemble --update=super-minor --run --force /dev/md0 /dev/sd[b-f]1
mdadm: failed to RUN_ARRAY /dev/md0: Invalid argument
# cat /proc/mdstat
Personalities : [raid5]
md0 : inactive sdb1[0] sdc1[6] sdd1[5] sdf1[4] sde1[3]
      2441919680 blocks
unused devices: <none>

I am really confused about the spare disks. I don't remember setting up any disk as a spare. Does mdadm assign one automatically?
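
One more thing worth checking when RUN_ARRAY fails with "Invalid argument": the kernel log usually records the real reason. A sketch:

Code:

# The md driver logs why it refused to start the array
dmesg | tail -n 20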

esaym 08-10-2009 04:22 PM

Yes, I think spares are handled automatically. If the array is working right and you add another drive with --add, it will be added as a spare. I don't know why you are now showing 2 spares; the only way to see spares is with /sbin/mdadm --examine /dev/sd[b-f]1, and in your last post sdc was the spare. I guess you could try to add sdc and sdd back and try to reassemble. Any reason for the "--update=super-minor"? That updates the superblock of each drive, and the superblock is the only place where array info is stored, so if that gets messed up....
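
Before experimenting further, it would also be sensible to keep a copy of what every superblock says now, so there is a record of the original slot assignments. A sketch:

Code:

# Save each member's superblock report before trying anything else
for i in /dev/sd[b-f]1; do
    /sbin/mdadm --examine $i > /root/examine.$(basename $i).txt
done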

ufmale 08-11-2009 02:21 PM

Quote:

Originally Posted by esaym (Post 3638392)
Yes, I think spares are handled automatically. If the array is working right and you add another drive with --add, it will be added as a spare. I don't know why you are now showing 2 spares; the only way to see spares is with /sbin/mdadm --examine /dev/sd[b-f]1, and in your last post sdc was the spare. I guess you could try to add sdc and sdd back and try to reassemble. Any reason for the "--update=super-minor"? That updates the superblock of each drive, and the superblock is the only place where array info is stored, so if that gets messed up....


I wasn't sure what I was doing; I just tried different options, including "--update=super-minor". Now, after rebooting a couple of times, it does not seem to assemble at all. I checked with fdisk and saw that all the disks are there.
I tried different things: taking out one disk at a time, or taking sdc and sdd out and putting them back. Nothing works.

Code:

# /sbin/mdadm --assemble --force /dev/md0 /dev/sd[b-f]1
mdadm: /dev/md0 assembled from 0 drives and 1 spare - not enough to start the array.

Any more suggestions I can try?

esaym 08-11-2009 02:48 PM

Quote:

Originally Posted by ufmale (Post 3639625)
I wasn't sure what I was doing; I just tried different options, including "--update=super-minor". Now, after rebooting a couple of times, it does not seem to assemble at all. I checked with fdisk and saw that all the disks are there.
I tried different things: taking out one disk at a time, or taking sdc and sdd out and putting them back. Nothing works.

Code:

# /sbin/mdadm --assemble --force /dev/md0 /dev/sd[b-f]1
mdadm: /dev/md0 assembled from 0 drives and 1 spare - not enough to start the array.

Any more suggestions I can try?

It looks dead to me :(
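
About the only thing left at that point is the dangerous last resort: recreating the array in place with --assume-clean, which rewrites only the superblocks and leaves the data blocks alone. It only works if the original parameters are reproduced exactly. A sketch based on the superblocks posted earlier (level 5, 64K chunk, left-symmetric, member order sdb1, sdd1, missing, sde1, sdf1); the true order is not guaranteed, so treat this as an illustration of the technique, not a command to run blindly:

Code:

# LAST RESORT - wrong parameters here will destroy the data for good
/sbin/mdadm --create /dev/md0 --assume-clean --level=5 --chunk=64 \
    --layout=left-symmetric --raid-devices=5 \
    /dev/sdb1 /dev/sdd1 missing /dev/sde1 /dev/sdf1
# Verify read-only before trusting anything
fsck -n /dev/md0

If fsck finds garbage, stop the array and reconsider the member order before trying again.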

