LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   mdadm question. Disk failed 3 out of 5. (https://www.linuxquestions.org/questions/linux-newbie-8/mdadm-question-disk-failed-3-out-of-5-a-742807/)

ufmale 07-25-2009 07:51 PM

mdadm question. Disk failed 3 out of 5.
 
I set up a RAID 5 or 6 array (I am not sure which)
across five disks using mdadm on Red Hat.

I just found out that the disks have been marked failed for quite some time.
I guess the data is probably lost, but I want to check with any experts here who may be able to help me retrieve it.

I am sure the disks themselves have not gone bad, since this has happened before:
if I take a "failed" disk and reformat it, I am sure I can still use it again. I think the problem is that mdadm marked the disks as failed somehow.

Code:

[root@eh3 /]# /sbin/mdadm --assemble /dev/md1 /dev/sd[g-k]1
mdadm: /dev/md1 assembled from 2 drives - not enough to start the array.
[root@eh3 /]# cat /proc/mdstat
Personalities :
md1 : inactive sdh1[3] sdi1[4] sdg1[1]
      1465151808 blocks
unused devices: <none>

Above is what is shown after I rebooted the machine and tried to assemble the array. Can someone tell me which disks have failed?
Are they sdh1[3], sdi1[4], and sdg1[1]?

Is there any way I can get the data off these disks?

esaym 07-26-2009 01:47 PM

what does "fdisk -l" and "for i in `ls /dev/sd[g-k]`; do smartctl -a $i; done"
show? Sounds like the drives aren't even plugged in...
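
For a quicker first pass, something like this prints just the overall health verdict for each drive (a sketch, assuming smartmontools is installed and those device nodes exist):

Code:

# Print only the overall SMART health assessment for each drive
for i in /dev/sd[g-k]; do
    echo "=== $i ==="
    smartctl -H $i
done

A drive that doesn't answer smartctl at all has probably dropped off the bus entirely, which would be a cabling or controller problem rather than an mdadm one.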

ufmale 07-27-2009 11:51 AM

sd[g-k]1 used to be the five disks for md1. After rebooting, I used "fdisk -l" to check the disks. I can see all of them, but mdadm refuses to assemble the array.

dxangel 07-27-2009 01:25 PM

what does mdadm --detail show?

ufmale 07-28-2009 08:18 AM

Quote:

Originally Posted by dxangel (Post 3621983)
what does mdadm --detail show?

It would not show any info, since mdadm refused to assemble the array:

Code:

[root@eh3 ~]# /sbin/mdadm --detail
mdadm: No devices given.
[root@eh3 ~]# /sbin/mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active.
[root@eh3 ~]# /sbin/mdadm --detail /dev/md1
mdadm: md device /dev/md1 does not appear to be active.
[root@eh3 ~]#


esaym 07-30-2009 07:38 AM

Try this:
mdadm --examine /dev/sd[g-k]1

You might be able to force it to start with:

/sbin/mdadm --assemble --force /dev/md1 /dev/sd[g-k]1
or
mdadm --assemble --scan --force

The --run option might do something, not sure:
/sbin/mdadm --assemble --run --force /dev/md1 /dev/sd[g-k]1


There are a lot of options for assemble mode; --force should do everything you need, though.
http://man-wiki.net/index.php/8:mdadm
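
Before forcing anything, it may be worth comparing what each member's superblock says, since forced assembly works from the members whose superblocks carry the most recent event counts. A sketch (the grep pattern assumes the 0.90-format output mdadm prints on this system):

Code:

# Compare event counts and update times across all members
/sbin/mdadm --examine /dev/sd[g-k]1 | egrep 'Update Time|Events|this'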

Post back results please :)

ufmale 08-03-2009 05:03 PM

Forced assembly does not work. It says 3 disks are not sufficient to start the array.

Here is the result with --examine. I have no idea what it means.

Code:

/sbin/mdadm --examine /dev/sd[g-k]1
/dev/sdg1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
     Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 1

    Update Time : Thu Jun 4 08:04:34 2009
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 5774d686 - correct
         Events : 0.391714

         Layout : left-symmetric
     Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    1      8      129        1      active sync  /dev/sdi1
  0     0      8       97        0      active sync  /dev/sdg1
  1     1      8      129        1      active sync  /dev/sdi1
  2     2      0        0        2      faulty removed
  3     3      8      145        3      active sync
  4     4      8      161        4      active sync
/dev/sdh1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
     Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 1

    Update Time : Mon Jun 15 13:22:36 2009
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 3
  Spare Devices : 0
       Checksum : 5783a4d7 - correct
         Events : 0.391714

         Layout : left-symmetric
     Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    3      8      145        3      active sync
  0     0      8       97        0      active sync  /dev/sdg1
  1     1      0        0        1      faulty removed
  2     2      0        0        2      faulty removed
  3     3      8      145        3      active sync
  4     4      8      161        4      active sync
/dev/sdi1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
     Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 1

    Update Time : Mon Jun 15 13:22:36 2009
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 3
  Spare Devices : 0
       Checksum : 5783a4e9 - correct
         Events : 0.391714

         Layout : left-symmetric
     Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    4      8      161        4      active sync
  0     0      8       97        0      active sync  /dev/sdg1
  1     1      0        0        1      faulty removed
  2     2      0        0        2      faulty removed
  3     3      8      145        3      active sync
  4     4      8      161        4      active sync
[root@eh3 /]#

esaym 08-03-2009 05:19 PM

That is only showing 3 disks; you are missing /dev/sdj1 and /dev/sdk1. Are the others even plugged in?

What does smartctl -a /dev/sdj && smartctl -a /dev/sdk say?
Also check cat /proc/partitions.
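
If some of those device nodes are missing entirely, the kernel never saw the drives at boot. A quick sketch to confirm what is actually attached (assuming smartmontools again):

Code:

# Every partition the kernel currently knows about
cat /proc/partitions
# Identity line for each member that should exist
for i in /dev/sd[g-k]; do smartctl -i $i | grep -i model; done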

ufmale 08-10-2009 08:55 AM

Quote:

Originally Posted by esaym (Post 3630136)
That is only showing 3 disks, you are missing /dev/sdj1 and /dev/sdk1. Are the others even plugged in?

What does smartctl -a /dev/sdj && smartctl -a /dev/sdk say?
Also cat /proc/partitions

You are right... I can't believe I missed that.
Here is what I did after rebooting the machine and making sure
all the disks are seen by fdisk.


Code:

[root@eh3]# /sbin/mdadm --examine /dev/sd[b-f]1
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.00
          UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
    Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
  Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Mon Aug 10 09:50:50 2009
          State : clean
 Active Devices : 3
Working Devices : 4
 Failed Devices : 3
  Spare Devices : 1
      Checksum : 5f428037 - correct
        Events : 0.62953324

        Layout : left-symmetric
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    0      8      17        0      active sync  /dev/sdb1
  0    0      8      17        0      active sync  /dev/sdb1
  1    1      0        0        1      faulty removed
  2    2      0        0        2      faulty removed
  3    3      8      65        3      active sync  /dev/sde1
  4    4      8      81        4      active sync  /dev/sdf1
  5    5      8      33        2      spare  /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.00
          UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
    Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
  Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Mon Aug 10 09:50:50 2009
          State : clean
 Active Devices : 3
Working Devices : 4
 Failed Devices : 3
  Spare Devices : 1
      Checksum : 5f42804e - correct
        Events : 0.62953324

        Layout : left-symmetric
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    5      8      33        5      spare  /dev/sdc1
  0    0      8      17        0      active sync  /dev/sdb1
  1    1      0        0        1      faulty removed
  2    2      0        0        2      faulty removed
  3    3      8      65        3      active sync  /dev/sde1
  4    4      8      81        4      active sync  /dev/sdf1
  5    5      8      33        5      spare  /dev/sdc1
/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.00
          UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
    Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
  Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Sat Aug  8 19:51:12 2009
          State : clean
 Active Devices : 4
Working Devices : 5
 Failed Devices : 1
  Spare Devices : 1
      Checksum : 57d0a1b8 - correct
        Events : 0.748650

        Layout : left-symmetric
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    1      8      49        1      active sync  /dev/sdd1
  0    0      8      17        0      active sync  /dev/sdb1
  1    1      8      49        1      active sync  /dev/sdd1
  2    2      0        0        2      faulty removed
  3    3      8      65        3      active sync  /dev/sde1
  4    4      8      81        4      active sync  /dev/sdf1
  5    5      8      33        5      spare  /dev/sdc1
/dev/sde1:
          Magic : a92b4efc
        Version : 00.90.00
          UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
    Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
  Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Mon Aug 10 09:50:50 2009
          State : clean
 Active Devices : 3
Working Devices : 4
 Failed Devices : 3
  Spare Devices : 1
      Checksum : 5f42806f - correct
        Events : 0.62953325

        Layout : left-symmetric
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    3      8      65        3      active sync  /dev/sde1
  0    0      8      17        0      active sync  /dev/sdb1
  1    1      0        0        1      faulty removed
  2    2      0        0        2      faulty removed
  3    3      8      65        3      active sync  /dev/sde1
  4    4      8      81        4      active sync  /dev/sdf1
  5    5      8      33        2      spare  /dev/sdc1
/dev/sdf1:
          Magic : a92b4efc
        Version : 00.90.00
          UUID : 9a108dd8:d8fd3620:df7c4ee0:aaa5350e
  Creation Time : Tue Mar 24 19:30:08 2009
    Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
  Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Mon Aug 10 09:50:50 2009
          State : clean
 Active Devices : 3
Working Devices : 4
 Failed Devices : 3
  Spare Devices : 1
      Checksum : 5f428083 - correct
        Events : 0.62953326

        Layout : left-symmetric
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    4      8      81        4      active sync  /dev/sdf1
  0    0      8      17        0      active sync  /dev/sdb1
  1    1      0        0        1      faulty removed
  2    2      0        0        2      faulty removed
  3    3      8      65        3      active sync  /dev/sde1
  4    4      8      81        4      active sync  /dev/sdf1
  5    5      8      33        2      spare  /dev/sdc1

Also, which disks have actually failed here? Is it sdd1?
Or sdc1 and sdf1? I have some spare disks that I can add to the RAID.
Code:

[root@eh3]# cat /proc/mdstat
Personalities : [raid5]
md0 : active raid5 sdb1[0] sdc1[5] sdf1[4] sde1[3] sdd1[6](F)
      1953535744 blocks level 5, 64k chunk, algorithm 2 [5/3] [U__UU]

unused devices: <none>
[root@eh3]#


esaym 08-10-2009 01:01 PM

Looks like the array changed from /dev/md1 /dev/sd[g-k]1
to /dev/md0 /dev/sd[b-f]1?

Yes:

Quote:

md0 : active raid5 sdb1[0] sdc1[5] sdf1[4] sde1[3] sdd1[6](F)
1953535744 blocks level 5, 64k chunk, algorithm 2 [5/3] [U__UU]
That shows 5 devices with 2 missing ("[5/3] [U__UU]"), and member 6, "sdd1[6](F)", is marked failed. Re-add it with:


mdadm --add /dev/md0 /dev/sdd1

I take it this is a 4-disk array with one spare?
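
For anyone reading along, that mdstat line decodes as follows (annotation only, restating the output already posted):

Code:

# md0 : active raid5 sdb1[0] sdc1[5] sdf1[4] sde1[3] sdd1[6](F)
#       1953535744 blocks level 5, 64k chunk, algorithm 2 [5/3] [U__UU]
#
# sdX1[n]  -> member n of the array; (F) marks that member faulty
# [5/3]    -> 5 devices expected, only 3 currently active
# [U__UU]  -> slots 0, 3 and 4 are up (U); slots 1 and 2 are down (_)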

ufmale 08-10-2009 01:43 PM

Quote:

Originally Posted by esaym (Post 3638172)
Looks like the array changed from /dev/md1 /dev/sd[g-k]1
to /dev/md0 /dev/sd[b-f]1?

Yes:



That shows 5 devices with 2 missing ("[5/3] [U__UU]"), and member 6, "sdd1[6](F)", is marked failed. Re-add it with:


mdadm --add /dev/md0 /dev/sdd1

I take it this is a 4-disk array with one spare?


I set up 2 RAID arrays, md0 and md1. Both of them are currently failed,
and I am trying to fix md0 first.

I ran --remove /dev/sdd1 and then --add /dev/sdd1 to put it back, but it still shows
[U__UU]. However, the "sdd1[6](F)" is gone; it now shows "sdd1[5]" with no "F". How can we tell which disks are failed?
Code:

# cat /proc/mdstat
Personalities : [raid5]
md0 : active raid5 sdd1[5] sdb1[0] sdc1[6] sdf1[4] sde1[3]
      1953535744 blocks level 5, 64k chunk, algorithm 2 [5/3] [U__UU]

By the way, this is a 5-disk array set up with RAID 5.
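
For reference, the per-device state is easier to read from mdadm's own reports than from /proc/mdstat; a sketch using only commands already seen in this thread:

Code:

# Per-device table (works once the array is assembled, even degraded)
/sbin/mdadm --detail /dev/md0
# Each member's own idea of its slot and state
/sbin/mdadm --examine /dev/sd[b-f]1 | grep -E '^/dev/|^this'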

ufmale 08-10-2009 02:21 PM

Quote:

Originally Posted by ufmale (Post 3638217)
I set up 2 RAID arrays, md0 and md1. Both of them are currently failed,
and I am trying to fix md0 first.

I ran --remove /dev/sdd1 and then --add /dev/sdd1 to put it back, but it still shows
[U__UU]. However, the "sdd1[6](F)" is gone; it now shows "sdd1[5]" with no "F". How can we tell which disks are failed?
Code:

# cat /proc/mdstat
Personalities : [raid5]
md0 : active raid5 sdd1[5] sdb1[0] sdc1[6] sdf1[4] sde1[3]
      1953535744 blocks level 5, 64k chunk, algorithm 2 [5/3] [U__UU]

By the way, this is a 5-disk array set up with RAID 5.


I rebooted the machine again and tried to reassemble it:


Code:

# /sbin/mdadm --assemble --update=summaries  --force /dev/md0 /dev/sd[b-f]1
mdadm: /dev/md0 assembled from 3 drives and 2 spares - not enough to start the array.
# /sbin/mdadm --stop /dev/md0
[root@evvspeech3 charoe]# /sbin/mdadm --assemble --update=super-minor --run --force /dev/md0 /dev/sd[b-f]1
mdadm: failed to RUN_ARRAY /dev/md0: Invalid argument
# cat /proc/mdstat
Personalities : [raid5]
md0 : inactive sdb1[0] sdc1[6] sdd1[5] sdf1[4] sde1[3]
      2441919680 blocks
unused devices: <none>

I am really confused about the spare disks. I don't remember setting up any disk as a spare. Does mdadm assign one automatically?
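
One more thing worth checking when RUN_ARRAY fails with "Invalid argument": the kernel log usually records the real reason. A sketch:

Code:

# The md driver logs why it refused to start the array
dmesg | tail -n 20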

esaym 08-10-2009 04:22 PM

Yes, I think spares are handled automatically. If the array is working right and you add another drive with --add, it will be added as a spare. I don't know why you are now showing 2 spares; the only way to see spares is with /sbin/mdadm --examine /dev/sd[b-f]1, and in your last post sdc was the spare. I guess you could try to add sdc and sdd back and try to reassemble. Any reason for the "--update=super-minor"? That updates the superblock of each drive, and the superblock is the only place where array info is stored, so if that gets messed up....
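
Before experimenting further, it would also be sensible to keep a copy of what every superblock says now, so there is a record of the original slot assignments. A sketch:

Code:

# Save each member's superblock report before trying anything else
for i in /dev/sd[b-f]1; do
    /sbin/mdadm --examine $i > /root/examine.$(basename $i).txt
done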

ufmale 08-11-2009 02:21 PM

Quote:

Originally Posted by esaym (Post 3638392)
Yes, I think spares are handled automatically. If the array is working right and you add another drive with --add, it will be added as a spare. I don't know why you are now showing 2 spares; the only way to see spares is with /sbin/mdadm --examine /dev/sd[b-f]1, and in your last post sdc was the spare. I guess you could try to add sdc and sdd back and try to reassemble. Any reason for the "--update=super-minor"? That updates the superblock of each drive, and the superblock is the only place where array info is stored, so if that gets messed up....


I wasn't sure what I was doing; I just tried different options, including "--update=super-minor". Now, after rebooting a couple of times, it does not seem to assemble at all. I checked with fdisk and saw that all the disks are there.
I tried different things: taking out one disk at a time, or taking sdc and sdd out and putting them back. Nothing works.

Code:

# /sbin/mdadm --assemble --force /dev/md0 /dev/sd[b-f]1
mdadm: /dev/md0 assembled from 0 drives and 1 spare - not enough to start the array.

Any more suggestions I can try?

esaym 08-11-2009 02:48 PM

Quote:

Originally Posted by ufmale (Post 3639625)
I wasn't sure what I was doing; I just tried different options, including "--update=super-minor". Now, after rebooting a couple of times, it does not seem to assemble at all. I checked with fdisk and saw that all the disks are there.
I tried different things: taking out one disk at a time, or taking sdc and sdd out and putting them back. Nothing works.

Code:

# /sbin/mdadm --assemble --force /dev/md0 /dev/sd[b-f]1
mdadm: /dev/md0 assembled from 0 drives and 1 spare - not enough to start the array.

Any more suggestions I can try?

It looks dead to me :(
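
About the only thing left at that point is the dangerous last resort: recreating the array in place with --assume-clean, which rewrites only the superblocks and leaves the data blocks alone. It only works if the original parameters are reproduced exactly. A sketch based on the superblocks posted earlier (level 5, 64K chunk, left-symmetric, member order sdb1, sdd1, missing, sde1, sdf1); the true order is not guaranteed, so treat this as an illustration of the technique, not a command to run blindly:

Code:

# LAST RESORT - wrong parameters here will destroy the data for good
/sbin/mdadm --create /dev/md0 --assume-clean --level=5 --chunk=64 \
    --layout=left-symmetric --raid-devices=5 \
    /dev/sdb1 /dev/sdd1 missing /dev/sde1 /dev/sdf1
# Verify read-only before trusting anything
fsck -n /dev/md0

If fsck finds garbage, stop the array and reconsider the member order before trying again.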

