LinuxQuestions.org
Old 02-26-2014, 11:38 AM   #1
elcattivo
LQ Newbie
 
Registered: Feb 2014
Posts: 2

Rep: Reputation: Disabled
Raid 5 troubles because of failed disk(s)


Hi!

--- SOLVED ---

See bottom for solution.


As you can see, I'm new to this forum, mainly because up until now my OpenMediaVault box just worked, and when it didn't I was able to troubleshoot with Google, PhD.

But now I ran into a problem which I can't solve...

I'm running OpenMediaVault with 1 disk for the OS and 4 disks (incl. 1 spare) in a RAID 5 array.

I had a failed disk some time ago, but the spare took over without a problem. Since I wanted to upgrade from 2 TB disks to 3 TB disks, I didn't replace the failed disk right away because I wanted to look up how to do the transition properly. Now I know... replace disk, rebuild, rinse, grow...
But this information came too late. Now it appears that at least one other disk has gone faulty on me, and the whole RAID isn't starting up.

In /var/log/messages I found some entries which led me to the following thread:
http://www.linuxquestions.org/questi...-array-416853/

Outputs from my system:

Code:
[    1.527927] ata1: PATA max UDMA/100 cmd 0xd480 ctl 0xd400 bmdma 0xd800 irq 17
[    1.527933] ata2: PATA max UDMA/100 cmd 0xdc00 ctl 0xd880 bmdma 0xd808 irq 17
[    1.617953] ata3: SATA max UDMA/133 cmd 0xb800 ctl 0xc080 bmdma 0xb880 irq 19
[    1.617961] ata4: SATA max UDMA/133 cmd 0xc000 ctl 0xbc00 bmdma 0xb888 irq 19
[    6.625862] md: raid6 personality registered for level 6
[    6.625868] md: raid5 personality registered for level 5
[    6.625873] md: raid4 personality registered for level 4
[    6.633502] mdadm: sending ioctl 1261 to a partition!
[    6.633509] mdadm: sending ioctl 1261 to a partition!
[    6.633915] mdadm: sending ioctl 1261 to a partition!
[    6.633921] mdadm: sending ioctl 1261 to a partition!
[    6.635898] mdadm: sending ioctl 1261 to a partition!
[    6.635904] mdadm: sending ioctl 1261 to a partition!
[    6.636153] mdadm: sending ioctl 1261 to a partition!
[    6.636159] mdadm: sending ioctl 1261 to a partition!
[    6.641750] md: md0 stopped.
[    6.642371] mdadm: sending ioctl 1261 to a partition!
[    6.642378] mdadm: sending ioctl 1261 to a partition!
[    6.675037] md: bind<sdb>
[    6.677909] md: bind<sdd1>
[    6.677970] md: could not open unknown-block(8,65).
[    6.678032] md: md_import_device returned -6
[    6.678353] md: bind<sde>
[    6.678526] md: bind<sdc>
[    6.678560] md: kicking non-fresh sde from array!
[    6.678570] md: unbind<sde>
[    6.678578] md: export_rdev(sde)
[    6.693380] raid5: md0 is not clean -- starting background reconstruction
[    6.694041] raid5: allocated 4221kB for md0
[    6.695273] raid5: cannot start dirty degraded array for md0
[    6.696231] raid5: failed to run raid set md0
[    6.696291] md: pers->run() failed ...
[   11.335064] md: md127 stopped.

So far so ... not exactly similar but close enough.

Code:
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] 
md0 : inactive sdc[4] sdd1[3] sdb[5]
      5860538097 blocks super 1.2
       
unused devices: <none>
That is where the similarities end. Also, for the last few boots there was another RAID device, md127, which shouldn't be there in the first place. This time it was stopped automatically.

Removing the non-fresh device doesn't work:
Code:
# mdadm /dev/md0 --fail /dev/sde --remove /dev/sde
mdadm: set device faulty failed for /dev/sde:  No such device

My plan was to identify the failed disk(s) by serial number, match them to the physical disks and replace them with two of my 3 TB disks... But since I can't remove/fail/whatever the damaged disks, which is always the first step in the tutorials, I come to you for help.
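
Matching a device node to a physical disk by serial number can be sketched as follows. On most Linux systems the symlinks under /dev/disk/by-id carry the model and serial number of each ATA disk in their names. The function name `list_disk_ids` and its optional directory argument are assumptions made for this illustration, not something from the original setup.

```shell
#!/bin/sh
# List whole-disk ATA entries of a by-id directory as "device<TAB>id-name".
# The id-name of an ATA disk normally ends in the drive's serial number.
# The directory argument is hypothetical and defaults to /dev/disk/by-id.
list_disk_ids() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/ata-*; do
        [ -L "$link" ] || continue                        # glob matched nothing
        case "$link" in *-part[0-9]*) continue ;; esac    # skip partition links
        printf '%s\t%s\n' "$(readlink -f "$link")" "${link##*/}"
    done
}
```

Calling `list_disk_ids` without arguments prints one line per disk, so the serial printed on the label of the drive can be matched to /dev/sdX. If smartmontools is installed, `smartctl -i /dev/sde` reports the serial number as well.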

Thanks in advance,
Jakob

edit:

Output from mdadm -E /dev/sd[bcde]:
Code:
# mdadm -E /dev/sd[bcde]
/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 1a148a7f:4d6a55ca:86bdec2e:c2b06689
           Name : sarah:0  (local to host sarah)
  Creation Time : Fri Jun 24 19:29:50 2011
     Raid Level : raid6
   Raid Devices : 4

 Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
     Array Size : 7814041600 (3726.03 GiB 4000.79 GB)
  Used Dev Size : 3907020800 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 2b8df29d:f748fd42:d0fb5a64:339a3532

    Update Time : Mon Feb 24 03:27:03 2014
       Checksum : f75a325f - correct
         Events : 229601

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : .AAA ('A' == active, '.' == missing)
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 1a148a7f:4d6a55ca:86bdec2e:c2b06689
           Name : sarah:0  (local to host sarah)
  Creation Time : Fri Jun 24 19:29:50 2011
     Raid Level : raid6
   Raid Devices : 4

 Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
     Array Size : 7814041600 (3726.03 GiB 4000.79 GB)
  Used Dev Size : 3907020800 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : f06cff4d:23598a1f:924b8024:0a0a7259

    Update Time : Mon Feb 24 03:27:03 2014
       Checksum : 6b55e1e8 - correct
         Events : 229601

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : .AAA ('A' == active, '.' == missing)
mdadm: No md superblock detected on /dev/sdd.
/dev/sde:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 1a148a7f:4d6a55ca:86bdec2e:c2b06689
           Name : sarah:0  (local to host sarah)
  Creation Time : Fri Jun 24 19:29:50 2011
     Raid Level : raid6
   Raid Devices : 4

 Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
     Array Size : 7814041600 (3726.03 GiB 4000.79 GB)
  Used Dev Size : 3907020800 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 4e48e9f6:86928e3b:2eaf7905:163957fb

    Update Time : Mon Feb 24 03:27:03 2014
       Checksum : b31f0873 - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : spare
   Array State : .AAA ('A' == active, '.' == missing)
also:
Code:
# mdadm -E /dev/sd[bcde]?
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 1a148a7f:4d6a55ca:86bdec2e:c2b06689
           Name : sarah:0  (local to host sarah)
  Creation Time : Fri Jun 24 19:29:50 2011
     Raid Level : raid6
   Raid Devices : 4

 Avail Dev Size : 3907021954 (1863.01 GiB 2000.40 GB)
     Array Size : 7814041600 (3726.03 GiB 4000.79 GB)
  Used Dev Size : 3907020800 (1863.01 GiB 2000.39 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 362cad3b:4911c779:eaadb3aa:730c567c

    Update Time : Mon Feb 24 03:27:03 2014
       Checksum : 5c57a9e7 - correct
         Events : 229601

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : .AAA ('A' == active, '.' == missing)
So apparently I configured the raid with sdb, sdc, sde and sdd1.
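
The "kicking non-fresh sde" message in the kernel log fits the superblocks shown above: sdb, sdc and sdd1 all report Events 229601, while sde reports Events 0 and the role "spare". A small filter makes that comparison easier to read. This is only a sketch; the function name `summarize_examine` is made up for this example.

```shell
#!/bin/sh
# Condense `mdadm -E` output (read from stdin) to one line per device:
# device name, event counter and device role.
summarize_examine() {
    awk '
        $1 ~ /^\/dev\//                { dev = $1; sub(/:$/, "", dev) }
        $1 == "Events"                 { events = $3 }
        $1 == "Device" && $2 == "Role" {
            role = $0
            sub(/.*Role : /, "", role)     # keep only the text after the colon
            print dev, "events=" events, "role=" role
        }
    '
}
```

Usage would look like `mdadm -E /dev/sd[bcde] /dev/sdd1 2>/dev/null | summarize_examine`. Members whose event counter lags behind the others are the ones md refuses as non-fresh.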

Also I was able to get the array running again with:
mdadm --assemble /dev/md0 /dev/sdb /dev/sdc /dev/sde /dev/sdd1 --force
Code:
# more /proc/mdstat
Personalities : [raid6] [raid5] [raid4] 
md0 : active (auto-read-only) raid6 sdc[4] sdd1[3] sdb[5]
      3907020800 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/3] [_UUU]
      
unused devices: <none>
After rebooting I added /dev/sde to the RAID again and now it's rebuilding...
So basically the answer was one Google search, one --force and a few hours of clearing my head.
It was exactly the same problem as in the thread I linked above, so this thread contains no new info for this forum.
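
The recovery described above can be written down as a dry-run sketch. The `run` wrapper, the function name `recover_md0` and the initial `--stop` are assumptions added for this illustration: the post does not say whether the inactive array was stopped first, but an inactive md0 usually has to be released before it can be assembled again. By default the script only prints the commands.

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

recover_md0() {
    # Assumption: release the inactive array before re-assembling it.
    run mdadm --stop /dev/md0
    # Forced assembly, as quoted in the post.
    run mdadm --assemble /dev/md0 /dev/sdb /dev/sdc /dev/sde /dev/sdd1 --force
    # Check that md0 is active again (degraded, [_UUU]).
    run cat /proc/mdstat
    # Add the kicked disk back so the array can rebuild onto it.
    run mdadm /dev/md0 --add /dev/sde
}
```

Running `recover_md0` prints the four commands for review. Setting `DRY_RUN=0` as root would execute them, which should only be done after checking the device names against the `mdadm -E` output.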

But I still don't trust the whole thing... after the rebuild I will begin to migrate everything to my new disks. Just to be sure.

If you have any better solutions than my remove-old;add-new;rebuild;rinse;grow technique, I'm eager to hear them. But if this isn't the thread for it, please close/delete, whatever is appropriate.
I will add my experience to the other thread.
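
For reference, the remove-old;add-new;rebuild;rinse;grow technique could look roughly like this. The functions only print the commands. Their names, the example device /dev/sdf and the final `resize2fs` step (which assumes an ext2/3/4 filesystem sitting directly on the array) are assumptions for this sketch, not details from the original setup.

```shell
#!/bin/sh
# Print (do not run) the commands for swapping one member disk.
# $1 = array, $2 = old member, $3 = new disk.
print_swap_plan() {
    md="$1"; old="$2"; new="$3"
    printf 'mdadm %s --fail %s --remove %s\n' "$md" "$old" "$old"
    printf 'mdadm %s --add %s\n' "$md" "$new"
    printf 'mdadm --wait %s\n' "$md"      # block until the rebuild has finished
}

# Print the commands to run once every member sits on a larger disk.
# $1 = array.
print_grow_plan() {
    md="$1"
    printf 'mdadm --grow %s --size=max\n' "$md"
    printf 'mdadm --wait %s\n' "$md"
    printf 'resize2fs %s\n' "$md"         # assumption: ext2/3/4 directly on the array
}
```

For example, `print_swap_plan /dev/md0 /dev/sdb /dev/sdf` followed by `print_grow_plan /dev/md0` prints the full sequence. Only one disk should be swapped at a time, and only while the array still has redundancy left. Each rebuild has to finish before the next swap.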

Last edited by elcattivo; 02-26-2014 at 02:23 PM. Reason: New Information
 
Old 02-28-2014, 12:05 PM   #2
byau
Member
 
Registered: Sep 2009
Location: Los Angeles, CA
Posts: 33

Rep: Reputation: 5
btw, slightly off-topic: your messages seem to refer to SATA drives (2 TB and 3 TB?). You have to be careful using RAID 5 with large SATA volumes. Consumer SATA drives are typically specified at one unrecoverable read error (URE) per 10^14 bits read, which means you will get a URE on average once every 12.5 TB read. So in theory a rebuild that has to read 12.5 TB is more likely than not to hit a URE (about 63% if errors are independent), and a URE during a rebuild can leave a degraded RAID 5 unrecoverable.

If your RAID 5 is half that (6.25 TB), the risk is lower but still substantial (about 39% by the same estimate).

http://www.zdnet.com/has-raid5-stopp...ng-7000019939/
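
The 12.5 TB figure corresponds to a rate of one error per 10^14 bits (12.5 TB x 8 = 10^14 bits). A rough way to turn that into a probability is sketched below. It assumes independent bit errors at exactly the specified rate, which is a simplification, and the function name `ure_probability` is made up for this example.

```shell
#!/bin/sh
# Chance of at least one unrecoverable read error (URE) while reading
# $1 terabytes, assuming independent errors at 1 per 10^$2 bits
# (the exponent defaults to 14).
ure_probability() {
    awk -v tb="$1" -v e="${2:-14}" 'BEGIN {
        bits = tb * 8e12                  # 1 TB = 10^12 bytes = 8 * 10^12 bits
        p = 1 - exp(-bits / (10 ^ e))     # Poisson approximation
        printf "%.0f%%\n", p * 100
    }'
}

ure_probability 12.5    # prints 63%
ure_probability 6.25    # prints 39%
```

Under these assumptions, reading 12.5 TB gives about a 63% chance of at least one URE rather than near certainty, and a drive rated at 1 per 10^15 bits would drop that to about 10% (`ure_probability 12.5 15`).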
 
  

