11-27-2012, 01:49 PM   #1
carlosinfl
Drive Failed on Software RAID


I've got a Debian Linux system running 5 identical 2 TB drives in a RAID5 array as shown below:

Code:
fs3:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md2 : active raid5 sda3[0] sde3[4] sdd3[3] sdc3[2] sdb3[1](F)
      7806318592 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/4] [U_UUU]

md1 : active raid5 sda2[0] sde2[4] sdd2[3] sdc2[2] sdb2[1](F)
      1558528 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/4] [U_UUU]

md0 : active (auto-read-only) raid5 sda1[0] sde1[4] sdd1[3] sdc1[2] sdb1[1]
      3995648 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
According to the output above, the failed drive was /dev/sdb. I've since replaced the physical drive in the server, and the amber failure LED has gone away with the new drive in place. My question is: how do I repair / rebuild the array from this point? Do I need to manually partition the new drive and then add it into the array, or can I skip partitioning and just use the mdadm utility?

Code:
fs3:~# mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Tue Mar 22 13:53:16 2011
     Raid Level : raid5
     Array Size : 7806318592 (7444.69 GiB 7993.67 GB)
  Used Dev Size : 1951579648 (1861.17 GiB 1998.42 GB)
   Raid Devices : 5
  Total Devices : 5
    Persistence : Superblock is persistent

    Update Time : Tue Nov 27 14:49:44 2012
          State : clean, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : fs3:2  (local to host fs3)
           UUID : 29a919a4:4a740a7b:64b56f03:691635b9
         Events : 1443572

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       0        0        1      removed
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
       4       8       67        4      active sync   /dev/sde3

       1       8       19        -      faulty spare   /dev/sdb3

 
11-27-2012, 02:15 PM   #2
carlosinfl
***UPDATE***

I've removed the two failed partitions from /dev/md1 and /dev/md2:

Code:
fs3:~# mdadm --remove /dev/md1 /dev/sdb2
mdadm: hot removed /dev/sdb2 from /dev/md1
fs3:~# mdadm --remove /dev/md2 /dev/sdb3
mdadm: hot removed /dev/sdb3 from /dev/md2
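To double-check that both arrays now show that slot as removed, the state can be inspected again with something like the following (output not captured in this post):

Code:
cat /proc/mdstat
mdadm --detail /dev/md1
mdadm --detail /dev/md2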
 
11-29-2012, 07:03 AM   #3
carlosinfl
Nobody???
 
11-29-2012, 09:39 AM   #4
eantoranz
Well... I _guess_ you shouldn't need to do much to the RAID itself. Even with a missing disk, whatever was in the array up until now is still there, so you should be able to keep working with it as usual.

Now, there must be some way to tell mdadm to add the new disk into the missing slot, and that should be it.
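A minimal sketch of what that could look like, assuming the replacement drive shows up as /dev/sdb again and has already been partitioned to match the other members (the device and partition names here are just taken from the output above, not verified on your system):

Code:
# confirm the replacement drive is visible and check its partitions
fdisk -l /dev/sdb

# hot-add the matching partition back into the degraded array
mdadm /dev/md2 --add /dev/sdb3

# watch the rebuild progress
cat /proc/mdstat

The same --add step would then be repeated for the other degraded array (md1) with its corresponding partition.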
 
11-30-2012, 02:32 PM   #5
netfoot
Bear in mind that I haven't tested this, but here is what I would do:

Remove all of the partitions on the drive being replaced from their arrays, even the ones that have not failed.

Code:
mdadm /dev/md0 --remove /dev/sdb1
mdadm /dev/md1 --remove /dev/sdb2
mdadm /dev/md2 --remove /dev/sdb3
Partition the new drive exactly the same as the old one (one way to do that is sketched at the end of this post), and add those partitions back.

Code:
mdadm /dev/md0 --add /dev/sdb1
mdadm /dev/md1 --add /dev/sdb2
mdadm /dev/md2 --add /dev/sdb3
Then run cat /proc/mdstat and check whether the arrays are resyncing.

Once again, do this at your own risk...
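For the partitioning step above, one common approach is to copy the partition table from one of the healthy drives with sfdisk. A sketch only, untested here, assuming conventional (MBR) partition tables and that the replacement really is /dev/sdb, so double-check the device names before writing anything:

Code:
# dump the partition table of a surviving member to a file...
sfdisk -d /dev/sda > sda.parts

# ...review it, then apply the same layout to the replacement drive
sfdisk /dev/sdb < sda.parts

# verify that the two drives now match
fdisk -l /dev/sda /dev/sdb

If the drives use GPT rather than MBR, older sfdisk versions do not handle that, and a GPT-aware tool such as gdisk would be needed instead.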
 
12-05-2012, 01:07 PM   #6
carlosinfl
I'm unable to remove /dev/sdb1 from /dev/md0. I believe that array is used for swap, so the removal fails because the device is still in use:

Code:
fs3:~# mdadm /dev/md0 --remove /dev/sdb1
mdadm: hot remove failed for /dev/sdb1: Device or resource busy
 
12-05-2012, 01:53 PM   #7
carlosinfl
I've also turned off swap (I think), but the removal still fails:

Code:
fs3:~# swapoff -a
fs3:~# mdadm /dev/md0 --remove /dev/sdb1
mdadm: hot remove failed for /dev/sdb1: Device or resource busy
 
12-05-2012, 05:51 PM   #8
netfoot
Quote:
I'm unable to remove /dev/sdb1 from /dev/md0. I believe that array is used for swap, so the removal fails because the device is still in use.
So, you are using /dev/md0 as a swap area? That sounds complicated! :-)

Look at /proc/swaps to check what is being used as swap. If /dev/md0 is a swap area, you can use swapoff to stop using it. If that would leave you with insufficient swap (or none at all), first add some swap space temporarily: use mkswap on a suitable spare partition, or (more conveniently) use dd to create a large file and run mkswap on that. Either way, once the temporary swap space is prepared, use swapon to add it to the system. With that in place, you can free up /dev/md0 with swapoff and --remove /dev/sdb1 from the array.

After you replace the drive and --add the partitions back, you can reverse the process: use swapon to put /dev/md0 back into use as swap, then swapoff to free the temporary swap file or partition.
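Put together, the temporary-swap route might look roughly like this. This is only a sketch: the swap file size and location are arbitrary, and whether /dev/md0 really is your swap device, and whether /dev/sdb1 is still listed as an active member of md0, has to be checked against your own output first.

Code:
# see what is currently in use as swap
cat /proc/swaps

# create and enable a temporary 4 GB swap file on a filesystem with enough free space
dd if=/dev/zero of=/root/tempswap bs=1M count=4096
chmod 600 /root/tempswap
mkswap /root/tempswap
swapon /root/tempswap

# stop using the RAID device as swap
swapoff /dev/md0

# md0 showed [UUUUU] in the first post, i.e. sdb1 was never marked faulty there,
# so it may need to be failed before mdadm will allow the hot remove
mdadm /dev/md0 --fail /dev/sdb1
mdadm /dev/md0 --remove /dev/sdb1

# later, once the new partition has been added back and the resync has finished:
swapon /dev/md0
swapoff /root/tempswap
rm /root/tempswap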
 
  

