Old 03-25-2011, 12:48 PM   #1
pjakobs
LQ Newbie
 
Registered: Mar 2011
Posts: 3

Rep: Reputation: 0
RAID 6 Array coming up with all disks as spare


I have been running a server with an increasingly large md array and have always been plagued by intermittent disk faults. For a long time, I attributed those to either temperature or power glitches.
I had just embarked on a quest to a) lower the case and drive temperatures and b) rule out power problems. The drives were running between 43 and 47°C, sometimes peaking at 52°C, so I added more case fan power and made sure the drive cage was in the airflow (it has its own fan, too). I also upgraded my power supply and made very sure that all the connectors are good.

The array is currently a RAID6 with five Seagate 1.5TB drives.

When everything seemed to be working fine, I looked at my SMART logs and found that two of my drives (both well over 14,000 operating hours) were showing uncorrectable bad blocks. Since it's RAID6, I figured I couldn't do much harm: I took one of them out of the array, ran a badblocks test on it, zeroed the blocks that were reported bad (figuring the drive's defect management would remap them to a good part of the disk), and zeroed the superblock.
I then added it back to the pack and the resync started.
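(For reference, that sequence was roughly the following; /dev/sdX1 stands in for the affected member partition, which I haven't named here, and the block size passed to dd has to match the one badblocks used.)
Code:
smartctl -a /dev/sdX                        # check reallocated / pending sector counts
badblocks -sv /dev/sdX1 > bad-blocks.txt    # read-only scan; default block size is 1024 bytes
# zero each block that badblocks reported, so the drive's defect management
# can remap it to a spare sector (this destroys the data in that block):
dd if=/dev/zero of=/dev/sdX1 bs=1024 seek=<block number> count=1
mdadm --zero-superblock /dev/sdX1           # wipe the old md metadata
mdadm /dev/md1 --add /dev/sdX1              # put it back into the array; resync starts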
At around 50%, a second drive decided to go, and shortly thereafter a third. Now, with only two of five drives left, RAID6 will fail. Fine. At least no more data will be written to it. However, now I cannot reassemble the array anymore. Whenever I try, I get this:
Code:
mdadm --assemble --scan
mdadm: /dev/md1 assembled from 2 drives and 2 spares - not enough to start the array
Yeah, makes sense, however:
Code:
cat /proc/mdstat 
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [linear] 
md1 : inactive sdf1[4](S) sde1[6](S) sdg1[1](S) sdh1[5](S) sdd1[2](S)
      7325679320 blocks super 1.0
       
md0 : active raid1 sdb2[0] sdc2[1]
      312464128 blocks [2/2] [UU]
      bitmap: 3/149 pages [12KB], 1024KB chunk
which is not fine. I'm sure that three devices are fine (normally, a failed device would simply rejoin the array, skipping most of the resync thanks to the bitmap), so I should be able to reassemble the array from the two good ones plus the one that failed last, then add the one that failed during the resync, and finally re-add the original offender. However, I have no idea how to get the drives out of the "(S)" (spare) state.
Code:
 mdadm --examine /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : d79d81cc:fff69625:5fb4ab4c:46d45217
           Name : linux-z2qv:1
  Creation Time : Wed May 26 12:49:07 2010
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 2930271728 (1397.26 GiB 1500.30 GB)
     Array Size : 8790810624 (4191.79 GiB 4500.90 GB)
  Used Dev Size : 2930270208 (1397.26 GiB 1500.30 GB)
   Super Offset : 2930271984 sectors
          State : active
    Device UUID : d7646629:eddb4e80:e8b695e9:f89bc31e

Internal Bitmap : 2 sectors from superblock
    Update Time : Fri Mar 25 15:24:14 2011
       Checksum : 3be5349b - correct
         Events : 77338

     Chunk Size : 4096K

   Device Role : Active device 2
   Array State : A.A.. ('A' == active, '.' == missing)
Code:
mdadm --examine /dev/sde1
/dev/sde1:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : d79d81cc:fff69625:5fb4ab4c:46d45217
           Name : linux-z2qv:1
  Creation Time : Wed May 26 12:49:07 2010
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 2930271728 (1397.26 GiB 1500.30 GB)
     Array Size : 8790810624 (4191.79 GiB 4500.90 GB)
  Used Dev Size : 2930270208 (1397.26 GiB 1500.30 GB)
   Super Offset : 2930271984 sectors
          State : active
    Device UUID : 86a3e0df:9cf5a8a9:966216b4:bde4c89b

Internal Bitmap : 2 sectors from superblock
    Update Time : Fri Mar 25 15:24:14 2011
       Checksum : 633f93d1 - correct
         Events : 77338

     Chunk Size : 4096K

   Device Role : spare
   Array State : A.A.. ('A' == active, '.' == missing)
Code:
mdadm --examine /dev/sdf1
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : d79d81cc:fff69625:5fb4ab4c:46d45217
           Name : linux-z2qv:1
  Creation Time : Wed May 26 12:49:07 2010
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 2930271728 (1397.26 GiB 1500.30 GB)
     Array Size : 8790810624 (4191.79 GiB 4500.90 GB)
  Used Dev Size : 2930270208 (1397.26 GiB 1500.30 GB)
   Super Offset : 2930271984 sectors
          State : active
    Device UUID : a74b9a85:61a932c1:22f3bc8c:1632bd08

Internal Bitmap : 2 sectors from superblock
    Update Time : Fri Mar 25 15:24:14 2011
       Checksum : 661db4e1 - correct
         Events : 77338

     Chunk Size : 4096K

   Device Role : Active device 0
   Array State : A.A.. ('A' == active, '.' == missing)
Code:
mdadm --examine /dev/sdg1
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : d79d81cc:fff69625:5fb4ab4c:46d45217
           Name : linux-z2qv:1
  Creation Time : Wed May 26 12:49:07 2010
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 2930271728 (1397.26 GiB 1500.30 GB)
     Array Size : 8790810624 (4191.79 GiB 4500.90 GB)
  Used Dev Size : 2930270208 (1397.26 GiB 1500.30 GB)
   Super Offset : 2930271984 sectors
          State : active
    Device UUID : eafb97a3:61eaef07:4b87cd7d:9a9bcdec

Internal Bitmap : 2 sectors from superblock
    Update Time : Fri Mar 25 15:24:14 2011
       Checksum : 9ff9bc86 - correct
         Events : 77338

     Chunk Size : 4096K

   Device Role : spare
   Array State : A.A.. ('A' == active, '.' == missing)
Code:
mdadm --examine /dev/sdh1
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : d79d81cc:fff69625:5fb4ab4c:46d45217
           Name : linux-z2qv:1
  Creation Time : Wed May 26 12:49:07 2010
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 2930271728 (1397.26 GiB 1500.30 GB)
     Array Size : 8790810624 (4191.79 GiB 4500.90 GB)
  Used Dev Size : 2930270208 (1397.26 GiB 1500.30 GB)
   Super Offset : 2930271984 sectors
          State : active
    Device UUID : 6140c7d6:807684f5:0d1fd895:32411a7d

Internal Bitmap : 2 sectors from superblock
    Update Time : Fri Mar 25 15:20:23 2011
       Checksum : 6919c338 - correct
         Events : 77331

     Chunk Size : 4096K

   Device Role : spare
   Array State : A.AAA ('A' == active, '.' == missing)
Going by the event counts, sdd1 through sdg1 should be OK:
Code:
[/dev/sdd1] Events : 77338
[/dev/sde1] Events : 77338
[/dev/sdf1] Events : 77338
[/dev/sdg1] Events : 77338
[/dev/sdh1] Events : 77331
The update times also show them to be in agreement:
Code:
[/dev/sdd1] Update Time : Fri Mar 25 15:24:14 2011
[/dev/sde1] Update Time : Fri Mar 25 15:24:14 2011
[/dev/sdf1] Update Time : Fri Mar 25 15:24:14 2011
[/dev/sdg1] Update Time : Fri Mar 25 15:24:14 2011
[/dev/sdh1] Update Time : Fri Mar 25 15:20:23 2011
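(For the record, I pulled those two fields from all five members in one go with something like:)
Code:
mdadm --examine /dev/sd[defgh]1 | grep -E '^/dev|Events|Update Time'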
So, going by the data in the superblocks, I should be able to start /dev/sdd1, /dev/sdf1 and /dev/sde1 as a running yet degraded RAID, then add /dev/sdg1, wait for the resync, and finally add /dev/sdh1.
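I'm guessing that would look something like the following, though I don't know whether --force will get past the "spare" device role that two of the superblocks now carry:
Code:
mdadm --stop /dev/md1
mdadm --assemble --force --run /dev/md1 /dev/sdd1 /dev/sdf1 /dev/sde1
cat /proc/mdstat                    # hopefully md1 comes up active but degraded
mdadm /dev/md1 --add /dev/sdg1      # rebuild the fourth member
# ...wait for the resync to finish, then:
mdadm /dev/md1 --add /dev/sdh1      # finally re-add the original offender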

Any ideas/thoughts/suggestions?

TIA

pj
 
Old 04-18-2011, 10:30 AM   #2
xaminmo
LQ Newbie
 
Registered: Feb 2010
Location: TX
Distribution: Debian
Posts: 10

Rep: Reputation: 4
I'm assuming the following:
* Your array was up with data before this
* Something occurred (disconnect, outage, remove/re-add of drives)
* Now the array is inaccessible and all of the drives are spare
* Your OLD array UUID was d79d81cc:fff69625:5fb4ab4c:46d45217
* Your OLD metadata type was 1.0
* Your OLD chunk size was 4096K
* Your OLD raid type was Level 6 with 5 devices

If any of the above assumptions are not correct, then do not proceed.

Basically, in the above situation, you would need to re-create your array without wiping what's on it. This is where --assume-clean comes into play.
Quote:
cp -p /etc/mdadm/mdadm.conf /etc/mdadm/mdadm.conf.bak   # back up the old config first
mdadm --stop /dev/md1
mdadm --create --verbose --assume-clean -l6 -n5 -c4096 --metadata=1.0 \
--uuid=d79d81cc:fff69625:5fb4ab4c:46d45217 /dev/md1 \
/dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 missing
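Before going any further, double-check that the re-created array matches the old geometry; a wrong device order or chunk size in the --create step would leave the data scrambled, so verify before you trust it:
Quote:
mdadm --detail /dev/md1     # verify level, chunk size, size and device order
# optionally mount a filesystem/LV from the array read-only (or fsck -n it)
# to confirm the data is intact before writing anything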
If everything looks good, then
Quote:
mdadm --readwrite /dev/md1
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
/usr/share/mdadm/checkarray -l /dev/md1 # Slow rescan
Now, if you have a VG on it, you can run vgscan and then vgchange -ay $vgname to bring your VG online.
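For example, assuming your VG is named datavg:
Quote:
vgscan                      # rescan block devices for LVM metadata
vgchange -ay datavg         # activate all LVs in that VG
lvs                         # confirm the logical volumes show up as active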

Last edited by xaminmo; 04-18-2011 at 10:31 AM. Reason: Clarify "recreate array"
 
1 members found this post helpful.
Old 04-23-2011, 10:52 AM   #3
pjakobs
LQ Newbie
 
Registered: Mar 2011
Posts: 3

Original Poster
Rep: Reputation: 0
--assume-clean is something I wasn't aware of; from what I read, it's exactly what I would have needed. However, since I had a recent backup, I simply went ahead and re-created the array. In the (hopefully unlikely) event that I need this again in the future, I shall remember your hints here.

Thanks a lot

pj
 
  

