Old 11-02-2010, 09:46 AM   #1
mvanhorn
LQ Newbie
 
Registered: Mar 2010
Posts: 8

Rep: Reputation: 0
RAID6 rebuild weirdness


I am in a weird situation with a RAID6 array that has just had a disk replaced. It is composed of 8 disks, sd[b-i], and sdh went bad. I took the system down, replaced the disk, and brought it back up, but now I can't mount the array (/dev/md0), and I'm not sure what is happening. Here's some output:


Code:
[root@hostname ~]# mdadm --query /dev/md0 
/dev/md0: 0.00KiB raid6 8 devices, 1 spare. Use mdadm --detail for more detail.
How is this possible? Doesn't raid 6, by definition, have *2* spares?

Code:
[root@hostname ~]# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Mon Jun 28 10:46:51 2010
     Raid Level : raid6
    Device Size : 1953511936 (1863.01 GiB 2000.40 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Nov  2 08:19:52 2010
          State : dirty, degraded
 Active Devices : 7
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 1

     Chunk Size : 64K

           UUID : 6b8b4567:327b23c6:643c9869:66334873
         Events : 0.34591155

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1
       4       8       81        4      active sync   /dev/sdf1
       5       8       97        5      active sync   /dev/sdg1
       6       8      113        6      spare rebuilding   /dev/sdh1
       7       8      129        7      active sync   /dev/sdi1
If there are 7 disks working, and it's rebuilding the other, why can't I go ahead and mount the array? I've only lost one disk!

Code:
[root@hostname ~]# cat /proc/mdstat
Personalities : [raid6] 
md0 : inactive sdh1[6] sdb1[0] sdi1[7] sdg1[5] sdf1[4] sde1[3] sdd1[2] sdc1[1]
      15628095488 blocks
       
unused devices: <none>
I don't understand why I can't just mount the array. Instead, I get a superblock error:

Code:
[root@hostname ~]# mount /dev/md0
mount: /dev/md0: can't read superblock
Can someone please shed some light on what is going on?

Thanks!

Last edited by mvanhorn; 11-02-2010 at 10:08 AM.
 
Old 11-02-2010, 10:02 AM   #2
mesiol
Member
 
Registered: Nov 2008
Location: Lower Saxony, Germany
Distribution: CentOS, RHEL, Solaris 10, AIX, HP-UX
Posts: 731

Rep: Reputation: 137
Hi,

the RAID cannot be mounted because it is inactive, as mdstat shows.

Did you try the following?
Code:
mdadm --run /dev/md0
Be careful with all of these commands if there is data on your RAID.
 
Old 11-02-2010, 10:07 AM   #3
mvanhorn
LQ Newbie
 
Registered: Mar 2010
Posts: 8

Original Poster
Rep: Reputation: 0
I've now downloaded a newer version of mdadm, and it shows that the array is active:

Code:
[root@chanute04 mdadm-3.1.4]# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Mon Jun 28 10:46:51 2010
     Raid Level : raid6
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Nov  2 08:19:52 2010
          State : active, degraded, Not Started
 Active Devices : 7
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 6b8b4567:327b23c6:643c9869:66334873
         Events : 0.34591155

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1
       4       8       81        4      active sync   /dev/sdf1
       5       8       97        5      active sync   /dev/sdg1
       6       8      113        6      spare rebuilding   /dev/sdh1
       7       8      129        7      active sync   /dev/sdi1
So, why can't I mount it?

And, yes, I have tried --run:

Code:
[root@hostname]# mdadm --run /dev/md0
mdadm: failed to run array /dev/md0: Input/output error
And, yes, the raid does have data on it; I was expecting to still be able to get to the data while it was adding the new drive back in. I don't understand why I can't do so.

Last edited by mvanhorn; 11-02-2010 at 10:09 AM.
 
Old 11-02-2010, 10:12 AM   #4
mesiol
Member
 
Registered: Nov 2008
Location: Lower Saxony, Germany
Distribution: CentOS, RHEL, Solaris 10, AIX, HP-UX
Posts: 731

Rep: Reputation: 137
Hi,

did the RAID autostart during boot, or did you assemble it manually? I'm also not sure what is causing this behaviour.

Can you stop your RAID and start it manually? I cannot see any reason why mdstat reports the array as inactive.
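Something like this, for example (a rough sketch only; the member names are taken from your mdstat output, so adjust them for your setup):

Code:
# stop the inactive array, then try to assemble and start it again
mdadm --stop /dev/md0
mdadm --assemble /dev/md0 /dev/sd[b-i]1
# if members are missing, mdadm may need --run to start a degraded array
mdadm --run /dev/md0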
 
Old 11-02-2010, 10:20 AM   #5
mvanhorn
LQ Newbie
 
Registered: Mar 2010
Posts: 8

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by mesiol View Post
did the RAID autostart during boot, or did you assemble it manually? I'm also not sure what is causing this behaviour.

Can you stop your RAID and start it manually? I cannot see any reason why mdstat reports the array as inactive.
No, it does not start at boot. I'm trying to start it manually, and that isn't working. I also don't understand why it's showing up as inactive; with (now) 8 disks of an 8-disk raid6 array, it should be just fine.
 
Old 11-02-2010, 10:33 AM   #6
mvanhorn
LQ Newbie
 
Registered: Mar 2010
Posts: 8

Original Poster
Rep: Reputation: 0
This may help. I just tried to assemble it again, and now it says:

Code:
[root@hostname]# mdadm -A /dev/md0
mdadm: /dev/sdh1 has no superblock - assembly aborted
So, how do I put a superblock on /dev/sdh1 (this is the drive that was replaced)? I thought the system would do this automatically.
 
Old 11-02-2010, 11:22 AM   #7
mesiol
Member
 
Registered: Nov 2008
Location: Lower Saxony, Germany
Distribution: CentOS, RHEL, Solaris 10, AIX, HP-UX
Posts: 731

Rep: Reputation: 137
Hi,

could you please post the output of
Code:
 fdisk -l /dev/sdh
Did you replace the disk by only swapping the physical hardware, or did you also set it up with the appropriate fdisk/mdadm commands?

Did you add the disk by creating a software RAID partition with fdisk and then using
Code:
mdadm --manage /dev/md0 --add /dev/sdh1
?

Software RAID partitions contain private regions (the md superblock) where information about the RAID is stored. This is created when a partition is initialized as a member of the RAID. If you replace a disk, this information is lost, so the new disk cannot simply be taken back into the RAID. You have to tell the RAID to use the new disk as a replacement; that is what commands like --remove and --add are for.
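
As a rough sketch only (device names follow your setup, and I'm assuming sdb is a healthy member whose partition layout can be copied), a typical replacement sequence looks something like this:

Code:
# remove the failed member from the array (if it is still listed)
mdadm --manage /dev/md0 --fail /dev/sdh1
mdadm --manage /dev/md0 --remove /dev/sdh1
# copy the partition layout from a healthy member to the new disk
sfdisk -d /dev/sdb | sfdisk /dev/sdh
# add the new partition back in and watch the rebuild
mdadm --manage /dev/md0 --add /dev/sdh1
cat /proc/mdstat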

Last edited by mesiol; 11-02-2010 at 01:49 PM.
 
Old 11-03-2010, 07:55 AM   #8
mvanhorn
LQ Newbie
 
Registered: Mar 2010
Posts: 8

Original Poster
Rep: Reputation: 0
It turns out I had a second disk bad. Or, well, maybe I do. I started checking the output of 'mdadm --examine' on each individual disk in the array, and on a couple of them it showed that a different disk (sdd, rather than sdh) had been removed from the array as well. So, I did a 'mdadm --assemble' without either sdd or sdh, and it took right off. Then, I was able to add sdh back in, and it's rebuilding now.
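
For reference, the commands were roughly the following (a sketch from memory, so the exact member list may differ):

Code:
# check the metadata on each member individually
mdadm --examine /dev/sd[b-i]1
# assemble without the two suspect disks (sdd1 and sdh1 left out)
mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdi1
# add the replaced disk back in so it rebuilds
mdadm --manage /dev/md0 --add /dev/sdh1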

What's odd is that I've now run what I thought was the bad sdh through the Western Digital diagnostics, and everything seems okay with this disk. So, I'm going to try replacing sdd with this (formerly sdh) disk, and see what happens.

Thanks for your help!
 
Old 11-03-2010, 08:05 AM   #9
mesiol
Member
 
Registered: Nov 2008
Location: Lower Saxony, Germany
Distribution: CentOS, RHEL, Solaris 10, AIX, HP-UX
Posts: 731

Rep: Reputation: 137
Hi,

good to hear that your rebuild is working. I wish you success with your problem.
 
  

