LinuxQuestions.org
06-24-2013, 09:15 PM   #1
fireball1974
LQ Newbie
 
Registered: Apr 2008
Posts: 11

Rep: Reputation: 1
Upon reboot, Linux software raid drops one device of a RAID1 array


One of my four Linux software RAID arrays drops one of its two devices whenever I reboot. The other three arrays work fine. I am running RAID1 on kernel 2.6.32-5-amd64 (Debian Squeeze). Every time I reboot, /dev/md2 comes up with only one device. I can manually re-add the missing device with $ sudo mdadm /dev/md2 --add /dev/sdc1. This works, and mdadm confirms that the device has been re-added:
Code:
    mdadm: re-added /dev/sdc1
After adding the device and allowing the array time to resync, this is what the output of $ cat /proc/mdstat looks like:
Code:
    Personalities : [raid1] 
    md3 : active raid1 sda4[0] sdb4[1]
          244186840 blocks super 1.2 [2/2] [UU]
      
    md2 : active raid1 sdc1[0] sdd1[1]
          732574464 blocks [2/2] [UU]
      
    md1 : active raid1 sda3[0] sdb3[1]
          722804416 blocks [2/2] [UU]
      
    md0 : active raid1 sda1[0] sdb1[1]
          6835520 blocks [2/2] [UU]
      
    unused devices: <none>
Then after I reboot, this is what the output of $ cat /proc/mdstat looks like:
Code:
    Personalities : [raid1] 
    md3 : active raid1 sda4[0] sdb4[1]
          244186840 blocks super 1.2 [2/2] [UU]
      
    md2 : active raid1 sdd1[1]
          732574464 blocks [2/1] [_U]
      
    md1 : active raid1 sda3[0] sdb3[1]
          722804416 blocks [2/2] [UU]
      
    md0 : active raid1 sda1[0] sdb1[1]
          6835520 blocks [2/2] [UU]
      
    unused devices: <none>
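For anyone who wants to catch this automatically after each boot, here is a minimal sketch that lists the arrays whose status field contains a "_", as the [_U] for md2 does here. The function name degraded_arrays is made up for illustration, and the script only reads the file; it changes nothing.

```shell
# List md arrays that are running with a missing member.
# Reads mdstat-formatted text from the file named in $1 (default: /proc/mdstat).
degraded_arrays() {
    awk '
        # Header line such as "md2 : active raid1 sdd1[1]": remember the array name.
        $1 ~ /^md[0-9]+$/ && $2 == ":" { name = $1 }
        # Status line such as "732574464 blocks [2/1] [_U]": "_" marks a missing member.
        /blocks/ && /\[[U_]*_[U_]*\]/ { print name }
    ' "${1:-/proc/mdstat}"
}
```

Called with no argument it reads the live /proc/mdstat; fed the post-reboot output shown here it should print md2 and nothing else.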
Here is the output of $ sudo cat /var/log/syslog | grep mdadm from just before and just after the reboot:
Code:
    Jun 22 19:00:08 rook mdadm[1709]: RebuildFinished event detected on md device /dev/md2
    Jun 22 19:00:08 rook mdadm[1709]: SpareActive event detected on md device /dev/md2, component device /dev/sdc1
    Jun 22 19:00:20 rook kernel: [ 7819.446412] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:00:20 rook kernel: [ 7819.446415] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:00:20 rook kernel: [ 7819.446782] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:00:20 rook kernel: [ 7819.446785] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:00:20 rook kernel: [ 7819.515844] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:00:20 rook kernel: [ 7819.515847] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:00:20 rook kernel: [ 7819.606829] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:00:20 rook kernel: [ 7819.606832] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:03:48 rook kernel: [ 8027.855616] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:03:48 rook kernel: [ 8027.855620] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:03:48 rook kernel: [ 8027.855950] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:03:48 rook kernel: [ 8027.855952] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:03:49 rook kernel: [ 8027.962169] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:03:49 rook kernel: [ 8027.962171] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:03:49 rook kernel: [ 8028.054365] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:03:49 rook kernel: [ 8028.054368] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:10:23 rook kernel: [    9.588662] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:10:23 rook kernel: [    9.588664] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:10:23 rook kernel: [    9.601990] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:10:23 rook kernel: [    9.601991] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:10:23 rook kernel: [    9.602693] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:10:23 rook kernel: [    9.602695] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:10:23 rook kernel: [    9.605981] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:10:23 rook kernel: [    9.605983] mdadm: sending ioctl 1261 to a partition!
    Jun 22 19:10:23 rook kernel: [    9.606138] mdadm: sending ioctl 800c0910 to a partition!
    Jun 22 19:10:23 rook kernel: [    9.606139] mdadm: sending ioctl 800c0910 to a partition!
    Jun 22 19:10:48 rook mdadm[1737]: DegradedArray event detected on md device /dev/md2
Here is the result of $ cat /etc/mdadm/mdadm.conf:
Code:
    ARRAY /dev/md0 metadata=0.90 UUID=92121d42:37f46b82:926983e9:7d8aad9b
    ARRAY /dev/md1 metadata=0.90 UUID=9c1bafc3:1762d51d:c1ae3c29:66348110
    ARRAY /dev/md2 metadata=0.90 UUID=98cea6ca:25b5f305:49e8ec88:e84bc7f0
    ARRAY /dev/md3 metadata=1.2 name=rook:3 UUID=ca3fce37:95d49a09:badd0ddc:b63a4792
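Since mdadm uses the UUID on each ARRAY line to decide which devices belong to an array, one thing worth ruling out is a mismatch between this file and the superblock on the dropped member. A rough sketch of that comparison follows; the helper names conf_uuid and sb_uuid are hypothetical, and sb_uuid assumes the 0.90-style "UUID :" line that mdadm -E prints for this array.

```shell
# Print the UUID recorded for one array in mdadm.conf-style text on stdin.
# $1 is the array device exactly as written on the ARRAY line, e.g. /dev/md2.
conf_uuid() {
    awk -v dev="$1" '
        $1 == "ARRAY" && $2 == dev {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^UUID=/) { sub(/^UUID=/, "", $i); print $i; exit }
        }'
}

# Print the array UUID from "mdadm --examine" output on stdin
# (assumes the 0.90 superblock layout, where the line starts with "UUID :").
sb_uuid() {
    awk '$1 == "UUID" && $2 == ":" { print $3; exit }'
}

# Intended use on the affected machine (needs root, read-only):
#   a="$(conf_uuid /dev/md2 < /etc/mdadm/mdadm.conf)"
#   b="$(sudo mdadm --examine /dev/sdc1 | sb_uuid)"
#   [ "$a" = "$b" ] && echo "UUIDs match" || echo "UUID mismatch"
```

In the outputs posted in this thread the two values are already identical, so this check would only confirm that the config file is not the culprit.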
Here is the output of $ sudo mdadm -E /dev/sdc1 after re-adding the device and allowing it time to resync:
Code:
/dev/sdc1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 98cea6ca:25b5f305:49e8ec88:e84bc7f0 (local to host rook)
  Creation Time : Sun Jul 13 08:05:55 2008
     Raid Level : raid1
  Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
     Array Size : 732574464 (698.64 GiB 750.16 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2

    Update Time : Mon Jun 24 07:42:49 2013
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 5fd6cc13 - correct
         Events : 180998


      Number   Major   Minor   RaidDevice State
this     0       8       33        0      active sync   /dev/sdc1

   0     0       8       33        0      active sync   /dev/sdc1
   1     1       8       49        1      active sync   /dev/sdd1
Here is the output of $ sudo mdadm -D /dev/md2 after re-adding the device and allowing it time to resync:
Code:
/dev/md2:
        Version : 0.90
  Creation Time : Sun Jul 13 08:05:55 2008
     Raid Level : raid1
     Array Size : 732574464 (698.64 GiB 750.16 GB)
  Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Mon Jun 24 07:42:49 2013
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 98cea6ca:25b5f305:49e8ec88:e84bc7f0 (local to host rook)
         Events : 0.180998

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
I also ran $ sudo smartctl -t long /dev/sdc and no hardware issues were detected. As long as I do not reboot, /dev/md2 seems to work fine. Does anyone have any suggestions?

Last edited by fireball1974; 06-24-2013 at 09:33 PM. Reason: additional information
 
06-25-2013, 12:39 AM   #2
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,289

Rep: Reputation: 2034
This
Code:
    Jun 22 19:00:08 rook mdadm[1709]: SpareActive event detected on md device /dev/md2, component device /dev/sdc1
indicates it thinks that that disk is a hot spare, not an original member of md2.

I'd ensure you have 2(!) backups of md2, then run through the steps in section 5.2.4 here http://www.devil-linux.org/documenta...x/ch01s05.html (amending names as required), then try a reboot AFTER the re-sync completes.
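As a rough sketch of what such a procedure usually amounts to, assuming it means failing and removing the member, wiping its md superblock, and adding it back as a fresh device (the linked section may differ in detail): the function below only prints the commands so they can be reviewed first. The name readd_plan is made up for illustration; the mdadm options themselves (--fail, --remove, --zero-superblock, --add) are standard.

```shell
# Print, but do not run, the commands to drop a member from an array,
# wipe its md superblock and add it back as if it were a new device.
# $1 = array (e.g. /dev/md2), $2 = member partition (e.g. /dev/sdc1).
readd_plan() {
    array="$1"
    member="$2"
    # If the member is already missing from the array (as after the reboot
    # described here), the --fail and --remove steps can be skipped.
    echo "mdadm $array --fail $member"
    echo "mdadm $array --remove $member"
    # Destroys the RAID metadata on the member only; the other half keeps the data.
    echo "mdadm --zero-superblock $member"
    echo "mdadm $array --add $member"
}

# Intended use: readd_plan /dev/md2 /dev/sdc1
# Review the output, then run the lines as root one at a time and watch
# /proc/mdstat until the resync has finished before rebooting.
```

Double-check the device names before running anything: zeroing the superblock on the wrong partition would remove the surviving member's metadata.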
 
06-25-2013, 05:43 AM   #3
rajesh_daripalli
LQ Newbie
 
Registered: Jan 2009
Location: India
Distribution: Redhat and Cent OS
Posts: 13

Rep: Reputation: 0
What is the partition type of /dev/sdc1? Is it Linux raid autodetect, i.e. "fd"? If not, change it to "fd" and check.

What is the output of fdisk -l?

Rajesh
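A small sketch of that check, assuming an MBR (DOS) partition table and the usual fdisk -l column layout in which the type Id appears as its own field on the partition's line. The helper name has_type_fd is hypothetical, and it will not work on GPT disks.

```shell
# Succeed if the partition named in $1 is listed with type Id "fd"
# (Linux raid autodetect) in "fdisk -l" output read from stdin.
has_type_fd() {
    awk -v part="$1" '
        # On the line for this partition, look for a field that is exactly "fd".
        $1 == part { for (i = 2; i <= NF; i++) if ($i == "fd") found = 1 }
        END { exit (found ? 0 : 1) }'
}

# Intended use (needs root, read-only):
#   sudo fdisk -l /dev/sdc | has_type_fd /dev/sdc1 && echo "type fd" || echo "not fd"
```

Running the same check against /dev/sdd1, the member that survives the reboot, gives a baseline to compare with.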
 
06-25-2013, 09:55 AM   #4
fireball1974
LQ Newbie
 
Registered: Apr 2008
Posts: 11

Original Poster
Rep: Reputation: 1
Quote:
Originally Posted by chrism01
This
Code:
    Jun 22 19:00:08 rook mdadm[1709]: SpareActive event detected on md device /dev/md2, component device /dev/sdc1
indicates it thinks that that disk is a hot spare, not an original member of md2.

I'd ensure you have 2(!) backups of md2, then run through the steps in section 5.2.4 here http://www.devil-linux.org/documenta...x/ch01s05.html (amending names as required), then try a reboot AFTER the re-sync completes.
So you are saying I have to zero the superblock of /dev/sdc1 and then add it to the array as if it were a new device? Is there any way to check the superblock on /dev/sdc1 to verify that the superblock is the problem? Does adding it to the array rewrite the superblock? If so, why wasn't the faulty superblock overwritten when I added it in the past?
 
06-25-2013, 11:21 PM   #5
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,289

Rep: Reputation: 2034
I'm saying it's confused about the status of the disk, so make it treat it as a new disk.
You can check superblocks with mdadm --examine; see http://linux.die.net/man/8/mdadm:
Quote:
--examine
The device should be a component of an md array. mdadm will read the md superblock of the device and display the contents. If --brief or --scan is given, then multiple devices that are components of the one array are grouped together and reported in a single entry suitable for inclusion in mdadm.conf.

Having --scan without listing any devices will cause all devices listed in the config file to be examined.
Have a good read of that page; there are a lot of references to the superblock.
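One concrete thing to look for in the --examine output is the Events counter: at assembly time a member whose counter is behind the others is treated as stale and left out. A minimal sketch for pulling that number out so the two members can be compared; the helper name events_of is made up for illustration.

```shell
# Print the Events counter from "mdadm --examine" output read from stdin.
events_of() {
    awk '$1 == "Events" && $2 == ":" { print $3; exit }'
}

# Intended use, right after a reboot and before re-adding anything:
#   sudo mdadm --examine /dev/sdc1 | events_of
#   sudo mdadm --examine /dev/sdd1 | events_of
```

If /dev/sdc1 reports a lower number than /dev/sdd1 immediately after a reboot, its superblock was not updated at the last shutdown, which would be consistent with it being dropped.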
 
  


All times are GMT -5.
