LinuxQuestions.org


batfastad 01-09-2013 06:32 PM

Device in software RAID 10 array: clean, degraded. Ouch?
 
Hi everyone

Even though this is a software RAID question, I thought I'd post it in Hardware as it leans more that way IMO.

I've got 4x 500GB drives in software RAID.
/dev/md0 is RAID 1 and mounted to /boot
/dev/md1 is RAID 10 and is swap
/dev/md2 is RAID 10 and is the main system and data device

I looked at mdadm this evening and noticed this on md2:
Code:

State : clean, degraded
    Number  Major  Minor  RaidDevice State
      0      8        3        0      active sync  /dev/sda3
      1      0        0        1      removed
      2      8      35        2      active sync  /dev/sdc3
      3      8      51        3      active sync  /dev/sdd3

But when I checked md0 and md1, all drives were active sync and the array state was clean.
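To spot a dropped member without eyeballing the table, the device rows of the mdadm output can be parsed. This is a minimal Python sketch that assumes the column layout shown above (Number, Major, Minor, RaidDevice, State). Rows without a numeric RaidDevice column are simply skipped, and the function names are made up for illustration.

```python
# Device table for md2, copied from the mdadm output above.
DETAIL = """\
    Number  Major  Minor  RaidDevice State
      0      8        3        0      active sync  /dev/sda3
      1      0        0        1      removed
      2      8      35        2      active sync  /dev/sdc3
      3      8      51        3      active sync  /dev/sdd3
"""


def parse_slots(detail: str) -> dict[int, str]:
    """Map each RaidDevice slot number to its state text (plus device path, if any)."""
    slots = {}
    for line in detail.splitlines():
        fields = line.split()
        # Data rows start with four integers: Number, Major, Minor, RaidDevice.
        # The header row and any row with a non-numeric RaidDevice are skipped.
        if len(fields) >= 5 and all(f.isdigit() for f in fields[:4]):
            slots[int(fields[3])] = " ".join(fields[4:])
    return slots


def missing_slots(slots: dict[int, str]) -> list[int]:
    """Slots whose state does not contain 'active sync'."""
    return sorted(s for s, state in slots.items() if "active sync" not in state)


if __name__ == "__main__":
    print("not in sync:", missing_slots(parse_slots(DETAIL)))
```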

Here's the full output from mdadm for each array, plus the output of /proc/mdstat:
http://pastebin.com/VL0uYdU9
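/proc/mdstat shows the same thing more compactly. Each array's status line ends with something like `[4/3] [U_UU]`, meaning 4 slots configured and 3 working, with `_` marking the missing slot. Below is a rough Python sketch that flags degraded arrays from that text. The sample input is invented for illustration and is not the output in the pastebin, and the function name is hypothetical.

```python
import re

# Invented sample in /proc/mdstat format; NOT the poster's actual output.
SAMPLE = """\
Personalities : [raid1] [raid10]
md0 : active raid1 sda1[0] sdb1[1] sdc1[2] sdd1[3]
      512000 blocks [4/4] [UUUU]

md2 : active raid10 sda3[0] sdc3[2] sdd3[3]
      960000000 blocks 512K chunks 2 near-copies [4/3] [U_UU]
"""

ARRAY_RE = re.compile(r"^(md\d+) : ")                  # e.g. "md2 : active raid10 ..."
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")  # e.g. "[4/3] [U_UU]"


def degraded_arrays(mdstat: str) -> dict[str, list[int]]:
    """Return {array name: [missing slot numbers]} for every degraded array."""
    result = {}
    current = None
    for line in mdstat.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            current = m.group(1)
            continue
        s = STATUS_RE.search(line)
        if current and s and "_" in s.group(3):
            # One character per slot: 'U' means up, '_' means missing or failed.
            result[current] = [i for i, c in enumerate(s.group(3)) if c == "_"]
    return result


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```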

So it looks like /dev/sdb1 and /dev/sdb2 are functioning in /dev/md0 and /dev/md1 respectively,
but /dev/sdb3 has dropped out of /dev/md2 (or been "removed", apparently).

Is this a major problem, i.e. a data-loss problem?

With RAID 10 my data should be OK unless a second drive fails that holds the other copy of what was on sdb3, or so I believe.
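To check that reasoning, here is a small Python model of md's default RAID 10 "near" layout with 2 copies. It assumes md2 uses that default; the actual layout is shown in the mdadm output on pastebin. In this model, copy j of chunk k sits on slot (k * copies + j) mod devices. With 4 devices, every chunk therefore lives on slots 0/1 or on slots 2/3. With slot 1 (sdb3) gone, only a further loss of slot 0 (sda3) would lose data. The function names are hypothetical.

```python
def near_layout_copies(num_devices: int, near_copies: int = 2, chunks: int = 64) -> list[set[int]]:
    """For each logical chunk, the set of slots holding a copy (near layout model).

    64 chunks is enough to cover the repeating placement pattern for small arrays.
    """
    return [
        {(k * near_copies + j) % num_devices for j in range(near_copies)}
        for k in range(chunks)
    ]


def survives(failed: set[int], num_devices: int = 4, near_copies: int = 2) -> bool:
    """True if every chunk still has at least one copy on a slot that has not failed."""
    return all(not copies <= failed for copies in near_layout_copies(num_devices, near_copies))


if __name__ == "__main__":
    # Slot 1 (sdb3) is already missing; try each possible second failure.
    for other in (0, 2, 3):
        print(f"slots 1 and {other} lost -> data intact: {survives({1, other})}")
```

Other layouts (far or offset copies, or an odd number of devices) place copies differently, so the mirror partner of a given slot depends on the layout the array was created with.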

Or is it something weird with just one device that can be re-added?

Anyone seen anything like this before?

I'm going to grep the logs to see when this might have happened. Unfortunately I haven't been able to check this machine for a while, so it might have been like this for a long time.

Cheers, B

batfastad 01-09-2013 06:57 PM

OK, so I did some log grepping and noticed this pair of log lines:
Code:

Dec  9 04:25:37 hostname smartd[3199]: Device: /dev/sdb, 1 Currently unreadable (pending) sectors
Dec  9 04:25:37 hostname smartd[3199]: Device: /dev/sdb, 1 Offline uncorrectable sectors

They repeat every 30 minutes, so this has been going on for a while.
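Syslog lines carry no year, so working out when the warnings started means supplying the year yourself. Below is a Python sketch that records the earliest smartd warning per device and warning type, using the two lines above as input. The regex is written against this exact message format. The year (2012) and the function name are assumptions.

```python
import re
from datetime import datetime

LOG = """\
Dec  9 04:25:37 hostname smartd[3199]: Device: /dev/sdb, 1 Currently unreadable (pending) sectors
Dec  9 04:25:37 hostname smartd[3199]: Device: /dev/sdb, 1 Offline uncorrectable sectors
"""

# Matches: timestamp, host, smartd[pid], device, count, warning type.
SMARTD_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ smartd\[\d+\]: "
    r"Device: ([^,\s]+), (\d+) "
    r"(Currently unreadable \(pending\)|Offline uncorrectable) sectors"
)


def first_seen(log: str, year: int) -> dict[tuple[str, str], tuple[datetime, int]]:
    """Earliest (timestamp, sector count) for each (device, warning type)."""
    seen = {}
    for line in log.splitlines():
        m = SMARTD_RE.match(line)
        if not m:
            continue
        stamp, device, count, kind = m.groups()
        # Syslog omits the year, so prepend the one the caller supplies.
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        key = (device, kind)
        if key not in seen or when < seen[key][0]:
            seen[key] = (when, int(count))
    return seen


if __name__ == "__main__":
    for (device, kind), (when, count) in first_seen(LOG, 2012).items():
        print(device, kind, when, count)
```

If you run this over all the retained /var/log/messages* files, it only finds the earliest occurrence still on disk. Anything older may already have been rotated away.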

But then on Jan 7th an idiot user rebooted the server, thinking it would solve a mail relay problem. Here's the boot from /var/log/messages:
http://pastebin.com/jGVsDD54

Anyone know what might have happened to /dev/sdb3?
From the smartd warnings it looks like the drive's SMART data is reporting a bad sector.

Why do /dev/sdb1 and /dev/sdb2 appear to be functioning OK while only /dev/sdb3 failed?
Is it just a particular bad sector that happens to lie in sdb3?
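That seems plausible. Each partition is a separate md member, so an unreadable sector only affects the array whose member partition contains it. And since md2 is the main system and data device, sdb3 is presumably by far the largest partition on the disk. Given a failing LBA (for example, one reported in a SMART self-test log), finding the partition it falls in is a simple range lookup. Below is a Python sketch. The partition boundaries and the function name are invented for illustration; the real boundaries can be read with `fdisk -l /dev/sdb`.

```python
from bisect import bisect_right
from typing import Optional

# HYPOTHETICAL partition table for /dev/sdb in 512-byte sectors: (name, first, last).
PARTITIONS = [
    ("sdb1", 2048, 1026047),
    ("sdb2", 1026048, 9414655),
    ("sdb3", 9414656, 976773119),
]


def partition_for_lba(lba: int, partitions: list[tuple[str, int, int]] = PARTITIONS) -> Optional[str]:
    """Name of the partition containing sector `lba`, or None if it is outside all of them.

    `partitions` must be sorted by first sector and must not overlap.
    """
    starts = [first for _, first, _ in partitions]
    i = bisect_right(starts, lba) - 1  # last partition starting at or before lba
    if i >= 0 and lba <= partitions[i][2]:
        return partitions[i][0]
    return None


if __name__ == "__main__":
    print(partition_for_lba(500_000_000))
```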

Is it possible to re-add this drive to the array?

Or should I bin it and replace it with a fresh drive?

Cheers, B


All times are GMT -5.