LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Old 03-03-2011, 03:50 PM   #1
twinkiestar
LQ Newbie
 
Registered: Mar 2011
Posts: 3

Rep: Reputation: 0
mdadm marks one drive faulty; how do I know what is wrong?


I recently upgraded my FreeNAS 0.72-based NAS to an Ubuntu 10.10 server-based NAS. The mdadm tool is easy to use following the guide, but building the RAID was slow: about 10 hours for a 4TB RAID 5 (3x2TB). It didn't take this long on the initial FreeNAS setup. Anyway...

The pain came after I rebooted the machine: my RAID mount failed and mdadm said /dev/sdb had failed, but I cannot find a log anywhere that tells me what is wrong with /dev/sdb. I checked /var/log/messages and syslog and found nothing error-related for sdb. When I try to add it back, it says:
Code:
mdadm /dev/md0 -a /dev/sdb1
mdadm: Cannot open /dev/sdb1: Device or resource busy
So I try to remove it, and it tells me:
Code:
mdadm /dev/md0 -r /dev/sdb1
mdadm: hot remove failed for /dev/sdb1: No such device or address
A deeper look at sdb1:
Code:
mdadm --examine /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 0cbf13fa:d1ac85d7:0cc163c1:e6c0f4aa (local to host yaonas)
  Creation Time : Tue Mar  1 22:43:11 2011
     Raid Level : raid5
  Used Dev Size : 1953513408 (1863.02 GiB 2000.40 GB)
     Array Size : 3907026816 (3726.03 GiB 4000.80 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0

    Update Time : Wed Mar  2 12:29:32 2011
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 8a66bd5e - correct
         Events : 19

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       17        1      active sync   /dev/sdb1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
I also checked the disk with smartctl; it says:
Code:
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
So it appears mdadm thinks sdb is bad, but I don't know why. Should I ask the manufacturer for a replacement? (I've used this drive for about two months.)
 
Old 03-03-2011, 08:09 PM   #2
tommylovell
Member
 
Registered: Nov 2005
Distribution: Raspbian, Debian, Ubuntu
Posts: 386

Rep: Reputation: 105Reputation: 105
Ok. I'll ask. What indication do you have that the drive has failed? What does 'cat /proc/mdstat' show?

I think the three displays you did show a healthy RAID array and drive.

My healthy RAID1 array:
Code:
[root@athlonz ~]# cat /proc/mdstat 
Personalities : [raid1] [raid6] [raid5] [raid4] 
md1 : active raid1 sda5[0] sdc5[1]
      1440026304 blocks [2/2] [UU]
      
md0 : active raid1 sda2[0] sdc2[1]
      25005056 blocks [2/2] [UU]
      
unused devices: <none>
You can't add an already-added device.
Code:
[root@athlonz ~]# mdadm /dev/md0 -a /dev/sdc2
mdadm: Cannot open /dev/sdc2: Device or resource busy
You can't remove a device unless it has failed (or you've failed it with an "mdadm -f").
Code:
[root@athlonz ~]# mdadm /dev/md0 -r /dev/sdc2
mdadm: hot remove failed for /dev/sdc2: Device or resource busy
"active sync" state is good.
Code:
[root@athlonz ~]# mdadm --examine /dev/sdc2
/dev/sdc2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 397c7395:9c3c48fa:68e69357:c9f2169f
  Creation Time : Tue Mar 10 06:59:43 2009
     Raid Level : raid1
  Used Dev Size : 25005056 (23.85 GiB 25.61 GB)
     Array Size : 25005056 (23.85 GiB 25.61 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Thu Mar  3 20:02:14 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 4a74129d - correct
         Events : 632582


      Number   Major   Minor   RaidDevice State
this     1       8       34        1      active sync   /dev/sdc2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       34        1      active sync   /dev/sdc2
[root@athlonz ~]#
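One more note: in /proc/mdstat a degraded array shows up as an underscore in the bracketed status field (e.g. [3/2] [U_U] for a three-disk RAID5 missing its second member), while a healthy one shows all U's like my [2/2] [UU] above. A minimal sketch of checking for that automatically; it runs against sample text here rather than the live file:

```shell
# Look for a degraded md array: a missing member appears as "_" in the
# status field of /proc/mdstat (e.g. [3/2] [U_U]). The sample text below
# stands in for the real /proc/mdstat.
mdstat_sample='md0 : active raid5 sda1[0] sdc1[2]
      3907026816 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]'

if printf '%s\n' "$mdstat_sample" | grep -q '\[U*_U*\]'; then
    echo "degraded array detected"
else
    echo "all arrays healthy"
fi
```

On a live system you would pipe `cat /proc/mdstat` into the same grep instead of the sample variable.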
 
Old 03-04-2011, 08:35 AM   #3
twinkiestar
LQ Newbie
 
Registered: Mar 2011
Posts: 3

Original Poster
Rep: Reputation: 0
When I run
Code:
mdadm --detail /dev/md0
it has this line for sdb1:
Code:
1 1 8 17 1 faulty removed /dev/sdb1

So something is definitely wrong with this drive; I just can't find out what. There don't seem to be any bad sectors, and SMART didn't show any errors either, so I'm curious on what basis mdadm decided the drive is "faulty"...
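For anyone who hits this later: the kernel's reason for kicking a member usually lands in the ring buffer around the time of the failure, and a SMART long self-test exercises the whole disk surface (the quick overall-health check can pass while a long test fails). A hedged sketch; the commands are only echoed here rather than run, since they need the real drive:

```shell
# Places the kernel usually records why md failed a member, plus a
# thorough SMART test. Echoed instead of executed (needs real hardware).
drive=/dev/sdb
echo "dmesg | grep -iE '${drive##*/}|ata|md0'"   # kernel ring buffer
echo "smartctl -t long $drive"                   # start a long self-test
echo "smartctl -l selftest $drive"               # read results when done
```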
 
Old 03-04-2011, 10:06 AM   #4
tommylovell
Member
 
Registered: Nov 2005
Distribution: Raspbian, Debian, Ubuntu
Posts: 386

Rep: Reputation: 105Reputation: 105
This looks bad. If it is "faulty removed", then md0 should reflect that (I think the state should show 'degraded', but yours shows 'clean'), and you should be able to add it back in. Should...

I had some problems with a RAID1 built on a pair of Seagate drives. But when they failed they showed an error in syslog, also showed as faulty, and could easily be re-added. It wasn't a problem SMART could detect. After I upgraded the drives' firmware the problem went away, and they've been flawless since.

I know there is a '--force' flag that can be used, but I think there are some dangers in using it, and I would leave it as a last resort.

You said your RAID mount failed at boot time. Have you been able to mount it manually since then? If so, I'd suggest getting a backup of that md device first. Then, as a last resort, you can try '--force', or try wiping the RAID metadata from the drive (the version 0.90 superblock lives in the last 128KB or so of the /dev/sdb1 partition) and adding it back as if it were a new drive.

All risky business. Good luck.
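For the record, the wipe-and-re-add path usually looks like the sequence below, and mdadm's --zero-superblock does the metadata wiping for you, so there's no need to hand-calculate that last 128KB. A sketch only: it echoes the commands instead of running them, it assumes the device names from this thread, and you'd want a verified backup before trying it for real.

```shell
# Sketch of the metadata-wipe-and-re-add sequence. DESTRUCTIVE on the
# real device, so commands are echoed here instead of executed.
run() { echo "$@"; }                    # swap the body for "$@" to really run

run mdadm /dev/md0 --fail /dev/sdb1     # mark it failed (if not already)
run mdadm /dev/md0 --remove /dev/sdb1   # detach it from the array
run mdadm --zero-superblock /dev/sdb1   # erase the md superblock
run mdadm /dev/md0 --add /dev/sdb1      # re-add; a resync starts
run cat /proc/mdstat                    # monitor the rebuild progress
```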
 
  

