LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Hardware (https://www.linuxquestions.org/questions/linux-hardware-18/)
-   -   RAID degraded, partition missing from md0 (https://www.linuxquestions.org/questions/linux-hardware-18/raid-degraded-partition-missing-from-md0-4175483697/)

reano 11-07-2013 01:57 AM

RAID degraded, partition missing from md0
 
Hey guys,
We're having a very weird issue at work. Our Ubuntu server has 6 drives, set up with RAID1 as follows:

/dev/md0, consisting of:
/dev/sda1
/dev/sdb1

/dev/md1, consisting of:
/dev/sda2
/dev/sdb2

/dev/md2, consisting of:
/dev/sda3
/dev/sdb3

/dev/md3, consisting of:
/dev/sdc1
/dev/sdd1

/dev/md4, consisting of:
/dev/sde1
/dev/sdf1

As you can see, md0, md1 and md2 all use the same 2 drives (split into 3 partitions each). I should also note that this is Ubuntu software RAID (mdadm), not hardware RAID.

Today, the md0 RAID1 array shows as degraded - it is missing /dev/sdb1. But since /dev/sdb1 is only a partition (and /dev/sdb2 and /dev/sdb3 are working fine), it's obviously not the whole drive that's gone AWOL; it seems the partition itself is missing from the array.

How is that even possible? And what could we do to fix it?

My output of cat /proc/mdstat:

Code:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]

md1 : active raid1 sda2[0] sdb2[1]
      24006528 blocks super 1.2 [2/2] [UU]


md2 : active raid1 sda3[0] sdb3[1]
      1441268544 blocks super 1.2 [2/2] [UU]


md0 : active raid1 sda1[0]
      1464710976 blocks super 1.2 [2/1] [U_]


md3 : active raid1 sdd1[1] sdc1[0]
      2930133824 blocks super 1.2 [2/2] [UU]


md4 : active raid1 sdf2[1] sde2[0]
      2929939264 blocks super 1.2 [2/2] [UU]


unused devices: <none>


Any help would be greatly appreciated!

evo2 11-07-2013 02:09 AM

Hi,

it's not so unusual to have problems with just one partition on a disk.

You can try to rebuild with the existing sdb, or you can replace the sdb and then rebuild. See for example http://www.howtoforge.com/replacing_..._a_raid1_array for the latter option.

However, before doing anything make sure you are familiar with: https://raid.wiki.kernel.org/index.php/Linux_Raid
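Before changing anything, it's worth collecting the current state with read-only commands. A sketch (the device names are the ones from this thread - double-check yours; the run() wrapper only prints each command, so this is safe to paste as-is - replace the echo with "$@" to actually execute them as root):

```shell
# Print-only wrapper: shows what would be run without touching anything.
run() { echo "would run: $*"; }

run cat /proc/mdstat            # overall array health
run mdadm --detail /dev/md0     # per-array view: which member is missing
run mdadm --examine /dev/sdb1   # does sdb1 still carry an md superblock?
run smartctl -a /dev/sdb        # drive health, if smartmontools is installed
```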

Evo2.

reano 11-07-2013 02:13 AM

Quote:

Originally Posted by evo2 (Post 5059874)
Hi,

it's not so unusual to have problems with just one partition on a disk.

You can try to rebuild with the existing sdb, or you can replace the sdb and then rebuild. See for example http://www.howtoforge.com/replacing_..._a_raid1_array for the latter option.

However, before doing anything make sure you are familiar with: https://raid.wiki.kernel.org/index.php/Linux_Raid

Evo2.

Thanks Evo2. Can you please explain how I'd go about trying the first option (rebuild with the existing sdb)? Safely, that is :P

evo2 11-07-2013 02:23 AM

Hi,

Quote:

Originally Posted by reano (Post 5059877)
Thanks Evo2. Can you please explain how I'd go about trying the first option (rebuild with the existing sdb)? Safely, that is :P

I didn't remember off the top of my head, but from a quick scan of https://raid.wiki.kernel.org/index.php/Reconstruction and the mdadm man page, it looks like the first thing to try should be:
Code:

mdadm --assemble --scan
However, please check for yourself.
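As far as I can tell from the man page, --assemble --scan only assembles arrays that are not already running, so it should leave an active array like md0 alone; a running-but-degraded array gets its member back with --add instead. Please verify this against the man page yourself. A print-only sketch (the run() wrapper just echoes each command):

```shell
# Print-only wrapper: shows what would be run without touching anything.
run() { echo "would run: $*"; }

# Scan config/superblocks and assemble arrays that are not yet running;
# already-active arrays such as md0 should be skipped.
run mdadm --assemble --scan

# For an array that is running but degraded, re-add the missing member:
run mdadm /dev/md0 --add /dev/sdb1
```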

Evo2.

reano 11-07-2013 02:28 AM

Quote:

Originally Posted by evo2 (Post 5059883)
Hi,



I didn't remember off the top of my head, but from a quick scan of https://raid.wiki.kernel.org/index.php/Reconstruction and the mdadm man page, it looks like the first thing to try should be:
Code:

mdadm --assemble --scan
However, please check for yourself.

Evo2.

Thanks - I've been doing a bit of reading on mdadm --assemble as well. Will this not damage or endanger any of the other raid devices or the raid setup itself? I can't have any of the other partitions or md-devices go down, as our mail services etc run on this same server.

reano 11-07-2013 06:02 AM

Actually, let me clarify - if I do a:

Code:

mdadm --assemble --scan
Then it will essentially be doing the same as:

Code:

mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1
My main concern here is: while it's doing that, what's happening with md0? Because md0 is online right now (albeit without its sdb1 mirror, only with sda1) and the root filesystem is mounted on md0. So if I do an assemble, will it interrupt the filesystem in any way, or can I safely do it while the server is running with users connected to it? (Which it is, 24/7, unfortunately.)

vishesh 11-07-2013 07:51 AM

I think it's better to stop the md device. What is the output of mdadm --detail /dev/md0?

Thanks

reano 11-07-2013 07:56 AM

I can't stop the device :(
Also, the / root filesystem is mounted on md0.

The output you requested is:

Code:

/dev/md0:
        Version : 1.2
  Creation Time : Sat Dec 29 17:09:45 2012
    Raid Level : raid1
    Array Size : 1464710976 (1396.86 GiB 1499.86 GB)
  Used Dev Size : 1464710976 (1396.86 GiB 1499.86 GB)
  Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Thu Nov  7 15:55:07 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

          Name : lia:0  (local to host lia)
          UUID : eb302d19:ff70c7bf:401d63af:ed042d59
        Events : 26216

    Number  Major  Minor  RaidDevice State
      0      8        1        0      active sync  /dev/sda1
      1      0        0        1      removed

What's interesting is that it shows sdb1 as removed, not failed or spare.

vishesh 11-07-2013 08:04 AM

I think if it's showing as removed, the following command should recover it:

mdadm /dev/md0 -a /dev/sdb1

Thanks

reano 11-07-2013 08:13 AM

Quote:

Originally Posted by vishesh (Post 5060032)
I think if it's showing as removed, the following command should recover it:

mdadm /dev/md0 -a /dev/sdb1

Thanks

Is that not the same as mdadm /dev/md0 --add /dev/sdb1? If so, that doesn't work (see above for the error message I got when I tried that).

vishesh 11-07-2013 08:51 AM

I am unable to see any error message above. Ideally, for replacing a device, I follow:

mdadm /dev/md0 -f /dev/sdb1
mdadm /dev/md0 -r /dev/sdb1
mdadm /dev/md0 -a /dev/sdb1
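In long form, and wrapped so it only prints (drop the wrapper and run as root once you're sure of the device names - /dev/sdb1 here is the one from this thread):

```shell
# Print-only wrapper: shows what would be run without touching anything.
run() { echo "would run: $*"; }

# Mark the member failed, hot-remove it, then add it back so it resyncs.
run mdadm /dev/md0 --fail /dev/sdb1     # same as -f
run mdadm /dev/md0 --remove /dev/sdb1   # same as -r
run mdadm /dev/md0 --add /dev/sdb1      # same as -a
```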

Thanks

reano 11-07-2013 08:54 AM

Quote:

Originally Posted by vishesh (Post 5060054)
I am unable to see any error message above. Ideally, for replacing a device, I follow:

mdadm /dev/md0 -f /dev/sdb1
mdadm /dev/md0 -r /dev/sdb1
mdadm /dev/md0 -a /dev/sdb1

Thanks

Ah sorry, it seems I didn't post the result in the original post after all. When I do the -a (or --add) I get the following:

Code:

mdadm: add new device failed for /dev/sdb1 as 2: Invalid argument
I haven't tried to do it in that order (first -f, then -r, then -a). I can't damage anything further than it already is, can I? Keep in mind that sda1 and sdb1 (in other words, md0) contain the root filesystem. At the moment md0 seems to be running only on sda1 (and not on sdb1). At least the server is still running.

reano 11-08-2013 12:34 AM

Got the following results:

Code:

root@lia:~# mdadm /dev/md0 -f /dev/sdb1
mdadm: set device faulty failed for /dev/sdb1:  No such device

root@lia:~# mdadm /dev/md0 -r /dev/sdb1
mdadm: hot remove failed for /dev/sdb1: No such device or address

root@lia:~# mdadm /dev/md0 -a /dev/sdb1
mdadm: add new device failed for /dev/sdb1 as 2: Invalid argument


reano 11-13-2013 01:13 AM

Hate to bump a thread, but I still need help with this. Any advice, anyone? :)

evo2 11-13-2013 01:18 AM

Hi,

mdadm doesn't seem to see /dev/sdb1 at all. I suggest you investigate its status with other tools, e.g. fdisk.
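For example (print-only sketch; the device names are the ones from this thread, and the run() wrapper just echoes each command - drop it to actually run them as root):

```shell
# Print-only wrapper: shows what would be run without touching anything.
run() { echo "would run: $*"; }

run fdisk -l /dev/sdb           # does the partition table still list sdb1?
run lsblk /dev/sdb              # the kernel's view of sdb and its partitions
run mdadm --examine /dev/sdb1   # md superblock, if the device node exists
run dmesg                       # recent kernel errors mentioning sdb
```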

Evo2.

