LinuxQuestions.org

adiehl 11-27-2006 06:00 PM

raid5 with mdadm does not run or rebuild
 
Hi,

I created a software RAID 5 array with mdadm on four SATA drives.
Everything worked fine: the array was building in the background and I copied my data onto the md device.
The build had not finished by then, and I had to reboot several times, but each time the build process continued without problems.
Now, after another reboot (build still not complete), the array no longer becomes accessible.
I am using four brand-new 300 GB SATA drives, each with one partition (type fd, Linux raid autodetect).

dmesg says:
md: md1 stopped.
md: unbind<sda1>
md: export_rdev(sda1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: unbind<sdb1>
md: export_rdev(sdb1)
md: md1 stopped.
md: bind<sdb1>
md: bind<sdc1>
md: bind<sdd1>
md: bind<sda1>
md: md1: raid array is not clean -- starting background reconstruction
raid5: device sda1 operational as raid disk 0
raid5: device sdc1 operational as raid disk 2
raid5: device sdb1 operational as raid disk 1
raid5: cannot start dirty degraded array for md1
RAID5 conf printout:
--- rd:4 wd:3 fd:1
disk 0, o:1, dev:sda1
disk 1, o:1, dev:sdb1
disk 2, o:1, dev:sdc1
raid5: failed to run raid set md1
md: pers->run() failed ...

---

When trying to start raid manually:
root@server ~# mdadm --assemble --force /dev/md1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: failed to RUN_ARRAY /dev/md1: Input/output error

Details:
root@server ~# mdadm -D /dev/md1
/dev/md1:
Version : 00.90.03
Creation Time : Sun Nov 26 16:54:24 2006
Raid Level : raid5
Device Size : 293033536 (279.46 GiB 300.07 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Mon Nov 27 01:13:09 2006
State : active, degraded, Not Started
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1

Layout : left-symmetric
Chunk Size : 64K

UUID : 000dc389:67464f1a:8527aa2f:cdb725ee
Events : 0.41809

Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
2 8 33 2 active sync /dev/sdc1
3 0 0 3 removed

4 8 49 - spare /dev/sdd1

---
So why is there a removed drive and a spare? I didn't define any spare device ...

I tried everything I could find about this issue, but nothing helps.

Distro is Gentoo with Kernel 2.6.18 with all required modules built-in.
I tried the same with Mandriva and Debian, same problem.

Cables & Drives are OK (tested on this and another system).

Could the problem be that an active RAID device is being recognized as a spare drive?

I would appreciate any help in this, as I have important personal data on the raid-array which is currently not backed up.
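
A note on reading the kernel output above: the `--- rd:4 wd:3 fd:1` line in the RAID5 conf printout appears to summarise the array as the driver sees it. The sketch below assumes the fields mean raid disks, working disks and failed disks; the helper name `md_conf_summary` is made up for illustration. The mdadm manual also notes that creating a RAID5 array starts it degraded, with the last device added as a spare that is rebuilt into place, which would be consistent with sdd1 showing up as a spare while the initial build was unfinished.

```shell
#!/bin/sh
# Hypothetical helper: summarise the "--- rd:N wd:M fd:K" line that the
# raid5 driver prints in its "RAID5 conf printout".
# Assumed field meanings: rd = raid disks, wd = working disks, fd = failed disks.
md_conf_summary() {
    sed -nE 's/.*rd:([0-9]+) wd:([0-9]+) fd:([0-9]+).*/\1 \2 \3/p' |
    while read -r rd wd fd; do
        echo "configured=$rd working=$wd missing=$((rd - wd)) failed=$fd"
    done
}

# Usage (reads kernel log text on stdin):
#   dmesg | md_conf_summary
```

With one member missing and the array also flagged "not clean", the kernel refuses to start it, which is the "cannot start dirty degraded array" message in the log.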

cwilkins 11-28-2006 12:07 PM

Hi adiehl,
Sounds like we may both be in the same (sinking?) boat and neither of us has been rescued yet. I was going to simply post a link to all my sordid details over on Linux Forums, but I'm not allowed, so I'll repost them here.

You might want to step through and see how closely they match your details. Maybe we can gang up on this at least...

I have got to get this array back up today -- the natives are getting restless...

-cw-

Post 1:

Ok, I'm a Linux software raid veteran and I have the scars to prove it (google for mddump if you're bored), but that's not doing me much good now. I'm at the end of my rope... er... SATA cable. Help? Please??

The subject platform is a PC running FC5 (Fedora Core 5, patched latest) with eight 400gb SATA drives (/dev/sd[b-i]1) assembled into a RAID6 md0 device. Originally built with mdadm. No LVM or other exotics. /dev/md0 is a /data filesystem, nothing there needed at boot time. It's been humming along nicely for months.

Then... This morning I found that /dev/sdb1 had been kicked out of the array and there was the requisite screaming in /var/log/messages about failed read/writes, SMART errors, highly miffed SATA controllers, etc., all associated with /dev/sdb1. (It appears to have been a temporary failure -- badblocks found no problems.) Tried shutting the system down cleanly, which didn't seem to be working, so finally crossed my fingers and hit the reset button.

No surprise, it booted back up refusing to assemble the array. More specifically:

Code:

Nov 27 19:03:52 ornery kernel: md: bind<sdb1>
Nov 27 19:03:52 ornery kernel: md: bind<sdd1>
Nov 27 19:03:52 ornery kernel: md: bind<sde1>
Nov 27 19:03:52 ornery kernel: md: bind<sdf1>
Nov 27 19:03:52 ornery kernel: md: bind<sdg1>
Nov 27 19:03:52 ornery kernel: md: bind<sdh1>
Nov 27 19:03:52 ornery kernel: md: bind<sdi1>
Nov 27 19:03:52 ornery kernel: md: bind<sdc1>
Nov 27 19:03:52 ornery kernel: md: kicking non-fresh sdb1 from array!
Nov 27 19:03:52 ornery kernel: md: unbind<sdb1>
Nov 27 19:03:52 ornery kernel: md: export_rdev(sdb1)
Nov 27 19:03:52 ornery kernel: md: md0: raid array is not clean -- starting background reconstruction
Nov 27 19:03:52 ornery kernel: raid5: device sdc1 operational as raid disk 1
Nov 27 19:03:52 ornery kernel: raid5: device sdi1 operational as raid disk 7
Nov 27 19:03:52 ornery kernel: raid5: device sdh1 operational as raid disk 6
Nov 27 19:03:52 ornery kernel: raid5: device sdg1 operational as raid disk 5
Nov 27 19:03:52 ornery kernel: raid5: device sdf1 operational as raid disk 4
Nov 27 19:03:52 ornery kernel: raid5: device sde1 operational as raid disk 3
Nov 27 19:03:52 ornery kernel: raid5: device sdd1 operational as raid disk 2
Nov 27 19:03:52 ornery kernel: raid5: cannot start dirty degraded array for md0
Nov 27 19:03:52 ornery kernel: RAID5 conf printout:
Nov 27 19:03:52 ornery kernel:  --- rd:8 wd:7 fd:1
Nov 27 19:03:52 ornery kernel:  disk 1, o:1, dev:sdc1
Nov 27 19:03:52 ornery kernel:  disk 2, o:1, dev:sdd1
Nov 27 19:03:52 ornery kernel:  disk 3, o:1, dev:sde1
Nov 27 19:03:52 ornery kernel:  disk 4, o:1, dev:sdf1
Nov 27 19:03:52 ornery kernel:  disk 5, o:1, dev:sdg1
Nov 27 19:03:52 ornery kernel:  disk 6, o:1, dev:sdh1
Nov 27 19:03:52 ornery kernel:  disk 7, o:1, dev:sdi1
Nov 27 19:03:52 ornery kernel: raid5: failed to run raid set md0
Nov 27 19:03:52 ornery kernel: md: pers->run() failed ...

Code:

[root@ornery ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdc1[1] sdi1[7] sdh1[6] sdg1[5] sdf1[4] sde1[3] sdd1[2]
      2734961152 blocks

unused devices: <none>

Attempts to force assembly fail:

Code:

[root@ornery ~]# mdadm -S /dev/md0
[root@ornery ~]# mdadm --assemble --force --scan /dev/md0
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error

Leaving out the bad drive:

Code:

[root@ornery ~]# mdadm -S /dev/md0
[root@ornery ~]# mdadm --assemble --force /dev/md0 /dev/sd[c-i]1
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
[root@ornery ~]# mdadm -S /dev/md0
[root@ornery ~]# mdadm --assemble --force --run /dev/md0 /dev/sd[c-i]1
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error

Trying to fail or remove the bad drive doesn't work either:

Code:

[root@ornery ~]# mdadm -f /dev/md0 /dev/sdb1
mdadm: set device faulty failed for /dev/sdb1:  No such device
[root@ornery ~]# mdadm -r /dev/md0 /dev/sdb1
mdadm: hot remove failed for /dev/sdb1: No such device

A quick check of the event counters shows that only /dev/sdb is stale:

Code:

[root@ornery ~]# mdadm -E /dev/sd[b-i]1 | grep Event
        Events : 0.851758
        Events : 0.854919
        Events : 0.854919
        Events : 0.854919
        Events : 0.854919
        Events : 0.854919
        Events : 0.854919
        Events : 0.854919
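
The event-counter comparison above can also be scripted instead of eyeballed. This is a minimal sketch, assuming the 0.90-style `hi.lo` counter format shown in this thread; `stale_members` is a hypothetical name, and it relies on `mdadm -E` printing a `/dev/...:` header line before each member's details, as in the full examine output in this post.

```shell
#!/bin/sh
# Hypothetical helper: read `mdadm -E` output on stdin and print each member
# whose event counter is lower than the highest one seen.
# Assumes 0.90-style "hi.lo" counters as in this thread; plain integers also work.
stale_members() {
    awk '
        # A line such as "/dev/sdc1:" starts the report for one member.
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }
        $1 == "Events" {
            n = split($3, part, ".")
            val = (n == 2) ? part[1] * 4294967296 + part[2] : part[1] + 0
            events[dev] = val; raw[dev] = $3
            if (val > max) max = val
        }
        END { for (d in events) if (events[d] < max) print d, raw[d] }
    ' | sort
}

# Usage:
#   mdadm -E /dev/sd[b-i]1 | stale_members
```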

Here's a full examine from one of the good drives:

Code:

[root@ornery ~]# mdadm -E /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.03
          UUID : d57cea81:3be21b7d:183a67d9:782c3329
  Creation Time : Tue Mar 21 11:14:56 2006
    Raid Level : raid6
    Device Size : 390708736 (372.61 GiB 400.09 GB)
    Array Size : 2344252416 (2235.65 GiB 2400.51 GB)
  Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0

    Update Time : Mon Nov 27 10:10:36 2006
          State : active
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0
      Checksum : ebd6e3a8 - correct
        Events : 0.854919


      Number  Major  Minor  RaidDevice State
this    1      8      33        1      active sync  /dev/sdc1

  0    0      0        0        0      removed
  1    1      8      33        1      active sync  /dev/sdc1
  2    2      8      49        2      active sync  /dev/sdd1
  3    3      8      65        3      active sync  /dev/sde1
  4    4      8      81        4      active sync  /dev/sdf1
  5    5      8      97        5      active sync  /dev/sdg1
  6    6      8      113        6      active sync  /dev/sdh1
  7    7      8      129        7      active sync  /dev/sdi1

And detail for the array:

Code:

[root@ornery ~]# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Tue Mar 21 11:14:56 2006
    Raid Level : raid6
    Device Size : 390708736 (372.61 GiB 400.09 GB)
  Raid Devices : 8
  Total Devices : 7
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Nov 27 10:10:36 2006
          State : active, degraded
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0

    Chunk Size : 256K

          UUID : d57cea81:3be21b7d:183a67d9:782c3329
        Events : 0.854919

    Number  Major  Minor  RaidDevice State
  9421816      0        0    1912995864      removed
      1      8      33        1      active sync  /dev/sdc1
      2      8      49        2      active sync  /dev/sdd1
      3      8      65        3      active sync  /dev/sde1
      4      8      81        4      active sync  /dev/sdf1
      5      8      97        5      active sync  /dev/sdg1
      6      8      113        6      active sync  /dev/sdh1
      7      8      129        7      active sync  /dev/sdi1

So I've obviously got a degraded array. Where does the "dirty" part come in? Why can't I simply force this thing back together in active degraded mode with 7 drives and then add a fresh /dev/sdb1?

I know as a last resort I can create a "new" array over my old one and as long as I get everything juuuuust right, it'll work, but that seems a rather drastic solution to what should be a trivial (and all too common) situation -- dealing with a single failed drive. I mean... I run RAID6 to provide a little extra protection, not to slam into these kinds of brick walls. Heck, I might as well run RAID0! ARGH!!! Ok... ok... I'll calm down.

FWIW, here's my mdadm.conf:

Code:

[root@ornery ~]# grep -v '^#' /etc/mdadm.conf
DEVICE /dev/sd[bcdefghi]1
ARRAY /dev/md0 UUID=d57cea81:3be21b7d:183a67d9:782c3329
MAILADDR root

Have I missed something obvious? Thanks in advance for any clues...

Followup Post:

Ok, done a bit more poking around... I tried zeroing out the superblock on the failed device and adding it back into the array. It just sat there looking stupid. The status of the new drive became "sync", the array status remained inactive, and no resync took place:

Code:

[root@ornery ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdb1[0](S) sdc1[1] sdi1[7] sdh1[6] sdg1[5] sdf1[4] sde1[3] sdd1[2]
      3125669888 blocks

unused devices: <none>

Another thing I noticed was the new drive didn't fill the slot for the missing drive, but instead occupied a new slot. Here's a detail for the array:

Code:

[root@ornery ~]# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Tue Mar 21 11:14:56 2006
    Raid Level : raid6
    Device Size : 390708736 (372.61 GiB 400.09 GB)
  Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Nov 27 10:10:36 2006
          State : active
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0

    Chunk Size : 256K

          UUID : d57cea81:3be21b7d:183a67d9:782c3329
        Events : 0.854919

    Number  Major  Minor  RaidDevice State
  4150256      0        0    1912995872      removed
      1      8      33        1      active sync  /dev/sdc1
      2      8      49        2      active sync  /dev/sdd1
      3      8      65        3      active sync  /dev/sde1
      4      8      81        4      active sync  /dev/sdf1
      5      8      97        5      active sync  /dev/sdg1
      6      8      113        6      active sync  /dev/sdh1
      7      8      129        7      active sync  /dev/sdi1

      0      8      17        -      active sync  /dev/sdb1

It's like it's just adding the new /dev/sdb1 in as a spare or something. My hunch is that the problem stems from the superblock indicating that the bad device is simply "removed" rather than failed. Yet trying to fail the device... well, failed.

Barring any sudden insights from my fellow Linuxens, it's looking like I have another romp with mddump looming in my future. By my reckoning, I would need to set the SB's to indicate that device 0's status is failed rather than removed, and set the counters to indicate 1 failed device and 7 active/working devices.

If anyone has suggestions, feel free to jump in at any time!! :-)

cwilkins 11-28-2006 03:47 PM

The silence is deafening!
 
Ok, I tried hacking up the superblocks with mddump. The good news is I didn't screw anything up permanently. The bad news is I made no progress either.

Ultimately, I started reading through the kernel source and wandered into a helpful text file Documentation/md.txt in the kernel source tree. I was able to start the array, for reading at least. (baby steps...) Here's how:

Code:

[root@ornery ~]# cat /sys/block/md0/md/array_state
inactive
[root@ornery ~]# echo "clean" > /sys/block/md0/md/array_state
[root@ornery ~]# cat /sys/block/md0/md/array_state
clean
[root@ornery ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdc1[1] sdi1[7] sdh1[6] sdg1[5] sdf1[4] sde1[3] sdd1[2]
      2344252416 blocks level 6, 256k chunk, algorithm 2 [8/7] [_UUUUUUU]

unused devices: <none>
[root@ornery ~]# mount -o ro /dev/md0 /data
[root@ornery ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda2            226G  46G  168G  22% /
/dev/hda1            251M  52M  187M  22% /boot
/dev/shm              2.9G    0  2.9G  0% /dev/shm
/dev/sda2              65G  35G  27G  56% /var
/dev/md0              2.2T  307G  1.8T  15% /data

At least I can get to my data now. Yay!
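
For anyone repeating this, the write to `array_state` can be wrapped in a small guard so it only happens when the array really is sitting in the `inactive` state. This is a sketch, not a tested recovery tool: `mark_clean_if_inactive` is a hypothetical name, and it assumes the array has already been assembled (but not started), as it was here.

```shell
#!/bin/sh
# Hypothetical wrapper around the sysfs write shown above. It refuses to act
# unless the array currently reports "inactive".
# $1 is the array_state file, e.g. /sys/block/md0/md/array_state
mark_clean_if_inactive() {
    state_file=$1
    current=$(cat "$state_file") || return 1
    if [ "$current" != "inactive" ]; then
        echo "array_state is '$current', not touching it" >&2
        return 1
    fi
    echo clean > "$state_file" || return 1
    # Print whatever the kernel reports after the write.
    cat "$state_file"
}

# Usage (as root), mounting read-only first as in the post:
#   mark_clean_if_inactive /sys/block/md0/md/array_state && mount -o ro /dev/md0 /data
```

Mounting read-only until a backup exists keeps the filesystem from being written to while the array's consistency is still in doubt.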

cwilkins 11-29-2006 11:49 AM

Backup successful!

So after that, I did the following:

Code:

umount /data

Code:

mdadm /dev/md0 -a /dev/sdb1

The drive was added without error. A quick check of the array:

Code:

[root@ornery ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdb1[8] sdc1[1] sdi1[7] sdh1[6] sdg1[5] sdf1[4] sde1[3] sdd1[2]
      2344252416 blocks level 6, 256k chunk, algorithm 2 [8/7] [_UUUUUUU]
      [>....................]  recovery =  0.2% (823416/390708736) finish=13924.4min speed=465K/sec

unused devices: <none>

...and...

Code:

[root@ornery ~]# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Tue Mar 21 11:14:56 2006
    Raid Level : raid6
    Array Size : 2344252416 (2235.65 GiB 2400.51 GB)
    Device Size : 390708736 (372.61 GiB 400.09 GB)
  Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Nov 29 11:03:51 2006
          State : clean, degraded, recovering
 Active Devices : 7
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 1

    Chunk Size : 256K

 Rebuild Status : 0% complete

          UUID : d57cea81:3be21b7d:183a67d9:782c3329
        Events : 0.854924

    Number  Major  Minor  RaidDevice State
      8      8      17        0      spare rebuilding  /dev/sdb1
      1      8      33        1      active sync  /dev/sdc1
      2      8      49        2      active sync  /dev/sdd1
      3      8      65        3      active sync  /dev/sde1
      4      8      81        4      active sync  /dev/sdf1
      5      8      97        5      active sync  /dev/sdg1
      6      8      113        6      active sync  /dev/sdh1
      7      8      129        7      active sync  /dev/sdi1

Now that's what I was looking for! It's moving kinda slow right now, probably because I'm also doing an fsck.

I can't be certain, but I think the problem was that the state of the good drives (and the array) was marked as "active" rather than "clean." (active == dirty?) I expect this was caused by doing a hard reset on a system with a degraded array, in the midst of it being brought to a crawl trying to talk to the failed drive. Seems like some work might be needed to be able to handle these situations a little more gracefully.

Anyway, it appears I might be firmly on the road to recovery now. (If not, you'll hear the screams...) Hopefully my posts will be helpful to others encountering this problem.

-cw-
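
The recovery line in /proc/mdstat above gives its estimate in minutes, which is hard to read at this scale. A small sketch that converts it to days follows; `recovery_progress` is a hypothetical helper, and it assumes the `recovery = N% ... finish=Mmin` layout shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: pull the percentage done and the estimated minutes
# remaining out of /proc/mdstat-style text on stdin, and print the estimate in days.
recovery_progress() {
    sed -nE 's/.*(recovery|resync) *= *([0-9.]+)%.*finish=([0-9.]+)min.*/\2 \3/p' |
    awk '{ printf "%s%% done, about %.1f days left\n", $1, $2 / 1440 }'
}

# Usage:
#   recovery_progress < /proc/mdstat
```

For the figures above (finish=13924.4min) that works out to roughly 9.7 days at the reported speed, which fits the remark that the rebuild was competing with an fsck.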

bnuytten 12-24-2006 06:55 AM

raid5 + LVM
 
I experienced a similar problem, using RAID5 on 4 drives with LVM+ext3. After manually overriding the array's state

Code:

echo "clean" > /sys/block/md0/md/array_state

I checked the events on all disks and the array itself:
Code:

[root@juno ~]# mdadm --examine /dev/hd[bdfh]1 | grep Event
        Events : 0.87645
        Events : 0.87645
        Events : 0.87644
        Events : 0.87462
[root@juno ~]# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Sat Nov  4 02:38:57 2006
    Raid Level : raid5
    Device Size : 156288256 (149.05 GiB 160.04 GB)
  Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Dec 24 07:31:10 2006
          State : active, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

        Layout : left-symmetric
    Chunk Size : 64K

          UUID : 916d12f4:0df2cd68:594a1080:6da31000
        Events : 0.87645

    Number  Major  Minor  RaidDevice State
      0      3      65        0      active sync  /dev/hdb1
      1      22      65        1      active sync  /dev/hdd1
      2      33      65        2      active sync  /dev/hdf1
      0      0        0        0      removed

As you all know I need n-1 good drives in a RAID5 array to recover the data. In this case I need three. But I only have two according to the events. :eek: So I took the three best, i.e. those three closest to the value of the md array itself. Using the same technique described above, I was able to recover all my data. Phew! :cool:
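
Picking "the three best" can be expressed as a small selection step. The sketch below is illustrative only: `freshest_members` is a hypothetical helper, it assumes 0.90-style `hi.lo` event counters, and the pairing of devices to counters in the usage comment assumes the counters were printed in the same order as the /dev/hd[bdfh]1 glob expands.

```shell
#!/bin/sh
# Hypothetical helper: from "device events" pairs on stdin, print the N
# members with the highest event counters (N = member count minus one for RAID5).
# Assumes 0.90-style "hi.lo" counters as shown above.
freshest_members() {
    awk '{ split($2, p, "."); printf "%.0f %s\n", p[1] * 4294967296 + p[2], $1 }' |
    sort -k1,1nr | head -n "$1" | awk '{ print $2 }' | sort
}

# Usage (pairs typed in by hand from the mdadm --examine output):
#   printf '%s\n' '/dev/hdb1 0.87645' '/dev/hdd1 0.87645' \
#                 '/dev/hdf1 0.87644' '/dev/hdh1 0.87462' | freshest_members 3
```

A member whose counter lags behind the others may hold slightly older data, so anything written between the two counter values may be inconsistent; checking the recovered files afterwards is a reasonable precaution.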

myrons41 03-22-2007 07:43 PM

The part about fixing /dev/mdN by writing to /sys/block/mdN/.., as documented in /usr/linux/Documentation/md.txt (cwilkins' post), was instructive.
My raid:
sA2-AT8:/home/miroa # mdadm -D /dev/md3
/dev/md3:
Version : 00.90.03
Creation Time : Thu Mar 22 23:10:03 2007
Raid Level : raid5
Device Size : 34700288 (33.09 GiB 35.53 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 3
Persistence : Superblock is persistent

Update Time : Fri Mar 23 00:53:09 2007
State : clean, Not Started
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 128K

UUID : b52708b8:2c956410:02c2c543:95cb1048
Events : 0.20

Number Major Minor RaidDevice State
0 8 7 0 active sync /dev/sda7
1 8 23 1 active sync /dev/sdb7
2 8 39 2 active sync /dev/sdc7
3 8 55 3 active sync /dev/sdd7
4 8 71 4 active sync /dev/sde7
sA2-AT8:/home/miroa #
I thought that "Not Started" could be some indication,
since mke2fs /dev/md3 does not work on it:
sA2-AT8:/home/miroa # mke2fs /dev/md3
mke2fs 1.39 (29-May-2006)
mke2fs: Device size reported to be zero. Invalid partition specified, or
partition table wasn't reread after running fdisk, due to
a modified partition being busy and in use. You may need to reboot
to re-read your partition table.
But I checked the sizes, they're ok.
sA2-AT8:/home/miroa # cat /sys/block/md3/md/dev-sd?7/size
34700288
34700288
34700288
34700288
34700288
Or maybe a different kind of size is in question here? Yes!!!
This is wrong. The array output at the top reports the same size as each of its components:
34700288
It should read:
138801152 (which is 4x),
similarly as this one in the same box of mine:
sA2-AT8:/home/miroa # mdadm -D /dev/md2
/dev/md2:
Version : 00.90.03
Creation Time : Thu Mar 22 20:23:00 2007
Raid Level : raid5

Array Size : 606227968 (578.14 GiB 620.78 GB)
Device Size : 151556992 (144.54 GiB 155.19 GB)

I have just now, while writing this, understood what is very wrong with my
/dev/md3.
Earlier I fiddled with issuing
echo "check" > /sys/block/md3/md/array_state (that got it rebuilding itself)
and
echo "idle" > /sys/block/md3/md/sync_action (that did nothing in my case)
because I wanted to stop or run the array (none could be done -how could it?)
My question is:
How do I fix this?
Suse 10.2, updated regularly online. Arc x86_64.
I even tried:
mdadm --zero-superblock /dev/sdN7
and deleted all partitions /dev/sd[a-e]7 and recreated them, and even tried
formatting them in desperation... (the latter is useless, as RAID info resides in
those arcane things called superblocks, not in mundane user-space regular disk
blocks...)
Getting at my wits' end...
Don't even want to go to sleep till I try some more to bring this raid to
obedience. Argh!
Does anyone have ideas on this?

myrons41 03-22-2007 08:03 PM

Well, I decided to reboot (as if I didn't too many times already in the last hours) just to make sure. And what was:
> Raid Level : raid5
> Device Size : 34700288 (33.09 GiB 35.53 GB)
is now:
Raid Level : raid5
Array Size : 138801152 (132.37 GiB 142.13 GB)
Device Size : 34700288 (33.09 GiB 35.53 GB)
And the mke2fs did its job just fine as well...
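
The "4x" figure follows from how parity RAID levels use their members: RAID5 gives up one member's worth of space to parity and RAID6 gives up two. A minimal sketch of that arithmetic, with `raid_array_size` as a hypothetical helper name and sizes in the same units mdadm prints for Device Size:

```shell
#!/bin/sh
# Hypothetical helper: expected usable size of a parity RAID array.
# usage: raid_array_size LEVEL MEMBERS DEVICE_SIZE
# RAID5 spends one member's worth of space on parity, RAID6 two.
raid_array_size() {
    case $1 in
        5) echo $(( ($2 - 1) * $3 )) ;;
        6) echo $(( ($2 - 2) * $3 )) ;;
        *) echo "unsupported level: $1" >&2; return 1 ;;
    esac
}

# Usage:
#   raid_array_size 5 5 34700288     # the five-member md3 above
#   raid_array_size 6 8 390708736    # the eight-member RAID6 earlier in the thread
```

For md3 this gives 4 x 34700288 = 138801152, matching the Array Size reported after the reboot; the RAID6 case gives 6 x 390708736 = 2344252416, matching the Array Size in cwilkins' output.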

bnuytten 03-23-2007 02:58 AM

Quote:

Originally Posted by myrons41
Well, I decided to reboot (as if I didn't too many times already in the last hours) just to make sure. And what was:
> Raid Level : raid5
> Device Size : 34700288 (33.09 GiB 35.53 GB)
is now:
Raid Level : raid5
Array Size : 138801152 (132.37 GiB 142.13 GB)
Device Size : 34700288 (33.09 GiB 35.53 GB)
And the mke2fs did its job just fine as well...


From your starting point, a clean raid5 array, I would have advised just trying to start the array. If this is successful, you should get output that looks like this:
Code:

mdadm --run /dev/md3
mdadm: /dev/md3 has been started with 5 drives.

Since you created the RAID5 array just yesterday and you ordered a "check", I assume it was still rebuilding its parity data when you issued the command:
Quote:

echo "idle" > /sys/block/md3/md/sync_action (that did nothing in my case)
It probably did do something. You first instructed the array to start checking/rebuilding and then you said to the array: stop synchronizing/rebuilding your disks which left the array "assembled", but not fully "started".

The device size was reported zero by the mkfs utility probably because the array was in this half stopped, half started state. Rebooting the machine causes your RAID devices to be stopped on shutdown (mdadm --stop /dev/md3) and restarted on startup (mdadm --assemble /dev/md3 /dev/sd[a-e]7).

myrons41 03-23-2007 05:40 AM

Quote:

Originally Posted by bnuytten
Since you created the RAID5 array just yesterday and you ordered a "check", I assume it was still rebuilding ...

No. I wouldn't do such a thing.
The problem was altogether different than your suggestions.
Take another look at those size reports of mine,
how could it run or do anything really?
Otherwise, why delve into it, when it's solved?
Unless someone else reading this has an issue similar to mine and is coping with it right now.

linux1windows0 04-08-2008 08:13 PM

Thanks to cwilkins for the following insight:

[root@ornery ~]# cat /sys/block/md0/md/array_state
inactive
[root@ornery ~]# echo "clean" > /sys/block/md0/md/array_state
[root@ornery ~]# cat /sys/block/md0/md/array_state
clean


Once this was done I was able to use mdadm /dev/md0 -a /dev/sdc2 (the drive that had been corrupted), and the system rebuilt itself as it should. It has been 2 days and I cannot detect any issues from the original fault. As far as I can tell, a power interruption caused some sort of faulty data to be stored, which prevented the array from being reassembled with the commonly recommended mdadm -A --force /dev/md0, even when specifying all drives. It always resulted in the same error:
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
My array comprises 3 drives, one of which was kicked out due to, I believe, a power supply issue, which ultimately appears to have been related to a massive buildup of dust, I am embarrassed to say. The drive was good, as evidenced by the boot-up state, the disk SMART evaluation, and the last few days of operation. After using cwilkins' method above, all was repaired.

Thanks again. Much nicer to recover the array than lose it all as has been my experience with Windows on several occasions.

HellesAngel 10-11-2009 02:12 PM

Very similar problems, but slightly different results
 
Edit: Moved to a new thread here.

Perhaps someone will be able to help me, I hope so...

My server is running openSUSE 11.1 and I built a RAID5 array using Yast and three identical Samsung 1TB disks that mount as /dev/sdb1 /dev/sdc1 and /dev/sdd1. Everything ran fine for a couple of months then suddenly, for no apparent reason, the computer failed to start. It boots from another disk /dev/sda1 but tries to mount the raid array /dev/md0 and as this fails the boot stops at a rescue prompt. Not good.

Sadly there's no logging at this point so everything I give here as information has been typed in the long way so be gentle if I'm a bit sparse on details.

The first sign of trouble is in the boot text:
Quote:

Starting MD Raid md: md0 stopped.
md: kicking non-fresh sdc1 from array!
Then a full recovery is apparently attempted using 2 out of 3 devices but then this fails:
Quote:

md0: bitmap initialisation failed: -5
md0: failed to create bitmap (-5)
mdadm: failed to RUN_ARRAY /dev/md/0: Input/output error
I've tried the solutions above but they didn't work and now I'm starting the process of learning mdadm and RAID in general. What seems odd is that all the disks seem OK, the BIOS sees them, when booted with a live CD they're all present, it seems that somehow the RAID configuration has been damaged.

Firstly it would be helpful if I could boot the system and repair the RAID array with logging available so how can I remove the RAID array from the boot process? It's not needed to get the system going so this should be possible.

Tomorrow I'll buy another clean disk and add it to the array to see if that helps but in the meantime can anyone offer any help? I know my way around Linux a bit but RAID is something new.

jonbers 02-16-2010 10:31 PM

Thanks a lot cwilkins, I solved my problem with these commands...

[root@ornery ~]# cat /sys/block/md0/md/array_state
inactive
[root@ornery ~]# echo "clean" > /sys/block/md0/md/array_state
[root@ornery ~]# cat /sys/block/md0/md/array_state
clean
[root@ornery ~]# cat /proc/mdsta

taenus 07-26-2010 10:01 AM

I'm running into a very similar problem on an Ubuntu 10.04 system with my RAID6 volume. All the tests and errors pretty much line up exactly. I'm attempting the echo "clean" fix, but getting this error:

Code:

root@localhost:~# echo 'clean' > /sys/block/md1/md/array_state
-su: echo: write error: Invalid argument

Any ideas why?
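
The thread does not record an answer to this. Before retrying the write, it may help to collect what the kernel currently believes about the array and its members. The sketch below is only a diagnostic aid and does not claim to identify the cause: `md_sysfs_report` is a hypothetical helper, and it assumes the per-member `slot` and `state` attributes described in the kernel's md documentation are present under each `dev-*` directory.

```shell
#!/bin/sh
# Hypothetical helper: print the array state and each member's slot and state
# from the md sysfs directory given as $1 (e.g. /sys/block/md1/md).
md_sysfs_report() {
    md_dir=$1
    echo "array_state: $(cat "$md_dir/array_state")"
    for dev in "$md_dir"/dev-*; do
        # Skip the unexpanded pattern when no members are bound.
        [ -d "$dev" ] || continue
        echo "${dev##*/}: slot=$(cat "$dev/slot") state=$(cat "$dev/state")"
    done
}

# Usage:
#   md_sysfs_report /sys/block/md1/md
```

Posting that output together with the relevant dmesg lines would give others something concrete to compare against the cases earlier in the thread.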


All times are GMT -5.