LinuxQuestions.org
Linux - General This forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

Old 11-27-2006, 06:00 PM   #1
adiehl
LQ Newbie
 
Registered: Nov 2006
Location: Ludwigshafen, Germany
Distribution: Gentoo
Posts: 1
Thanked: 0
raid5 with mdadm does not run or rebuild


Hi,

I created a software raid with mdadm, raid5, on 4 sata-drives.
Everything worked fine, the raid was built in background and I copied my data on the md-device.
The build was not yet complete at that point. I had to reboot a few times, but the build process continued without problems each time.
Now, after another reboot (build still not complete), the raid does not become accessible.
I am using four brand-new 300 GB SATA drives, each with one partition (type fd, Linux raid autodetect).

dmesg says:
md: md1 stopped.
md: unbind<sda1>
md: export_rdev(sda1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: unbind<sdb1>
md: export_rdev(sdb1)
md: md1 stopped.
md: bind<sdb1>
md: bind<sdc1>
md: bind<sdd1>
md: bind<sda1>
md: md1: raid array is not clean -- starting background reconstruction
raid5: device sda1 operational as raid disk 0
raid5: device sdc1 operational as raid disk 2
raid5: device sdb1 operational as raid disk 1
raid5: cannot start dirty degraded array for md1
RAID5 conf printout:
--- rd:4 wd:3 fd:1
disk 0, o:1, dev:sda1
disk 1, o:1, dev:sdb1
disk 2, o:1, dev:sdc1
raid5: failed to run raid set md1
md: pers->run() failed ...

---

When trying to start raid manually:
root@server ~# mdadm --assemble --force /dev/md1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: failed to RUN_ARRAY /dev/md1: Input/output error

Details:
root@server ~# mdadm -D /dev/md1
/dev/md1:
Version : 00.90.03
Creation Time : Sun Nov 26 16:54:24 2006
Raid Level : raid5
Device Size : 293033536 (279.46 GiB 300.07 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Mon Nov 27 01:13:09 2006
State : active, degraded, Not Started
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1

Layout : left-symmetric
Chunk Size : 64K

UUID : 000dc389:67464f1a:8527aa2f:cdb725ee
Events : 0.41809

Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
2 8 33 2 active sync /dev/sdc1
3 0 0 3 removed

4 8 49 - spare /dev/sdd1

---
So why is there a removed drive and a spare? I didn't define any spare device ...

I tried anything I found about this issue, but nothing helps.

Distro is Gentoo with Kernel 2.6.18 with all required modules built-in.
I tried the same with Mandriva and Debian, same problem.

Cables & Drives are OK (tested on this and another system).

I think there might be a problem that an active raid device is recognized as a spare drive?

I would appreciate any help in this, as I have important personal data on the raid-array which is currently not backed up.
Old 11-28-2006, 12:07 PM   #2
cwilkins
LQ Newbie
 
Registered: Nov 2006
Posts: 3
Thanked: 0
Hi adiehl,
Sounds like we may both be in the same (sinking?) boat and neither of us has been rescued yet. I was going to simply post a link to all my sordid details over on Linux Forums, but I'm not allowed, so I'll repost them here.

You might want to step through and see how closely they match your details. Maybe we can gang up on this at least...

I have got to get this array back up today -- the natives are getting restless...

-cw-

Post 1:

Ok, I'm a Linux software raid veteran and I have the scars to prove it (google for mddump if you're bored), but that's not doing me much good now. I'm at the end of my rope... er... SATA cable. Help? Please??

The subject platform is a PC running FC5 (Fedora Core 5, patched latest) with eight 400gb SATA drives (/dev/sd[b-i]1) assembled into a RAID6 md0 device. Originally built with mdadm. No LVM or other exotics. /dev/md0 is a /data filesystem, nothing there needed at boot time. It's been humming along nicely for months.

Then... This morning I found that /dev/sdb1 had been kicked out of the array and there was the requisite screaming in /var/log/messages about failed read/writes, SMART errors, highly miffed SATA controllers, etc., all associated with /dev/sdb1. (It appears to have been a temporary failure -- badblocks found no problems.) Tried shutting the system down cleanly, which didn't seem to be working, so finally crossed my fingers and hit the reset button.

No surprise, it booted back up refusing to assemble the array. More specifically:

Code:
Nov 27 19:03:52 ornery kernel: md: bind<sdb1>
Nov 27 19:03:52 ornery kernel: md: bind<sdd1>
Nov 27 19:03:52 ornery kernel: md: bind<sde1>
Nov 27 19:03:52 ornery kernel: md: bind<sdf1>
Nov 27 19:03:52 ornery kernel: md: bind<sdg1>
Nov 27 19:03:52 ornery kernel: md: bind<sdh1>
Nov 27 19:03:52 ornery kernel: md: bind<sdi1>
Nov 27 19:03:52 ornery kernel: md: bind<sdc1>
Nov 27 19:03:52 ornery kernel: md: kicking non-fresh sdb1 from array!
Nov 27 19:03:52 ornery kernel: md: unbind<sdb1>
Nov 27 19:03:52 ornery kernel: md: export_rdev(sdb1)
Nov 27 19:03:52 ornery kernel: md: md0: raid array is not clean -- starting background reconstruction
Nov 27 19:03:52 ornery kernel: raid5: device sdc1 operational as raid disk 1
Nov 27 19:03:52 ornery kernel: raid5: device sdi1 operational as raid disk 7
Nov 27 19:03:52 ornery kernel: raid5: device sdh1 operational as raid disk 6
Nov 27 19:03:52 ornery kernel: raid5: device sdg1 operational as raid disk 5
Nov 27 19:03:52 ornery kernel: raid5: device sdf1 operational as raid disk 4
Nov 27 19:03:52 ornery kernel: raid5: device sde1 operational as raid disk 3
Nov 27 19:03:52 ornery kernel: raid5: device sdd1 operational as raid disk 2
Nov 27 19:03:52 ornery kernel: raid5: cannot start dirty degraded array for md0
Nov 27 19:03:52 ornery kernel: RAID5 conf printout:
Nov 27 19:03:52 ornery kernel:  --- rd:8 wd:7 fd:1
Nov 27 19:03:52 ornery kernel:  disk 1, o:1, dev:sdc1
Nov 27 19:03:52 ornery kernel:  disk 2, o:1, dev:sdd1
Nov 27 19:03:52 ornery kernel:  disk 3, o:1, dev:sde1
Nov 27 19:03:52 ornery kernel:  disk 4, o:1, dev:sdf1
Nov 27 19:03:52 ornery kernel:  disk 5, o:1, dev:sdg1
Nov 27 19:03:52 ornery kernel:  disk 6, o:1, dev:sdh1
Nov 27 19:03:52 ornery kernel:  disk 7, o:1, dev:sdi1
Nov 27 19:03:52 ornery kernel: raid5: failed to run raid set md0
Nov 27 19:03:52 ornery kernel: md: pers->run() failed ...
Code:
[root@ornery ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdc1[1] sdi1[7] sdh1[6] sdg1[5] sdf1[4] sde1[3] sdd1[2]
      2734961152 blocks

unused devices: <none>
Attempts to force assembly fail:

Code:
[root@ornery ~]# mdadm -S /dev/md0
[root@ornery ~]# mdadm --assemble --force --scan /dev/md0
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
Leaving out the bad drive:

Code:
[root@ornery ~]# mdadm -S /dev/md0
[root@ornery ~]# mdadm --assemble --force /dev/md0 /dev/sd[c-i]1
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
[root@ornery ~]# mdadm -S /dev/md0
[root@ornery ~]# mdadm --assemble --force --run /dev/md0 /dev/sd[c-i]1
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
Trying to fail or remove the bad drive doesn't work either:

Code:
[root@ornery ~]# mdadm -f /dev/md0 /dev/sdb1
mdadm: set device faulty failed for /dev/sdb1:  No such device
[root@ornery ~]# mdadm -r /dev/md0 /dev/sdb1
mdadm: hot remove failed for /dev/sdb1: No such device
A quick check of the event counters shows that only /dev/sdb is stale:

Code:
[root@ornery ~]# mdadm -E /dev/sd[b-i]1 | grep Event
         Events : 0.851758
         Events : 0.854919
         Events : 0.854919
         Events : 0.854919
         Events : 0.854919
         Events : 0.854919
         Events : 0.854919
         Events : 0.854919
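
The grep above drops the device names, so the stale member has to be identified by position. A minimal sketch that keeps the names follows. It assumes the 0.90-superblock `mdadm -E` layout shown in this thread (a "/dev/xxx:" header line followed by an "Events : hi.lo" line) and treats the counter as two integers. The function names `list_events` and `find_stale` are hypothetical helpers, not part of mdadm.

```shell
#!/usr/bin/env bash
# Sketch only: reads `mdadm -E` output on stdin.
# Assumes the 0.90-superblock text layout shown in this thread.

# Print "device counter" for every member found in the input.
list_events() {
    awk '
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }      # remember current device header
        $1 == "Events" && $2 == ":" { print dev, $3 }    # emit "device hi.lo"
    '
}

# Print the members whose counter is behind the highest one seen.
# The counter is assumed to be "hi.lo" (two 32-bit halves), so it is
# compared as hi * 2^32 + lo rather than as a decimal fraction.
find_stale() {
    list_events | awk '
        {
            split($2, p, ".")
            v = p[1] * 4294967296 + p[2]
            val[$1] = v; raw[$1] = $2
            if (v > max) max = v
        }
        END { for (d in val) if (val[d] < max) print d, raw[d] }
    '
}
```

Usage would be along the lines of `mdadm -E /dev/sd[b-i]1 | find_stale`, which for the counters listed above should name only /dev/sdb1.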
Here's a full examine from one of the good drives:

Code:
[root@ornery ~]# mdadm -E /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : d57cea81:3be21b7d:183a67d9:782c3329
  Creation Time : Tue Mar 21 11:14:56 2006
     Raid Level : raid6
    Device Size : 390708736 (372.61 GiB 400.09 GB)
     Array Size : 2344252416 (2235.65 GiB 2400.51 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0

    Update Time : Mon Nov 27 10:10:36 2006
          State : active
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ebd6e3a8 - correct
         Events : 0.854919


      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       8       81        4      active sync   /dev/sdf1
   5     5       8       97        5      active sync   /dev/sdg1
   6     6       8      113        6      active sync   /dev/sdh1
   7     7       8      129        7      active sync   /dev/sdi1
And detail for the array:

Code:
[root@ornery ~]# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Tue Mar 21 11:14:56 2006
     Raid Level : raid6
    Device Size : 390708736 (372.61 GiB 400.09 GB)
   Raid Devices : 8
  Total Devices : 7
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Nov 27 10:10:36 2006
          State : active, degraded
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 256K

           UUID : d57cea81:3be21b7d:183a67d9:782c3329
         Events : 0.854919

    Number   Major   Minor   RaidDevice State
   9421816       0        0    1912995864      removed
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1
       4       8       81        4      active sync   /dev/sdf1
       5       8       97        5      active sync   /dev/sdg1
       6       8      113        6      active sync   /dev/sdh1
       7       8      129        7      active sync   /dev/sdi1
So I've obviously got a degraded array. Where does the "dirty" part come in? Why can't I simply force this thing back together in active degraded mode with 7 drives and then add a fresh /dev/sdb1?

I know as a last resort I can create a "new" array over my old one and as long as I get everything juuuuust right, it'll work, but that seems a rather drastic solution to what should be a trivial (and all too common) situation -- dealing with a single failed drive. I mean... I run RAID6 to provide a little extra protection, not to slam into these kinds of brick walls. Heck, I might as well run RAID0! ARGH!!! Ok... ok... I'll calm down.

FWIW, here's my mdadm.conf:

Code:
[root@ornery ~]# grep -v '^#' /etc/mdadm.conf
DEVICE /dev/sd[bcdefghi]1
ARRAY /dev/md0 UUID=d57cea81:3be21b7d:183a67d9:782c3329
MAILADDR root
Have I missed something obvious? Thanks in advance for any clues...

Followup Post:

Ok, done a bit more poking around... I tried zeroing out the superblock on the failed device and adding it back into the array. It just sat there looking stupid. The status of the new drive became "sync", the array status remained inactive, and no resync took place:

Code:
[root@ornery ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdb1[0](S) sdc1[1] sdi1[7] sdh1[6] sdg1[5] sdf1[4] sde1[3] sdd1[2]
      3125669888 blocks

unused devices: <none>
Another thing I noticed was the new drive didn't fill the slot for the missing drive, but instead occupied a new slot. Here's a detail for the array:

Code:
[root@ornery ~]# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Tue Mar 21 11:14:56 2006
     Raid Level : raid6
    Device Size : 390708736 (372.61 GiB 400.09 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Nov 27 10:10:36 2006
          State : active
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 256K

           UUID : d57cea81:3be21b7d:183a67d9:782c3329
         Events : 0.854919

    Number   Major   Minor   RaidDevice State
   4150256       0        0    1912995872      removed
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1
       4       8       81        4      active sync   /dev/sdf1
       5       8       97        5      active sync   /dev/sdg1
       6       8      113        6      active sync   /dev/sdh1
       7       8      129        7      active sync   /dev/sdi1

       0       8       17        -      active sync   /dev/sdb1
It's like it's just adding the new /dev/sdb1 in as a spare or something. My hunch is that the problem stems from the superblock indicating that the bad device is simply "removed" rather than failed. Yet trying to fail the device... well, failed.

Barring any sudden insights from my fellow Linuxens, it's looking like I have another romp with mddump looming in my future. By my reckoning, I would need to set the SB's to indicate that device 0's status is failed rather than removed, and set the counters to indicate 1 failed device and 7 active/working devices.

If anyone has suggestions, feel free to jump in at any time!! :-)
Old 11-28-2006, 03:47 PM   #3
cwilkins
LQ Newbie
 
Registered: Nov 2006
Posts: 3
Thanked: 0
The silence is deafening!

Ok, I tried hacking up the superblocks with mddump. The good news is I didn't screw anything up permanently. The bad news is I made no progress either.

Ultimately, I started reading through the kernel source and wandered into a helpful text file Documentation/md.txt in the kernel source tree. I was able to start the array, for reading at least. (baby steps...) Here's how:

Code:
[root@ornery ~]# cat /sys/block/md0/md/array_state
inactive
[root@ornery ~]# echo "clean" > /sys/block/md0/md/array_state
[root@ornery ~]# cat /sys/block/md0/md/array_state
clean
[root@ornery ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdc1[1] sdi1[7] sdh1[6] sdg1[5] sdf1[4] sde1[3] sdd1[2]
      2344252416 blocks level 6, 256k chunk, algorithm 2 [8/7] [_UUUUUUU]

unused devices: <none>
[root@ornery ~]# mount -o ro /dev/md0 /data
[root@ornery ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda2             226G   46G  168G  22% /
/dev/hda1             251M   52M  187M  22% /boot
/dev/shm              2.9G     0  2.9G   0% /dev/shm
/dev/sda2              65G   35G   27G  56% /var
/dev/md0              2.2T  307G  1.8T  15% /data
At least I can get to my data now. Yay!
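
The sysfs step above can be wrapped in a small guard so that "clean" is only written when the array really is inactive. This is a sketch under stated assumptions: `mark_array_clean` and the `SYSFS_ROOT` override are hypothetical names introduced here (neither is a kernel or mdadm convention), and the override exists only so the logic can be dry-run against a scratch directory instead of the real /sys.

```shell
#!/usr/bin/env bash
# Sketch only. SYSFS_ROOT is a hypothetical override for dry runs;
# on a real system it defaults to /sys and the write needs root.
mark_array_clean() {
    local md="$1"
    local state_file="${SYSFS_ROOT:-/sys}/block/${md}/md/array_state"
    local state

    [ -e "$state_file" ] || { echo "no such array: $md" >&2; return 1; }
    state=$(cat "$state_file")

    # Only touch arrays that are assembled but not running.
    if [ "$state" != "inactive" ]; then
        echo "$md is '$state', leaving it alone"
        return 0
    fi

    echo clean > "$state_file" || return 1
    echo "$md: inactive -> $(cat "$state_file")"
}
```

After `mark_array_clean md0` reports success, mounting read-only first (`mount -o ro /dev/md0 /data`, as in the post) keeps the filesystem untouched until the data has been checked or backed up.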
Old 11-29-2006, 11:49 AM   #4
cwilkins
LQ Newbie
 
Registered: Nov 2006
Posts: 3
Thanked: 0
Backup successful!

So after that, I did the following:

Code:
umount /data
Code:
mdadm /dev/md0 -a /dev/sdb1
The drive was added without error. A quick check of the array:

Code:
[root@ornery ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdb1[8] sdc1[1] sdi1[7] sdh1[6] sdg1[5] sdf1[4] sde1[3] sdd1[2]
      2344252416 blocks level 6, 256k chunk, algorithm 2 [8/7] [_UUUUUUU]
      [>....................]  recovery =  0.2% (823416/390708736) finish=13924.4min speed=465K/sec

unused devices: <none>
...and...

Code:
[root@ornery ~]# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Tue Mar 21 11:14:56 2006
     Raid Level : raid6
     Array Size : 2344252416 (2235.65 GiB 2400.51 GB)
    Device Size : 390708736 (372.61 GiB 400.09 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Nov 29 11:03:51 2006
          State : clean, degraded, recovering
 Active Devices : 7
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 1

     Chunk Size : 256K

 Rebuild Status : 0% complete

           UUID : d57cea81:3be21b7d:183a67d9:782c3329
         Events : 0.854924

    Number   Major   Minor   RaidDevice State
       8       8       17        0      spare rebuilding   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1
       4       8       81        4      active sync   /dev/sdf1
       5       8       97        5      active sync   /dev/sdg1
       6       8      113        6      active sync   /dev/sdh1
       7       8      129        7      active sync   /dev/sdi1
Now that's what I was looking for! It's moving kinda slow right now, probably because I'm also doing an fsck.
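
For keeping an eye on a rebuild like this one, the progress line in /proc/mdstat can be reduced to a one-line summary. A minimal sketch, assuming the line format shown above ("recovery = N% (...) finish=Mmin speed=..."); `recovery_progress` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: reads /proc/mdstat-style text on stdin and summarises
# any "recovery =" or "resync =" progress line it finds.
recovery_progress() {
    LC_ALL=C awk '
        /recovery *=|resync *=/ {
            pct = ""; fin = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/) pct = $i                  # e.g. "0.2%"
                if ($i ~ /^finish=/) {                   # e.g. "finish=13924.4min"
                    fin = $i
                    sub(/^finish=/, "", fin)
                    sub(/min$/, "", fin)
                }
            }
            printf "%s done, about %.1f hours left\n", pct, fin / 60
        }
    '
}
```

Run as `recovery_progress < /proc/mdstat`. For the line above, the estimate of 13924.4 minutes works out to roughly 232 hours, which is why the concurrent fsck matters.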

I can't be certain, but I think the problem was that the state of the good drives (and the array) was marked as "active" rather than "clean." (active == dirty?) I expect this was caused by doing a hard reset on a system with a degraded array, in the midst of it being brought to a crawl trying to talk to the failed drive. Seems like some work might be needed to be able to handle these situations a little more gracefully.

Anyway, it appears I might be firmly on the road to recovery now. (If not, you'll hear the screams...) Hopefully my posts will be helpful to others encountering this problem.

-cw-
Old 12-24-2006, 06:55 AM   #5
bnuytten
LQ Newbie
 
Registered: Dec 2006
Posts: 2
Thanked: 0
raid5 + LVM

I experienced a similar problem, using raid5 on 4 drives and LVM+ext3. After manually overriding the array's state
Code:
echo "clean" > /sys/block/md0/md/array_state
I checked the events on all disks and the array itself
Code:
[root@juno ~]# mdadm --examine /dev/hd[bdfh]1 | grep Event
         Events : 0.87645
         Events : 0.87645
         Events : 0.87644
         Events : 0.87462
[root@juno ~]# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Sat Nov  4 02:38:57 2006
     Raid Level : raid5
    Device Size : 156288256 (149.05 GiB 160.04 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Dec 24 07:31:10 2006
          State : active, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 916d12f4:0df2cd68:594a1080:6da31000
         Events : 0.87645

    Number   Major   Minor   RaidDevice State
       0       3       65        0      active sync   /dev/hdb1
       1      22       65        1      active sync   /dev/hdd1
       2      33       65        2      active sync   /dev/hdf1
       0       0        0        0      removed
As you all know, I need n-1 good drives in a RAID5 array to recover the data. In this case I need three, but according to the events I only have two. So I took the three best, i.e. the three closest to the value of the md array itself. Using the same technique described above, I was able to recover all my data. Phew!
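
Picking "the three best" by event counter can be sketched as a sort. The input format here is an assumption made for illustration: one "device counter" pair per line, with the counter in the two-part form shown above. `freshest_members` is a hypothetical helper name. Each half of the counter is compared as an integer, since reading it as a decimal fraction would rank 0.9 above 0.10.

```shell
#!/usr/bin/env bash
# Sketch only: print the N members with the highest event counters.
# Input on stdin: "<device> <hi>.<lo>" per line (assumed format).
freshest_members() {
    local n="$1"
    # Split the counter on "." and sort both halves numerically,
    # highest first, then keep the first n device names.
    LC_ALL=C awk '{ split($2, p, "."); print p[1], p[2], $1 }' |
        LC_ALL=C sort -k1,1nr -k2,2nr |
        head -n "$n" |
        awk '{ print $3 }'
}
```

For the four counters listed above, asking for three members should return hdb1, hdd1 and hdf1, which matches the set shown as "active sync" in the detail output.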
Old 03-22-2007, 07:43 PM   #6
myrons41
LQ Newbie
 
Registered: Nov 2002
Location: Zagreb, Croatia
Distribution: Suse 10.2
Posts: 8
Thanked: 0
Instructive, the part on fixing /dev/mdN by writing to /sys/block/mdN/.. as documented in /usr/linux/Documentation/md.txt (cwilkins' post).
My raid:
sA2-AT8:/home/miroa # mdadm -D /dev/md3
/dev/md3:
Version : 00.90.03
Creation Time : Thu Mar 22 23:10:03 2007
Raid Level : raid5
Device Size : 34700288 (33.09 GiB 35.53 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 3
Persistence : Superblock is persistent

Update Time : Fri Mar 23 00:53:09 2007
State : clean, Not Started
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 128K

UUID : b52708b8:2c956410:02c2c543:95cb1048
Events : 0.20

Number Major Minor RaidDevice State
0 8 7 0 active sync /dev/sda7
1 8 23 1 active sync /dev/sdb7
2 8 39 2 active sync /dev/sdc7
3 8 55 3 active sync /dev/sdd7
4 8 71 4 active sync /dev/sde7
sA2-AT8:/home/miroa #
I thought that "Not Started" could be some indication...
since mke2fs /dev/md3 does not work on it:
sA2-AT8:/home/miroa # mke2fs /dev/md3
mke2fs 1.39 (29-May-2006)
mke2fs: Device size reported to be zero. Invalid partition specified, or
partition table wasn't reread after running fdisk, due to
a modified partition being busy and in use. You may need to reboot
to re-read your partition table.
But I checked the sizes, they're ok.
sA2-AT8:/home/miroa # cat /sys/block/md3/md/dev-sd?7/size
34700288
34700288
34700288
34700288
34700288
Or maybe a different kind of size is in question here? Yes!!!
This is wrong. The raid at the top reports the same size as its components:
34700288
It should read:
138801152 (which is 4x),
similar to this other array in the same box of mine:
sA2-AT8:/home/miroa # mdadm -D /dev/md2
/dev/md2:
Version : 00.90.03
Creation Time : Thu Mar 22 20:23:00 2007
Raid Level : raid5

Array Size : 606227968 (578.14 GiB 620.78 GB)
Device Size : 151556992 (144.54 GiB 155.19 GB)

I have just now, while writing this, understood what is very wrong with my
/dev/md3.
Earlier I fiddled with issuing
echo "check" > /sys/block/md3/md/array_state (that got it rebuilding itself)
and
echo "idle" > /sys/block/md3/md/sync_action (that did nothing in my case)
because I wanted to stop or run the array (neither could be done; how could it?)
My question is:
How do I fix this?
Suse 10.2, updated regularly online. Arc x86_64.
I even tried:
mdadm --zero-superblock /dev/sdN7
and deleted all partitions /dev/sd[a-e]7 and recreated them, and even tried
formatting them in desperation... (the latter is useless, as raid info resides in
those arcane things called superblocks, not in mundane user-space regular disk
blocks...)
Getting at my wits' end...
Don't even want to go to sleep till I try some more to bring this raid to
obedience. Argh!
Does anyone have ideas on this?
Old 03-22-2007, 08:03 PM   #7
myrons41
LQ Newbie
 
Registered: Nov 2002
Location: Zagreb, Croatia
Distribution: Suse 10.2
Posts: 8
Thanked: 0
Well, I decided to reboot (as if I didn't too many times already in the last hours) just to make sure. And what was:
> Raid Level : raid5
> Device Size : 34700288 (33.09 GiB 35.53 GB)
is now:
Raid Level : raid5
Array Size : 138801152 (132.37 GiB 142.13 GB)
Device Size : 34700288 (33.09 GiB 35.53 GB)
And the mke2fs did its job just fine as well...
Old 03-23-2007, 02:58 AM   #8
bnuytten
LQ Newbie
 
Registered: Dec 2006
Posts: 2
Thanked: 0
Quote:
Originally Posted by myrons41
Well, I decided to reboot (as if I didn't too many times already in the last hours) just to make sure. And what was:
> Raid Level : raid5
> Device Size : 34700288 (33.09 GiB 35.53 GB)
is now:
Raid Level : raid5
Array Size : 138801152 (132.37 GiB 142.13 GB)
Device Size : 34700288 (33.09 GiB 35.53 GB)
And the mke2fs did its job just fine as well...

From your starting point, a clean raid5 array, I would have advised you to just try starting the array. If this is successful, you should get output that looks like this:
Code:
mdadm --run /dev/md3
mdadm: /dev/md3 has been started with 5 drives.
Since you created the RAID5 array just yesterday and you ordered a "check", I assume it was still rebuilding its parity data when you issued the command:
Quote:
echo "idle" > /sys/block/md3/md/sync_action (that did nothing in my case)
It probably did do something. You first instructed the array to start checking/rebuilding and then you told the array to stop synchronizing/rebuilding its disks, which left the array "assembled" but not fully "started".

The device size was reported zero by the mkfs utility probably because the array was in this half stopped, half started state. Rebooting the machine causes your RAID devices to be stopped on shutdown (mdadm --stop /dev/md3) and restarted on startup (mdadm --assemble /dev/md3 /dev/sd[a-e]7).
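
Before running mkfs on an md device, it may help to confirm that the array is actually running rather than just assembled. A minimal sketch that reads the state word from /proc/mdstat-style text follows. It assumes the "mdN : active ..." / "mdN : inactive ..." line format seen in the samples earlier in this thread, and `md_run_state` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: print the state word ("active" or "inactive") that
# /proc/mdstat-style input on stdin reports for the named array.
# Returns non-zero if the array is not listed at all.
md_run_state() {
    local md="$1"
    awk -v md="$md" '
        $1 == md && $2 == ":" { print $3; found = 1 }
        END { exit !found }
    '
}
```

Typical use would be `md_run_state md3 < /proc/mdstat`, going ahead with mkfs only when it prints "active".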
Old 03-23-2007, 05:40 AM   #9
myrons41
LQ Newbie
 
Registered: Nov 2002
Location: Zagreb, Croatia
Distribution: Suse 10.2
Posts: 8
Thanked: 0
Quote:
Originally Posted by bnuytten
Since you created the RAID5 array just yesterday and you ordered a "check", I assume it was still rebuilding ...
No. I wouldn't do such a thing.
The problem was altogether different than your suggestions.
Take another look at those size reports of mine,
how could it run or do anything really?
Otherwise, why delve into it, when it's solved?
Unless someone else reading this has an issue similar to mine and is coping with it right now.

Last edited by myrons41; 03-23-2007 at 05:42 AM..
Old 04-08-2008, 08:13 PM   #10
linux1windows0
LQ Newbie
 
Registered: Apr 2008
Posts: 1
Thanked: 0

Thanks to cwilkins for the following insight:

[root@ornery ~]# cat /sys/block/md0/md/array_state
inactive
[root@ornery ~]# echo "clean" > /sys/block/md0/md/array_state
[root@ornery ~]# cat /sys/block/md0/md/array_state
clean


Once this was done I was able to use mdadm /dev/md0 -a /dev/sdc2 (the drive that was corrupted), and the system rebuilt itself as it should. It has been 2 days and I cannot detect any issues from the original fault. As far as I can tell, a power interruption left some sort of faulty data stored, which prevented the array from being rebuilt with the commonly recommended mdadm -A --force /dev/md0, even when specifying all drives. It always resulted in the same error:
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
My array comprises 3 drives, one of which was kicked out due to, I believe, a power supply issue, which ultimately appears to have been related to a massive buildup of dust, I am embarrassed to say. The drive was good, as evidenced by the boot-up state, the disk SMART evaluation, and the last few days of operation. After using cwilkins' method above, all was repaired.

Thanks again. Much nicer to recover the array than lose it all as has been my experience with Windows on several occasions.