
horde 09-23-2009 12:11 AM

RAID 5 --ADD Not completing at 100%
 
Hi All,

I had a disk fail on a raid array and rebuilt it thus:

I unmounted the filesystem and then issued:

mdadm /dev/md1 --fail /dev/sdf1
mdadm /dev/md1 --remove /dev/sdf1
mdadm /dev/md1 --add /dev/sdf1
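
(If the failed disk had been physically swapped out rather than just re-added, the partition layout would need copying over before the --add - a minimal sketch, assuming a same-size replacement and sfdisk from util-linux:

sfdisk -d /dev/sdb | sfdisk /dev/sdf    # clone the partition table from a healthy member
mdadm /dev/md1 --add /dev/sdf1)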

And then monitored it using mdadm --detail and /proc/mdstat
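
(For hands-off watching, a minimal sketch - the monitor daemon and mail address here are assumptions, adjust to taste:

watch -n 5 cat /proc/mdstat                      # refresh the status every 5 seconds
mdadm --monitor --scan --daemonise --mail=root   # background daemon that mails on array events)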

> mdadm --detail /dev/md1
/dev/md1:
Version : 0.90
Creation Time : Sat Nov 22 13:42:02 2008
Raid Level : raid5
Array Size : 3907039744 (3726.04 GiB 4000.81 GB)
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 1
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Wed Sep 23 13:45:02 2009
State : active, degraded, recovering
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1

Layout : left-symmetric
Chunk Size : 128K

Rebuild Status : 91% complete

UUID : 0f2561a9:81e8cd4a:10e4ddb4:49a3b4a7
Events : 0.693864

Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 49 1 active sync /dev/sdd1
2 8 65 2 active sync /dev/sde1
5 8 81 3 spare rebuilding /dev/sdf1
4 8 33 4 active sync /dev/sdc1

> cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md1 : active raid5 sdf1[5] sdb1[0] sdc1[4] sde1[2] sdd1[1]
3907039744 blocks level 5, 128k chunk, algorithm 2 [5/4] [UUU_U]
[===================>.] recovery = 99.9% (488036992/488379968) finish=0.0min speed=56148K/sec
bitmap: 10/233 pages [40KB], 2048KB chunk

unused devices: <none>
> cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md1 : active raid5 sdf1[5] sdb1[0] sdc1[4] sde1[2] sdd1[1]
3907039744 blocks level 5, 128k chunk, algorithm 2 [5/4] [UUU_U]
[====================>] recovery =100.0% (488392576/488379968) finish=2802575022411.7min speed=54850K/sec
bitmap: 10/233 pages [40KB], 2048KB chunk

unused devices: <none>
> cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md1 : active raid5 sdf1[5] sdb1[0] sdc1[4] sde1[2] sdd1[1]
3907039744 blocks level 5, 128k chunk, algorithm 2 [5/4] [UUU_U]
[====================>] recovery =100.1% (489231232/488379968) finish=2812115125554.9min speed=54664K/sec
bitmap: 10/233 pages [40KB], 2048KB chunk
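
(The same progress is also exposed through sysfs, which stays readable even when the percentage runs past 100 - assuming the standard md sysfs interface of 2.6-era kernels:

cat /sys/block/md1/md/sync_action      # "recover" while rebuilding, "idle" when done
cat /sys/block/md1/md/sync_completed   # sectors done / total sectors)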

> mdadm --detail /dev/md1
/dev/md1:
Version : 0.90
Creation Time : Sat Nov 22 13:42:02 2008
Raid Level : raid5
Array Size : 3907039744 (3726.04 GiB 4000.81 GB)
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 1
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Wed Sep 23 14:00:02 2009
State : active, degraded
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1

Layout : left-symmetric
Chunk Size : 128K

UUID : 0f2561a9:81e8cd4a:10e4ddb4:49a3b4a7
Events : 0.693868

Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 49 1 active sync /dev/sdd1
2 8 65 2 active sync /dev/sde1
5 8 81 3 spare rebuilding /dev/sdf1
4 8 33 4 active sync /dev/sdc1


It is no longer recovering. How do I get the spare activated?
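
(The kernel log is one place to look when a rebuild seems stalled, in case a surviving member threw read errors:

dmesg | grep -iE 'md1|raid5')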

horde 09-24-2009 06:43 AM

It has finally recovered. All disks are now active - it took about 3 hrs after reaching 100% (which it reached after 2 hrs) to eventually mark all as active.

Originally it was active, degraded, recovering

after about 2 hrs it moved to active, degraded

and then after another 3 hours it moved to active
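
(A quick way to confirm the final state - the grep pattern is just a convenient filter, adjust as needed:

mdadm --detail /dev/md1 | grep -E 'State|Devices'
cat /proc/mdstat    # should now show [5/5] [UUUUU])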

The RAID-5 array was 5 x 1 TB disks, so it was working with about 4 TB of data.

Hope this information helps someone in the future

chrism01 09-24-2009 09:16 PM

That's the problem with the large disks now available on standard PCs; the disk & backplane hardware isn't really up to the job.
Fast hardware costs more...

horde 10-01-2009 04:06 AM

and to be honest that's the trade-off we make - speed vs cash - in this case I was willing to wait the time - it just would have been nice if /proc/mdstat had shown me the correct expected time and not finish=2812115125554.9min (which by my estimates is about 5 million years - a bit too long for me to wait)
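
(The bogus ETA looks like simple unsigned arithmetic: the finish estimate is roughly (total - done) / speed, and once done overran total the remainder appears to have wrapped to about 2^63 in the same units as the speed:

2^63 / 54664 K/sec ≈ 1.7e14 sec ≈ 2812115125554 min - i.e. the roughly 5-million-year figure above)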

