OK, this is going to be a long one (lots of logs and stuff). Thanks in advance for reading it, and possibly helping me save my data.
I installed new disks into my system and wanted to set them up using RAID.
Here is my old config:
Quote:
/dev/hda: 200GB
/dev/hda1: 20GB (not used)
/dev/hda2: 180GB (mostly full - non essential data)
/dev/hdb: 30GB
/dev/hdb1: 20GB (root filesystem, mostly full)
/dev/hdb2: 2GB swap
/dev/hdb3: used to be a Windows install, not used anymore
Then I installed two new 200GB SATA drives. My new config would be:
Quote:
/dev/hda: 200GB
/dev/hda1: 20GB
/dev/hda2: 180GB
/dev/sda: 200GB
/dev/sda1: 180GB
/dev/sda2: 20GB
/dev/sdb: 200GB
/dev/sdb1: 180GB
/dev/sdb2: 20GB
/dev/md0: RAID1, 20GB (extra safe... I can lose two of the three drives and still boot up OK)
/dev/sda2
/dev/sdb2
/dev/hda1
/dev/md1: RAID5, 360GB (I can lose one of the three drives and still have all my data, plus it's combined into a single large volume)
/dev/sda1
/dev/sdb1
/dev/hda2
Now, in order to migrate my data, here was my plan:
1. Install the two new drives.
2. Create a RAID5 array, md1, from sda1 & sdb1, with 'missing' as the third device (so it runs in degraded mode).
3. Copy all my data from /dev/hda2 to /dev/md1.
4. Add /dev/hda2 to /dev/md1 and let it resync the parity (rough command sketch below).
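The commands for steps 2-4 were roughly these (reconstructed from memory, so the mount points and filesystem type are approximate):
Code:
# Step 2: create md1 degraded, with 'missing' standing in for the third disk
mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sda1 /dev/sdb1 missing
# Step 3: put a filesystem on it and copy the data across
mkfs.ext3 /dev/md1
mount /dev/md1 /mnt/raid
cp -a /data/. /mnt/raid/    # /data is where hda2 was mounted
# Step 4: free up hda2 and add it as the real third member
umount /dev/hda2
mdadm /dev/md1 --add /dev/hda2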
Now this is where I got stuck (I haven't gotten to md0 yet). The data seemed to copy fine; I unmounted hda2 and then added it to the md1 array. It started resyncing and got to maybe 10% fine, but after a while something odd happened. Each time I ran cat /proc/mdstat it would cycle: one run would show the resync at 0%, the next would say "resync=DELAYED", and the next wouldn't show a resync at all.
And it started generating HUGE amounts of logs in /var/log/syslog and /var/log/messages, about 350 MB before I stopped sysklogd & klogd (because my disk is already almost full). Here is the end of the log output:
Code:
Dec 4 23:22:33 drorex kernel: ................<6>md: syncing RAID array md1
Dec 4 23:22:33 drorex kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Dec 4 23:22:33 drorex kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Dec 4 23:22:33 drorex kernel: md: using 128k window, over a total of 175783104 blocks.
Dec 4 23:22:33 drorex kernel: md: md1: sync done.
Dec 4 23:22:33 drorex kernel: ................<6>md: syncing RAID array md1
Dec 4 23:22:33 drorex kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Dec 4 23:22:33 drorex kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Dec 4 23:22:33 drorex kernel: md: using 128k window, over a total of 175783104 blocks.
Dec 4 23:22:33 drorex kernel: md: md1: sync done.
Dec 4 23:22:33 drorex kernel: ................<6>md: syncing RAID array md1
Dec 4 23:22:33 drorex kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Dec 4 23:22:33 drorex kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Dec 4 23:22:33 drorex kernel: md: using 128k window, over a total of 175783104 blocks.
Dec 4 23:22:33 drorex kernel: md: md1: sync done.
Dec 4 23:22:33 drorex kernel: ................<6>md: syncing RAID array md1
Dec 4 23:22:33 drorex kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Dec 4 23:22:33 drorex kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Dec 4 23:22:33 drorex kernel: md: using 128k window, over a total of 175783104 blocks.
Dec 4 23:22:33 drorex kernel: md: md1: sync done.
Dec 4 23:22:33 drorex exiting on signal 15
It's basically the same thing repeating over and over.
So then I stopped the array and tried restarting it, but now it says my disks have failed or something?
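For reference, I stopped it with something like this (typed from memory, so take the exact invocation with a grain of salt):
Code:
umount /dev/md1
mdadm --stop /dev/md1
Then the reassembly attempt: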
Code:
root@drorex:~# mdadm --assemble /dev/md1 /dev/sda1 /dev/sdb1 /dev/hda2
mdadm: /dev/md1 assembled from 1 drive and 1 spare - not enough to start the array.
root@drorex:~# cat /proc/mdstat
Personalities : [raid1] [raid5]
md1 : inactive sda1[0] hda2[3] sdb1[1]
527373440 blocks
unused devices: <none>
Here is the output of mdadm --examine for all of the partitions in md1:
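(I ran it against all three devices in one go; as far as I know --examine accepts a list:)
Code:
mdadm --examine /dev/sda1 /dev/sdb1 /dev/hda2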
Code:
/dev/sda1:
Magic : a92b4efc
Version : 00.90.01
UUID : aef026ea:a658dd3d:d83036ce:4ad342a2
Creation Time : Sun Dec 4 18:27:00 2005
Raid Level : raid5
Raid Devices : 3
Total Devices : 3
Preferred Minor : 0
Update Time : Sun Dec 4 23:24:16 2005
State : clean
Active Devices : 1
Working Devices : 2
Failed Devices : 3
Spare Devices : 1
Checksum : b33aa022 - correct
Events : 0.1049267
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 1 0 active sync /dev/sda1
0 0 8 1 0 active sync /dev/sda1
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 3 2 2 spare /dev/hda2
/dev/sdb1:
Magic : a92b4efc
Version : 00.90.01
UUID : aef026ea:a658dd3d:d83036ce:4ad342a2
Creation Time : Sun Dec 4 18:27:00 2005
Raid Level : raid5
Raid Devices : 3
Total Devices : 3
Preferred Minor : 0
Update Time : Sun Dec 4 22:57:33 2005
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 1
Spare Devices : 1
Checksum : b31b8bf7 - correct
Events : 0.31676
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 17 1 active sync /dev/sdb1
0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
2 2 0 0 2 faulty removed
3 3 3 2 2 spare /dev/hda2
/dev/hda2:
Magic : a92b4efc
Version : 00.90.01
UUID : aef026ea:a658dd3d:d83036ce:4ad342a2
Creation Time : Sun Dec 4 18:27:00 2005
Raid Level : raid5
Raid Devices : 3
Total Devices : 3
Preferred Minor : 0
Update Time : Sun Dec 4 23:24:16 2005
State : clean
Active Devices : 1
Working Devices : 2
Failed Devices : 3
Spare Devices : 1
Checksum : b33aa01f - correct
Events : 0.1049267
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 3 2 3 spare /dev/hda2
0 0 8 1 0 active sync /dev/sda1
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 3 2 3 spare /dev/hda2
It looks like something weird is going on, with the conflicting reports of 'faulty' and 'spare'. Also, sda1 and hda2 both show Events : 0.1049267, while sdb1 is stuck at 0.31676 with an update time of 22:57, so sdb1's superblock apparently stopped being updated well before the others.
Again, the raid array was initially created with /dev/sda1 & /dev/sdb1, then /dev/hda2 was added to it.
Is there some way I can reset the flags so the disks aren't marked as 'failed'?
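From what I've read, forcing the assembly might get mdadm to ignore the stale superblock on sdb1, but I haven't dared to run it without a second opinion (the --force flag here is my guess at the right approach, not something I've tested):
Code:
mdadm --stop /dev/md1
mdadm --assemble --force /dev/md1 /dev/sda1 /dev/sdb1 /dev/hda2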
I'd really like a way to do this without losing my data; I know it's all in there somewhere.
Thanks again to anyone who can help. I have a bunch of stuff in there that I don't want to lose.