
LinuxQuestions.org (/questions/)
-   Linux - Server (http://www.linuxquestions.org/questions/linux-server-73/)
-   -   RAID 5 mdadm --grow interrupted - what to do next? (http://www.linuxquestions.org/questions/linux-server-73/raid-5-mdadm-grow-interrupted-what-to-do-next-602671/)

damiendusha 11-27-2007 02:28 AM

RAID 5 mdadm --grow interrupted - what to do next?
 
Hello all,

(Note to moderators... if this is better in another forum, feel free to move it)

I recently tried adding a disk to my RAID 5 array, and I was a little careless: I shut down the system before the mdadm --grow operation was complete.

The existing setup was 3x500GB HDDs set up as RAID 5, and the operation was to add another 500GB HDD. There is also a 200GB PATA drive that holds Fedora 7 (x86-64, running a compiled vanilla 2.6.23.1 kernel).

I used the following commands to attempt to add to the raid array:

mdadm --add /dev/md0 /dev/sde1
mdadm --grow /dev/md0 --raid-devices=4

Shortly after that, I did a very, very stupid thing and shut the machine off before heading out of the house (habit...).

When I next tried to boot (at runlevel 1; I was following the instructions at http://scotgate.org/?p=107), it spat up an error and could not mount the array due to a bad superblock (the RAID array is mounted as /home).

I am taking comfort in the mdadm man page, which says: "Increasing the number of active devices in a RAID5 is much more effort. Every block in the array will need to be read and written back to a new location. From 2.6.17, the Linux Kernel is able to do this safely, including restarting an interrupted "reshape"."

So now, I am wondering what to do next. What information do I need to provide, or is there a magic recovery sequence of commands that I should enter?

Thank you all for your patience.
Damien.

damiendusha 11-27-2007 07:08 AM

If it helps anyone, this is what comes up...

-----------
# cat /proc/mdstat
Personalities :
unused devices: <none>


-----------------
# mdadm --detail /dev/md0

/dev/md0 appears not to be active.

------------------

I have done this under runlevel 1, because I was (ok, still am) paranoid about it. I am quite happy to take advice as to what my next steps should be.

Cheers
Damien.

damiendusha 11-28-2007 01:11 AM

OK, I am slowly learning, and hopefully I'll be able to provide enough info here:

My existing hard drives in the array are:
/dev/sdb1
/dev/sdc1
/dev/sdd1

And the operation is to add to the array:
/dev/sde1

The raid array /dev/md0 was mounted as /home in fstab. In order to avoid further problems, I booted with a fedora live cd and commented that line out of fstab to prevent it from trying to mount it.

Now for the interesting bits:

[root@localhost ~]# cat /proc/mdstat
Personalities :
unused devices: <none>
------------------------------------------------

[root@localhost ~]# mdadm --detail --verbose /dev/md0
mdadm: md device /dev/md0 does not appear to be active.
[root@localhost ~]#

-------------------------------------------------
[root@localhost ~]# cat /etc/mdadm.conf

# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR root
ARRAY /dev/md0 level=raid5 num-devices=3 uuid=cc733ea9:351a5757:130997a6:4f6655ac

-------------------------------------
Now, let's examine each of the devices:
[root@localhost ~]# mdadm --examine /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 00.91.00
UUID : cc733ea9:351a5757:130997a6:4f6655ac
Creation Time : Sun Jun 24 17:42:55 2007
Raid Level : raid5
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 1465151808 (1397.28 GiB 1500.32 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0

Reshape pos'n : 442368 (432.07 MiB 452.98 MB)
Delta Devices : 1 (3->4)

Update Time : Tue Nov 27 17:42:14 2007
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : b81e656f - correct
Events : 0.132

Layout : left-symmetric
Chunk Size : 64K

Number Major Minor RaidDevice State
this 0 8 17 0 active sync /dev/sdb1

0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1


[root@localhost ~]# mdadm --examine /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 00.91.00
UUID : cc733ea9:351a5757:130997a6:4f6655ac
Creation Time : Sun Jun 24 17:42:55 2007
Raid Level : raid5
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 1465151808 (1397.28 GiB 1500.32 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0

Reshape pos'n : 442368 (432.07 MiB 452.98 MB)
Delta Devices : 1 (3->4)

Update Time : Tue Nov 27 17:42:14 2007
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : b81e6581 - correct
Events : 0.132

Layout : left-symmetric
Chunk Size : 64K

Number Major Minor RaidDevice State
this 1 8 33 1 active sync /dev/sdc1

0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1




[root@localhost ~]# mdadm --examine /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 00.91.00
UUID : cc733ea9:351a5757:130997a6:4f6655ac
Creation Time : Sun Jun 24 17:42:55 2007
Raid Level : raid5
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 1465151808 (1397.28 GiB 1500.32 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0

Reshape pos'n : 442368 (432.07 MiB 452.98 MB)
Delta Devices : 1 (3->4)

Update Time : Tue Nov 27 17:42:14 2007
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : b81e6593 - correct
Events : 0.132

Layout : left-symmetric
Chunk Size : 64K

Number Major Minor RaidDevice State
this 2 8 49 2 active sync /dev/sdd1

0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1


[root@localhost ~]# mdadm --examine /dev/sde1
/dev/sde1:
Magic : a92b4efc
Version : 00.91.00
UUID : cc733ea9:351a5757:130997a6:4f6655ac
Creation Time : Sun Jun 24 17:42:55 2007
Raid Level : raid5
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 1465151808 (1397.28 GiB 1500.32 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0

Reshape pos'n : 442368 (432.07 MiB 452.98 MB)
Delta Devices : 1 (3->4)

Update Time : Tue Nov 27 17:42:14 2007
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : b81e65a5 - correct
Events : 0.132

Layout : left-symmetric
Chunk Size : 64K

Number Major Minor RaidDevice State
this 3 8 65 3 active sync /dev/sde1

0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1


------------------------------------------------

I'm not sure what I need to do to kick-start the process and make it all good again. Is anyone able to provide some pointers?

Cheers
Damien.

damiendusha 12-07-2007 06:33 AM

It's been a long time coming, but I managed to work out the answer.

The RAID array was intact, but it needed to be assembled:

mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

And then
cat /proc/mdstat
will show the status of the reshape as it runs (it took my array 4 days!).

But growing the array does not update the /etc/mdadm.conf file. To update it:
mdadm --detail --scan >> /etc/mdadm.conf

and then
vi /etc/mdadm.conf

to get rid of the old entry.
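If you'd rather not edit by hand, that append-then-edit can be scripted. This is only a sketch, worked here on a scratch copy of the file shown earlier in the thread: the echoed 4-device ARRAY line is hypothetical, standing in for whatever `mdadm --detail --scan` actually prints on your box, where you would also use CONF=/etc/mdadm.conf (as root).

```shell
# Sketch: rebuild mdadm.conf with the stale ARRAY entry removed.
# On a real system: CONF=/etc/mdadm.conf, and replace the echo below
# with:  mdadm --detail --scan >> "$CONF.new"
CONF=./mdadm.conf.sample
cat > "$CONF" <<'EOF'
# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR root
ARRAY /dev/md0 level=raid5 num-devices=3 uuid=cc733ea9:351a5757:130997a6:4f6655ac
EOF
grep -v '^ARRAY' "$CONF" > "$CONF.new"   # keep everything except old ARRAY lines
echo 'ARRAY /dev/md0 level=raid5 num-devices=4 uuid=cc733ea9:351a5757:130997a6:4f6655ac' >> "$CONF.new"
mv "$CONF.new" "$CONF"
cat "$CONF"                              # now shows only the 4-device entry
```

Same end result as vi, just with no chance of leaving both ARRAY lines behind.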

Now, you should be set to grow the file system on the array. It wasn't pretty, or fun, but I got there in the end.
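For anyone following along, the filesystem-grow step would be roughly the following, assuming ext3 (Fedora 7's default; substitute your own filesystem's resize tool). Sketched here on a scratch image file so nothing real is touched; on the actual box you'd run e2fsck and resize2fs against /dev/md0, as root, and only after /proc/mdstat shows the reshape has finished:

```shell
# Rehearsal of "grow the filesystem" on a file-backed image.
# On the real system the last two commands run against /dev/md0 instead.
dd if=/dev/zero of=md0.img bs=1M count=8 status=none   # stand-in for the old array size
mkfs.ext3 -q -F md0.img                                # the pre-existing filesystem
truncate -s 16M md0.img                                # pretend the array just grew
e2fsck -f -p md0.img                                   # resize2fs insists on a fresh fsck
resize2fs md0.img                                      # expand the fs to fill the device
```

resize2fs grows the filesystem to the full size of the (now larger) device when no explicit size is given.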

zac_haryy 07-27-2008 09:59 PM

I was wondering whether you had data on your existing array, and whether you lost it once the array started assembling? I am having the same problem you had, but I can't lose my data :(

damiendusha 07-30-2008 01:47 AM

Quote:

Originally Posted by zac_haryy (Post 3228165)
I was wondering whether you had data on your existing array, and whether you lost it once the array started assembling? I am having the same problem you had, but I can't lose my data :(

After 4 days of stressing, the data turned out to be intact. It's apparently a "safe" process, but that didn't help my nerves :)

zac_haryy 07-30-2008 08:00 AM

What version of mdadm do you have installed? I have read that since v2.6.4, a grow process that is interrupted can be restarted. So I just want to confirm your version of mdadm. To find out, just run "mdadm -V" (I believe). Thank you!

EDIT: I just read your original post and noticed that you said something about v2.6.17. Is that the version you had on there before you started the grow process? Right now I have v2.6.2 and I'm not really sure how to update it easily.

-haryy

damiendusha 08-04-2008 07:25 PM

Quote:

Originally Posted by zac_haryy (Post 3230907)
What version of mdadm do you have installed? I have read that since v2.6.4, a grow process that is interrupted can be restarted. So I just want to confirm your version of mdadm. To find out, just run "mdadm -V" (I believe). Thank you!

EDIT: I just read your original post and noticed that you said something about v2.6.17. Is that the version you had on there before you started the grow process? Right now I have v2.6.2 and I'm not really sure how to update it easily.

-haryy

At the time, I was running Fedora 7 (x86-64, running a compiled vanilla 2.6.23.1 kernel).

Since then, I have swapped over to Ubuntu 7.10, configured and remounted everything, and have since upgraded again to 8.04.

Anyway, the version I currently have is:
damien@mediabox:~$ mdadm -V
mdadm - v2.6.3 - 20th August 2007

Probably the only thing I can suggest (and I suspect you'll want to get other opinions before doing this) is to move the HDDs to a machine with a more modern kernel (or compile a shiny new kernel for your machine), reassemble the array, and hope that it all comes out in the wash.

I wish I could help more, but this is pretty much all the experience I have.

damiendusha 08-04-2008 07:27 PM

By the way, have you checked out the RAID wiki?

http://linux-raid.osdl.org/index.php/Main_Page

Not sure what's in there but it may help.

panfist 03-18-2010 09:02 PM

Quote:

Originally Posted by damiendusha (Post 2983119)
The RAID array was intact, but needs to be assembled:

mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

I have almost the same problem as the OP. I started growing an array, but noticed that in over an hour the speed of the operation never rose above 0K/s, so I tried to reboot. Now, when I try to reassemble my array, mdadm segfaults.

I'm using kernel 2.6.31-14-generic and mdadm v2.6.7.1.

For the whole hour after starting the grow command, the speed of the operation stayed at 0K/s. I thought something must be wrong and that the best course of action would be to reboot and start over from a clean boot. I'm not a Linux expert, so I figured that if I rebooted, everything would try to exit gracefully.

After one hour, the system still had not finished shutting down, so I did Alt-SysRq R E I S U B, waiting over a minute between each command. I've included the syslog of what happened up until the next start-up, but put it last because it's by far the longest.

When I rebooted, the array seemed to be up, but mounting it gave a bad FS type error, even when I specified the type (ext4). After stopping the inactive array and trying to reassemble it, mdadm crashed with a segmentation fault.
Is it possible to recover the data? We have backups, but they're spread out over 1500 DVDs.

When I examine the drives, the output looks pretty much like this for each one (six drives say "active" and four say "clean", corresponding to the six original and four added drives):
$ mdadm --examine /dev/sda
/dev/sda:
Magic : a92b4efc
Version : 00.91.00
UUID : 56c16545:07db76d6:e368bf24:bd0fce41
Creation Time : Tue Feb  2 09:58:58 2010
Raid Level : raid5
Used Dev Size : 976762368 (931.51 GiB 1000.20 GB)
Array Size : 8790861312 (8383.62 GiB 9001.84 GB)
Raid Devices : 10
Total Devices : 10
Preferred Minor : 0

Reshape pos'n : 0
Delta Devices : 4 (6->10)

Update Time : Thu Mar 18 23:33:40 2010
State : active
Active Devices : 10
Working Devices : 10
Failed Devices : 0
Spare Devices : 0
Checksum : 79904299 - correct
Events : 270611

Layout : left-symmetric
Chunk Size : 256K

Number Major Minor RaidDevice State
this 4 8 0 4 active sync /dev/sda

0 0 8 16 0 active sync /dev/sdb
1 1 8 96 1 active sync /dev/sdg
2 2 8 112 2 active sync /dev/sdh
3 3 8 48 3 active sync /dev/sdd
4 4 8 0 4 active sync /dev/sda
5 5 8 32 5 active sync /dev/sdc
6 6 8 160 6 active sync /dev/sdk
7 7 8 144 7 active sync /dev/sdj
8 8 8 128 8 active sync /dev/sdi
9 9 8 80 9 active sync /dev/sdf

