Old 04-30-2012, 09:08 AM   #1
hakon.gislason
LQ Newbie
 
Registered: Apr 2012
Posts: 2

Rep: Reputation: Disabled
Failed drive while converting raid5 to raid6, then a hard reboot


Hello,
I've been having frequent drive "failures": a drive gets reported as failed/bad, mdadm emails me that something went wrong, and so on, but after a reboot or two it is perfectly fine again. I'm not sure what the cause is; the server is quite new, and I suspect something else is behind it, such as bad memory or the motherboard (I've been having other issues as well). I've had four drive "failures" this month, all on different drives except for one that "failed" twice, and all were fixed with a reboot or a rebuild (every drive reported bad by mdadm passed an extensive SMART test).
Due to this, I decided to convert my raid5 array to a raid6 array while I find the root cause of the problem.

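For reference, this kind of raid5-to-raid6 conversion is started with mdadm --grow; the sketch below is not my exact command (the added device name and the backup-file path are only placeholders):

Code:
# Sketch of a typical raid5 -> raid6 reshape; /dev/sdX and the backup path are placeholders
mdadm /dev/md0 --add /dev/sdX                      # add the disk that will carry the second parity
mdadm --grow /dev/md0 --level=6 --raid-devices=5 \
      --backup-file=/root/md0-grow.backup          # the backup file protects the reshape's critical section
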
I started the conversion right after a drive failure and rebuild, but once it had reshaped approximately 4% (if I remember correctly; it was going really slowly, with ~7500 minutes to completion), it reported another drive bad and the conversion to raid6 stopped (it still said "rebuilding", but the speed was 0K/sec and the time left was a few million minutes).
After that happened, I tried to stop the array and reboot the server, as I had done previously to get a reportedly "bad" drive working again, but the array wouldn't stop, the server wouldn't reboot, and I couldn't unmount the filesystem; everything hung whenever I tried to do anything with /dev/md0. After a few failed reboot attempts, I just killed the power and restarted the machine. Admittedly, that was probably not the best thing I could have done at that point.

I have a backup of about 80% of the data on the array; it's been a month since the last complete backup (I ran out of backup disk space).

So, the big question: can the array be activated, can it complete the conversion to raid6, and will I get my data back?
I hope the data can be rescued, and any help I can get would be much appreciated!

I'm fairly new to raid in general, and have been using mdadm for about a month now.
Here's some data:

Code:
root@axiom:~# mdadm --examine --scan
ARRAY /dev/md/0 metadata=1.2 UUID=cfedbfc1:feaee982:4e92ccf4:45e08ed1 name=axiom.is:0 

root@axiom:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdc[6] sde[7] sdb[5] sda[4]
      7814054240 blocks super 1.2

root@axiom:~# mdadm --assemble --scan --force --run /dev/md0
mdadm: /dev/md0 is already in use.

root@axiom:~# mdadm --stop /dev/md0
mdadm: stopped /dev/md0

root@axiom:~# mdadm --assemble --scan --force --run /dev/md0
mdadm: Failed to restore critical section for reshape, sorry.
      Possibly you needed to specify the --backup-file

root@axiom:~# mdadm --assemble --scan --force --run /dev/md0 --backup-file=/root/mdadm-backup-file
mdadm: Failed to restore critical section for reshape, sorry.

root@axiom:~# fdisk -l | grep 2000
Disk /dev/sda doesn't contain a valid partition table
Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
Disk /dev/sde: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes

root@axiom:~# mdadm --examine /dev/sd{a,b,c,e,f}
/dev/sda:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : cfedbfc1:feaee982:4e92ccf4:45e08ed1
           Name : axiom.is:0   (local to host axiom.is )
  Creation Time : Mon Apr  9 01:05:20 2012
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
     Array Size : 11721080448 (5589.05 GiB 6001.19 GB)
  Used Dev Size : 3907026816 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : b11a7424:fc470ea7:51ba6ea0:158c0ce6

  Reshape pos'n : 242343936 (231.12 GiB 248.16 GB)
     New Layout : left-symmetric

    Update Time : Sun Oct 14 15:20:06 2012
       Checksum : 76ecd244 - correct
         Events : 138274

         Layout : left-symmetric-6
     Chunk Size : 32K

   Device Role : Active device 3
   Array State : .AAAA ('A' == active, '.' == missing)
/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x6
     Array UUID : cfedbfc1:feaee982:4e92ccf4:45e08ed1
           Name : axiom.is:0   (local to host axiom.is )
  Creation Time : Mon Apr  9 01:05:20 2012
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
     Array Size : 11721080448 (5589.05 GiB 6001.19 GB)
  Used Dev Size : 3907026816 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
Recovery Offset : 161546240 sectors
          State : active
    Device UUID : 8389f39f:cc7fa027:f10cf717:1d41d40b

  Reshape pos'n : 242343936 (231.12 GiB 248.16 GB)
     New Layout : left-symmetric

    Update Time : Sun Oct 14 15:20:06 2012
       Checksum : 19ef8090 - correct
         Events : 138274

         Layout : left-symmetric-6
     Chunk Size : 32K

   Device Role : Active device 4
   Array State : .AAAA ('A' == active, '.' == missing)
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : cfedbfc1:feaee982:4e92ccf4:45e08ed1
           Name : axiom.is:0   (local to host axiom.is )
  Creation Time : Mon Apr  9 01:05:20 2012
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
     Array Size : 11721080448 (5589.05 GiB 6001.19 GB)
  Used Dev Size : 3907026816 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : b2cec17f:e526b42e:9e69e46b:23be5163

  Reshape pos'n : 242343936 (231.12 GiB 248.16 GB)
     New Layout : left-symmetric

    Update Time : Sun Oct 14 15:20:06 2012
       Checksum : a29b468a - correct
         Events : 138274

         Layout : left-symmetric-6
     Chunk Size : 32K

   Device Role : Active device 1
   Array State : .AAAA ('A' == active, '.' == missing)
/dev/sde:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : cfedbfc1:feaee982:4e92ccf4:45e08ed1
           Name : axiom.is:0   (local to host axiom.is )
  Creation Time : Mon Apr  9 01:05:20 2012
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
     Array Size : 11721080448 (5589.05 GiB 6001.19 GB)
  Used Dev Size : 3907026816 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 21c799cd:58be3156:6830865b:fa984134

  Reshape pos'n : 242343936 (231.12 GiB 248.16 GB)
     New Layout : left-symmetric

    Update Time : Sun Oct 14 15:20:06 2012
       Checksum : d882780e - correct
         Events : 138274

         Layout : left-symmetric-6
     Chunk Size : 32K

   Device Role : Active device 2
   Array State : .AAAA ('A' == active, '.' == missing)
/dev/sdf:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : cfedbfc1:feaee982:4e92ccf4:45e08ed1
           Name : axiom.is:0   (local to host axiom.is )
  Creation Time : Mon Apr  9 01:05:20 2012
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
     Array Size : 11721080448 (5589.05 GiB 6001.19 GB)
  Used Dev Size : 3907026816 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 8b043488:8379f327:5f00e0fe:6a1e0bee

  Reshape pos'n : 242343936 (231.12 GiB 248.16 GB)
     New Layout : left-symmetric

    Update Time : Sat Apr 28 22:57:36 2012
       Checksum : c122639f - correct
         Events : 138241

         Layout : left-symmetric-6
     Chunk Size : 32K

   Device Role : Active device 0
   Array State : AAAAA ('A' == active, '.' == missing)
 
Old 05-05-2012, 11:35 AM   #2
hakon.gislason
LQ Newbie
 
Registered: Apr 2012
Posts: 2

Original Poster
Rep: Reputation: Disabled

Over 225 views and nobody can help me?
I'd really appreciate help in getting this array online again.
 
Old 05-05-2012, 12:56 PM   #3
lithos
Senior Member
 
Registered: Jan 2010
Location: SI : 45.9531, 15.4894
Distribution: CentOS, OpenNA/Trustix, testing desktop openSuse 12.1 /Cinnamon/KDE4.8
Posts: 1,144

Rep: Reputation: 217
Hi,

I'm sorry to read that you're having trouble with your RAID, but I see you're using software RAID within Linux, which I don't know and don't use.

For the future, I would recommend using a true hardware RAID controller, which works at the hardware level rather than in software (in Linux).
I don't intend this as an advertisement or anything like that; I just want to point out what a server should be using for RAID.

I hope someone with mdadm experience will help you out.


good luck

Last edited by lithos; 05-05-2012 at 12:58 PM.
 
Old 09-15-2012, 08:47 AM   #4
arandall
LQ Newbie
 
Registered: Sep 2012
Posts: 1

Rep: Reputation: Disabled
Interesting that you have been experiencing issues similar to mine. Your post was a while ago, but perhaps this will help someone else out.

One thing I noticed in your post, and what prompted my reply, is that you are not partitioning your drives. Typically one creates a single primary partition on each RAID drive with partition type 0xFD (Linux RAID) - option 't' in fdisk.

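As a rough sketch (the device name is a placeholder; double-check it before writing anything to disk), the partition can be created interactively with fdisk or non-interactively with sfdisk:

Code:
# Sketch only: /dev/sdX stands for one of the RAID member disks
# Interactive fdisk: n (new), p (primary), 1, accept the defaults, then t -> fd, w
# Non-interactive equivalent: a single whole-disk primary partition of type 0xFD
echo ',,fd' | sfdisk /dev/sdX
# Build the array from the partition (/dev/sdX1), not from the raw disk
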
Now onto the failed drives.

Over the last few months, on one of my set-ups where I do not use partitions on the drives, I have noticed that a disk sometimes changes its reported block size from 4096 bytes to 512 bytes (you can see this with `blockdev --getbsz /dev/sd?`). When that happens, the number of blocks reported by `cat /proc/partitions` changes too, and this is directly related to drives being marked as faulty in the array, as you would expect. Often a reboot, as you describe, would let me re-add the drive to the array, and it could then run for weeks before hitting the problem again.

E.g. this is with Seagate 2TB drives; notice that the #blocks differ:

Code:
# cat /proc/partitions # (extract)
major minor  #blocks  name

   8       32 1953513527 sdc
   8       33 1953512001 sdc1
   8       16 1953514584 sdb
   8       17 1953512001 sdb1

# blockdev --getbsz /dev/sdc
512
# blockdev --getbsz /dev/sdb
4096
I never got to the bottom of why the block size changed, but since it happened a number of times I changed the set-up to use partitions as described above. With the partitions in place, the size of /dev/sd?1 stays the same regardless of the reported block size, and the RAID is happy.
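
A quick way to spot this (just a sketch; adjust the device list to your own drives) is to compare what blockdev and /proc/partitions report for each member:

Code:
# Sketch: report logical block size and total size for each member disk
for d in /dev/sd[bc]; do
    echo "$d: block size $(blockdev --getbsz "$d"), $(blockdev --getsz "$d") sectors"
done
# The partition rows (sdb1, sdc1) in /proc/partitions should keep a constant #blocks
grep -E 'sd[bc]1$' /proc/partitions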
 
Old 03-07-2019, 04:23 AM   #5
devdol
Member
 
Registered: Dec 2005
Distribution: debian (testing/unstable)
Posts: 68

Rep: Reputation: 17
Appending the "--invalid-backup" option in addition to "--backup-file=..." seems to do the trick.

After rebooting a server that had got stuck while reshaping (RAID5 to RAID6), so a situation similar to the OP's above and still relevant, we got a somewhat terrifying error message:

Code:
mdadm --stop /dev/md1
mdadm --assemble --force /dev/md1 /dev/sd[abcde]4 --backup-file=/path/to/md1.bak 
"mdadm: Failed to restore critical section for reshape, sorry."
However, the same sequence of commands with the additional "--invalid-backup" option
Code:
mdadm --stop /dev/md1
mdadm --assemble --force /dev/md1 /dev/sd[abcde]4 --backup-file=/path/to/md1.bak --invalid-backup
led to "mdadm: /dev/md1 has been started with 5 drives."

This behaviour was always reproducible for this RAID.

It took us a long time to find this solution, because it seemed pointless to specify a backup file while simultaneously declaring it invalid. So perhaps this note will help someone else.
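
Once the array is running again, the interrupted reshape should normally resume by itself; as a sketch, its progress can be followed with the usual tools:

Code:
# Sketch: confirm the reshape has resumed and watch its progress
cat /proc/mdstat                            # shows a "reshape = X% ... finish=..." line while it runs
mdadm --detail /dev/md1 | grep -i reshape   # "Reshape Status : N% complete" on a reshaping array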

Last edited by devdol; 03-07-2019 at 04:56 AM.
 
  

