LinuxQuestions.org
Old 09-12-2007, 08:29 AM   #1
somebox
LQ Newbie
 
Registered: Sep 2007
Posts: 3

Recovering a RAID 5 array, mdadm mess-up


I've been searching through this site for some RAID answers, but found nothing specific to my problem. This is my first post, so here goes.

I have a Debian Etch server, and my /home partition is set up as a RAID 5 array with four 250 GB SATA disks (750 GB usable). I recently returned from vacation and found that the machine had locked up. After rebooting, /home did not mount. Here's what showed in syslog:

Code:
Sep 12 05:18:42 workshop kernel: md: bind<sdb1>
Sep 12 05:18:42 workshop kernel: md: bind<sda1>
Sep 12 05:18:42 workshop kernel: md: bind<sdd1>
Sep 12 05:18:42 workshop kernel: md: bind<sdc1>
Sep 12 05:18:42 workshop kernel: md: kicking non-fresh sda1 from array!
Sep 12 05:18:42 workshop kernel: md: unbind<sda1>
Sep 12 05:18:42 workshop kernel: md: export_rdev(sda1)
Sep 12 05:18:42 workshop kernel: md: kicking non-fresh sdb1 from array!
Sep 12 05:18:42 workshop kernel: md: unbind<sdb1>
Sep 12 05:18:42 workshop kernel: md: export_rdev(sdb1)
Sep 12 05:18:42 workshop kernel: md: md0: raid array is not clean -- starting background reconstruction
Sep 12 05:18:42 workshop kernel: raid5: device sdc1 operational as raid disk 2
Sep 12 05:18:42 workshop kernel: raid5: device sdd1 operational as raid disk 3
Sep 12 05:18:42 workshop kernel: raid5: not enough operational devices for md0 (2/4 failed)
Sep 12 05:18:42 workshop kernel: RAID5 conf printout:
Sep 12 05:18:42 workshop kernel:  --- rd:4 wd:2 fd:2
Sep 12 05:18:42 workshop kernel:  disk 2, o:1, dev:sdc1
Sep 12 05:18:42 workshop kernel:  disk 3, o:1, dev:sdd1
Sep 12 05:18:42 workshop kernel: raid5: failed to run raid set md0
Sep 12 05:18:42 workshop kernel: md: pers->run() failed ...
Sep 12 05:18:42 workshop kernel: Attempting manual resume
Sep 12 05:18:42 workshop kernel: EXT3-fs: INFO: recovery required on readonly filesystem.
Sep 12 05:18:42 workshop kernel: EXT3-fs: write access will be enabled during recovery.
So it seemed that two of the four disks had failed. I was hoping the drives had merely overheated, or that the machine had not been cleanly rebooted. Losing two drives out of a four-drive RAID 5 set is not good.
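For context: RAID 5 stripes data across all members with one member's worth of parity, so it tolerates exactly one failure. A quick sanity check of the numbers, assuming the four 250 GB members described above:

```shell
# RAID 5 usable capacity is (N - 1) * member size; the single parity
# stripe covers one failed member, so losing two loses the array.
n=4; member_gb=250
echo "usable: $(( (n - 1) * member_gb )) GB, tolerates: 1 failed disk"
```

With two members kicked as non-fresh, md correctly refuses to start the array.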

I captured the output of mdadm --examine for all the disks:

Code:
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 43e20969:a2d1e5ba:94f7c737:27a0793c
  Creation Time : Sat Apr 22 22:55:01 2006
     Raid Level : raid5
    Device Size : 244195904 (232.88 GiB 250.06 GB)
     Array Size : 732587712 (698.65 GiB 750.17 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Mon Sep  3 13:00:35 2007
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : e679baca - correct
         Events : 0.2488136

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8        1        1      active sync   /dev/sda1

   0     0       8       17        0      active sync   /dev/sdb1
   1     1       8        1        1      active sync   /dev/sda1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 43e20969:a2d1e5ba:94f7c737:27a0793c
  Creation Time : Sat Apr 22 22:55:01 2006
     Raid Level : raid5
    Device Size : 244195904 (232.88 GiB 250.06 GB)
     Array Size : 732587712 (698.65 GiB 750.17 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Mon Sep  3 13:00:35 2007
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : e679bad8 - correct
         Events : 0.2488136

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       17        0      active sync   /dev/sdb1

   0     0       8       17        0      active sync   /dev/sdb1
   1     1       8        1        1      active sync   /dev/sda1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 43e20969:a2d1e5ba:94f7c737:27a0793c
  Creation Time : Sat Apr 22 22:55:01 2006
     Raid Level : raid5
    Device Size : 244195904 (232.88 GiB 250.06 GB)
     Array Size : 732587712 (698.65 GiB 750.17 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Mon Sep  3 13:02:51 2007
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0
       Checksum : e653c444 - correct
         Events : 0.2488139

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       33        2      active sync   /dev/sdc1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 43e20969:a2d1e5ba:94f7c737:27a0793c
  Creation Time : Sat Apr 22 22:55:01 2006
     Raid Level : raid5
    Device Size : 244195904 (232.88 GiB 250.06 GB)
     Array Size : 732587712 (698.65 GiB 750.17 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Mon Sep  3 13:02:51 2007
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0
       Checksum : e653c456 - correct
         Events : 0.2488139

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       49        3      active sync   /dev/sdd1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
Notice that the disks disagreed about the state of the array. I hoped that, at worst, only one disk was actually faulty.
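The field that matters most when members disagree is the Events counter. A small loop like this (a sketch, assuming the member partitions are still visible) pulls out just the relevant lines from each superblock:

```shell
# Print each member's update time, state and event counter; the members
# with the highest Events value hold the most recent view of the array.
for d in /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1; do
    echo "== $d =="
    mdadm --examine "$d" 2>/dev/null | grep -E 'Update Time|Events|State :' \
        || echo "(no superblock read)"
done
```

Here sda1/sdb1 are at event 0.2488136 while sdc1/sdd1 are at 0.2488139, so the latter two have the freshest view.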

I decided from the above output that I should try to reassemble the array. In the past, mdadm was pretty smart about trying to resync the disks. However, I made a big mistake. I typed the following command:

Code:
# mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[a-d]
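In hindsight, reattaching existing members is the job of --assemble, not --create. A hedged sketch of the usual recovery sequence, written as a dry run (remove the leading echo on each line to actually execute):

```shell
# --assemble re-reads the existing superblocks; --force lets mdadm use
# members whose event counters disagree after an unclean shutdown.
# Dry run: drop the leading 'echo' on each line to execute for real.
echo mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
echo mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
```

Unlike --create, --assemble does not write new superblocks over the old ones.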
So mdadm took a long time to rebuild the array, and then I could not mount it. Rebooting didn't help. Here's the error from mount:

Code:
# mount /home
mount: wrong fs type, bad option, bad superblock on /dev/md0,
       missing codepage or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
Looking at /proc/mdstat:

Code:
# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sda[0] sdd[3] sdc[2] sdb[1]
      732595392 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      
unused devices: <none>
In horror, I realized that mdadm had built the array from the whole disks instead of the partitions. I wanted /dev/sda1, /dev/sdb1, and so on, NOT /dev/sda, /dev/sdb!
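For comparison, the intended command would have named the partitions rather than the raw disks (a dry-run sketch of the difference only; remove the echo to execute, and note that by this point even the correct device list would not have made a fresh --create safe):

```shell
# Intended: build from the partitions (sda1..sdd1), not the whole disks.
echo mdadm --create /dev/md0 --level=5 --raid-devices=4 \
    /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
```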

Here's where I get really confused. If I look at the disks with fdisk, the partitions are still there, but two of them are now typed as regular Linux partitions (not raid autodetect):

Code:
 $ fdisk -l /dev/sda

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1       30401   244196001   fd  Linux raid autodetect

 $ fdisk -l /dev/sdb

Disk /dev/sdb: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       30401   244196032   83  Linux

$ fdisk -l /dev/sdc

Disk /dev/sdc: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       30401   244196001   fd  Linux raid autodetect

$ fdisk -l /dev/sdd

Disk /dev/sdd: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1       30401   244196032   83  Linux
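Note that sdb1 and sdd1 now carry partition type 83 (plain Linux) rather than fd, so the kernel's boot-time RAID autodetect would skip them. If the partition nodes ever come back, the type could be set back to fd; a dry-run sketch (remove the echo to execute; the old `sfdisk --id` syntax is assumed here):

```shell
# Set partition 1's type back to 'fd' (Linux raid autodetect) on the
# two members that now show type 83 (dry run; old sfdisk --id syntax).
for d in /dev/sdb /dev/sdd; do
    echo sfdisk --id "$d" 1 fd
done
```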
But it gets even stranger... I no longer see the partitions in /dev:

Code:
$ ls /dev/sd*
/dev/sda  /dev/sdb  /dev/sdc  /dev/sdd
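One likely reason the partition nodes vanished: while md0 holds the whole disks, the kernel cannot register their partitions. A sketch of freeing the disks and re-reading the partition tables (dry run; remove the echo on each line to execute):

```shell
# Stop the array so the kernel releases sda..sdd, then ask it to
# re-read each partition table so sda1..sdd1 reappear under /dev.
echo mdadm --stop /dev/md0
for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    echo blockdev --rereadpt "$d"
done
```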
And when I try to assemble the array now, mdadm can't find those old partitions:

Code:
$ mdadm --assemble /dev/md0 --verbose /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: looking for devices for /dev/md0
mdadm: cannot open device /dev/sda1: No such file or directory
mdadm: /dev/sda1 has no superblock - assembly aborted
So I'm in a real bind. I don't know if my data is still on the drives (and of course I REALLY want to recover it; only some of it is backed up). I can't see the old partitions in /dev, even though fdisk still shows them.

Is it possible that my mdadm --create command wiped my disks somehow? I thought mdadm was careful to check for existing RAID superblocks!

Any help would be greatly appreciated!
 
Old 09-12-2007, 11:27 AM   #2
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
By running a create on an existing array, you've destroyed the superblocks; mdadm warns you about the existing contents of the drives when you run the command. Restore whatever data you have from backup.

This has been said many times, but it bears repeating: RAID is not a substitute for backup. It's intended to increase uptime (data availability), and does not provide data archiving.
 
Old 09-12-2007, 11:52 AM   #3
somebox
LQ Newbie
 
Registered: Sep 2007
Posts: 3

Original Poster
Oh Crap

Wow, this sucks. The thing is, I did not get any warning, because I specified the wrong devices (e.g., /dev/sda instead of /dev/sda1). Is there really no way to reconstruct this array now? I can see RAID partitions at /dev/sd[a-d]1 in fdisk, but I can't access them as they are not in /dev. Can anyone suggest something to try?
 
Old 10-17-2007, 08:32 AM   #4
fnaaijkens
LQ Newbie
 
Registered: Oct 2007
Posts: 1

the power of mdadm

I did something like that once.
I created a new reiserfs on the RAID disks, then rebuilt everything with reiserfsck --rebuild-tree --scan-whole-partition.

I recovered almost 100% of the files, and some older versions of them, too.
With around 500,000 files it's a bit confusing, but in combination with a backup (that you restore OVER the recovered data), your recovery rate might be pretty good!
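The sequence described above, as a dry-run sketch (remove the echo to execute; --rebuild-tree rewrites filesystem metadata, so take a raw image of the array first if at all possible, and note the OP's filesystem was ext3, not reiserfs):

```shell
# Last-resort reiserfs recovery as described above: make a fresh
# filesystem, then have reiserfsck reconstruct the tree by scanning
# the whole partition for surviving metadata (dry run).
echo mkreiserfs /dev/md0
echo reiserfsck --rebuild-tree --scan-whole-partition /dev/md0
```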

f
 
Old 10-17-2007, 06:57 PM   #5
JimBass
Senior Member
 
Registered: Oct 2003
Location: New York City
Distribution: Debian Sid 2.6.32
Posts: 2,100

And don't make the mistake of doing software RAID on something that is important. A hardware RAID card will cost about $300. I've had almost the identical setup as you, four 250 GB SATAs in RAID 5, but with a 3com controller card running it. Obviously it isn't the RAID's fault that you gave a bad command, but if you care about your data, it's worth the additional cost.

Peace,
JimBass
 
  

