LinuxQuestions.org


hazmatt20 04-16-2007 03:36 PM

RAID5 Array Recovery after OS upgrade
 
I need to know if I can fix this or if I should bite the bullet and start reloading my DVD backups. Again. I have 90-95% of it backed up, but it's about 1.6 TB of data on hundreds of DVDs, so you know how painful reloading is.

I have six 400 GB SATA drives in a RAID5 array mounted at /home. The system is Ubuntu 6.06 Server, kernel 2.6.15, with mdadm (I don't know the version). I recently installed a new motherboard with more on-board SATA connectors, as I was also planning to start adding more drives. The plan was to add a 3-bay enclosure that can hold 5 drives, set up three 500 GB drives in another RAID5 array now, and expand to 5 later. There were two issues with this, both of which I realize now could probably have been resolved if I had simply taken the time to learn how to compile a new kernel. The drivers for the two on-board network interfaces weren't loaded when it booted, so I had to use a card, and I remember reading that I would need a newer kernel than was available in the apt repository for Ubuntu 6.06 to expand an array.

So, instead of compiling a new kernel, I decided to do a fresh install of 6.10 Server. During the install there was some problem with DHCP, and it took me back to the menu. I got it sorted out but didn't realize until it finished that it had managed to skip several sections of the installation, including user setup. With no login, I ran the recovery. When it finished, it looked OK. mdadm showed the device as /dev/md0 (what it had been), and it mounted fine. If everything else had been fine, that would have been what I wanted; however, the install had missed other things besides user accounts. For example, not only were there no apt sources configured, there were no man pages installed.

Another reinstall, all the way through this time. mdadm didn't set it up correctly this time. Here is the current status.

Quote:

# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Sun Apr 15 19:41:18 2007
Raid Level : raid5
Array Size : 1953556480 (1863.06 GiB 2000.44 GB)
Device Size : 390711296 (372.61 GiB 400.09 GB)
Raid Devices : 6
Total Devices : 6
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Mon Apr 16 02:17:39 2007
State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 64K

UUID : 9db5f426:b7ce1681:eb04cbd7:2b95de32
Events : 0.2

Number Major Minor RaidDevice State
0 8 64 0 active sync /dev/sde
1 8 96 1 active sync /dev/sdg
2 8 80 2 active sync /dev/sdf
3 8 48 3 active sync /dev/sdd
4 8 128 4 active sync /dev/sdi
5 8 112 5 active sync /dev/sdh
Quote:

Disk /dev/md0: 2000.4 GB, 2000441835520 bytes
255 heads, 63 sectors/track, 243206 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/md0p1 1 48641 390708801 fd Linux raid autodetect

The only other thing is that last night it showed the array as degraded and resyncing one drive, and the resync finished. What should my next step be?

Quakeboy02 04-16-2007 03:57 PM

Quote:

What should my next step be?
I think I would rule out reinstalling the kernel again. :)

Quote:

State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0
What's the problem? The array is clean and there are no failed devices. Granted I'm not an expert on mdadm, but...

hazmatt20 04-16-2007 04:05 PM

Quote:

Originally Posted by Quakeboy02
What's the problem? The array is clean and there are no failed devices. Granted I'm not an expert on mdadm, but...

Heh, sorry. The problem is that I can't mount /dev/md0. fdisk /dev/md0 shows a 400 GB /dev/md0p1 (I dunno).

Quote:

#mount /dev/md0 /mnt/md0
mount: you must specify the filesystem type

#mount /dev/md0p1 /mnt/md0
mount: special device /dev/md0p1 does not exist

Quakeboy02 04-16-2007 04:11 PM

Quote:

Heh, sorry. The problem is that I can't mount /dev/md0. fdisk /dev/md0 shows a 400 GB /dev/md0p1 (I dunno).
Take a look at this thread. dgar is pretty sharp on this stuff, and he haunts the raid posts, too, so maybe he'll chime in and fix you up.

http://www.linuxquestions.org/questi...d.php?t=544557

hazmatt20 04-16-2007 06:08 PM

I looked at it, but it doesn't solve this problem, since I want to keep my data. It does help a little with learning about mdadm, though.

Quakeboy02 04-16-2007 07:10 PM

Did you build a new mdadm.conf as a result of this, or did you keep the one that was already on it? You didn't run mdadm --create, did you? Where did the current mdadm.conf come from (auto-generated, or did you make it?), and what are its contents?
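
If you're not sure, posting the output of these two would clear things up (the mdadm.conf path below is where Ubuntu usually keeps it; yours may differ):

Code:

cat /proc/mdstat
cat /etc/mdadm/mdadm.conf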

rtspitz 04-16-2007 07:49 PM

Just a guess...

I've read a German thread about the very same error. Someone had built a RAID array with /dev/sda1, /dev/sdb1, and so on.
After a kernel update his RAID seemed OK, but mdadm showed /dev/sda, /dev/sdb, and so on as members. fdisk -l would still show that /dev/sda1, /dev/sdb1 were there.

There was also the same discrepancy between device and array size.
His solution was to zero the superblocks on the false members /dev/sda, /dev/sdb, ... and reboot:

Code:

mdadm --zero-superblock /dev/sd[a-e]
https://lists.uni-koeln.de/pipermail...er/011313.html

In case you want to give it a shot, I can translate it in detail.
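
Before zeroing anything, it might be worth checking which devices actually carry an md superblock, something like this (device names taken from your listing):

Code:

# show the md superblock (if any) on the raw devices and on the partitions
mdadm --examine /dev/sd[d-i]
mdadm --examine /dev/sd[d-i]1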

hazmatt20 04-16-2007 10:41 PM

I don't think I ran mdadm --create. The mdadm.conf was auto-generated.

Quote:

DEVICE partitions
ARRAY /dev/md0 level=raid5 num-devices=6 UUID=9db5f426:b7ce1681:eb04cbd7:2b95de32
That German thread describes exactly what mine is doing. If you don't mind translating the solution, I'd be grateful. I tried:

Quote:

# mdadm --zero-superblock /dev/sd[d-i]
mdadm: Couldn't open /dev/sdd for write - not zeroing
mdadm: Couldn't open /dev/sde for write - not zeroing
mdadm: Couldn't open /dev/sdf for write - not zeroing
mdadm: Couldn't open /dev/sdg for write - not zeroing
mdadm: Couldn't open /dev/sdh for write - not zeroing
mdadm: Couldn't open /dev/sdi for write - not zeroing
(sd[a-c] are the 3 500 GB drives)

Quakeboy02 04-16-2007 11:15 PM

Quote:

The mdadm.conf was auto-generated.
When I was messing with mdadm, it once created several arrays out of random drives on my system right after I installed it. How about reassembling it by hand? Have you tried that?

Code:

mdadm --stop /dev/md0
mdadm --assemble /dev/md0 --level=5 /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi

If this works, then you need to create a new mdadm.conf file for it.
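
Something like this should rebuild it from the running array (the path is where Ubuntu keeps it; back up the old file first):

Code:

# regenerate mdadm.conf from whatever arrays are currently assembled
echo "DEVICE partitions" > /etc/mdadm/mdadm.conf
mdadm --detail --scan >> /etc/mdadm/mdadm.conf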

hazmatt20 04-16-2007 11:52 PM

Um, so it may be worse now. I stopped the array and assembled it, but it came up the same as before, with /dev/md0p1 in fdisk. Stopped it again and tried the zero-superblock, since last time I had run the commands out of order and hadn't stopped the array first. It worked this time. Tried to assemble, and it said:

Quote:

# mdadm --assemble /dev/md0 /dev/sd[e-i]
mdadm: no recogniseable superblock
Rebooted, and it did the same thing. I'll point out that when it booted, it made md0 out of sd[a-c] (the 500s) and md1 with 4 of the 6 400s. After stopping both, assemble gave me the "no recogniseable superblock" error for both arrays, even though I didn't run zero-superblock on the sd[a-c] array.

Quakeboy02 04-16-2007 11:58 PM

Code:

mdadm --assemble /dev/md0 /dev/sd[e-i]
I don't know the consequences of not specifying which type of RAID during an assemble operation. Also, is that a typo, or did you really use sd[e-i]? Or did you type something completely different from what you're reporting here?

hazmatt20 04-16-2007 11:58 PM

This thread mentions mdadm --assemble --force. Would it be a bad idea to try it?

http://www.issociate.de/board/post/2...lyaborted.html

Quakeboy02 04-16-2007 11:59 PM

First, I'd like to know exactly what you typed.

rtspitz 04-17-2007 06:29 AM

This one will be interesting as well:

http://kev.coolcavemen.com/2007/03/h...d-superblocks/

Basically, what is discussed there is recovery of a RAID5 after zeroing all the superblocks of the member partitions; it seems to work.

I've tested it with VMware and a RAID1: I killed all the superblocks, and mdadm would not assemble or start the array. Then I tried the above-mentioned --create and, lo and behold, I could mount it and no data was lost. mdadm complained about a pre-existing filesystem, but I forced it to do its magic and it worked.
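
For your RAID5 the re-create would look roughly like this (only a sketch, and it assumes the array was originally built on the partitions; the level, device order and chunk size must match the original exactly, otherwise the data gets scrambled):

Code:

mdadm --stop /dev/md0
# re-create with the SAME geometry the array had originally;
# mdadm will warn about existing filesystem/raid info -- only answer 'y' if you're sure
mdadm --create /dev/md0 --level=5 --raid-devices=6 --chunk=64 /dev/sd[d-i]1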

hazmatt20 04-17-2007 09:28 AM

Quote:

Originally Posted by Quakeboy02
First, I'd like to know exactly what you typed.

Sorry, we posted at the same time.

Quote:

mdadm: option --level not valid in assemble mode
So I tried both of these:

Quote:

# mdadm --assemble /dev/md0 /dev/sd[d-i]
mdadm: no recogniseable superblock
mdadm: /dev/sdd has no superblock - assembly aborted
# mdadm --assemble /dev/md0 /dev/sd[d-i]1
mdadm: cannot open device /dev/sdd1: Device or resource busy
mdadm: /dev/sdd1 has no superblock - assembly aborted

rtspitz 04-17-2007 11:24 AM

Quote:

Originally Posted by hazmatt20
That German thread is exactly what mine is doing. If you don't mind translating the solution, I'd be grateful. I tried

Translation of the last part, with the "solution":

Code:

At the time I created the RAID I must have made a mistake, which is only showing up now.
Apparently I had created persistent superblocks on the devices (/dev/sd[a-e]) as well as
on the partitions (/dev/sd[a-e]1).
After zeroing the superblocks with "mdadm --zero-superblock /dev/sd[a-e]" and rebooting,
the partitions showed up in /proc/partitions again, and the RAID was operational and could
be mounted without any errors.

This night was no fun at all. (:


hazmatt20 04-17-2007 02:01 PM

Well, I decided to give mdadm --create a shot.

Code:

mdadm --create /dev/md0 --verbose --level=5 --raid-devices=6 /dev/sd[d-i]1
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: /dev/sdd1 appears to contain an ext2fs file system
    size=1953543680K  mtime=Sun Apr 15 18:40:03 2007
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=6 ctime=Sun Mar 11 00:22:58 2007
mdadm: /dev/sde1 appears to be part of a raid array:
    level=raid5 devices=6 ctime=Sun Mar 11 00:22:58 2007
mdadm: /dev/sdf1 appears to be part of a raid array:
    level=raid5 devices=6 ctime=Sun Mar 11 00:22:58 2007
mdadm: /dev/sdg1 appears to contain an ext2fs file system
    size=1953543680K  mtime=Sun Apr 15 18:40:02 2007
mdadm: /dev/sdg1 appears to be part of a raid array:
    level=raid5 devices=6 ctime=Sun Mar 11 00:22:58 2007
mdadm: /dev/sdh1 appears to be part of a raid array:
    level=raid5 devices=6 ctime=Sun Mar 11 00:22:58 2007
mdadm: /dev/sdi1 appears to be part of a raid array:
    level=raid5 devices=6 ctime=Sun Mar 11 00:22:58 2007
mdadm: size set to 390708736K
Continue creating array? y
mdadm: array /dev/md0 started.

#cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 sdi1[6] sdh1[4] sdg1[3] sdf1[2] sde1[1] sdd1[0]
      1953543680 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUUUU_]
      [>....................]  recovery =  0.1% (419712/390708736) finish=340.8min speed=19077K/sec

# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Tue Apr 17 14:57:02 2007
    Raid Level : raid5
    Array Size : 1953543680 (1863.04 GiB 2000.43 GB)
    Device Size : 390708736 (372.61 GiB 400.09 GB)
  Raid Devices : 6
  Total Devices : 6
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Apr 17 14:57:41 2007
          State : clean, degraded, recovering
 Active Devices : 5
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 1

        Layout : left-symmetric
    Chunk Size : 64K

 Rebuild Status : 1% complete

          UUID : ce16308c:c13226e7:126d5cca:b4ac2ebe
        Events : 0.3

    Number  Major  Minor  RaidDevice State
      0      8      49        0      active sync  /dev/sdd1
      1      8      65        1      active sync  /dev/sde1
      2      8      81        2      active sync  /dev/sdf1
      3      8      97        3      active sync  /dev/sdg1
      4      8      113        4      active sync  /dev/sdh1
      6      8      129        5      spare rebuilding  /dev/sdi1

So, I'll let it go for a few hours and check back.

hazmatt20 04-17-2007 06:31 PM

Alright, so it finished resyncing. Now we get

Code:

# mount /dev/md0 md0
mount: wrong fs type, bad option, bad superblock on /dev/md0,
      missing codepage or other error
      In some cases useful info is found in syslog - try
      dmesg | tail  or so

# fsck /dev/md0
fsck 1.39 (29-May-2006)
e2fsck 1.39 (29-May-2006)
Group descriptors look bad... trying backup blocks...
fsck.ext3: Bad magic number in super-block while trying to open /dev/md0

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

Is there anything else we can try, or is it game over?

Quakeboy02 04-17-2007 06:37 PM

You've kind of lost me here. I was under the impression that the create option creates a new array and throws away anything that previously existed. As far as I understand, the data was gone when you ran create.

hazmatt20 04-17-2007 07:00 PM

I was going by this article posted earlier. http://kev.coolcavemen.com/2007/03/h...d-superblocks/

rtspitz 04-17-2007 07:26 PM

There is a utility called testdisk (http://www.cgsecurity.org/wiki/TestDisk) which can scan ext2/ext3 devices for backup superblocks and help recover them.

some hints:

http://www.cgsecurity.org/wiki/Advan...kup_SuperBlock


You could run:

testdisk /dev/md0

Then: [PROCEED], [NONE], [Advanced], [Superblock]

If this works you should get some output like this:

superblock 0, blocksize=1024
superblock 8193, blocksize=1024
...
...

With that you can tell fsck.ext3 (or the equivalent on your system) to use a backup superblock, e.g.:

/sbin/fsck.ext3 -b 8193 -B 1024 /dev/md0
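
If testdisk turns up nothing, another way to get the backup superblock locations (assuming the filesystem was created with default mke2fs options) is:

Code:

# dry run: prints where mke2fs *would* place the superblock backups, writes nothing
mke2fs -n /dev/md0
# or, if the primary superblock is still readable:
dumpe2fs /dev/md0 | grep -i superblock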


If that doesn't work, I'm at my wits' end.

hazmatt20 04-17-2007 10:02 PM

Well, testdisk didn't show any partitions under Advanced, so I'm running Analyse. It's going to take a good while, but I'm going to start making plans to reload the data. I'll post an update when it finishes.

hazmatt20 04-18-2007 06:11 PM

Alright, well, the analyse run didn't detect anything correctly and just gave a bunch of garbage, so I'm pretty positive it's gone. So many DVDs to reload! Oh, well. Thanks for your help.

One last thing: what precautions should I take in the future to increase my chances of recovery? I know now to run dist-upgrade instead of installing from disk, but other than that and backing up my mdadm.conf, what should I do?

Quakeboy02 04-18-2007 06:34 PM

I've been thinking about this today, and I wonder if the problem could have been avoided if you hadn't had your drives connected when you installed mdadm. I mentioned that I installed mdadm once and it created a bunch of junk on the drives I had connected. It may or may not be a problem, but it's something to think about if you have to reinstall for any reason. You might also think about creating a backup of your non-data files. There are a number of good backup systems out there. I just used tar along with a trivial script I wrote. I actually did a restore (from Knoppix) of the boot/non-data image I keep on my data disks recently, and it worked just fine.
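
The script itself is nothing fancy; roughly this (the exclude list and destination here are just an example):

Code:

#!/bin/sh
# back up the system to the data disks, skipping the data array itself
# and the virtual filesystems (adjust paths to taste)
tar --exclude=/home --exclude=/proc --exclude=/sys --exclude=/mnt \
    -czpf /home/backup/system-$(date +%Y%m%d).tar.gz /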

hazmatt20 04-19-2007 12:52 AM

Well, I've almost got everything working, but I've got a few snags. Two parts.

First, I want the six 400GB drives to come up as md0 and the three 500GB drives as md1. When I reboot, md0 starts with 2 of the 3 500GB drives and resyncs the third, while md1 starts with only 4 of the 6 400GB drives. mdadm.conf is currently:

Code:

# cat mdadm.conf
DEVICE partitions
MAILADDR root
ARRAY /dev/md0 level=raid5 num-devices=6 UUID=d295489e:1146f6bf:10e91e6c:42385ae5
ARRAY /dev/md1 level=raid5 num-devices=3 UUID=27088f9d:f6aea8a8:60e614e7:ea4536bf

Secondly, I'm setting up LVM on top of the two arrays. I've already done this much:

Code:

pvcreate /dev/md0
pvcreate /dev/md1
vgcreate RAID_GROUP /dev/md0 /dev/md1
modprobe dm-mod
lvcreate -L2.72T -nmedia RAID_GROUP
mkfs.ext3 /dev/RAID_GROUP/media

After that, I could mount /dev/RAID_GROUP/media normally. After a reboot, once I get the RAID arrays back up, I activate the volume group with:

Code:

vgchange -a y RAID_GROUP
If I want it to activate on startup, should I just add that line to rc.local (assuming the raid arrays come up first), or is there a better way to do it before local file systems are mounted?
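
If rc.local is the way to go, I'm picturing something like this (just a sketch; the mount point is a placeholder):

Code:

# in /etc/rc.local (before its final 'exit 0'):
# activate the volume group, then mount the logical volume
vgchange -a y RAID_GROUP
mount /dev/RAID_GROUP/media /mnt/media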

hazmatt20 04-19-2007 08:41 AM

Well, now on reboot md0 comes up correctly, but as the 3 500GB drives, so I'll just leave it that way. md1, however, only comes up with 4 of the drives on startup. Once I get a console,

Code:

mdadm -A /dev/md1 /dev/sd[d-i]
works, so I could run it under rc.local, but is there a cleaner way?
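
If not, I'll just put it in rc.local ahead of the vgchange line from my last post, something like:

Code:

# assemble md1 by hand, then bring up LVM
mdadm -A /dev/md1 /dev/sd[d-i]
vgchange -a y RAID_GROUP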

