LinuxQuestions.org


hazmatt20 04-16-2007 03:36 PM

RAID5 Array Recovery after OS upgrade
 
I need to know if I can fix this or if I should bite the bullet and start reloading my DVD backups. Again. I have 90-95% of it backed up, but it's about 1.6 TB of data on hundreds of DVDs, so you know how painful reloading is.

I have six 400 GB SATA drives in a RAID5 array mounted at /home. The system is Ubuntu 6.06 Server, kernel 2.6.15, with mdadm (I don't know the version). I recently installed a new motherboard with more on-board SATA connectors, as I was also planning to start adding more drives. The plan was to add a 3-bay enclosure that can hold 5 drives, set up three 500 GB drives in another RAID5 array now, and expand to 5 later. There were two issues with this, both of which I realize now could probably have been resolved if I had simply taken the time to learn how to compile a new kernel. The drivers for the two on-board network interfaces weren't loaded when it booted, so I had to use a card, and I remember reading that I would need a newer kernel than was available in the apt repository for Ubuntu 6.06 to expand an array.

So, instead of compiling a new kernel, I decided to do a fresh install of 6.10 Server. During the install there was some problem with DHCP, and it took me back to the menu. I got it sorted out but didn't realize until it finished that it had managed to skip several sections of the installation, including user setup. With no login, I ran the recovery. When it finished, it looked OK. mdadm showed the device as /dev/md0 (what it had been), and it mounted fine. If everything else had been fine, that would have been what I wanted; however, the install had missed other things besides user accounts. For example, not only were there no apt sources configured, there were no man pages installed.

Another reinstall, all the way through this time. mdadm didn't set it up correctly this time. Here is the current status.

Quote:

# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Sun Apr 15 19:41:18 2007
Raid Level : raid5
Array Size : 1953556480 (1863.06 GiB 2000.44 GB)
Device Size : 390711296 (372.61 GiB 400.09 GB)
Raid Devices : 6
Total Devices : 6
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Mon Apr 16 02:17:39 2007
State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 64K

UUID : 9db5f426:b7ce1681:eb04cbd7:2b95de32
Events : 0.2

Number Major Minor RaidDevice State
0 8 64 0 active sync /dev/sde
1 8 96 1 active sync /dev/sdg
2 8 80 2 active sync /dev/sdf
3 8 48 3 active sync /dev/sdd
4 8 128 4 active sync /dev/sdi
5 8 112 5 active sync /dev/sdh
Quote:

Disk /dev/md0: 2000.4 GB, 2000441835520 bytes
255 heads, 63 sectors/track, 243206 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/md0p1 1 48641 390708801 fd Linux raid autodetect

The only other thing is that last night it showed the array as degraded and resyncing one drive, and the resync finished. What should my next step be?

Quakeboy02 04-16-2007 03:57 PM

Quote:

What should my next step be?
I think I would rule out reinstalling the kernel again. :)

Quote:

State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0
What's the problem? The array is clean and there are no failed devices. Granted I'm not an expert on mdadm, but...

hazmatt20 04-16-2007 04:05 PM

Quote:

Originally Posted by Quakeboy02
What's the problem? The array is clean and there are no failed devices. Granted I'm not an expert on mdadm, but...

Heh, sorry. The problem is that I can't mount /dev/md0. fdisk /dev/md0 shows a 400 GB /dev/md0p1 (I dunno).

Quote:

#mount /dev/md0 /mnt/md0
mount: you must specify the filesystem type

#mount /dev/md0p1 /mnt/md0
mount: special device /dev/md0p1 does not exist

Quakeboy02 04-16-2007 04:11 PM

Quote:

Heh, sorry. The problem is that I can't mount /dev/md0. fdisk /dev/md0 shows a 400 GB /dev/md0p1 (I dunno).
Take a look at this thread. dgar is pretty sharp on this stuff, and he haunts the raid posts, too, so maybe he'll chime in and fix you up.

http://www.linuxquestions.org/questi...d.php?t=544557

hazmatt20 04-16-2007 06:08 PM

I looked at it, but it doesn't solve this problem, since I want to keep my data. It does help a little with learning about mdadm, though.

Quakeboy02 04-16-2007 07:10 PM

Did you build a new mdadm.conf as a result of this, or did you keep the one that was already on it? You didn't run mdadm --create, did you? Where did the current mdadm.conf come from (auto-generated, or did you make it?), and what are its contents?
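
If you're not sure, posting the output of these two would clear things up (the mdadm.conf path below is where Ubuntu usually keeps it; yours may differ):

Code:

cat /proc/mdstat
cat /etc/mdadm/mdadm.conf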

rtspitz 04-16-2007 07:49 PM

Just a guess...

I've read a German thread about the very same error. Someone had built a RAID array with /dev/sda1, /dev/sdb1, and so on.
After a kernel update his RAID seemed OK, but mdadm showed /dev/sda, /dev/sdb, and so on as members. fdisk -l would still show that /dev/sda1, /dev/sdb1 were there.

There was also the same discrepancy between device and array size.
His solution was to zero the superblocks on the false members /dev/sda, /dev/sdb, ... and reboot:

Code:

mdadm --zero-superblock /dev/sd[a-e]
https://lists.uni-koeln.de/pipermail...er/011313.html

In case you want to give it a shot, I can translate it in detail.
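
Before zeroing anything, it might be worth checking which devices actually carry an md superblock, something like this (device names taken from your listing):

Code:

# show the md superblock (if any) on the raw devices and on the partitions
mdadm --examine /dev/sd[d-i]
mdadm --examine /dev/sd[d-i]1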

hazmatt20 04-16-2007 10:41 PM

I don't think I ran mdadm --create. The mdadm.conf was auto-generated.

Quote:

DEVICE partitions
ARRAY /dev/md0 level=raid5 num-devices=6 UUID=9db5f426:b7ce1681:eb04cbd7:2b95de32
That German thread describes exactly what mine is doing. If you don't mind translating the solution, I'd be grateful. I tried:

Quote:

# mdadm --zero-superblock /dev/sd[d-i]
mdadm: Couldn't open /dev/sdd for write - not zeroing
mdadm: Couldn't open /dev/sde for write - not zeroing
mdadm: Couldn't open /dev/sdf for write - not zeroing
mdadm: Couldn't open /dev/sdg for write - not zeroing
mdadm: Couldn't open /dev/sdh for write - not zeroing
mdadm: Couldn't open /dev/sdi for write - not zeroing
(sd[a-c] are the 3 500 GB drives)

Quakeboy02 04-16-2007 11:15 PM

Quote:

The mdadm.conf was auto-generated.
When I was messing with mdadm, it once created several arrays out of random drives on my system right after I installed it. How about reassembling it by hand? Have you tried that?

Code:

mdadm --stop /dev/md0
mdadm --assemble /dev/md0 --level=5 /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi

If this works, then you need to create a new mdadm.conf file for it.
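
Something like this should rebuild it from the running array (the path is where Ubuntu keeps it; back up the old file first):

Code:

# regenerate mdadm.conf from whatever arrays are currently assembled
echo "DEVICE partitions" > /etc/mdadm/mdadm.conf
mdadm --detail --scan >> /etc/mdadm/mdadm.conf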

hazmatt20 04-16-2007 11:52 PM

Um, so it may be worse now. I stopped the array and assembled it, but it came up the same as before, with /dev/md0p1 in fdisk. Stopped it again and tried the zero-superblock, since last time I had run the commands out of order and hadn't stopped the array first. It worked this time. Tried to assemble, and it said:

Quote:

# mdadm --assemble /dev/md0 /dev/sd[e-i]
mdadm: no recogniseable superblock
Rebooted, and it did the same thing. I'll point out that when it booted, it made md0 out of sd[a-c] (the 500s) and md1 with 4 of the 6 400s. After stopping both, assemble gave me the "no recogniseable superblock" error for both arrays, even though I didn't run zero-superblock on the sd[a-c] array.

Quakeboy02 04-16-2007 11:58 PM

Code:

mdadm --assemble /dev/md0 /dev/sd[e-i]
I don't know the consequences of not specifying which type of RAID during an assemble operation. Also, is that a typo, or did you really use sd[e-i]? Or did you type something completely different from what you're reporting here?

hazmatt20 04-16-2007 11:58 PM

This thread mentions mdadm --assemble --force. Would it be a bad idea to try it?

http://www.issociate.de/board/post/2...lyaborted.html

Quakeboy02 04-16-2007 11:59 PM

First, I'd like to know exactly what you typed.

rtspitz 04-17-2007 06:29 AM

This one will be interesting as well:

http://kev.coolcavemen.com/2007/03/h...d-superblocks/

Basically, what is discussed there is recovery of a RAID5 after zeroing all the superblocks of the member partitions; it seems to work.

I've tested it with VMware and a RAID1: I killed all the superblocks, and mdadm would not assemble or start the array. Then I tried the above-mentioned --create and, lo and behold, I could mount it and no data was lost. mdadm complained about a pre-existing filesystem, but I forced it to do its magic and it worked.
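
For your RAID5 the re-create would look roughly like this (only a sketch, and it assumes the array was originally built on the partitions; the level, device order and chunk size must match the original exactly, otherwise the data gets scrambled):

Code:

mdadm --stop /dev/md0
# re-create with the SAME geometry the array had originally;
# mdadm will warn about existing filesystem/raid info -- only answer 'y' if you're sure
mdadm --create /dev/md0 --level=5 --raid-devices=6 --chunk=64 /dev/sd[d-i]1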

hazmatt20 04-17-2007 09:28 AM

Quote:

Originally Posted by Quakeboy02
First, I'd like to know exactly what you typed.

Sorry, we posted at the same time.

Quote:

mdadm: option --level not valid in assemble mode
So I tried both of these:

Quote:

# mdadm --assemble /dev/md0 /dev/sd[d-i]
mdadm: no recogniseable superblock
mdadm: /dev/sdd has no superblock - assembly aborted
# mdadm --assemble /dev/md0 /dev/sd[d-i]1
mdadm: cannot open device /dev/sdd1: Device or resource busy
mdadm: /dev/sdd1 has no superblock - assembly aborted

rtspitz 04-17-2007 11:24 AM

Quote:

Originally Posted by hazmatt20
That German thread is exactly what mine is doing. If you don't mind translating the solution, I'd be grateful. I tried

Translation of the last part, with the "solution":

Code:

At the time I created the RAID I must have made a mistake, which is only showing up now.
Apparently I had created persistent superblocks on the devices (/dev/sd[a-e]) as well as
on the partitions (/dev/sd[a-e]1).
After zeroing the superblocks with "mdadm --zero-superblock /dev/sd[a-e]" and rebooting,
the partitions showed up in /proc/partitions again, and the RAID was operational and could
be mounted without any errors.

This night was no fun at all. (:


hazmatt20 04-17-2007 02:01 PM

Well, I decided to give mdadm --create a shot.

Code:

mdadm --create /dev/md0 --verbose --level=5 --raid-devices=6 /dev/sd[d-i]1
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: /dev/sdd1 appears to contain an ext2fs file system
    size=1953543680K  mtime=Sun Apr 15 18:40:03 2007
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=6 ctime=Sun Mar 11 00:22:58 2007
mdadm: /dev/sde1 appears to be part of a raid array:
    level=raid5 devices=6 ctime=Sun Mar 11 00:22:58 2007
mdadm: /dev/sdf1 appears to be part of a raid array:
    level=raid5 devices=6 ctime=Sun Mar 11 00:22:58 2007
mdadm: /dev/sdg1 appears to contain an ext2fs file system
    size=1953543680K  mtime=Sun Apr 15 18:40:02 2007
mdadm: /dev/sdg1 appears to be part of a raid array:
    level=raid5 devices=6 ctime=Sun Mar 11 00:22:58 2007
mdadm: /dev/sdh1 appears to be part of a raid array:
    level=raid5 devices=6 ctime=Sun Mar 11 00:22:58 2007
mdadm: /dev/sdi1 appears to be part of a raid array:
    level=raid5 devices=6 ctime=Sun Mar 11 00:22:58 2007
mdadm: size set to 390708736K
Continue creating array? y
mdadm: array /dev/md0 started.

#cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 sdi1[6] sdh1[4] sdg1[3] sdf1[2] sde1[1] sdd1[0]
      1953543680 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUUUU_]
      [>....................]  recovery =  0.1% (419712/390708736) finish=340.8min speed=19077K/sec

# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Tue Apr 17 14:57:02 2007
    Raid Level : raid5
    Array Size : 1953543680 (1863.04 GiB 2000.43 GB)
    Device Size : 390708736 (372.61 GiB 400.09 GB)
  Raid Devices : 6
  Total Devices : 6
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Apr 17 14:57:41 2007
          State : clean, degraded, recovering
 Active Devices : 5
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 1

        Layout : left-symmetric
    Chunk Size : 64K

 Rebuild Status : 1% complete

          UUID : ce16308c:c13226e7:126d5cca:b4ac2ebe
        Events : 0.3

    Number  Major  Minor  RaidDevice State
      0      8      49        0      active sync  /dev/sdd1
      1      8      65        1      active sync  /dev/sde1
      2      8      81        2      active sync  /dev/sdf1
      3      8      97        3      active sync  /dev/sdg1
      4      8      113        4      active sync  /dev/sdh1
      6      8      129        5      spare rebuilding  /dev/sdi1

So, I'll let it go for a few hours and check back.

hazmatt20 04-17-2007 06:31 PM

Alright, so it finished resyncing. Now we get

Code:

# mount /dev/md0 md0
mount: wrong fs type, bad option, bad superblock on /dev/md0,
      missing codepage or other error
      In some cases useful info is found in syslog - try
      dmesg | tail  or so

# fsck /dev/md0
fsck 1.39 (29-May-2006)
e2fsck 1.39 (29-May-2006)
Group descriptors look bad... trying backup blocks...
fsck.ext3: Bad magic number in super-block while trying to open /dev/md0

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

Is there anything else we can try, or is it game over?

Quakeboy02 04-17-2007 06:37 PM

You've kind of lost me here. I was under the impression that the create option creates a new array and throws away anything that previously existed. As far as I understand, the data was gone when you ran create.

hazmatt20 04-17-2007 07:00 PM

I was going by this article posted earlier. http://kev.coolcavemen.com/2007/03/h...d-superblocks/

rtspitz 04-17-2007 07:26 PM

There is a utility called testdisk (http://www.cgsecurity.org/wiki/TestDisk) which can scan ext2/ext3 devices for backup superblocks and help recover them.

some hints:

http://www.cgsecurity.org/wiki/Advan...kup_SuperBlock


You could run:

testdisk /dev/md0

Then: [PROCEED], [NONE], [Advanced], [Superblock]

If this works you should get some output like this:

superblock 0, blocksize=1024
superblock 8193, blocksize=1024
...
...

With that you can tell fsck.ext3 (or the equivalent on your system) to use a backup superblock, e.g.:

/sbin/fsck.ext3 -b 8193 -B 1024 /dev/md0
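
If testdisk turns up nothing, another way to get the backup superblock locations (assuming the filesystem was created with default mke2fs options) is:

Code:

# dry run: prints where mke2fs *would* place the superblock backups, writes nothing
mke2fs -n /dev/md0
# or, if the primary superblock is still readable:
dumpe2fs /dev/md0 | grep -i superblock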


If that doesn't work, I'm at my wits' end.

hazmatt20 04-17-2007 10:02 PM

Well, testdisk didn't show any partitions under Advanced, so I'm running Analyse. It's going to take a good while, but I'm going to start making plans to reload the data. I'll post an update when it finishes.

hazmatt20 04-18-2007 06:11 PM

Alright, well, the analyse run didn't detect anything correctly and just gave a bunch of garbage, so I'm pretty positive it's gone. So many DVDs to reload! Oh, well. Thanks for your help.

One last thing: what precautions should I take in the future to increase my chances of recovery? I know now to run dist-upgrade instead of installing from disk, but other than that and backing up my mdadm.conf, what should I do?

Quakeboy02 04-18-2007 06:34 PM

I've been thinking about this today, and I wonder if the problem could have been avoided if you hadn't had your drives connected when you installed mdadm. I mentioned that I installed mdadm once and it created a bunch of junk on the drives I had connected. It may or may not be a problem, but it's something to think about if you have to reinstall for any reason. You might also think about creating a backup of your non-data files. There are a number of good backup systems out there. I just used tar along with a trivial script I wrote. I actually did a restore (from Knoppix) of the boot/non-data image I keep on my data disks recently, and it worked just fine.
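
The script itself is nothing fancy; roughly this (the exclude list and destination here are just an example):

Code:

#!/bin/sh
# back up the system to the data disks, skipping the data array itself
# and the virtual filesystems (adjust paths to taste)
tar --exclude=/home --exclude=/proc --exclude=/sys --exclude=/mnt \
    -czpf /home/backup/system-$(date +%Y%m%d).tar.gz /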

hazmatt20 04-19-2007 12:52 AM

Well, I've almost got everything working, but I've got a few snags. Two parts.

First, I want the six 400GB drives to come up as md0 and the three 500GB drives as md1. When I reboot, md0 starts with 2 of the 3 500GB drives and resyncs the third, while md1 starts with only 4 of the 6 400GB drives. mdadm.conf is currently:

Code:

# cat mdadm.conf
DEVICE partitions
MAILADDR root
ARRAY /dev/md0 level=raid5 num-devices=6 UUID=d295489e:1146f6bf:10e91e6c:42385ae5
ARRAY /dev/md1 level=raid5 num-devices=3 UUID=27088f9d:f6aea8a8:60e614e7:ea4536bf

Secondly, I'm setting up LVM on top of the two arrays. I've already done this much:

Code:

pvcreate /dev/md0
pvcreate /dev/md1
vgcreate RAID_GROUP /dev/md0 /dev/md1
modprobe dm-mod
lvcreate -L2.72T -nmedia RAID_GROUP
mkfs.ext3 /dev/RAID_GROUP/media

After that, I could mount /dev/RAID_GROUP/media normally. After a reboot, once I get the RAID arrays back up, I activate the volume group with:

Code:

vgchange -a y RAID_GROUP
If I want it to activate on startup, should I just add that line to rc.local (assuming the raid arrays come up first), or is there a better way to do it before local file systems are mounted?
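
If rc.local is the way to go, I'm picturing something like this (just a sketch; the mount point is a placeholder):

Code:

# in /etc/rc.local (before its final 'exit 0'):
# activate the volume group, then mount the logical volume
vgchange -a y RAID_GROUP
mount /dev/RAID_GROUP/media /mnt/media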

hazmatt20 04-19-2007 08:41 AM

Well, now on reboot md0 comes up correctly, but as the 3 500GB drives, so I'll just leave it that way. md1, however, only comes up with 4 of the drives on startup. Once I get a console,

Code:

mdadm -A /dev/md1 /dev/sd[d-i]
works, so I could run it under rc.local, but is there a cleaner way?
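
If not, I'll just put it in rc.local ahead of the vgchange line from my last post, something like:

Code:

# assemble md1 by hand, then bring up LVM
mdadm -A /dev/md1 /dev/sd[d-i]
vgchange -a y RAID_GROUP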

