[SOLVED] RAID repair - now won't boot - other mounting problems
This could be related to /dev/md1, but it could also be that the system takes an inordinate amount of time to boot due to repeated errors reading the partition table of /dev/sdb. I wouldn't give up just yet.
However, if at any time an mdadm command was entered that could be interpreted as "resync the array using data from /dev/sdbx", all bets are off.
Is there no possibility for out-of-band management or remote access to this server?
This is a dedicated server from OVH. I have been using a netboot rescue mode to do everything so far. I just don't understand why I can't mount a drive like I used to be able to. What do you mean by console? The rescue mode is a netboot; that's all I get. I can boot other kernels over netboot as well, but the server never comes online.
Clearly, the file system has been damaged. Exactly how that happened is another matter.
When the 2nd drive failed, the md subsystem correctly booted /dev/sdb3 out of /dev/md3. Then you added it back, which is unfortunate, but by itself that shouldn't have caused data corruption. Unless, as I've said, something convinced md that /dev/sdb3 rather than /dev/sda3 contained the authoritative part of the mirror set.
What puzzles me is that md didn't also remove /dev/sdb1 from /dev/md1. OK, so the defective sectors of /dev/sdb may not have been located in the area occupied by /dev/sdb1, but then why the boot problems?
If you can boot the server using netboot-rescue (PXE?), you should run fsck on /dev/md1. If you can repair the boot partition, the server should boot as normal and you can proceed trying to fix /dev/md3.
OK, when I try to run fsck on md1 (the boot area) I get:
root@rescue:~# fsck -fc /dev/md1
fsck from util-linux-ng 2.17.2
fsck: fsck.swap: not found
fsck: Error 2 while executing fsck.swap for /dev/md1
Same for md3.
OK, this is potentially a major issue.
fsck tries to auto-detect the file system, and concludes that this must be a swap partition. I don't know how broken a filesystem has to be for fsck to reach that conclusion, but my guess is that the damage to the superblock must be pretty severe.
You need to think long and hard about whether you have something valuable on /dev/md1 or not, and if it can be recovered from backups. If you do and you don't have a backup, go no further. Otherwise, you should proceed as outlined below.
You can force fsck to treat a partition as containing a certain file system, and that's what you need to do here. If your filesystem was/is ext4, run fsck -t ext4 /dev/md1. Whatever you do, do not specify the wrong filesystem.
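Before forcing a type, it's worth asking blkid what it thinks is on the device, and running the check read-only (-n) before letting fsck write anything. Here is a minimal sketch of that pattern on a scratch image file rather than the real /dev/md1 (test.img is a made-up name; assumes e2fsprogs and util-linux are installed, and uses ext3 since that's what the partition was):

```shell
# Create a 16 MiB scratch image and format it ext3
# (-F: the target is a regular file, not a block device)
dd if=/dev/zero of=test.img bs=1M count=16 status=none
mke2fs -q -F -t ext3 test.img

# Ask what is actually on it before trusting fsck's auto-detection
blkid -o value -s TYPE test.img    # ext3

# Dry run first: -n answers "no" to every question, so nothing is written
fsck -t ext3 -n test.img

# Only then repair for real, forcing the type verified above
# (on the server this would be: fsck -t ext3 -y /dev/md1)
fsck -t ext3 -y test.img
```

On the real array, the -n dry run tells you how bad the damage looks before you commit to letting fsck modify anything.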
I think I might have declared the entire drive, or sda3 or sdb3, as a swap partition with some swapon -a command, but that still shouldn't wipe the data there, should it? It was an ext3 partition.
Are you kidding? How does one accidentally declare two data partitions as swap space?
mkswap most certainly writes data to the drive, potentially destroying the superblock. Fortunately for you, there are multiple copies of the superblock spread across the disk. Also, unless you also "accidentally" activated the partitions with swapon, nothing else was written to the partitions.
I don't think you have much to lose by doing an fsck on /dev/md1, but again, only you know what it contains (or used to contain).
Edit: I misread "swapon" as "mkswap". Sorry, your partitions are irreparably damaged. Time to dig out the backups.
Oh, and BTW: RAID is not a form of backup. It's an insurance against the inevitable failure of hard drives. It does not protect you from any other kind of potentially data-destroying event, of which there are legion.
I ran dumpe2fs /dev/md1 | grep superblock and got this output, which I assume is good.
Code:
root@rescue:~# dumpe2fs /dev/md1 | grep superblock
dumpe2fs 1.41.12 (17-May-2010)
Primary superblock at 0, Group descriptors at 1-3
Backup superblock at 32768, Group descriptors at 32769-32771
Backup superblock at 98304, Group descriptors at 98305-98307
Backup superblock at 163840, Group descriptors at 163841-163843
Backup superblock at 229376, Group descriptors at 229377-229379
Backup superblock at 294912, Group descriptors at 294913-294915
Backup superblock at 819200, Group descriptors at 819201-819203
Backup superblock at 884736, Group descriptors at 884737-884739
Backup superblock at 1605632, Group descriptors at 1605633-1605635
Backup superblock at 2654208, Group descriptors at 2654209-2654211
Backup superblock at 4096000, Group descriptors at 4096001-4096003
Backup superblock at 7962624, Group descriptors at 7962625-7962627
root@rescue:~#
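That list of backup superblocks is exactly what e2fsck's -b option consumes. A safe pattern is a read-only trial (-n) with a backup block first, then the real repair. A hedged sketch on a scratch image (test.img is a made-up name; a 1 KiB block size puts the first backup at the classic 8193, whereas the 4 KiB filesystem above puts it at 32768):

```shell
# Scratch ext3 image with a fixed 1 KiB block size, so the first backup
# superblock lands at the classic offset 8193 (with 4 KiB blocks, as on
# /dev/md1 above, it would be 32768 instead)
dd if=/dev/zero of=test.img bs=1M count=16 status=none
mke2fs -q -F -t ext3 -b 1024 test.img

# List primary and backup superblock locations, as dumpe2fs did above
dumpe2fs test.img | grep -i superblock

# Read-only trial using a backup superblock: -n answers "no" to everything
# (a nonzero exit status here is expected noise, since -n forbids the
#  rewrite of the primary superblock that -b normally triggers)
e2fsck -n -b 8193 test.img || true

# If the trial output looks sane, repair for real from that backup
# (on the server: e2fsck -b 32768 -y /dev/md1, or any other block number
#  dumpe2fs reported there; exit status 1 just means "errors corrected")
e2fsck -y -b 8193 test.img || true
```

The point of the -n pass is that you see what e2fsck would do with that backup superblock before anything is written to the array.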
Should I continue with that guide? What is next? I'm at my wits' end.
UPDATE: Following the list down that page, I used 819200 as my superblock number and now I can mount the drive (well, md1).
The command-line switches "-p" and "-y" make fsck.ext3 repair damage automatically, answering "yes" to all questions (note that e2fsck accepts only one of -p, -n, or -y at a time, so pick one). Needless to say, this is very dangerous.
Specify an alternate superblock with the -b parameter.
Edit: Like this: fsck.ext3 -b 7962624 -y /dev/md1
When I tried to run it I got the output below. Or should I first run the fsck.ext3 command you sent me, then cancel it after a bit?
Code:
root@rescue:/mnt# dumpe2fs /dev/md3 | grep superblock
dumpe2fs 1.41.12 (17-May-2010)
dumpe2fs: Bad magic number in super-block while trying to open /dev/md3
Couldn't find valid filesystem superblock.
root@rescue:/mnt#
Could I use this (sample commands from that guide)?
Quote:
$ dumpe2fs /dev/sda6
- got just the “Bad magic number in super-block while trying to open…” message
used
$ mke2fs -n /dev/sda6
got the 2nd super block same as in the example.
Used:
$ fsck -b 32768 /dev/sda6
to fix.
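That guide's approach should carry over: mke2fs -n is a dry run that only prints where the superblocks would go, without writing anything, provided you give it the same geometry (in particular the same block size) as the original mkfs. A sketch demonstrating on a scratch image that -n really leaves the bytes untouched (test.img is a made-up name):

```shell
# Scratch ext3 image with a fixed 1 KiB block size
dd if=/dev/zero of=test.img bs=1M count=16 status=none
mke2fs -q -F -t ext3 -b 1024 test.img

# Checksum before: mke2fs -n must not change a single byte
before=$(md5sum < test.img)

# Dry run: prints "Superblock backups stored on blocks: 8193" but writes
# nothing (on the server: mke2fs -n /dev/md3, with the SAME block size
# the original mkfs used, or the reported offsets will be wrong)
mke2fs -F -n -t ext3 -b 1024 test.img | grep -A1 'Superblock backups'

# Prove the image is untouched
after=$(md5sum < test.img)
[ "$before" = "$after" ] && echo "image untouched"
```

Once mke2fs -n gives you candidate block numbers for /dev/md3, you can feed them to e2fsck -b one at a time, as was done for md1.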