Filesystem rebuild error
Hi everyone,
I've got a Red Hat 7.3 server which contained a RAID 0 SCSI array. One of the disks failed, and after repeated attempts to resurrect the disk we admitted defeat, found a replacement disk and rebuilt the array. We had a backup image on an IDE disk, so we put that in the machine and copied it across with dd. After fixing a few problems with LILO not being able to see the root partition (and LILO itself not working for a while, because the array had changed from /dev/sda6 to /dev/sda3), we finally got the OS to boot. Now the following error is occurring:
------------------------------- Error message(s):
Checking root filesystem
/: The filesystem size (according to the superblock) is 17165444 blocks
The physical size of the device is 17147379 blocks
Either the superblock or the partition table is likely to be corrupt!
/: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
------ End message ------
I ran fsck; it again identifies a device size mismatch according to the superblock and prompts again, so I carried on with fsck. On Pass 1 things are fine until block 17163085, then numerous more errors occur. Pass 2 then kicks in, checking the directory structure, and I get the following:
------------
Missing '.' in directory inode 8570453
Setting filetype for entry '.' in
Error reading block 17147383 (Attempting to read block from filesystem resulted in short read). Ignore error?
-------------
Then I get a fair few more of these (I assume the same number as in the 1st pass). Anyways, after a few more errors and "EXT2 directory corrupted" messages, fsck finishes. I reboot, it gets to the exact same point in the boot, and the same thing occurs again.
So I'm a bit stumped at the moment. I'm thinking that when the filesystem was rebuilt the sizes didn't match and the partition table is not accurate. Either that or the superblock is wrong. Suggestions on how to proceed would be great.
Cheers, Pete |
Quote:
Which leads to: Quote:
--Abid Kazmi |
Hi Abid, thanks a bunch for the response.
The replacement disk was from another server and was the same make, model and size (as in GB). We still have the other disk; we got it to spin again and recovered partial data off it, but only 14 GB of the 17 on it. We could probably rebuild that disk, but we're not sure exactly what is wrong with it, except that it wasn't spinning and now is, so I wouldn't be that comfortable deploying it again.
One other thing, probably demonstrating my naive ways: booting with Knoppix (been doing that a fair bit), we have been able to browse and open the files that were copied back across, so the data looks like it's fine on disk.
Thanks again for the input, Cheers, Pete |
The older disk may well fail again in the near future, so I wouldn't recommend putting it back in there. I have three servers at my school, all with four RAID drives at 17GB each, one for parity and the rest for data.
Anyways, I'm getting a little confused myself at this point.
1.) You had a 17GB disk that failed.
2.) You managed to recover 14GB of files.
3.) You don't (or shouldn't) put it back into the RAID.
4.) You got another identical drive from another server.
5.) You placed it in and got the filesystem error:
Code:
/: The filesystem size (according to the superblock) is 17165444 blocks Quote:
I believe that is your problem: the filesystem is bigger than the new drive you put in, which causes the error "Either the superblock or the partition table is likely to be corrupt!". Now I don't understand WHY that would happen when you got a drive of the SAME size and model. From that, I believe you should wipe the drive you got from the other server and place it back into the server that had the failed drive. Again, if it does not bother you or your work, I have contacted one of my colleagues to help you with this problem and back up my statements. I will post in a while. --Abid Kazmi |
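To put the two block counts from the boot error side by side, here is a small arithmetic sketch. The block counts and the unreadable block number are copied from the fsck output quoted in this thread; the 4096-byte block size is only an assumption (ext2 can also use 1024 or 2048), and the function names are made up for illustration.

```python
# Numbers copied from the fsck messages quoted in this thread.
SUPERBLOCK_BLOCKS = 17165444  # filesystem size according to the superblock
DEVICE_BLOCKS = 17147379      # physical size of the device
SHORT_READ_BLOCK = 17147383   # block fsck could not read in Pass 2

# ASSUMPTION: 4 KiB blocks. The thread never states the block size,
# so any byte figure derived from this is illustrative only.
ASSUMED_BLOCK_SIZE = 4096


def shortfall_blocks(fs_blocks: int, dev_blocks: int) -> int:
    """Number of filesystem blocks that lie past the end of the device."""
    return max(0, fs_blocks - dev_blocks)


def is_past_device_end(block: int, dev_blocks: int) -> bool:
    """True if a 0-based block number is beyond the last block of the device."""
    return block >= dev_blocks


if __name__ == "__main__":
    missing = shortfall_blocks(SUPERBLOCK_BLOCKS, DEVICE_BLOCKS)
    size_mib = missing * ASSUMED_BLOCK_SIZE / 2**20
    print(f"filesystem overruns the device by {missing} blocks")
    print(f"roughly {size_mib:.1f} MiB at the assumed block size")
    print("short-read block is past the device end:",
          is_past_device_end(SHORT_READ_BLOCK, DEVICE_BLOCKS))
```

With the numbers from the thread, the overrun is 18065 blocks, and the block fsck failed to read (17147383) sits just past the end of the device. That is consistent with a filesystem that is larger than the partition it was copied into, rather than with random corruption.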
Hi again Abid,
just to clarify, 'cause the last post was probably a bit muddled, what has happened is:
1) The server had 4 17 GB disks in a RAID 0 config.
2) Disk 3/4 failed, RAID 0 gone.
3) Attempted to retrieve data from the failed drive, only got 14 GB, should have been more, so discarded.
4) Replaced disk 3/4 with a disk of the same make and size from another machine.
5) Rebuilt the RAID 0 array across the 4 disks, making 1 logical drive.
6) From a backup image stored on an IDE disk, dd'd the data onto the RAID array.
7) Stuffed around with LILO to get the machine to boot; it finally booted after a bit of stuffing around.
8) The OS manages to mount '/' and boot continues, but then we run into these fsck issues, and these remain unresolved.
Hope that clarifies things somewhat on what has happened on this end. Sorry if this wasn't clear in the first place, and thanks again for the help so far. Cheers, Pete |
As a follow-up to this, and perhaps to add more diagnostic info, a run of gpart (with flags -f -l -c -v -v -v /dev/sda) produced an error code of 3 and the following output:
Code:
Code:
Cheers, Pete |
Wow... thanks for all that. Took me ten minutes to go through each line. :Pengy: OK, back to the topic.
Code:
* Warning: Discarded 2 overlapping partition guesses. Thanks --Abid Kazmi |
Thanks again Abid,
This time we recopied the data across and have found a potential snag. Using 'dd' we got the following: Code:
root@ttyp0[knoppix]# dd < /dev/hda > /dev/sda
May try again with this, but I have got the system working off the backup IDE disk at the moment as my patience was wearing out a bit :(
Anyways, the system is working, so I may reconfigure the RAID array to RAID 5 (as it should have been in the first place, and all of this would have been a lot easier!) and use it for a system backup. But yeah, I might retry with a rebuild of the array and see if that has an impact. Otherwise that may well explain the fsck differences! But I can't figure out why, as the drives are the same; the only thing I can think of is that a fair few bad blocks have been picked up and hence flagged.
Anyways thanks again, Cheers, Pete |
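Before repeating a whole-disk copy like the dd above, it may be worth comparing the sizes of the source and the destination. The following is a minimal sketch, assuming Linux, where seeking to the end of a block device reports its size in bytes. The helper names are hypothetical; /dev/hda and /dev/sda are simply the devices named in the post, and reading them needs root.

```python
import os
import sys


def size_in_bytes(path: str) -> int:
    """Size of a regular file or (on Linux) a block device, found by seeking to its end."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


def copy_would_fit(source: str, destination: str) -> bool:
    """True if every byte of source fits inside destination."""
    return size_in_bytes(source) <= size_in_bytes(destination)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example invocation (as root): python3 check_fit.py /dev/hda /dev/sda
    src, dst = sys.argv[1], sys.argv[2]
    print(f"{src}: {size_in_bytes(src)} bytes")
    print(f"{dst}: {size_in_bytes(dst)} bytes")
    print("copy fits" if copy_would_fit(src, dst) else "destination is too small")
```

If the destination turns out to be smaller, a raw copy cannot hold the whole image, so the tail of whatever was at the end of the source disk would be missing on the array.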
You're welcome, Pete.
Anyways, did you see the usage statistics on /dev/sda? Maybe the hdd really IS full? And with this kind of stuff, you've got to have patience. I have a 30gig IDE hdd that I've been trying to fix for more than a year (did buy a replacement though). So, if you want, try the RAID 5 config and post the results. --Abid Kazmi |