fatal error: root partition corrupted, can't mount it, can't fsck it.
I spent the last couple of weeks installing LFS. A few days ago, I started noticing a few error messages during boot up. All of them file system related. Things like orphaned inodes being deleted. Then once I got the error message that /etc/mtab could not be read (again during boot up) because of an I/O error. So I booted Knoppix and fscked the root partition. The problem went away, for a while. When I first booted today, I saw some error messages again. This worried me, but I hoped they would go away. But booting the second time today resulted in a terrifying sight: a kernel panic, something about an I/O error.
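The most awful thing is 1) I spent something like 5 hours today installing KDE :( 2) I can't fsck the root partition, cos if I do (doing it from Knoppix, btw), I get: Code:
e2fsck /dev/hda4
e2fsck 1.32 (09-Nov-2002)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/hda4
Could this be a zero-length partition?
Is this a corrupt partition, a bad hard drive (I can mount other partitions fine btw), a corrupt file system? What? |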
"A few days ago, I started noticing a few error messages during boot up. All of them file system related."
Were these messages fsck messages? Are you running fsck on every boot? If not, then why was your startup script choosing to run fsck?
"Is this a corrupt partition, a bad hard drive (I can mount other partitions fine btw), a corrupt file system? What?"
I would say that the first thing to check for is a bad entry in the partition table. You can make a basic check on your partition table by booting a rescue CD and using fdisk to print the partition table. One very risky thing that you could try is to delete the bad partition and then recreate it with exactly the same start and end as before. If a bad partition table entry is the problem, then this would recreate the partition table entry without disturbing the filesystem. I would only do this as a last resort before giving up and either reinstalling everything or restoring from backup.
The second most likely source of your problem is a bad block as the very first block in the filesystem. When you successfully ran fsck on Knoppix you did not mention any bad block messages. If fsck had found any bad blocks in the file system it would have told you so and asked your permission to make the bad blocks unallocatable. If you reinstall or restore, then you should try to format the partition before you reinstall or restore.
___________________________________
Be prepared. Create a LifeBoat CD. http://users.rcn.com/srstites/LifeBo...home.page.html
Steve Stites |
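As a rough companion to the fdisk check suggested above: on a rescue CD, `fdisk -l /dev/hda` prints the partition table. The sketch below is a much cruder sanity test, assuming a classic DOS/MBR-style partition table, whose first sector ends with the bytes 0x55 0xAA at offsets 510 and 511. The function name `mbr_signature` and the scratch file are made up for illustration, and the demo runs against a throwaway file rather than a real disk.

```shell
# Print the two boot-signature bytes (offsets 510-511) of a disk or image as hex.
# A valid MBR-style partition table sector ends in "55aa".
mbr_signature() {
    dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n'
    echo
}

# Demo on a scratch file standing in for the first sector of a disk.
sector=$(mktemp)
dd if=/dev/zero of="$sector" bs=1 count=510 2>/dev/null   # 510 zero bytes
printf '\125\252' >> "$sector"                            # octal for 0x55 0xAA
mbr_signature "$sector"

# On the real machine (as root, read-only), hypothetically:
#   mbr_signature /dev/hda
```

A correct signature only says the sector looks like a partition table at all; the individual entries still have to be read with fdisk.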
Thanks for the help, jailbait.
The error messages were not from fsck. My root partition is formatted as ext3. During boot up, I would see messages like "deleting orphan inode <bla>". fsck would run after every 39 (or so) mounts. During the last fsck (by the system's init scripts, not by me on Knoppix), problems were reported, but it said that those were resolved. I will post the output of fdisk as soon as I'm able to. Right now I'm running a Windows program that scans and recovers ext2/3 partitions. I'll see first how that goes. With a bit of luck, my root partition will be recovered after that. When I ran fsck on my root partition it did complain about errors, though I don't remember if they were about bad blocks.
A question: how can the partition table suddenly get corrupted? That doesn't make sense. And how come there was a progressive deterioration? I have also downloaded a utility that checks the hard drive for physical malfunctions from the web site of the hard drive manufacturer (Seagate, btw).
Thanks for the help. It's nice to know someone is thinking with me. |
"My root partition is formatted as ext3."
When you ran fsck on Knoppix you ran the wrong version of fsck:
"e2fsck /dev/hda4
e2fsck 1.32 (09-Nov-2002)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/hda4
Could this be a zero-length partition?"
You should have run fsck.ext3. Try running fsck.ext3 on Knoppix again. Hopefully e2fsck did not damage your ext3 partition. That could also explain why e2fsck thought that your file system was thoroughly screwed up.
Steve Stites |
Oh, damn. I thought ext2 and 3 were similar enough that they could be checked by the same checker.
But still: 1) I had problems even before I ran fsck myself, and 2) fsck did solve the problem temporarily. But I'll boot to Knoppix again and run fsck.ext3 on /dev/hda4. See what happens.
Oh, and I get this message from the Windows prog I'm using now:
Read disk ST360020A 3.6 at position 32506308608 failed after 10 attempts. Data error (cyclic redundancy check) (23)
I'm starting to wonder if this might be because of some physical damage on the hard drive. |
"I'm starting to wonder if this might be because of some physical damage on the hard drive."
It is definitely worth running your Seagate diagnostics.
"2) fsck did solve the problem temporarily"
If you ever get to the point that you can read the file system again, check to see if fsck dropped anything into lost+found.
Steve Stites |
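For orientation, the failing position reported by the Windows program (32506308608) can be turned into more familiar units. This is a small sketch, assuming the number is a byte offset from the start of the disk and that the drive uses 512-byte sectors; neither is stated in the error message, and the helper names are made up here.

```shell
# Convert a byte offset into a 512-byte sector number and into whole MiB.
byte_to_sector() { echo $(( $1 / 512 )); }
byte_to_mib()    { echo $(( $1 / 1048576 )); }

pos=32506308608                      # position reported in the read error
echo "sector: $(byte_to_sector "$pos")"
echo "MiB:    $(byte_to_mib "$pos")"
```

Compared against the start sector of /dev/hda4 as printed by fdisk, the same arithmetic would show how far into the partition the unreadable spot lies.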
Okay, this is what I have tried:
The Windows prog I told you about actually lets you create a partition image, or an image of part of the partition. There was something like 3.1 GB of data written to the root partition, so I told it to make an image of the partition from 0 to 4 GB. The Windows program gave the warning Code:
Read disk ST360020A 3.6 at position 32506308608 failed after 10 attempts. Data error (cyclic redundancy check) (23)
Now my plan is to create the image again up to 4GB, but skip the first super block, that is, begin from 4096 B. That did get rid of the above warning. Once I have a working image, I plan to delete the partition, create a new one, format it and copy all files from the image to /dev/hda4. If that doesn't work, well, you could help me by suggesting what distro to use, cos I'm not planning to go through the installation of LFS again! So cross your fingers for me, will ya....
p.s.: I tried doing "fsck.ext3 /dev/hda4". That was a no go. The same error. |
"Now my plan is to create the image again up to 4GB, but skip the first super block, that is, begin from 4096 B."
Once you get a file system that is OK except for missing the first superblock, you could try running fsck.ext3 against that file system to see if fsck is smart enough to recreate the first superblock from the information available in the other superblocks.
Steve Stites |
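One way to test that idea is e2fsck's `-b` option, which tells it to use a backup superblock. The sketch below only computes where the first backup usually sits, assuming the default layout of 8 × block size blocks per group; the results match the figures given in the e2fsck manual page (8193, 16384 and 32768 for 1 KiB, 2 KiB and 4 KiB blocks). The function name is made up, and the block size of the damaged filesystem is not known from the thread.

```shell
# First backup superblock = first data block + blocks per group,
# assuming mke2fs defaults (blocks per group = 8 * block size).
first_backup_superblock() {
    blocksize=$1
    case "$blocksize" in
        1024) first_data_block=1 ;;   # with 1 KiB blocks, block 0 is the boot block
        *)    first_data_block=0 ;;
    esac
    echo $(( first_data_block + 8 * blocksize ))
}

for size in 1024 2048 4096; do
    echo "block size $size: first backup superblock at block $(first_backup_superblock "$size")"
done

# Hypothetical use against an image file (not run here):
#   e2fsck -b "$(first_backup_superblock 4096)" -B 4096 root.img
```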
Thanks for your help, jailbait. But I guess I pretty much knew I was doomed (or rather my Linux installation was). The Seagate hard drive checking utility confirmed (after 9 hours of scanning) that the hard drive is physically damaged. The first superblock on the partition was just unreadable. I tried creating an image with dd under Knoppix. Whenever I would do
Code:
dd if=/dev/hda4 of=root.img bs=1024 count=<something> skip=0
What I did now was to delete /dev/hda4 and create a new one in its place, but with an offset from the end of /dev/hda3, i.e. there is a gap between the partitions. This worked. I temporarily installed Vector Linux. Man, I'm crying, I had LFS set up exactly the way I wanted it and it was perfect. Just hope this never happens again to me or any one else. |
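When imaging a partition with unreadable sectors, dd is often run with `conv=noerror,sync`, so that it carries on after a read error and pads the failed block with zeros, which keeps the later data at the right offset. A minimal sketch, assuming that approach would have suited this case; the function name and scratch files are made up, and the demo copies from an ordinary file rather than /dev/hda4.

```shell
# Copy COUNT blocks of 1 KiB from SRC to DST, starting SKIP blocks in.
# noerror: keep going after a read error; sync: pad short or failed reads with zeros.
image_region() {
    src=$1
    dst=$2
    skip=$3
    count=$4
    dd if="$src" of="$dst" bs=1024 skip="$skip" count="$count" conv=noerror,sync 2>/dev/null
}

# Demo: an 8 KiB scratch "partition", imaged from 4 KiB onward.
part=$(mktemp)
img=$(mktemp)
dd if=/dev/zero bs=1024 count=8 2>/dev/null | tr '\0' 'A' > "$part"
image_region "$part" "$img" 4 4
echo "image size: $(( $(wc -c < "$img") )) bytes"

# On the real disk this would look something like (not run here):
#   image_region /dev/hda4 root.img 4 <something>
```

A small block size keeps the amount of zero-filled data per bad sector low, at the cost of speed.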
"Just hope this never happens again to me or any one else."
It will happen again to you and everyone else also. For a while a bad Maxtor card was doing it to me about once a week. That is why I wrote LifeBoat. I suggest that you start taking weekly backups (or daily if you want to be a backup fanatic). That way the most that you lose is a week's work. ___________________________________ Be prepared. Create a LifeBoat CD. http://users.rcn.com/srstites/LifeBo...home.page.html Steve Stites |
I still have a few questions if you don't mind.
How do I tell if this was just a very unfortunate accident (the HD corruption hit at one of the worst places imaginable), or if the corruption had an underlying cause and the same thing could happen again with the new partition? I did not mention this earlier, but sometimes, when I am working on my system, I get the message Code:
hda: status timeout: status = 0xd0 {busy}
If you, or anyone else who is reading this, have any other suggestions, any help would be appreciated. |
"hda: status timeout: status = 0xd0 {busy}
hda: no DRQ after issuing write
ide0: reset: success done"
This error message means that the kernel has successfully recovered from an I/O error while trying to write on ide0. (It does not mean that the write was successful. It just means that the kernel got ide0 back to a usable state again.) This could be a hard drive problem or it could be a problem with the IDE chipset. I used to get a lot of this type of error on a faulty add-on Maxtor IDE controller card. This could be a problem with your motherboard IDE chipset, but the simplest explanation is that these errors were caused by the bad spot that you found on the hard drive, not that a faulty chipset caused a bad spot on the hard drive.
Steve Stites |
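The `status = 0xd0` value in the first line of that message is the drive's status register, and it can be decoded bit by bit. The sketch below uses the standard ATA status bit positions; the short flag names and the function name are chosen here for illustration.

```shell
# Decode an ATA/IDE status register value into the names of its set bits.
decode_ide_status() {
    status=$(( $1 ))
    flags=""
    for pair in 128:BSY 64:DRDY 32:DF 16:DSC 8:DRQ 4:CORR 2:IDX 1:ERR; do
        bit=${pair%%:*}     # numeric bit value before the colon
        name=${pair#*:}     # flag name after the colon
        if [ $(( status & bit )) -ne 0 ]; then
            flags="${flags:+$flags }$name"
        fi
    done
    echo "$flags"
}

decode_ide_status 0xd0
```

For 0xd0 the busy, drive-ready and seek-complete bits are set while the data-request and error bits are clear, which is consistent with the `{busy}` annotation and with the "no DRQ" complaint that follows.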
Yea but the thing is, I'm still getting those errors, even though I partitioned out the bad sector.
Googling on the above messages, I find that they can be caused by two things: 1) If the hard drive and cdrom are attached on the same IDE channel, which is indeed the case with me, or 2) A dying hard drive. See this e-mail I found on a mailing list Quote:
|
"Yea but the thing is, I'm still getting those errors, even though I partitioned out the bad sector.
Googling on the above messages, I find that they can be caused by two things: 1) If the hard drive and cdrom are attached on the same IDE channel, which is indeed the case with me, or 2) A dying hard drive"
You can get the error message for a wide variety of hardware errors. Those two problems are not the only ones that can cause it. In my case the problem was different from the above two.
"1) If the hard drive and cdrom are attached on the same IDE channel, which is indeed the case with me, or"
You can attach the hard drive and cdrom to the same cable. What does not work is when the cdrom requires a 40-wire cable and the hard drive requires an 80-wire cable and you put them on the same cable. This is possibly the root cause of your disk error problem.
"2) A dying hard drive"
You found a bad spot on the drive. Maybe there are other bad spots on the hard drive that you have not found yet. You can use the Seagate utilities that you downloaded to low-level format your hard drive. At the end of the hard drive are some extra blocks. The Seagate utility will reassign bad blocks to the spare blocks. This will work OK as long as you do not have more bad spots than spare blocks. It also thoroughly erases your hard drive, so you have to back up before you low-level format and then partition, format, and restore afterwards.
So if you have a bad drive then maybe a low-level format will fix it. Or maybe the drive will continue to die even after a low-level format. Or you can say that it is cheaper to just buy a new hard drive than to spend two days working on fixing the old one. I once did a low-level format that took three hours. Is the old drive still under warranty? If I were in your situation I would become a daily backup fanatic.
Steve Stites |
One more post and I will be out of your hair. I figured out how I could have saved my LFS installation. This is what I should have done:
First, make an image by skipping some kB at the beginning of the partition, but skip as little as possible, so Code:
Code:
dd if=/dev/zero of=zeros bs=1024 count=4 Code:
cat zeros damaged_image.img > correct_image.img Code:
fsck.ext3 correct_image.img
This is all too late for me now. I wish I had had this inspiration earlier. I know this will work because I have tried it with another partition. |
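The steps above can be tried out safely on scratch files. This sketch assumes the damaged image was taken by skipping the first 4 KiB of the partition, as in the commands above; the file names and the function name are made up. It checks that prepending the same number of zero bytes restores the original size and puts the surviving data back at its original offsets.

```shell
# Prepend PAD_KIB KiB of zeros to IN_FILE, writing the result to OUT_FILE.
pad_image() {
    in_file=$1
    out_file=$2
    pad_kib=$3
    { dd if=/dev/zero bs=1024 count="$pad_kib" 2>/dev/null; cat "$in_file"; } > "$out_file"
}

# Demo: a 16 KiB stand-in for the partition, filled with a recognisable byte.
orig=$(mktemp)
damaged=$(mktemp)
fixed=$(mktemp)
dd if=/dev/zero bs=1024 count=16 2>/dev/null | tr '\0' 'x' > "$orig"
dd if="$orig" of="$damaged" bs=1024 skip=4 2>/dev/null   # image without the first 4 KiB
pad_image "$damaged" "$fixed" 4

echo "original: $(( $(wc -c < "$orig") )) bytes, rebuilt: $(( $(wc -c < "$fixed") )) bytes"
# Everything after the padding should be identical to the damaged image.
tail -c +4097 "$fixed" | cmp -s - "$damaged" && echo "data after the padding is intact"
```

On a real ext2/ext3 image the primary superblock sits at byte offset 1024, inside the zeroed region, so fsck would probably have to be pointed at a backup superblock with `-b`.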