LinuxQuestions.org
Old 12-27-2003, 10:30 AM   #1
qanopus
Senior Member
 
Registered: Jul 2002
Location: New York
Distribution: Slackware
Posts: 1,358

Rep: Reputation: 45
fatal error: root partition corrupted, can't mount it, can't fsck it.


I spent the last couple of weeks installing LFS. A few days ago, I started noticing a few error messages during boot-up, all of them file system related: things like orphan inodes being deleted. Then once I got an error message that /etc/mtab could not be read (again during boot-up) because of an I/O error. So I booted Knoppix and fscked the root partition. The problem went away, for a while. When I first booted today, I saw some error messages again. This worried me, but I hoped they would go away. But booting the second time today resulted in a terrifying sight: a kernel panic, something about an I/O error.
The most awful thing is 1) I spent something like 5 hours today installing KDE
2) I can't fsck the root partition, cos if I do (from Knoppix, btw), I get:

Code:
e2fsck /dev/hda4
e2fsck 1.32 (09-Nov-2002)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/hda4
Could this be a zero-length partition?
I'm desperate for any help. Like I said, I spent weeks installing Linux From Scratch.
Is this a corrupt partition, a bad hard drive (I can mount other partitions fine, btw), a corrupt file system? What?

Last edited by trickykid; 12-27-2003 at 11:09 AM.
 
Old 12-27-2003, 01:52 PM   #2
jailbait
LQ Guru
 
Registered: Feb 2003
Location: Virginia, USA
Distribution: Debian 12
Posts: 8,334

Rep: Reputation: 547
"A few days ago, I started noticing a few error messages during boot up. All of them file system related."

Were these messages fsck messages? Are you running fsck on every boot? If not then why was your startup script choosing to run fsck?

"Is this a corrupt partition, a bad hard drive (I can mount other partitions fine, btw), a corrupt file system? What?"

I would say that the first thing to check for is a bad entry in the partition table. You can make a basic check on your partition table by booting a rescue CD and using fdisk to print the partition table.

One very risky thing that you could try is to delete the bad partition and then recreate it with exactly the same start and end as before. If a bad partition table entry is the problem then this would recreate the partition table entry without disturbing the filesystem. I would only do this as a last resort before giving up and either reinstalling everything or restoring from backup.

The second most likely source of your problem is a bad block as the very first block in the filesystem. When you successfully ran fsck on knoppix you did not mention any bad block messages. If fsck had found any bad blocks in the file system it would have told you so and asked your permission to make the bad blocks unallocatable.

If you reinstall or restore then you should try to format the partition before you reinstall or restore.
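
A minimal sketch of the partition table check suggested above, plus a backup of the table before any delete-and-recreate attempt. It assumes the disk is /dev/hda (as in this thread), that you are root on a rescue CD, and that fdisk and sfdisk from util-linux are available. The function name and the backup file name are made up for illustration.

```shell
#!/bin/sh
# Sketch: print and back up a partition table before touching it.
# Assumptions: run as root from a rescue CD; fdisk and sfdisk are installed.

backup_partition_table() {
    disk="$1"        # e.g. /dev/hda
    backup="$2"      # hypothetical output file, e.g. hda-table.txt

    if [ ! -b "$disk" ]; then
        echo "not a block device: $disk" >&2
        return 1
    fi

    # Human-readable listing: check the start and end of each partition
    # for overlaps or a zero-length entry.
    fdisk -l "$disk" || return 1

    # Machine-readable dump that sfdisk can later restore with:
    #   sfdisk "$disk" < "$backup"
    sfdisk -d "$disk" > "$backup"
}

# Example call (commented out because it needs root and a real disk):
# backup_partition_table /dev/hda hda-table.txt
```

Keeping the dump on another disk or a floppy means the exact start and end values are still at hand if the recreated entry turns out wrong.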

___________________________________
Be prepared. Create a LifeBoat CD.
http://users.rcn.com/srstites/LifeBo...home.page.html

Steve Stites
 
Old 12-27-2003, 04:50 PM   #3
qanopus
Senior Member
 
Registered: Jul 2002
Location: New York
Distribution: Slackware
Posts: 1,358

Original Poster
Rep: Reputation: 45
Thanks for the help jailbait.
The error messages were not from fsck. My root partition is formatted as ext3. During boot-up, I would see messages like "deleting orphan inode <bla>". fsck would run after every 39 (or so) mounts.
During the last fsck (by the system's init scripts, not by me on Knoppix), problems were reported, but it said that those were resolved.
I will post the output of fdisk as soon as I'm able to. Right now I'm running a Windows program that scans and recovers ext2/3 partitions. I'll first see how that goes. With a bit of luck, my root partition will be recovered after that.
When I ran fsck on my root partition, it did complain about errors, though I don't remember if they were about bad blocks.
A question: how can the partition table suddenly get corrupted? That doesn't make sense. And how come there was a progressive deterioration?
I have also downloaded a utility that checks the hard drive for physical malfunctions from the web site of the hard drive manufacturer (Seagate, btw).
Thanks for the help. It's nice to know someone is thinking with me.

Last edited by qanopus; 12-27-2003 at 04:56 PM.
 
Old 12-27-2003, 05:06 PM   #4
jailbait
LQ Guru
 
Registered: Feb 2003
Location: Virginia, USA
Distribution: Debian 12
Posts: 8,334

Rep: Reputation: 547
"My root partition is formatted as ext3."

When you ran fsck on Knoppix you ran the wrong version of fsck:

"e2fsck /dev/hda4
e2fsck 1.32 (09-Nov-2002)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/hda4
Could this be a zero-length partition?"

You should have run fsck.ext3.

Try running fsck.ext3 on Knoppix again. Hopefully e2fsck did not damage your ext3 partition.
That also could explain why e2fsck thought that your file system was thoroughly screwed up.
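
Before blaming the checker, it may be worth confirming which program fsck.ext3 really is on the rescue CD. In e2fsprogs, fsck.ext2 and fsck.ext3 are normally installed as links to e2fsck, in which case both commands run the same code and would report the same error. The sketch below checks this. The helper name same_program is made up, and the /sbin paths are an assumption that varies between distributions.

```shell
#!/bin/sh
# Sketch: report whether two paths refer to the same underlying file
# (hard link or symlink). Uses the -ef test, which bash, dash and ksh support.

same_program() {
    if [ "$1" -ef "$2" ]; then
        echo "$1 and $2 are the same file"
        return 0
    fi
    echo "$1 and $2 differ (or one is missing)"
    return 1
}

# Assumed locations; adjust for your distribution.
# same_program /sbin/fsck.ext3 /sbin/e2fsck
```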

 
Old 12-27-2003, 05:30 PM   #5
qanopus
Senior Member
 
Registered: Jul 2002
Location: New York
Distribution: Slackware
Posts: 1,358

Original Poster
Rep: Reputation: 45
Oh, damn. I thought ext2 and 3 were similar enough that they could be checked by the same checker.
But still:
1) I had problems even before I ran fsck myself and
2) fsck did solve the problem temporarily

But I'll boot into Knoppix again and run fsck.ext3 on /dev/hda4. See what happens.

Oh, and I get this message from the Windows program I'm using now:

Read disk ST360020A 3.6 at position 32506308608 failed after 10 attempts. Data error (cyclic redundancy check) (23)

I'm starting to wonder if this might be because of some physical damage on the hard drive.
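
The position in that message can be turned into something comparable with the partition table. A small sketch, assuming the position is a byte offset from the start of the disk and that sectors are 512 bytes (both are assumptions; the recovery tool's documentation would have to confirm them):

```shell
#!/bin/sh
# Sketch: convert the failing byte position into a sector number and a rough
# offset in GiB. Assumes a byte offset from the start of the disk,
# 512-byte sectors, and a shell with 64-bit arithmetic.

position=32506308608
sector_size=512

sector=$(( position / sector_size ))
gib=$(( position / 1024 / 1024 / 1024 ))

echo "sector: $sector"          # sector: 63488884
echo "offset: about $gib GiB"   # offset: about 30 GiB
```

If that sector number matches the start of /dev/hda4 in a sector-unit listing from fdisk, the unreadable spot sits right at the beginning of the root partition.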

Last edited by qanopus; 12-27-2003 at 05:37 PM.
 
Old 12-27-2003, 06:21 PM   #6
jailbait
LQ Guru
 
Registered: Feb 2003
Location: Virginia, USA
Distribution: Debian 12
Posts: 8,334

Rep: Reputation: 547
"I'm starting to wonder if this might be because of some physical damage on the hard drive."

It is definitely worth running your Seagate diagnostics.

"2) fsck did solve the problem temporarily"

If you ever get to the point that you can read the file system again check to see if fsck dropped anything into lost+found.

 
Old 12-28-2003, 04:22 AM   #7
qanopus
Senior Member
 
Registered: Jul 2002
Location: New York
Distribution: Slackware
Posts: 1,358

Original Poster
Rep: Reputation: 45
Okay, this is what I have tried:
The Windows program I told you about actually lets you create a partition image, or an image of part of the partition. There was something like 3.1 GB of data written to the root partition, so I told it to make an image of the partition from 0 to 4 GB. The Windows program gave the warning
Code:
Read disk ST360020A 3.6 at position 32506308608 failed after 10 attempts. Data error (cyclic redundancy check) (23)
at the very beginning of the read process. But it did give me an image that I could actually mount via the loopback device on Linux. And when I did so, I did see all the directories in the root directory (you know, /etc, /lib, the works), but I couldn't enter the directories because it said that the file system was corrupted.
Now my plan is to create the image again up to 4 GB, but skip the first superblock, that is, begin from 4096 B. That did get rid of the above warning.
Once I have a working image, I plan to delete the partition, create a new one, format it and copy all files from the image to /dev/hda4. If that doesn't work, well, you could help me by suggesting what distro to use, cos I'm not planning to go through the installation of LFS again!
So cross your fingers for me, will ya....

P.S.: I tried doing "fsck.ext3 /dev/hda4". That was a no go. The same error.

Last edited by qanopus; 12-28-2003 at 04:27 AM.
 
Old 12-28-2003, 11:14 AM   #8
jailbait
LQ Guru
 
Registered: Feb 2003
Location: Virginia, USA
Distribution: Debian 12
Posts: 8,334

Rep: Reputation: 547
"Now my plan is to create the image again up to 4 GB, but skip the first superblock, that is, begin from 4096 B."

Once you get a file system that is OK except for missing the first superblock, you could try running fsck.ext3 against that file system to see if fsck is smart enough to recreate the first superblock from the information available in the other superblocks.
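
For reference, ext2/ext3 keeps backup copies of the superblock at the start of certain block groups, and e2fsck can be pointed at one of them with -b. A sketch that lists the first few backup locations, assuming a 4 KiB block size, the default 32768 blocks per group, and the sparse_super feature (all assumptions about this particular file system):

```shell
#!/bin/sh
# Sketch: first few backup superblock locations for ext2/ext3.
# Assumptions: 4 KiB blocks, 32768 blocks per group, sparse_super enabled,
# so backups live in group 1 and in groups that are powers of 3, 5 and 7.

blocks_per_group=32768

backup_superblocks() {
    for group in 1 3 5 7 9 25 27 49; do
        echo $(( group * blocks_per_group ))
    done
}

backup_superblocks

# With an image file (the name is illustrative), a specific backup could be tried:
#   e2fsck -b 32768 -B 4096 image.img
```

On a file system created with other parameters the numbers differ, so treat the list as a starting point rather than a guarantee.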


Last edited by jailbait; 12-28-2003 at 11:18 AM.
 
Old 12-28-2003, 04:52 PM   #9
qanopus
Senior Member
 
Registered: Jul 2002
Location: New York
Distribution: Slackware
Posts: 1,358

Original Poster
Rep: Reputation: 45
Thanks for your help jailbait. But I guess I pretty much knew I was doomed (or rather my Linux installation was). The Seagate hard drive checking utility confirmed (after 9 hours of scanning) that the hard drive is physically damaged. The first superblock on the partition was just unreadable. I tried creating an image with dd under Knoppix. Whenever I would do
Code:
 dd if=/dev/hda4 of=root.img bs=1024 count=<something> skip=0
I would get an input/output error. But when the skip parameter is greater than or equal to 4 (the superblocks on the partition were 4 KB), it would read the drive. But then I wouldn't have a valid partition image. Your advice to check that image with fsck came too late, my friend. I had already reformatted the partition.
What I did now was delete /dev/hda4 and create a new one in its place, but with an offset from the end of /dev/hda3, so there is a gap between the partitions. This worked.
I temporarily installed Vector Linux. Man, I'm crying. I had LFS set up exactly the way I wanted it and it was perfect. Just hope this never happens again to me or any one else.
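
For imaging a partition with unreadable blocks, GNU dd has conv=noerror,sync, which keeps going after a read error and pads the failed block with zeros so that everything after it stays at its original offset. The sketch below runs the same invocation on a scratch file rather than on /dev/hda4, purely to show the command shape; the scratch directory and file names are made up.

```shell
#!/bin/sh
# Sketch: the dd invocation for imaging past read errors, demonstrated on a
# scratch file. On the real system the input would be /dev/hda4.
set -e

work=$(mktemp -d)

# Stand-in for the partition: 64 KiB of arbitrary data.
dd if=/dev/urandom of="$work/partition.bin" bs=4096 count=16 2>/dev/null

# noerror: continue after a read error.
# sync:    pad every short or failed input block to bs with zeros,
#          so later data keeps its offset in the image.
dd if="$work/partition.bin" of="$work/root.img" bs=4096 conv=noerror,sync 2>/dev/null

# On a fully readable input the copy is byte-for-byte identical.
cmp "$work/partition.bin" "$work/root.img" && echo "image matches source"
```

Setting bs to the file system block size keeps the zero padding limited to the block that actually failed.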
 
Old 12-28-2003, 06:20 PM   #10
jailbait
LQ Guru
 
Registered: Feb 2003
Location: Virginia, USA
Distribution: Debian 12
Posts: 8,334

Rep: Reputation: 547
"Just hope this never happens again to me or any one else."

It will happen again to you and everyone else also. For a while a bad Maxtor card was doing it to me about once a week. That is why I wrote LifeBoat. I suggest that you start taking weekly backups (or daily if you want to be a backup fanatic). That way the most that you lose is a week's work.

 
Old 12-29-2003, 04:21 AM   #11
qanopus
Senior Member
 
Registered: Jul 2002
Location: New York
Distribution: Slackware
Posts: 1,358

Original Poster
Rep: Reputation: 45
I still have a few questions if you don't mind.
How do I tell if this was just a very unfortunate accident (the HD corruption hitting at one of the worst places imaginable), or if the corruption had an underlying reason and the same thing could happen again with the new partition? I did not mention this earlier, but sometimes, when I am working on my system, I would get the message

Code:
hda: status timeout: status = 0xd0 {busy}
hda: no DRQ after issuing write
ide0: reset: success
done
Could this be that underlying reason? I have yet to search Google for these error messages. The HD utility from Seagate didn't report any bad sector other than where I had expected it. My biggest fear is that this is mobo related.
If you have any other suggestions, or anyone else who is reading this does, any help would be appreciated.
 
Old 12-29-2003, 10:13 AM   #12
jailbait
LQ Guru
 
Registered: Feb 2003
Location: Virginia, USA
Distribution: Debian 12
Posts: 8,334

Rep: Reputation: 547
"hda: status timeout: status = 0xd0 {busy}
hda: no DRQ after issuing write
ide0: reset: success
done"

This error message means that the kernel has successfully recovered from an I/O error while trying to write on ide0 (It does not mean that the write was successful. It just means that the kernel got ide0 back to a usable state again). This could be a hard drive problem or it could be a problem with the IDE chipset. I used to get a lot of this type of error on a faulty addon Maxtor IDE controller card. This could be a problem with your motherboard IDE chipset but the simplest explanation is that these errors were caused by the bad spot that you found on the hard drive, not that a faulty chipset caused a bad spot on the hard drive.
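
To see how often this happens, the kernel log can be searched for these messages. A sketch, assuming the messages look like the ones quoted above; count_ide_resets is a made-up helper name, and on a live system the input would come from dmesg.

```shell
#!/bin/sh
# Sketch: count IDE channel resets in kernel log text.
# count_ide_resets is a hypothetical helper; it reads log lines on stdin.

count_ide_resets() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # function from failing when the count is zero.
    grep -c -E 'ide[0-9]+: reset:' || true
}

sample_log='hda: status timeout: status=0xd0 { Busy }
hda: no DRQ after issuing WRITE
ide0: reset: success'

printf '%s\n' "$sample_log" | count_ide_resets   # prints 1

# On a running system:
#   dmesg | count_ide_resets
```

A count that keeps climbing between boots points at a persistent hardware problem rather than a one-off glitch.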

 
Old 12-29-2003, 11:24 AM   #13
qanopus
Senior Member
 
Registered: Jul 2002
Location: New York
Distribution: Slackware
Posts: 1,358

Original Poster
Rep: Reputation: 45
Yeah, but the thing is, I'm still getting those errors, even though I partitioned out the bad sector.
Googling the above messages, I find that they can be caused by two things:
1) The hard drive and CD-ROM being attached to the same IDE channel, which is indeed the case with me, or
2) A dying hard drive

See this e-mail I found on a mailing list:

Quote:
[ Humbug *General* list - semi-serious discussions about Humbug and ]
[ Unix-related topics. Posts from non-subscribed addresses will vanish. ]



On Fri, 10 May 2002, James McPherson - Pacrim CPR Engineer wrote:

> > hda: status timeout: status=0xd0 { Busy }
> > hda: no DRQ after issuing WRITE
> > ide0: reset: success
>
> Shaun, it's not a distribution-specific set of messages. The message indicates
> that your /dev/hda returned a "busy" status when the kernel was expecting a DRQ
> response instead, just after it had issued a write. The ide0 channel then reset
> in order to clear the status on /dev/hda.
> I'd recommend you backup your data asap and investigate getting hold of a newer
> disk to use.

What James is trying to say (and is, if you know what he is on about) is
that the kernel is telling you that your hard drive is about to die.
It's not a pretty sight when you start seeing those messages. You can
get a little extra life from the drive by finding the dead sectors and
partitioning off about 5MB either side - I don't recommend it but you can
do it. Basically you need a new drive.
That does not look good, does it! I will wait a while, see what happens, but I'm thinking of buying a new hard drive.

Last edited by qanopus; 12-29-2003 at 11:25 AM.
 
Old 12-29-2003, 03:36 PM   #14
jailbait
LQ Guru
 
Registered: Feb 2003
Location: Virginia, USA
Distribution: Debian 12
Posts: 8,334

Rep: Reputation: 547
"Yeah, but the thing is, I'm still getting those errors, even though I partitioned out the bad sector.
Googling the above messages, I find that they can be caused by two things:
1) The hard drive and CD-ROM being attached to the same IDE channel, which is indeed the case with me, or
2) A dying hard drive"

You can get the error message for a wide variety of hardware errors. Those two problems are not the only problems that can cause the error message. In my case the problem was different than the above two.

"1) If the harddrive and cdrom are attached on the same ide channel, which is indeed the case with me, or"

You can attach the hard drive and CD-ROM to the same cable. What does not work is when the CD-ROM requires a 40-wire cable and the hard drive requires an 80-wire cable and you put them on the same cable. This is possibly the root cause of your disk errors.

"2) A dying harddrive"

You found a bad spot on the drive. Maybe there are other bad spots on the hard drive that you have not found yet. You can use the Seagate utilities that you downloaded to low level format your hard drive. At the end of the hard drive are some extra blocks. The Seagate utility will reassign bad blocks to the spare blocks. This will work OK as long as you do not have more bad spots than spare blocks. It also thoroughly erases your hard drive so you have to backup before you low level format and then partition, format, and restore after.

So if you have a bad drive then maybe a low level format will fix it. Or maybe the drive will continue to die even after a low level format.

Or you can say that it is cheaper to just buy a new hard drive than to spend two days working on fixing the old one. I once did a low level format that took three hours. Is the old drive still under warranty?

If I were in your situation I would become a daily backup fanatic.

 
Old 12-31-2003, 04:36 AM   #15
qanopus
Senior Member
 
Registered: Jul 2002
Location: New York
Distribution: Slackware
Posts: 1,358

Original Poster
Rep: Reputation: 45
One more post and I will be out of your hair. I figured out how I could have saved my LFS installation. This is what I should have done:

First, make an image by skipping some KB at the beginning of the partition, but skip as little as possible, so

Code:
 
dd if=/dev/hda4 of=damaged_image.img bs=1024 count=4194304 skip=4
This way you will create an image of up to 4 GB of the drive, but you'll skip the first 4 KB. Now an fsck on this image will fail, and this is why: fsck will try to find a backup superblock at a predefined position in the image, but because we have skipped 4 KB, everything is shifted and it will not find the backup superblock, so it bails out. The solution is to put 4 KB of data (whatever data: zeros, junk) in front of the image file and then fsck it. This is how you do that:

Code:
dd if=/dev/zero of=zeros bs=1024 count=4
Now we have a file which is exactly 4 KB. We prepend it to our image by doing

Code:
 cat zeros damaged_image.img > correct_image.img
And to make a valid image out of that, we fsck it.

Code:
 fsck.ext3 correct_image.img
Now "correct_image.img" should be mountable via the loopback device.
This is all too late for me now. I wish I'd had this inspiration earlier. I know this will work because I have tried it with another partition.
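
The offset bookkeeping in the steps above can be checked on a scratch file before trusting it with real data. The sketch below uses a 64 KiB file of random bytes as a stand-in for /dev/hda4 (so no real file system or fsck is involved), skips the first 4 KiB, pads the front with zeros, and confirms that everything after the first 4 KiB lands at its original offset. The image file names follow the post; the scratch directory and the other file names are assumptions.

```shell
#!/bin/sh
# Sketch: verify that "skip 4 KiB, then put 4 KiB of zeros in front"
# restores the original byte offsets. Runs entirely on scratch files.
set -e

work=$(mktemp -d)

# Stand-in for the damaged partition.
dd if=/dev/urandom of="$work/partition.bin" bs=1024 count=64 2>/dev/null

# Step 1: image everything except the first 4 KiB.
dd if="$work/partition.bin" of="$work/damaged_image.img" bs=1024 skip=4 2>/dev/null

# Step 2: 4 KiB of zeros to stand in for the unreadable blocks.
dd if=/dev/zero of="$work/zeros" bs=1024 count=4 2>/dev/null

# Step 3: zeros first, then the image, so offsets match the original.
cat "$work/zeros" "$work/damaged_image.img" > "$work/correct_image.img"

# Check: same total size, and identical content from byte 4096 onward.
orig_size=$(wc -c < "$work/partition.bin" | tr -d ' ')
new_size=$(wc -c < "$work/correct_image.img" | tr -d ' ')
tail -c +4097 "$work/partition.bin" > "$work/orig_tail"
tail -c +4097 "$work/correct_image.img" > "$work/new_tail"
cmp "$work/orig_tail" "$work/new_tail" && echo "offsets preserved ($new_size bytes)"
```

On a real partition, steps 1 to 3 can be collapsed into a single dd call by adding seek=4 next to skip=4, which leaves the first 4 KiB of the output file as zeros. Mounting the repaired image read-only through the loopback device is the safer way to copy files out of it.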

Last edited by qanopus; 12-31-2003 at 06:59 AM.
 
  

