Crash on boot; bad hard-disk?

datawebcorp · 01-25-2002, 09:01 PM

Red Hat 7.1 server. It worked great until I tried booting it this morning. These are the last four lines on the screen before the machine stops booting. Of course, like an idiot, I don't have a rescue or boot disk, but I have the original CDs.

Any help would be appreciated.

NET4: Unix domain sockets 1.0/SMP for Linux NET4.0
EXT2-fs: unable to read superblock
isofs_read_super: bread failed, dev=03:06, iso_blknum=16, block=32
Kernel panic: VFS: Unable to mount root fs on 03:06
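The "dev=03:06" in these messages is a major:minor device number. Major 3 is the first IDE channel (hda and hdb, 64 minors each), so 03:06 should be partition 6 of the primary master, /dev/hda6. Below is a small Python sketch of that mapping. It assumes the classic IDE numbering and that the kernel prints the numbers in hex (for values this small, hex and decimal agree). The function name is made up for illustration.

```python
# Map an old-style "MM:mm" device number, as printed in 2.4-era boot
# messages, to an IDE device name. Only the first two IDE majors are handled.
IDE_MAJORS = {3: ("hda", "hdb"), 22: ("hdc", "hdd")}

def ide_device_name(dev):
    major_str, minor_str = dev.split(":")
    # Assumption: the kernel prints both numbers in hex.
    major, minor = int(major_str, 16), int(minor_str, 16)
    if major not in IDE_MAJORS:
        raise ValueError(f"not an IDE major: {major}")
    # Each IDE major covers two drives with 64 minors each;
    # minor 0 of a drive is the whole disk, 1-63 are its partitions.
    disk = IDE_MAJORS[major][minor // 64]
    part = minor % 64
    return f"/dev/{disk}{part}" if part else f"/dev/{disk}"
```

For example, ide_device_name("03:06") gives "/dev/hda6", the partition the kernel failed to mount as root.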

DavidPhillips · 01-25-2002, 11:40 PM

Boot the CD and try "linux rescue" at the boot prompt.

unSpawn · 01-26-2002, 03:53 AM

...and if it's the superblock that has gone bad, check your disk with "e2fsck -b X", where X is the location of an alternative superblock. For regular 1K-blocksize file systems the block groups start at block 1 and each one is 8192 blocks long, so the first backup is at 8193. See "man e2fsck" for more.
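The backup locations depend on the block size and on whether the file system uses the sparse_super feature. With sparse_super, only group 1 and groups that are powers of 3, 5 and 7 keep a copy. The sketch below computes the locations under the assumption of the usual mke2fs default of 8 × blocksize blocks per group. The function name is hypothetical. For 1K, 2K and 4K blocks, the first backup comes out at 8193, 16384 and 32768, which are the values the e2fsck man page lists.

```python
def backup_superblocks(block_size, group_count, sparse_super=True):
    """Return block numbers of the backup superblocks of an ext2 file system.

    Assumes blocks_per_group = 8 * block_size (one bitmap block per group)
    and that the first data block is 1 for 1K blocks, 0 otherwise.
    """
    blocks_per_group = 8 * block_size
    first_data_block = 1 if block_size == 1024 else 0

    def has_backup(group):
        # Without sparse_super every group carries a copy.
        if not sparse_super or group == 1:
            return True
        # With sparse_super only powers of 3, 5 and 7 (plus group 1) do.
        for base in (3, 5, 7):
            n = base
            while n < group:
                n *= base
            if n == group:
                return True
        return False

    return [first_data_block + g * blocks_per_group
            for g in range(1, group_count) if has_backup(g)]
```

On the box above, that would mean something like "e2fsck -b 8193 /dev/hda6" run from a boot or rescue disk. Note the caution later in the thread about running fsck on hardware that may be failing.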

datawebcorp · 01-26-2002, 10:48 AM

Thanks for the suggestions.

After booting from the CD and typing "linux rescue", the system just says "Running Anaconda. Please wait..." and then doesn't do anything. I waited about 10 minutes.

Also, I am not able to run e2fsck. The system doesn't boot, I have no boot disk and the CD doesn't seem to have such a file.

Any other suggestions? And can anyone confirm, is this a HW problem?

DavidPhillips · 01-26-2002, 07:46 PM

Try this boot disk:

http://www.toms.net/rb/

kervin · 01-31-2002, 02:14 PM

If you suspect it is a hardware problem, *don't* fsck!! It will make it worse.

Try a utility that will copy your hard drive to a different one, such as dd_rescue: http://www.garloff.de/kurt/linux/ddrescue/

After you copy the drive, run fsck on the new drive.

The problem is that fsck will get confused when it encounters the bad blocks, and may do something stupid.

If it's a hardware problem, fsck won't be able to fix it and will most probably just make the situation worse.

Unfortunately, I had to learn that the hard way.
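The copy-first approach described above works by reading the failing disk once, skipping blocks that can't be read, and then running fsck only on the copy. The Python sketch below shows that idea in its simplest form. It is not dd_rescue itself: it behaves roughly like "dd conv=noerror,sync", padding unreadable blocks with zeros. The real tool is smarter about retrying around errors. The function name is hypothetical.

```python
import os

def rescue_copy(src_path, dst_path, block_size=4096):
    """Copy src to dst block by block; unreadable blocks become zeros.

    Returns the list of block indices that could not be read.
    """
    bad_blocks = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        # SEEK_END gives the size for regular files and Linux block devices.
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                length = min(block_size, size - offset)
                try:
                    data = os.pread(src, length, offset)
                except OSError:
                    # Read error: note it and keep going instead of aborting.
                    bad_blocks.append(offset // block_size)
                    data = b""
                # Pad short or failed reads so later offsets stay aligned.
                dst.write(data.ljust(length, b"\0"))
                offset += length
    finally:
        os.close(src)
    return bad_blocks
```

Copy to a file or partition on a healthy disk, then run e2fsck against that copy, never against the original.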

karmicro · 02-10-2002, 07:56 AM

Actually, this is pretty interesting to us.

We have several systems that are not booted very often, since the control systems run 24/7 indefinitely.

Since version 6.2 we have had a problem with bad superblocks several times, with Western Digital AND Seagate 40 GB drives, and with ASUS and other motherboards...

yes indeed, a little scary.

We have been able to painfully recover by mounting the primary partition, using dumpe2fs to find the backup superblock, and rebuilding the file system...

We now keep a copy of the output of fdisk and dumpe2fs so that we can rebuild the drives.
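dumpe2fs lists, per block group, lines such as "Primary superblock at 1, ..." and "Backup superblock at 8193, ...". Those are the numbers you need for "e2fsck -b". The Python sketch below pulls them out of saved dumpe2fs text and saves fdisk and dumpe2fs output to a file. It assumes that output format and needs root to run the commands. The function names and the example device names are hypothetical. If dumpe2fs can't read the primary superblock at all, the mke2fs man page notes that "mke2fs -n" with the original parameters prints the backup locations without writing anything.

```python
import re
import subprocess

# Matches lines like "  Backup superblock at 8193, Group descriptors at ..."
SB_LINE = re.compile(r"(Primary|Backup) superblock at (\d+)")

def superblock_locations(dumpe2fs_output):
    """Return (primary, [backups]) parsed from dumpe2fs text output."""
    primary, backups = None, []
    for kind, block in SB_LINE.findall(dumpe2fs_output):
        if kind == "Primary":
            primary = int(block)
        else:
            backups.append(int(block))
    return primary, backups

def save_layout(disk, partition, path):
    """Save fdisk and dumpe2fs output (e.g. disk="/dev/hda",
    partition="/dev/hda6") so the layout can be rebuilt later. Needs root."""
    with open(path, "w") as out:
        for cmd in (["fdisk", "-l", disk], ["dumpe2fs", partition]):
            result = subprocess.run(cmd, capture_output=True, text=True, check=True)
            out.write("### " + " ".join(cmd) + "\n" + result.stdout + "\n")
```

Keep the saved file on a different machine or on paper. A copy on the failing disk won't help.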

The drives check out fine, and the same drive can be reused with no problems.

We no longer think that this is a hardware issue, but we have been unable to find a good answer. Perhaps there are others with systems that have this, but since most people reboot their systems more often, everyone thinks they are having a unique or hardware issue.

We are trying to force reboots, just to see if this stops if we reboot once a week or once a month.

Also, in at least one instance, the drive had "falsely failed to read" prior to the system restart (which is the only reason it was rebooted).

It has happened too many times to be a coincidence, we think... and we are not newbies.

micro

karmicro · 02-10-2002, 07:59 AM

Ah, forgot.

We have had it happen on 6.2, 7.1, and 7.2.

I believe we also had one on a 5.2 machine and one on a 5.0, but those may truly have been bad drives, since they wouldn't come back to life.

DavidPhillips · 02-10-2002, 10:06 PM

I had one hard drive go out running RH 7.2.

The system was up but it was making noise that sounded like the heads banging on something.

I decided to reboot the system. The hard drive making noise was dead. I did not think to check it before rebooting.

It had been unattended for about a month, I guess the system was only running on the other hard drive.

I have several other systems up for months and no other types of hard drive issues have been seen.

karmicro · 02-11-2002, 01:33 AM

Thanks.

It seems like most of the failures happen after more than 4 months of uptime.

And as I said, we've now had it on WD AND Seagate.
At first we thought it was a WD thing.

But we didn't really hear any noise at the crash. It just sort of stopped reading, soundlessly.

Sounds like yours probably didn't come back to life.

Out of curiosity, was it a WD 40 GB?

theneoprotocol · 02-11-2002, 12:22 PM

Just the other day I had a HD failure. I was dual booting between Win ME and Mandrake 8.1. My hardware: WD 30 GB HD, Athlon 1.33 GHz, 256 MB DDR RAM, ASUS motherboard and an ASUS GeForce2 DDR video card. I had Mandrake running really sweet, but ME couldn't suffice, so I wanted to put Win XP on it. During the XP install I deleted the Win ME partition, and it was going fine until a quick message about the MBR (which had LILO). Then it just froze. I rebooted and the whole drive could not be accessed anymore; even booting off a floppy or CD-ROM, the drive could not be probed. Luckily I had a spare drive and got everything running again. I like to image

That's my 1 cent. No conclusion from me, but I really feel it has to do with WD.

karmicro · 02-11-2002, 01:14 PM

Of course, one would like to ask why one would load XP in the first place. Win98 is bad enough, and Me is worse (buggy buggy buggy), but signing away to let Bill Gates have access to your computer any time you are online and he feels like it does NOT give me that "my data is secure" feeling.

It is not like the operating system WORKS or anything.....

We loaded the new version and then rolled back to 98 Vers B.

theneoprotocol · 02-11-2002, 02:13 PM

XP was chosen because 2K does not like my hardware setup; too many downloads just to keep it stable. And I must admit that I still get software from various companies for Microsoft products. I test them, I give them my opinion, and run. Linux is my horse: I now allocate it more drive space and play within my domain.

I am a good honest boy now. except for that 4GB drive space I have

....

DavidPhillips · 02-11-2002, 11:34 PM

My failure was on an 8.4 GB drive after running for about 8 months. The mirror drive was identical, and it's still going.

Something came loose in the drive I guess. The last time I saw it running it was not making noise.

It's only been up about 3 months since the failure. I replaced the drive with a 10.3 GB one.

One thing about the new kernels is that they use more swap, so maybe they access the drive more than the older ones.

I always try to have three times the RAM I need. Let's say you have 128 MB of RAM: it's essentially mirrored in swap, and then another additional 128 MB is used for cache.

This is of course only if you are physically using the whole 128 MB.

512 MB seems to be a good amount to have.

karmicro · 02-12-2002, 07:31 AM

Thanks, David.

Yes, we also use larger swaps.

The thing that really concerns us is when the drive works fine after the failure. In other words, we have a failure where it cannot read the superblock at all, but once we do a recovery the drive is fine again for another long, long time...

This is distressing because there is nothing we can really point to as you could, to say "ah.. something was wrong with the drive". This would be understandable. Even so, we swapped to Seagate drives just to check the WD's, and had the exact same thing happen... so seems like not a drive related issue.

unfortunately it is so few and far between that it is really hard to diagnose or duplicate, so we mirror, backup etc..

We also recognize that most people don't have their computers running 24/7 and actually controlling processes. When there is a failure, we know it. We restart the processes daily but never really shut them down except for preventative maintenance.

Perhaps this section of the drive just never has the head pass over it, and so the magnetic polarization fades... just a thought. We plan on having the drive read the superblock daily from cron, just to see if this will flag a failure in advance.
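A daily check like that could look like the Python sketch below. It is a hypothetical script, not something tested on these systems. It reads the primary ext2 superblock, which starts 1024 bytes into the partition, and checks its magic number (0xEF53, a little-endian 16-bit field 56 bytes into the superblock). An I/O error counts as a failure. This only proves the superblock is readable and looks like ext2. It does not check the file system's consistency.

```python
import os
import struct
import sys

EXT2_SUPER_MAGIC = 0xEF53
SUPERBLOCK_OFFSET = 1024   # primary superblock starts 1024 bytes in
MAGIC_OFFSET = 56          # s_magic field within the superblock

def superblock_ok(device_path):
    """Return True if the primary superblock is readable and has ext2 magic."""
    try:
        fd = os.open(device_path, os.O_RDONLY)
        try:
            sb = os.pread(fd, 1024, SUPERBLOCK_OFFSET)
        finally:
            os.close(fd)
    except OSError:
        return False  # a read error on the superblock counts as a failure
    if len(sb) < MAGIC_OFFSET + 2:
        return False
    (magic,) = struct.unpack_from("<H", sb, MAGIC_OFFSET)
    return magic == EXT2_SUPER_MAGIC

def main(devices):
    failed = [dev for dev in devices if not superblock_ok(dev)]
    for dev in failed:
        print(f"superblock check FAILED on {dev}", file=sys.stderr)
    return 1 if failed else 0

if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

You could run it from root's crontab with an entry such as "0 4 * * * /usr/bin/python3 /root/check_superblock.py /dev/hda6". The script path and device there are placeholders. Cron normally mails any output to the owner. One caveat: the kernel may answer a repeated read from its buffer cache instead of the platter, so this does not guarantee the heads actually visit that area. Bypassing the cache (for example with O_DIRECT and suitably aligned buffers) would be needed to be sure.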

mike