RH 7: scsi crc error on boot. need to rescue!
Hi guys... hope I can get some help with this...
I have an ancient (well, 2001) linux box here, running RH 7, that is as close to a production machine as it gets w/out actually being one (it runs a database and SAP on top of that, and reinstalling would take far too many man-days to be an option right now).
After it was once accidentally shut down while running (by using the off switch - grrr!), it now declines to reboot: I get the following SCSI CRC error messages:
Creating Root Device
mounting root filesystem
(scsi:0:0:0:0) CRC error during data-in phase
(scsi:0:0:0:0) CRC error in intermediate CRC packet
scsi: aborting command due to timeout: pid 0, scsi 0, channel 0, id 0, lun 0 read (10) ......
etc etc... different scsi errors keep scrolling for minutes (slowly), but the system doesn't seem to get past them.
BTW - the machine is running 3 20GB IBM scsi drives on an adaptec controller, and has 1gb of pc133 SDRAM.
Since I'm a linux rookie I assumed this must be a hardware problem, but memtest86 showed no problems with the memory, and neither the scsi controller's own device check, nor my device check with IBM's harddisk check program showed any problems.
I was able to boot the machine into LINUX on the same cd I did the hardware checks with (UBCD, includes INSERT toolkit), and there I could mount and access the different SCSI hd's without trouble. Alas, the INSERT toolkit is not really well equipped for recovery...
By now I guessed something very small must be wrong with the filesystem (Ext3 on the boot partitions), and maybe there exists a program that 'fixes' ext3 filesystems... or maybe it's possible to do that from GRUB or LILO (I don't really know with which bootloader the machine was installed, but it appears to be the default RH7 boot loader because it has a RH image)...
Does anyone have even the slightest idea how I can go about this, preferably _without_ reinstalling linux and trying to figure out how to get SAP to work on a newer version of windows?
EDIT: is there something like a boot option that disables the CRC checks? that would help me get into linux, where I can do some further checks... maybe do a fsck or so... Or can I do a fsck from within a live CD? Which liveCD would you recommend for further troubleshooting?
Okay, small update:
e2fsck did not show errors on any of the scsi disks.
removing the scsi cable from the scsi controller and reseating it solved the crc problem.
_but_ now linux hangs after mounting the root filesystem with the following error:
Creating root device
Mounting root filesystem
kjournald starting. commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Freeing unused kernel memory: 220k freed
So... what is INIT supposed to do after mounting the root filesystem? One thing I can think of is initialising a serial mouse (?) -> there used to be one connected to this machine, but I left it in its place when I took this machine home for troubleshooting. Could this hang be caused by the system not finding the serial mouse? Mind you, this is RedHat 7 (so a quite old version of linux)...
Thanks for any replies...
nobody has an idea what might be going on here?
Go for the simplest alternatives first. Get a can of air duster and remove all the dust. I've troubleshot Unix systems that hung and then refused to restart simply because they were clogged with dust. After more than a year of continuous use a system can collect so much dust that the circuits on the motherboard overheat. Electricity attracts dust, which then forms a layer of insulation over the circuits, causing overheating and hence errors.
I recommend booting from a recent install CD, Fedora Core 5 for example. I say recent because it has a recent 2.6 kernel and up-to-date ext3 support. Boot in rescue mode, mount the critical partitions and save them to another system in case things get any worse. You can do that with a combination of dd, gzip and ssh.
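A rough sketch of the dd, gzip and ssh combination, wrapped in two small shell functions. The device name /dev/sda1, the host name backuphost, the target paths and the function names are all placeholders made up for illustration, not values from this thread; adjust them to what the rescue CD actually shows.

```shell
#!/bin/sh
# image_to_file SRC DEST
# Read SRC (a partition such as /dev/sda1, or any file) and write a
# gzip-compressed image to DEST. conv=noerror,sync keeps dd going past
# unreadable blocks and zero-pads them (a short final block is padded too).
image_to_file() {
    dd if="$1" bs=64k conv=noerror,sync 2>/dev/null | gzip -c > "$2"
}

# image_to_host SRC USER@HOST REMOTE_PATH
# Same idea, but the compressed stream goes over ssh to another machine.
image_to_host() {
    dd if="$1" bs=64k conv=noerror,sync 2>/dev/null | gzip -c | ssh "$2" "cat > '$3'"
}

# Example calls (placeholders, run from the rescue shell):
#   image_to_file /dev/sda1 /mnt/usb/sda1.img.gz
#   image_to_host /dev/sda1 root@backuphost /backup/sda1.img.gz
```

Imaging the partitions while they are unmounted (or mounted read-only) avoids copying a filesystem that changes underneath dd. An image can be sanity-checked afterwards with `gzip -t`.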
Another thing you can do is clone the disk to another unused disk and try diagnosing with that one. Anyhow, do a good backup. This is your time to think of several alternatives.
When it hangs see if you can press Alt-F2, Alt-F3 and Alt-F4 - these terminal sessions output the logs as the kernel starts.
Try booting in single user mode; at the boot: prompt type 'linux single'. See if you can boot like this. That way you can narrow down where the real problem is.
Thanks for your input...
I checked the insides of the pc, and there's hardly any dust there (quite contrary to my own hardware, but that's another story ;) )
I tried to start single user, doesn't work, and alt-fX doesn't work either: the computer really hangs. Or not, 'cause ctrl-alt-del still restarts the pc.
Any other ideas?
Is there activity on the hard drive? Do you see the hard drive LED come on or flashing?
I'm thinking you may have a corrupt /boot filesystem or a corrupt initrd ramdisk file. Hopefully the rest may be OK. It says that it is mounting the root file system but probably it is not.
I'd suggest waiting a long time, 1-2 hours, to see if it comes up, especially if you see activity on the hard disks.
You may want to restore your /boot directory; you do have a backup, right? If not, then start it with a rescue CD, mount your critical data and back it up ASAP.
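One cheap way to test the corrupt-initrd theory from a rescue CD: on Red Hat 7.x the initrd is, as far as I know, a gzip-compressed filesystem image, so `gzip -t` should at least reveal a truncated or damaged file. The sketch below assumes the box's /boot ends up under /mnt/root/boot (a placeholder path), and the function name check_initrds is made up for illustration.

```shell
#!/bin/sh
# check_initrds DIR
# Run "gzip -t" over every initrd*.img in DIR and report the result.
# Returns 0 if all images pass, 1 if any fails, 2 if none were found.
check_initrds() {
    dir=$1
    status=0
    for img in "$dir"/initrd*.img; do
        # An unmatched glob stays literal, so this catches "no files".
        [ -e "$img" ] || { echo "no initrd images in $dir"; return 2; }
        if gzip -t "$img" 2>/dev/null; then
            echo "OK      $img"
        else
            echo "DAMAGED $img"
            status=1
        fi
    done
    return $status
}

# Example call (placeholder path):
#   check_initrds /mnt/root/boot
```

A clean result only shows that the compressed stream is intact; it doesn't prove the drivers inside match the kernel. If an image does turn out to be damaged, Red Hat's mkinitrd, run from inside the installed system, can build a new one with something along the lines of `mkinitrd /boot/initrd-<version>.img <version>`.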
Can you mount it with a rescue CD?
I do have a backup, but only for the database and the SAP folders...
I'll take a look at the activity leds and post what I find.
EDIT: yes, I can mount everything with a rescue CD (I even e2fsck'ed all partitions), and I don't see any problems when using a rescue CD... but I guess I can't just copy any rescue CD's /boot folders over to my own, right...
You'll need to mount and copy using the rescue CD. After you mount your root partition then you can make it the root partition for the rest of the session.
Get a CD that is compatible with your release, RH7.
At the boot prompt type: 'linux rescue'
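Mount the root partition, but first make the folder to mount it on:
mkdir /mnt/root
mount /dev/hda1 /mnt/root
(/dev/hda1 is just an example; with your SCSI disks it'll be something like /dev/sda1.)
Then make the hard disk's root partition the root for the session (chroot /mnt/root).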
Start your services manually.
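Put together, the rescue-CD steps might look like the sketch below. It only prints the commands unless DRY_RUN=0 is set, so the device name can be checked first. /dev/sda1 is an assumption (the first partition on the first SCSI disk; /dev/hda1 would be an IDE disk), and rescue_chroot and run are names made up for this example.

```shell
#!/bin/sh
# run CMD...
# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# rescue_chroot ROOT_DEV [MOUNT_POINT]
# Mount the installed system's root partition and chroot into it.
rescue_chroot() {
    dev=$1
    mnt=${2:-/mnt/root}
    run mkdir -p "$mnt" &&
    run mount "$dev" "$mnt" &&
    run mount -t proc proc "$mnt/proc" &&
    run chroot "$mnt" /bin/sh
}

# Example call (placeholder device):
#   rescue_chroot /dev/sda1              # dry run, prints the commands
#   DRY_RUN=0 rescue_chroot /dev/sda1    # actually does it
```

Inside the chroot the box's own binaries and config are in place, so services can be started by hand from there, for instance with the init scripts that Red Hat keeps under /etc/rc.d/init.d.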
If you need further detailed help let me know so we can talk. I'm cheap.
crc boot problem
Hi all, I have the same problem when booting Fedora 7.
If you have a dual-boot setup (Windows and Linux), first boot into Windows, then restart the system and try booting Linux. I hope it will boot without any problems then.
If you're using only Linux, you have to restart the system around 6 times or so before it boots.
Try this, it will work.
I think it's a hardware problem. At first I used an Intel 845 motherboard with a 2.4 processor, 256 MB RAM and an 80 GB hard disk, and I didn't get any problem like this (CRC error).
When I got a new one with an Intel dual-core 2.6 processor, a 945 motherboard, 1 GB RAM and an 80 GB hard disk, I started getting this problem.
I think you people are using dual-core boards too, right?
If anyone knows how to correct this problem, please reply. I hate having to restart my system that many times.