Quote:
Originally Posted by zman2245
rylan, thanks. It is a brand new board from Intel I am using at work for development, so it definitely shouldn't have parts going bad. But of course it's possible that a DIMM is corrupted.
|
You're right of course, but maybe you just have a brand-new but bad board? Have you tried exchanging it under warranty? I've had this a few times - brand new HW that is bad right out of the box - and simply exchanged it under warranty for another part of the same model. I thought about this a bit and yeah, it could be a bad DIMM, but I think it's a bit of a long shot for a bad RAM chip to completely corrupt the file system and prevent a boot. So it might still be something else.
Quote:
I wonder, would crashing the kernel many times cause something like this? I am doing quite a bit of kernel dev...
|
Not sure... it might! Did you have a lot of kernel crashes / panics on an untweaked / virgin kernel on that system? That can also be an indicator of bad RAM or other hardware. Modern Linux kernels are extremely stable - especially a "stock" distro kernel (a kernel you compile and tweak yourself won't necessarily be so stable, of course). I've been playing with Linux on and off for about six years now, and I think so far I've seen three (count 'em - THREE) kernel panics, all down to bad hardware. But I doubt that most kinds of kernel panic can corrupt the file system. As I understand it, that's the point of a kernel panic - it halts the kernel before it starts doing arbitrary stuff like executing data (instead of code), wiping files or corrupting discs.
Quote:
Also, one other update to my original post - I was mistaken in saying the board boots fine. It actually does not boot after hitting this error. The boot gets stuck at the screen:
______________________________________________________________________
root (hd0, 0)
<other grub stuff>...
i8042.c: No controller found (not an issue from what I've read)
Thanks,
Zack
|
Does the above occur with a stock kernel, or one of the kernels you have worked on?
As far as I know RHEL is meant for really stable hosting and server work... it might be better to play with kernels on something less conservative that isn't aimed at serious production work, like Fedora (the "community" version of RHEL)?
I think at this point you can try a few more things:
1. Run memtest
Most distros' live DVDs or CDs have an option to run memtest86 directly off the disc, without booting an OS. I discovered a bad DIMM once by leaving memtest86 running all night - only then did one of the chips turn up bad. Leaving it for an hour or so during the workday did not find the problem. Maybe you can try it overnight too?
2. Try and boot off a rescue DVD / CD
Since your system won't boot off the installed disc, how about this? Especially if you've done some kernel dev (and maybe your work has tainted the kernel so badly it can't run), this will point it out. If it boots off the rescue disc, the hardware can at least run Linux, and you're the culprit!
3. Kernel parameter all_generic_ide
Long shot, but I've had this parameter make a non-booting install suddenly start working off a hard drive that was on a SATA controller (not stock IDE).
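In case it helps, here is a rough Python sketch of where that parameter goes: it is simply appended to the end of the kernel line in the GRUB entry (you can also do this by hand from the GRUB edit screen at boot). The kernel line in the usage example is made up for illustration, and the helper names are my own, so adapt them to your actual entry.

```python
def add_kernel_param(kernel_line: str, param: str) -> str:
    """Return kernel_line with param appended, unless it is already there.

    Only the name before '=' is compared, so an existing 'foo=1'
    counts as already present when asked to add 'foo'.
    """
    name = param.split("=", 1)[0]
    for token in kernel_line.split():
        if token.split("=", 1)[0] == name:
            return kernel_line
    return kernel_line.rstrip() + " " + param


def running_with_param(param: str, cmdline_path: str = "/proc/cmdline") -> bool:
    """Check whether the currently running kernel was booted with param."""
    with open(cmdline_path) as f:
        return param in f.read().split()


if __name__ == "__main__":
    # Hypothetical GRUB kernel line, for illustration only.
    line = "kernel /vmlinuz ro root=/dev/sda1"
    print(add_kernel_param(line, "all_generic_ide"))
```

Once the system is up (for example from the rescue disc), running_with_param("all_generic_ide") reads /proc/cmdline and tells you whether the parameter actually made it onto the command line.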
4. Another distro with a newer kernel?
If none of the above works, try a different distro altogether, ideally one with a newer kernel. If that boots, your hardware is most likely fine.
5. Windows
Sacrilege! But if all else fails and you can run Windows on that system, try it... if it works, it most likely means Linux is the culprit. This will also help establish whether you are experiencing hardware or software troubles.
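To make the reasoning behind the five steps explicit, here is a small Python sketch of how I would read the results. The function name, its arguments and the wording of the verdicts are all my own invention, and the conclusions are rules of thumb, not proof. Pass None for any check you have not run yet.

```python
from typing import Optional


def triage(memtest_errors: Optional[bool],
           rescue_boots: Optional[bool],
           other_distro_boots: Optional[bool],
           windows_boots: Optional[bool]) -> str:
    """Map the outcome of the checks above to the most likely suspect."""
    if memtest_errors:
        # Step 1: any memtest error points at the RAM first.
        return "RAM suspected: swap or exchange the DIMM"
    if rescue_boots:
        # Step 2: the hardware can boot Linux, so look at the install.
        return "installed system suspected: check the custom kernel and the disk contents"
    if other_distro_boots:
        # Step 4: a different kernel works on the same hardware.
        return "hardware probably fine: the original distro or kernel is suspect"
    if windows_boots:
        # Step 5: another OS works, so the problem is on the Linux side.
        return "hardware probably fine: the Linux side is suspect"
    if None in (memtest_errors, rescue_boots, other_distro_boots, windows_boots):
        return "inconclusive: run the remaining checks"
    # Every check was run, memtest found nothing, and nothing boots.
    return "hardware suspected: consider a warranty exchange"


if __name__ == "__main__":
    # Example: memtest is clean and the rescue disc boots.
    print(triage(False, True, None, None))
```

The order of the checks mirrors the list: cheapest and most specific test first, with the warranty exchange as the fallback when nothing at all will boot.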