
chort 10-16-2003 03:20 PM

Suddenly kernels won't finish booting
I have Mandrake 9.0. Recently my primary hard disk drive went bad (started having read and write errors), so I bought an identical hard drive and used tar to copy over all the information from every partition to my new hard drive.

I changed /etc/fstab to mount everything on the new disk. The only operation happening on the old disk was booting (still booted from the MBR on that disk). Eventually I decided just to remove the old drive entirely and boot off the new disk. I ran lilo with the proper config file and specified the new hd as the boot device, then I set the BIOS to boot off the second disk. Everything seemed to go fine.
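For anyone retracing these steps, one thing that can be cross-checked after moving a system to a new disk is whether every root= entry in lilo.conf names the same device that /etc/fstab mounts on /. What follows is only a minimal sketch of that check, not a diagnosis of this particular machine. The device names and the sample file contents are made up for illustration, and it assumes fstab uses plain device names rather than LABEL= entries.

```python
# Hypothetical sample contents; the real files would be /etc/fstab and /etc/lilo.conf.
SAMPLE_FSTAB = """
/dev/hdc1 / ext3 defaults 1 1
/dev/hdc6 /home ext3 defaults 1 2
"""

SAMPLE_LILO = """
boot=/dev/hdc
image=/boot/vmlinuz
    label=linux
    root=/dev/hda1
    append="devfs=mount"
image=/boot/vmlinuz
    label=failsafe
    root=/dev/hdc1
    append="failsafe devfs=nomount"
"""


def fstab_root_device(fstab_text):
    """Return the device that fstab mounts on /, or None if there is no such entry."""
    for line in fstab_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and fields[1] == "/":
            return fields[0]
    return None


def lilo_settings(lilo_text, key):
    """Return every value assigned to `key` (such as 'root' or 'boot') in lilo.conf text."""
    values = []
    for line in lilo_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == key:
            values.append(value.strip().strip('"'))
    return values


def root_mismatches(fstab_text, lilo_text):
    """Return the lilo root= values that differ from the fstab root device."""
    expected = fstab_root_device(fstab_text)
    return [root for root in lilo_settings(lilo_text, "root") if root != expected]


if __name__ == "__main__":
    print("fstab root device:", fstab_root_device(SAMPLE_FSTAB))
    print("lilo boot device: ", lilo_settings(SAMPLE_LILO, "boot"))
    print("mismatched root= entries:", root_mismatches(SAMPLE_FSTAB, SAMPLE_LILO))
```

With the sample text above, the "linux" entry is the one that gets flagged, because its root= still points at the old disk's device name. To run it against a real system, read the two files from the mounted partitions (for example from a Knoppix session) and pass their contents to root_mismatches.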

The problem I now have is that some of the kernels will boot all the way to "freed xxxK of memory" and some kernels will only get to "unknown bridge, assuming transparent". I thought it was a devfs issue at first, because the failsafe kernel has devfs=nomount and it was able to get nearly to the end of the boot sequence, but I tried setting devfs=nomount on the 2.4.19-16 kernel and rerunning lilo and that didn't change the behavior (it would still stop at the very beginning, after loading the kernel into memory). I did a little experimenting and noticed that the nonfb option would also get nearly to the end of the boot process before dying.

What in the world could be causing this problem? I ran fsck.ext3 -f on every partition and there weren't any problems. I can mount all the disks fine and view all the data from Knoppix, so it doesn't seem like a disk issue. I tried commenting out the initrd line and running lilo again (so it wouldn't use the ramdisk for booting) but that didn't make any difference either.

I'm so frustrated that there isn't any debugging information available. I'm about ready to just back up all my data and install FreeBSD instead, because I never had any inexplicable problems like this with BSD.

chort 10-17-2003 04:22 AM

OK, so I think I've narrowed it down a bit more. The part I think it's getting hung up on is starting init. Normally when I boot, the Init 2.xx banner is the next thing that comes up after the messages about the bridging devices... Also, init does not start when I boot with failsafe or nonfb kernels.

I tried booting my normal kernel with the options of single and debug and it does exactly what the failsafe config does.

Is my init somehow damaged? Is it possible to repair init?

Well the saga continues. I still cannot figure out what is causing this problem. The research I've done today seems to indicate that the part it's getting stuck on is mounting the root fs. I'm not sure how that can be the case, since Knoppix can mount the same partition just fine. :scratch:

I'm getting pretty desperate here. I'm thinking about reinstalling Mandrake from CDs and mounting / on a different partition, then copying my old /etc to the new root partition and trying to boot that way (using old /usr, /var, /home, etc. but with new /). Of course I'll have to copy /bin, /sbin, and anything else that may have been changed by packages... hmm, I guess /lib too. Does this seem like a reasonable solution?

chort 10-17-2003 04:36 PM

So a little bit more information. Passing init=/bin/sh on the boot prompt has no effect (still halts at the same place), so I'm assuming that means the kernel is not even getting as far as invoking init. What else could happen in between loading the kernel into memory and starting init that would cause it to halt? If I don't specify emergency/single/failsafe at the boot prompt, it halts immediately after the "loading kernel ... unknown bridging resource ... " section. If I do pass single/emergency/failsafe it will get up to the section where it mounts devfs on /dev, then frees 136K of kernel memory, then... nothing.

Bueller, Bueller... anyone?

chort 10-19-2003 04:45 AM

Well in case anyone is interested, I finally gave up, bought a third hard drive, and installed OpenBSD. OBSD can read the data off all the Linux partitions just fine, so I copied over my important data and all the necessary configuration files and just started over.

Oh well, only one Linux box left on my network now and its days are numbered. I'm having much better luck with OpenBSD and FreeBSD.

aus9 10-20-2003 11:49 AM


1) assuming you have still kept that nasty old drive, did you look at the jumpers?
2) did you try dd if=/dev/hda of=/dev/hdc? or whatever /etc/fstab thought it was?

just a thought, as disk dump should also dump the MBR. Now assuming I am right, each drive has an MBR, so the new drive will think it's now /dev/hda, so you pull out the new drive and put the second good one into the primary jumper position

chort 10-26-2003 03:55 AM

aus9, thanks for the advice. I actually had the first and second HDD both set up as primaries, but on different IDE channels. As for the MBR, that's rewritten by LILO anyway (at least, that's my understanding) and I tried both the Mandrake LILO and the Knoppix (Debian) LILO to install the boot image. Neither one had errors (well, after I removed the message, since the Debian LILO couldn't handle the big /boot/message) but they both had the same results with booting.

I do not think it's a problem with the MBR since I can get all the way through the built-in kernel modules if I boot in single user mode. It's the spot right around where the root file system gets mounted that it dies. Since both Debian and OpenBSD can mount all the file systems, I find that rather curious.

All times are GMT -5.