LinuxQuestions.org
Debian: This forum is for the discussion of Debian Linux.
01-14-2007, 07:21 PM   #1
TwoEven
LQ Newbie
 
Registered: Jan 2007
Distribution: Debian
Posts: 8

Rep: Reputation: 0
lilo still boots old kernel after upgrade


I have an SBC that I'm developing on. The first version of my kernel was built from a modified version of the Voyage Linux source and copied from a host machine. The machine works beautifully with the current kernel, but I cannot upgrade the kernel. The first kernel was named 2.6.15-max. /lib/modules accordingly has a folder named 2.6.15-max. My /etc/lilo.conf is:

Code:
boot = /dev/hda
install=text
map=/boot/map
vga=normal
delay=1
serial=0,115200n8
default=Linux
image=/vmlinuz
root=/dev/hda1
label=Linux
append="console=tty0 console=ttyS0,115200n8"
read-only
Anyway, on the same host machine I compiled a new kernel with the name 2.6.15-max2 and packaged it as a .deb using make-kpkg. Using dpkg, I removed the old kernel from the SBC and installed the new one, letting the installer use my old lilo.conf. The symlink /vmlinuz points to /boot/vmlinuz-2.6.15-max2, and the /lib/modules/2.6.15-max2 folder has been created. lilo runs without any warnings or errors. All copies of the old kernel's config files have been removed. /boot/map has a current timestamp.

When the machine is rebooted, dmesg and uname -r both show that 2.6.15-max (not max2!) is booting! How can this be? I know that it is not simply a naming error in the build, because I do not see the changes that the new kernel config should produce. The boot produces no errors whatsoever.
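To pin the symptom down on each box, the release that /vmlinuz resolves to can be compared with what the running kernel reports. A minimal bash sketch, assuming the Debian make-kpkg naming scheme (vmlinuz-<release>) seen in this thread; image_release and check_booted_kernel are hypothetical helper names, not part of any package:

```shell
#!/bin/bash
# Hypothetical helper: print the kernel release a vmlinuz path points at,
# assuming the final target is named vmlinuz-<release> (make-kpkg layout).
image_release() {
    local target
    target=$(readlink -f -- "$1") || return 1
    target=${target##*/}              # basename, e.g. vmlinuz-2.6.15-max2
    case $target in
        vmlinuz-*) printf '%s\n' "${target#vmlinuz-}" ;;
        *) return 1 ;;                # not named the way we assume
    esac
}

# Hypothetical helper: compare the symlink target with the running kernel.
# Prints both values; returns 0 if they match, 1 if not, 2 if unresolvable.
check_booted_kernel() {
    local link=${1:-/vmlinuz} want have
    want=$(image_release "$link") || { echo "cannot resolve $link" >&2; return 2; }
    have=$(uname -r)
    echo "symlink: $want  running: $have"
    [ "$want" = "$have" ]
}
```

On the SBC, check_booted_kernel (which defaults to /vmlinuz) would be expected to print 2.6.15-max2 for the symlink and 2.6.15-max for the running kernel, and return 1.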

Thanks,
TwoEven
 
01-14-2007, 08:10 PM   #2
FnordPerfect
Member
 
Registered: Dec 2006
Location: Germany
Distribution: Kubuntu (Feisty Fawn), Debian (SID)
Posts: 127

Rep: Reputation: 15
> lilo runs without any warnings or errors.
...
> How can this be?

good question..

So, if I understand correctly, after installing the new kernel lilo has been run successfully to update the MBR? Have you tried to re-run lilo from the shell, just to be sure..?

If yes, then I'm outa here.
 
01-14-2007, 09:28 PM   #3
stress_junkie
Senior Member
 
Registered: Dec 2005
Location: Massachusetts, USA
Distribution: Ubuntu 10.04 and CentOS 5.5
Posts: 3,873

Rep: Reputation: 331
Quote:
Originally Posted by FnordPerfect
So, if I understand correctly, after installing the new kernel lilo has been run successfully to update the MBR? Have you tried to re-run lilo from the shell, just to be sure..?

If yes, then I'm outa here.
Maybe the /etc/lilo.conf file was not updated. If that file was not changed during the upgrade, and if the kernel has a fancy name like vmlinuz-2.4.13 then lilo would still find that kernel file and load it, even though there may be a vmlinuz-2.6.19 sitting in the /boot directory.
 
01-14-2007, 09:34 PM   #4
FnordPerfect
Member
 
Registered: Dec 2006
Location: Germany
Distribution: Kubuntu (Feisty Fawn), Debian (SID)
Posts: 127

Rep: Reputation: 15
Quote:
Originally Posted by stress_junkie
Maybe the /etc/lilo.conf file was not updated. If that file was not changed during the upgrade, and if the kernel has a fancy name like vmlinuz-2.4.13 then lilo would still find that kernel file and load it, even though there may be a vmlinuz-2.6.19 sitting in the /boot directory.
Well, in his lilo.conf the image to be booted is given as /vmlinuz, which, as he said, is a symlink to the correct kernel image in /boot.
Thus, when lilo is run, it should resolve the symlink to /boot/vmlinuz-2.6.15-max2 and use that (to record the correct kernel location in the map file and update the boot sector in the MBR).

Last edited by FnordPerfect; 01-14-2007 at 09:35 PM.
 
01-15-2007, 12:35 AM   #5
TwoEven
LQ Newbie
 
Registered: Jan 2007
Distribution: Debian
Posts: 8

Original Poster
Rep: Reputation: 0
it gets weirder

Thanks for your speedy replies.

FnordPerfect: I have run lilo in so many ways that I have lost count. In addition to (and sometimes instead of) the automatic lilo run during the dpkg install, I have run it by hand: with the symlinked vmlinuz, with the full path to the new kernel image, and as lilo -C /etc/lilo.conf -v -v -v. I don't have the output at the moment, but I confirmed that it resolved the link to the correct kernel and that it was writing something to the MBR.

The disk is relatively empty (80% free). Is it possible that lilo is not updating the map correctly (even though the map has a current timestamp) and is still executing the old kernel from somewhere in lost disk space? How else can it be executing a kernel that I have deleted? How?!

Also, in case anyone suspects a name conflict as the problem: I have also tried, to no avail, upgrading to an older kernel build with a completely different appended name.

Unfortunately, I have a few of these boxes deployed and working well, several days' drive away. I need this to be remotely upgradable. A fix that requires loading the disk into another host would cost me too much in time and travel.

I've lost a lot of time googling this. Lilo has always served me well. If anything, it's always been overly powerful. At the moment, it appears to have no effect.
 
01-15-2007, 08:03 AM   #6
saikee
Senior Member
 
Registered: Sep 2005
Location: Newcastle upon Tyne UK
Distribution: Any free distro.
Posts: 3,398
Blog Entries: 1

Rep: Reputation: 112
From my knowledge of Lilo, it must be compiled (re-run) every time a change takes place in its configuration file /etc/lilo.conf.

Lilo also has a small footprint, much smaller than Grub, so it is quite conceivable that the compiled Lilo can sit entirely inside the boot sector of the partition.

Someone has pointed out to me that a complete removal of Linux can still allow Lilo to operate to boot the other alternative like Windows. This is a feat that Grub cannot achieve.

Thus I believe the original poster may not have compiled Lilo successfully after replacing the old kernel with the new one. Lilo should report an error if the new lilo.conf doesn't work, and would then continue to use the old setting, which may be the current situation.
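LILO stores the on-disk location of each kernel image in its map file when lilo is run, so editing lilo.conf or replacing the kernel file without re-running lilo leaves the old setup in force. A rough bash check, assuming the default paths from this thread (/boot/map, /etc/lilo.conf, /vmlinuz) and using the hypothetical name lilo_map_stale, flags either file being newer than the map. It compares timestamps only and cannot show which boot sector the BIOS actually loads; in this thread the map already had a current timestamp, so it would report nothing.

```shell
#!/bin/bash
# Hypothetical helper: report whether lilo probably needs to be re-run.
# Compares modification times only (the -nt test). Returns 0 if stale
# (re-run lilo), 1 if the map is at least as new as config and kernel.
lilo_map_stale() {
    local map=${1:-/boot/map} conf=${2:-/etc/lilo.conf} image=${3:-/vmlinuz}
    local f
    [ -e "$map" ] || { echo "no map file at $map"; return 0; }
    for f in "$conf" "$image"; do
        # -nt follows symlinks, so /vmlinuz is compared via its target
        if [ "$f" -nt "$map" ]; then
            echo "$f is newer than $map: re-run lilo"
            return 0
        fi
    done
    return 1
}
```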

Now let's talk about the cure.

Lilo can be replaced in the MBR with the command
Code:
lilo -b /dev/hda
which should return no fatal error message, just something like

"Linux added"
or "Windows added"

If that is reported, then Lilo has been replaced in device hda.

Another possibility is that Lilo can be inside the root partition of the Linux as well as in the MBR to carry out the booting. To restore Lilo inside a partition, say hda1, the command would be

Code:
lilo -b /dev/hda1
The difference between the two is that the former boots the Linux directly, whereas in the latter the same Linux is booted indirectly: a boot loader in the MBR runs first and then passes control to the Lilo inside hda1.

I include the above just to show that Lilo could be compiled to the wrong location without the user being aware of the consequence.

Please report where Lilo has been compiled to, and any error messages.

I would also suggest altering /etc/lilo.conf in a way that can be recognised, for example changing the statement from
Code:
label=Linux
to
Code:
label=New_Linux
to test whether the new setting has indeed been implemented.
 
01-15-2007, 11:27 AM   #7
TwoEven
LQ Newbie
 
Registered: Jan 2007
Distribution: Debian
Posts: 8

Original Poster
Rep: Reputation: 0
I'm very frustrated with this. I know it sounds like I must not be running lilo, but I assure you I am running lilo, specifying the config file on the command line.

Code:
TestRack:~# cat /etc/lilo.conf
boot = /dev/hda
install=text
map=/boot/map
vga=normal
delay=1
serial=0,115200n8
default=Linux

image=/vmlinuz
root=/dev/hda1
label=Linux
append="console=tty0 console=ttyS0,115200n8"
read-only
TestRack:~# lilo -b /dev/hda -C /etc/lilo.conf -v -v
LILO version 22.6.1, Copyright (C) 1992-1998 Werner Almesberger
Development beyond version 21 Copyright (C) 1999-2004 John Coffman
Released 17-Nov-2004, and compiled at 12:32:32 on May 25 2005
Debian GNU/Linux

Ignoring entry 'boot'
Warning: LBA32 addressing assumed
raid_setup returns offset = 00000000  ndisk = 0
 BIOS   VolumeID   Device
Reading boot sector from /dev/hda
pf_hard_disk_scan: ndevs=1
  0300  0000F0F0  /dev/hda
device codes (user assigned pf) = 0
device codes (user assigned) = 0
device codes (BIOS assigned) = 1
device codes (canonical) = 1
mode = 0x80,  columns = 15,  rows = 1,  page = 34
Using TEXT secondary loader
Calling map_insert_data
Secondary loader: 14 sectors (0x2C00 dataend).
bios_boot = 0x80  bios_map = 0x80  map==boot = 0  map S/N: 0000F0F0
Warning: no PROMPT with SERIAL; setting DELAY to 20 (2 seconds)
BIOS data check was okay on the last boot

Boot image: /vmlinuz -> boot/vmlinuz-2.6.15-max2
Setup length is 10 sectors.
Mapped 2356 sectors.
Added Linux *

 BIOS   VolumeID   Device
  80    0000F0F0    0300
Writing boot sector.
/boot/boot.0300 exists - no boot sector backup copy made.
Map file size: 23552 bytes.
RAID device mask 0x0000
TestRack:~# reboot
After reboot:
Code:
TestRack:~# uname -r
2.6.15-max
TestRack:~# dmesg|head -n 1
Linux version 2.6.15-max (Version: 0.2) (root@debonair) (gcc version 3.3.5 (Debian 1:3.3.5-13)) #1 PREEMPT Sat Jul 8 07:49:49 CDT 2006
Saikee: I could try changing the labels and watch the bootup from serial this evening when I'm in front of one. I only have ssh access at the moment. I think I tested it remotely by making the delay huge and noting the boot time difference, but that's less than a perfect test.
 
01-15-2007, 12:17 PM   #8
saikee
Senior Member
 
Registered: Sep 2005
Location: Newcastle upon Tyne UK
Distribution: Any free distro.
Posts: 3,398
Blog Entries: 1

Rep: Reputation: 112
Your Lilo has been successfully compiled, and the kernel is confirmed to be the old one.

Now the lilo.conf lines
Code:
image=/vmlinuz
root=/dev/hda1
tell me Lilo has been asked to "source" the kernel vmlinuz from the top directory of /dev/hda1, which is effectively "/".

Lilo will pick up whatever vmlinuz you have left inside / for compilation.

Can you use a different name like "kernelMax2" in the above line and copy your newer kernel to it, like this?
Code:
cp  whatever_the_new_kernel  /kernelMax2

Make sure the above two lines are altered in lilo.conf, at least temporarily, to
Code:
image=/kernelMax2
root=/dev/hda1
 
01-15-2007, 06:54 PM   #9
stress_junkie
Senior Member
 
Registered: Dec 2005
Location: Massachusetts, USA
Distribution: Ubuntu 10.04 and CentOS 5.5
Posts: 3,873

Rep: Reputation: 331
Check your motherboard CMOS settings. Make sure that the motherboard is not set to protect the MBR of the first hard disk. This is sometimes referred to as virus protection or boot sector protection. That would definitely prevent the lilo code from being replaced. Once you have disabled MBR protection on your motherboard, boot Linux and run lilo again.

Nevertheless, I think we established that your lilo.conf file tells lilo to boot /vmlinuz, which is a link to the proper kernel. That being the case, I don't think that updating the lilo software in the MBR would change the apparent fact that the new kernel is not booting even though the /vmlinuz link points to the correct kernel.

Have I got the facts correct or have I misunderstood something?

Last edited by stress_junkie; 01-15-2007 at 06:57 PM.
 
01-15-2007, 09:01 PM   #10
TwoEven
LQ Newbie
 
Registered: Jan 2007
Distribution: Debian
Posts: 8

Original Poster
Rep: Reputation: 0
that sounds possible

stress_junkie: Boot sector protection is the first idea that sounds plausible. I'll check for that tomorrow. This is a commercial SBC for dedicated appliances: it may well have boot sector protection. But wouldn't lilo throw an error? Perhaps the BIOS just drops all write requests without returning an error? Lilo definitely thinks it's writing to the MBR (confirmed by running lilo with full verbosity). Could this be tested by writing zeros to the first 512 bytes of the disk using dd and seeing if lilo still boots?

One other idea: I'm going to use dd to copy the MBR to a file, run lilo on my new kernel, and then diff a dd image of the new MBR with the archived image.
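The before/after comparison described above can be wrapped around a single command so that both dumps are taken the same way. A minimal bash sketch, assuming GNU dd and cmp; mbr_around is a hypothetical helper name. The helper itself only reads sector 0, but the command it runs (lilo, for instance) does write to the disk, so it is worth trying it on a scratch image file first.

```shell
#!/bin/bash
# Hypothetical helper: save sector 0 of a device (or image file) before and
# after running a command, then print how many bytes changed.
# Usage: mbr_around DEVICE OUTDIR COMMAND [ARGS...]
mbr_around() {
    local dev=$1 out=$2 n
    shift 2
    mkdir -p -- "$out" || return 1
    # Read exactly one 512-byte sector; nothing is written to $dev here.
    dd if="$dev" of="$out/before.bin" bs=512 count=1 2>/dev/null || return 1
    "$@" || { echo "command failed: $*" >&2; return 1; }
    sync
    dd if="$dev" of="$out/after.bin" bs=512 count=1 2>/dev/null || return 1
    # cmp -l prints one line per differing byte; count the lines.
    n=$(cmp -l -- "$out/before.bin" "$out/after.bin" | wc -l)
    echo "$(( n ))"
}
```

For example, mbr_around /dev/hda /root/mbr-check lilo -d 30 would print the number of bytes lilo changed in sector 0 as Linux sees it, leaving before.bin and after.bin in /root/mbr-check for later inspection.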

saikee: I will try putting the kernel in root, too; though I'm doubtful, as I don't see how that could explain lilo still booting a kernel that has been deleted.

Thanks so far!
 
01-15-2007, 09:31 PM   #11
TwoEven
LQ Newbie
 
Registered: Jan 2007
Distribution: Debian
Posts: 8

Original Poster
Rep: Reputation: 0
simple test

Food for thought: I just issued lilo twice with different delay values and according to this, the MBR is being updated. As a sanity check, I wrote the same block out twice at the end to make sure it did not differ.

Code:
TestRack23:/boot# lilo -d 20
Added Linux *
TestRack23:/boot# dd if=/dev/hda of=boot.test20 bs=512 count=1
1+0 records in
1+0 records out
512 bytes transferred in 0.006134 seconds (83470 bytes/sec)
TestRack23:/boot# lilo -d 30
Added Linux *
TestRack23:/boot# dd if=/dev/hda of=boot.test30 bs=512 count=1
1+0 records in
1+0 records out
512 bytes transferred in 0.002242 seconds (228371 bytes/sec)
TestRack23:/boot# diff boot.test20 boot.test30
Binary files boot.test20 and boot.test30 differ
TestRack23:/boot# dd if=/dev/hda of=boot.test30-2 bs=512 count=1
1+0 records in
1+0 records out
512 bytes transferred in 0.002315 seconds (221160 bytes/sec)
TestRack23:/boot# diff boot.test30 boot.test30-2
TestRack23:/boot#
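diff only reports that the two dumps differ. Mapping the byte offsets printed by cmp -l onto the classic MBR layout (boot code in bytes 0 to 445, four 16-byte partition entries in 446 to 509, and the 55 AA signature in 510 and 511) shows which part lilo actually rewrote. A bash sketch; mbr_regions is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: summarise which MBR regions differ between two
# 512-byte sector dumps. cmp -l prints 1-based decimal byte offsets.
mbr_regions() {
    cmp -l -- "$1" "$2" | awk '
        { off = $1 - 1 }                 # convert to a 0-based offset
        off <= 445 { code++; next }      # boot code area
        off <= 509 { part++; next }      # partition table
        off <= 511 { sig++;  next }      # 0x55AA signature
        END {
            printf "boot code: %d  partition table: %d  signature: %d\n",
                   code + 0, part + 0, sig + 0
        }'
}
```

Run as mbr_regions boot.test20 boot.test30, a change confined to the boot code would suggest lilo is rewriting its loader in the sector Linux sees as the MBR while leaving the partition table alone.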
 
01-15-2007, 10:28 PM   #12
stress_junkie
Senior Member
 
Registered: Dec 2005
Location: Massachusetts, USA
Distribution: Ubuntu 10.04 and CentOS 5.5
Posts: 3,873

Rep: Reputation: 331
You could switch to GRUB to see what happens.
 
01-16-2007, 10:00 AM   #13
xaos5
Member
 
Registered: Dec 2004
Distribution: debian and slackware
Posts: 217

Rep: Reputation: 31
Remove the symbolic link /vmlinuz and make entries in lilo's config for your kernels. There is no need to have a symbolic link in / pointing to a kernel in /boot. I'm guessing it was intended to keep lilo configuration simple, so that a package manager does not have to touch it. Correct me if any of this seems wrong. The /usr/src/linux symbolic link makes a little more sense, as some packages check this directory for kernel sources (I'm pretty sure Gentoo packages do this).
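The explicit entries suggested above could be generated from the kernels in /boot rather than typed by hand. A sketch under stated assumptions: lilo_stanzas is a hypothetical helper, the root device and append string are copied from the lilo.conf posted earlier in the thread, each label is the kernel release string (LILO labels are limited to 15 characters, so longer releases are skipped with a comment), and the output is meant to be reviewed and pasted into lilo.conf by hand.

```shell
#!/bin/bash
# Hypothetical helper: print one lilo.conf "image=" stanza per kernel found
# in a boot directory. Defaults mirror the lilo.conf posted in this thread.
lilo_stanzas() {
    local bootdir=${1:-/boot} root=${2:-/dev/hda1}
    local append=${3:-console=tty0 console=ttyS0,115200n8}
    local img rel
    for img in "$bootdir"/vmlinuz-*; do
        [ -f "$img" ] || continue        # glob matched nothing
        rel=${img##*/vmlinuz-}           # e.g. 2.6.15-max2
        if [ ${#rel} -gt 15 ]; then
            echo "# skipped $img: label '$rel' longer than 15 characters"
            continue
        fi
        printf 'image=%s\nlabel=%s\nroot=%s\nappend="%s"\nread-only\n' \
            "$img" "$rel" "$root" "$append"
    done
}
```

After merging the stanzas, lilo -t -v performs a test run that reports what would be installed without writing the boot sector or map file.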
 
01-16-2007, 04:01 PM   #14
TwoEven
LQ Newbie
 
Registered: Jan 2007
Distribution: Debian
Posts: 8

Original Poster
Rep: Reputation: 0
saikee, xaos5: I copied the kernel to /vmlinuz-2.6.15-max2, updated lilo.conf, and ran lilo -v -v -v. As usual, lilo ran happily without any warnings. And as usual, it still booted the old kernel. By the way, I have also tried other kernels, just to rule out an error in my kernel compile.

To continue my previous test: I used dd to copy the first 512 bytes of /dev/hda (the MBR) while the kernel was located in /boot. I then updated lilo.conf to point to the copy of the image in /, ran lilo, and ran dd again. The MBR differed. Of course, the bootloader still used an outdated map file, pointing to a deleted kernel. But changing just the kernel path updated the MBR. My only conclusion is that what Linux (and hence lilo and dd) sees as the start of the drive is not really the start: the "MBR" that I am writing to and reading from lies somewhere after the first 512 bytes of the drive, allowing the old MBR to continue booting. Is this even possible? Could this SBC achieve boot sector protection by artificially shifting the drive's geometry (hiding the first sector completely and presenting the true second sector as the first, etc.)?

Now an unfortunate possibility arises in my head: These drives are being copied using dd from a binary file on a host, and the very first was on a. Is it possible that a mistake was made early on that tweaked the partition table? In the frenzy of the early copying could Linux be fooled about the true start of the drive?
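One way to look for a partition table damaged during the early dd copying is to decode a saved sector 0 directly. In a standard MBR the bytes at offsets 510 and 511 are 55 AA, and each 16-byte partition entry starting at offset 446 holds a type byte at +4, the starting LBA at +8 and the sector count at +12, both stored little-endian. A read-only bash sketch; mbr_info is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical helper: decode a saved 512-byte MBR dump (read-only).
# Prints the boot signature and, for each primary partition entry,
# the type byte, the start LBA and the sector count.
mbr_info() {
    local img=$1 sig i base ptype start size
    sig=$(od -An -tx1 -j510 -N2 -- "$img" | tr -d ' \n')
    echo "signature: $sig"
    for i in 0 1 2 3; do
        base=$((446 + 16 * i))
        # Load the 16 entry bytes (unsigned decimal) into $1..$16.
        set -- $(od -An -tu1 -j"$base" -N16 -- "$img")
        [ $# -eq 16 ] || { echo "short read at offset $base" >&2; return 1; }
        ptype=$(printf '%02x' "$5")
        # Little-endian 32-bit fields: start at +8..+11, size at +12..+15.
        start=$(( ${9} + (${10} << 8) + (${11} << 16) + (${12} << 24) ))
        size=$(( ${13} + (${14} << 8) + (${15} << 16) + (${16} << 24) ))
        echo "part$((i + 1)): type=$ptype start=$start sectors=$size"
    done
}
```

Comparing mbr_info output for a dump taken on the SBC with the first 512 bytes of the host-side image the boxes were cloned from would show whether the partition table itself has drifted.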

In either case: yuck. I still need to check the BIOS for boot sector protection options. Have you ever heard of a case where the first sector was misidentified?

Thanks for helping me think through this.
 
01-17-2007, 04:11 AM   #15
IgnitusBoyone
LQ Newbie
 
Registered: Jan 2007
Location: Memphis, TN
Distribution: Gentoo 11
Posts: 13

Rep: Reputation: 0
Hard drive types and assumptions can be the issue.

TwoEven,

Does this SBC have video output? I would like to know whether the lilo splash screen reflects any updates. I notice that you did not try offering multiple kernel boot options. I figure this is a sign that you cannot hook it up to a standard screen for display, but if you can, I would set up two kernel options, the original and the new one, and just make sure that the change is reflected.

Most Lilo issues are due to mismatched hard drive geometries. I am not sure what would have caused it or how to fix it at the moment, but it's the best place to start. Given that lilo says it is writing but there are no changes when you restart, the logical conclusion is that it is not writing to the place that is getting bootstrapped.
 
  


All times are GMT -5.