I have an SBC that I'm developing on. The first version of my kernel was built from a modified version of the Voyage Linux source and copied from a host machine. The machine works beautifully with the current kernel, but I cannot upgrade the kernel. The first kernel was named 2.6.15-max. /lib/modules accordingly has a folder named 2.6.15-max. My /etc/lilo.conf is:
Anyway, on the same host machine I compiled a new kernel with the name 2.6.15-max2. I packaged it as a .deb using make-kpkg. Using dpkg I removed the old kernel from the SBC and installed the new one, letting the installer use my old lilo.conf. The symlink /vmlinuz points to /boot/vmlinuz-2.6.15-max2 and the /lib/modules/2.6.15-max2 folder is made. lilo runs without any warnings or errors. All copies of the old kernel configs are removed. /boot/map has a current timestamp.
When the machine is rebooted, dmesg and uname -r both show that 2.6.15-max (not max2!) is booting! How can this be? I know that it is not simply a naming error in the build, because I do not see the changes that should appear in the new kernel config. The bootup gives no errors, whatsoever.
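For what it's worth, this kind of mismatch can be caught from a script after each remote upgrade. A minimal sketch, assuming the /boot/vmlinuz-<release> naming seen above; expected_release and booted_expected are hypothetical helper names, not part of any package:

```shell
# Sketch only: compare the release named by the kernel symlink with the
# release that is actually running. Helper names are hypothetical.

# expected_release SYMLINK
# Prints the release encoded in the file name the symlink resolves to,
# e.g. /vmlinuz -> /boot/vmlinuz-2.6.15-max2 prints 2.6.15-max2.
expected_release() {
    target=$(readlink -f "$1") || return 1
    basename "$target" | sed 's/^vmlinuz-//'
}

# booted_expected SYMLINK [RUNNING_RELEASE]
# Exit status 0 if the running release (default: uname -r) matches.
booted_expected() {
    [ "$(expected_release "$1")" = "${2:-$(uname -r)}" ]
}
```

On the box described here, `booted_expected /vmlinuz` would return non-zero, since the link names 2.6.15-max2 while uname -r reports 2.6.15-max.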
> lilo runs without any warnings or errors.
...
> How can this be?
good question..
So, if I understand correctly, after installing the new kernel lilo has been run successfully to update the MBR? Have you tried to re-run lilo from the shell, just to be sure..?
If yes, then I'm outa here.
Maybe the /etc/lilo.conf file was not updated. If that file was not changed during the upgrade, and if the kernel has a fancy name like vmlinuz-2.4.13 then lilo would still find that kernel file and load it, even though there may be a vmlinuz-2.6.19 sitting in the /boot directory.
Well, in his lilo.conf the image to be booted is given as /vmlinuz, which is as he said a symlink to the correct kernel image in /boot
Thus, when lilo is run, it should resolve the symlink to /boot/vmlinuz-2.6.15-max2 and use that (to alter the MBR to include the correct kernel location etc.)
Last edited by FnordPerfect; 01-14-2007 at 08:35 PM.
FnordPerfect: I have run lilo in so many ways that I have lost count. In addition to (and sometimes instead of) the automatic lilo in the dpkg install, I have run lilo multiple ways. I have run it with the symlinked vmlinuz, and with the full path to the new kernel image. I have run it with lilo -C /etc/lilo.conf -v -v -v. I don't have the output at the moment, but I confirmed that it resolved the link to the correct kernel and that it was writing something to the MBR.
The disk is relatively empty (80% free). Is it possible that lilo is not updating the map correctly (even though the map has a current timestamp) and is still executing the old kernel from somewhere in lost diskspace? How else can it be executing a kernel that I have deleted? How?!
Also, in case anyone suspects the name conflicts as the problem, I have tried upgrading it to an older kernel build with a completely different appended name to no avail as well.
Unfortunately, I have a few of these boxes working well, but they are multiple days' drive away. I need this to be remotely upgradable. A fix that requires loading this in another host would cost me too much in time and travel.
I've lost a lot of time googling this. Lilo has always served me well. If anything, it's always been overly powerful. At the moment, it appears to have no effect.
From my knowledge of Lilo, it must be re-run ("compiled") every time a change is made to its configuration in /etc/lilo.conf.
Lilo also has a small footprint, much smaller than Grub, and so it is quite conceivable that the compiled Lilo can sit entirely inside the boot sector of the partition.
Someone has pointed out to me that Lilo can still operate after a complete removal of Linux, to boot an alternative like Windows. This is a feat that Grub cannot achieve.
Thus I believe the original poster may not have compiled Lilo successfully after replacing the kernel. Lilo should report an error if the new lilo.conf doesn't work, and would continue to use the old setting, which may be the current situation.
Now let's talk about the cure.
Lilo can be replaced in the MBR with the command
Code:
lilo -b /dev/hda
which should return no fatal error message, only something like
"Added Linux"
or "Added Windows"
If that is reported, then Lilo has been replaced on device hda.
Another possibility is that Lilo is inside the root partition of the Linux as well as in the MBR to carry out the booting. To restore Lilo inside a partition, say hda1, the command would be
Code:
lilo -b /dev/hda1
The difference between the two is that the former boots the Linux directly, whereas in the latter the same Linux is booted indirectly: a boot loader in the MBR runs first and then passes control to the Lilo inside hda1.
I include the above just to show that Lilo could be compiled to the wrong location without the user being aware of the consequence.
Please report where Lilo has been compiled to, and the error message if any.
I would also suggest altering /etc/lilo.conf in a way that can be recognised, such as changing the statement from
Code:
label=Linux
to
Code:
label=New_Linux
to test that the new setting has indeed been implemented.
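With only remote access, one way to see which label was booted is the kernel command line. A minimal sketch, assuming LILO adds BOOT_IMAGE=<label> to the command line (it commonly does, but check /proc/cmdline on your own box first); boot_label is a hypothetical helper name:

```shell
# Sketch only: print the boot label recorded on the kernel command line.
# Assumes the boot loader passed a BOOT_IMAGE=<label> argument.

# boot_label [CMDLINE_FILE]
# Reads /proc/cmdline by default; prints nothing if no BOOT_IMAGE is found.
boot_label() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^BOOT_IMAGE=//p'
}
```

After changing the label to New_Linux, running lilo and rebooting, `boot_label` printing the old label would suggest that the boot-time loader is still reading old data.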
I'm very frustrated with this. I know it sounds like I must not be running lilo. I assure you, I am running lilo, specifying the config file on the command line.
Code:
TestRack:~# cat /etc/lilo.conf
boot = /dev/hda
install=text
map=/boot/map
vga=normal
delay=1
serial=0,115200n8
default=Linux
image=/vmlinuz
    root=/dev/hda1
    label=Linux
    append="console=tty0 console=ttyS0,115200n8"
    read-only
TestRack:~# lilo -b /dev/hda -C /etc/lilo.conf -v -v
LILO version 22.6.1, Copyright (C) 1992-1998 Werner Almesberger
Development beyond version 21 Copyright (C) 1999-2004 John Coffman
Released 17-Nov-2004, and compiled at 12:32:32 on May 25 2005
Debian GNU/Linux
Ignoring entry 'boot'
Warning: LBA32 addressing assumed
raid_setup returns offset = 00000000 ndisk = 0
BIOS VolumeID Device
Reading boot sector from /dev/hda
pf_hard_disk_scan: ndevs=1
0300 0000F0F0 /dev/hda
device codes (user assigned pf) = 0
device codes (user assigned) = 0
device codes (BIOS assigned) = 1
device codes (canonical) = 1
mode = 0x80, columns = 15, rows = 1, page = 34
Using TEXT secondary loader
Calling map_insert_data
Secondary loader: 14 sectors (0x2C00 dataend).
bios_boot = 0x80 bios_map = 0x80 map==boot = 0 map S/N: 0000F0F0
Warning: no PROMPT with SERIAL; setting DELAY to 20 (2 seconds)
BIOS data check was okay on the last boot
Boot image: /vmlinuz -> boot/vmlinuz-2.6.15-max2
Setup length is 10 sectors.
Mapped 2356 sectors.
Added Linux *
BIOS VolumeID Device
80 0000F0F0 0300
Writing boot sector.
/boot/boot.0300 exists - no boot sector backup copy made.
Map file size: 23552 bytes.
RAID device mask 0x0000
TestRack:~# reboot
After reboot:
Code:
TestRack:~# uname -r
2.6.15-max
TestRack:~# dmesg|head -n 1
Linux version 2.6.15-max (Version: 0.2) (root@debonair) (gcc version 3.3.5 (Debian 1:3.3.5-13)) #1 PREEMPT Sat Jul 8 07:49:49 CDT 2006
Saikee: I could try changing the labels and watch the bootup from serial this evening when I'm in front of one. I only have ssh access at the moment. I think I tested it remotely by making the delay huge and noting the boot time difference, but that's less than a perfect test.
Check your motherboard CMOS settings. Make sure that the motherboard is not set to protect the MBR of the first hard disk. This is sometimes referred to as virus protection or boot sector protection. That would definitely prevent the lilo code from being replaced. Once you disable MBR protection on your motherboard then boot Linux and run lilo.
Nevertheless, I think we established that your lilo.conf file tells lilo to boot /vmlinuz, which is a link to the proper kernel in /boot. That being the case, I don't think that updating the lilo software in the MBR would change the apparent fact that the new kernel is not booting even though the /vmlinuz link points to the correct kernel.
Have I got the facts correct or have I misunderstood something?
Last edited by stress_junkie; 01-15-2007 at 05:57 PM.
stress_junkie: Boot sector protection is the first idea that sounds plausible. I'll check for that tomorrow. This is a commercial SBC for dedicated appliances: it may well have boot sector protection. But wouldn't lilo throw an error? Perhaps the BIOS just drops all write requests without returning an error? Lilo definitely thinks it's writing to the MBR (confirmed by running lilo with full verbosity). Could this be tested by writing zeros to the first 512 bytes of the disk using dd and seeing if lilo still boots?
One other idea. I'm going to use dd to copy the MBR to a file, run lilo on my new kernel, and then diff a dd image of the new MBR with the archived image.
saikee: I will try putting the kernel in root, too; though I'm doubtful, as I don't see how that could explain lilo still booting a kernel that has been deleted.
Food for thought: I just issued lilo twice with different delay values and according to this, the MBR is being updated. As a sanity check, I wrote the same block out twice at the end to make sure it did not differ.
Code:
TestRack23:/boot# lilo -d 20
Added Linux *
TestRack23:/boot# dd if=/dev/hda of=boot.test20 bs=512 count=1
1+0 records in
1+0 records out
512 bytes transferred in 0.006134 seconds (83470 bytes/sec)
TestRack23:/boot# lilo -d 30
Added Linux *
TestRack23:/boot# dd if=/dev/hda of=boot.test30 bs=512 count=1
1+0 records in
1+0 records out
512 bytes transferred in 0.002242 seconds (228371 bytes/sec)
TestRack23:/boot# diff boot.test20 boot.test30
Binary files boot.test20 and boot.test30 differ
TestRack23:/boot# dd if=/dev/hda of=boot.test30-2 bs=512 count=1
1+0 records in
1+0 records out
512 bytes transferred in 0.002315 seconds (221160 bytes/sec)
TestRack23:/boot# diff boot.test30 boot.test30-2
TestRack23:/boot#
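One caveat on the comparison above: dd is reading back through the same Linux block layer that lilo just wrote through, so a difference here may only show that Linux accepted the write, not that it reached the sector the BIOS boots from. A minimal sketch of a check that spans a reboot; mbr_save and mbr_same are hypothetical helper names, and the device and file paths in the usage note are assumptions:

```shell
# Sketch only: snapshot the first sector of a device, then compare it
# with the device again later (for example after a reboot).

# mbr_save DEVICE FILE
# Flushes pending writes, then copies the first 512 bytes to FILE.
mbr_save() {
    sync
    dd if="$1" of="$2" bs=512 count=1 2>/dev/null
}

# mbr_same DEVICE FILE
# Exit status 0 if the first 512 bytes of DEVICE match FILE.
mbr_same() {
    tmp=$(mktemp) || return 2
    dd if="$1" of="$tmp" bs=512 count=1 2>/dev/null
    cmp -s "$tmp" "$2"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

Usage would be `mbr_save /dev/hda /boot/mbr.after-lilo` right after running lilo, then `mbr_same /dev/hda /boot/mbr.after-lilo` once the box is back up. A mismatch after reboot would point towards writes being dropped; a match would not rule out the BIOS booting from a sector that Linux cannot see.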
Remove the symbolic link /vmlinuz and make entries in lilo's config that point directly to your kernels. There is no need to have a symbolic link in / pointing to a kernel in /boot. I'm guessing it was intended to keep lilo configuration simple, so that it does not have to be touched from a package manager's perspective. Correct me if any of this seems wrong. The /usr/src/linux symbolic link makes a little more sense, as some packages check this directory for kernel sources (I'm pretty sure Gentoo packages do this).
saikee, xaos5: I copied the kernel to /vmlinuz-2.6.15-max2 and updated lilo.conf, ran lilo -v -v -v. As usual, lilo ran happily without any warnings. And as usual, it still booted the old kernel. By the way, I have tried other kernels just to see if it was an error in my kernel compile.
To continue my previous test: I used dd to copy the first 512 bytes of /dev/hda (the MBR) when the kernel was located in /boot. I then updated lilo.conf to point to the copy of the image in /, ran lilo, and ran dd again. The MBR differed. Of course, the bootloader still used an outdated map file, pointing to a deleted kernel. But changing just the kernel path updated the MBR. My only conclusion is that what Linux (and hence lilo and dd) sees as the start of the drive is not really the start. The "MBR" that I am writing to and reading from lies somewhere after the first 512 bytes of the drive, allowing the old MBR to continue booting. Is this even possible? Could this SBC achieve boot sector protection by artificially shifting the drive's geometry? (By hiding the first sector completely and labeling the true second as the first, etc.?)
Now an unfortunate possibility arises in my head: These drives are being copied using dd from a binary file on a host, and the very first was on a. Is it possible that a mistake was made early on that tweaked the partition table? In the frenzy of the early copying could Linux be fooled about the true start of the drive?
In either case: yuck. I still need to check on options in the BIOS for boot sector protection. Have you ever heard of a mistaken case of the first sector?
Hard drive types and assumptions can be the issue.
TwoEven,
Does this SBC have the ability for video output? I would like to know if the lilo splash screen is reflecting any updates. I notice that you did not try to have kernel boot options. I figure this is a sign that you cannot hook it up to a standard screen for display, but if you can, I would set up two kernel options, the original and the second, and just make sure that the change is reflected.
Most Lilo issues are due to mismatched hard drive geometries. I am not sure what would have caused it or how to fix it at the moment, but it's the best place to start. Given that the issue seems to be that lilo says it's writing but when you restart there have been no changes, the logical conclusion is that it is not writing to the place that is getting bootstrapped.