Slackware - Installation: This forum is for the discussion of installation issues with Slackware.
12-10-2022, 09:08 AM
#1
Member
Registered: Jun 2008
Posts: 399
Recurring Lilo error 99 99 99 99 99 on software raid
This is something I've encountered again and again over the years, and I have never got to the bottom of it. When rebooting a machine, out of the blue, Lilo decides the boot sector is corrupt, outputs half a screen of 99 99 99 99 99, and refuses to boot. Booting off a USB flash disk and re-running Lilo fixes it.
1. Pretty much all machines involved are headless servers
2. Hardware is mainly desktop motherboards of various makes, models, and ages, anywhere from 12 years old to current. Some have Intel CPUs, some AMD.
3. Storage is all SATA hard disks - I don't have any servers with SSDs and Lilo.
4. It's been cropping up on pretty much all Slackware versions, from at least 13 to current.
5. The only thing in common I can think of is that they all run software RAID in mirroring mode, with the version 0.90 superblock format, so that Lilo can boot directly without an initrd.
6. Lilo is installed on the MBR / at the beginning of the disk, not at the beginning of the partition.
Otherwise everything seems absolutely random. I can have a server which has been up and running for months, with no software or hardware upgrade of any kind, and when I try to reboot it, all of a sudden it decides it has a corrupted boot sector. I then boot off a USB flash disk, re-run Lilo, and everything is fine. Maybe for a few months, or years, and then randomly it might do it again. Today I had it happen on a freshly installed machine with Slackware 15.1, on first reboot. I re-ran Lilo, and everything was fine.
The problem is that these are headless servers on remote sites - so every time I reboot one remotely, it's a lottery whether it will come back or not. I have resorted to running lilo again in the rc.local_shutdown script, so that it re-writes the boot sector immediately before shutdown - which seems to help. The issue crops up relatively rarely - maybe once every few months, for 1 machine out of 20 - but when it happens it can be a real problem if I am not physically close to the server, as there is no way to fix it without driving to the site.
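For illustration only, the shutdown workaround described above could be sketched like this. It is a rough Python sketch rather than the actual rc.local_shutdown contents (which is a shell script): the function name `rerun_bootloader` is hypothetical, and it assumes `lilo` is on the PATH and exits non-zero when it fails.

```python
import subprocess
import sys


def rerun_bootloader(command=("lilo",)):
    """Run the boot loader installer and report whether it succeeded.

    `command` defaults to plain `lilo` (assumed to be on the PATH); it is a
    parameter so that a different command can be substituted.
    """
    try:
        result = subprocess.run(list(command), capture_output=True, text=True)
    except OSError as exc:
        # The binary is missing or not executable.
        print(f"could not run {command[0]}: {exc}", file=sys.stderr)
        return False
    if result.returncode != 0:
        # Pass the installer's own complaint along so it ends up in the logs.
        print(result.stderr.strip(), file=sys.stderr)
        return False
    return True
```

Called with no arguments from a shutdown hook, this would re-run lilo and return False if the boot sector could not be rewritten, which at least leaves a trace before the machine goes down.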
Whenever the motherboard supports it I use eLilo and UEFI, and I have never had an issue there. But I still have a lot of older hardware, which I don't want to throw away as it is doing a perfectly fine job.
Can it be that the boot sectors of the hard disk literally start to demagnetise and lose the data? Maybe there is some piece of software which corrupts or erases them? But if that is the case, why so randomly and for no apparent reason? It seems strange that re-running Lilo fixes things - sometimes for a few more years - and I don't see any more issues on the same machine.
Anybody any ideas on this one please?
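One way to test the "something corrupts or erases the boot sector" theory would be to fingerprint the first sector of each RAID member right after running Lilo and again before the next reboot. The following is a minimal sketch, assuming a 512-byte first sector and read access to the raw device; the function name `mbr_fingerprint` is hypothetical and not part of any tool mentioned here.

```python
import hashlib

SECTOR_SIZE = 512  # classic MBR sector size in bytes (assumed)


def mbr_fingerprint(device_path):
    """Return (sha256 hex digest, has_boot_signature) for the first sector."""
    with open(device_path, "rb") as dev:
        sector = dev.read(SECTOR_SIZE)
    if len(sector) < SECTOR_SIZE:
        raise ValueError(f"{device_path}: short read ({len(sector)} bytes)")
    # A valid MBR ends with the two bytes 0x55 0xAA.
    has_signature = sector[510:512] == b"\x55\xaa"
    return hashlib.sha256(sector).hexdigest(), has_signature
```

Run as root against each mirror member (for example /dev/sda and /dev/sdb, which are placeholder names), a digest that changes between two runs with no Lilo run in between would point at something rewriting the sector. Note that this only covers the first sector; it says nothing about the state of Lilo's map file.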
Last edited by xj25vm; 12-10-2022 at 12:05 PM.

12-10-2022, 02:19 PM
#2
LQ Guru
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,279
Quote:
Originally Posted by xj25vm
Lilo is installed on the MBR / at the beginning of the disk, not at the beginning of the partition.
|
That was the way PCs ran in the early days. The BIOS just dumped you on sector 0, and that got run. You had many imaginative boot viruses that ran on DOS. Lilo at least asked you what to run next. To solve your issue, get rid of lilo and install grub. As a side note, take better care of your disks! A favourite failure mode back in the day was sector 0 going bad...
BIOSes have mushroomed in size, initially to provide more sophisticated boot management. I take it these servers are not new, and if they are servers I'm wondering why you're rebooting them. UEFI obviously isn't in the picture if lilo works for you. Don't use elilo, as that has a failure mode which writes '99's as well.
Last edited by business_kid; 12-10-2022 at 02:21 PM.

12-10-2022, 06:02 PM
#3
Member
Registered: Aug 2021
Location: Seattle, WA
Distribution: Slackware
Posts: 320
Quote:
Originally Posted by xj25vm
...
Anybody any ideas on this one please?
|
From the LILO error codes page:
Quote:
0x09: DMA attempt across 64k boundary
This shouldn't happen, but may indicate a disk geometry mismatch. Try omitting the COMPACT option. You may need to specify the disk geometry yourself.
|
I experienced this recently with a circa 2007 Dell machine when the motherboard battery was failing. What would happen was that, after a power-down, the BIOS setting for SATA would revert from RAID-mode back to ATA (or vice versa, I can't remember now.) Hence, a "disk geometry mis-match."
Probably not the case here, as re-running lilo probably wouldn't fix it, even temporarily, but I thought I'd mention it just in case.
EDIT:
Here's the link for the LILO error codes:
http://wiki.wlug.org.nz/LiloErrorCodes
Last edited by JayByrd; 12-10-2022 at 06:05 PM.
Reason: typo. Add link.

12-10-2022, 10:09 PM
#4
Member
Registered: Jun 2008
Posts: 399
Original Poster
Quote:
Originally Posted by JayByrd
From the LILO error codes page:
I experienced this recently with a circa 2007 Dell machine when the motherboard battery was failing. What would happen was that, after a power-down, the BIOS setting for SATA would revert from RAID-mode back to ATA (or vice versa, I can't remember now.) Hence, a "disk geometry mis-match."
Probably not the case here, as re-running lilo probably wouldn't fix it, even temporarily, but I thought I'd mention it just in case.
EDIT:
Here's the link for the LILO error codes:
http://wiki.wlug.org.nz/LiloErrorCodes
|
Thank you for the reply and the suggestions. When I get the error, the screen is about half full of:
Code:
99 99 99 99 99 99 99
I wonder whether this is actually error 0x99 rather than 0x09 - some forum posts seem to suggest that might be the case. From the Lilo link you suggested above:
Quote:
0x99: Invalid Second Stage
Mismatch between drive and BIOS geometry, or a bad map file. Some evidence that LINEAR needs to be set on the disk (see LiloNotes)
|
I am still looking into what the above really means in practical terms and what I can do about it.
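To make the 0x09 versus 0x99 distinction concrete, here is a small sketch that turns the repeated on-screen token into a code and looks it up. It assumes the screen shows one two-digit hex code repeated, and the table holds only the two descriptions quoted in this thread from the LILO error codes page; `decode_lilo_screen` is a made-up helper name.

```python
# Descriptions as quoted in this thread from the LILO error codes page;
# only the two codes mentioned here are included.
LILO_ERRORS = {
    0x09: "DMA attempt across 64k boundary",
    0x99: "Invalid Second Stage",
}


def decode_lilo_screen(text):
    """Map a run of repeated two-digit hex codes to (code, description)."""
    tokens = text.split()
    if not tokens:
        raise ValueError("no error codes found")
    # Every token is parsed as hexadecimal, so "99" becomes 0x99.
    codes = {int(tok, 16) for tok in tokens}
    if len(codes) != 1:
        raise ValueError(f"mixed codes on screen: {sorted(codes)}")
    code = codes.pop()
    return code, LILO_ERRORS.get(code, "not listed in this sketch")
```

Fed the line from the code box above, this reads the output as 0x99, "Invalid Second Stage", not as 0x09.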
Last edited by xj25vm; 12-10-2022 at 10:10 PM.

12-10-2022, 10:16 PM
#5
Member
Registered: Jun 2008
Posts: 399
Original Poster
Quote:
Originally Posted by business_kid
That was the way PCs ran in the early days. The BIOS just dumped you on sector 0, and that got run. You had many imaginative boot viruses that ran on DOS. Lilo at least asked you what to run next. To solve your issue, get rid of lilo and install grub. As a side note, take better care of your disks! A favourite failure mode back in the day was sector 0 going bad...
BIOSes have mushroomed in size, initially to provide more sophisticated boot management. I take it these servers are not new, and if they are servers I'm wondering why you're rebooting them. UEFI obviously isn't in the picture if lilo works for you. Don't use elilo, as that has a failure mode which writes '99's as well.
|
Thank you for the info and the suggestions. Is there anything I can actually do about sector 0 going bad? Can I move that data somewhere else? I'm also not sure why sector 0 would go bad - after all, I assume it only gets read once when the machine boots up, and is almost never written to again after the MBR is written. I wonder why it would go bad before other sectors which are used more often?
Regarding your second question, there are occasions when a server does need rebooting. For example, a power supply might need changing, every few years the battery in the UPS needs replacing, and sometimes a network card might need replacing or upgrading. I'm not sure how any of the above can be performed without actually shutting down the server.
EDIT: Also, on occasion there are power cuts. The UPS notifies the server, which after a period of time, if the power hasn't returned, will initiate a shutdown. Again, another reason a server might need to be shut down sometimes.
Last edited by xj25vm; 12-10-2022 at 10:26 PM.

12-11-2022, 05:11 AM
#6
LQ Guru
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,279
Back in the days when disks went 'clunk-click,' sector 0 was the first sector read and had a rougher life. Floppies also. Moving things away from sector 0, as with grub, was an improvement. But partition data also sat early on the disk, and that is referenced often. I think the fact that you posted about how to fix your MBR corruption speaks for itself. I told you, and you're disputing it.

12-11-2022, 12:00 PM
#7
Member
Registered: Jun 2008
Posts: 399
Original Poster
Quote:
Originally Posted by business_kid
Back in the days when disks went 'clunk-click,' sector 0 was the first sector read and had a rougher life. Floppies also. Moving things away from sector 0, as with grub, was an improvement. But partition data also sat early on the disk, and that is referenced often. I think the fact that you posted about how to fix your MBR corruption speaks for itself. I told you, and you're disputing it.
|
I am grateful for any info or hints. I was merely trying to understand the mechanics of it - hence the further questions. Thank you again for the suggestions. I didn't know that Grub doesn't use sector 0, while Lilo does.
All times are GMT -5.