LinuxQuestions.org
Slackware - Installation: This forum is for the discussion of installation issues with Slackware.

12-10-2022, 09:08 AM   #1
xj25vm
Member
 
Registered: Jun 2008
Posts: 399

Rep: Reputation: 70
Recurring Lilo error 99 99 99 99 99 on software raid


This is something I've encountered again and again over the years, and never got to the bottom of. When rebooting the machine, out of the blue, Lilo decides the boot sector is corrupt, outputs half a screen of 99 99 99 99 99 and refuses to boot. Booting off a USB flash disk and re-running Lilo fixes it.

1. Pretty much all machines involved are headless servers.
2. Hardware is mainly desktop motherboards of various makes, models and ages, anywhere from 12 years old to current. Some with Intel, some with AMD CPUs.
3. Storage is all SATA hard disks - I don't have any servers with SSDs and Lilo.
4. It's been cropping up on pretty much all Slackware versions, from at least 13 up to current.
5. The only thing in common I can think of is that they all run software RAID in mirroring mode, with the version 0.90 superblock format, so that Lilo can boot directly without an initrd.
6. Lilo is installed on the MBR / at the beginning of the disk, not at the beginning of the partition.
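Item 5 works because a version 0.90 superblock is stored near the end of each member device, so each mirror half starts at sector 0 like a plain partition and LILO can read the kernel from it directly. (0.90 is also the format the kernel's in-kernel RAID autodetection understands, which is why no initrd is needed.) A minimal Python sketch of the offset rule, assuming it mirrors the MD_NEW_SIZE_SECTORS definition in the Linux kernel's md_p.h header; the function and constant names are illustrative only:

```python
# Minimal sketch of where an md RAID 0.90 superblock sits on a member device.
# Assumption: mirrors the MD_NEW_SIZE_SECTORS rule in the Linux kernel's
# md_p.h header; names here are illustrative, not an existing Python API.

SECTOR_BYTES = 512
RESERVED_BYTES = 64 * 1024                          # 64 KiB reserved at the end
RESERVED_SECTORS = RESERVED_BYTES // SECTOR_BYTES   # = 128 sectors


def sb090_offset_sectors(device_sectors: int) -> int:
    """Return the first sector of the 0.90 superblock on a member device."""
    # Round the size down to a 64 KiB boundary, then step back 64 KiB.
    return (device_sectors & ~(RESERVED_SECTORS - 1)) - RESERVED_SECTORS


if __name__ == "__main__":
    size = 2_000_100  # hypothetical partition size in 512-byte sectors
    print("filesystem data starts at sector 0")
    print("0.90 superblock starts at sector", sb090_offset_sectors(size))
```

Because the metadata lives at the tail, nothing RAID-specific sits at the start of the partition; newer superblock formats such as 1.1 and 1.2 place it at or near the start instead.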

Otherwise everything else seems absolutely random. I can have a server which has been up and running for months, with no software or hardware upgrade of any kind, and when I try to reboot it, it all of a sudden decides it has a corrupted boot sector. I then boot off a USB flash disk, re-run Lilo, and everything is fine. Maybe for a few months, or years, and then randomly it might do it again. Today I had it happen on a freshly installed machine with Slackware 15.1, on first reboot. I re-ran Lilo, and everything was fine.

The problem is that these are headless servers, on remote sites - so every time I reboot one remotely, it's a lottery if it will come back or not. I have resorted to running lilo again in the rc.local_shutdown script, so that it re-writes the boot sector immediately before shutdown - which seems to help. The issue does crop up relatively rarely - maybe once every few months, for 1 machine out of 20 - but when it happens it can be a real problem if I am not physically close to the server - as there is no way to fix it without driving to the site.

Whenever the motherboard supports it I use eLilo and UEFI - and have never had an issue there. But I still have a lot of older hardware, which I don't want to throw away as it is doing a perfectly fine job.

Can it be that the boot sectors of the hard disk literally start to demagnetise and lose the data? Maybe some piece of software corrupts or erases them? But if that is the case, why so randomly and for no reason? It seems strange that re-running Lilo fixes things - sometimes for a few more years, with no further issues on the same machine.
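One way to test the "something corrupts or erases it" theory is to fingerprint sector 0 right after each lilo run (including the one in rc.local_shutdown) and compare before the next planned reboot. A minimal Python sketch, assuming root access to the raw disk; the script name sector0check.py, the device paths and the state-file location are all hypothetical:

```python
# Minimal sketch: fingerprint sector 0 right after running lilo, then check
# it again before a remote reboot. Device and state-file paths are
# hypothetical; reading a raw disk such as /dev/sda needs root.
import hashlib
import sys
from pathlib import Path

SECTOR_SIZE = 512


def sector0_digest(device: str) -> str:
    """SHA-256 hex digest of the first 512 bytes of `device`."""
    with open(device, "rb") as f:
        data = f.read(SECTOR_SIZE)
    if len(data) != SECTOR_SIZE:
        raise ValueError(f"{device}: short read ({len(data)} bytes)")
    return hashlib.sha256(data).hexdigest()


def save(device: str, state: Path) -> None:
    """Record the current digest (run this right after lilo)."""
    state.write_text(sector0_digest(device) + "\n")


def unchanged(device: str, state: Path) -> bool:
    """True if sector 0 still matches the digest recorded by save()."""
    return state.read_text().strip() == sector0_digest(device)


# usage: python3 sector0check.py save|check DEVICE STATEFILE
if __name__ == "__main__" and len(sys.argv) == 4:
    cmd, dev, state_file = sys.argv[1], sys.argv[2], Path(sys.argv[3])
    if cmd == "save":
        save(dev, state_file)
    else:
        ok = unchanged(dev, state_file)
        print("sector 0 unchanged" if ok else "sector 0 CHANGED since last save")
        sys.exit(0 if ok else 1)
```

On a mirrored pair, run it against each member disk (for example /dev/sda and /dev/sdb), since the BIOS may boot from either. A changed digest with no lilo run in between means something else wrote to sector 0; an unchanged digest on a machine that still fails with 99s points away from sector 0 itself and towards what it references (the map file, or the geometry issues discussed below).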

Does anybody have any ideas on this one, please?

Last edited by xj25vm; 12-10-2022 at 12:05 PM.
 
12-10-2022, 02:19 PM   #2
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,279

Rep: Reputation: 2553
Quote:
Originally Posted by xj25vm
Lilo is installed on the MBR / at the beginning of the disk, not at the beginning at the partition.
That was the way PCs ran in the early days. The BIOS just dumped you on sector 0, and that got run. You had many imaginative boot viruses that ran on DOS. Lilo at least asked you what to run next. To solve your issue, get rid of lilo and install grub. As a side note, take better care of your disks! A favourite failure mode back in the day was sector 0 going bad...
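The point about sector 0 is that on a BIOS machine it is the single entry point: the BIOS loads those 512 bytes and jumps into them, and the same sector also carries the partition table. A minimal Python sketch of the classic MBR layout shows the split; it could be fed the 512 bytes read from a disk to confirm that the 0x55AA signature and the partition entries are intact. The parse_mbr name and the returned dict shape are illustrative, not an existing tool:

```python
# Minimal sketch: split a classic 512-byte MBR into its regions.
# Layout: bytes 0-445 boot code, 446-509 four 16-byte partition entries,
# 510-511 the boot signature 0x55 0xAA.
import struct

# One partition entry: status, CHS first (3 bytes), type, CHS last (3 bytes),
# starting LBA, sector count; little-endian, 16 bytes in total.
ENTRY = struct.Struct("<B3sB3sII")


def parse_mbr(sector: bytes) -> dict:
    if len(sector) != 512:
        raise ValueError("expected exactly 512 bytes")
    partitions = []
    for i in range(4):
        status, _chs_first, ptype, _chs_last, lba_start, count = ENTRY.unpack_from(
            sector, 446 + i * ENTRY.size
        )
        partitions.append(
            {"bootable": status == 0x80, "type": ptype,
             "lba_start": lba_start, "sectors": count}
        )
    return {
        "boot_code": sector[:446],  # a loader installed on the MBR lives here
        "partitions": partitions,
        "signature_ok": sector[510:512] == b"\x55\xaa",
    }
```

Loader code and partition data share one sector, so damage to sector 0 can hit either, which is the "partition data also occurred early" remark later in the thread.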

BIOSes have mushroomed in size, initially to provide more sophisticated boot management. I take it these servers are not new, and if they are servers I'm wondering why you're rebooting them. UEFI obviously isn't in the picture if lilo works for you. Don't use elilo, as that has a failure mode which writes '99's as well.

Last edited by business_kid; 12-10-2022 at 02:21 PM.
 
12-10-2022, 06:02 PM   #3
JayByrd
Member
 
Registered: Aug 2021
Location: Seattle, WA
Distribution: Slackware
Posts: 320

Rep: Reputation: 325
Quote:
Originally Posted by xj25vm
...
Anybody any ideas on this one please?
From the LILO error codes page:
Quote:
0x09: DMA attempt across 64k boundary
This shouldn't happen, but may indicate a disk geometry mis-match. Try omitting the COMPACT option. You may need to specify the disk geometry yourself.
I experienced this recently with a circa 2007 Dell machine when the motherboard battery was failing. What would happen was that, after a power-down, the BIOS setting for SATA would revert from RAID-mode back to ATA (or vice versa, I can't remember now.) Hence, a "disk geometry mis-match."

Probably not the case here, as re-running lilo probably wouldn't fix it--even temporarily, but thought I'd mention it, just in case.
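A "disk geometry mis-match" means the sector addresses LILO recorded at install time were computed with one cylinder/head/sector geometry, while the BIOS translates them with another at boot time, for instance after the SATA mode flip described above. The standard CHS-to-LBA formula makes the effect concrete. A minimal Python sketch; the two geometries (255 and 16 heads, 63 sectors per track) are illustrative values, not taken from the affected machines:

```python
# Minimal sketch: the standard CHS -> LBA conversion, used here to show that
# the same CHS triple names different sectors under different geometries.
# The two geometries below are illustrative, not taken from the thread.


def chs_to_lba(c: int, h: int, s: int, heads: int, sectors_per_track: int) -> int:
    if not 0 <= h < heads:
        raise ValueError("head out of range for this geometry")
    if not 1 <= s <= sectors_per_track:
        raise ValueError("sector numbers are 1-based and must fit the track")
    return (c * heads + h) * sectors_per_track + (s - 1)


if __name__ == "__main__":
    addr = (10, 5, 1)  # cylinder, head, sector
    for heads, spt in [(255, 63), (16, 63)]:
        print(f"{heads} heads x {spt} sectors: CHS {addr} -> LBA {chs_to_lba(*addr, heads, spt)}")
```

With 255 heads the address resolves to LBA 160965; with 16 heads it resolves to LBA 10395. A loader that stored the first interpretation but is read back under the second fetches the wrong sector, which is why the error table suggests specifying the geometry or using linear addressing.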

EDIT:
Here's the link for the LILO error codes:
http://wiki.wlug.org.nz/LiloErrorCodes

Last edited by JayByrd; 12-10-2022 at 06:05 PM. Reason: typo. Add link.
 
12-10-2022, 10:09 PM   #4
xj25vm
Member
 
Registered: Jun 2008
Posts: 399

Original Poster
Rep: Reputation: 70
Quote:
Originally Posted by JayByrd
From the LILO error codes page:

I experienced this recently with a circa 2007 Dell machine when the motherboard battery was failing. What would happen was that, after a power-down, the BIOS setting for SATA would revert from RAID-mode back to ATA (or vice versa, I can't remember now.) Hence, a "disk geometry mis-match."

Probably not the case here, as re-running lilo probably wouldn't fix it--even temporarily, but thought I'd mention it, just in case.

EDIT:
Here's the link for the LILO error codes:
http://wiki.wlug.org.nz/LiloErrorCodes
Thank you for the reply and the suggestions. When I get the error, the screen is about half full of:

Code:
99 99 99 99 99 99 99
I wonder if this is not error 0x99 - some forum posts seem to suggest that might be the case. From the Lilo link you suggested above:

Quote:
0x99: Invalid Second Stage
Mismatch between drive and BIOS geometry, or a bad map file. Some evidence that LINEAR needs to be set on the disk (see LiloNotes)
I am still looking into what the above really means in practical terms and what I can do about it.
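LILO's documentation describes a lone "L" followed by two-digit hex codes as the state where the first stage (the code in the MBR) has loaded but cannot load the second stage, with the code saying why; it associates this mainly with media failure or a geometry mismatch. That fits reading the 99s as error 0x99. A minimal Python sketch that extracts the repeated code from the screen text and looks it up; the table holds only the two entries quoted in this thread, and decode_lilo_output is a hypothetical helper, not part of LILO:

```python
# Minimal sketch: extract the repeated two-digit code LILO prints after "L"
# and look it up. The table holds only the two codes quoted in this thread
# (from http://wiki.wlug.org.nz/LiloErrorCodes); it is NOT a complete list.
import re
from typing import Optional, Tuple

LILO_CODES = {
    0x09: "DMA attempt across 64k boundary",
    0x99: "Invalid Second Stage",
}


def decode_lilo_output(text: str) -> Optional[Tuple[int, str]]:
    # Collect every standalone two-character hex token, normalised to upper case.
    codes = [c.upper() for c in re.findall(r"\b[0-9A-Fa-f]{2}\b", text)]
    if not codes:
        return None
    # LILO repeats the code; take the most frequent one (ties broken arbitrarily).
    most_common = max(set(codes), key=codes.count)
    value = int(most_common, 16)
    return value, LILO_CODES.get(value, "unknown (not in this table)")
```

For the screen in this thread, decode_lilo_output("L 99 99 99 99") gives 0x99 (153 in decimal) with the description "Invalid Second Stage". Any code outside the two quoted entries is reported as unknown rather than guessed.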

Last edited by xj25vm; 12-10-2022 at 10:10 PM.
 
12-10-2022, 10:16 PM   #5
xj25vm
Member
 
Registered: Jun 2008
Posts: 399

Original Poster
Rep: Reputation: 70
Quote:
Originally Posted by business_kid
That was the way PCs ran in the early days. The BIOS just dumped you on sector 0, and that got run. You had many imaginative boot viruses that ran on DOS. Lilo at least asked you what to run next. To solve your issue, get rid of lilo and install grub. As a side note, take better care of your disks! A favourite failure mode back in the day was sector 0 going bad...

BIOSes have mushroomed in size, initially to provide more sophisticated boot management. I take it these servers are not new, and if they are servers I'm wondering why you're rebooting them. UEFI obviously isn't in the picture if lilo works for you. Don't use elilo, as that has a failure mode which writes '99's as well.
Thank you for the info and the suggestions. Is there anything I can actually do about sector 0 going bad? Can I move that data somewhere else? I'm also not sure why sector 0 would go bad - after all, I assume it only gets read once when the disk boots up, and is almost never written to again after the MBR is written. I wonder why it would go bad before other sectors which are used more often?

Regarding your second question, there are occasions when a server does need rebooting. For example a power supply might need changing, every few years the battery in the UPS needs replacing, sometimes a network card might need replacing or upgrading. I'm not sure how any of the above can be performed without actually shutting down the server?

EDIT: Also, on occasion there are power cuts. The UPS notifies the server, which, if the power hasn't returned after a period of time, will initiate a shutdown. Again, another reason a server might need to be shut down sometimes.

Last edited by xj25vm; 12-10-2022 at 10:26 PM.
 
12-11-2022, 05:11 AM   #6
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,279

Rep: Reputation: 2553
Back in the days when disks went 'clunk-click,' sector 0 was the first sector read and got a rougher life. Floppies also. Moving things away from sector 0, as with grub, was an improvement. But partition data also occurred early, and that is referenced often. I think the fact that you posted how you work around your MBR corruption speaks for itself. I told you, and you're disputing it.
 
12-11-2022, 12:00 PM   #7
xj25vm
Member
 
Registered: Jun 2008
Posts: 399

Original Poster
Rep: Reputation: 70
Quote:
Originally Posted by business_kid
Back in the days when disks went 'clunk-click,' sector 0 was the first sector read and got a rougher life. Floppies also. Moving things away from sector 0, as with grub, was an improvement. But partition data also occurred early, and that is referenced often. I think the fact that you posted how you work around your MBR corruption speaks for itself. I told you, and you're disputing it.
I am grateful for any info or hints. I was merely trying to understand the mechanics of it - hence the further questions. Thank you again for the suggestions. I didn't know that Grub doesn't use sector 0, while Lilo does.
 
  


All times are GMT -5.
