LinuxQuestions.org


ckr 02-21-2005 06:50 PM

Server Crashing.. Help? (Guru needed??) Part 2
 
I'm hoping that someone will be able to help me with this. If you *really* need the full skinny, I had another topic that was named the same thing as this one, just without the 'part 2'.

The short story is that I built a nice little Linux server using Mandrake 10 on a P3-500, and the longest it's stayed up is about 3 days (it is not dual boot). Various possibilities were mentioned about why that might be, and someone told me that the noapic option should be used. The server is still crashing, but I found the following info in /var/log/messages when I rebooted.

Feb 21 19:16:29 localhost kernel: ACPI disabled because your bios is from 1999 and too old
Feb 21 19:16:29 localhost kernel: You can enable it with acpi=force
Feb 21 19:16:29 localhost kernel: Built 1 zonelists
{{Feb 21 19:16:29 localhost kernel: Kernel command line: auto BOOT_IMAGE=linux ro root=301 devfs=mount acpi=ht resume=/dev/hda5 noapic}}
Feb 21 19:16:29 localhost kernel: Local APIC disabled by BIOS -- reenabling.
Feb 21 19:16:29 localhost kernel: Found and enabled local APIC!
Feb 21 19:16:29 localhost kernel: Initializing CPU#0

As you can see, even though the lilo.conf file tells it not to, something is re-enabling the APIC. At least, that's how I've read this log. Since getting the noapic option to work is an important troubleshooting step, I'd love to know how to keep it turned off. You'll note that the line above surrounded with {{}} is proof that I used noapic in LILO and that the lilo.conf changes were applied.

Below is the lilo.conf file:
boot=/dev/hda1
map=/boot/map
default="linux"
keytable=/boot/us.klt
nowarn
message=/boot/message
menu-scheme=wb:bw:wb:bw
disk=/dev/hdd bios=0x82
image=/boot/vmlinuz
    label="linux"
    root=/dev/hda1
    initrd=/boot/initrd.img
    append="devfs=mount acpi=ht resume=/dev/hda5 noapic"
    read-only
image=/boot/vmlinuz-2.6.3-7mdk
    label="263-7"
    root=/dev/hda1
    initrd=/boot/initrd-2.6.3-7mdk.img
    append="devfs=mount acpi=ht resume=/dev/hda5 noapic"
    read-only
image=/boot/vmlinuz
    label="failsafe"
    root=/dev/hda1
    initrd=/boot/initrd.img
    append="failsafe acpi=ht resume=/dev/hda5 devfs=nomount noapic"
    read-only

If anyone has any suggestions as to how I can make this server stop crashing, I'm all ears :) I'm not a newbie, but I'm nowhere near guru stature either. If I can't get this to work, then I'll give another distro a try in hopes that it's not my hardware that's crashing out.


Thanks for listening, and I appreciate your help in advance!

ckr 02-21-2005 08:19 PM

Just an update -- the computer died about ten minutes after I posted this. Of course, I wasn't on it at the time, but a script that I have running updates a file every ten minutes.... this is very frustrating.
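A heartbeat file like the one described here can also show when the machine went down. The following is only a minimal sketch in Python, assuming the script appends one ISO timestamp per line every ten minutes; the function names, the file format, and the two-minute slack are hypothetical and not taken from the actual script.

```python
from datetime import datetime, timedelta


def append_heartbeat(path, now=None):
    """Append the current time to the heartbeat file, one ISO timestamp per line."""
    now = now or datetime.now()
    with open(path, "a") as fh:
        fh.write(now.isoformat(timespec="seconds") + "\n")


def read_heartbeats(path):
    """Read every timestamp back from the heartbeat file, skipping blank lines."""
    with open(path) as fh:
        return [datetime.fromisoformat(line.strip()) for line in fh if line.strip()]


def find_gaps(stamps, interval=timedelta(minutes=10), slack=timedelta(minutes=2)):
    """Return (last_seen, next_seen) pairs where heartbeats are further apart than expected."""
    ordered = sorted(stamps)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > interval + slack:
            gaps.append((prev, cur))
    return gaps
```

Running append_heartbeat from cron every ten minutes and calling find_gaps(read_heartbeats(path)) after a reboot would give the last time the machine was known to be alive before each outage.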

courtrrb 02-22-2005 11:23 AM

Try changing the line:
append="devfs=mount acpi=ht resume=/dev/hda5 noapic"
to
append="devfs=mount acpi=ht resume=/dev/hda5 noapic nolapic"

As you'll notice, I added the nolapic option. If I don't add this option on my system, I get the same symptoms you're having.
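To confirm which options the kernel actually booted with, the "Kernel command line:" entry in the log can be checked for both options. This is only a rough sketch in Python; the helper names are made up for illustration, and the same string can be read from /proc/cmdline on a running system.

```python
def cmdline_options(log_line):
    """Return the boot options found after 'Kernel command line:' in a syslog line."""
    marker = "Kernel command line:"
    _, found, rest = log_line.partition(marker)
    if not found:
        return []
    return rest.split()


def missing_options(log_line, wanted=("noapic", "nolapic")):
    """List the wanted options that do not appear on the kernel command line."""
    present = set(cmdline_options(log_line))
    return [opt for opt in wanted if opt not in present]


# The log line from the first post, without the {{}} markers.
line = ("Feb 21 19:16:29 localhost kernel: Kernel command line: auto "
        "BOOT_IMAGE=linux ro root=301 devfs=mount acpi=ht resume=/dev/hda5 noapic")
print(missing_options(line))  # prints ['nolapic']
```

Against the original log this reports that nolapic is missing, which matches the "Local APIC disabled by BIOS -- reenabling." message that followed.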

opjose 02-22-2005 10:25 PM

Also try noacpi and noapm as well.

Set your BIOS to PNP OS = NO

Good luck.

bunnadik 02-23-2005 02:03 AM

And perhaps replace "acpi=ht" with "acpi=off"

However, I thought the ACPI/APIC problem was mostly found in new PCs, not an old P3.
Check your memory with memtest (http://www.memtest86.com/) and check /var/log/messages for anything
running prior to the crash.

- Peder

opjose 02-23-2005 02:11 AM

Yup, but as he posted, the kernel seems to attempt to enable ACPI and APIC even on older machines, which could itself cause problems.

ckr 02-23-2005 06:26 AM

Thanks for all the tips folks! I tried the additional parameter of nolapic and that kept the machine from re-initializing the APIC. I rebooted and we're off to the races. I don't like to change too many things at once, so I'll have to wait and see if it crashes again. I'll also give the memory test a try since, while I don't *think* it's the memory, I could be completely wrong. Besides, the random crashing could easily be caused by the memory.

As far as messages and syslog go, there's nothing of use (with regard to the crashing) in those logs. I wish there were. I've kept all of them, and eventually maybe a pattern will emerge, if this last change didn't fix it.
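One way to look for a pattern across the saved logs is to pull out the last few lines written before each boot. The sketch below is Python and rests on an assumption: that every boot starts with a kernel banner line containing "kernel: Linux version". The marker is a parameter in case the logs look different, and the function name is hypothetical.

```python
def lines_before_boots(lines, context=5, marker="kernel: Linux version"):
    """For each boot marker, return the `context` log lines written just before it."""
    results = []
    for i, line in enumerate(lines):
        if marker in line:
            # Slice the lines immediately preceding this boot banner.
            results.append(lines[max(0, i - context):i])
    return results


def load_lines(path):
    """Read a saved syslog file into a list of lines without trailing newlines."""
    with open(path, errors="replace") as fh:
        return [line.rstrip("\n") for line in fh]
```

Something like lines_before_boots(load_lines("/var/log/messages")) would then show what was logged right before each restart. If those tails are always empty of errors, as they seem to be here, that at least points away from software that logs before dying.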

Thanks again for all the suggestions. I'll be usin' 'em if the machine keeps crashing!

courtrrb 02-23-2005 04:32 PM

As stated before, mine would crash randomly, but I was able to force a crash/lockup by copying a large file (>75M) across my network. I was guaranteed a crash or lockup every time.

ckr 02-23-2005 05:11 PM

Well, these crashes occur while the machine is basically sitting idle. It's currently been up over a day, but I've gone as long as 3 with no crashes. Hopefully, the last change will have fixed it.

ckr 02-27-2005 12:15 PM

Well, hopefully, this will be the last chapter in this particular story :)

The problem was definitely hardware related. I was running a memory checker that booted off a CD (DOS based, I think -- nothing to do with Linux) and the machine went down. I *think* that the cause was the old UPS I was using. Apparently it was power cycling because the batteries were too old. It actually did it while I was up there. Since I've been off that UPS (I plugged into a newer one that I have), the server's been up for almost 2 days. That's not the record, but I really hope that this was the problem.

And if it turns out that it was, well... I still learned a lot due to the help that people gave me.

Thanks!


All times are GMT -5.