Server Crashing.. Help? (Guru needed??) Part 2
I'm hoping that someone will be able to help me with this. If you *really* need the full skinny, I had another topic that was named the same thing as this one, just without the 'part 2'.
The short story is that I built a nice little Linux server using Mandrake 10 and a P3-500, and the longest it's stayed up is about 3 days (it is not dual boot). Various possibilities were mentioned about why that might be, and someone told me that the noapic option should be used. The server is still crashing, but I found the following info in /var/log/messages when I rebooted:

Feb 21 19:16:29 localhost kernel: ACPI disabled because your bios is from 1999 and too old
Feb 21 19:16:29 localhost kernel: You can enable it with acpi=force
Feb 21 19:16:29 localhost kernel: Built 1 zonelists
{{Feb 21 19:16:29 localhost kernel: Kernel command line: auto BOOT_IMAGE=linux ro root=301 devfs=mount acpi=ht resume=/dev/hda5 noapic}}
Feb 21 19:16:29 localhost kernel: Local APIC disabled by BIOS -- reenabling.
Feb 21 19:16:29 localhost kernel: Found and enabled local APIC!
Feb 21 19:16:29 localhost kernel: Initializing CPU#0

As you can see, even though the lilo.conf file tells it not to, something is re-enabling the APIC. At least, that's how I've read this log. Since getting the noapic option to work is an important troubleshooting step, I'd love to know how to keep it turned off. The line above surrounded with {{}} is proof that I used noapic in LILO and that lilo was re-run after lilo.conf was edited.
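For anyone who would rather check this with a script than by eye, here is a minimal Python sketch. It assumes the log lines look exactly like the excerpt above; the function names and the `sample` string are only for illustration and are not part of any tool on the server.

```python
import re

def cmdline_options(log_text):
    """Return the set of options on the first 'Kernel command line:' entry."""
    match = re.search(r"Kernel command line:\s*(.*)", log_text)
    if match is None:
        return set()
    # Drop the {{ }} markers that were added by hand in the post.
    line = match.group(1).replace("{{", " ").replace("}}", " ")
    return set(line.split())

def local_apic_reenabled(log_text):
    """True if the kernel reported turning the local APIC back on."""
    return "Local APIC disabled by BIOS -- reenabling" in log_text

# Sample copied from the log excerpt in this post.
sample = """\
Feb 21 19:16:29 localhost kernel: Kernel command line: auto BOOT_IMAGE=linux ro root=301 devfs=mount acpi=ht resume=/dev/hda5 noapic
Feb 21 19:16:29 localhost kernel: Local APIC disabled by BIOS -- reenabling.
Feb 21 19:16:29 localhost kernel: Found and enabled local APIC!
"""

options = cmdline_options(sample)
print("noapic passed to kernel:", "noapic" in options)
print("local APIC re-enabled:", local_apic_reenabled(sample))
```

Both checks come back true for this excerpt, which is exactly the contradiction I'm asking about.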
Below is the lilo.conf file:

boot=/dev/hda1
map=/boot/map
default="linux"
keytable=/boot/us.klt
nowarn
message=/boot/message
menu-scheme=wb:bw:wb:bw
disk=/dev/hdd
    bios=0x82
image=/boot/vmlinuz
    label="linux"
    root=/dev/hda1
    initrd=/boot/initrd.img
    append="devfs=mount acpi=ht resume=/dev/hda5 noapic"
    read-only
image=/boot/vmlinuz-2.6.3-7mdk
    label="263-7"
    root=/dev/hda1
    initrd=/boot/initrd-2.6.3-7mdk.img
    append="devfs=mount acpi=ht resume=/dev/hda5 noapic"
    read-only
image=/boot/vmlinuz
    label="failsafe"
    root=/dev/hda1
    initrd=/boot/initrd.img
    append="failsafe acpi=ht resume=/dev/hda5 devfs=nomount noapic"
    read-only

If anyone has any suggestions as to how I can make this server stop crashing, I'm all ears :) I'm not a newbie, but I'm nowhere near guru stature either. If I can't get this to work, then I'll give another distro a try in hopes that it's not my hardware that's crashing out. Thanks for listening, and I appreciate your help in advance! |
Just an update -- the computer died about ten minutes after I posted this. Of course, I wasn't on it at the time, but a script that I have running updates a file every ten minutes, so I can tell roughly when it went down. This is very frustrating.
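In case the heartbeat idea is useful to anyone else, here is a rough Python sketch of how the gaps can be picked out. This is not my actual script: the ten-minute interval comes from above, but the function name, the two-minute slack, and the sample timestamps are all made up for illustration.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, interval=timedelta(minutes=10),
              slack=timedelta(minutes=2)):
    """Return (last_seen, next_seen) pairs where a heartbeat was missed.

    timestamps must be sorted oldest first. A gap is reported when two
    consecutive heartbeats are further apart than interval + slack.
    """
    gaps = []
    for earlier, later in zip(timestamps, timestamps[1:]):
        if later - earlier > interval + slack:
            gaps.append((earlier, later))
    return gaps

# Made-up heartbeat times: the machine is silent between 12:20 and 12:50.
stamps = [
    datetime(2000, 1, 1, 12, 0),
    datetime(2000, 1, 1, 12, 10),
    datetime(2000, 1, 1, 12, 20),
    datetime(2000, 1, 1, 12, 50),
    datetime(2000, 1, 1, 13, 0),
]

gaps = find_gaps(stamps)
for last_seen, next_seen in gaps:
    print("down somewhere between", last_seen, "and", next_seen)
```

The crash time is only known to within one interval, so a shorter interval narrows it down at the cost of a bigger file.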
|
Try changing the line:

append="devfs=mount acpi=ht resume=/dev/hda5 noapic"

to

append="devfs=mount acpi=ht resume=/dev/hda5 noapic nolapic"

As you can see, I added the nolapic option. If I don't add this option on my system, I get the same symptoms you're having. |
Also try noacpi and noapm.
Set your BIOS to PNP OS = NO. Good luck. |
And perhaps replace "acpi=ht" with "acpi=off".
However, I thought the ACPI/APIC problem was mostly found on new PCs, not on an old P3. Check your memory with memtest (http://www.memtest86.com/) and check /var/log/messages for anything running just before the crash.
- Peder |
Yup, but as he posted, the kernel seems to try to enable ACPI and APIC even on older machines, which could itself cause problems.
|
Thanks for all the tips, folks! I tried the additional parameter of nolapic and that kept the machine from re-initializing the APIC. I rebooted and we're off to the races. I don't like to change too many things at once, so I'll have to wait and see if it crashes again. I'll also give the memory test a try since, while I don't *think* it's the memory, I could be completely wrong. Besides, the random crashing could easily be caused by the memory.
As far as messages and syslog go, there's nothing of use (with regard to the crashing) in those logs. I wish there were. I've kept all of them, and maybe a pattern will eventually emerge if this last change didn't fix it. Thanks again for all the suggestions. I'll be usin' 'em if the machine keeps crashing! |
As stated before, mine would crash randomly, but I was able to force it to crash or lock up by copying a large file (>75M) across my network. I was guaranteed a crash or lockup every time.
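If you want to try the same thing in a repeatable way, here is a rough Python sketch. It is only an assumption about how one might script it, not what I actually ran: the function names are made up, and the destination directory is a placeholder you would point at a network mount. The demo at the bottom copies 1 MB into a local temp directory just to show that it runs; that will not stress the network at all.

```python
import hashlib
import os
import shutil
import tempfile

def sha256_of(path, chunk_size=1024 * 1024):
    """Checksum a file in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_and_verify(dest_dir, size_mb=80):
    """Write size_mb of random data, copy it into dest_dir, compare checksums."""
    with tempfile.TemporaryDirectory() as scratch:
        source = os.path.join(scratch, "stress.bin")
        with open(source, "wb") as handle:
            for _ in range(size_mb):
                handle.write(os.urandom(1024 * 1024))
        target = os.path.join(dest_dir, "stress.bin")
        shutil.copyfile(source, target)
        matched = sha256_of(source) == sha256_of(target)
        os.remove(target)  # clean up the copy
        return matched

# Demo only: a local temp directory stands in for the network mount.
with tempfile.TemporaryDirectory() as demo_dest:
    result = copy_and_verify(demo_dest, size_mb=1)
print("copy verified:", result)
```

If the machine survives several runs at the default 80 MB against a real network share, the large-copy trigger I was seeing probably isn't your problem.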
|
Well, these crashes occur while the machine is basically sitting idle. It's currently been up over a day, but I've gone as long as 3 with no crashes. Hopefully, the last change will have fixed it.
|
Well, hopefully, this will be the last chapter in this particular story :)
The problem was definitely hardware related. I was running a memory checker that booted off a CD (DOS based, I think -- nothing to do with Linux) and the machine went down. I *think* that the cause was the old UPS I was using. Apparently it was power cycling because the batteries were too old. It actually did it while I was up there. Since I've been off that UPS (I plugged into a newer one that I have), the server's been up for almost 2 days. That's not the record, but I really hope that this was the problem. And if it turns out that it was, well... I still learned a lot due to the help that people gave me. Thanks! |