System freeze every few minutes
For the last week my system has been locking up constantly, and more and more frequently. I'm at my wit's end as to what to do about it. If anyone can shed any light on the problem...
I've been testing this evening (5 crashes in an hour) and the crashes happen like so: the cursor stops blinking (or images stop moving if I'm watching a video), everything stops, I can move the mouse once - it jumps a few cm - and then everything freezes up. On one occasion it came back to life after five minutes, but other times I can leave it for over an hour and nothing moves. I don't think it's a heating problem: nothing in the system is particularly hot, and I left the PC running all day with a LiveCD and there was no crash (I'm currently logged in from a LiveCD after 6 crashes in a row, all OK so far, touch wood). I've also been running top constantly, and no process has gone over 60% when the crashes occur. |
Just a hunch. Try booting your current kernel with "noapic nolapic noacpi" boot args.
|
Quote:
Ms-BIOS bug 824 timer not connected to blahblahblah
I tried adding noapic to menu.lst on the line with the kernel name, but the OS would not boot with this parameter. I think I might have put it in the wrong place. Where exactly should the three options above go in this file?
Code:
# menu.lst - See: grub(8), info grub, update-grub(8) |
Code:
kernel /vmlinuz-2.6.20-16-generic root=UUID=b9348ed0-9064-4863-8f17-b4a7da2ff060 ro quiet splash noapic nolapic acpi=off
Maybe also check your memory - run memtest86 at your GRUB kernel selection screen. Be warned: it takes ages! |
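To make the placement explicit: the options go at the end of the kernel line, separated by spaces. Below is a rough Python sketch of that edit. The helper name add_boot_params is made up for illustration, it only handles a single line of text, and it does not touch menu.lst itself.

```python
def add_boot_params(kernel_line, params):
    """Append boot parameters to a GRUB legacy 'kernel' line, skipping duplicates.

    Hypothetical helper for illustration: it works on one line of text
    and does not read or write menu.lst.
    """
    tokens = kernel_line.split()
    if not tokens or tokens[0] != "kernel":
        raise ValueError("not a GRUB 'kernel' line")
    for param in params:
        if param not in tokens:
            tokens.append(param)
    # Note: joining with single spaces drops any original indentation or tabs.
    return " ".join(tokens)


line = ("kernel /vmlinuz-2.6.20-16-generic "
        "root=UUID=b9348ed0-9064-4863-8f17-b4a7da2ff060 ro quiet splash")
print(add_boot_params(line, ["noapic", "nolapic", "acpi=off"]))
```

Back up menu.lst before changing it by hand, so a bad edit can be undone from a LiveCD.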
Try to ssh into the locked-up Linux box from another computer (maybe your laptop...)
You will find that in most circumstances the condition you describe isn't a crash; rather, something has deadlocked or is stuck in an infinite loop. You would encounter this problem in the event of a substantial misconfiguration of something, or if you have a corrupted filesystem, or if you have mismatched libraries. Or, sometimes, if you are just running a really badly behaved application. So the machine might not be crashed, and if you manage to ssh in from a remote location you can both find out where the problem is AND fix it. If there is a busy-wait (infinite loop) sucking up your processor, then you may have to wait for a while for the ssh login to be processed, and responsiveness to your remote shell may suck. But if the system isn't genuinely locked up, this is the best way to go to sort out the problem. |
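If you do get a shell on the box during a freeze, one thing worth looking at is which processes are stuck in uninterruptible sleep (state D in /proc/<pid>/stat), since that usually means they are waiting on disk or other I/O. Here is a minimal Python sketch, assuming a Linux /proc filesystem; the function names are my own and this is only one way to do it (ps or top will show the same state column).

```python
import os


def parse_stat(stat_text):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so we split on the first '(' and the last ')'.
    """
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    command = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, command, state


def blocked_processes(proc_root="/proc"):
    """List (pid, command) for processes currently in state 'D'."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux system, nothing to inspect
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as stat_file:
                pid, command, state = parse_stat(stat_file.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or unexpected format
        if state == "D":
            blocked.append((pid, command))
    return blocked


if __name__ == "__main__":
    for pid, command in blocked_processes():
        print(pid, command)
```

A process that stays in state D across several runs is a hint that the hang is I/O related rather than a runaway loop.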
If this has been happening gradually and memtest passes, take the side panel off and use compressed air to blow out the power supply and the CPU cooling fins. In our part of the country you need to do it every 6 to 9 months.
|
OK, I tried editing the line to
Code:
kernel /vmlinuz-2.6.20-16-generic root=UUID=b9348ed0-9064-4863-8f17-b4a7da2ff060 ro quiet splash noapic nolapic noacpi
When I did manage to restart, I got stuck on the splash-screen again, hit Ctrl-Alt-F1 and found the boot screen hanging on a message telling me an image could not be found. I hit Ctrl-Alt-Del and the screen flickered on and off and came back to the boot screen with thousands of messages:
hdb: drive not ready for command
I powered down and rebooted to a LiveCD.
Last time I got a lockup I let it sit for a while and was able to move the mouse about once every five minutes by a couple of pixels, so yes, I think it is some kind of deadlock. I have no other PCs nor access to any means of connecting remotely to this one, unfortunately.
I think I'm going to have to just guess at what's most likely to be the problem and
1 - run memtest all night, and if that gives no clues
2 - reformat the hard drive, and if that doesn't work
3 - buy a new HD, and if that doesn't work
4 - buy a new motherboard, and if that doesn't work
5 - ??? |
Try with acpi=off noapic nolapic
|
OK, memtest ran for 10 hours with no errors.
Rebooted and got stuck on the first splashscreen (the PC manufacturer's name with options to edit BIOS settings) for about a minute, then on the Kubuntu splashscreen for another two minutes. Hit Ctrl-Alt-F1:
No resume image: doing normal boot.
It hung here until I hit Ctrl-Alt-Del. Next I get:
fdisk died with exit status 8
and land on the root@machinename prompt with no mounted drives. Ctrl-D and the OS finally boots. I tried re-editing menu.lst with acpi=off noapic nolapic and rebooted. Exactly the same procedure again (at least it booted after I forced it). This kind of points to a hard drive problem, but would that explain the lengthy hang on the initial splashscreen? |
The plot thickens...
I shut down, 'flushed' the CMOS battery and restarted. Manufacturer's splashscreen came and went as normal, GRUB went OK, Kubuntu splashscreen appeared and loaded to half-way, then bombed out to a black screen. New message:
hdb1 contains a file system with errors, check forced.
Nothing further happens until I hit Ctrl-Alt-Del and booting resumes, but it fails on login with a small window marked:
Could not start kstartupconfig. Check your installation.
I try a few times to log in, but nothing doing. Back online via LiveCD...
/dev/hdb1 is a HD with only data files on it - images, music etc. No reason why a corruption on it should stop the OS from loading, right? This new error about kstartupconfig - where's it coming from and what's my best next step? I guess I need to see about fixing the file system on hdb1, but I also need to resolve the impossibility of logging in...
Also, I don't have an install CD of the latest version of Kubuntu - this was installed via update manager. The ISO of the latest CD is on the hard drive with the OS. If I could get in, I could burn it...
Am I going round in circles or just circling the drain at this stage? |
If you can boot from a live CD, you should be able to run fsck on /dev/hdb1 (which needs to be unmounted while fsck checks it).
See man fsck for all the details, options, and warnings. If /dev/hdb is not necessary for you to boot your system, does it boot any better if you physically remove it? |
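Since fsck must not be run on a mounted partition, it is worth checking /proc/mounts first. The following is a small Python sketch of that check; the function name and the sample mount table are invented for illustration. It only matches the device name exactly as listed, so a partition mounted under another name (for example via a /dev/disk/by-uuid path) would be missed.

```python
def mount_points(device, mounts_text):
    """Return the mount points listed for `device` in /proc/mounts-style text.

    Each line of /proc/mounts starts with: device, mount point, filesystem type.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points


# Made-up sample; on a live system read the text from /proc/mounts instead.
sample = (
    "/dev/sda1 / ext3 rw 0 0\n"
    "/dev/hdb1 /media/disk ext3 rw 0 0\n"
)

mounted_at = mount_points("/dev/hdb1", sample)
if mounted_at:
    print("unmount first:", ", ".join(mounted_at))
else:
    print("not mounted, OK to run fsck")
```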
I'll try removing it next time I reboot (although I'll still have that strange kstartupconfig problem - some error in my user config, I think). Meanwhile, fsck gave me the following result. Not very encouraging :(
Code:
sudo fsck /dev/hdb1
Code:
sudo mount -t ext3 /dev/hdb1 /media/disk/
Code:
mount: wrong fs type, bad option, bad superblock on /dev/hdb1,
dmesg | tail -20 gives me
Code:
[4300998.123000] end_request: I/O error, dev hdb, sector 248
edit: I get exactly the same problem trying to fsck the other IDE HD (/dev/hda*; my main HD is /dev/sda1). Nobody could be unlucky enough to lose two hard drives, surely? |
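When dmesg is full of lines like the one above, counting the I/O errors per device shows quickly whether one drive or several are affected. Here is a rough Python sketch, assuming the errors all follow the "end_request: I/O error, dev X, sector N" pattern from that output; other kernels may word the message differently, and the function name is my own.

```python
import re
from collections import Counter

# Pattern taken from the dmesg line quoted in this thread.
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def io_errors_by_device(dmesg_text):
    """Count end_request I/O errors per device name in dmesg output."""
    counts = Counter()
    for match in IO_ERROR.finditer(dmesg_text):
        counts[match.group(1)] += 1
    return counts


sample = "[4300998.123000] end_request: I/O error, dev hdb, sector 248\n"
for device, count in io_errors_by_device(sample).items():
    print(device, count)
```

Errors on both hda and hdb but none on sda would suggest looking at what the two IDE drives share rather than at the drives themselves.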
I agree - it doesn't look good.
Can you try those HDDs in another box? |
Just restarted the PC after three hours cooling off (me as much as the computer :) ) and it booted with no problems or error messages, with all drives and partitions mounted and readable.
Perhaps the drives themselves were all heating one another up? Back to square one, in any case - apart from the fact that ACPI/APIC has now been disabled. Hmm, it rained in the meantime, perhaps cooling things down by a few degrees. :D [breaking news] I just heard a whirr-click from one of the drives and had a momentary freeze, and now everything is slowing down... |
Sure looks like a hardware fault at this point.
Probably you don't have 2 HDs going bad, but remember that there is still a single point of failure; your controller or your cable. Try unplugging/replugging the cable. Make sure the controller chip isn't getting too hot. Try replacing the cable. You also potentially could have a jumper problem that would cause this if both drives are on the same cable and, for instance, both are jumpered "master". |