Slackware: This forum is for the discussion of Slackware Linux.
I'm fairly sure those errors are unrelated to the problem, as they occur at every boot up, whether the freeze occurred previously or not.
Your call, but I'm fairly confident that this is connected. Keep in mind that during heavy internet activity, things need to be written to disk, often in small chunks.
I agree with moesasji that this seems to look like a hardware failure/incompatibility. If you have a separate hard drive, it may be worth trying it in your system.
I tried running from an Xubuntu 15.04 live cd, and the issue still occurs, even when no hard drives are mounted. When they are mounted, it has occurred when I view media from my storage drive. (Not my Slackware install drive.) The issue seems fairly reproducible by just viewing a video on Youtube, although it's occurred at other times as well. (Including while writing this post.) Fortunately, when I hit the shutdown button on my computer, it will bring up XFCE's shutdown menu, from which I can cancel and continue what I was doing. This seems to confirm a hardware issue, though it's unlikely to be the hard drive.
It could be the CPU, as syslog will make mention of a non-responsive cpu if I wait long enough. (I'll post the message when I see it again.)
It may be the memory. My last memory check didn't reveal an error, but I can run again to be sure.
It is possible it's the videocard. My current videocard is new, but I got it earlier this summer when the previous card died. Both cards have run the Catalyst driver, to some degree, although I don't believe Ubuntu is using it on the livecd.
I doubt the issue is with my motherboard, as I replaced it with a new one recently as well.
Now that I'm running a live cd, I'll try swapping between the network card I'm using currently, and the built-in networking on the motherboard, to see if that's the issue.
Metaschima, although that message doesn't occur for every instance of the freeze, it does seem to coincide with the error I'm getting through Xubuntu.
Last edited by Sylvester Ink; 11-28-2015 at 10:43 PM.
Check the mobo voltages and see if they are within limits.
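A rough way to do that check, assuming lm_sensors is installed and configured so that `sensors` reports the rails: compare each reading against its nominal value. The helper name `rail_ok` is made up for this sketch, and the 5% default is the usual ATX tolerance for the positive rails, not something your board's manual necessarily agrees with.

```shell
# rail_ok NOMINAL MEASURED [TOLERANCE_PERCENT]
# Succeeds (exit status 0) if MEASURED is within the tolerance of NOMINAL.
# Assumption: 5% default, the usual ATX tolerance for +3.3V, +5V and +12V.
rail_ok() {
    awk -v nom="$1" -v got="$2" -v tol="${3:-5}" 'BEGIN {
        diff = got - nom
        if (diff < 0) diff = -diff
        # awk exit status 0 means "within limits"
        exit !(diff <= nom * tol / 100)
    }'
}
```

For example, `rail_ok 12 11.85 && echo "+12V looks fine"`. The readings have to be copied from the `sensors` output by hand, since the labels differ from board to board.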
The graphics card is nearly impossible to test, but it definitely can fail, and I had one fail with similar symptoms to yours. The only hint was a low voltage for the card, if that is a hint at all.
I suspect you may have bad sector(s) on your disk. The ATA bus error can be due to one bad 512-byte block (i.e. a "host" sector) on a drive which uses 4K native sectors. I have seen this with Western Digital 2TB and 3TB drives. The good news is that if you can find and re-write the bad sector, the drive will carry on and still have a useful working life.
These bad sectors can be difficult to detect if NCQ (Native Command Queueing) is enabled for the disk. Try disabling NCQ with:
Code:
echo 1 > /sys/block/sdX/device/queue_depth
...and you may then get a more comprehensive error message, possibly with a sector number (LBA address).
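If you want to be able to put the setting back afterwards, something along these lines may help. It is only a sketch: the function names and the `SYSFS_ROOT` variable are made up here (the variable exists so the functions can be tried against a scratch directory), and writing to the real file needs root.

```shell
# Hypothetical helpers around the queue_depth file mentioned above.
# SYSFS_ROOT is an assumption added for testing; it defaults to /sys.
queue_depth_path() {
    # $1 = disk name without /dev/, e.g. sda
    printf '%s/block/%s/device/queue_depth\n' "${SYSFS_ROOT:-/sys}" "$1"
}

# Print the previous depth, then set the depth to 1 (NCQ off). Needs root.
ncq_disable() {
    path=$(queue_depth_path "$1")
    [ -w "$path" ] || { echo "cannot write $path" >&2; return 1; }
    cat "$path"
    echo 1 > "$path"
}

# Restore a depth that ncq_disable printed earlier.
ncq_restore() {
    # $1 = disk name, $2 = depth to restore
    echo "$2" > "$(queue_depth_path "$1")"
}
```

Usage would be `old=$(ncq_disable sdX)`, then reproduce the freeze, then `ncq_restore sdX "$old"`. The setting does not survive a reboot anyway.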
There is a way to decode the sector address from your messages - see my notes/example below:
...try varying the sector number to determine the range of bad sectors. If it is a drive with 4K native blocks, then you'll find the whole 4K (i.e. 8 x 512-byte blocks) unreadable.
Confirm the bad sector location(s) with the "--read-sector" flag of hdparm(8).
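A sketch of how that probing could be scripted. Assumptions: the drive's 4K blocks are aligned to sector 0, and hdparm returns a non-zero exit status when the read fails (if your version does not, look for "FAILED" in its output instead). The function names are made up for illustration.

```shell
# block_range LBA
# Print the eight 512-byte sector numbers that share a 4K physical block
# with LBA (assumes the 4K blocks start at sector 0).
block_range() {
    first=$(( $1 / 8 * 8 ))
    i=0
    while [ "$i" -lt 8 ]; do
        echo $(( first + i ))
        i=$(( i + 1 ))
    done
}

# probe_block DEVICE LBA
# Try to read every sector of that 4K block. Needs root.
# --read-sector only reads; it does not modify the disk.
# Assumption: hdparm exits non-zero when the read fails.
probe_block() {
    for s in $(block_range "$2"); do
        if hdparm --read-sector "$s" "$1" >/dev/null 2>&1; then
            echo "$s ok"
        else
            echo "$s FAILED"
        fi
    done
}
```

For instance, `probe_block /dev/sdX 1000005` would try sectors 1000000 through 1000007. The sector number here is only a placeholder, use the one from your own error message.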
If you do have bad sector(s), the next step will be to work out what (if any) file or meta-data is using those sectors.
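The first part of that is plain arithmetic: turn the disk sector into a filesystem block number relative to the start of the partition. A minimal sketch, with a made-up function name, assuming the sector numbers are in 512-byte units:

```shell
# lba_to_fs_block LBA PART_START FS_BLOCK_SIZE
# LBA and PART_START are in 512-byte sectors; PART_START can be read from
# /sys/block/sdX/sdXN/start. FS_BLOCK_SIZE is in bytes; for ext2/3/4 it is
# the "Block size" line in the output of: tune2fs -l /dev/sdXN
lba_to_fs_block() {
    echo $(( ($1 - $2) * 512 / $3 ))
}
```

If the filesystem is ext2/3/4 (an assumption, you haven't said what yours is), the resulting block number N can be fed to `debugfs -R "icheck N" /dev/sdXN` to get an inode number, and then `debugfs -R "ncheck INODE" /dev/sdXN` to get the file name. Other filesystems need their own tools.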
Finally, you may be able to "repair" the bad sector with the "--repair-sector" parameter of hdparm. This writes zeros over the sector contents, but once done (for WD drives anyway, in my experience) the drive will be OK. I've not seen bad sectors re-appear in the same locations, and the SMART remapped sectors count did not increase.
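To compare the SMART counters before and after a repair, something like this can pull a single raw value out of `smartctl -A` (from smartmontools). The function name is made up, and the attribute names in the usage line are the common ATA ones, which your drive may or may not report.

```shell
# smart_raw ATTRIBUTE_NAME
# Read "smartctl -A" output on stdin and print the raw value (10th column)
# of the named attribute. Prints nothing if the attribute is not listed.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}
```

For example, `smartctl -A /dev/sdX | smart_raw Reallocated_Sector_Ct`, and the same again with `Current_Pending_Sector`. Note the numbers down, do the repair, and run it again.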
@MarcT, if he is seeing the same problem when running a live distro that isn't touching the harddrive, it seems unlikely that the harddrive is to blame for his problem.
True - although some live distros will activate swap partitions if found, and could scan md raid and LVM partitions. There could be unexpected or unintended disk access.
The only way to rule it out with a live distro would be to disconnect the drive.
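Before going as far as unplugging the drive, it may be worth a quick look at what the live session has actually activated. A small sketch for the swap part (the function name is made up; the optional file argument is only there so it can be tried on a saved copy):

```shell
# active_swap [FILE]
# Print the devices or files currently used as swap, one per line.
# FILE defaults to /proc/swaps; the first line of that file is a header.
active_swap() {
    awk 'NR > 1 { print $1 }' "${1:-/proc/swaps}"
}
```

If `active_swap` lists a partition on one of your disks, `swapoff -a` will release it. `cat /proc/mdstat` shows any md arrays that were assembled, and `pvs` (if the LVM tools are on the live CD) shows LVM physical volumes in use.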
I still think it's worth disabling NCQ and trying to reproduce this freeze. If there is a disk problem it will often lead to a more accurate I/O error message.
However, apparent bad sectors could also be a symptom of some other hardware issue (e.g. a failing CPU core, bad RAM, a bad SATA cable, a marginal power supply, etc.).
Sorry for the slow updates, but the holiday season plus work projects have limited my available time to diagnose these issues.
I ran memtest for about 9 hours and didn't get a single error, so I think it's pretty safe to say the memory is fine.
I ran the Mersenne stress test for 15 minutes on mode 1, and about 6 minutes on mode 2. Neither showed errors, but I'll set it to run overnight, just in case. If that comes back clean too, I can rule out CPU issues.
I think the next area to check is whether the network card is the issue. I'll pull it and use the integrated networking on the motherboard (currently disabled), and see if I still get the freeze.
[EDIT]
After 7 hours of the stress test, I have no errors, so I think it's safe to say that the CPU is fine as well.
[/EDIT]
Last edited by Sylvester Ink; 12-14-2015 at 08:49 AM.
Okay, it seems the problem was with the network card. I have one on my motherboard and one in a PCI slot. I was using the PCI card and had the motherboard networking disabled due to a previous issue on an older motherboard. So it seems like the PCI card was either damaged or conflicting with some other hardware. In any case, since I pulled it and switched to the motherboard's networking, everything has been running just fine for about a week. I'll post an update if the problem occurs again, but for now I think I can mark this solved.
Thanks for your help, and I apologize for the delayed responses on my part. (Again, it's that time of year.)