[SOLVED] rcu_process_callbacks panic on file download
I've had a very odd panic occurring over the last few days, specifically when downloading files in a browser (both Firefox and Chrome do this). The problem is 100% reproducible and results in an rcu_process_callbacks panic.
I didn't realise at first what was happening, but then saw a pattern: within a minute or two of clicking a download link, while the file is still downloading in the browser, the panic occurs at the console and the download stops. All disk access also appears to stop, and it's almost impossible to run anything else. A shutdown only proceeds partially and I have to hard reset the machine.
There were no changes to the OS directly before the issue started; however, I have been keeping my OS fairly up to date, with downloads and updates on an almost weekly basis.
slackware64 -current
athlon fx 6350 - kernel 4.14.20 and then updated to 4.14.22
nvidia gt1030 - binary 384.111 and then updated to 390.25
other running software: dolphin, thunderbird, sickrage, chrome, qbittorrent
The system had no updates for the week before the problem started, so there were quite a few days with no issue. I've tried to google this, but there isn't much that relates rcu_process_callbacks to panics.
Additional information: the problem happens with wget at a console as well. I just tried downloading VirtualBox 5.2.8 with wget and it stopped about 3/4 of the way through the download. I then tried to take a screenshot and save it, which appeared to work, but after restarting the machine the saved screenshot was no longer on the drive. Another oddity is that when the problem happens, the drive light comes on solid although there is no drive activity noise.
First thing I would do is run a check on the drive. It could be the start of the drive failing. Check your SMART data and run an fsck on the drive/partition.
If you cannot find anything faulty with your hard drive after following bassmadrigal's advice, then I'd suggest booting a live Linux image with an older kernel (pre-4.14.20), mounting your hard drive (a partition that is not used by the system would be ideal) and trying to replicate the issue you reported.
It might be a kernel bug: https://bugs.debian.org/cgi-bin/bugr...cgi?bug=891467 https://bugzilla.kernel.org/show_bug.cgi?id=198861
Hi @abga and @bassmadrigal, Thanks for your responses.
So first this machine is as is from last night - no issues overnight (with no downloading).
I had already done fsck's on all drives as a first stop and nothing of interest found. But I think your 2nd link above is pretty much spot on what's happening here. This feels like it started after the 4.14.20 kernel update.
I've just checked smart and everything is looking good:
sda = 500GB wd black (passed)
sdb = 1TB wd black (passed with interest: 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 13)
sdc = SanDisk Ultra II 240GB - passed but I'm not sure I'm reading this one right:
Anyway, I'm going to find a bootable memory stick and test/confirm the issue. Does anyone here know where I can get a pre-4.14.20 kernel set (besides building it myself)? The last ISO I made of -current is from Dec17 and that kernel is 4.9.53; I'd like something that at least includes Spectre/Meltdown support. I also see that 4.14.23 has been released by PV, so maybe I should test that as well.
OK, I worked off Eric's liveslak for half an hour with no problems; that's with 4.14.18. Luckily I took a quick look at the dmesg output and here is the culprit for the 1TB WD Black:
Hmm so the problem is solved - a good dust clean out of the case, swap of sata cables between 1TB WD Black and 240GB SSD Sandisk Ultra, and no more errors. I'm going to put this down to a loose cable ...
@abga, thanks very much for those links - got me onto the right track!
On your confusion over interpreting the SMART data: regrettably, none of the HDD manufacturers respect the SMART standard anymore. Instead they have their own "recipe" and "internal values" that only their own diagnostic software can interpret. Furthermore, in modern hard disks the technology and materials are pushed to their limits: more data is packed onto the same magnetic surface and, because of the constant internal errors, the manufacturers have implemented internal wear leveling and error correction in firmware. The SMART field Reallocated Sector Count, which used to be a good indicator of a failing drive, doesn't tell you much today. If it does move, you'll get some astronomical values that the manufacturer considers "normal", still inside the threshold.
Depending on how SMART was implemented in the hard disk, there is still a way to look for failures: check the detailed SMART error log, if there is one: https://www.thomas-krenn.com/en/wiki...using_Smartctl
Or, run some self-tests: https://www.thomas-krenn.com/en/wiki..._with_smartctl
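For anyone wanting to read an attribute row like the sdb one quoted earlier in the thread, here is a minimal Python sketch. It assumes the usual ten-column layout of the `smartctl -A` attribute table (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and that the raw value of attribute 199 is a plain integer, which may not hold for every vendor. The names `SmartAttribute`, `parse_attribute_row`, `has_failed` and `crc_error_count` are made up for illustration.

```python
from typing import NamedTuple, Optional


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    flag: str
    value: int       # normalized current value
    worst: int       # lowest normalized value recorded
    thresh: int      # vendor failure threshold
    attr_type: str   # "Pre-fail" or "Old_age"
    updated: str
    when_failed: str
    raw: str         # vendor-specific raw value, kept as text


def parse_attribute_row(line: str) -> Optional[SmartAttribute]:
    """Parse one attribute table row; return None for headers or odd rows."""
    parts = line.split(None, 9)
    if len(parts) < 10 or not parts[0].isdigit():
        return None
    try:
        return SmartAttribute(
            int(parts[0]), parts[1], parts[2],
            int(parts[3]), int(parts[4]), int(parts[5]),
            parts[6], parts[7], parts[8], parts[9].strip(),
        )
    except ValueError:
        return None


def has_failed(attr: SmartAttribute) -> bool:
    """An attribute counts as failed when value <= threshold (threshold > 0)."""
    return attr.thresh > 0 and attr.value <= attr.thresh


def crc_error_count(attr: SmartAttribute) -> Optional[int]:
    """Raw CRC error count for attribute 199, assuming a plain integer raw value."""
    if attr.attr_id != 199:
        return None
    try:
        return int(attr.raw.split()[0])
    except (ValueError, IndexError):
        return None


if __name__ == "__main__":
    # Row taken from the sdb report earlier in this thread.
    row = "199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 13"
    attr = parse_attribute_row(row)
    print(attr.name, "failed:", has_failed(attr), "raw CRC errors:", crc_error_count(attr))
```

On the sdb row the normalized value (200) is well above the threshold (0), so the drive still reports "passed", while the raw count of 13 CRC errors concerns the link between controller and drive. That is commonly a cabling or connector problem, which would fit the loose SATA cable suspected above.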
On your odd experience, I'm concerned that after the 4.14.20 kernel any hardware-related issue with the hard disk will result in a kernel panic, which isn't really useful. The two patches that were submitted to resolve this issue have not yet been accepted; one has the status "not applicable" and the other is "deferred": https://bugzilla.kernel.org/show_bug.cgi?id=198861#c1
Agreed. It was not immediately obvious that the panic was hardware related, although that probably should have been a first stop. But which hardware? It was only by luck that, after booting a pre-4.14.20 kernel, I saw the ata errors. This will make troubleshooting more difficult.
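A rough way to make ata errors like these easier to spot is to filter the kernel log for libata messages instead of scrolling through the whole dmesg output. The Python sketch below assumes libata lines carry a port prefix of the form `ata3:` or `ata3.00:`; the helper names `ata_messages` and `read_kernel_log` are hypothetical, and `dmesg` may need root on systems that restrict access to the kernel log.

```python
import re
import subprocess
from typing import Iterable, List

# Matches a libata port prefix such as "ata3: " or "ata3.00: ".
ATA_LINE = re.compile(r"\bata\d+(?:\.\d+)?: ")


def ata_messages(lines: Iterable[str]) -> List[str]:
    """Return only the log lines that carry a libata port prefix."""
    return [line.rstrip("\n") for line in lines if ATA_LINE.search(line)]


def read_kernel_log() -> List[str]:
    """Read the kernel ring buffer via dmesg (may require root)."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

Calling `ata_messages(read_kernel_log())` from a live system booted on an older kernel would list the ata lines in one go; which port maps to which drive still has to be read from the messages themselves.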