LinuxQuestions.org

RandomTroll 11-12-2013 08:15 PM

unable to handle kernel level paging request at address...
 
This happened to me today, halting my computer. Is this a symptom of defective RAM? I was querying a wireless adaptor's status at the time but a lot else was happening.

sundialsvcs 11-14-2013 10:29 AM

See if you can reproduce the problem by doing the same thing again. Verify also that all of your kernel packages are consistently up-to-date, most especially any having to do with removable (e.g. USB) devices such as this adaptor might be. It is easy to "mix apples with oranges": a module loaded on demand may not be fully compatible with the running kernel because one part was updated and another was not. Also, check in the usual places for any bug-reports concerning this driver.

I do not consider that "defective RAM" is likely, or that it would have anything to do with this issue. (If your RAM really was defective, the computer probably wouldn't be running at all ... It would be sitting there, making peculiar hot smoky smells.)

RandomTroll 11-16-2013 09:33 PM

Quote:

Originally Posted by sundialsvcs (Post 5064333)
See if you can reproduce the problem by doing the same thing again.

I have done the same thing, as near as I can tell, hundreds of times.

Quote:

Originally Posted by sundialsvcs (Post 5064333)
Verify also that all of your kernel packages are consistently up-to-date

I check daily.

Quote:

Originally Posted by sundialsvcs (Post 5064333)
check in the usual places for any bug-reports concerning this driver.

Done.

Quote:

Originally Posted by sundialsvcs (Post 5064333)
(If your RAM really was defective, the computer probably wouldn't be running at all ... It would be sitting there, making peculiar hot smoky smells.)

I've had defective RAM a couple of times without any smoke; it just produced errors more often.

I wanted to know what this error means. It could mean that the memory the kernel was trying to page in failed a consistency check at that moment, or that the kernel erred in its bookkeeping, freed some memory that was still in use, and then tried to use it. Or I don't understand it. If the cause were a defective driver for a USB-attached, non-memory device, why would that produce a paging error?

sundialsvcs 11-18-2013 03:23 PM

There should be a complete traceback when the message occurs. Please include it.

Since the problem is reproducible-at-will by you and it appears to be linked to a particular action by you ... "querying the status of a wireless adapter" ... then I do not expect it to be a memory problem. Perhaps the wrong driver is being used ... perhaps another driver is also listening to that device that shouldn't be.

RandomTroll 11-19-2013 01:26 AM

Quote:

Originally Posted by sundialsvcs (Post 5066787)
Since the problem is reproducible-at-will by you

It isn't. I didn't say it was. I have queried the status of that adaptor thousands of times (it's built in; I've had the computer for 4 years). I provided that bit of data not because it was important but because it was all I had: I couldn't do anything with the computer other than turn it off. Only a bit of the trace was displayed. I didn't want to copy it all down by hand.

I want to understand what the error report means. The message comes from arch/x86/mm/fault.c, so it is a memory-management (page-fault) error. That's not documented; it'll take me some time to figure out.

sundialsvcs 11-19-2013 11:10 AM

Well, then let me now obligingly eat my words and say that perhaps it is a hardware problem. DIMMs do go flaky sometimes, and they sometimes even wiggle out of their slots. If the problem appears more-or-less "at random," and is not traceable to a particular thing that is done, this does say, "hardware."

Or ... a nasty subtle software bug.

The root cause of this message is that a page fault occurred in kernel mode at an address with no valid mapping, and the faulting instruction is not listed in the exception table of the kernel or of any module. (In user space the same kind of fault would be delivered as a SIGSEGV or a SIGBUS.) So, the kernel never expected to find this problem in its own code and doesn't know what to do. What is the address, and does it correspond to a location within the fixed kernel? (If not, it must be a loaded module, implying a device-driver.) Use the kernel's symbol map (e.g. System.map) to trace this to a location in the kernel code.
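
For the "fixed kernel or loaded module?" question, here is a rough sketch in Python. It assumes you managed to note the instruction pointer from the oops and that you can read /proc/modules as root (the load addresses are hidden or zeroed otherwise). The names Module, parse_proc_modules and find_module, and the sample addresses, are made up for illustration. The size column is treated as the extent of the module's mapping, which is only an approximation.

```python
from typing import List, NamedTuple, Optional


class Module(NamedTuple):
    name: str
    base: int   # load address, from the last address column of /proc/modules
    size: int   # module size in bytes, from the second column


def parse_proc_modules(text: str) -> List[Module]:
    """Parse /proc/modules-style lines: name size refcount deps state address."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        modules.append(Module(fields[0], int(fields[5], 16), int(fields[1])))
    return modules


def find_module(addr: int, modules: List[Module]) -> Optional[str]:
    """Return the module whose [base, base + size) range holds addr, else None."""
    for m in modules:
        if m.base <= addr < m.base + m.size:
            return m.name
    return None


# Hypothetical sample in /proc/modules layout; on a real machine use
# open("/proc/modules").read() instead.
SAMPLE = """\
iwlwifi 229038 1 iwldvm, Live 0xffffffffa0123000
usbcore 195340 3 ehci_hcd,uhci_hcd, Live 0xffffffffa0040000
"""

if __name__ == "__main__":
    mods = parse_proc_modules(SAMPLE)
    # An address inside the first sample range is attributed to that module.
    print(find_module(0xffffffffa0123456, mods))
    # None means no module matched; look the address up in System.map instead.
    print(find_module(0xffffffff81000000, mods))
```

If no module matches, the address probably lies in the fixed kernel image (or is simply garbage), and the symbol map is the next place to look.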

