LinuxQuestions.org
Old 08-28-2015, 11:20 PM   #1
raindog308
Member
 
Registered: Dec 2010
Posts: 34

Rep: Reputation: 1
Question Multiple Kernel Crashes/Oops - "unable to handle kernel paging request"


I'm running CentOS 7.1.1503 on a home-built box with an i3 processor. The box is a file server and has several internal mdadm RAID arrays...one for root, one for a small file share, and one for a large file share.

I've seen a number of spontaneous reboots:

Code:
127.0.0.1-2015.08.16-01:28:16/vmcore-dmesg.txt:[562354.563966] BUG: unable to handle kernel paging request at 000000000000212a
127.0.0.1-2015.08.27-07:12:31/vmcore-dmesg.txt:[969889.931877] BUG: unable to handle kernel paging request at ffff880112268b60
127.0.0.1-2015.08.28-16:52:55/vmcore-dmesg.txt:[ 4611.944684] kernel BUG at drivers/md/raid5.c:316!
127.0.0.1-2015.08.28-19:21:10/vmcore-dmesg.txt:[ 8833.527255] BUG: unable to handle kernel paging request at 000000020000039f
Some more detail on the last one:

Code:
[ 8833.527255] BUG: unable to handle kernel paging request at 000000020000039f
[ 8833.527293] IP: [<ffffffff81208d2a>] bio_integrity_advance+0x1a/0x60
[ 8833.527320] PGD 0
[ 8833.527327] Oops: 0000 [#1] SMP
[ 8833.527338] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_mirror dm_region_hash dm_log dm_mod nfsd intel_powerclamp coretemp eeepc_wmi asus_wmi sparse_keymap raid456 async_raid6_recov async_memcpy async_pq intel_rapl kvm_intel raid6_pq rfkill kvm iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek mxm_wmi snd_hda_codec_hdmi snd_hda_codec_generic crc32c_intel ghash_clmulni_intel snd_hda_intel snd_hda_controller snd_hda_codec aesni_intel snd_hwdep auth_rpcgss nfs_acl lockd mei_me async_xor snd_seq snd_seq_device xor async_tx lrw gf128mul shpchp wmi snd_pcm mei glue_helper ablk_helper cryptd lpc_ich mfd_core pcspkr serio_raw i2c_i801 snd_timer snd soundcore tpm_infineon sunrpc uinput ext4 mbcache jbd2 raid1 sd_mod crc_t10dif crct10dif_common
[ 8833.527606]  i915 ahci libahci libata i2c_algo_bit drm_kms_helper e1000e drm ptp pps_core i2c_core video
[ 8833.527644] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.10.0-229.11.1.el7.x86_64 #1
[ 8833.527667] Hardware name: ASUS All Series/Z87-PLUS, BIOS 1405 08/19/2013
[ 8833.527686] task: ffff88030e8b6660 ti: ffff88030e8e4000 task.ti: ffff88030e8e4000
[ 8833.527704] RIP: 0010:[<ffffffff81208d2a>]  [<ffffffff81208d2a>] bio_integrity_advance+0x1a/0x60
[ 8833.527736] RSP: 0018:ffff88031fb83cf0  EFLAGS: 00010202
[ 8833.527752] RAX: 00000001ffffffff RBX: 0000000000006000 RCX: 0000000000000003
[ 8833.527770] RDX: 0000000000000000 RSI: 0000000000006000 RDI: 00000001fb3f2b10
[ 8833.527790] RBP: ffff88031fb83d08 R08: 0000000000000001 R09: 00000000000002c0
[ 8833.527809] R10: ffff88030aa9a800 R11: 0000000000080000 R12: ffff88001c6c5c58
[ 8833.527828] R13: 00000000fffffffb R14: 0000000000006000 R15: ffff880131e9ac00
[ 8833.527846] FS:  0000000000000000(0000) GS:ffff88031fb80000(0000) knlGS:0000000000000000
[ 8833.527865] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8833.527880] CR2: 000000020000039f CR3: 000000000190a000 CR4: 00000000001407e0
[ 8833.527897] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8833.527916] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 8833.527934] Stack:
[ 8833.527941]  ffffffff811fe09d ffff88001c6c5c58 0000000000006000 ffff88031fb83d48
[ 8833.527968]  ffffffff812ad447 0007a00000000000 ffff880131e9ac00 0000000000000000
[ 8833.527990]  0000000000000000 0000000000000000 ffff880131e9ac00 ffff88031fb83d70
[ 8833.528014] Call Trace:
[ 8833.528023]  <IRQ>
[ 8833.528029]
[ 8833.528042]  [<ffffffff811fe09d>] ? bio_advance+0x1d/0xd0
[ 8833.528063]  [<ffffffff812ad447>] blk_update_request+0x77/0x350
[ 8833.528083]  [<ffffffff812ad73c>] blk_update_bidi_request+0x1c/0x80
[ 8833.528101]  [<ffffffff812ada1f>] blk_end_bidi_request+0x1f/0x60
[ 8833.528121]  [<ffffffff812ada70>] blk_end_request+0x10/0x20
[ 8833.528142]  [<ffffffff813f9cd8>] scsi_io_completion+0x108/0x650
[ 8833.528160]  [<ffffffff813eece3>] scsi_finish_command+0xb3/0x110
[ 8833.528176]  [<ffffffff813f9adf>] scsi_softirq_done+0x12f/0x160
[ 8833.528192]  [<ffffffff812b3fb0>] blk_done_softirq+0x90/0xc0
[ 8833.528208]  [<ffffffff81077b2f>] __do_softirq+0xef/0x280
[ 8833.528223]  [<ffffffff81615b9c>] call_softirq+0x1c/0x30
[ 8833.528239]  [<ffffffff81015d95>] do_softirq+0x65/0xa0
[ 8833.528252]  [<ffffffff81077ec5>] irq_exit+0x115/0x120
[ 8833.528267]  [<ffffffff81616738>] do_IRQ+0x58/0xf0
[ 8833.528284]  [<ffffffff8160b9ed>] common_interrupt+0x6d/0x6d
[ 8833.528299]  <EOI>
[ 8833.528307]
[ 8833.528318]  [<ffffffff814aa022>] ? cpuidle_enter_state+0x52/0xc0
[ 8833.528333]  [<ffffffff814aa018>] ? cpuidle_enter_state+0x48/0xc0
[ 8833.528352]  [<ffffffff814aa155>] cpuidle_idle_call+0xc5/0x200
[ 8833.528370]  [<ffffffff8101d14e>] arch_cpu_idle+0xe/0x30
[ 8833.528389]  [<ffffffff810c6801>] cpu_startup_entry+0xf1/0x290
[ 8833.528410]  [<ffffffff8104228a>] start_secondary+0x1ba/0x230
[ 8833.528426] Code: 08 66 89 57 28 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 8b 7f 60 48 8b 40 10 48 85 ff 48 8b 80 98 00 00 00 <48> 8b 90 a0 03 00 00 74 2a 48 85 d2 74 27 89 f0 55 c1 ee 09 c1
[ 8833.528536] RIP  [<ffffffff81208d2a>] bio_integrity_advance+0x1a/0x60
[ 8833.528562]  RSP <ffff88031fb83cf0>
[ 8833.528573] CR2: 000000020000039f
So most of that is Greek to me, but obviously it's in the I/O subsystem somewhere.
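
For anyone trying to read the dump: the bytes marked `<48> 8b 90 a0 03 00 00` in the Code: line are the instruction that faulted. On x86-64 that encoding is a 64-bit load from RAX + 0x3a0 into RDX. The Python sketch below (the helper names are made up for illustration) checks that RAX plus that displacement gives the CR2 address, and that RAX is not in the kernel half of the address space. It only shows how to read the registers; it says nothing about where the bad pointer came from.

```python
# Values copied from the oops above.
RAX = 0x00000001FFFFFFFF
CR2 = 0x000000020000039F

# Faulting instruction bytes from the "Code:" line, starting at the <48> marker.
FAULTING_BYTES = bytes.fromhex("488b90a0030000")

# Lowest canonical upper-half address on x86-64 with 48-bit virtual addresses.
KERNEL_HALF_START = 0xFFFF800000000000


def decode_mov_disp32(insn: bytes) -> int:
    """Return the disp32 of a `mov rdx, [rax + disp32]` instruction.

    Hypothetical helper: it recognises only this one encoding
    (REX.W prefix 0x48, opcode 0x8b, ModRM 0x90) and nothing else.
    """
    if insn[:3] != bytes([0x48, 0x8B, 0x90]):
        raise ValueError("not the expected mov rdx, [rax+disp32] encoding")
    return int.from_bytes(insn[3:7], "little")


def is_kernel_address(addr: int) -> bool:
    """True if addr lies in the upper (kernel) half of the address space."""
    return addr >= KERNEL_HALF_START


disp = decode_mov_disp32(FAULTING_BYTES)
print(hex(disp))               # 0x3a0
print(hex(RAX + disp))         # 0x20000039f
print(RAX + disp == CR2)       # True
print(is_kernel_address(RAX))  # False
```

So the kernel was following a pointer whose value (0x1ffffffff) is not a kernel address at all, which fits either corrupted memory or a software bug that left a stale or overwritten pointer behind.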

I haven't had a chance to run memtest86 but that's my next move tomorrow. But if it shows the RAM is OK...what might be the next move?
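
One possible step alongside the memory test is to tally the crash signatures, to see whether the failures cluster in one code path or are scattered. This is a minimal Python sketch; the function names are hypothetical, and the directory layout (one subdirectory per crash, each holding a `vmcore-dmesg.txt`) is assumed from the listing earlier in this post.

```python
import re
from collections import Counter
from pathlib import Path

# Matches the headline of a kernel BUG/oops line in a dmesg capture.
BUG_RE = re.compile(r"\]\s*((?:kernel )?BUG[: ].*)$")


def crash_headline(dmesg_text: str):
    """Return the first BUG line in a dmesg dump, or None if there is none."""
    for line in dmesg_text.splitlines():
        m = BUG_RE.search(line)
        if m:
            return m.group(1).strip()
    return None


def classify(headline: str) -> str:
    """Collapse a headline into a coarse signature by dropping the fault address."""
    return re.sub(r" at [0-9a-f]{16}$", "", headline)


def tally(crash_root: Path) -> Counter:
    """Count crash signatures across <crash_root>/*/vmcore-dmesg.txt (assumed layout)."""
    counts = Counter()
    for path in sorted(crash_root.glob("*/vmcore-dmesg.txt")):
        headline = crash_headline(path.read_text(errors="replace"))
        if headline:
            counts[classify(headline)] += 1
    return counts


# The four headlines from the listing earlier in this post.
THREAD_LINES = [
    "[562354.563966] BUG: unable to handle kernel paging request at 000000000000212a",
    "[969889.931877] BUG: unable to handle kernel paging request at ffff880112268b60",
    "[ 4611.944684] kernel BUG at drivers/md/raid5.c:316!",
    "[ 8833.527255] BUG: unable to handle kernel paging request at 000000020000039f",
]
counts = Counter(classify(crash_headline(line)) for line in THREAD_LINES)
print(counts.most_common())
# [('BUG: unable to handle kernel paging request', 3), ('kernel BUG at drivers/md/raid5.c:316!', 1)]
```

To run it over real dumps, call `tally()` with the directory that holds the crash subdirectories. Three bad-pointer faults at unrelated addresses plus one assertion in raid5.c is a mixed picture, so the tally on its own does not separate hardware from software causes.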

Kernel is up to date:

Code:
3.10.0-229.11.1.el7.x86_64 #1 SMP Thu Aug 6 01:06:18 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
yum shows a few things I could update (mariadb, http, firefox, etc.) but they're all applications rather than system stuff. No kernel update available.

Running CentOS is not a requirement if I would fare better on a different distro. Really, the box just needs smb, CrashPlan, and a few minor web things with php/mysql. It's got 12GB of RAM and while it's busy (load is often 3 or 4), it's a quad core i3 and it shouldn't be oopsing like that regardless.
 
Old 08-29-2015, 02:38 PM   #2
raindog308
Member
 
Registered: Dec 2010
Posts: 34

Original Poster
Rep: Reputation: 1
memtest86 was clean after a full pass (all tests) so I don't think it's RAM.

Ideas?

Thanks in advance.
 
Old 11-04-2015, 02:51 AM   #3
BensonBear
LQ Newbie
 
Registered: Feb 2005
Posts: 25

Rep: Reputation: 1
Quote:
Originally Posted by raindog308
memtest86 was clean after a full pass (all tests) so I don't think it's RAM.

Ideas?

Thanks in advance.
Don't know if you are still around, but I was googling on the subject heading and came across this. I just thought I would point out that just because memtest86 runs a long time with no errors does not mean there are no errors in your memory. I had a problem where I could not boot a particular kernel and really thought it had to be a kernel problem because I ran memtest86 for about 10 straight passes (many days) and it found no errors.

I stayed with an older kernel and about six months later, I started getting apparently random crashes. This time, memtest showed a bunch of errors. I narrowed it down to one of four sticks of memory, and then I found that the kernel that would not boot previously now booted with no problems.

I think that means the original problem was due to faulty memory which was not detected by memtest86.

(Sadly, the companion stick of the one that originally went bad on my system now also appears to be bad. I am getting the error in the subject header once in a while (maybe once every two weeks or so), and I think it's the memory again. First time in 20 years I've had memory problems, hard to accept).
 
Old 11-04-2015, 02:04 PM   #4
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,987
Blog Entries: 4

Rep: Reputation: 4037
Technically, this arises from a page fault that occurs in kernel mode and that the kernel cannot resolve. (See the kernel source-code and associated comments and documentation for details.) In short, it could have either a hardware or a kernel software (driver?) explanation.

The nature of these messages strongly suggests to me that the root cause is a software bug in the drivers/md/raid5.c module, having something to do with bidirectional data ("bidi") requests . . .

I'd therefore see if these drivers are up-to-date and/or if any issues in this area have been reported.
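
When searching for reported issues, it can help to pull the function names out of a call trace first. This is a rough Python sketch; the regex and the `parse_frames` name are illustrative, and the sample frames are copied from the trace in the first post. Frames marked with `?` are ones the kernel's stack unwinder could not confirm. The check for `raid`/`md_` prefixes is a crude heuristic of my own, and it looks only at this one trace, not at the separate `raid5.c:316` crash.

```python
import re

# One frame looks like: [<ffffffff812ad447>] blk_update_request+0x77/0x350
FRAME_RE = re.compile(
    r"\[<([0-9a-f]{16})>\]\s+(\?\s+)?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frames(trace_text: str):
    """Extract symbol, offset, function size and reliability from call-trace text."""
    frames = []
    for m in FRAME_RE.finditer(trace_text):
        frames.append(
            {
                "symbol": m.group(3),
                "offset": int(m.group(4), 16),
                "size": int(m.group(5), 16),
                "reliable": m.group(2) is None,  # a leading '?' means unconfirmed
            }
        )
    return frames


SAMPLE = """
[ 8833.528042]  [<ffffffff811fe09d>] ? bio_advance+0x1d/0xd0
[ 8833.528063]  [<ffffffff812ad447>] blk_update_request+0x77/0x350
[ 8833.528083]  [<ffffffff812ad73c>] blk_update_bidi_request+0x1c/0x80
[ 8833.528142]  [<ffffffff813f9cd8>] scsi_io_completion+0x108/0x650
"""

frames = parse_frames(SAMPLE)
symbols = [f["symbol"] for f in frames]
print(symbols)
# ['bio_advance', 'blk_update_request', 'blk_update_bidi_request', 'scsi_io_completion']

# Crude heuristic: does any frame look like it belongs to the md/raid code?
md_like = [s for s in symbols if s.startswith(("raid", "md_"))]
print(md_like)  # []
```

In these sample frames the names point at the block layer and SCSI completion path, so those function names, together with the kernel version, are probably the most useful search terms for this particular oops.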
 
  

