LinuxQuestions.org
Old 01-10-2021, 12:21 PM   #1
TheOneKEA
Member
 
Registered: Oct 2003
Distribution: Debian GNU/Linux 11 (amd64) w/kernel 6.0.15
Posts: 299

Rep: Reputation: 30
Unable to use NVMe device on X570-P motherboard; refcount_t and percpu errors


I recently plugged a 256GB NVMe drive into the secondary M.2 slot on my Asus X570-P motherboard. After partitioning and formatting the drive, any I/O to the drive (including mounting its filesystem) causes the following errors to appear in dmesg.

Code:
[  121.698761] refcount_t: underflow; use-after-free.
[  121.698772] WARNING: CPU: 8 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0xab/0xf0
[  121.698773] Modules linked in: rfcomm(E) cmac(E) bnep(E) binfmt_misc(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) btusb(E) btrtl(E) btbcm(E) btintel(E) bluetooth(E) crct10dif_pclmul(E) crc32_pclmul(E) rfkill(E) ghash_clmulni_intel(E) jitterentropy_rng(E) aesni_intel(E) crypto_simd(E) cryptd(E) glue_helper(E) efi_pstore(E) drbg(E) ccp(E) ansi_cprng(E) ecdh_generic(E) ecc(E) acpi_cpufreq(E) nft_counter(E) efivarfs(E) crc32c_intel(E)
[  121.698797] CPU: 8 PID: 0 Comm: swapper/8 Tainted: G            E     5.10.6-BET #1
[  121.698798] Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 1405 11/19/2019
[  121.698801] RIP: 0010:refcount_warn_saturate+0xab/0xf0
[  121.698802] Code: 05 af d2 72 01 01 e8 7a 06 87 00 0f 0b c3 80 3d 9d d2 72 01 00 75 90 48 c7 c7 78 60 44 a6 c6 05 8d d2 72 01 01 e8 5b 06 87 00 <0f> 0b c3 80 3d 7c d2 72 01 00 0f 85 6d ff ff ff 48 c7 c7 d0 60 44
[  121.698804] RSP: 0018:ffffa9d980394f30 EFLAGS: 00010086
[  121.698805] RAX: 0000000000000000 RBX: ffff93c68f858900 RCX: 0000000000000027
[  121.698806] RDX: 0000000000000027 RSI: ffff93cd7ec12e80 RDI: ffff93cd7ec12e88
[  121.698807] RBP: ffff93c690bde200 R08: 0000000000000000 R09: c0000000ffffdfff
[  121.698808] R10: ffffa9d980394d50 R11: ffffa9d980394d48 R12: 0000000000000001
[  121.698809] R13: ffff93c6941f0600 R14: ffff93c68f78fa00 R15: 0000000000000000
[  121.698810] FS:  0000000000000000(0000) GS:ffff93cd7ec00000(0000) knlGS:0000000000000000
[  121.698811] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  121.698812] CR2: 00007f332be16000 CR3: 0000000106d4a000 CR4: 0000000000350ee0
[  121.698813] Call Trace:
[  121.698815]  <IRQ>
[  121.698818]  nvme_irq+0x104/0x190
[  121.698822]  __handle_irq_event_percpu+0x2e/0xd0
[  121.698824]  handle_irq_event_percpu+0x33/0x80
[  121.698825]  handle_irq_event+0x39/0x70
[  121.698827]  handle_edge_irq+0x7c/0x1a0
[  121.698830]  asm_call_irq_on_stack+0x12/0x20
[  121.698831]  </IRQ>
[  121.698834]  common_interrupt+0xd7/0x160
[  121.698836]  asm_common_interrupt+0x1e/0x40
[  121.698839] RIP: 0010:cpuidle_enter_state+0xd2/0x2e0
[  121.698840] Code: e8 93 22 6a ff 31 ff 49 89 c5 e8 29 2c 6a ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 c4 01 00 00 31 ff e8 a2 d8 6f ff fb 45 85 f6 <0f> 88 c9 00 00 00 49 63 ce be 68 00 00 00 4c 2b 2c 24 48 89 ca 48
[  121.698841] RSP: 0018:ffffa9d980177e80 EFLAGS: 00000202
[  121.698842] RAX: ffff93cd7ec1ce00 RBX: 0000000000000002 RCX: 000000000000001f
[  121.698843] RDX: 0000001c55cfa428 RSI: 00000000239f541c RDI: 0000000000000000
[  121.698844] RBP: ffff93c68ea7f400 R08: 0000000000000002 R09: 000000000001c600
[  121.698845] R10: 00000077d8356efc R11: ffff93cd7ec1be24 R12: ffffffffa66d38e0
[  121.698846] R13: 0000001c55cfa428 R14: 0000000000000002 R15: 0000000000000000
[  121.698849]  cpuidle_enter+0x30/0x50
[  121.698852]  do_idle+0x24f/0x290
[  121.698854]  cpu_startup_entry+0x1b/0x20
[  121.698857]  start_secondary+0x10b/0x150
[  121.698859]  secondary_startup_64_no_verify+0xb0/0xbb
[  121.698861] ---[ end trace 3cff32dbce8f0fd6 ]---
[  151.779331] nvme nvme1: I/O 159 QID 9 timeout, aborting
[  151.779344] nvme nvme1: I/O 160 QID 9 timeout, aborting
[  151.779349] nvme nvme1: I/O 161 QID 9 timeout, aborting
[  151.779354] nvme nvme1: I/O 162 QID 9 timeout, aborting
[  151.779368] nvme nvme1: Abort status: 0x0
[  151.779370] nvme nvme1: Abort status: 0x0
[  151.779371] nvme nvme1: Abort status: 0x0
[  151.779373] nvme nvme1: Abort status: 0x0
[  151.779374] nvme nvme1: I/O 166 QID 9 timeout, aborting
[  151.779378] nvme nvme1: I/O 167 QID 9 timeout, aborting
[  151.779382] nvme nvme1: I/O 168 QID 9 timeout, aborting
[  151.779387] nvme nvme1: Abort status: 0x0
[  151.779389] nvme nvme1: Abort status: 0x0
[  151.779390] nvme nvme1: I/O 169 QID 9 timeout, aborting
[  151.779394] nvme nvme1: Abort status: 0x0
[  151.779396] nvme nvme1: I/O 170 QID 9 timeout, aborting
[  151.779402] nvme nvme1: Abort status: 0x0
[  151.779403] nvme nvme1: I/O 171 QID 9 timeout, aborting
[  151.779408] nvme nvme1: Abort status: 0x0
[  151.779410] nvme nvme1: I/O 172 QID 9 timeout, aborting
[  151.779415] nvme nvme1: Abort status: 0x0
[  151.779416] nvme nvme1: I/O 173 QID 9 timeout, aborting
[  151.779420] nvme nvme1: Abort status: 0x0
[  151.779427] nvme nvme1: Abort status: 0x0
[  181.987372] nvme nvme1: I/O 159 QID 9 timeout, reset controller
[  182.015464] nvme nvme1: 15/0/0 default/read/poll queues
[  212.195476] nvme nvme1: I/O 160 QID 9 timeout, disable controller
[  212.313646] blk_update_request: I/O error, dev nvme1n1, sector 16350 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  212.313653] blk_update_request: I/O error, dev nvme1n1, sector 16093 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  212.313656] blk_update_request: I/O error, dev nvme1n1, sector 15836 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  212.313658] blk_update_request: I/O error, dev nvme1n1, sector 15579 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  212.313660] blk_update_request: I/O error, dev nvme1n1, sector 15322 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  212.313662] blk_update_request: I/O error, dev nvme1n1, sector 15065 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  212.313663] blk_update_request: I/O error, dev nvme1n1, sector 14808 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  212.313665] blk_update_request: I/O error, dev nvme1n1, sector 14551 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  212.313667] blk_update_request: I/O error, dev nvme1n1, sector 14294 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  212.313669] blk_update_request: I/O error, dev nvme1n1, sector 14037 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[  212.313702] nvme nvme1: failed to mark controller live state
[  212.313705] nvme nvme1: Removing after probe failure status: -19
[  212.323510] Aborting journal on device dm-0-8.
[  212.323518] Buffer I/O error on dev dm-0, logical block 25198592, lost sync page write
[  212.323521] JBD2: Error -5 detected when updating journal superblock for dm-0-8.
Code:
[  212.344431] percpu ref (hd_struct_free) <= 0 (-28) after switching to atomic
[  212.344438] WARNING: CPU: 6 PID: 0 at lib/percpu-refcount.c:196 percpu_ref_switch_to_atomic_rcu+0x139/0x140
[  212.344439] Modules linked in: rfcomm(E) cmac(E) bnep(E) binfmt_misc(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) btusb(E) btrtl(E) btbcm(E) btintel(E) bluetooth(E) crct10dif_pclmul(E) crc32_pclmul(E) rfkill(E) ghash_clmulni_intel(E) jitterentropy_rng(E) aesni_intel(E) crypto_simd(E) cryptd(E) glue_helper(E) efi_pstore(E) drbg(E) ccp(E) ansi_cprng(E) ecdh_generic(E) ecc(E) acpi_cpufreq(E) nft_counter(E) efivarfs(E) crc32c_intel(E)
[  212.344452] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W   E     5.10.6-BET #1
[  212.344453] Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 1405 11/19/2019
[  212.344454] RIP: 0010:percpu_ref_switch_to_atomic_rcu+0x139/0x140
[  212.344456] Code: 80 3d f9 f0 72 01 00 0f 85 52 ff ff ff 49 8b 54 24 e0 49 8b 74 24 e8 48 c7 c7 88 5f 44 a6 c6 05 db f0 72 01 01 e8 ad 24 87 00 <0f> 0b e9 2e ff ff ff 41 55 49 89 f5 41 54 55 48 89 fd 53 48 83 ec
[  212.344456] RSP: 0018:ffffa9d98033cf20 EFLAGS: 00010282
[  212.344457] RAX: 0000000000000000 RBX: 7fffffffffffffe3 RCX: 0000000000000027
[  212.344457] RDX: 0000000000000027 RSI: ffff93cd7eb92e80 RDI: ffff93cd7eb92e88
[  212.344458] RBP: 0000360c00c0c328 R08: 0000000000000000 R09: c0000000ffffdfff
[  212.344458] R10: ffffa9d98033cd40 R11: ffffa9d98033cd38 R12: ffff93c68fb584a0
[  212.344459] R13: ffffffffa6765f10 R14: 0000000000000202 R15: ffffffffa6606100
[  212.344460] FS:  0000000000000000(0000) GS:ffff93cd7eb80000(0000) knlGS:0000000000000000
[  212.344460] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  212.344461] CR2: 000055585e41cc28 CR3: 0000000107fe8000 CR4: 0000000000350ee0
[  212.344461] Call Trace:
[  212.344462]  <IRQ>
[  212.344465]  rcu_core+0x196/0x420
[  212.344468]  __do_softirq+0xc9/0x214
[  212.344469]  asm_call_irq_on_stack+0x12/0x20
[  212.344470]  </IRQ>
[  212.344471]  do_softirq_own_stack+0x31/0x40
[  212.344473]  irq_exit_rcu+0x9a/0xa0
[  212.344474]  sysvec_apic_timer_interrupt+0x2c/0x80
[  212.344475]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[  212.344477] RIP: 0010:cpuidle_enter_state+0xd2/0x2e0
[  212.344478] Code: e8 93 22 6a ff 31 ff 49 89 c5 e8 29 2c 6a ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 c4 01 00 00 31 ff e8 a2 d8 6f ff fb 45 85 f6 <0f> 88 c9 00 00 00 49 63 ce be 68 00 00 00 4c 2b 2c 24 48 89 ca 48
[  212.344478] RSP: 0018:ffffa9d980167e80 EFLAGS: 00000202
[  212.344479] RAX: ffff93cd7eb9ce00 RBX: 0000000000000001 RCX: 000000000000001f
[  212.344479] RDX: 0000003170b6b110 RSI: 00000000239f541c RDI: 0000000000000000
[  212.344480] RBP: ffff93c68ea7e000 R08: 0000000000000002 R09: 000000000001c600
[  212.344480] R10: 000000c3ae2c0e44 R11: ffff93cd7eb9be24 R12: ffffffffa66d38e0
[  212.344481] R13: 0000003170b6b110 R14: 0000000000000001 R15: 0000000000000000
[  212.344483]  cpuidle_enter+0x30/0x50
[  212.344484]  do_idle+0x24f/0x290
[  212.344486]  cpu_startup_entry+0x1b/0x20
[  212.344487]  start_secondary+0x10b/0x150
[  212.344488]  secondary_startup_64_no_verify+0xb0/0xbb
[  212.344489] ---[ end trace 3cff32dbce8f0fd7 ]---
After these errors are thrown, the device becomes inaccessible and unmounting its filesystem generates additional errors:

Code:
[  756.097787] Buffer I/O error on dev dm-0, logical block 0, lost sync page write
[  756.097792] EXT4-fs (dm-0): I/O error while writing superblock
These errors occur with both the 5.9.15 kernel and the 5.10.6 kernel.
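A quick way to see what the failing requests have in common is to tally the operation names in the blk_update_request lines. The following is only a minimal sketch in Python, assuming the dmesg output has been saved as text; the function name count_failed_ops is made up for illustration and is not part of any tool.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   blk_update_request: I/O error, dev nvme1n1, sector 16350 op 0x9:(WRITE_ZEROES) flags ...
BLK_ERROR = re.compile(
    r"blk_update_request: I/O error, dev (?P<dev>[^,\s]+), sector \d+ "
    r"op 0x[0-9a-fA-F]+:\((?P<op>\w+)\)"
)


def count_failed_ops(log_text):
    """Return a Counter keyed by (device, operation) for block-layer I/O errors.

    Hypothetical helper: it only looks at the line format shown in this thread.
    """
    counts = Counter()
    for line in log_text.splitlines():
        match = BLK_ERROR.search(line)
        if match:
            counts[(match.group("dev"), match.group("op"))] += 1
    return counts
```

Calling count_failed_ops() on the text of a saved dmesg dump returns one entry per device and operation pair. In the log posted above, every blk_update_request error is a WRITE_ZEROES request to nvme1n1.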
 
Old 01-10-2021, 01:46 PM   #2
computersavvy
Senior Member
 
Registered: Aug 2016
Posts: 3,345

Rep: Reputation: 1486
According to the manual for that board, NVMe support can be either PCIe 3.0 x4 or PCIe 4.0 x4, depending on the processor installed.

I have a B550M board, and its manual explicitly states that the M.2-2 socket shares a bus with SATA 5 & 6, so only one can be used (either M.2-2 or SATA 5 & 6, but not both). Your drive that has the problem is in M.2-2, and I wonder if your board is similar even though your manual does not state it that way. It may be worth a try to see whether they are interfering by changing whichever SATA ports you are using.
 
Old 01-10-2021, 03:11 PM   #3
TheOneKEA
Member
 
Registered: Oct 2003
Distribution: Debian GNU/Linux 11 (amd64) w/kernel 6.0.15
Posts: 299

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by computersavvy
According to the manual for that board, NVMe support can be either PCIe 3.0 x4 or PCIe 4.0 x4, depending on the processor installed.

I have a B550M board, and its manual explicitly states that the M.2-2 socket shares a bus with SATA 5 & 6, so only one can be used (either M.2-2 or SATA 5 & 6, but not both). Your drive that has the problem is in M.2-2, and I wonder if your board is similar even though your manual does not state it that way. It may be worth a try to see whether they are interfering by changing whichever SATA ports you are using.
You've made an excellent point. I checked my motherboard manual and my hardware configuration, and it appears that the M.2_2 socket on the X570-P has its own dedicated connection to the X570 chipset and does not share lanes with the SATA ports. I don't have anything plugged into SATA5G or SATA6G anyway, so I don't believe this issue is caused by an underlying hardware mismatch.
 
Old 01-17-2021, 10:31 AM   #4
TheOneKEA
Member
 
Registered: Oct 2003
Distribution: Debian GNU/Linux 11 (amd64) w/kernel 6.0.15
Posts: 299

Original Poster
Rep: Reputation: 30
I updated my motherboard BIOS to version 3001 and modified some of the settings to ensure that the M.2_2 slot was properly configured for the NVMe drive, but I'm still getting the same kernel I/O errors. Other than the drive being defective in some way, I'm not sure where else to look to find out why this is happening.
 
Old 01-17-2021, 12:08 PM   #5
computersavvy
Senior Member
 
Registered: Aug 2016
Posts: 3,345

Rep: Reputation: 1486
Quote:
Originally Posted by TheOneKEA
I updated my motherboard BIOS to version 3001 and modified some of the settings to ensure that the M.2_2 slot was properly configured for the NVMe drive, but I'm still getting the same kernel I/O errors. Other than the drive being defective in some way, I'm not sure where else to look to find out why this is happening.

Does the BIOS see the drive properly?
Someone recently posted about a second drive that was acting flaky and found a (hidden) setting in the advanced BIOS options that fixed the issue. I would look there and read the BIOS section of the manual carefully in case you have a similar issue.
 
Old 01-17-2021, 12:42 PM   #6
TheOneKEA
Member
 
Registered: Oct 2003
Distribution: Debian GNU/Linux 11 (amd64) w/kernel 6.0.15
Posts: 299

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by computersavvy
Does the BIOS see the drive properly?
Someone recently posted about a second drive that was acting flaky and found a (hidden) setting in the advanced BIOS options that fixed the issue. I would look there and read the BIOS section of the manual carefully in case you have a similar issue.
Yes, the BIOS does see the drive properly. It has always been visible in the BIOS, even before I did the BIOS updates.
 
Old 01-23-2021, 06:50 PM   #7
TheOneKEA
Member
 
Registered: Oct 2003
Distribution: Debian GNU/Linux 11 (amd64) w/kernel 6.0.15
Posts: 299

Original Poster
Rep: Reputation: 30
After working with the NVMe maintainers, I was able to fix my drive by applying the following patch to my kernel source and recompiling:

Code:
diff -urN pci.c.orig pci.c
--- pci.c.orig  2021-01-20 21:24:32.124077095 -0500
+++ pci.c       2021-01-23 13:06:08.620757149 -0500
@@ -3219,6 +3219,8 @@
                .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
        { PCI_DEVICE(0x15b7, 0x2001),   /*  Sandisk Skyhawk */
                .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
+       { PCI_DEVICE(0x1d97, 0x2263),   /*  SPCC */
+               .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
        { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2001),
                .driver_data = NVME_QUIRK_SINGLE_VECTOR },
        { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2003) },
I haven't heard back yet from the NVMe maintainers to see if a patch like this one will be queued for inclusion in the Linux kernel.
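Since the patch matches the drive by its PCI vendor and device ID (0x1d97, 0x2263), it only helps a drive that reports the same pair. Below is a minimal Python sketch for checking that, assuming the controller appears under /sys/class/nvme with the usual PCI "vendor" and "device" attribute files in its device directory. The function names and the sysfs_root parameter are hypothetical and exist only for this illustration.

```python
from pathlib import Path

# PCI vendor/device pair taken from the patch above.
QUIRK_ID = (0x1D97, 0x2263)


def nvme_pci_ids(sysfs_root="/sys/class/nvme"):
    """Map each controller name (e.g. 'nvme1') to its (vendor, device) PCI IDs.

    Assumes <sysfs_root>/<controller>/device/{vendor,device} hold hex strings
    such as '0x1d97'. Controllers without those files are skipped.
    """
    ids = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return ids
    for ctrl in sorted(root.iterdir()):
        try:
            vendor = int((ctrl / "device" / "vendor").read_text().strip(), 16)
            device = int((ctrl / "device" / "device").read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # not PCI-attached, or the attribute is unreadable
        ids[ctrl.name] = (vendor, device)
    return ids


def controllers_matching(quirk_id=QUIRK_ID, sysfs_root="/sys/class/nvme"):
    """Return the names of controllers whose PCI IDs equal quirk_id."""
    return [name for name, pair in nvme_pci_ids(sysfs_root).items()
            if pair == quirk_id]
```

If controllers_matching() returns an empty list, the drive reports different IDs and would need its own entry in the quirk table rather than this one.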
 
Old 01-25-2021, 02:51 PM   #8
jefro
Moderator
 
Registered: Mar 2008
Posts: 22,361

Rep: Reputation: 3692
Thanks for the update and solution.
 
  



Tags
kernel, nvme, percpu, refcount_t, underflow

