LinuxQuestions.org
Linux - Kernel: This forum is for all discussion relating to the Linux kernel.

05-31-2018, 04:31 AM   #1
albr0da (LQ Newbie)
NVMe drive kernel failure after hotplug on kernel 4.16.12


Hi,

I have an application that issues read and write commands to an NVMe drive. It can happen that the drive loses power in the middle of such a command.
After a few seconds the power is restored. I then remove the drive and trigger a PCI rescan, after which the drive is available again.
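
For reference, the remove-and-rescan step can also be done from C instead of the shell. A minimal sketch (it assumes the drive sits at 0000:03:00.0, as in the dmesg log below):
Code:
/* Minimal sketch of the remove-and-rescan sequence from C.
 * Assumes the drive is at 0000:03:00.0, as in the dmesg log. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static void write_one(const char *path)
{
        int fd = open(path, O_WRONLY);

        if (fd < 0) {
                perror(path);
                return;
        }
        /* the PCI core only needs a "1" written to the attribute */
        if (write(fd, "1", 1) != 1)
                perror(path);
        close(fd);
}

int main(void)
{
        /* same as: echo 1 > /sys/bus/pci/devices/0000:03:00.0/remove */
        write_one("/sys/bus/pci/devices/0000:03:00.0/remove");
        /* same as: echo 1 > /sys/bus/pci/rescan */
        write_one("/sys/bus/pci/rescan");
        return 0;
}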

The problem is that sometimes, if the drive's power is already gone when a read or write command is issued, the command blocks and my application freezes.
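
For illustration, a reduced stand-alone example of the kind of write that hangs looks like this. This is hypothetical code, not my actual application, but it matches the O_DIRECT pwrite path visible in the stack trace below:
Code:
/* Reduced, hypothetical example of a direct write to the raw NVMe
 * block device. Once the controller has lost power, this pwrite()
 * can block indefinitely. Warning: writes raw bytes to the device;
 * for illustration only. */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
        void *buf;
        int fd = open("/dev/nvme0n1", O_WRONLY | O_DIRECT);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        /* O_DIRECT requires sector-aligned buffers */
        if (posix_memalign(&buf, 4096, 4096))
                return 1;

        /* this is where the application gets stuck in D state */
        if (pwrite(fd, buf, 4096, 0) < 0)
                perror("pwrite");

        free(buf);
        close(fd);
        return 0;
}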

It is also not possible to remove the drive with
"echo 1 > /sys/bus/pci/devices/<pci-slot>/remove"
After the remove command the console hangs.

lspci -v shows the following output:
03:00.0 Non-Volatile memory controller: Device 1987:5008 (rev ff) (prog-if ff)
!!! Unknown header type 7f

It is even impossible to shut down the computer, because Linux cannot stop my application's process.

Below is the output of dmesg. It covers two power cycles of the NVMe drive. After the first cycle the removal is detected and the pending read/write command fails with "print_req_error:".
After the second cycle this does not happen; instead,
[ 2296.834008] INFO: task kworker/u8:0:3133 blocked for more than 120 seconds.
[ 2296.834011] Not tainted 4.16.12 #1
[ 2296.834012] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
is reported.

Below is the complete log.

Quote:
[ 1994.821647] nvme nvme0: pci function 0000:03:00.0
[ 1994.821655] nvme 0000:03:00.0: enabling device (0000 -> 0002)
[ 1996.077176] nvme nvme0: failed to set APST feature (-19)
[ 1996.137178] pci 0000:03:00.0: [1987:5008] type 00 class 0x010802
[ 1996.137209] pci 0000:03:00.0: reg 0x10: [mem 0xf7000000-0xf7003fff 64bit]
[ 1996.137216] pci 0000:03:00.0: reg 0x18: [io 0xe000-0xe007]
[ 1996.137552] pci 0000:03:00.0: BAR 0: assigned [mem 0xf7000000-0xf7003fff 64bit]
[ 1996.137560] pci 0000:03:00.0: BAR 2: assigned [io 0xe000-0xe007]
[ 1996.137563] pci 0000:05:00.0: PCI bridge to [bus 06]
[ 1996.137695] nvme nvme0: pci function 0000:03:00.0
[ 2062.094221] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
[ 2062.162220] nvme 0000:03:00.0: Refused to change power state, currently in D3
[ 2062.162279] nvme nvme0: Removing after probe failure status: -19
[ 2062.186224] nvme0n1: detected capacity change from 120034123776 to 0
[ 2062.186233] print_req_error: I/O error, dev nvme0n1, sector 151210245
[ 2062.286232] nvme nvme0: failed to set APST feature (-19)
[ 2066.754332] pci 0000:03:00.0: [1987:5008] type 00 class 0x010802
[ 2066.754358] pci 0000:03:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
[ 2066.754364] pci 0000:03:00.0: reg 0x18: [io 0x0000-0x0007]
[ 2066.754389] pci 0000:03:00.0: Max Payload Size set to 256 (was 128, max 256)
[ 2066.754704] pci 0000:03:00.0: BAR 0: assigned [mem 0xf7000000-0xf7003fff 64bit]
[ 2066.754713] pci 0000:03:00.0: BAR 2: assigned [io 0xe000-0xe007]
[ 2066.754717] pci 0000:05:00.0: PCI bridge to [bus 06]
[ 2066.754845] nvme nvme0: pci function 0000:03:00.0
[ 2066.754855] nvme 0000:03:00.0: enabling device (0000 -> 0002)
[ 2068.054326] nvme nvme0: failed to set APST feature (-19)
[ 2068.114351] pci 0000:03:00.0: [1987:5008] type 00 class 0x010802
[ 2068.114379] pci 0000:03:00.0: reg 0x10: [mem 0xf7000000-0xf7003fff 64bit]
[ 2068.114387] pci 0000:03:00.0: reg 0x18: [io 0xe000-0xe007]
[ 2068.114748] pci 0000:03:00.0: BAR 0: assigned [mem 0xf7000000-0xf7003fff 64bit]
[ 2068.114755] pci 0000:03:00.0: BAR 2: assigned [io 0xe000-0xe007]
[ 2068.114759] pci 0000:05:00.0: PCI bridge to [bus 06]
[ 2068.114871] nvme nvme0: pci function 0000:03:00.0
[ 2125.071254] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
[ 2125.139251] nvme 0000:03:00.0: Refused to change power state, currently in D3
[ 2125.139309] nvme nvme0: Removing after probe failure status: -19
[ 2125.155254] nvme0n1: detected capacity change from 120034123776 to 0
[ 2296.834008] INFO: task kworker/u8:0:3133 blocked for more than 120 seconds.
[ 2296.834011] Not tainted 4.16.12 #1
[ 2296.834012] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2296.834013] kworker/u8:0 D 0 3133 2 0x80000000
[ 2296.834018] Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
[ 2296.834018] Call Trace:
[ 2296.834022] __schedule+0x1a1/0x723
[ 2296.834023] schedule+0x2e/0x70
[ 2296.834025] blk_mq_freeze_queue_wait+0x35/0x80
[ 2296.834027] ? do_wait_intr+0x70/0x70
[ 2296.834028] blk_freeze_queue+0x21/0x40
[ 2296.834030] blk_cleanup_queue+0x6c/0x100
[ 2296.834032] nvme_ns_remove+0xe5/0x140 [nvme_core]
[ 2296.834033] nvme_remove_namespaces+0x25/0x50 [nvme_core]
[ 2296.834034] nvme_remove+0x7a/0x100 [nvme]
[ 2296.834036] pci_device_remove+0x38/0xc0
[ 2296.834038] device_release_driver_internal+0x121/0x1d0
[ 2296.834039] device_release_driver+0x11/0x20
[ 2296.834040] nvme_remove_dead_ctrl_work+0x2c/0x50 [nvme]
[ 2296.834041] process_one_work+0x110/0x2d0
[ 2296.834042] worker_thread+0x31/0x320
[ 2296.834044] kthread+0xe2/0x110
[ 2296.834044] ? process_scheduled_works+0x30/0x30
[ 2296.834046] ? kthread_worker_fn+0x150/0x150
[ 2296.834047] ret_from_fork+0x2e/0x38
[ 2296.834049] INFO: task MyApplicationName:13566 blocked for more than 120 seconds.
[ 2296.834050] Not tainted 4.16.12 #1
[ 2296.834051] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2296.834051] MyApplicationName D 0 13566 13496 0x00000000
[ 2296.834052] Call Trace:
[ 2296.834053] __schedule+0x1a1/0x723
[ 2296.834054] ? blk_flush_plug_list+0x9c/0x1c0
[ 2296.834055] schedule+0x2e/0x70
[ 2296.834056] io_schedule+0x15/0x40
[ 2296.834058] __blkdev_direct_IO_simple+0x1f1/0x320
[ 2296.834059] ? __enqueue_entity+0x6b/0x80
[ 2296.834061] ? enqueue_task_fair+0x6fd/0xe20
[ 2296.834062] ? bdget+0x100/0x100
[ 2296.834063] blkdev_direct_IO+0x358/0x3c0
[ 2296.834065] ? filemap_write_and_wait_range+0x7d/0xa0
[ 2296.834066] ? generic_update_time+0x7b/0x130
[ 2296.834067] generic_file_direct_write+0x107/0x1e0
[ 2296.834069] __generic_file_write_iter+0xa2/0x200
[ 2296.834070] blkdev_write_iter+0xa7/0x150
[ 2296.834071] __vfs_write+0xd8/0x140
[ 2296.834073] vfs_write+0xa0/0x1b0
[ 2296.834074] ? __fget_light+0x2a/0x80
[ 2296.834075] SyS_pwrite64+0x83/0xb0
[ 2296.834076] do_fast_syscall_32+0x95/0x188
[ 2296.834078] entry_SYSENTER_32+0x4e/0x7c
[ 2296.834078] EIP: 0xb7f5de11
[ 2296.834079] EFLAGS: 00200293 CPU: 3
[ 2296.834079] EAX: ffffffda EBX: 0000000e ECX: b3e17000 EDX: 0000fc00
[ 2296.834080] ESI: 05535c00 EDI: 00000000 EBP: b166a088 ESP: b166a058
[ 2296.834081] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
This problem appears on the following systems:
- Linux 4.16.12-1-ARCH x86_64 GNU/Linux (the log above is from this system)
- Linux 4.16.12, Debian Wheezy, x86_32 GNU/Linux
- Linux 4.15.3, Debian Wheezy, x86_32 GNU/Linux

On Linux 4.13.12 the problem does not appear.

Do you have any idea what could cause the problem?

Thanks!
 
05-31-2018, 02:14 PM   #2
Mara (Moderator)
Do you have hotplug enabled in your kernel? I do not see any of the kernel hotplug events in your log.
 
06-01-2018, 05:19 AM   #3
AwesomeMachine (LQ Guru)
I wouldn't consider this a problem; I doubt the kernel developers anticipated the need for such functionality. I would check for the device: if it isn't present, send the command to a buffer instead. You can also contact the developers and ask that the functionality you want be added.
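
Something like this, for example (an untested sketch; the existence check is inherently racy, and the actual buffering and the name of the device node are up to your application):
Code:
/* Untested sketch: only issue the write if the namespace node still
 * exists; otherwise let the caller queue the data in its own buffer.
 * Note the check is racy: the drive can vanish between access()
 * and pwrite(). */
#include <sys/types.h>
#include <unistd.h>

ssize_t write_or_buffer(int fd, const void *buf, size_t len, off_t off)
{
        /* when the drive drops off the bus, /dev/nvme0n1 goes away */
        if (access("/dev/nvme0n1", F_OK) != 0)
                return -1;      /* caller should buffer the data */
        return pwrite(fd, buf, len, off);
}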
 
06-12-2018, 06:41 AM   #4
albr0da (Original Poster)
Thanks for your replies.

Quote:
Originally Posted by Mara
Do you have hotplug enabled in your kernel? I do not see any of the kernel hotplug events in your log.
Yes, I do.

I will contact the nvme driver developers and post their reply here if there is a solution.
 
07-05-2018, 12:56 AM   #5
albr0da (Original Poster)
There is a patch that solves the problem here:

https://git.kernel.org/pub/scm/linux...0b5186d5551ba1

Thanks for the help
 
  

