Dear Linux community,
Kernel version:
Code:
[root@kudu-2 centos]# uname -r
3.10.0-957.1.3.el7.x86_64
Distro details:
Code:
[root@kudu-2 centos]# cat /etc/*release
CentOS Linux release 7.6.1810 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
CentOS Linux release 7.6.1810 (Core)
CentOS Linux release 7.6.1810 (Core)
This is the list of my NVMe devices:
Code:
[root@kudu-2 centos]# lspci -nn -D | grep -i ssd
0000:00:05.0 Non-Volatile memory controller [0108]: Intel Corporation PCIe Data Center SSD [8086:0953] (rev 01)
0000:00:06.0 Non-Volatile memory controller [0108]: Intel Corporation PCIe Data Center SSD [8086:0953] (rev 01)
0000:00:07.0 Non-Volatile memory controller [0108]: Intel Corporation PCIe Data Center SSD [8086:0953] (rev 01)
0000:00:08.0 Non-Volatile memory controller [0108]: Intel Corporation PCIe Data Center SSD [8086:0953] (rev 01)
I am having this issue with my NVMe drives:
Code:
May 3 09:39:42 kudu-2 kernel: nvme nvme2: I/O 157 QID 1 timeout, aborting
May 3 09:39:42 kudu-2 kernel: nvme nvme2: I/O 158 QID 1 timeout, aborting
May 3 09:39:42 kudu-2 kernel: nvme nvme2: I/O 159 QID 1 timeout, aborting
May 3 09:39:42 kudu-2 kernel: nvme nvme2: I/O 160 QID 1 timeout, aborting
May 3 09:40:13 kudu-2 kernel: nvme nvme2: I/O 157 QID 1 timeout, reset controller
May 3 09:40:43 kudu-2 kernel: nvme nvme2: I/O 20 QID 0 timeout, reset controller
May 3 09:41:33 kudu-2 kernel: nvme nvme2: Device not ready; aborting reset
May 3 09:41:33 kudu-2 kernel: nvme nvme2: Abort status: 0x7
May 3 09:41:33 kudu-2 kernel: nvme nvme2: Abort status: 0x7
May 3 09:41:33 kudu-2 kernel: nvme nvme2: Abort status: 0x7
May 3 09:41:33 kudu-2 kernel: nvme nvme2: Abort status: 0x7
May 3 09:41:48 kudu-2 kernel: INFO: task MaintenanceMgr :29348 blocked for more than 120 seconds.
What do the messages `QID 1 timeout, aborting`, `QID 1 timeout, reset controller` and `Device not ready; aborting reset` mean? Why does the drive get into this state, and how can I fix it?
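To make the sequence easier to compare across controllers, the messages can be tallied per device and event type. This is only a rough sketch: the function name `summarize_nvme_events` and the event labels are made up for illustration, and it assumes syslog lines in the same format as the excerpt above (on CentOS 7 these normally end up in /var/log/messages).
Code:
```shell
#!/bin/sh
# Hypothetical helper (not part of any NVMe tooling): counts the nvme
# driver messages shown in this post, grouped by controller and event type.
# Reads syslog-style lines on stdin and prints "<controller> <event> <count>".
summarize_nvme_events() {
    awk '
        # Only look at lines that contain "nvme nvmeN: "
        match($0, /nvme nvme[0-9]+: /) {
            ctrl = substr($0, RSTART + 5, RLENGTH - 7)   # e.g. "nvme2"
            msg  = substr($0, RSTART + RLENGTH)          # text after the prefix
            kind = ""
            if (msg ~ /timeout, aborting$/)              kind = "timeout_abort"
            else if (msg ~ /timeout, reset controller$/) kind = "timeout_reset"
            else if (msg ~ /^Device not ready/)          kind = "not_ready"
            else if (msg ~ /^Abort status/)              kind = "abort_status"
            if (kind != "") count[ctrl " " kind]++
        }
        END { for (k in count) print k, count[k] }
    ' | sort
}

# Example (the log path is an assumption, adjust as needed):
#   summarize_nvme_events < /var/log/messages
```
For the excerpt above this should report, for nvme2, four `timeout_abort`, two `timeout_reset`, one `not_ready` and four `abort_status` events.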