Hi. I purchased a new ThinkPad T14 Gen 4 laptop with an AMD GPU, and it sporadically crashes my X server (running Openbox on Debian Bookworm).
GPU according to lspci:
Code:
64:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Phoenix1 (rev dd)
The crash is reproducible with any of the glmark2 benchmarks (or at least the scenarios build, shading and texture) and occurs exactly at the end of the benchmark.
I tested with kernels 6.1, 6.6, 6.7 and 6.8, and the behavior is the same.
The crash produces the following log output:
Code:
May 21 12:18:51 laptop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=61972, emitted seq=61974
May 21 12:18:51 laptop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process glmark2 pid 3403 thread glmark2:cs0 pid 3404
May 21 12:18:51 laptop kernel: amdgpu 0000:64:00.0: amdgpu: GPU reset begin!
May 21 12:18:52 laptop kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
May 21 12:18:52 laptop kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
May 21 12:18:52 laptop kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
May 21 12:18:52 laptop kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
May 21 12:18:52 laptop kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
May 21 12:18:52 laptop kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
May 21 12:18:52 laptop kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
May 21 12:18:52 laptop kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
May 21 12:18:53 laptop kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
May 21 12:18:53 laptop kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
May 21 12:18:53 laptop kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
May 21 12:18:53 laptop kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
May 21 12:18:53 laptop kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
May 21 12:18:53 laptop kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
May 21 12:18:53 laptop kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
May 21 12:18:53 laptop kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
May 21 12:18:53 laptop kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
May 21 12:18:53 laptop kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
May 21 12:18:53 laptop kernel: [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
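Since the log is quite repetitive (the "MES failed to response" and "failed to unmap legacy queue" messages each appear nine times), a condensed view may be easier to read. Below is a minimal sketch of how the errors could be tallied; `summarize_errors` is a hypothetical helper name, and it assumes the log lines carry the `*ERROR*` marker as above.

```shell
#!/bin/sh
# summarize_errors (hypothetical helper): read kernel log lines on stdin,
# keep only lines containing the literal "*ERROR*" marker, strip everything
# up to and including that marker, and count each distinct message.
summarize_errors() {
    grep -F '*ERROR*' | sed 's/.*\*ERROR\* //' | sort | uniq -c | sort -rn
}

# Possible usage on a systemd machine (kernel messages of the previous boot):
#   journalctl -k -b -1 | summarize_errors
```

The timeout line includes sequence numbers, so repeated timeouts with different seq values would be counted as separate messages.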
I also tried the following kernel parameters, but no joy:
Code:
iommu=soft amdgpu.runpm=0 amdgpu.sg_display=0
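To rule out a typo or a bootloader config that was never regenerated, it may be worth confirming that these parameters actually reached the running kernel. A minimal sketch, assuming the usual `/proc/cmdline`; `check_params` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# check_params (hypothetical helper): given a kernel command line string as
# the first argument, report for each following argument whether it appears
# as a whole whitespace-separated token.
check_params() {
    cmdline="$1"
    shift
    for p in "$@"; do
        case " $cmdline " in
            *" $p "*) echo "present: $p" ;;
            *)        echo "missing: $p" ;;
        esac
    done
}

# Check the command line of the currently running kernel.
check_params "$(cat /proc/cmdline 2>/dev/null)" \
    iommu=soft amdgpu.runpm=0 amdgpu.sg_display=0
```

Any parameter reported as missing was not passed at boot, so the test with it would need repeating.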
Installed firmware packages:
Code:
firmware-amd-graphics install
firmware-atheros install
firmware-intel-sound install
firmware-iwlwifi install
firmware-linux install
firmware-linux-free install
firmware-linux-nonfree install
firmware-misc-nonfree install
firmware-realtek install
firmware-sof-signed install
Any hints as to what else I could try would be much appreciated!