LinuxQuestions.org

GustavusHolmiensis 04-25-2021 02:04 PM

Trouble with booting Debian 10 from an NVMe SSD
 
[I previously started a thread with a similar question. I don't know whether the thread was deleted by a moderator or accidentally by myself. In the belief that it may have been the latter, I am making a new attempt. In case it was the former, please let me know what needs to be improved.]

I have a custom built computer with the following hardware specs:

*Motherboard: ASRock B550 Pro4
*Processor: AMD Ryzen 5 3600 6-Core
*RAM: Two 8 GiB DDR4-2400 modules for a total of 16 GiB RAM
*Storage: Kingston SA2000M8500G 500 GB NVMe SSD mounted in an M.2 port
*Graphics card: ASUS PCIe GT710-SL-2GD5

I installed Debian 10.6 on the computer, and it ran well for a while. I recall installing Nvidia drivers at some point, and that I had trouble afterwards because I forgot to purge the nouveau drivers. Unfortunately, I don't remember if I ended up removing the Nvidia drivers or not in order to solve the problem.

On a few occasions, at least two but possibly more, the computer has seemingly frozen completely. It did not respond to keyboard or mouse input, and I had to reboot it by pressing the power button.

On at least one of these occasions, I was participating in a video conference over Zoom. On this occasion, the video and sound feed from my computer to the other participant still worked, and I was also able to receive the sound feed from the other participant. Aside from that, the computer was completely frozen and I had to attempt a reboot by pressing the power button.

However, I have not been able to boot the computer since. If I try, the following happens:

*I get the following info on the screen:
Code:

[    0.004219] do_IRQ: 1.55 No irq handler for vector
 [    0.004219] do_IRQ: 2.55 No irq handler for vector
 [    0.004219] do_IRQ: 3.55 No irq handler for vector
 [    0.004219] do_IRQ: 4.55 No irq handler for vector
 [    0.004219] do_IRQ: 5.55 No irq handler for vector
 [    0.004219] do_IRQ: 6.55 No irq handler for vector
 [    0.004219] do_IRQ: 7.55 No irq handler for vector
 [    0.004219] do_IRQ: 8.55 No irq handler for vector
 [    0.004219] do_IRQ: 9.55 No irq handler for vector
 [    0.004219] do_IRQ: 10.55 No irq handler for vector
 /dev/nvme0n1p2: recovering journal
 /dev/nvme0n1p2: clean, 626654/4890624 files, 6477569/19531264 blocks
 [    3.950331] sp5100-tco sp5100-tco: Watchdog hardware is disabled
 fsckd-cancel-msg:Press Ctrl+C to cancel all filesystem checks in progress

[Note that the do_IRQ messages have always appeared, even before the problems started.]

*I press Ctrl+C but nothing happens. After waiting for a while, I get some messages that seem related to my NVMe SSD:
Code:

[  242.827095] INFO: task jbd2/nvme0n1p2-:313 blocked for more than 120 seconds.
[  242.827114]      Not tainted 4.19.0-14-amd64 #1 Debian 4.19.171-2
[  242.827120] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.827222] INFO: task fsck.ext4:504 blocked for more than 120 seconds.
[  242.827228]      Not tainted 4.19.0-14-amd64 #1 Debian 4.19.171-2
[  242.827233] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.827285] INFO: task kworker/0:2H:614 blocked for more than 120 seconds.
[  242.827290]      Not tainted 4.19.0-14-amd64 #1 Debian 4.19.171-2
[  242.827295] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.623079] nvme nvme0: Device not ready: aborting reset

*After waiting a little more (or possibly after pressing Esc), I get a lot of messages, mostly consisting of repeats of the following:
Code:

          Unmounting /boot/efi/...
          Deactivating swap /dev/disk/by-uuid/4fc3549b-84a0-4b33-bf8a-9884c65e7958.
  [FAILED] Failed unmounting /boot/efi...
  [FAILED] Failed deactivating swap /dev/disk/by-uuid/4fc3549b-84a0-4b33-bf8a-9884c65e7958.

*The system then gets stuck after the following message:
Code:

  [OK] Stopped File System Check on /dev/disk/by-uuid/A909-2B14

*I have tried to further diagnose the error by booting in rescue mode from a Debian live USB. Running fdisk -l gives the following output (sans stuff about the USB device):
Code:

Device     Boot Start     End Sectors  Size Id Type
/dev/sda1  *        0 7758431 7758432  3.7G  0 Empty
/dev/sda2       23680   29343    5664  2.8M ef EFI (FAT-12/16/32)

Following suggestions which I received, I have done the following:

*I tried a normal boot and pressed E in the GRUB menu. I got the following:
Code:

setparams 'System setup'

        fwsetup

Thus, there is no line beginning with "linu". Should I add such a line myself?

*I then rebooted the computer in rescue mode from my Debian live USB. This time, my NVMe was apparently detected. In the window where I pick a device to use as the root file system, I got the following:
Code:

/dev/nvme0n1p1
/dev/nvme0n1p2
/dev/nvme0n1p3
/dev/nvme0n1p4
/dev/sda1
/dev/sda2
Assemble RAID array
Do not use a root file system.

*Because I was unsure what to do, I picked "Do not use a root file system", and opened an interactive shell. I ran blkid and got the following:
Code:

/dev/nvme0n1: PTUUID="9f5a6318-2b4e-4ad8-b58d-269e7445b202" PTTYPE="gpt"
/dev/nvme0n1p1: UUID="A909-2B14" TYPE="vfat" PARTUUID="fb0938f6-2d49-4895-902d-73d38110a938"
/dev/nvme0n1p2: UUID="d163bcb3-a38e-4691-a183-a27b729d57c6" TYPE="ext4" PARTUUID="b7e6a481-400d-4be1-bd25-b023cfe8287"
/dev/nvme0n1p3: UUID="4fc3549b-84a0-4b33-bf8a-9884c65d7958" TYPE="swap" PARTUUID="3dee7513-4f57-4c26-bc01-88351440246a"
/dev/nvme0n1p4: UUID="52684e96-3941-438e-a85a-bbf9458fca70" TYPE="ext4" PARTUUID="edb45be3-beaa-408e-9963-f78d4c3ee6b5"
/dev/sda1: UUID="2020-09-26-11-15-10-00" LABEL="Debian 10.6.0 amd64 1" TYPE="iso9660" PTUUID="7533593e" PTTYPE="dos" PARTUUID="7533593e-01"
/dev/sda2: SEC_TYPE="msdos" UUID="6201-48E3" TYPE="vfat" PARTUUID="7533593e-02"

*Next, I tried to run efibootmgr -v, but the command was not found.

*I then ran parted -l, and got the following:
Code:

Warning: The driver descriptor says the physical block size is 2048 bytes, but Linux says it is 512 bytes.
Ignore/Cancel?

I picked Cancel, and then got the following:
Code:

Model: SanDisk Cruzer Blade (scsi)
Disk /dev/sda: 16.0GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

Model: KINGSTON SA2000M8500G (nvme)
Disk /dev/nvme0n1: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system     Name  Flags
 1      1049kB  538MB   537MB   fat32                 boot, esp
 2      538MB   80.5GB  80.0GB  ext4
 3      80.5GB  101GB   20.0GB  linux-swap(v1)
 4      101GB   500GB   400GB   ext4

*Since it appeared that the NVMe disk was now found, I tried a normal reboot. I got the following info on the screen. Note that it is not identical to what I got before.
Code:

[    0.004218] do_IRQ: 1.55 No irq handler for vector
 [    0.004218] do_IRQ: 2.55 No irq handler for vector
 [    0.004218] do_IRQ: 3.55 No irq handler for vector
 [    0.004218] do_IRQ: 4.55 No irq handler for vector
 [    0.004218] do_IRQ: 5.55 No irq handler for vector
 [    0.004218] do_IRQ: 6.55 No irq handler for vector
 [    0.004218] do_IRQ: 7.55 No irq handler for vector
 [    0.004218] do_IRQ: 8.55 No irq handler for vector
 [    0.004218] do_IRQ: 9.55 No irq handler for vector
 [    0.004218] do_IRQ: 10.55 No irq handler for vector
 [    3.441147] sd 6:0:0:0:0: [sda] No Caching mode page found
 [    3.441147] sd 6:0:0:0:0: [sda] Assuming drive cache: write through
 /dev/nvme0n1p2: recovering journal
 /dev/nvme0n1p2: clean, 626654/4890624 files, 6477569/19531264 blocks
 [    4.010116] sp5100-tco sp5100-tco: Watchdog hardware is disabled
 fsckd-cancel-msg:Press Ctrl+C to cancel all filesystem checks in progress

*After waiting for a while, I got the following (repeated a few times):
Code:

[  242.827430] INFO: task jbd2/nvme0n1p2-:313 blocked for more than 120 seconds.
[  242.827449]      Not tainted 4.19.0-14-amd64 #1 Debian 4.19.171-2
[  242.827455] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

*After waiting a little more, I got a lot of messages, mostly consisting of repeats of the following:
Code:

          Unmounting /boot/efi/...
          Deactivating swap /dev/disk/by-uuid/4fc3549b-84a0-4b33-bf8a-9884c65e7958.
  [FAILED] Failed unmounting /boot/efi...
  [FAILED] Failed deactivating swap /dev/disk/by-uuid/4fc3549b-84a0-4b33-bf8a-9884c65e7958.

*I hit Esc, and got the following:
Code:

[  OK  ] Stopped File System Check on /dev/disk/by-uuid/A909-2B14.
fsckd-cancel-msg:Press Ctrl+C to cancel all filesystem checks in progress.

*I left the system for a few hours, in case it was running a file system check. Nothing seemed to happen, so I rebooted using Alt+SysRq+B. Once again, I booted in rescue mode from my Debian live USB. This time, the NVMe was not found. In the window where I pick a device to use as the root file system, I got the following:
Code:

/dev/sda1
/dev/sda2
Assemble RAID array
Do not use a root file system.
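
Since the NVMe drive shows up on some boots and not on others, a quick check from the rescue shell could record which case a given boot falls into. The following is only a minimal sketch, assuming a POSIX sh is available in the rescue environment. The function name check_nvme is hypothetical, and /dev/nvme0n1 is simply the device name that appears in the blkid output earlier in this post.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Debian tool): report whether a
# block device node exists. Defaults to /dev/nvme0n1, the name taken
# from the blkid output in this thread.
check_nvme() {
    dev="${1:-/dev/nvme0n1}"
    if [ -b "$dev" ]; then
        # The node exists and is a block device.
        echo "found: $dev"
        return 0
    fi
    # The node is absent or is not a block device.
    echo "missing: $dev"
    return 1
}
```

Calling check_nvme with no argument tests /dev/nvme0n1. When it reports the device as missing, running dmesg | grep -i nvme in the same shell should show whatever the kernel logged about the controller during that boot, if anything.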

Any help would be greatly appreciated!

Emerson 04-26-2021 07:24 AM

Can't find your own posts? :confused: What an excuse for double posting.

GustavusHolmiensis 04-26-2021 08:09 AM

Thank you for pointing out the existence of the other thread, Emerson!

I assure you that it was an honest mistake. When I tried to post a reply to my original thread yesterday, it somehow disappeared. When I clicked on either "My Posts" or "My Threads" there was nothing there. Hence, I started this new thread and also prefaced my OP with the message that I believed I had accidentally deleted my own thread. I had probably not done so, but I don't know why I could not see it yesterday.

Thus, the present thread can be deleted. I am not sure how to do so myself, however.

boughtonp 04-26-2021 08:15 AM


 
Quote:

Originally Posted by GustavusHolmiensis (Post 6245047)
Thus, the present thread can be deleted. I am not sure how to do so myself, however.

Use the report button to ask a moderator to do it.


In the meantime, here is the direct link to the other thread:
https://www.linuxquestions.org/questions/linux-hardware-18/computer-with-debian-10-6-won%27t-boot-after-several-freezing-incidents-4175694205


GustavusHolmiensis 04-26-2021 08:24 AM

Thanks, boughtonp! I'll do that.


All times are GMT -5.