I recently compiled kernel 4.4 for use with BLFS on a ThinkPad. I am trying this kernel branch because some i915 graphics issues _might_ have been fixed well enough to make hibernate/resume workable when booting with an initrd. With earlier kernels, hibernate failed almost every time (it succeeded only by occasional accident) because of graphics problems: invalid ROM contents, since resetting anything on the GPU makes the resumed data no longer match the checksum of the swap/resume image. With this kernel, resume has been reliable so far, but other problems keep me from running it long enough to test properly.
First, I am having serious problems using an external hard drive. This is a 2.5-inch SATA drive (a traditional hard drive, not solid state) in a USB enclosure. When I connect the USB cable, I get errors such as this:
Code:
Nov 2 22:36:14 hostname kernel: [ 170.563324] usb 1-1: new high-speed USB device number 3 using ehci-pci
Nov 2 22:36:14 hostname kernel: [ 170.681163] usb 1-1: New USB device found, idVendor=13fd, idProduct=3940
Nov 2 22:36:14 hostname kernel: [ 170.681177] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Nov 2 22:36:14 hostname kernel: [ 170.681184] usb 1-1: Product: MK1665GSX
Nov 2 22:36:14 hostname kernel: [ 170.681191] usb 1-1: Manufacturer: TOSHIBA
Nov 2 22:36:14 hostname kernel: [ 170.681197] usb 1-1: SerialNumber: 30303030303030303030303030303030
Nov 2 22:36:14 hostname kernel: [ 170.681780] usb-storage 1-1:1.0: USB Mass Storage device detected
Nov 2 22:36:14 hostname kernel: [ 170.682714] scsi host4: usb-storage 1-1:1.0
Nov 2 22:36:15 hostname kernel: [ 171.685542] scsi 4:0:0:0: Direct-Access TOSHIBA MK1665GSX 0204 PQ: 0 ANSI: 6
Nov 2 22:36:15 hostname kernel: [ 171.686488] sd 4:0:0:0: Attached scsi generic sg2 type 0
Nov 2 22:36:15 hostname kernel: [ 171.691693] sd 4:0:0:0: [sdb] Spinning up disk...
Nov 2 22:36:16 hostname kernel: [ 172.692342] .ready
Nov 2 22:36:16 hostname kernel: [ 172.693512] sd 4:0:0:0: [sdb] 312581807 512-byte logical blocks: (160 GB/149 GiB)
Nov 2 22:36:16 hostname kernel: [ 172.694314] sd 4:0:0:0: [sdb] Write Protect is off
Nov 2 22:36:16 hostname kernel: [ 172.694321] sd 4:0:0:0: [sdb] Mode Sense: 1f 00 10 08
Nov 2 22:36:16 hostname kernel: [ 172.695347] sd 4:0:0:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
Nov 2 22:36:16 hostname kernel: [ 172.738591] sdb: sdb1
Nov 2 22:36:16 hostname kernel: [ 172.742094] sd 4:0:0:0: [sdb] Attached SCSI disk
Nov 2 22:36:16 hostname kernel: [ 173.010061] sd 4:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_SENSE
Nov 2 22:36:16 hostname kernel: [ 173.010070] sd 4:0:0:0: [sdb] tag#0 Sense Key : Hardware Error [current] [descriptor]
Nov 2 22:36:16 hostname kernel: [ 173.010074] sd 4:0:0:0: [sdb] tag#0 Add. Sense: No additional sense information
Nov 2 22:36:16 hostname kernel: [ 173.010079] sd 4:0:0:0: [sdb] tag#0 CDB: ATA command pass through(12)/Blank a1 06 20 00 00 00 00 00 00 e5 00 00
Nov 2 22:36:17 hostname kernel: [ 173.388546] sd 4:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_SENSE
Nov 2 22:36:17 hostname kernel: [ 173.388554] sd 4:0:0:0: [sdb] tag#0 Sense Key : Hardware Error [current] [descriptor]
Nov 2 22:36:17 hostname kernel: [ 173.388558] sd 4:0:0:0: [sdb] tag#0 Add. Sense: No additional sense information
Nov 2 22:36:17 hostname kernel: [ 173.388564] sd 4:0:0:0: [sdb] tag#0 CDB: ATA command pass through(12)/Blank a1 06 20 da 00 00 4f c2 00 b0 00 00
Nov 2 22:36:37 hostname kernel: [ 193.748218] EXT4-fs (dm-2): mounting ext3 file system using the ext4 subsystem
Nov 2 22:36:37 hostname kernel: [ 193.807845] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: (null)
Errors like these would ordinarily point to a failing hard drive, but they only appear with kernel 4.4, and I even saw similar errors when I plugged in an ordinary flash drive. With kernel 3.2 or 3.14, the drive starts right up and all is well:
Code:
Nov 4 03:33:17 hostname kernel: [ 27.695330] usb 1-1: new high-speed USB device number 3 using ehci_hcd
Nov 4 03:33:17 hostname kernel: [ 27.812952] usb 1-1: New USB device found, idVendor=13fd, idProduct=3940
Nov 4 03:33:17 hostname kernel: [ 27.813079] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Nov 4 03:33:17 hostname kernel: [ 27.813170] usb 1-1: Product: MK1665GSX
Nov 4 03:33:17 hostname kernel: [ 27.813232] usb 1-1: Manufacturer: TOSHIBA
Nov 4 03:33:17 hostname kernel: [ 27.813286] usb 1-1: SerialNumber: 30303030303030303030303030303030
Nov 4 03:33:17 hostname kernel: [ 27.814714] scsi4 : usb-storage 1-1:1.0
Nov 4 03:33:18 hostname kernel: [ 28.817793] scsi 4:0:0:0: Direct-Access TOSHIBA MK1665GSX 0204 PQ: 0 ANSI: 6
Nov 4 03:33:18 hostname kernel: [ 28.818625] sd 4:0:0:0: Attached scsi generic sg2 type 0
Nov 4 03:33:19 hostname kernel: [ 28.822108] sd 4:0:0:0: [sdb] Spinning up disk....ready
Nov 4 03:33:19 hostname kernel: [ 29.824408] sd 4:0:0:0: [sdb] 312581807 512-byte logical blocks: (160 GB/149 GiB)
Nov 4 03:33:19 hostname kernel: [ 29.825424] sd 4:0:0:0: [sdb] Write Protect is off
Nov 4 03:33:19 hostname kernel: [ 29.825497] sd 4:0:0:0: [sdb] Mode Sense: 1f 00 10 08
Nov 4 03:33:19 hostname kernel: [ 29.826252] sd 4:0:0:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
Nov 4 03:33:19 hostname kernel: [ 29.866420] sdb: sdb1
Nov 4 03:33:19 hostname kernel: [ 29.870627] sd 4:0:0:0: [sdb] Attached SCSI disk
It seems to me that kernel 4.4 isn't getting the "sense" information right. This is dangerous: shortly after I connected this drive and did some routine read/write activity, my entire desktop locked up. After seeing these errors I booted from a rescue CD (with an older kernel) and ran fsck to recover the ext3 journal.
Does anyone know which part of the kernel is responsible for this sort of thing, or know of a patch or a configuration option that is now mandatory but wasn't required in earlier versions?
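As a side note, the two failing commands in the 4.4 log can be decoded by hand. Below is a minimal sketch in Python, assuming the 12-byte ATA PASS-THROUGH layout from the SAT specification (opcode a1, protocol in byte 1, CK_COND in byte 2, features in byte 3, ATA command in byte 9). The function name and the small lookup tables are only illustrative and cover just the values that appear in this log. If the decoding is right, the first command is CHECK POWER MODE and the second is SMART RETURN STATUS, which are requests that typically come from user-space tools (SMART or power-state polling) rather than from the sd driver itself, though that is an assumption and not something the log proves.

```python
# Decode a 12-byte ATA PASS-THROUGH CDB as printed in the kernel log.
# Field offsets are assumed from the SAT spec; the name tables are
# deliberately tiny and only cover the values seen in this log.

ATA_COMMANDS = {0xE5: "CHECK POWER MODE", 0xB0: "SMART"}
SMART_FEATURES = {0xDA: "RETURN STATUS"}
PROTOCOLS = {3: "non-data"}


def decode_ata_pt12(cdb_hex):
    """Return the interesting fields of an ATA PASS-THROUGH (12) CDB."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between hex pairs is ignored
    if len(cdb) != 12 or cdb[0] != 0xA1:
        raise ValueError("not an ATA PASS-THROUGH (12) CDB")

    protocol = (cdb[1] >> 1) & 0x0F  # bits 4:1 of byte 1
    feature = cdb[3]
    command = cdb[9]

    name = ATA_COMMANDS.get(command, "unknown")
    if command == 0xB0:
        # For SMART, the features register selects the sub-command.
        name += " " + SMART_FEATURES.get(feature, "(feature 0x%02x)" % feature)

    return {
        "protocol": PROTOCOLS.get(protocol, protocol),
        "ck_cond": bool(cdb[2] & 0x20),  # ask the bridge to return ATA registers
        "feature": feature,
        "lba_mid": cdb[6],
        "lba_high": cdb[7],
        "command": command,
        "name": name,
    }


if __name__ == "__main__":
    for cdb in ("a1 06 20 00 00 00 00 00 00 e5 00 00",
                "a1 06 20 da 00 00 4f c2 00 b0 00 00"):
        print(cdb, "->", decode_ata_pt12(cdb)["name"])
```

Both CDBs set CK_COND, so the USB bridge is expected to answer with sense data carrying the ATA registers; a bridge that handles that badly would be one possible source of the "Hardware Error" lines.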
Now for the second issue: with this same kernel, i.e. vanilla 4.4.14 (the same version Slackware uses), I get lockups within a minute or two of suspend/resume or hibernate/resume, and see messages like this in the kernel log:
Code:
Nov 3 16:20:38 hostname kernel: [25082.316755] EXT4-fs (dm-0): re-mounted. Opts: commit=0
Nov 3 16:20:38 hostname kernel: [25082.319792] EXT4-fs (dm-2): re-mounted. Opts: data=ordered,commit=0
Nov 3 16:20:38 hostname kernel: [25082.322521] EXT4-fs (loop0): re-mounted. Opts: data=ordered,commit=0
Nov 3 16:20:38 hostname kernel: [25082.325421] EXT4-fs (dm-0): re-mounted. Opts: data=ordered,commit=0
Nov 3 16:20:38 hostname kernel: [25082.327396] EXT4-fs (dm-0): re-mounted. Opts: data=ordered,commit=0
Nov 3 16:20:38 hostname kernel: [25082.329399] EXT4-fs (dm-2): re-mounted. Opts: data=ordered,commit=0
Nov 3 16:20:38 hostname kernel: [25082.331298] EXT4-fs (dm-2): re-mounted. Opts: data=ordered,commit=0
Nov 3 16:22:30 hostname kernel: [25193.752804] general protection fault: 0000 [#2] SMP
Here dm-0 is my root (/) partition and dm-2 is my external hard drive. Both are formatted as ext3. I know this kernel handles ext3 partitions through the ext4 driver, but come on: it can't remount a live ext3 root partition that many times and expect no tears to fall.
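To put a number on "so many times", the remount messages can be tallied per device. This is a rough sketch in Python; the regular expression and the function name are my own, and the sample text is simply the seven log lines above with the syslog prefix trimmed.

```python
import re
from collections import Counter

# Matches lines such as: "EXT4-fs (dm-0): re-mounted. Opts: commit=0"
REMOUNT = re.compile(r"EXT4-fs \(([^)]+)\): re-mounted")


def count_remounts(log_text):
    """Count 're-mounted' messages per block device in a kernel log."""
    return Counter(REMOUNT.findall(log_text))


SAMPLE = """\
[25082.316755] EXT4-fs (dm-0): re-mounted. Opts: commit=0
[25082.319792] EXT4-fs (dm-2): re-mounted. Opts: data=ordered,commit=0
[25082.322521] EXT4-fs (loop0): re-mounted. Opts: data=ordered,commit=0
[25082.325421] EXT4-fs (dm-0): re-mounted. Opts: data=ordered,commit=0
[25082.327396] EXT4-fs (dm-0): re-mounted. Opts: data=ordered,commit=0
[25082.329399] EXT4-fs (dm-2): re-mounted. Opts: data=ordered,commit=0
[25082.331298] EXT4-fs (dm-2): re-mounted. Opts: data=ordered,commit=0
"""

if __name__ == "__main__":
    for device, count in sorted(count_remounts(SAMPLE).items()):
        print(device, count)
```

For the excerpt above this gives three remounts each for dm-0 and dm-2 and one for loop0, all within about 15 ms of each other.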
I reverted commit e31fb9e00543e5d3c5b686747d3c862bc09b59f3 (i.e. the commit that purged ext3) and rebuilt, and this problem went away, as I suspected. But I still cannot use my external hard drive.