Quote:
I have seen several different ways in which the kernel can crash. Since patching udev-165, I haven't seen any more crashes. Ed |
Is it safe to just downgrade to udev-164? Do we expect the next slackware update to roll back to udev-164 until the bug is fixed, or not?
|
Same problem during boot a few days ago...
|
I use Slack-64-m, but I also have another partition with Slack32-current, and
I updated it today. After a few boots I got a kernel panic. Since I'm booted into the slack64-M-current partition right now, I took a look at /var/log/syslog on the slack32-current partition. Here it is:
Code:
Jan 3 23:23:13 base2 kernel: [ 0.157045] raid6: int32x1 1281 MB/s |
Update Wed Jan 5 2011:
Okay, so the bug is back! I am running Slackware current and got crashes twice this morning. This has to be fixed! If I were a new user I would give up on Linux and go back to Windows... In the old days we used to make fun of Windows because the "Blue Screen of Death" would occur so often. This is exactly what I am seeing now with Linux!
-----------------------------------------------------
Jan 3 2011:
I'm beginning to think that my random kernel oopses are caused by my hardware. When my fairly new PS/2 mouse is connected to a KVM, I get the random kernel oopses. At other times the kernel misconfigures the mouse as a keyboard and it does not work as a mouse. When I disconnect the KVM and plug the mouse directly into the motherboard, it works fine.
I have gone back to Slackware current using udev-165 and kernel 2.6.35.7. So far no kernel oops, and the mouse is recognized as a mouse. There is still a kernel bug in there and it needs to be fixed, but hopefully it is not a very common bug. A bad mouse should not be allowed to cause a kernel oops. |
Hi all,
A follow-up to post #25. The problem is probably in the kernel scsi/sg code, specifically the part supporting "ATA pass-through" functionality. What's new in udev-165 is a function "disk_identify_packet_device_command" which tries a scsi SPC-4 ATA 16-byte pass-through command to identify a cd/dvd drive, and if that fails, tries an SPC-3 version of the command. I believe it is the version 3 attempt which causes the oops; commenting out line 270 of extras/ata_id/ata_id.c:
Code:
253         ret = ioctl(fd, SG_IO, &io_v4);
254         if (ret != 0) {
255                 /* could be that the driver doesn't do version 4, try version 3 */
256                 if (errno == EINVAL) {
257                         struct sg_io_hdr io_hdr;
258
259                         memset(&io_hdr, 0, sizeof(struct sg_io_hdr));
260                         io_hdr.interface_id = 'S';
261                         io_hdr.cmdp = (unsigned char*) cdb;
262                         io_hdr.cmd_len = sizeof (cdb);
263                         io_hdr.dxferp = buf;
264                         io_hdr.dxfer_len = buf_len;
265                         io_hdr.sbp = sense;
266                         io_hdr.mx_sb_len = sizeof (sense);
267                         io_hdr.dxfer_direction = SG_DXFER_FROM_DEV;
268                         io_hdr.timeout = COMMAND_TIMEOUT_MSEC;
269
270                         // ret = ioctl(fd, SG_IO, &io_hdr);
271                         if (ret != 0)
272                                 goto out;
273                 } else {
274                         goto out;
275                 }
276         }
appears to eliminate the panic.
Also, running the "sg_sat_identify" command from the sg3_utils package (http://sg.danny.cz/sg/sg3_utils.html#mozTocId479511), e.g.,
        sg_sat_identify -p /dev/dvd
works, while running it as
        sg_sat_identify -p -c /dev/dvd
frequently produces a kernel panic which looks the same as the udevd one at bootup. The difference is the -c switch, which instructs the kernel to write back ATA register data in the sense buffer. The udev-165 code also does this (it sets the ck_cond bit, hence the oops).
UPDATE: Further testing shows that the ck_cond value is not relevant: the panic results regardless of how ck_cond is set. |
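For anyone who wants to see what the pass-through command discussed in this post looks like on the wire, here is a minimal sketch of the 16-byte CDB. It follows the ATA PASS-THROUGH (16) layout as I understand it from the T10 SAT documents; the helper name build_identify_packet_cdb and the macro ATA_PT16_CDB_LEN are made up for illustration and are not taken from udev or sg3_utils. It only fills the buffer and never issues an ioctl, so it is safe to run on an affected machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATA_PT16_CDB_LEN 16   /* ATA PASS-THROUGH (16) uses a 16-byte CDB */

/* Hypothetical helper (not from udev): fill `cdb` with an
 * ATA PASS-THROUGH (16) command wrapping ATA IDENTIFY PACKET DEVICE.
 * `ck_cond` selects whether the CK_COND bit is set. */
void build_identify_packet_cdb(uint8_t cdb[ATA_PT16_CDB_LEN], int ck_cond)
{
    assert(cdb != NULL);
    memset(cdb, 0, ATA_PT16_CDB_LEN);
    cdb[0] = 0x85;          /* OPERATION CODE: ATA PASS-THROUGH (16) */
    cdb[1] = 4 << 1;        /* PROTOCOL: 4 = PIO Data-In */
    cdb[2] = 0x0e;          /* T_DIR=1 (from device), BYT_BLOK=1, T_LENGTH=2 */
    if (ck_cond)
        cdb[2] |= 0x20;     /* CK_COND=1: ask for ATA registers in the sense data */
    cdb[6] = 1;             /* SECTOR COUNT (7:0): one 512-byte block */
    cdb[14] = 0xA1;         /* COMMAND: ATA IDENTIFY PACKET DEVICE */
}
```

Calling build_identify_packet_cdb(cdb, 1) versus build_identify_packet_cdb(cdb, 0) changes only byte 2, which is the same difference the -c switch of sg_sat_identify makes. The resulting buffer is what would be passed as io_hdr.cmdp with io_hdr.cmd_len set to 16.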
As a test, on one of my test boxes, I put udev-164-i486-3 back in the system, but kept the /etc/rc.d/rc.udev from udev-165 (as that apparently creates the /dev/root properly according to the changelog).
I have yet to notice any issues with the downgrade, and haven't had the boot time kernel oops yet after a bunch of halt, reboot, suspend or hibernate. |
I just installed linux-2.6.37, and re-installed a vanilla udev-165. So far things look good, so (hopefully) the issue has been resolved in the kernel. The problem may be related to the fix (http://www.kernel.org/pub/linux/kern...geLog-2.6.37):
commit 2a5f07b5ec098edc69e05fdd2f35d3fbb1235723
Author: Tejun Heo <tj@kernel.org>
Date: Mon Nov 1 11:39:19 2010 +0100

    libata: fix NULL sdev dereference race in atapi_qc_complete()

    SCSI commands may be issued between __scsi_add_device() and dev->sdev assignment, so it's unsafe for ata_qc_complete() to dereference dev->sdev->locked without checking whether it's NULL or not. Fix it.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: stable@kernel.org
    Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
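To make the commit message concrete, here is a simplified user-space model of the kind of guard it describes. This is not the kernel source: the struct and function names (model_sdev, model_dev, model_clear_locked) are invented for illustration, and the sketch is single-threaded, so it shows only the NULL check itself and not the actual race.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_sdev { int locked; };
struct model_dev  { struct model_sdev *sdev; /* NULL until assignment completes */ };

/* Clear the "locked" flag only if the SCSI device has been attached.
 * Returns 1 if the flag was cleared, 0 if sdev was still NULL. */
int model_clear_locked(struct model_dev *dev)
{
    assert(dev != NULL);
    if (dev->sdev == NULL)
        return 0;           /* command completed before dev->sdev was set */
    dev->sdev->locked = 0;
    return 1;
}
```

Without the check, a command completing in the window between device creation and the dev->sdev assignment would dereference a NULL pointer, which matches the kind of oops the commit message describes.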
I am still quietly putting up with the issue, which causes a failed boot roughly once in every 10 boots. Any word on an 'official' fix from the Slackware team? Otherwise I suspect I shall simply downgrade the udev package...
|
That patch is in 2.6.35.10; has anyone reproduced the problem with that kernel, perchance?
New kernels in -current are still a little ways out, I think - some other stuff probably needs to hit the tree first. |
Quote:
Code:
andrew@skamandros~$ uname -r |
1 Attachment(s)
Follow-up to post #38:
I jumped the gun... The problem remains in kernel linux-2.6.37.
Follow-up to post #36:
The problem is probably in the kernel block, drivers/scsi/sg, or drivers/scsi/sd code, specifically related to "ATA pass-through" functionality, and probably only occurs for certain drive hardware. I don't know anything about this code, so until someone who does can fix it, using udev-164 (which doesn't use the ATA pass-through command on cd/dvd devices in ata_id.c), or commenting out this command in ata_id.c for udev-165, will side-step the issue for me.
Additional experimentation:
Another possible cause of this oops could be inappropriate buffer alignment. I built udev-165 with ata_id.c patched to use page-aligned sense and response buffers (rather than simple unsigned char arrays), and so far it looks promising: no panics yet (see attached patch). |
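For readers without access to the attachment, the general idea of page-aligned buffers can be sketched as follows. This is not the attached patch; it is a generic illustration using posix_memalign, and the helper name alloc_page_aligned is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign() */
#endif

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: return a zeroed, page-aligned buffer of `len` bytes,
 * or NULL on failure. Release it with free(). */
void *alloc_page_aligned(size_t len)
{
    void *buf = NULL;
    long page = sysconf(_SC_PAGESIZE);

    assert(len > 0);
    if (page <= 0)
        return NULL;
    if (posix_memalign(&buf, (size_t)page, len) != 0)
        return NULL;
    memset(buf, 0, len);
    return buf;
}
```

In ata_id.c terms, the sense and response buffers would then come from calls such as alloc_page_aligned(32) and alloc_page_aligned(512) instead of being plain arrays; the 32-byte sense size is an assumption on my part, while 512 bytes is the size of the IDENTIFY data block.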
Update Fri Jan 7:
2.6.35.10 and udev-165 crashed the same way this morning...
----------------------------------------------------
Thurs Jan 6:
Quote:
And I now use 2.6.35.10 and udev-165 along with the rest of Slackware current. My setup is a new Intel dual-core D510MO motherboard with a SATA drive hooked up to the motherboard disk controller. |