LinuxQuestions.org


Clemente 05-23-2006 02:14 AM

SATA drives suddenly malfunctional
 
Hi,

I'm having trouble with two SATA drives connected to a WinFast mainboard (unknown model). The system is running Debian sarge with kernel 2.6.8-11-amd64-k8.

As mentioned, I connected two 200GB SATA drives (sda, sdb), organized as software RAID1. All went fine for several months and reboots. Suddenly, the system stopped recognizing the drives correctly.
While booting, it hangs for several minutes. After it comes up, sdb simply isn't available. Any access to /dev/sda (fdisk or mount) results in an unkillable process (output at the bottom of this post).

dmesg shows some lines that look suspicious to me:
Code:

ACPI: PCI interrupt 0000:00:05.0[A] -> GSI 17 (level, low) -> IRQ 17
ata1: SATA max UDMA/133 cmd 0xE900 ctl 0xEA02 bmdma 0xED00 irq 17
ata2: SATA max UDMA/133 cmd 0xEB00 ctl 0xEC02 bmdma 0xED08 irq 17
ata1: dev 0 cfg 49:2f00 82:7c6b 83:7f09 84:4673 85:7c69 86:3e01 87:4663 88:007f
ata1: dev 0 ATA, max UDMA/133, 398297088 sectors: lba48
ata1: dev 0 configured for UDMA/133
scsi0 : sata_sis
ata2: no device found (phy stat 00000000)
scsi1 : sata_sis
  Vendor: ATA      Model: Maxtor 6L200M0    Rev: BANC
  Type:  Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 398297088 512-byte hdwr sectors (203928 MB)
SCSI device sda: drive cache: write back
 /dev/scsi/host0/bus0/target0/lun0:<3>ata1: command 0x25 timeout, stat 0x50 host_stat 0x24
 p1
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0

Does the ACPI line or the IRQ 17 assignment have anything to do with the drive problem?

Thanks a lot,
Clemente

--

A little more dmesg output:
Code:

ACPI: PCI interrupt 0000:00:03.3[D] -> GSI 23 (level, low) -> IRQ 23
ehci_hcd 0000:00:03.3: Silicon Integrated Systems [SiS] USB 2.0 Controller
ehci_hcd 0000:00:03.3: irq 23, pci mem ffffff0000274000
ehci_hcd 0000:00:03.3: new USB bus registered, assigned bus number 4
PCI: cache line size of 64 is not supported by device 0000:00:03.3
ehci_hcd 0000:00:03.3: USB 2.0 enabled, EHCI 1.00, driver 2004-May-10
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 8 ports detected
ACPI: PCI interrupt 0000:00:05.0[A] -> GSI 17 (level, low) -> IRQ 17
ata1: SATA max UDMA/133 cmd 0xE900 ctl 0xEA02 bmdma 0xED00 irq 17
ata2: SATA max UDMA/133 cmd 0xEB00 ctl 0xEC02 bmdma 0xED08 irq 17
ata1: dev 0 cfg 49:2f00 82:7c6b 83:7f09 84:4673 85:7c69 86:3e01 87:4663 88:007f
ata1: dev 0 ATA, max UDMA/133, 398297088 sectors: lba48
ata1: dev 0 configured for UDMA/133
scsi0 : sata_sis
ata2: no device found (phy stat 00000000)
scsi1 : sata_sis
  Vendor: ATA      Model: Maxtor 6L200M0    Rev: BANC
  Type:  Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 398297088 512-byte hdwr sectors (203928 MB)
SCSI device sda: drive cache: write back
 /dev/scsi/host0/bus0/target0/lun0:<3>ata1: command 0x25 timeout, stat 0x50 host_stat 0x24
 p1
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
eth0: Media Link On 100mbps full-duplex
NET: Registered protocol family 10
Disabled Privacy Extensions on device ffffffff80338ce0(lo)
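
For reading the `command 0x25 timeout, stat 0x50` line: 0x25 is the ATA READ DMA EXT opcode, and the `stat` value is the drive's status register. Below is a minimal sketch for decoding that byte, assuming the classic ATA status bit layout (BSY, DRDY, DF, DSC, DRQ, ERR). The function name `decode_ata_status` is made up for illustration and is not part of any tool.

```shell
# decode_ata_status: print the names of the bits set in an ATA status byte.
# Assumption: classic ATA status register layout; bits 0x04 and 0x02 are
# ignored in this sketch.
decode_ata_status() {
    status=$(( $1 ))
    names=""
    for bit in 0x80:BSY 0x40:DRDY 0x20:DF 0x10:DSC 0x08:DRQ 0x01:ERR; do
        mask=${bit%%:*}    # part before the colon, e.g. 0x40
        name=${bit#*:}     # part after the colon, e.g. DRDY
        if [ $(( status & $mask )) -ne 0 ]; then
            names="$names $name"
        fi
    done
    echo "${names# }"
}
```

Running `decode_ata_status 0x50` prints `DRDY DSC`. In other words, the drive reports itself ready with no error bit set, and the read simply never completed from the host's point of view. That is suggestive rather than conclusive, and it does not by itself say whether the drive, the cable, the power supply or the controller is at fault.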


Unkillable Process:
Code:

root@server1 : ~ : 09:04
>ps aux
root      2579  0.0  0.2  8312 2604 ?        S    May16  0:00 /usr/sbin/smbd -D
root    24034  0.0  0.0  1888  632 ?        D    May22  0:00 fdisk -l
root    24067  0.0  0.0  1756  724 ?        Ss  May22  0:00 /usr/sbin/cron
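
The `D` in the STAT column of the `fdisk -l` line is uninterruptible sleep, typically a process blocked on I/O. That is why signals, including SIGKILL, have no effect until the I/O completes or fails. Below is a minimal sketch for listing such processes, assuming the `ps aux` column layout shown above (PID in field 2, STAT in field 8, command from field 11 on). The function name `list_d_state` is hypothetical.

```shell
# list_d_state: read `ps aux` output on stdin and print "PID command"
# for every process whose STAT field starts with D (uninterruptible sleep).
# Assumes the standard `ps aux` layout; the header line is skipped
# naturally because its 8th field is the literal word STAT.
list_d_state() {
    awk '$8 ~ /^D/ {
        cmd = $11
        for (i = 12; i <= NF; i++) cmd = cmd " " $i
        print $2, cmd
    }'
}
```

On a live system this would be used as `ps aux | list_d_state`.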


WhatsHisName 05-24-2006 12:01 AM

It sounds more like a hardware problem than something to do with the OS.

You should download the drive manufacturer’s diagnostic utility and test both drives with it.

If a drive fails the testing, then move it to another system and test it again. If it fails again, then the solution is fairly obvious. Replace it.

If it passes in the second system, then look for things like a failing power supply or a malfunctioning controller in the original system.
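
When moving drives between systems, it can help to compare just the relevant kernel log lines on each machine. The dmesg output quoted earlier in the thread already shows both symptoms: `no device found` on ata2 and a `command 0x25 timeout` on ata1. Below is a rough sketch, assuming the message wording matches what is quoted in this thread (other kernel versions phrase these messages differently). `ata_errors` is a hypothetical helper name.

```shell
# ata_errors: read kernel log text on stdin and print only the lines that
# report a missing device or a command timeout on an ataN port.
# The patterns are taken from the messages quoted in this thread.
ata_errors() {
    grep -E 'ata[0-9]+: (no device found|command 0x[0-9a-fA-F]+ timeout)'
}
```

Usage would be `dmesg | ata_errors`. If the same drive produces these lines on a second machine with a different cable, the drive itself is the more likely suspect. If the lines disappear, suspicion shifts to the original controller, cabling or power supply.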


All times are GMT -5.