LinuxQuestions.org

Forum: Linux - Hardware (https://www.linuxquestions.org/questions/linux-hardware-18/)

dreamgear 08-10-2010 08:23 AM

Is this normal operation for a SATA hot swap?
 
My question is regarding the messages generated by hot-swapping an unmounted SATA hard drive. Is this normal? Before I start to depend on this system, I want to know that things are working as intended.

I'm running Ubuntu 10.04 Server on a newly built system: an Intel H55HC board with an Intel Core i3-540 processor.

I have a hot-swap disk cage with 4x WD 2TB 7200RPM drives.

I want to hot-swap two of the disks. When I remove them (after unmounting) I get the following messages (as revealed by dmesg):
Code:

[ 2050.175552] ata3: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xe frozen
[ 2050.175817] ata3: irq_stat 0x00400040, connection status changed
[ 2050.176107] ata3: SError: { PHYRdyChg 10B8B DevExch }
[ 2050.176420] ata3: hard resetting link
[ 2050.923016] ata3: SATA link down (SStatus 0 SControl 300)
[ 2055.912461] ata3: hard resetting link
[ 2056.261745] ata3: SATA link down (SStatus 0 SControl 300)
[ 2056.261758] ata3: limiting SATA link speed to 1.5 Gbps
[ 2056.605713] ata4: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xe frozen
[ 2056.606248] ata4: irq_stat 0x00400040, connection status changed
[ 2056.606814] ata4: SError: { PHYRdyChg 10B8B DevExch }
[ 2056.607406] ata4: hard resetting link
[ 2057.349479] ata4: SATA link down (SStatus 0 SControl 300)
[ 2061.251193] ata3: hard resetting link
[ 2061.600498] ata3: SATA link down (SStatus 0 SControl 310)
[ 2061.600510] ata3.00: disabled
[ 2061.600523] ata3: EH complete
[ 2061.600562] ata3.00: detaching (SCSI 2:0:0:0)
[ 2061.631552] sd 2:0:0:0: [sdc] Synchronizing SCSI cache
[ 2061.631718] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 2061.631724] sd 2:0:0:0: [sdc] Stopping disk
[ 2061.631733] sd 2:0:0:0: [sdc] START_STOP FAILED
[ 2061.631736] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 2062.338960] ata4: hard resetting link
[ 2062.688230] ata4: SATA link down (SStatus 0 SControl 300)
[ 2062.688241] ata4: limiting SATA link speed to 1.5 Gbps
[ 2067.677655] ata4: hard resetting link
[ 2068.026983] ata4: SATA link down (SStatus 0 SControl 310)
[ 2068.026994] ata4.00: disabled
[ 2068.027005] ata4: EH complete
[ 2068.027062] ata4.00: detaching (SCSI 3:0:0:0)
[ 2068.058106] sd 3:0:0:0: [sdd] Synchronizing SCSI cache
[ 2068.058144] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 2068.058149] sd 3:0:0:0: [sdd] Stopping disk
[ 2068.058158] sd 3:0:0:0: [sdd] START_STOP FAILED
[ 2068.058160] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
jeff@squirrel:~$


The question is, is this normal? I've not been able to see any problem, but I want to know if these messages should be expected.
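For reference, the `SErr` value in each `exception Emask` line is the port's SATA SError register, and the names in braces on the following `SError:` line are simply its set bits. The sketch below decodes such a value. The bit table is my transcription of the kernel's libata `SERR_*` definitions, so treat it as an assumption to check against your kernel source. Both values seen in this thread decode to exactly the names the kernel printed: PHY ready changed, 10b/8b decode error and device exchanged for `0x4090000`, plus recovered communication and COMWAKE for `0x4050002`. These look like link-level plug/unplug events rather than errors reported by the disk itself.

```python
from typing import List

# Bit position -> label, as printed by libata's error handler for SError.
# Assumed transcription of the kernel's SERR_* constants; verify for your kernel.
SERROR_BITS = {
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt", 16: "PHYRdyChg", 17: "PHYInt",
    18: "CommWake", 19: "10B8B", 20: "Dispar", 21: "BadCRC",
    22: "Handshk", 23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value: int) -> List[str]:
    """Return the names of the SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # The two SErr values that appear in the dmesg output above.
    for serr in (0x4090000, 0x4050002):
        print(hex(serr), decode_serror(serr))
```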

When I reinsert the two drives I get more messages:
Code:

[ 7447.305647] ata2: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xe frozen
[ 7447.305653] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xe frozen
[ 7447.305658] ata1: irq_stat 0x00400040, connection status changed
[ 7447.305662] ata1: SError: { PHYRdyChg 10B8B DevExch }
[ 7447.305670] ata1: hard resetting link
[ 7447.310217] ata2: irq_stat 0x00400040, connection status changed
[ 7447.311511] ata2: SError: { PHYRdyChg 10B8B DevExch }
[ 7447.312836] ata2: hard resetting link
[ 7447.538488] ata4: exception Emask 0x10 SAct 0x0 SErr 0x4050002 action 0xe frozen
[ 7447.539719] ata4: irq_stat 0x00000040, connection status changed
[ 7447.541077] ata4: SError: { RecovComm PHYRdyChg CommWake DevExch }
[ 7447.542584] ata4: hard resetting link
[ 7448.282797] ata4: SATA link down (SStatus 0 SControl 300)
[ 7448.282808] ata4: EH complete
[ 7448.532302] ata1: SATA link down (SStatus 0 SControl 300)
[ 7448.532311] ata1.00: link offline, clearing class 1 to NONE
[ 7448.578091] ata1: hard resetting link
[ 7448.609857] ata4: exception Emask 0x10 SAct 0x0 SErr 0x4050002 action 0xe frozen
[ 7448.611254] ata4: irq_stat 0x00400040, connection status changed
[ 7448.612854] ata4: SError: { RecovComm PHYRdyChg CommWake DevExch }
[ 7448.614597] ata4: hard resetting link
[ 7453.082668] ata2: link is slow to respond, please be patient (ready=0)
[ 7453.960820] ata1: link is slow to respond, please be patient (ready=0)
[ 7454.399892] ata4: link is slow to respond, please be patient (ready=0)
[ 7456.744952] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 7456.780966] ata2.00: configured for UDMA/133
[ 7456.780972] ata2: EH complete
[ 7457.713237] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 7457.749425] ata1.00: configured for UDMA/133
[ 7457.749431] ata1: EH complete
[ 7458.601053] ata4: COMRESET failed (errno=-16)
[ 7458.602755] ata4: hard resetting link
[ 7460.129404] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xe frozen
[ 7460.129410] ata2: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xe frozen
[ 7460.129415] ata2: irq_stat 0x00400040, connection status changed
[ 7460.129420] ata2: SError: { PHYRdyChg 10B8B DevExch }
[ 7460.129427] ata2: hard resetting link
[ 7460.135809] ata1: irq_stat 0x00400040, connection status changed
[ 7460.137465] ata1: SError: { PHYRdyChg 10B8B DevExch }
[ 7460.139145] ata1: hard resetting link
[ 7460.212549] ata3: exception Emask 0x10 SAct 0x0 SErr 0x4050002 action 0xe frozen
[ 7460.214059] ata3: irq_stat 0x00000040, connection status changed
[ 7460.215770] ata3: SError: { RecovComm PHYRdyChg CommWake DevExch }
[ 7460.217548] ata3: hard resetting link
[ 7464.389159] ata4: link is slow to respond, please be patient (ready=0)
[ 7465.915634] ata1: link is slow to respond, please be patient (ready=0)
[ 7465.915946] ata2: link is slow to respond, please be patient (ready=0)
[ 7466.005447] ata3: link is slow to respond, please be patient (ready=0)
[ 7468.600294] ata4: COMRESET failed (errno=-16)
[ 7468.601779] ata4: hard resetting link
[ 7469.039366] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 7469.075254] ata1.00: configured for UDMA/133
[ 7469.075261] ata1: EH complete
[ 7469.338725] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 7469.377672] ata2.00: configured for UDMA/133
[ 7469.377679] ata2: EH complete
[ 7470.186664] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 7470.196873] ata4.00: ATA-8: WDC WD2001FASS-00U0B0, 01.00101, max UDMA/133
[ 7470.196878] ata4.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 7470.200350] ata4.00: configured for UDMA/133
[ 7470.200358] ata4: EH complete
[ 7470.200500] scsi 3:0:0:0: Direct-Access ATA WDC WD2001FASS-0 01.0 PQ: 0 ANSI: 5
[ 7470.200700] sd 3:0:0:0: Attached scsi generic sg2 type 0
[ 7470.200968] sd 3:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[ 7470.201049] sd 3:0:0:0: [sdc] Write Protect is off
[ 7470.201055] sd 3:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 7470.201101] sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 7470.201366] sdc: unknown partition table
[ 7470.206602] sd 3:0:0:0: [sdc] Attached SCSI disk
[ 7470.206882] ata3: COMRESET failed (errno=-16)
[ 7470.208588] ata3: hard resetting link
[ 7475.984419] ata3: link is slow to respond, please be patient (ready=0)
[ 7480.235462] ata3: COMRESET failed (errno=-16)
[ 7480.236981] ata3: hard resetting link
[ 7481.283272] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 7481.302440] ata3.00: ATA-8: WDC WD2001FASS-00U0B0, 01.00101, max UDMA/133
[ 7481.302446] ata3.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 7481.305596] ata3.00: configured for UDMA/133
[ 7481.305607] ata3: EH complete
[ 7481.305752] scsi 2:0:0:0: Direct-Access ATA WDC WD2001FASS-0 01.0 PQ: 0 ANSI: 5
[ 7481.305954] sd 2:0:0:0: Attached scsi generic sg3 type 0
[ 7481.306100] sd 2:0:0:0: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[ 7481.306157] sd 2:0:0:0: [sdd] Write Protect is off
[ 7481.306161] sd 2:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[ 7481.306191] sd 2:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 7481.306552] sdd: unknown partition table
[ 7481.312636] sd 2:0:0:0: [sdd] Attached SCSI disk
jeff@squirrel:~$
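The `SStatus` and `SControl` numbers in the `SATA link up/down` lines are printed in hex, and their low three nibbles are the DET, SPD and IPM fields defined for these SATA registers. A minimal decoder, assuming those standard field layouts (the function names are hypothetical): `SStatus 0` means no device detected (the empty bay), `SStatus 123` means a device is present with the link established at Gen2 (3.0 Gbps) in the active power state, and `SControl 310` is the kernel restricting the port to Gen1, which matches the `limiting SATA link speed to 1.5 Gbps` messages during removal.

```python
from typing import Dict

# Field meanings per the SATA SStatus/SControl register definitions (assumed).
SPEED = {1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}
SSTATUS_IPM = {0: "not established", 1: "active", 2: "partial", 6: "slumber"}
SCONTROL_IPM = {0: "no restriction", 1: "partial disabled",
                2: "slumber disabled", 3: "partial and slumber disabled"}


def fields(hex_text: str) -> Dict[str, int]:
    """Split a hex SStatus/SControl value into its DET, SPD and IPM nibbles."""
    value = int(hex_text, 16)
    return {"DET": value & 0xF, "SPD": (value >> 4) & 0xF, "IPM": (value >> 8) & 0xF}


def describe_sstatus(hex_text: str) -> str:
    f = fields(hex_text)
    if f["DET"] != 3:  # 3 = device present and PHY communication established
        return f"link down (DET={f['DET']})"
    return (f"link up at {SPEED.get(f['SPD'], 'unknown speed')}, "
            f"power state {SSTATUS_IPM.get(f['IPM'], 'unknown')}")


def describe_scontrol(hex_text: str) -> str:
    f = fields(hex_text)
    # SPD = 0 means no speed restriction; otherwise the highest allowed generation.
    speed = "no speed limit" if f["SPD"] == 0 else f"limit {SPEED.get(f['SPD'], 'unknown')}"
    return f"{speed}, IPM: {SCONTROL_IPM.get(f['IPM'], 'unknown')}"


if __name__ == "__main__":
    for s in ("0", "123"):
        print("SStatus", s, "->", describe_sstatus(s))
    for c in ("300", "310"):
        print("SControl", c, "->", describe_scontrol(c))
```

Note that after reinsertion the ports report `SControl 300` again, so the 1.5 Gbps limit applied while the bays were empty did not stick; the disks came back at 3.0 Gbps.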


I can then mount and use the drives without incident:
Code:

[ 7648.578187] EXT4-fs (sdc): mounted filesystem with ordered data mode
[ 7650.243049] EXT4-fs (sdd): mounted filesystem with ordered data mode
jeff@squirrel:~$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/md0 96121548 1568132 89670632 2% /
none 1921120 268 1920852 1% /dev
none 1925676 0 1925676 0% /dev/shm
none 1925676 268 1925408 1% /var/run
none 1925676 0 1925676 0% /var/lock
none 1925676 0 1925676 0% /lib/init/rw
none 96121548 1568132 89670632 2% /var/lib/ureadahead/debugfs
/dev/md2 1845856816 132421664 1621130620 8% /home
/dev/sdc 1953267136 105394424 1750196984 6% /bigd1
/dev/sdd 1953267136 104402004 1751189404 6% /bigd2
jeff@squirrel:~$

I'm somewhat concerned because the links on ata1 and ata2 are also reset, and these drives hold the RAID set with the system disk.
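One way to check that the ata1/ata2 resets were harmless is to confirm that every port that was reset ended with `SATA link up` at its normal speed. In the log above, both ata1 and ata2 came back at 3.0 Gbps and were `configured for UDMA/133`. Below is a small sketch that tallies hard resets and the last link state per port. The regular expressions only assume the message formats shown in this thread, and `SAMPLE` holds the ata1/ata2 lines copied from the reinsertion log.

```python
import re
from typing import Dict, Iterable

# Message formats as they appear in this thread's dmesg output.
RESET_RE = re.compile(r"\]\s+(ata\d+): hard resetting link")
LINK_RE = re.compile(r"\]\s+(ata\d+): SATA link (up [\d.]+ Gbps|down)")

SAMPLE = """\
[ 7447.305670] ata1: hard resetting link
[ 7447.312836] ata2: hard resetting link
[ 7448.532302] ata1: SATA link down (SStatus 0 SControl 300)
[ 7448.578091] ata1: hard resetting link
[ 7456.744952] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 7457.713237] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 7460.129427] ata2: hard resetting link
[ 7460.139145] ata1: hard resetting link
[ 7469.039366] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 7469.338725] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
"""


def summarize(lines: Iterable[str]) -> Dict[str, dict]:
    """Count hard resets per ataN port and remember its last reported link state."""
    ports: Dict[str, dict] = {}
    for line in lines:
        m = RESET_RE.search(line)
        if m:
            ports.setdefault(m.group(1), {"resets": 0, "last": None})["resets"] += 1
            continue
        m = LINK_RE.search(line)
        if m:
            ports.setdefault(m.group(1), {"resets": 0, "last": None})["last"] = m.group(2)
    return ports


if __name__ == "__main__":
    for name, info in sorted(summarize(SAMPLE.splitlines()).items()):
        print(f"{name}: {info['resets']} hard resets, last state: {info['last']}")
```

On the live system, real output could be passed in instead of `SAMPLE`, for example `subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()`.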

So what can you all tell me? Thanks in advance...

dreamgear 08-10-2010 03:49 PM

FWIW this might be an authoritative source: https://ata.wiki.kernel.org/index.php/Main_Page

Haven't parsed it yet.

Soadyheid 08-12-2010 06:08 PM

Are the four disks set up in a single RAID5 volume? Probably not, as pulling two drives would be fatal. I take it ata1 and ata2 are RAIDed (RAID 1, i.e. system and mirror?). I take it it's a software RAID, as I doubt you'd get anything logged in the messages file for a hardware one. Normally with a RAID you can hot-swap one drive at a time without losing any data, but you'd need to let the RAID rebuild before pulling a second one. Then again... some hardware RAID arrays let you set up a RAID5 volume plus a hot spare, which just sits in the array and is synced up when one of the disks in the RAID fails. You then just hot-swap the faulty disk, and the replacement is either rebuilt into the array or becomes the new hot spare.
Yes, I'd expect to get an entry in the messages file (just like plugging and unplugging a USB device!). Er... you haven't mentioned why you would want to pull two disks. I find that strange :)

Play Bonny! :hattip:
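The advice to let the RAID rebuild before pulling a second drive can be checked from `/proc/mdstat` when the RAID is Linux software RAID (md). Each array's status line ends with `[n/m] [UU]`, where n is the number of member devices, m is the number currently active, and an `_` in the bracketed flags marks a missing member. Below is a hedged sketch that flags degraded arrays; `SAMPLE` is hypothetical text in that format, not output from this system. `mdadm --detail /dev/md0` reports the same state in more detail.

```python
import os
import re
from typing import List

ARRAY_RE = re.compile(r"^(md\d+)\s*:")                    # e.g. "md0 : active raid1 ..."
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")  # e.g. "[2/1] [U_]"

# Hypothetical /proc/mdstat contents: md0 healthy, md2 missing one member.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      1000000 blocks [2/2] [UU]

md2 : active raid1 sda3[0]
      2000000 blocks [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat: str) -> List[str]:
    """Return the names of md arrays that have fewer active members than configured."""
    degraded, current = [], None
    for line in mdstat.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = STATUS_RE.search(line)
        if m and current:
            total, active, flags = int(m.group(1)), int(m.group(2)), m.group(3)
            if active < total or "_" in flags:
                degraded.append(current)
    return degraded


if __name__ == "__main__":
    if os.path.exists("/proc/mdstat"):
        with open("/proc/mdstat") as f:
            text = f.read()
    else:
        text = SAMPLE
    bad = degraded_arrays(text)
    print(("degraded: " + ", ".join(bad)) if bad else "all md arrays have every member active")
```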

dreamgear 08-16-2010 12:42 PM

Thanks for your reply

Quote:

Originally Posted by Soadyheid (Post 4064642)
Are the four disks set up in a single RAID5 volume? Probably not, as pulling two drives would be fatal. I take it ata1 and ata2 are RAIDed (RAID 1, i.e. system and mirror?). I take it it's a software RAID, as I doubt you'd get anything logged in the messages file for a hardware one.

Yes, ata1 and ata2 hold the system disk as software RAID sets.

Quote:

Yes, I'd expect to get an entry in the messages file (just like plugging and unplugging a USB device!). Er... you haven't mentioned why you would want to pull two disks. I find that strange :)

Well, I probably won't pull both at once, but the intended use is to create an archive and then physically move the disk to a second location.

I was particularly concerned about ata1 and ata2 resetting their links even though neither was removed. Is that normal?

Thanks again for your help.

Soadyheid 08-16-2010 05:32 PM

Quote:

I was particularly concerned about ata1 and ata2 resetting their links even though neither was removed. Is that normal?

The SATA drives are probably connected to the same controller chip, which handles all four drives. When you remove a drive, the event gets logged in the messages file. When you re-insert a drive, the controller recognises that something has changed and has to reset itself to re-scan for drives, hence the link resets on the other ports as well. :)

Are you going to set up ATA3 and 4 as RAID 1 as well, which would mean that they would both contain the same info? Pulling one as an archive... I don't think replacing it at a later date would work: the data on the disk that wasn't removed would be rewritten to the "archive" one. :( Each member of the RAID contains a small "system" area (the RAID metadata) which defines the RAID and allows a faulty disk to be regenerated from the good remaining one. However, if ATA3 and 4 were just "straight" single disks, I don't think this problem would arise and you'd be able to use your archive idea.
Anyway, that's my thoughts on the issue, I hope they're of some use.

Play Bonny! :hattip:

dreamgear 08-16-2010 08:36 PM

Quote:

Originally Posted by Soadyheid (Post 4068238)
The SATA drives are probably connected to the same controller chip, which handles all four drives. When you remove a drive, the event gets logged in the messages file. When you re-insert a drive, the controller recognises that something has changed and has to reset itself to re-scan for drives, hence the link resets on the other ports as well. :)

Are you going to set up ATA3 and 4 as RAID 1 as well, which would mean that they would both contain the same info? Pulling one as an archive... I don't think replacing it at a later date would work: the data on the disk that wasn't removed would be rewritten to the "archive" one. :( Each member of the RAID contains a small "system" area (the RAID metadata) which defines the RAID and allows a faulty disk to be regenerated from the good remaining one. However, if ATA3 and 4 were just "straight" single disks, I don't think this problem would arise and you'd be able to use your archive idea.
Anyway, that's my thoughts on the issue, I hope they're of some use.

Play Bonny! :hattip:

I interpreted the messages the same way. I just wanted to see if any red flags would go up when others in the know looked at them.

ATA3 and 4 are indeed "straight" disks. They will each be set up with a directory that will be the target for Bacula "copy" jobs (see bacula.org), as well as a directory to contain backed-up VMware ESX system images. We're a small-to-medium-sized company, and at the moment one 2TB disk will hold the whole shebang (since email was moved to the cloud). I plan to build another system at another site with a similar hot-swap SATA cage, where the drive that has been removed from the first system will live. Then on the next archive cycle, the 2nd archive disk will be moved to the backup site and the 1st moved back to the main site. Depth could be added by simply buying more $169 disks and keeping them in a locked drawer in another town.
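Read as a procedure, the rotation above is a round-robin. Each cycle, one disk receives the copy jobs at the main site and then goes offsite, and the next disk in line comes back to be written. Below is a minimal sketch, assuming that reading of the plan; the disk labels are hypothetical. With n disks, once every disk has been written at least once, n - 1 archives sit offsite after each cycle, which is the extra depth that buying more disks adds.

```python
from typing import List, NamedTuple


class Cycle(NamedTuple):
    number: int
    written: str        # disk that receives this cycle's copy jobs at the main site
    next_up: str        # disk brought back to the main site for the next cycle
    offsite: List[str]  # disks held at the backup site after this cycle


def rotation(disks: List[str], cycles: int) -> List[Cycle]:
    """Round-robin archive rotation across a list of uniquely labelled disks."""
    if len(disks) < 2:
        raise ValueError("rotation needs at least two disks")
    plan = []
    for k in range(cycles):
        written = disks[k % len(disks)]
        next_up = disks[(k + 1) % len(disks)]
        plan.append(Cycle(k, written, next_up, [d for d in disks if d != next_up]))
    return plan


if __name__ == "__main__":
    for c in rotation(["archive-1", "archive-2"], 4):
        print(f"cycle {c.number}: write {c.written}, offsite {c.offsite}, bring back {c.next_up}")
```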

Thanks again. I will operate under the assumption that the messages do not indicate a problem. If anyone knows anything to the contrary, let me know. Now to build the latest version of Bacula and run it in parallel with my existing system for a while.
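One detail in the logs above matters when disks move between bays and sites: the `sdX` names appear to follow detection order, not the bay. The disk at SCSI address 2:0:0:0 (ata3, per the `detaching (SCSI 2:0:0:0)` line) was `sdc` before removal and came back as `sdd`. The disk at 3:0:0:0 (ata4) went from `sdd` to `sdc`. The sketch below tracks that from dmesg lines. Its regular expression only assumes the `sd H:C:T:L: [sdX]` format shown in this thread, and `SAMPLE` is copied from the logs above.

```python
import re
from typing import Dict, Iterable, List

# Matches e.g. "[ 2061.631552] sd 2:0:0:0: [sdc] Synchronizing SCSI cache"
SD_RE = re.compile(r"\]\s+sd (\d+:\d+:\d+:\d+): \[(sd[a-z]+)\]")

SAMPLE = """\
[ 2061.631552] sd 2:0:0:0: [sdc] Synchronizing SCSI cache
[ 2068.058106] sd 3:0:0:0: [sdd] Synchronizing SCSI cache
[ 7470.200968] sd 3:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[ 7481.306100] sd 2:0:0:0: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
"""


def names_by_address(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each SCSI address to the successive sdX names the kernel gave it."""
    history: Dict[str, List[str]] = {}
    for line in lines:
        m = SD_RE.search(line)
        if m:
            names = history.setdefault(m.group(1), [])
            if not names or names[-1] != m.group(2):  # record name changes only
                names.append(m.group(2))
    return history


if __name__ == "__main__":
    for address, names in names_by_address(SAMPLE.splitlines()).items():
        print(address, " -> ".join(names))
```

Mounting the archive disks by filesystem label or UUID (`LABEL=` or `UUID=` in `/etc/fstab`; ext4 labels can be set with `e2label`) avoids depending on which `sdX` name a disk happens to get.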

Soadyheid 08-17-2010 09:17 AM

Cheers! Dreamgear and good luck!

Play Bonny! :hattip:


All times are GMT -5.