USB hard drives seem to self-disconnect for no apparent reason
Some background information:
I have two external hard drives connected to my Linux box running Debian Etch (kernel 18.104.22.168)
The two hard drives are 1) a Seagate 160GB external with built-in usb interface and 2) Western Digital 250GB attached to a Dynex usb interface (lsusb reports it is a "Cypress Semiconductor Corp.")
They are attached to a Belkin USB 2.0 PCI interface (NEC chipset according to lspci).
This has happened twice, when I'm copying a large amount of data (first attempt was during a 50GB copy and second during a 100GB copy). During the copy, the operation will fail at some point with numerous errors, an I/O error among them. According to kern.log, the numerous errors are due to a "dead device", similar to what would happen if you unplugged the drive while I/O was in progress.
So, it seems as though at some point the drives are being disconnected. Only thing is, they aren't. I didn't touch them; they are still physically connected! lsusb at this point shows nothing connected, even if I disconnect and reconnect the drives. A reboot seems to fix it.
The earliest usb-related error in the log since my last reboot is this:
usb 1-3: reset high speed USB device using ehci_hcd and address 2
That appears about four times and then
usb 1-3: device descriptor read/64, error -71
First message appears four times, then it goes back and forth in the logs between the first and second messages. Then:
usb 1-3: USB disconnect, address 2
Then the flood of errors and "rejected I/O due to device being removed" and "Buffer I/O error" and "EXT2-fs error" and "I can't find my inodes" and "You're fscked now" and all that good stuff that happens when a drive is suddenly pulled during I/O.
Funny thing is, I can't access either hard drive. It's like the PCI USB card dies after that or something.
This never happened when I only had one drive hooked up. Any ideas what could be causing this?
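To get a clearer picture of how the reset, descriptor-error and disconnect messages pile up before the bus dies, something like the following rough Python sketch could tally them per USB port. The regular expressions are written against the three message formats quoted above only; the function name `summarize_usb_events` is made up for illustration, and other kernels may word these messages differently.

```python
import re
from collections import Counter

# Patterns for the three kernel messages quoted in this thread.
# Assumption: other kernel versions may phrase these differently.
PATTERNS = {
    "reset": re.compile(r"usb (\S+): reset high speed USB device"),
    "descriptor_error": re.compile(r"usb (\S+): device descriptor read/64, error -\d+"),
    "disconnect": re.compile(r"usb (\S+): USB disconnect"),
}


def summarize_usb_events(lines):
    """Count reset / descriptor-error / disconnect messages per USB port."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                # Key is (port, event kind), e.g. ("1-3", "reset").
                counts[(match.group(1), name)] += 1
                break
    return counts


if __name__ == "__main__":
    # Sample lines copied from the messages quoted in the post.
    sample = [
        "usb 1-3: reset high speed USB device using ehci_hcd and address 2",
        "usb 1-3: device descriptor read/64, error -71",
        "usb 1-3: USB disconnect, address 2",
    ]
    for (port, event), n in sorted(summarize_usb_events(sample).items()):
        print(port, event, n)
```

To run it against the real log, pass an open file instead of the sample list, for example `summarize_usb_events(open("/var/log/kern.log"))`, assuming kern.log lives in the usual Debian location.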
Maybe the HD is really going bad?
If the HD is going bad then why does it seemingly knock out both drives at the same time? When this happens, I can't access either drive through the USB.
Both external HDs are externally powered. Inside the box is a 350W PSU, running a 500MHz Celeron CPU, 2 NICs, the PCI USB card, and two HDs. I don't think I'm taxing the PSU in the box.
Maybe my USB card is going bad, or it could be an external power issue...maybe...
What filesystems you got on those drives? Could be a limitation in the filesystem driver, such as filesize limitation. I ran into all kinds of nasty errors because of that in the past.
How big is the stuff you are trying to copy?
Are you doing it through SMB by any chance?
This thread might be of interest to you... Apparently there is such a thing as a mount timeout. Never knew there was such a thing.
By the way, thank you both (and anyone else who might respond) for your quick replies and helpfulness. Even if I can't solve this problem here, I appreciate your help.
Filesystems: fat32 on the 160gb, ext2fs on the 320gb
The first time I was trying to copy via smb. I was copying a large 50gb set of files from my WinXPSP2 box to the 320gb drive. I had smb shares pointed to the symlinks usbmount makes in /var/run/usbmount.
Eventually after a couple of fscks I got those copied. My next task is to copy a 95GB set of files from the 160gb to the 320gb. This task is where my second and third mysterious disconnects happened.
The third time I was doing it (which was about 4 hours ago) was not through samba, but directly on the debian box via a cp command (as root), but still through those symlinks. I'm about to try it with the drives manually mounted.
Wouldn't a filesystem error/crash leave something in the logs before a usb disconnect message? And that doesn't explain why it seems that both drives are disconnected and the entire bus is deactivated.
Mount timeout, hmm, didn't know about that either. But, I'm getting no timeout messages in my logs...
By default samba is limited to 2gb max per file... There is an option that disables the limitation.
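If a per-file size limit is suspected, one quick check is to list any files in the set that exceed it. The Python sketch below walks a directory tree and reports files larger than a given limit. The 2 GiB figure is only the number mentioned above and is not verified here; the mount point `/mnt/usb160` and the function name `files_over_limit` are hypothetical.

```python
import os


def files_over_limit(root, limit_bytes):
    """Yield (path, size) for files under root that are larger than limit_bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip files that vanish or cannot be read mid-scan.
                continue
            if size > limit_bytes:
                yield path, size


if __name__ == "__main__":
    TWO_GIB = 2 * 1024 ** 3  # limit taken from the claim above, unverified
    # "/mnt/usb160" is a hypothetical mount point for the source drive.
    for path, size in files_over_limit("/mnt/usb160", TWO_GIB):
        print(size, path)
```

If nothing is printed, no file in the tree exceeds the limit, which would make a file-size cap an unlikely explanation.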
All PSUs are not created equal. An iffy 350 on your setup could produce lots of unusual problems and be very difficult to diagnose.
If the drives use wall warts I would also suspect these.