External USB hard drive - file corruption
Started out trickling down hard drives. 2 - 1 TB SATA drives into server, 2 - 320 GB SATA drives from server to PCs etc. Anyhow I ended up with some spare PATA (IDE) drives (IBM/Hitachi Deskstars and a Seagate Barracuda). Looking for something to use them for besides paper weights I got hold of a "PortaDrive" gizmo which connects to an IDE or SATA drive and converts it to USB. So I connected the Barracuda, duly wiped and reformatted to NTFS, to the device and then to the server (Ubuntu 8.04.2).
The drive came up as expected. I used Gnome to drag and drop 10 large (4.5 GB PGP disk image) files to the external drive to test speed. About half what I get SATA drive to drive so not too bad. I then unmounted the external drive and connected it to an XP machine. I ran a file compare over the network to monitor the external drive performance vs. network. Again, not bad except...
All 10 of the files showed minor errors when compared to the originals. The errors were too small to be found by MD5. I reconnected the drive to the Linux machine and confirmed the errors with cmp.
I have tried 3 different drives, a drive formatted ext3, large and mixed size files. In every case I get some file corruption. I have connected the drives by PATA cable directly to a machine and copied the same file - no errors. I have copied smaller selections of files to a USB 8 GB flash drive on the same machines. No errors.
So I guess my questions are...
Are USB connected hard drives just flaky by nature? (Perhaps it is the little connector gizmo; I have a different unit on the way.)
Is there a way to copy files in Linux with on-the-fly verification? Something like DOS/Windows xcopy /v?
I have looked at the options with cp and rsync and am not sure I find any to do the trick.
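For what it's worth, the kind of copy-with-verification asked about above could be sketched roughly like this in Python. This is only an illustration, not a standard tool: the function name copy_and_verify and the 1 MiB chunk size are made up for the example, and SHA-256 is just one reasonable choice of checksum. One caveat: the read-back may be served from the operating system's cache rather than from the drive itself, so for an external disk it is safer to unmount and remount the drive before trusting the check.

```python
import hashlib
import os

CHUNK = 1024 * 1024  # hypothetical chunk size: read/write in 1 MiB pieces


def copy_and_verify(src, dst):
    """Copy src to dst, then re-read dst and compare SHA-256 digests.

    Returns True if the digests match, False otherwise.
    """
    src_hash = hashlib.sha256()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(CHUNK)
            if not block:
                break
            src_hash.update(block)   # hash the data as it is read
            fout.write(block)
        fout.flush()
        os.fsync(fout.fileno())      # ask the OS to push the data to the device

    # Second pass: read the copy back and hash it independently.
    dst_hash = hashlib.sha256()
    with open(dst, "rb") as fcheck:
        for block in iter(lambda: fcheck.read(CHUNK), b""):
            dst_hash.update(block)

    return src_hash.hexdigest() == dst_hash.hexdigest()
```

Calling copy_and_verify("image.pgd", "/media/usbdrive/image.pgd") (both paths are placeholders) would return False if the copy read back differently from the source.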
p.s. Perhaps 40 - 160 GB drives are not good for anything but paper weights these days, with 1 TB drives going for as low as $80 US... My first 42 MB Seagate was $399 :-((
"The errors were too small to be found by MD5"
What do you mean with that? I know that md5 is being replaced by sha because of some problems with a too high number of possible collisions, but I am not aware that MD5 does a kind of approximation, ignoring some data.
USB HDDs work and are supposed to work the same way as sata, IDE, esata, scsi, etc. No loss of data is supposed to happen.
I therefore think your converter, or the USB cable you're using (it should be no longer than 5 m), is doing dirty tricks.
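To illustrate the point that MD5 does not approximate or skip data, here is a small sketch in Python. The function name and the 4 MiB buffer size are just picked for the example. It hashes a buffer of random bytes, flips a single bit in the very last byte, and hashes again.

```python
import hashlib
import os


def md5_changes_after_bit_flip(size=4 * 1024 * 1024):
    """Return True if flipping one bit at the end of a random buffer
    changes its MD5 digest."""
    data = bytearray(os.urandom(size))
    before = hashlib.md5(data).hexdigest()
    data[-1] ^= 0x01  # flip the lowest bit of the last byte
    after = hashlib.md5(data).hexdigest()
    return before != after
```

Short of a deliberately constructed collision, a one-bit change anywhere in the file should give a different digest, so md5_changes_after_bit_flip() is expected to return True.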
I thought the MD5 result was quite interesting. Basically I performed an MD5 calculation on each of the 10 large files as they sat on the server. I then did an MD5 calculation on each of the files as they sat on the USB drive. All of the MD5 values matched file by file! When I had moved the USB drive to the XP machine I again ran the MD5 calculations and again the values matched the other 2 runs.
From what I have read about MD5 it is theoretically possible to defeat it - that is get the same MD5 number from two different files. However, that was supposed to be more theoretical than practical. That said I also recall something about the MD5 algorithm treating data from the beginning of the file with more significance than later data. Perhaps with a LARGE file containing seemingly random data, it is encrypted after all, MD5 just doesn't cope.
As to the size of the error... cmp found bad data and, it being binary, attempted to display the mismatch. I would guess that perhaps 200 or so characters appeared in the result.
I opened two matching files with PGP and did a compare of the data files within. Out of 25K+ small files, only 6 mismatched. Not sure how PGP handles corruption within an encrypted disk file but at least the entire disk image file was not lost.
The USB cable is hard wired to the converter and is at most 2 feet long. I have one of the external enclosure type converters coming in Tuesday if FedEx is on schedule and I will do some more testing. Perhaps I need to write my own file comparison routine to better quantify the differences. I once did that when I was FTPing 80,000+ AutoCAD drawings to a Silicon Graphics box, renaming them, converting them to MicroStation, then force feeding them into a drawing management system. MD5 would have been great for that application as they were small files. However, I did not know about MD5 then and so I rolled my own. I wonder where I put the source??? I guess I should look into sha and see if that will help.
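A comparison routine along the lines mentioned above might look something like the following Python sketch. It is only a rough idea of how the differences could be quantified; the name count_differences and the chunk size are invented for the example. It walks both files in step and reports how many byte positions differ, plus the first and last differing offsets. If one file is longer, the extra bytes count as differences.

```python
def count_differences(path_a, path_b, chunk=1024 * 1024):
    """Compare two files byte by byte.

    Returns (differing_bytes, first_offset, last_offset); the offsets
    are None when the files are identical.
    """
    differing = 0
    first = None
    last = None
    offset = 0  # file offset of the start of the current chunk
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            if not a and not b:
                break
            if a != b:
                # Only scan byte by byte inside chunks that differ.
                for i in range(max(len(a), len(b))):
                    if a[i:i + 1] != b[i:i + 1]:
                        differing += 1
                        if first is None:
                            first = offset + i
                        last = offset + i
            offset += max(len(a), len(b))
    return differing, first, last
```

Running it on an original and its copy from the USB drive would show whether the damage is a handful of scattered bytes or one contiguous run, which may hint at where in the transfer things go wrong.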
Thanks again for the reply.
Update - it seems that the problem was the USB converter/adapter gizmo I was using. I procured an external drive box/adapter, installed a 160 GB Barracuda in it, connected it to various machines, copied and verified files. Works great!
Thx for the information