[SOLVED] External USB hard drive - file corruption
Started out trickling down hard drives. 2 - 1 TB SATA drives into server, 2 - 320 GB SATA drives from server to PCs etc. Anyhow I ended up with some spare PATA (IDE) drives (IBM/Hitachi Deskstars and a Seagate Barracuda). Looking for something to use them for besides paper weights I got hold of a "PortaDrive" gizmo which connects to an IDE or SATA drive and converts it to USB. So I connected the Barracuda, duly wiped and reformatted to NTFS, to the device and then to the server (Ubuntu 8.04.2).
The drive came up as expected. I used Gnome to drag and drop 10 large (4.5 GB PGP disk image) files to the external drive to test speed. About half what I get SATA drive to drive so not too bad. I then unmounted the external drive and connected it to an XP machine. I ran a file compare over the network to monitor the external drive performance vs. network. Again, not bad except...
All 10 of the files showed minor errors when compared to the originals. The errors were too small to be found by MD5. I reconnected the drive to the Linux machine and confirmed the errors with cmp.
I have tried 3 different drives, a drive formatted ext3, large and mixed size files. In every case I get some file corruption. I have connected the drives by PATA cable directly to a machine and copied the same file - no errors. I have copied smaller selections of files to a USB 8 GB flash drive on the same machines. No errors.
So I guess my questions are...
Are USB connected hard drives just flaky by nature? (Perhaps it is the little connector gizmo; I have a different unit on the way.)
Is there a way to copy files in Linux with on-the-fly verification? Something like DOS/Windows xcopy /v?
I have looked at the options for cp and rsync and am not sure any of them will do the trick.
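For what it's worth, a copy-then-verify step along the lines asked about above could be sketched in Python as follows. This is only a minimal sketch, not a drop-in replacement for xcopy /v: the function names `sha256_of` and `copy_and_verify` are made up for illustration, and SHA-256 is my own choice of checksum. One caveat: a read-back done right after the copy may be served from the operating system's cache rather than from the drive itself, so unmounting and remounting the drive before verifying is the safer test.

```python
import hashlib
import os
import shutil

CHUNK = 1024 * 1024  # read 1 MiB at a time so multi-GB files never sit in memory


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, flush it to the device, then re-read both and compare.

    Returns True when the two digests match. (Hypothetical helper; the
    re-read can still come from the OS cache unless the drive is remounted.)
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, CHUNK)
        fout.flush()
        os.fsync(fout.fileno())  # ask the OS to push the data out to the drive
    return sha256_of(src) == sha256_of(dst)
```

Calling `copy_and_verify("/path/to/image.pgd", "/media/usb/image.pgd")` (both paths are placeholders) would return False if the copy on the external drive does not match the source.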
TIA,
Ken
p.s. Perhaps 40 - 160 GB drives are not good for anything but paper weights these days, with 1 TB drives going for as low as $80US... My first 42 MB Seagate was $399 :-((
"The errors were too small to be found by MD5"
What do you mean by that? I know that MD5 is being replaced by SHA because of problems with collisions, but I am not aware that MD5 does any kind of approximation or ignores some of the data.
USB HDDs work and are supposed to work the same way as sata, IDE, esata, scsi, etc. No loss of data is supposed to happen.
I therefore think your converter, or a USB cable you're using (e.g. one longer than 5 m), is doing dirty tricks.
I thought the MD5 result was quite interesting. Basically I performed an MD5 calculation on each of the 10 large files as they sat on the server. I then did an MD5 calculation on each of the files as they sat on the USB drive. All of the MD5 values matched file by file! When I had moved the USB drive to the XP machine I again ran the MD5 calculations and again the values matched the other 2 runs.
From what I have read about MD5 it is theoretically possible to defeat it - that is get the same MD5 number from two different files. However, that was supposed to be more theoretical than practical. That said I also recall something about the MD5 algorithm treating data from the beginning of the file with more significance than later data. Perhaps with a LARGE file containing seemingly random data, it is encrypted after all, MD5 just doesn't cope.
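One quick way to check the recollection about later data carrying less weight is to hash some data, flip a single bit at the very end, and hash it again. The snippet below is just a sketch with made-up sample data (1 MiB of a repeating byte pattern standing in for a disk image) and a hypothetical helper name, `md5_hex`:

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 hex digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


# Hypothetical stand-in for a large file: 1 MiB of a repeating pattern.
original = bytes(range(256)) * 4096

# Copy the data and flip one bit in the very last byte.
damaged = bytearray(original)
damaged[-1] ^= 0x01

# Compare the two digests; a single flipped bit at the end should show up.
print(md5_hex(original) == md5_hex(bytes(damaged)))  # prints False
```

MD5 runs every block of the input through its state, so a change at the end of the data alters the digest just as a change at the start would.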
As to the size of the error... cmp found bad data and, it being binary, attempted to display the mismatch. I would guess that perhaps 200 or so characters appeared in the result.
I opened two matching files with PGP and did a compare of the data files within. Out of 25K+ small files, only 6 mismatched. Not sure how PGP handles corruption within an encrypted disk file but at least the entire disk image file was not lost.
The USB cable is hard wired to the converter and is at most 2 feet long. I have one of the external enclosure type converters coming in Tuesday if FedEx is on schedule and I will do some more testing. Perhaps I need to write my own file comparison routine to better quantify the differences. I once did that when I was FTPing 80,000+ AutoCAD drawings to a Silicon Graphics box, renaming them, converting them to MicroStation, then force feeding them into a drawing management system. MD5 would have been great for that application as they were small files. However, I did not know about MD5 then and so I rolled my own. I wonder where I put the source??? I guess I should look into SHA and see if that will help.
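A comparison routine of the kind mentioned above, one that quantifies the differences instead of just reporting the first mismatch, might look something like this in Python. It is only a sketch: `diff_summary` and its `max_offsets` parameter are names invented here, and the offsets it reports are 0-based, whereas cmp counts bytes from 1.

```python
import os

CHUNK = 1024 * 1024  # compare 1 MiB at a time


def diff_summary(path_a, path_b, max_offsets=10):
    """Count differing bytes between two files.

    Returns (count, offsets, size_mismatch), where offsets holds the
    0-based positions of the first max_offsets differing bytes. Only the
    overlapping part is compared if the files differ in length.
    """
    count = 0
    offsets = []
    pos = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(CHUNK)
            b = fb.read(CHUNK)
            if not a and not b:
                break
            if a != b:  # only walk byte by byte when the chunks differ
                for i in range(min(len(a), len(b))):
                    if a[i] != b[i]:
                        count += 1
                        if len(offsets) < max_offsets:
                            offsets.append(pos + i)
            pos += max(len(a), len(b))
    size_mismatch = os.path.getsize(path_a) != os.path.getsize(path_b)
    return count, offsets, size_mismatch
```

Run against an original and its copy on the USB drive, this would show how many bytes differ and roughly where, which might help tell a few scattered flipped bytes from whole damaged blocks.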
Update - it seems that the problem was the USB converter/adapter gizmo I was using. I procured an external drive box/adapter, installed a 160 GB Barracuda in it, connected it to various machines, copied and verified files. Works great!