mv command, data integrity
Greetings.
I am creating a cron job on an Arch Linux server. This cron job should run daily, moving audio files and large bitmap files from a local directory A to a remote directory (a Samba share) B. My first idea was to
To make sure the data finds its way to B exactly as it was in A, I felt the need to investigate how mv works. According to the man page and the Debian info files, mv will copy the file from A to C, and only delete the original from A when the transfer to C completes successfully. However, this documentation does not specify whether "successfully" just means the file was transferred, or whether it also includes integrity checking (such as computing an md5sum checksum). Does mv do this?
If not, I need to use a different utility. I imagine loads of people have this very same need, and that this problem has been solved before. What other utility is ideal for this purpose? Or do I use a combination of cp and rm, and do the md5sum check myself?
Thanks for your help, Willard. |
A few ramblings from me: I haven't reviewed the mv(1) source code. Like you, I poked through its man pages and info entry (in coreutils).
My gut feeling is that using mv to push the file over tcp should result in a "successful" operation (read: identical before and after), or a noisy error if something went awry. That said, it's trivial enough to do a sha1sum(1) of the file before and after the copy, a la:
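A minimal sketch of such a check, not a tested recipe: the function name `copy_and_compare` and the paths in the usage comment are made up for illustration, and it assumes the destination is given as a full file path.

```shell
#!/bin/sh
# Sketch: copy a file, then compare SHA-1 digests of the source and the copy.
# copy_and_compare is a hypothetical helper name, not an existing tool.
copy_and_compare() {
    src=$1
    dest=$2   # full path of the copy, not just a directory
    cp -- "$src" "$dest" || return 1
    # Reading from stdin makes sha1sum print "<digest>  -"; keep the digest only.
    before=$(sha1sum < "$src" | cut -d' ' -f1)
    after=$(sha1sum < "$dest" | cut -d' ' -f1)
    # Succeed only if a digest was actually computed and both digests agree.
    [ -n "$before" ] && [ "$before" = "$after" ]
}

# Example with hypothetical paths:
# copy_and_compare /srv/A/track.wav /mnt/C/track.wav && echo "digests match"
```

The function returns non-zero if the copy fails or the digests differ, so a calling script can decide what to do with the original.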
Might as well be sure about the file's integrity and sleep well at night. |
There's always rsync; I believe that is very careful about checking whether it went ok.
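For what it's worth, a rough sketch of an rsync-based move. The wrapper name `rsync_move` and the directory paths are placeholders; `-a`, `--checksum` and `--remove-source-files` are standard rsync options. One caveat I am not sure about: when the target is a locally mounted share rather than an rsync or ssh endpoint, rsync's checks run on the local machine, so this does not by itself prove what the Samba server ended up storing.

```shell
#!/bin/sh
# Sketch: move the contents of one directory into another with rsync.
# rsync_move is a hypothetical wrapper name.
rsync_move() {
    src_dir=$1
    dest_dir=$2
    # -a                     preserve permissions, times, and so on
    # --checksum             compare files by checksum, not by size and mtime
    # --remove-source-files  delete each source file once it has been transferred
    rsync -a --checksum --remove-source-files "$src_dir"/ "$dest_dir"/
}

# Example with hypothetical paths:
# rsync_move /srv/A /mnt/C
```

Note that `--remove-source-files` removes files only; empty source directories are left behind.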
|
Short version: When I issue a shell command to copy file F from directory A to mount point C, how do I ensure that F has reached B before proceeding?
(I am thinking of sync or fsync, but fsync is a C function, the man page of sync is useless, and the info page of sync does not explain what happens when you flush a mount of a remote directory.)
Long version:
I realized that I did not need cryptographically strong checksums, so md5sum should be fine. The remote machine has md5sum installed, so one option that does not require transferring the file back to local is to have remote run md5sum on the received file and make the result available to local.
A much more elegant solution, however, would be to ensure that the file has reached remote when copied to C. One way of doing this would be to remount (unmount, then mount) C, as this flushes buffers. It would be better, however, if I could issue a "flush C" command. However, all I have found in this avenue are
Quote:
|
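On the "flush C" idea in the post above: recent GNU coreutils versions of sync accept file arguments, and `sync -f PATH` asks the kernel to sync the file system containing PATH. A small sketch follows, with `flush_mount` and the mount path as made-up names. Whether this pushes the data all the way onto the Samba server's disk depends on the CIFS client and the server, so treat that part as an assumption rather than a guarantee.

```shell
#!/bin/sh
# Sketch: flush pending writes for the file system that holds a given path.
# flush_mount is a hypothetical helper name; it needs a sync that supports -f.
flush_mount() {
    mount_point=$1
    # -f (--file-system): sync the whole file system containing the path,
    # instead of every mounted file system as plain "sync" does.
    sync -f "$mount_point"
}

# Example with a hypothetical mount point:
# cp /srv/A/track.wav /mnt/C/ && flush_mount /mnt/C
```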
The trite answer is that you can never guarantee the target is exactly the same as the source.
In the instant after you read back the target for a check, it could be hit by that elusive Higgs boson and flip one bit. It comes back to the law of diminishing returns: do the best you can, and be satisfied. |
Honestly, I do not know how reliably a file reaches a remote machine intact when copied to a local mount point. However, I believe an integrity check of the remote file to compare to the original is considerably safer. |
Loosely speaking, you can either
1. use a tool that has built-in checksums, e.g. rsync
or
2. use a post-facto checksum, e.g. md5sum before and after copying.
In both cases, check for success before deleting the original files. |
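Option 2 could look roughly like the sketch below: copy, compare md5sum output, and remove the original only if the sums agree. `verified_move` and the paths are placeholder names. Bear in mind that reading the copy back through the mount may be answered from the local cache, so this is a sanity check under that assumption, not proof of what the remote disk holds.

```shell
#!/bin/sh
# Sketch: "move" a file by copying it, verifying the copy, then deleting the source.
# verified_move is a hypothetical helper name.
verified_move() {
    src=$1
    dest_dir=$2
    dest=$dest_dir/$(basename -- "$src")
    cp -- "$src" "$dest" || return 1
    # Reading from stdin makes md5sum print "<digest>  -"; keep the digest only.
    src_sum=$(md5sum < "$src" | cut -d' ' -f1)
    dest_sum=$(md5sum < "$dest" | cut -d' ' -f1)
    # Delete the original only when a digest was computed and both agree.
    if [ -n "$src_sum" ] && [ "$src_sum" = "$dest_sum" ]; then
        rm -- "$src"
    else
        echo "checksum mismatch for $src; original kept" >&2
        return 1
    fi
}

# Example with hypothetical paths, e.g. from a daily cron script:
# for f in /srv/A/*.wav; do verified_move "$f" /mnt/C || exit 1; done
```

Because the original is removed only after the comparison succeeds, a failed or corrupted copy leaves the source file in place for the next run.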