LinuxQuestions.org

Willard 07-07-2011 10:36 AM

mv command, data integrity
 
Greetings.

I am creating a cron job on an Arch Linux server. This cron job should run daily, moving audio files and large bitmap files from a local directory A to a remote directory B (a Samba share).

My first idea was to
  1. mount the remote samba share locally, in mount point C,
  2. mv the files from A to C.

To make sure the data finds its way to B exactly as it was in A, I felt the need to investigate how mv works.

According to the man page and the Debian info files, mv copies the file from A to C, and deletes the original from A only after the transfer to C completes successfully.

However, this documentation does not specify whether "successfully" just means the file was transferred, or whether mv also does integrity checking (such as computing an MD5 checksum).

Does mv do this?

If not, I need to use a different utility. I imagine loads of people have this very same need, and that this problem has been solved before. What other utility is ideal for this purpose?

Or do I use a combination of cp and rm, and do the md5sum check myself?

Thanks for your help,

Willard.

anomie 07-07-2011 11:49 AM

A few ramblings from me: I haven't reviewed the mv(1) source code. Like you, I poked through its man pages and info entry (in coreutils).

My gut feeling is that using mv to push the file over tcp should result in a "successful" operation (read: identical before and after), or a noisy error if something went awry.

That said, it's trivial enough to do a sha1sum(1) of the file before and after the copy, a la:
  1. generate crypto digest for /path/A/audio_file01
  2. copy (rather than move) /path/A/audio_file01 -> /path/B
  3. generate crypto digest for /path/B/audio_file01
  4. do digests match? make noise and exit if not
  5. remove /path/A/audio_file01

Might as well be sure about the file's integrity and sleep well at night.
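
The five steps above can be sketched as a small shell function. This is a minimal sketch, not a tested production script: the function name verified_move is made up for illustration, and it assumes sha1sum, cp and rm from GNU coreutils are available on the local machine.

```shell
#!/bin/sh
# verified_move SRC DEST_DIR   (hypothetical helper name)
# Copy SRC into DEST_DIR, compare SHA-1 digests of the original and the copy,
# and delete SRC only when the digests match. On any failure SRC is kept.
verified_move() {
    src=$1
    dest=${2%/}/$(basename -- "$1")

    # Step 1: digest of the original. Reading from stdin makes sha1sum print
    # "<digest>  -", so both outputs can be compared as plain strings.
    sum_src=$(sha1sum < "$src") || return 1

    # Step 2: copy rather than move, so the original survives a failure.
    cp -- "$src" "$dest" || { echo "copy failed: $src" >&2; return 1; }

    # Step 3: digest of the copy.
    sum_dest=$(sha1sum < "$dest") || return 1

    # Step 4: make noise and stop if the digests differ.
    if [ "$sum_src" != "$sum_dest" ]; then
        echo "digest mismatch: $src -> $dest" >&2
        return 1
    fi

    # Step 5: remove the original.
    rm -- "$src"
}
```

A call such as `verified_move /path/A/audio_file01 /path/B` returns non-zero if any step fails, which a cron job can use to send a warning.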

chrism01 07-07-2011 07:46 PM

There's always rsync; I believe that is very careful about checking whether it went ok.
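
A minimal sketch of how rsync might be used for this, assuming rsync is installed locally and the share is mounted at a local path. The function name rsync_move is made up for illustration. rsync checks each transferred file against a whole-file checksum computed during the transfer, and --remove-source-files deletes a source file only after it has been transferred successfully. Note that this check covers the data as rsync's receiving side handled it; it does not by itself prove what ended up on the remote server's disk.

```shell
#!/bin/sh
# rsync_move SRC_DIR DEST_DIR   (hypothetical helper name)
# Move the contents of SRC_DIR into DEST_DIR using rsync.
rsync_move() {
    # -r  recurse into directories
    # -t  preserve modification times
    # --remove-source-files  delete each source file once it has been
    #                        transferred successfully (directories are kept)
    rsync -rt --remove-source-files "${1%/}/" "${2%/}/"
}
```

rsync exits with a non-zero status if any file could not be transferred, so the calling script can test the return value of rsync_move and report failures.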

Willard 06-03-2012 06:49 AM

Short version: When I have issued a shell command to copy file F from directory A to mount point C, how do I ensure that F has reached B before proceeding?

(I am thinking sync or fsync, but fsync is a C function, the man page of sync is useless, and the info page of sync does not explain what happens when you flush a mount of a remote directory.)

Long version:

Quote:

Originally Posted by anomie (Post 4407993)
Might as well be sure about the file's integrity and sleep well at night.

I would expect integrity checks to be more widespread, and tools for this to be plentiful, given how paranoid Linux admins are.
Quote:

Originally Posted by anomie (Post 4407993)
[...]it's trivial enough to do a sha1sum(1) of the file before and after the copy, a la:
  1. generate crypto digest for /path/A/audio_file01
  2. copy (rather than move) /path/A/audio_file01 -> /path/B
  3. generate crypto digest for /path/B/audio_file01
  4. do digests match? make noise and exit if not
  5. remove /path/A/audio_file01

Some updates on the issue: I did basically what you described. Except, from what I see, when a shell command performing a file transfer to C is complete,
  1. the file might be "on its way", stored in buffers locally, and
  2. errors might be introduced in the file during transit / due to ruined server HDD.
I "solved" these issues by doing the following once the transfer to C is complete:
  1. SCP-ing the file from the remote machine back to the local machine,
  2. running sha1sum on that copy and comparing the SHA sums locally.
No, I am not proud of this solution. I chose this approach after realizing that the remote server does not have the sha1sum command installed. Another reason why this solution is awful is that only one user (named "admin") has privileges to SCP, so for this to work, the local machine needs either a) the "admin" password, or b) its passwordless public key in the authorized_keys file for "admin" on the remote machine.

I realized that I did not need cryptographically-strong checksums, so md5sum should be fine. The remote machine has md5sum installed, so an option which does not require the file to be transferred back to local involves having remote do the md5sum on the received file, and make the result available to local.
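
A minimal sketch of that idea: the remote machine computes the digest and the local machine only compares strings. The function name md5_matches, the host admin@remote and the remote path in the usage comment are placeholders, and the sketch assumes the remote machine can be reached over ssh.

```shell
#!/bin/sh
# md5_matches FILE EXPECTED   (hypothetical helper name)
# Succeeds if the MD5 digest of the local FILE equals EXPECTED (32 hex digits).
md5_matches() {
    actual=$(md5sum < "$1") || return 1
    # md5sum prints "<digest>  -" when reading stdin; keep only the digest.
    [ "${actual%% *}" = "$2" ]
}

# Hypothetical usage; host name and remote path are placeholders:
#   remote=$(ssh admin@remote 'md5sum < /share/B/audio_file01') || exit 1
#   md5_matches /path/A/audio_file01 "${remote%% *}" && rm /path/A/audio_file01
```

Only the digest crosses the network in this arrangement, so the file itself does not have to be copied back.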

A much more elegant solution, however, would be to ensure that the file has reached remote when copied to C. One way of doing this would be to remount (unmount, then mount) C, as this flushes buffers. It would be better if I could issue a "flush C" command, but all I have found in this avenue are
  1. fsync, which is a C function, not a conveniently-available bash command,
  2. sync, which flushes all buffers, but is a bit vague about what it does to mounts of remote directories.
If I can ensure that the file has reached the remote machine by flushing the mount point, then I can do the checksum on the file in C locally. However, depending on how md5sum works, this might copy the remote file back to local (which is getting silly again).
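
One way to get a per-file flush from the shell is GNU dd, whose conv=fsync option calls fsync on the output file before dd exits. The sketch below assumes GNU coreutils; the function name copy_and_flush is made up for illustration. Whether fsync on a mounted share guarantees that the data is on the remote server's disk depends on the file system client and the server, so treat that part as an assumption to verify rather than a given.

```shell
#!/bin/sh
# copy_and_flush SRC DEST   (hypothetical helper name)
# Copy SRC to DEST and ask the kernel to flush DEST before returning.
copy_and_flush() {
    # conv=fsync: physically write output file data and metadata
    # before dd finishes. dd prints its transfer statistics on stderr.
    dd if="$1" of="$2" bs=1M conv=fsync
}
```

A checksum computed afterwards on the file in C is read through the mount, so the data may come either from the local cache or back over the network, depending on the client's caching.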
Quote:

Originally Posted by chrism01 (Post 4408353)
There's always rsync; I believe that is very careful about checking whether it went ok.

I envision two uses:
  1. copy to mount point C & integrity-check,
  2. copy to remote machine using smb & integrity-check.
For 1), does rsync flush file system buffers upon completion? For 2), how do you do this?

syg00 06-03-2012 07:03 AM

The trite answer is that you can never guarantee the target is exactly the same as the source.

In the instant after you read back the target for a check, it could be hit by that elusive Higgs Boson and flip one bit.
Comes back to the law of diminishing returns - do the best you can, and be satisfied.

Willard 06-03-2012 07:09 AM

Quote:

Originally Posted by syg00 (Post 4694375)
The trite answer is that you can never guarantee the target is exactly the same as the source.

In the instant after you read back the target for a check, it could be hit by that elusive Higgs Boson and flip one bit.
Comes back to the law of diminishing returns - do the best you can, and be satisfied.

I agree; file F might become F'!=F when it reaches the remote machine due to errors in transit, and then a second bit flip could reverse the change before you actually invoke md5sum again, so the check would report the md5sum of F.

Honestly, I do not know how reliably a file reaches a remote machine intact when copied to a local mount point. However, I believe an integrity check of the remote file to compare to the original is considerably safer.

chrism01 06-03-2012 07:16 PM

Loosely speaking, you can either

1. use a tool that has built-in checksums, e.g. rsync

Or

2. use a post-facto checksum, e.g. md5sum before & after copying

In both cases, check for success before deletion of original files.


All times are GMT -5.