LinuxQuestions.org
Old 07-07-2011, 10:36 AM   #1
Willard
LQ Newbie
 
Registered: Nov 2009
Posts: 17

Rep: Reputation: 0
mv command, data integrity


Greetings.

I am creating a cron job on an Arch Linux server. This cron job should run daily, moving audio files and large bitmap files from a local directory A to a remote directory B (a Samba share).

My first idea was to
  1. mount the remote samba share locally, in mount point C,
  2. mv the files from A to C.

To make sure the data finds its way to B exactly as it was in A, I felt the need to investigate how mv works.

According to the man page and the Debian info files, mv will copy the file from A to C, and only delete the original from A when the transfer to C completes successfully.

However, this documentation does not specify whether "successfully" just means the file was transferred, or whether mv also does integrity checking (such as computing an md5sum checksum).

Does mv do this?

If not, I need to use a different utility. I imagine loads of people have this very same need, and that this problem has been solved before. What other utility is ideal for this purpose?

Or do I use a combination of cp and rm, and do the md5sum check myself?

Thanks for your help,

Willard.
 
Old 07-07-2011, 11:49 AM   #2
anomie
Senior Member
 
Registered: Nov 2004
Location: Texas
Distribution: RHEL, Scientific Linux, Debian, Fedora, Lubuntu, FreeBSD
Posts: 3,930
Blog Entries: 5

Rep: Reputation: Disabled
A few ramblings from me: I haven't reviewed the mv(1) source code. Like you, I poked through its man pages and info entry (in coreutils).

My gut feeling is that using mv to push the file over TCP should result in a "successful" operation (read: identical before and after), or a noisy error if something went awry.

That said, it's trivial enough to do a sha1sum(1) of the file before and after the copy, a la:
  1. generate crypto digest for /path/A/audio_file01
  2. copy (rather than move) /path/A/audio_file01 -> /path/B
  3. generate crypto digest for /path/B/audio_file01
  4. do digests match? make noise and exit if not
  5. remove /path/A/audio_file01

Might as well be sure about the file's integrity and sleep well at night.
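The five steps above could be sketched as a small shell function. This is only an illustration under a few assumptions: the function name verified_move is made up for the example, sha1sum is available locally, and the destination is a directory that is already mounted. Note that on a network mount the digest of the copy may be computed from locally cached data rather than from what the server actually stored.

```shell
# verified_move SRC DEST_DIR   (hypothetical helper, not part of coreutils)
# Copies SRC into DEST_DIR, compares SHA-1 digests, removes SRC only on a match.
verified_move() {
    src=$1
    dest="$2/$(basename -- "$src")"

    # Step 1: digest of the original. Reading from stdin makes sha1sum print
    # "<hash>  -", so both digests can be compared as whole strings.
    sum_before=$(sha1sum < "$src") || return 1

    # Step 2: copy rather than move, so the original survives a failure.
    cp -- "$src" "$dest" || return 1

    # Step 3: digest of the copy.
    sum_after=$(sha1sum < "$dest") || return 1

    # Step 4: make noise and stop if the digests differ.
    if [ "$sum_before" != "$sum_after" ]; then
        echo "digest mismatch for $src" >&2
        return 1
    fi

    # Step 5: only now remove the original.
    rm -- "$src"
}
```

Called as `verified_move /path/A/audio_file01 /path/B`, it returns non-zero and leaves the original in place if any step fails.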
 
Old 07-07-2011, 07:46 PM   #3
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,226

Rep: Reputation: 2023
There's always rsync; I believe that is very careful about checking whether it went ok.
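As a rough sketch of what that might look like, assuming rsync is installed locally and the share is mounted as a directory (the function name rsync_move is invented for the example): rsync checks a whole-file checksum for each file as it is transferred, and --remove-source-files deletes a source file only after it has been transferred successfully. As far as the rsync documentation describes it, that check covers the data as it was sent, not a later read-back from the remote disk.

```shell
# rsync_move SRC_DIR DEST_DIR   (hypothetical helper name)
# Assumes neither path starts with "-".
rsync_move() {
    src_dir=$1
    dest_dir=$2
    # -r  recurse into SRC_DIR
    # -t  preserve modification times
    # --checksum  decide what to send by comparing checksums, not size and mtime
    # --remove-source-files  remove each source file once it was transferred
    rsync -rt --checksum --remove-source-files "$src_dir"/ "$dest_dir"/
}
```

A non-zero exit status from rsync means something went wrong, so a cron job can test it before doing anything else. Empty directories are left behind in the source, since only files are removed.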
 
Old 06-03-2012, 06:49 AM   #4
Willard
LQ Newbie
 
Registered: Nov 2009
Posts: 17

Original Poster
Rep: Reputation: 0
Short version: After issuing a shell command to copy file F from directory A to mount point C, how do I ensure that F has actually reached B before proceeding?

(I am thinking sync or fsync, but fsync is a C function, the man page for sync is useless, and the info page for sync does not explain what happens when you flush a mount of a remote directory.)

Long version:

Quote:
Originally Posted by anomie
Might as well be sure about the file's integrity and sleep well at night.
I would expect integrity checks to be more widespread, and tools for this to be plentiful, given how paranoid Linux admins are.
Quote:
Originally Posted by anomie
[...]it's trivial enough to do a sha1sum(1) of the file before and after the copy, a la:
  1. generate crypto digest for /path/A/audio_file01
  2. copy (rather than move) /path/A/audio_file01 -> /path/B
  3. generate crypto digest for /path/B/audio_file01
  4. do digests match? make noise and exit if not
  5. remove /path/A/audio_file01
Some updates on the issue: I did basically what you described. Except, from what I see, when a shell command performing a file transfer to C is complete,
  1. the file might be "on its way", stored in buffers locally, and
  2. errors might be introduced in the file during transit / due to ruined server HDD.
I "solved" these issues by, when the transfer to C is complete,
  1. SCP-ing the file from the remote machine back to the local machine,
  2. running sha1sum on that copy and comparing the SHA-1 sums locally.
No, I am not proud of this solution. I chose this approach after realizing that the remote server does not have the sha1sum command installed. Another reason why this solution is awful is that only one user (named "admin") has privileges to SCP, so for this to work, the local machine needs either a) the "admin" password, or b) its passwordless public key in the authorized_keys file for "admin" on the remote machine.

I realized that I did not need cryptographically-strong checksums, so md5sum should be fine. The remote machine has md5sum installed, so an option which does not require the file to be transferred back to local involves having remote do the md5sum on the received file, and make the result available to local.
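One way the remote-side check could look, as a sketch only. The function name md5_matches, the host name remote-host and the remote path are placeholders, and it assumes the "admin" account can run md5sum over ssh on the remote machine. The command that produces the remote checksum is passed in as arguments, so the same function works with any transport.

```shell
# md5_matches LOCAL_FILE COMMAND...   (hypothetical helper)
# COMMAND must print md5sum-style output ("<hash>  <name>") for the remote copy.
md5_matches() {
    local_file=$1
    shift
    local_sum=$(md5sum < "$local_file") || return 1
    local_sum=${local_sum%% *}      # keep only the hash field
    remote_out=$("$@") || return 1
    remote_sum=${remote_out%% *}    # first field of the remote output
    [ -n "$local_sum" ] && [ "$local_sum" = "$remote_sum" ]
}

# Hypothetical usage; host and path are placeholders:
#   md5_matches /path/A/audio_file01 ssh admin@remote-host md5sum /share/audio_file01
```

Because md5sum runs on the remote machine, it reads the file from the server's own storage and only the short checksum line travels back. Remote paths containing spaces would need extra quoting, since ssh hands the command to a remote shell.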

A much more elegant solution would be to ensure that the file has reached the remote machine once it has been copied to C. One way of doing this would be to remount (unmount, then mount) C, as this flushes buffers. It would be better if I could issue a "flush C" command, but all I have found in this avenue are
  1. fsync, which is a C function, not a conveniently-available bash command,
  2. sync, which flushes all buffers, but is a bit vague about what it does to mounts of remote directories.
If I can ensure that the file has reached the remote machine by flushing the mount point, then I can do the checksum on the file in C locally. However, depending on how md5sum works, this might copy the remote file back to local (which is getting silly again).
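For what it is worth, newer versions of GNU coreutils (8.24 and later) let sync take file operands, in which case it calls fsync on just those files; `sync -f FILE` flushes the file system containing FILE. A minimal sketch, with the function name copy_and_flush invented for the example. Whether a flush on a CIFS mount guarantees that the server has committed the data is an assumption to verify against the mount options in use, and a checksum computed on C afterwards may still be answered from the local page cache.

```shell
# copy_and_flush SRC DEST_FILE   (hypothetical helper)
copy_and_flush() {
    cp -- "$1" "$2" || return 1
    # With a file operand, GNU sync (coreutils 8.24 or later) calls fsync(2)
    # on just that file. If that invocation fails, fall back to a plain sync,
    # which flushes all file systems.
    sync -- "$2" 2>/dev/null || sync
}
```

Called as `copy_and_flush /path/A/audio_file01 /path/C/audio_file01`, it returns only after the local kernel reports the flush as done.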
Quote:
Originally Posted by chrism01
There's always rsync; I believe that is very careful about checking whether it went ok.
I envision two uses:
  1. copy to mount point C & integrity-check,
  2. copy to remote machine using smb & integrity-check.
For 1), does rsync flush file system buffers upon completion? For 2), how do you do this?

Last edited by Willard; 06-03-2012 at 06:52 AM.
 
Old 06-03-2012, 07:03 AM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 12,101

Rep: Reputation: 982
The trite answer is that you can never guarantee the target is exactly the same as the source.

In the instant after you read back the target for a check, it could be hit by that elusive Higgs Boson and flip one bit.
Comes back to the law of diminishing returns - do the best you can, and be satisfied.
 
Old 06-03-2012, 07:09 AM   #6
Willard
LQ Newbie
 
Registered: Nov 2009
Posts: 17

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by syg00
The trite answer is that you can never guarantee the target is exactly the same as the source.

In the instant after you read back the target for a check, it could be hit by that elusive Higgs Boson and flip one bit.
Comes back to the law of diminishing returns - do the best you can, and be satisfied.
I agree; file F might become F'!=F when it reaches the remote machine due to errors in transit, and then another bit flip could occur before you actually invoke md5sum again, so that the check still yields the md5sum of F.

Honestly, I do not know how reliably a file reaches a remote machine intact when copied to a local mount point. However, I believe an integrity check of the remote file to compare to the original is considerably safer.
 
Old 06-03-2012, 07:16 PM   #7
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,226

Rep: Reputation: 2023
Loosely speaking, you can either

1. use a tool that has built-in checksums, e.g. rsync

Or

2. use a post-facto checksum, e.g. md5sum before & after copying

In both cases, check for success before deletion of original files.
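For option 2 applied to a whole directory, one possible sketch uses md5sum's own check mode (md5sum -c) against a manifest written before the copy. The function name move_with_manifest is made up, and the sketch assumes that A contains only regular, non-hidden files, that nothing writes to A while it runs, and that GNU md5sum is available.

```shell
# move_with_manifest SRC_DIR DEST_DIR   (hypothetical helper)
move_with_manifest() {
    src_dir=$1
    dest_dir=$2
    manifest=$(mktemp) || return 1
    status=1

    # 1. Record checksums before copying, using names relative to SRC_DIR.
    # 2. Copy the files.
    # 3. Re-check every manifest entry against the copies in DEST_DIR.
    if ( cd "$src_dir" && md5sum -- * ) > "$manifest" &&
       cp -- "$src_dir"/* "$dest_dir"/ &&
       ( cd "$dest_dir" && md5sum -c --quiet "$manifest" )
    then
        # 4. Delete the originals only after every check has passed.
        ( cd "$src_dir" && rm -- * ) && status=0
    fi

    rm -f "$manifest"
    return "$status"
}
```

If any step fails, the originals stay where they are and the function returns non-zero, which cron can report by mail. An empty source directory also counts as a failure in this sketch.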
 
Tags
backup, md5sum, mv


All times are GMT -5. The time now is 09:41 PM.
