LinuxQuestions.org
Old 05-13-2014, 08:04 AM   #16
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,247

Rep: Reputation: 2328

The two key attributes of a file to check after a copy are the filename and a checksum of the content.

Unless you've gone out of your way to change the filename, a checksum alone is sufficient, and that's what most people use.
Of course, if you use a tool like rsync that does the checksum as part of its work, then you don't even need to do that (although you can afterwards if you're paranoid).

Incidentally, use the stat cmd if you really want to check ctime/mtime/atime (not that it will achieve anything worthwhile).
As above, there's no such thing as 'creation time' in *nix.
If it's that important, embed it into the filename.
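For example, a minimal sketch of checking both attributes from a shell — the temp files here are just stand-ins for your real source and copy:

```shell
# Create a throw-away "source" and "copy" to compare (stand-ins for real files).
src=$(mktemp); dst=$(mktemp)
printf 'some data\n' > "$src"
cp "$src" "$dst"

# Compare content by checksum; sha256sum on stdin prints only "<hash>  -",
# so the filenames don't get mixed into the comparison.
if [ "$(sha256sum < "$src")" = "$(sha256sum < "$dst")" ]; then
    result="contents match"
else
    result="contents differ"
fi
echo "$result"

# stat shows atime/mtime/ctime -- remember ctime is inode *change* time,
# not creation time.
stat "$dst"

rm -f "$src" "$dst"
```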

I honestly believe you're over-thinking this ...
 
Old 05-13-2014, 08:18 AM   #17
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian i686 (solaris)
Posts: 8,475

Rep: Reputation: 2424
Quote:
Originally Posted by Rygir View Post
That said, I was interested in rsync but I haven't been able to get it to work. Supposedly my nas supports it, but I doubt my source machine supports it. I forgot what went wrong exactly but maybe I was just doing it wrong....
So you mount your NAS dir somewhere and try to sync it to your "another" dir. rsync --dry-run will tell you all the differences it finds; it only needs to be executed on the local host, and you can use almost any version of rsync to do it. It was created for exactly this kind of job, so I don't think you need to reinvent it. rsync is available for almost every OS.
 
Old 05-13-2014, 10:41 AM   #18
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,654

Rep: Reputation: 1255
The only time I have run across a problem with rsync is with very large filesystems (50 million+ files, a couple of million directories...).

rsync spends a LOT of time first checking source and destination to work out what to do, and for such a large filesystem that can take many days... before it copies the first file.
 
Old 05-13-2014, 11:05 AM   #19
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 4,094

Rep: Reputation: 1515
Break the filesystem into logically separate rsync jobs. This keeps down the time and memory required.
 
Old 05-13-2014, 11:21 AM   #20
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,654

Rep: Reputation: 1255
That only helps a little, and doesn't help with memory, as the totals of the jobs still add up.

And unless the jobs run in parallel, it doesn't improve the time either.

The largest number of such jobs possible (well, on the equipment I had) was 12. After that the filesystems saturate and start causing delays - and there is still the delay while it checks the two filesystems (it would still take several weeks to transfer the data). And if a job aborts, it has to scan the filesystems again from the start.

My workaround was to write a perl script that handled the scan in parallel with the file transfer. Making the script checkpointable allowed it to restart without repeating the scan... and without re-copying files already done. I got the scan down to 45 minutes when there was nothing to do, but adding the file transfer could take a good bit longer. The first complete pass (scan and copy) took a bit over three weeks (checkpoints every 6 hours). But repeats got faster, until a run was under a couple of hours with the normal updates being made.
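The general idea — a scanner streaming names to a pool of copy workers — can be sketched in shell, though this toy version has none of the checkpointing described above, and `cp --parents` is a GNU coreutils extension:

```shell
# Stand-in source tree and empty destination.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/deep/dir"
echo data > "$src/deep/dir/file.txt"

# find streams paths as it discovers them while xargs keeps up to 4 copy
# workers busy, so the scan runs in parallel with the transfers.
( cd "$src" && find . -type f -print0 |
      xargs -0 -P 4 -I{} cp --parents {} "$dst/" )
```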

Last edited by jpollard; 05-13-2014 at 11:34 AM.
 
Old 05-13-2014, 11:27 AM   #21
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 4,094

Rep: Reputation: 1515
I disagree.

Do me a favor.

Grab an ENORMOUS rsync job.

Break it into some number of pieces, run those jobs serially, and compare that to the one BIG job.

Code:
date; rsync -varh --progress here there:/tmp/; date
I've done this over and over, writing (time rsync) comparisons in my department, and the serialized rsync jobs always complete first. Watching memory consumption, they always use less, too.

In my experience, for some reason, smaller rsync jobs finish faster than one BIG one. And I'm talking about 50% text log files and 50% binary files, numbering around 2 million at 800GB total.

Last edited by szboardstretcher; 05-13-2014 at 11:30 AM.
 
Old 05-13-2014, 12:09 PM   #22
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,654

Rep: Reputation: 1255
That is because the scans take so long. Taken to its logical conclusion, you'd do one rsync job per file....

The trouble with serial syncs is that only one file at a time is transferred - taking forever (50 million files, remember).

And using rsync at that resolution (even down to a single directory...) is too slow. And since the directory tree is not that small, even breaking it down to the directory level doesn't work well.

BTW, the filesystems involved were 16TB, and the file servers would saturate due to the scans.

rsync works... it just doesn't scale well, and it isn't always possible to break a large tree down enough to make it fast. It is fast enough for leaf directories (well, normally - but if you have 5,000+ files in a directory it isn't all that fast). But for thousands of intermediate-level directories it sucks. It works correctly if you start at the top level... but that doesn't allow enough of a breakdown. Even if the top level has a couple of hundred directories (plus files), you can only break it down into a couple of hundred jobs; that still leaves 10s of thousands of files and directories below each one... and the scans again add up...

The perl script I had completely separated the scanning from the copying. The scanning itself could grow as fast as the number of directories found (so I had controls to throttle it). The copying was done as fast as possible - but due to the nature of the network and the two filesystems involved, that was limited to about 4 files in parallel (beyond that, throughput dropped).
 
  


Tags
bash, compare, files, metadata

