Ok, so this one has me flummoxed more than somewhat.
I'm supposed to sync a folder (including sub-folders and contents) between one machine and another. Simple, and straightforward, right? Well, here's where it gets fun. rsync is supposed to sync the difference between the two folders (and their sub-folders and contents).
What this rsync command seems to do is re-sync all folders and contents irrespective of what is happening.
Note: As this is used to keep images (which aren't small things at the best of times) in sync between web servers, it's set in a cron job to run every minute, but fails to complete in less than four minutes. The server load creeps up, the server falls over, etc.
Number of files: 80643
Number of files transferred: 14779
Total file size: 5281629189 bytes
Total transferred file size: 916813386 bytes
Literal data: 9320 bytes
Matched data: 916804066 bytes
File list size: 2953774
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 6265807
Total bytes received: 7531562
sent 6265807 bytes received 7531562 bytes 72048.92 bytes/sec
total size is 5281629189 speedup is 382.80
real 3m10.737s
user 0m7.012s
sys 0m4.468s
I've checked file ownership, and indeed set file ownership to be identical across both servers. And now the time stamps have been set to within seconds of each other, and it's just made things worse! HELP!
If the timestamps are not identical, then both the source and destination have to spend time generating and comparing checksums to determine how much changed data needs to be sent (just 9320 bytes for the case you presented). If the destination is on any variety of FAT filesystem, its timestamps have a resolution of 2 seconds, and any source timestamp that happens to be an odd number of seconds can never match. You can use rsync's "--modify-window=NUM" option to compare modification times with reduced accuracy. Setting that to 2 seconds is generally sufficient, but you might have to go to 3602 seconds to get around problems with Daylight Saving Time changes.
Are either the source or destination filesystems FAT32?
Quote:
Originally Posted by man rsync
--modify-window
When comparing two timestamps, rsync treats the timestamps as being equal if they
differ by no more than the modify-window value. This is normally 0 (for an exact
match), but you may find it useful to set this to a larger value in some situations.
In particular, when transferring to or from an MS Windows FAT filesystem
(which represents times with a 2-second resolution), --modify-window=1 is useful
(allowing times to differ by up to 1 second).
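To make the rule quoted above concrete, here is a minimal shell sketch of the comparison as the man page describes it: two modification times count as equal when they differ by no more than the window. The function name `mtimes_match` and the sample timestamps are made up for illustration; this is not rsync's own code.

```shell
# Hypothetical helper, not part of rsync: mimics the rule quoted from the man page.
# Usage: mtimes_match SRC_MTIME DEST_MTIME [WINDOW]   (whole seconds; WINDOW defaults to 0)
# Returns success (0) when the two times would be treated as equal.
mtimes_match() {
    local src=$1 dest=$2 window=${3:-0}
    local diff=$(( src - dest ))
    # Take the absolute value of the difference.
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    [ "$diff" -le "$window" ]
}

# An odd-second source time against a destination that stored an even second:
if mtimes_match 1300000001 1300000000 0; then echo "window 0: equal"; else echo "window 0: differ"; fi
if mtimes_match 1300000001 1300000000 1; then echo "window 1: equal"; else echo "window 1: differ"; fi
```

With a window of 0 the pair is reported as different, so the file would be re-examined on every run; with a window of 1 it is treated as unchanged.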
Thanks for both of your contributions. Thankfully, or unfortunately depending on your viewpoint, neither of the servers runs a FAT filesystem. They are both ext3. I have tried with --modify-window set to 3602 just in case we had something stupid going on with the timezones, but that isn't the issue. I also tried (purely to see if there was an issue with the timezones) setting them both to UTC and then using NTP to get the clocks in sync. None of the above actions has helped.
Do the timestamps match after rsync has done its thing? That should happen regardless of whether the two clocks are in sync since rsync will use the utime() system call to copy the timestamp from the source to the destination.
You could try using the "--itemize-changes" option to see what differences rsync believes exist. (You'll need the manpage to interpret the result.)
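As a rough aid for reading that output: the rsync man page describes each itemized line as starting with an 11-character code of the form `YXcstpoguax`, where a letter in the `c`, `s`, `t`, `p`, `o` or `g` position means the checksum, size, modification time, permissions, owner or group differs. The sketch below tallies those letters. The function name `summarize_itemized` and the sample lines are made up for illustration, and the parsing is an assumption based on that documented layout.

```shell
# Hypothetical helper: tally which attributes rsync reports as differing.
# Reads `rsync --itemize-changes` output on stdin and prints one count per attribute.
summarize_itemized() {
    awk '
        # Only consider lines that begin with an update type (< > c h .)
        # followed by a file type (f d L D S); this skips "*deleting" and stats lines.
        /^[<>ch.][fdLDS]/ {
            code = $1
            if (substr(code, 3, 1) == "c") n["checksum"]++
            if (substr(code, 4, 1) == "s") n["size"]++
            # "t" means the time differs; "T" means it will be set to the transfer time.
            if (toupper(substr(code, 5, 1)) == "T") n["mtime"]++
            if (substr(code, 6, 1) == "p") n["perms"]++
            if (substr(code, 7, 1) == "o") n["owner"]++
            if (substr(code, 8, 1) == "g") n["group"]++
        }
        END {
            split("checksum size mtime perms owner group", keys, " ")
            for (i = 1; i <= 6; i++) printf "%s=%d\n", keys[i], n[keys[i]]
        }
    '
}

# Made-up sample lines in the documented format:
printf '%s\n' \
    '>f..t...... img/a.jpg' \
    '.f...p..... img/b.jpg' \
    '.f....og... img/c.jpg' \
    '>f.st...... img/d.jpg' \
    '*deleting   img/old.jpg' | summarize_itemized
```

Against the real tree, something like `rsync -a --dry-run --itemize-changes /path/to/images/ otherhost:/path/to/images/ | summarize_itemized` would give the same tally without changing anything (`otherhost` is a placeholder).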
chandhokshashank - LAN connection, although I'm not sure of the relevance.
rknichols - it does appear to be the same, so no issues there. However, I'll try the --itemize-changes option with the man page as suggested. This week seems to be rsync week for me, so nothing to lose, etc.
Many thanks for suggestions so far, and not to appear ungrateful, but do keep them coming in, cos I read every one of them. I really do appreciate it, and I'll keep you all posted.
1. Although you can specify the start time of a cron job down to a minute, creating a new process every minute hammers the system, as it has to create an entire new environment each time.
It's also quite common to end up having them trip over each other (as you've seen).
2. My rule of thumb is to write a daemon for anything more frequent than every 5 minutes and just have it sleep at the bottom of the loop for however long you'd like.
This dramatically lessens the load and also prevents the problem of multiple copies running.
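The "sleep at the bottom of the loop" idea can be sketched in a few lines of shell. This is only an illustration of the pattern, not a full daemon (no logging, signal handling or startup script); the function name `sync_loop` and its arguments are invented for this example.

```shell
# Hypothetical wrapper: run a command repeatedly, sleeping after each pass.
# The next pass only starts once the previous one has returned, so two
# passes started by this loop cannot overlap.
# Usage: sync_loop SLEEP_SECONDS MAX_PASSES COMMAND [ARGS...]
#        (MAX_PASSES=0 means keep going until the process is killed)
sync_loop() {
    local pause=$1 max=$2 passes=0
    shift 2
    while [ "$max" -eq 0 ] || [ "$passes" -lt "$max" ]; do
        # Report a failed pass but keep looping.
        "$@" || echo "pass failed with status $?" >&2
        passes=$(( passes + 1 ))
        sleep "$pause"
    done
}

# Harmless demonstration: three passes with no pause.
sync_loop 0 3 echo "sync pass done"
```

For the case in this thread the call might look like `sync_loop 60 0 rsync -a /path/to/images/ otherhost:/path/to/images/`, where `otherhost` and the rsync options are placeholders. If the job stays in cron instead, wrapping it in `flock -n` with a lock file is another common way to stop runs from piling up.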
As rknichols observed, the literal data sent is just 9320 bytes, so rsync is definitely not "re-syncing all folders and contents irrespective" as you suggested. However, it's comparing about 20% of the 80,000 files so the question is why. --itemize-changes will give you a clue, I suppose.
If it's not the time stamp, my guess is that the owner or group for some of the files is not getting transferred - perhaps because of permissions on the target.
Maybe it takes 3 minutes just to scan through the 80,000 files.
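The "about 20%" figure can be checked against the statistics posted at the top of the thread. The snippet below just redoes that arithmetic; the variable names and the `pct` helper are made up for this example.

```shell
# Figures copied from the rsync statistics posted earlier in the thread.
files_total=80643
files_transferred=14779
size_total=5281629189
size_transferred=916813386
literal=9320
wire_bytes=$(( 6265807 + 7531562 ))   # total bytes sent + total bytes received

# pct A B: print A as a percentage of B, to one decimal place.
pct() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f", 100 * a / b }'; }

echo "files rsync chose to transfer: $(pct "$files_transferred" "$files_total")%"
echo "bytes in those files:          $(pct "$size_transferred" "$size_total")%"
echo "literal data on the wire:      $literal of $wire_bytes bytes"
```

So a bit under a fifth of the files are being re-examined on each run, yet almost none of the roughly 13.8 MB of traffic is new file content. That suggests the traffic is mostly file list and checksum data, which fits the metadata-mismatch theories above.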
Quick reply, as yesterday was spent managing server performance due to rsync issues (told you it was one of those weeks!).
chrism01, your point sounds very interesting. I don't suppose you'd have a link to hand about creating daemons, would you? I'd surely appreciate it.
SecretCode and rknichols, now that these servers seem more stable, I'm going to do an in-depth investigation of --itemize-changes to understand this better and move myself a step closer to being an rsync guru!
Kudos and ratings to all of you wonderful contributors, and as before, keep them coming...
Further update now that I'm back and looking at it again:
Major props to rknichols and SecretCode for re-prompting - now I understand --itemize-changes. However having done the following:
Code:
chmod 770 -R /path/to/images
and
Code:
chown owner:group -R /path/to/images
on both servers, I'd have thought that the timestamps were now the only issue, but on running the latest
Code:
rsync --itemize-changes
I find that rsync still reports permission differences for some files within this folder, so those files are being synced on the basis of both timestamps and permissions.
I'm not convinced that this is part of the cause of the problem, but I thought I'd throw it in there. Possible red herring alert.
User is called owner, group is group.
On server server1 owner is listed in /etc/passwd as owner:x:1000:1000::/home/owner
On the other server owner is listed in /etc/passwd as owner:x:1000:441::/home/owner
Both these GIDs are for the group group
Is it possible that this discrepancy is the cause of the files with permission issues?
Could well be ... I'll bet that rsync compares numeric group ids not group names, and (unless you have an LDAP system in place) there's no guarantee these match between different hosts.
Quote:
Could well be ... I'll bet that rsync compares numeric group ids not group names,
No, as long as the same name exists on both machines, rsync defaults to comparing the names, not the numbers, with the exception of UID 0 and GID 0. See the section for "--numeric-ids" in the manpage.
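To see the discrepancy being discussed, the two /etc/passwd entries quoted earlier can be compared field by field. This is a small sketch; the helper name `passwd_ids` is made up, and the two entries are copied as posted.

```shell
# Hypothetical helper: print "UID GID" for USER from passwd-format lines on stdin.
passwd_ids() {
    awk -F: -v user="$1" '$1 == user { print $3, $4 }'
}

# The two entries quoted in the thread.
server1_entry='owner:x:1000:1000::/home/owner'
other_entry='owner:x:1000:441::/home/owner'

ids1=$(printf '%s\n' "$server1_entry" | passwd_ids owner)
ids2=$(printf '%s\n' "$other_entry" | passwd_ids owner)

if [ "$ids1" = "$ids2" ]; then
    echo "numeric IDs match: $ids1"
else
    echo "numeric IDs differ: '$ids1' vs '$ids2' (the names may still match)"
fi
```

On the live servers, `getent passwd owner | passwd_ids owner` and `getent group group` would show the same information. Going by the reply above, a different numeric GID for the same group name should not by itself make rsync report a group difference under its default name mapping; it would be expected to matter if `--numeric-ids` were in use.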