You should set up Network Time Protocol (NTP) on both boxes so that they agree on UTC time. Even if the boxes are in different time zones, this will make sure the times recorded on files are correct.
rsync should be seeing the files as the same if they are the same - have you used some flag that makes it change access, creation or modification time on the target? The main point of rsync is specifically to avoid having to re-transfer files that have already been transferred so if you're seeing an issue with re-transfers it seems likely you're telling it to do so. (There are flags to say to re-transfer even if it is on the target.) Review the man page (type "man rsync") and be sure you understand what the flags you're using are doing.
Sorry, I should probably have mentioned this: the initial transfer was not done using rsync. It was done using wget with the -m switch, which runs wget in mirroring mode and should have preserved the timestamps. Apparently the timestamps weren't preserved.
Both boxes do synchronize to an ntp server so the time should be the same.
For the first run after the time change, you can run rsync with the --checksum option in order to ensure that only files that have a changed checksum get transferred, and update the modified time on all unchanged files.
I've been googling, and as you can see from the quote I found, there seems to be a way to get rsync to update timestamps after it has checked checksums. But the source I found doesn't explain how. Does anyone know how this can be done?
I'm not entirely sure what's going on, but there is a "-c" switch for rsync that checks checksums. I tried running rsync with that option enabled, but apparently my boxes are so weak that checking checksums takes as long as transferring the files does, which makes this a hopeless way of doing it. Any other suggestions?
If it were me I'd bite the bullet by removing the files done with wget then using rsync to re-transfer. It will send the files again but then the next time you run rsync after that it will only find the "real" changes.
rsync can be sped up by using compression (the -z option; it is not on by default) if you have limited bandwidth (of course your CPU takes a hit).
Also if this is something internal to your LAN/WAN and not via internet you can tell rsync to use the old rsh instead of ssh. That will speed it up some because then it won't have to encrypt the connection. At a prior job we regularly would enable rsh to do large rsyncs then disable it when done. (You do NOT want to leave rsh enabled because it is a security risk but for a controlled window of time you can probably get away with it.)
In that case wouldn't it be easier to move the files into the expected tree layout first? After that you could do a recursive rsync dry run that doesn't rely on the default size-and-mtime check (see --size-only and --checksum; --update instead skips files that are newer on the target) to see if at least the structure is OK. Also with rsync you can filter things so you can choose to update half of the 15 thousand small files and leave the 2TB one for later.
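A dry run along those lines might look like the sketch below. It assumes rsync is installed, and the layout (a "small" directory and a file called huge.img standing in for the 2TB one) is invented for illustration.

```shell
# Scratch area with a source and a target directory (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/src/small" "$work/dst/small"

printf 'small file\n' > "$work/src/small/note.txt"
cp "$work/src/small/note.txt" "$work/dst/small/note.txt"    # same size, different mtime
touch -t 200101010000 "$work/dst/small/note.txt"
printf 'stand-in for the huge file\n' > "$work/src/huge.img"  # not on the target yet

# -n: dry run, -i: itemize; --size-only: files of equal size count as unchanged;
# --exclude: leave the big file for a later run.
plan=$(rsync -ain --size-only --exclude='huge.img' "$work/src/" "$work/dst/")
echo "$plan"
```

The line for small/note.txt starts with "." (attributes only, no data sent), and huge.img does not appear at all. Dropping -n would then correct the timestamps without reading file contents, at the price that a changed file which happens to have the same size goes unnoticed. --max-size is another way to postpone large files.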
It's been a while since I used rsh, and that wasn't on Linux. However, on checking my CentOS5 system I see there is a "kshell" file in /etc/xinetd.d that runs "kshd" which is:
"# description: The kerberized rshell server accepts rshell commands authenticated and encrypted with Kerberos 5."
The rsh command and the kshd are both part of the package:
That package contains man pages for rsh and kshd. You can access them by typing:
The man page for kshd shows a -e to force encryption and the man page for rsh shows a -x to force encryption. It doesn't say whether or not it encrypts by default if both sides are capable of it. (The UNIX rsh didn't allow for encryption at all.) The package for kshell may have been installed by the xinetd package itself as it does bring in some basic services like that but disables them by default.
Also I misspoke earlier when I said rsync encrypts by default. Typically I use "-e ssh" to force it to use ssh for the transport (as opposed to running an rsync daemon) and it is ssh that encrypts by default not rsync itself. Therefore you might try running the rsync daemon on the target and NOT use the "-e ssh" option and may achieve the same effect. I haven't tried that myself to see if it is any faster but since rsync's man page talks about using ssh for encryption it suggests it doesn't do encryption if the daemon is used instead.
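The transport choices discussed in this thread differ only in how the target is addressed on the rsync command line. The sketch below wraps each one in a small shell function; the function names, the host target.example, the module name "backup" and the paths are all placeholders, and nothing is run until a function is called.

```shell
# Transfer over ssh: encrypted, needs only an ssh login on the target.
push_via_ssh() {
    rsync -a -e ssh "$1" "${2}:${3}"
}

# Transfer over rsh: plain rsh has no encryption; only for a trusted LAN
# and a limited window of time.
push_via_rsh() {
    rsync -a -e rsh "$1" "${2}:${3}"
}

# Transfer to an rsync daemon: the double colon selects the daemon protocol,
# and the first path component after it is a module from the target's rsyncd.conf.
push_via_daemon() {
    rsync -a "$1" "${2}::${3}"
}

# Usage (not executed here):
#   push_via_ssh    /data/ target.example /data/
#   push_via_daemon /data/ target.example backup/data/
```

The native daemon protocol does not encrypt file data, so like rsh it only makes sense on a network you trust. Whether it is noticeably faster than ssh depends on how much of the time goes into encryption on the machines involved.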