LinuxQuestions.org


junglepunk 06-30-2010 10:23 AM

synchronize two folders?
 
I have made a backup of my files on a remote server and I'd like to maintain that backup using rsync. The problem is that the timestamps don't match perfectly between the source and the backup.

What can I do? I'd rather not replace all the files in my backup because there is so much data it would take a very very long time.

Is there perhaps a way to compare checksums and then update the timestamps?

Both are low power boxes with no GUI and only BusyBox CLI access.

MensaWater 06-30-2010 11:01 AM

You should set up Network Time Protocol (NTP) on both boxes so that they have the same UTC time. Even if in different time zones this will make sure the times recorded on files are correct.

rsync should be seeing the files as the same if they are the same - have you used some flag that makes it change access, creation or modification time on the target? The main point of rsync is specifically to avoid having to re-transfer files that have already been transferred so if you're seeing an issue with re-transfers it seems likely you're telling it to do so. (There are flags to say to re-transfer even if it is on the target.) Review the man page (type "man rsync") and be sure you understand what the flags you're using are doing.

junglepunk 06-30-2010 11:26 AM

Sorry, I should probably have included this, but the initial transfer was not done using rsync. It was done using wget with the -m switch, which runs wget in mirroring mode and should have preserved the timestamps. Apparently the timestamps weren't preserved.

Both boxes do synchronize to an ntp server so the time should be the same.

junglepunk 06-30-2010 11:28 AM

Quote:

There are two ways to correct.

For the first run after the time change, you can run rsync with the --checksum option in order to ensure that only files that have a changed checksum get transferred, and update the modified time on all unchanged files.
Source: http://samba.anu.edu.au/rsync/daylight-savings.html

I've been googling and, as you can see from the quote I found, there seems to be a way to get rsync to update timestamps after it has checked checksums. But the source I found doesn't explain how. Does anyone know how this can be done?

junglepunk 06-30-2010 11:55 AM

I'm not entirely sure what's going on, but there is a "-c" switch for rsync that will check checksums. I tried running rsync with that option enabled, but apparently my boxes are so weak that checking checksums takes as long as transferring the file does. Which means that this is a hopeless way of doing it. Any other suggestions?
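For reference, a minimal sketch of the checksum run described above. The function name and the paths in the usage note are hypothetical; it assumes rsync is installed on both boxes and that combining -c with -t is what the quoted page has in mind.

```shell
#!/bin/sh
# Sketch: let rsync compare file contents (-c) and set modification
# times (-t) on files whose contents already match, without re-sending them.
# Both arguments are directories; the destination may also be "host:/path".

fix_times_by_checksum() {
    src=$1
    dest=$2
    # -r  recurse into directories
    # -t  preserve modification times (sets them on the destination)
    # -c  decide what to send by checksum instead of size + mtime
    # -i  itemize changes so each touched file is listed
    rsync -rtci "$src"/ "$dest"/
}
```

Usage would be something like `fix_times_by_checksum /data/source backuphost:/data/backup`. In the itemized output, a line beginning with `.f..t` marks a file that was not transferred and only had its timestamp updated. Both sides still have to read every file to compute the checksums, which is why this is slow on weak hardware.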

MensaWater 06-30-2010 12:13 PM

If it were me I'd bite the bullet by removing the files done with wget then using rsync to re-transfer. It will send the files again but then the next time you run rsync after that it will only find the "real" changes.

junglepunk 06-30-2010 12:50 PM

Yeah, it looks like that's the only way to do it. Damn shame. Took me a week to transfer the files with wget. Rsync is even slower. Oh well . . .

MensaWater 06-30-2010 01:05 PM

rsync can be sped up by using compression (the -z option; it is off by default) if you have limited bandwidth (of course your CPU takes a hit).

Also if this is something internal to your LAN/WAN and not via internet you can tell rsync to use the old rsh instead of ssh. That will speed it up some because then it won't have to encrypt the connection. At a prior job we regularly would enable rsh to do large rsyncs then disable it when done. (You do NOT want to leave rsh enabled because it is a security risk but for a controlled window of time you can probably get away with it.)

junglepunk 06-30-2010 01:40 PM

This is a LAN, and the LAN speed is not really the problem, CPU is however quite limited. I don't think compression would speed things up, rather it would probably slow things down.

Do you have any good resources on rsh for reading through? I've never heard of it.

unSpawn 06-30-2010 01:52 PM

Quote:

Originally Posted by junglepunk (Post 4019606)
Took me a week to transfer the files with wget.

In that case wouldn't it be easier to move the files in the expected tree layout first? After that you could do a recursive rsync dry run with timestamp checking disabled (--update, --size-only, --checksum) to see if at least the structure is OK. Also with rsync you can filter things so you can choose to update half of the 15 thousand small files and leave the 2TB one for later.
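A rough sketch of the dry run and size-only idea above. The function names and the size limit are hypothetical, and it assumes files of equal size really are identical, which --size-only cannot verify.

```shell
#!/bin/sh
# Sketch: treat files of equal size as already transferred and only
# bring their timestamps in line with the source.

preview_size_only() {
    # -n (--dry-run) reports what would happen without changing anything.
    rsync -rtni --size-only "$1"/ "$2"/
}

apply_size_only() {
    # $3 (optional): upper size bound passed to --max-size, so that
    # very large files can be left for a later run.
    if [ -n "${3:-}" ]; then
        rsync -rti --size-only --max-size="$3" "$1"/ "$2"/
    else
        rsync -rti --size-only "$1"/ "$2"/
    fi
}
```

For example, `preview_size_only /data/source backuphost:/data/backup` first, then `apply_size_only /data/source backuphost:/data/backup 100M` once the preview looks sane. Unlike -c this does not read file contents, so it is cheap on slow CPUs, but a file that changed without changing size will be missed.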

MensaWater 06-30-2010 02:07 PM

It's been a while since I used rsh, and that wasn't on Linux. However, on checking my CentOS5 system I see there is a "kshell" file in /etc/xinetd.d that runs "kshd" which is:
Quote:

"# description: The kerberized rshell server accepts rshell commands authenticated and encrypted with Kerberos 5."
The rsh command and the kshd are both part of the package:
krb5-workstation

That package contains man pages for rsh and kshd. You can access them by typing:
man rsh
man kshd

The man page for kshd shows a -e to force encryption and the man page for rsh shows a -x to force encryption. It doesn't say whether or not it encrypts by default if both sides are capable of it. (The UNIX rsh didn't allow for encryption at all.) The package for kshell may have been installed by the xinetd package itself as it does bring in some basic services like that but disables them by default.

Also I misspoke earlier when I said rsync encrypts by default. Typically I use "-e ssh" to force it to use ssh for the transport (as opposed to running an rsync daemon) and it is ssh that encrypts by default not rsync itself. Therefore you might try running the rsync daemon on the target and NOT use the "-e ssh" option and may achieve the same effect. I haven't tried that myself to see if it is any faster but since rsync's man page talks about using ssh for encryption it suggests it doesn't do encryption if the daemon is used instead.
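A minimal sketch of the daemon approach mentioned above, untested on BusyBox boxes. The module name "backup", the paths, and the host name are all hypothetical, and the config has no authentication, so it is only suitable for a trusted LAN.

```shell
#!/bin/sh
# Sketch: write a minimal rsyncd.conf exposing one writable module.
# $1 is the config file to create, $2 the directory holding the backup.

write_rsyncd_conf() {
    conf=$1
    backup_dir=$2
    cat > "$conf" <<EOF
# One writable module, no encryption, no authentication.
use chroot = no
[backup]
    path = $backup_dir
    read only = false
EOF
}

# On the backup box (hypothetical paths):
#   write_rsyncd_conf /etc/rsyncd.conf /srv/backup
#   rsync --daemon --config=/etc/rsyncd.conf
# On the source box, the double colon talks to the daemon directly
# instead of going through a remote shell such as ssh:
#   rsync -rt /data/source/ backuphost::backup/
```

Depending on which user the daemon runs as, the module may also need `uid` and `gid` settings before it can write to the backup directory.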


All times are GMT -5.