rsync Backups
Is there a way with rsync to copy a large directory to multiple USB drives?
I have a directory that is 8.2 TB and we need to get it to the client, but the client only wants 2 TB drives. So my question is: is there a way to use rsync to fill up the first USB drive, then, after a manual change of drives, have rsync know where it left off and continue copying data to the second USB drive from that point? Clear as mud, I hope! |
Just one idea. Please check this against the man page and test it before you try to implement.
The general idea is to make a list of everything you want to copy, then start copying it. While copying, note every file that has copied OK, and exclude those files on subsequent copies. First, create a payload list.
Code:
rsync -ani /src /dest | \
Then do the actual copy, excluding files that have already been transferred:
Code:
rsync -ai \
Also, tar is good at creating multi-volume archives. There might be some solution with tar that works for you. |
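The truncated commands above could be fleshed out along these lines. This is only a sketch of the list-then-exclude idea: temp directories stand in for the real source and USB mount points, and a simple grep subtraction replaces the "annotate what copied OK" step. All the names here are illustrative, not from the original post:

```shell
# Sketch of the list-then-exclude approach. SRC/USB1/USB2 are stand-ins
# for the real source directory and USB mount points.
SRC=$(mktemp -d); USB1=$(mktemp -d); USB2=$(mktemp -d)
echo one > "$SRC/a.txt"; echo two > "$SRC/b.txt"

# 1. Build the full payload list with a dry run (-n) in itemize mode (-i);
#    keep only the file lines (">f...") of the itemized output.
rsync -ani "$SRC"/ "$USB1"/ | awk '$1 ~ /^>f/ {print $2}' > payload.list

# 2. Copy onto the first drive; when a real drive fills up, rsync stops
#    with an error, hence the || true.
rsync -ai --files-from=payload.list "$SRC"/ "$USB1"/ || true

# 3. Record what actually landed on drive 1.
(cd "$USB1" && find . -type f | sed 's|^\./||') > copied.list

# 4. After swapping drives, subtract the copied files from the payload
#    and resume onto the next drive.
grep -vxF -f copied.list payload.list > remaining.list || true
rsync -ai --files-from=remaining.list "$SRC"/ "$USB2"/
```

Note that filenames containing spaces would break the awk field split; a real script would need more care there.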
It seems to me that this is almost identical to the old problem of packing files into size-limited news and mail messages.
Have a look at the various SHAR (shell archive) programs (there were lots of them), which not only packaged files, but also split them into groups with a total size limit on each group. Some let you extract specific files from one specific group, and files that were too big were split into smaller segments over multiple 'messages'. I am sure that software is still around.
For another method you could try the RAR archiver, which generates a large split archive. It can also recover files from just a few of the 'pieces'.
Please be sure to let us know whatever solution you do come up with! It has a lot of relevance, not just to USB sticks, but to CD and DVD data storage as well.
ASIDE: this is actually known as the 'bin packing' problem and has been shown to be NP-complete. That is, there is no known polynomial-time algorithm that finds the one 'perfect' solution. However, today's computers are fast enough that this is typically no barrier in any 'practical' situation. |
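On the tar suggestion from earlier in the thread: GNU tar's multi-volume mode (-M) together with a volume size limit (-L, in units of 1024 bytes) will split an archive across several files, rolling to the next -f archive when one fills, and splitting oversized files across volumes. A small self-contained demo, with sizes and paths made up for illustration:

```shell
# Demo of GNU tar multi-volume archives; sizes are tiny for illustration.
SRC=$(mktemp -d); OUT=$(mktemp -d); RESTORE=$(mktemp -d)
dd if=/dev/zero of="$SRC/big1" bs=1024 count=64 2>/dev/null
dd if=/dev/zero of="$SRC/big2" bs=1024 count=64 2>/dev/null

# -M = multi-volume, -L 100 = change volume after 100 KiB.
# Giving several -f options makes tar roll to the next archive when one
# fills, instead of prompting for a media change. Note that multi-volume
# archives cannot be compressed (-z/-j are incompatible with -M).
tar -c -M -L 100 -f "$OUT/vol1.tar" -f "$OUT/vol2.tar" -C "$SRC" big1 big2

# Extract by naming the volumes in the same order.
tar -x -M -f "$OUT/vol1.tar" -f "$OUT/vol2.tar" -C "$RESTORE"
```

With a 2 TB drive the same idea applies, just with -L set to something slightly under the drive's usable capacity.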
Will keep you posted. Currently I am working on a Perl script. If this works, is there a good place to post something like this so the masses can have it? Is there anything else that needs to be done to the script before putting it to general use (i.e. adding GNU license info, etc.)?
|
If it's a script you can simply post it in CODE tags in a reply post here in this thread.
|
You can always upload it to a site like a public 'dropbox' folder, then post a link here.
NOTE: I like using CPAN for more complex things, but it is module oriented, with no proper place for scripts that don't need modules! |
My Perl script is working like a champ so far. Have a few more tweaks, then I will post it here.
|
Code that I forgot to post; as always, there is probably another way of doing this, but this is the way I did it.
Quote:
|
Hmmmm, you do know that rsync can do the comparison itself, using file sizes, times, and block-level checksums (e.g. when only the end of a log file has been updated)?
Comparing files outside rsync would basically involve the equivalent of copying the files anyway. |
No, I did not know this; good to know. The biggest issue I had was using rsync to copy files from disk to USB, then when USB #1 fills up, rolling over to the second USB drive and so on. If rsync will do this as well, please by all means post the command-line arguments LOL.
|
It is an integral part of rsync to only transfer the changes. It was specifically designed with slow modems in mind. This is what makes it different from a normal 'file copy' such as scp, cp, tar, cpio, and so on.
Rsync only replaces a file on the destination (breaking any hardlinked copies) if the file's data changes, which is why you can create large numbers of 'snapshots' (even one an hour) using very little disk space.

Such rsync backups are not compressed, which allows each snapshot to look almost exactly like a simple full working copy of the directories that were backed up. That is, it is easy to search and access any file in any snapshot. You do not have to search through multiple incremental compressed backup files just to recover a specific bit of data, perhaps without knowing the exact filename the data is in. Just search for it directly as you normally would, across all the snapshots.

It is the hard linking of unchanged files that gives the rsync multi-snapshot backup method such good compression. However, hardlinks only work within the same disk storage mount, so each USB drive would have to hold at least one full copy of the files being backed up.

Also, hardlinked snapshotting requires... hard links... which require a UNIX-style filesystem. USB sticks typically use a low-level VFAT filesystem (no hardlinks, and DOS file attributes) for maximum compatibility, so a USB stick may need a different filesystem for this to work well. Larger USB drives with, say, an EXT4 filesystem tend to work better: they allow more hardlinked snapshots from the initial full copy (or the last snapshot, depending on how you look at it), and thus higher disk space savings (hardlink compression) per snapshot. |
ASIDE: The use of a cloud-based filesystem (like Dropbox) also precludes the use of hardlinks. As such, snapshotting to such a filesystem does not compress well, as you do not get hardlink sharing of files across individual snapshots.
However, making snapshot backups on a local machine of a (possibly encrypted) cloud-based 'working' filesystem that can be shared across devices should work very well. That one local machine keeps 'snapshot backups' (perhaps working automatically in the background), while the cloud allows access to the actual working directory from multiple locations. If something happens to the cloud, or your working directory gets corrupted for some reason, you have your highly-hardlinked snapshots to recover from. It will then be straightforward to copy the last good snapshot to a new replacement cloud provider.

The last two posts have been included in my general notes (plain text file) on Rsync Backups and Snapshotting. http://www.ict.griffith.edu.au/antho...c_backup.hints |