LinuxQuestions.org
Old 02-20-2012, 05:30 PM   #1
d072330
Member
 
Registered: Nov 2007
Location: USA
Distribution: CentOS
Posts: 174

Rep: Reputation: 6
rsync Backups


Is there a way with rsync to copy a large directory to multiple USB drives?

I have a directory that is 8.2 TB in size, and we need to get this to the client, but the client only wants 2 TB drives.

So my question is: is there a way to use rsync to fill up the first USB drive, then, after a manual change of the USB drive, have rsync know where it left off and continue copying data to the second USB drive from that point?

Clear as mud, I hope so!
 
Old 02-21-2012, 01:54 PM   #2
jhwilliams
Senior Member
 
Registered: Apr 2007
Location: Portland, OR
Distribution: Debian, Android, LFS
Posts: 1,168

Rep: Reputation: 206Reputation: 206Reputation: 206
Just one idea. Please check this against the man page and test it before you implement it.

The general idea is to make a list of everything you want to copy, then start copying it. While copying, record every file that has copied OK, and exclude those files from subsequent copies.

First, create a payload list.

Code:
rsync -ani /src /dest 2>/dev/null | \
    awk '{ print $NF }' | \
    tee payload.txt
Now, repeat this command by hand, running it once per drive, starting with a new empty one:

Code:
rsync -ai \
      --files-from=payload.txt \
      --exclude-from=completed.txt \
      /src /dest \
      2>/dev/null | \
    awk '{ print $NF }' >> completed.txt
But my real solution for you is to inspect the source directory by hand and break the content up by subdirectories in a way that makes sense. For example, you might have /src/engineering_files, /src/hr_docs, /src/ceo_data. Just sync each tree to a separate disk by hand.

Also, tar is good at creating multi-volume archives. There might be some solution with tar that works for you.
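
For what it's worth, here is a minimal sketch of GNU tar's multi-volume mode (all paths and sizes below are invented for the demo; with real 2 TB drives you would pass a correspondingly huge -L value):

```shell
# Demo data: one 100 KiB file in a scratch directory.
mkdir -p /tmp/mvdemo/src
dd if=/dev/zero of=/tmp/mvdemo/src/data.bin bs=1024 count=100 2>/dev/null

# Volume-change script: GNU tar exports TAR_VOLUME and TAR_FD to it;
# writing a name to file descriptor $TAR_FD tells tar which file to use next.
cat > /tmp/mvdemo/next-volume.sh <<'EOF'
#!/bin/bash
echo "/tmp/mvdemo/vol.$TAR_VOLUME" >&"$TAR_FD"
EOF
chmod +x /tmp/mvdemo/next-volume.sh

# -M: multi-volume, -L 60: switch volumes every 60 KiB,
# -F: run the script at each volume change instead of prompting.
tar -c -M -L 60 -F /tmp/mvdemo/next-volume.sh \
    -f /tmp/mvdemo/vol.1 -C /tmp/mvdemo src

ls /tmp/mvdemo/vol.*
```

Without -F, tar stops and prompts ("Prepare volume #2...") at each volume boundary, which is essentially the manual drive-swap behaviour the original post asked for.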

Last edited by jhwilliams; 02-21-2012 at 01:58 PM.
 
1 member found this post helpful.
Old 02-22-2012, 12:40 AM   #3
A.Thyssen
Member
 
Registered: May 2006
Location: Brisbane, Australia
Posts: 119

Rep: Reputation: 32
It seems to me that this is almost identical to the old problem of packing files into size-limited news and mail messages.

Have a look at the various SHAR (shell archive) programs (there were lots of them), which not only packaged files but also split them into groups with a total size limit on each group. Some let you extract specific files from one specific group. Files that were too big were split into smaller segments over multiple 'messages'.

I am sure that software is still around.


For another method, you could try the RAR archiver, which can generate a large split archive. With recovery volumes it can even reconstruct the archive when a few of the 'pieces' are damaged or missing.


Please be sure to let us know whatever solution you do come up with! It has a lot of relevance, not just to USB sticks, but to CD and DVD data storage as well.


ASIDE: this is actually known as a 'bin packing' problem and has been shown to be NP-complete. That is, there is no known algorithm that always finds the optimal packing in polynomial time. However, today's computers are fast enough that this is typically no barrier in any 'practical' situation.
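
To make that aside concrete, here is a rough first-fit-decreasing sketch in shell (every path, size, and capacity below is invented for the demo): sort the top-level directories by size, then place each one on the first drive that still has room. It is a greedy heuristic, not an optimal packing, but in practice it usually gets close.

```shell
CAPACITY=100    # pretend each "drive" holds 100 KiB; real 2 TB drives would be huge

# Fake source tree with three subdirectories of known sizes.
mkdir -p /tmp/packdemo/src/a /tmp/packdemo/src/b /tmp/packdemo/src/c
dd if=/dev/zero of=/tmp/packdemo/src/a/f bs=1024 count=60 2>/dev/null
dd if=/dev/zero of=/tmp/packdemo/src/b/f bs=1024 count=50 2>/dev/null
dd if=/dev/zero of=/tmp/packdemo/src/c/f bs=1024 count=30 2>/dev/null

used=()    # KiB already assigned to each drive
while read -r size dir; do
    placed=""
    for i in "${!used[@]}"; do                     # first drive with room wins
        if (( used[i] + size <= CAPACITY )); then
            used[i]=$(( used[i] + size ))
            placed=$i
            break
        fi
    done
    if [ -z "$placed" ]; then                      # no room anywhere: open a new drive
        used+=("$size")
        placed=$(( ${#used[@]} - 1 ))
    fi
    echo "$dir -> drive$placed"
done < <(du -sk /tmp/packdemo/src/*/ | sort -rn)   # largest directories first
```

Each `drive$placed` group would then become one rsync invocation to that drive's mount point.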
 
Old 02-22-2012, 10:12 AM   #4
d072330
Member
 
Registered: Nov 2007
Location: USA
Distribution: CentOS
Posts: 174

Original Poster
Rep: Reputation: 6
Will keep you posted. Currently I am working on a Perl script. If this works, is there a good place to post something like this so the masses can have it? Is there anything else that needs to be done to the script before putting it to general use (i.e. adding GNU license info, etc.)?
 
Old 02-22-2012, 10:53 AM   #5
sag47
Senior Member
 
Registered: Sep 2009
Location: Philly, PA
Distribution: Kubuntu x64, RHEL, Fedora Core, FreeBSD, Windows x64
Posts: 1,417
Blog Entries: 33

Rep: Reputation: 355Reputation: 355Reputation: 355Reputation: 355
If it's a script you can simply post it in CODE tags in a reply post here in this thread.
 
Old 02-22-2012, 07:39 PM   #6
A.Thyssen
Member
 
Registered: May 2006
Location: Brisbane, Australia
Posts: 119

Rep: Reputation: 32
You can always upload it to a site like a public 'dropbox' folder, then post a link here.

NOTE: I like using CPAN for more complex things, but it is module-oriented, without a proper place for scripts that don't need modules!
 
Old 02-27-2012, 10:53 AM   #7
d072330
Member
 
Registered: Nov 2007
Location: USA
Distribution: CentOS
Posts: 174

Original Poster
Rep: Reputation: 6
My Perl script is working like a champ so far. I have a few more tweaks, then I will post it here.
 
Old 01-29-2013, 03:28 PM   #8
d072330
Member
 
Registered: Nov 2007
Location: USA
Distribution: CentOS
Posts: 174

Original Poster
Rep: Reputation: 6
Code that I forgot to post. As always, there is probably another way of doing this, but this is the way I did it.

Code:
#!/usr/bin/perl
#########################################################
# This script takes user input for a source and up to   #
# two destinations, builds arrays of the source and     #
# destination contents, and pushes the differences to   #
# the @diff array. Once the @diff array is populated    #
# it uses rsync to copy the data from the source to     #
# destination1, and when destination1 is full it rolls  #
# over to destination2.                                 #
#                                                       #
# Update History:                                       #
# 21-Feb-2012 - +added script usage                     #
# 22-Feb-2012 - +added comments to the script           #
# 23-Feb-2012 - +added change ownership to destinations #
# 24-Feb-2012 - -removed change ownership               #
# 24-Feb-2012 - -removed elsif statement for moving to  #
#                next disk when first disk is full      #
#               +changed ls -l to ls -lk to get 1024 KB #
#               -removed nagios alert for now           #
# 25-Feb-2012 - +added @space2, $info2 for $ddir2       #
# 27-Feb-2012 - +added rsync command with no logging    #
#                                                       #
#########################################################

#--------------------------#
# Script settings          #
#--------------------------#
use diagnostics;
use strict;
use warnings;

#--------------------------#
# Script Usage             #
#--------------------------#
if (@ARGV != 3) {
    print "\n\n";
    print "usage:   rsync_copy.pl <source drive> <destination drive1> <destination drive2>\n";
    print "example: rsync_copy.pl /home/user /mnt/drive1 /mnt/drive2\n";
    exit;
}

#--------------------------#
# Global Variables         #
#--------------------------#
my $rsync  = "/usr/bin/rsync -avp --progress";
my $bdir   = "<put your directory here>";
my $sdir   = $ARGV[0];
my $ddir1  = $ARGV[1];
my $ddir2  = $ARGV[2];
my $logdir = "<your directory here>";
my $ofile1 = "rsync_1-";
my $ofile2 = "rsync_2-";
my $error  = "/tmp/changedrive";
my (@files, @dest1, @dest2, @diff, @isect, %count, @space1, @space2, @disk1, @disk2, @filesize, @fsize);
my ($files, $dest, $nfile, $diff, $isect, $item, $space1, $space2, $filesize, $fs, $fsize);
my ($info1, $info2, $disk1, $disk2, $part, $size, $used, $free1, $free2, $perc, $file);
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
my $datestring = sprintf("%4d%02d%02d%02d%02d%02d",($year + 1900),($mon+1),$mday,$hour,$min,$sec);

#--------------------------#
# Main Routine             #
#--------------------------#
### Get the source drive (server) contents ###
@files = `ls $bdir$sdir`;
#@files = sort(@files); ### Uncomment only if testing to see order ###
chomp @files;

### Get destination drive #1 contents ###
@dest1 = `ls $ddir1`;
#@dest1 = sort(@dest1); ### Uncomment only if testing to see order ###
chomp @dest1;

### Get destination drive #2 contents ###
@dest2 = `ls $ddir2`;
#@dest2 = sort(@dest2); ### Uncomment only if testing to see order ###
chomp @dest2;


@isect = ( ); ### Files that intersect between the destinations and @files ###
@diff  = ( ); ### Files that differ between the destinations and @files ###
%count = ( );

foreach $item (@dest1, @dest2, @files) { $count{$item}++; }

foreach $item (keys %count) {
    if ($count{$item} == 2) {
        push @isect, $item;
        #@isect = sort(@isect); ### Uncomment only if testing to see order ###
    }
    else {
        push @diff, $item;
        @diff = sort(@diff); ### sort the array to rsync files in order (i.e. file1, file2, etc.) ###
    }
}

### Uncomment only if debugging output of arrays ###
#print "\ndest1 Array = @dest1\n";
#print "\ndest2 Array = @dest2\n";
#print "\nfiles Array = @files\n";
#print "\nIntersect Array = @isect\n";
#print "\nDiff Array = @diff\n\n";

### If @diff has files then proceed ###
if (@diff) {
    foreach $diff (@diff) {
        ### Get the free space of the destination drives ###
        @space1 = `df -k $ddir1`;
        @space2 = `df -k $ddir2`;

        ### Get the second line of the df -k output ###
        $info1 = $space1[1];
        $info2 = $space2[1];

        ### Split the df -k output on whitespace ###
        @disk1 = split(' ', $info1);
        @disk2 = split(' ', $info2);

        ### Separate out the free-space values ###
        $free1 = $disk1[3];
        $free2 = $disk2[3];

        ### Get the file size of the files in the @diff array ###
        @filesize = `ls -lk $bdir$sdir/$diff`;
        foreach $filesize (@filesize) {
            chomp $filesize;

            ### Split the ls -lk output on whitespace ###
            @fsize = split(' ', $filesize);

            ### Get the file size field ###
            $fs = $fsize[4];

            ### Example of outputs from df -k and ls -lk ###
            # disk space left = 57632320 = (55 GB)
            # file size in KB = 00056332 = (56 MB)

            ### Check whether free space on the destination drive will allow the next file ###
            # 57632320 >= 00056332
            if ($free1 >= $fs) {
                print "Free disk space (KB): $free1\n";

                ### rsync the files to the destination drive, optionally logging to a file ###
                #`$rsync $bdir$sdir/$diff $ddir1 >> $logdir$ofile1$datestring.txt`; ### Use this line if you want logging
                `$rsync $bdir$sdir/$diff $ddir1`; ### Comment this line out if you use logging
                print "$rsync $bdir$sdir/$diff $ddir1\n";
            }
            ### Changed this line to add a 2nd condition ###
            else {
                print "Free disk space (KB): $free2\n";

                ### rsync the files to the destination drive, optionally logging to a file ###
                #`$rsync $bdir$sdir/$diff $ddir2 >> $logdir$ofile2$datestring.txt`; ### Use this line if you want logging
                `$rsync $bdir$sdir/$diff $ddir2`; ### Comment this line out if you use logging
                print "$rsync $bdir$sdir/$diff $ddir2\n";
            }
        }
    }
}
else {
    ### Report that there is nothing left to rsync ###
    print "\nNothing to rsync\n\n";
}

exit 0;
#--------------------------#
# End of Script            #
#--------------------------#

Last edited by d072330; 01-29-2013 at 03:29 PM.
 
Old 01-29-2013, 05:45 PM   #9
A.Thyssen
Member
 
Registered: May 2006
Location: Brisbane, Australia
Posts: 119

Rep: Reputation: 32
Hmmmm, you do know that rsync can do the comparison itself, using file sizes, times, and block-level checksums (so, e.g., only the changed end of a log file is transferred)?

Comparing files outside rsync basically involves the equivalent of copying the files anyway.
 
Old 01-30-2013, 10:05 AM   #10
d072330
Member
 
Registered: Nov 2007
Location: USA
Distribution: CentOS
Posts: 174

Original Poster
Rep: Reputation: 6
No, I did not know this; good to know. The biggest issue I had was using rsync to copy files from disk to USB, then when USB #1 fills up, rolling over to the second USB drive, and so on. If rsync will do this as well, please by all means post the command-line arguments LOL.
 
Old 01-30-2013, 06:01 PM   #11
A.Thyssen
Member
 
Registered: May 2006
Location: Brisbane, Australia
Posts: 119

Rep: Reputation: 32
It is an integral part of rsync to transfer only the changes. It was specifically designed with slow modems in mind. This is what makes it different from a normal 'file copy' such as scp, cp, tar, cpio, and so on.

Rsync only replaces files on the destination (breaking any hardlinked copies) if a file's data changes, which is why you can create large numbers of 'snapshots' (even once an hour) using very little disk space.

Such rsync backups are not compressed, which allows each snapshot to look almost exactly like a simple full working copy of the directories that were backed up. That is, it is easy to search and access any file in any snapshot. You do not have to search through multiple incremental compressed backup files just to recover a specific bit of data, perhaps without knowing the exact filename that data is in. Just search for it directly as you normally would, across all the snapshots. It is the hard linking of unchanged files that gives the rsync multi-snapshot backup method such good 'compression'.

However, hardlinks only work within the same disk storage mount, so each USB drive would have to hold at least one full copy of the files being backed up. Also, hardlinked snapshotting requires... hard links... which require a UNIX-style filesystem. USB sticks typically use a low-level VFAT filesystem (no hardlinks, and DOS file attributes) for maximum compatibility.

As such, USB sticks may need a different filesystem for this to work well. Larger USB drives with, say, an EXT4 filesystem tend to work better. That allows more hardlinked snapshots from the initial full copy (or the last snapshot, depending on how you look at it), and thus higher disk space savings ('hardlink compression') per snapshot.

Last edited by A.Thyssen; 01-30-2013 at 06:14 PM.
 
Old 01-30-2013, 06:09 PM   #12
A.Thyssen
Member
 
Registered: May 2006
Location: Brisbane, Australia
Posts: 119

Rep: Reputation: 32
ASIDE: The use of a cloud-based filesystem (like Dropbox) also precludes the use of hardlinks. As such, snapshotting to such a filesystem does not compress well, as you do not get hardlink sharing of files across individual snapshots.

However, making snapshot backups on a local machine of a (possibly encrypted) cloud-based 'working' filesystem that can be shared across devices should work very well.

That one local machine keeps 'snapshot backups' (perhaps working automatically in the background), while the cloud allows access to the actual working directory from multiple locations.

If something happens to the cloud, or your working directory gets corrupted for some reason, you have your highly-hardlinked snapshots to recover from. It is then straightforward to copy the last good snapshot to a new replacement cloud provider.


The last two posts have been included in my general notes (plain text file) on Rsync Backups and Snapshotting.
http://www.ict.griffith.edu.au/antho...c_backup.hints

Last edited by A.Thyssen; 01-30-2013 at 06:31 PM.
 
  

