LinuxQuestions.org


BrianK 10-20-2009 07:42 PM

need to rsync only selected files (--files-from) also need to delete files on dest. ?
 
I'm working on an offsite backup.

To minimize the amount of data transferred, I've written a script that scours ~100TB of data & builds a long list of the files that need to be backed up.
If I then rsync using --files-from, this method works well.

The problem is that the list changes daily & many files that were backed up yesterday will be deleted today, so they should be deleted from the offsite backup as well. How do I accomplish this? I've tried --delete-excluded in combination with --delete, but I'm not seeing any files being deleted on the destination side.
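
For reference, the command looks roughly like this (the hostname and paths here are just placeholders):

rsync -av --delete --delete-excluded --files-from=/tmp/backup.list /data/ backuphost:/backup/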

ideas?

Jerre Cope 10-20-2009 11:22 PM

Why are you doing rsync's work? Rsync will find the files that need to be backed up and, where possible, only transfer the portion of each file that changed. Note the -z option to compress the network traffic.

The --delete option should work. Your homemade lists may be causing a conflict.
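
For example, something along these lines (the host and paths are made up):

rsync -az --delete /source/data/ backuphost:/backup/data/

-a recurses and preserves permissions and times, -z compresses the traffic, and --delete removes anything on the destination that is gone from the source.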

BrianK 10-21-2009 12:42 AM

Oh, I'm well aware of how rsync works (well, aside from this particular issue). ;) At the end of the day, it's because I can't rsync 100TB of data.

Furthermore, most of the data on disk are result sets - I don't need to back up the processed data, I need to back up the files that create the processed data... In the event of a fire or an earthquake (this is southern California, after all ;)), I can pull a minimal backup back online, set several hundred processors to work, and have everything back to the way it was in a matter of a day or three.

Why not just rsync the whole 100TB and let rsync figure out the differences? Two [main] reasons:
1. I would need another 100TB at a co-lo facility. Cost of hardware + cost of rack space + maintenance on that many spindles is prohibitive.
2. My company generates several hundred gigs to possibly 1TB or more of data per day. Transferring that much data would take entirely too much time & cost entirely too much.

It makes little sense to back up data that can easily be regenerated. It makes a lot of sense to back up the files that generate the other data (which = money). So I have all these fancy scripts that find the generating data files... now I need to back them up.

That was probably a much longer explanation than you were interested in. heheh :p

Jerre Cope 10-21-2009 05:04 PM

OK

I'm thinking, since you've committed this much effort to optimizing: rsync your list of files to the coloc, add a date stamp field to the records, then at the coloc, delete the files whose date stamp is more than N days old.
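
A rough sketch of what I mean, using one dated list file per day instead of a stamp column (the paths and the 14-day window are made-up examples):

# every filename mentioned in a list from the last 14 days
find /backup/lists -name 'files-*.txt' -mtime -14 -exec cat {} + | sort -u > /tmp/keep.txt
# everything currently sitting in the backup tree
( cd /backup/data && find . -type f | sed 's|^\./||' ) | sort > /tmp/have.txt
# remove whatever is on disk but on none of the recent lists
comm -23 /tmp/have.txt /tmp/keep.txt | while read -r f; do rm -f "/backup/data/$f"; done

Run something like that from cron on the coloc box and nothing stale lingers past your window.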

I've managed to delete files off of 3 machines within minutes using the --delete option, with a little stupidity and bad timing.

Swim swim...

BrianK 10-21-2009 07:01 PM

Interesting idea.

I'd like to think rsync can do what I'm after & I'd rather not rely on cron jobs on both ends. However, that is certainly a route to a solution; if all else fails, I'll go that way. Thanks for the suggestion.

Does anyone know if rsync can actually do what I'm after - delete files on the destination that aren't in the --files-from list?

I see there's an --include-from that looks at patterns & works with --exclude-from... maybe there's something there. hmmm....
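
Thinking out loud, something like this might do it (untested - the list and path names are just placeholders): turn the --files-from list into anchored include patterns and let --delete-excluded clean up everything else on the destination:

# prefix each path from the list with "/" so it anchors at the transfer root
sed 's|^|/|' /tmp/backup.list > /tmp/include.list
# include the listed files, let rsync descend into every directory, exclude the rest,
# and delete anything excluded that already sits on the destination
rsync -avm --include-from=/tmp/include.list --include='*/' --exclude='*' \
      --delete --delete-excluded /data/ backuphost:/backup/

The -m (--prune-empty-dirs) keeps rsync from recreating every empty directory on the destination.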

Jerre Cope 10-22-2009 09:52 PM

Pardon the tangent. This is a working example:

rsync -azv --delete --recursive dutyman:/usr/lib/basic/ usr/lib/basic >>$LOGMSGFILE 2>>$LOGMSGFILE || RSYNCOK="N"

This drops any files in the target directory that no longer exist in the original.

Perhaps this will help you create a micro model before you unleash it on your macro del mundo system.

