Quote:
Originally Posted by Guttorm
Your script will be slow when there are lots of files. Every one of those -exec calls will start a new shell for every file.
One of the really great benefits of Unix/Linux is that the cost of spawning and killing a shell to run the same command over and over is very small. The kernel basically just copies the parent shell (fork) and loads the command from cached pages (exec).
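If you want to put a number on that overhead, a crude timing loop gives a feel for it (a minimal sketch; absolute figures vary a lot between systems):
Code:
# time 1000 process spawns of a trivial external command
time ( for i in $(seq 1 1000); do /bin/true; done )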
Quote:
Originally Posted by Guttorm
I don't think you can get rsync to do what you want, but you could use a dry-run and get a list of files that a regular rsync would skip.
Code:
rsync -a -vv --dry-run fresh old
It writes " is uptodate" for every file it would skip, so these are the ones you want to delete? To filter out those, you could do something like this:
Code:
rsync -a -vv --dry-run fresh old | egrep ' is uptodate$' | sed 's/ is uptodate//g'
Yes, that is an interesting approach, although rsync will probably not lend itself easily to this.
Let's make a test case.
Code:
mkdir freshdir
mkdir olddir
mkdir freshdir/sub1
# Two empty files in a subdirectory
touch freshdir/sub1/fila1
touch freshdir/sub1/fila2
# Files with contents: 1 MiB, 1 MiB, 2 MiB and 3 MiB of zeros
dd if=/dev/zero bs=1k count=1k of=freshdir/filb1
dd if=/dev/zero bs=1k count=1k of=freshdir/filb2
dd if=/dev/zero bs=1k count=2k of=freshdir/filc1
dd if=/dev/zero bs=1k count=3k of=freshdir/filc2
chmod 644 freshdir/filc1
# Mirror freshdir into olddir so both start out identical
rsync -av --delete freshdir/ olddir/
Now the directories freshdir and olddir have exactly the same contents and metadata.
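If you want to convince yourself that they really match, a recursive diff confirms the contents (note that diff -r compares contents only and ignores metadata such as times and permissions):
Code:
diff -r freshdir olddir && echo "contents identical"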
So we continue by making some changes.
Code:
# Change meta of freshdir/sub1/fila2
touch freshdir/sub1/fila2
# Change contents of freshdir/filb1 and freshdir/filb2
dd if=/dev/zero bs=1k count=2k of=freshdir/filb1
dd if=/dev/zero bs=1k count=2k of=freshdir/filb2
# Force meta (time) of freshdir/filb2 and olddir/filb2 to be the same
touch -t 201511041121.00 olddir/filb2
touch -t 201511041121.00 freshdir/filb2
# Change meta (privileges) of freshdir/filc1
chmod 664 freshdir/filc1
# Create two new equal files freshdir/fild1 and olddir/fild1
touch -t 199901011111.11 freshdir/fild1
touch -t 199901011111.11 olddir/fild1
# Create two new files freshdir/fild2 and olddir/fild2 with same contents but different meta (time)
touch -t 199901011111.11 freshdir/fild2
touch -t 199901011111.10 olddir/fild2
Now freshdir differs from olddir, in contents and/or metadata.
The directories compare as follows:
Code:
sub1/fila1 : same
sub1/fila2 : different time (meta)
filb1      : different time (meta) and different contents
filb2      : same meta, different contents
filc1      : different permissions (meta)
filc2      : same
fild1      : same
fild2      : same contents, different time (meta)
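To double-check the metadata side of that table by hand, GNU stat can print name, size, mtime and mode on one line per file (the -c format string is GNU-specific):
Code:
stat -c '%n size=%s mtime=%Y mode=%a' freshdir/filb2 olddir/filb2 freshdir/filc1 olddir/filc1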
So we let rsync do a dry run and look at its output.
Code:
rsync -avvn --delete freshdir/ olddir/
sending incremental file list
delta-transmission disabled for local transfer or --whole-file
filc1
filc2 is uptodate
fild1 is uptodate
sub1/fila1 is uptodate
./
filb1
filb2
fild2
sub1/fila2
total: matches=0 hash_hits=0 false_alarms=0 data=0
sent 236 bytes received 191 bytes 854.00 bytes/sec
total size is 9,437,184 speedup is 22,101.13 (DRY RUN)
It marks the files sub1/fila1, filc2 and fild1 as being the same in both metadata and contents.
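Keep in mind that rsync's "is uptodate" verdict rests on its quick check (size and modification time), not on reading the file contents. If you want content-level certainty before deleting anything, adding -c forces full checksums, at the cost of reading every file on both sides:
Code:
rsync -avvnc --delete freshdir/ olddir/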
Now we run rsync and apply your filter:
Code:
rsync -avvn --delete freshdir/ olddir/ | egrep ' is uptodate$' | sed 's/ is uptodate//g'
filc2
fild1
sub1/fila1
It gives a list of relative paths for the files that are the same in metadata and contents.
The list could then be used with something like:
Code:
rsync -avvn --delete freshdir/ olddir/ | egrep ' is uptodate$' | sed 's/ is uptodate$//' | tee thefiles
# -d '\n' makes xargs take one filename per line, so names with spaces survive
pushd olddir && xargs -d '\n' rm -- < ../thefiles ; popd
It works like a charm for producing a duplicate-file kill list, and the kill list can be inspected before we start deleting the files.
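If you would rather confirm each deletion interactively, GNU xargs can prompt per file (-p asks before each command, and -n 1 runs one rm per name):
Code:
pushd olddir && xargs -d '\n' -n 1 -p rm -- < ../thefiles ; popd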
Thank you for the idea.
But rsync will of course output lots of other information as well.
So your approach is quite doable, but is it safe?
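One way it can be fooled: the egrep matches any output line that ends in " is uptodate", and nothing stops a filename itself from ending that way. A changed file named, say, "x is uptodate" (a hypothetical name chosen to collide) would show up in the transfer list, match the filter, and land on the kill list even though it is not a duplicate:
Code:
# hypothetical collision: a non-duplicate file whose name ends in " is uptodate"
echo new > 'freshdir/x is uptodate'
touch 'olddir/x is uptodate'
rsync -avvn --delete freshdir/ olddir/ | egrep ' is uptodate$'
So inspecting the kill list before the rm step is not just nice to have.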