Looking for a file sync program, hopefully git-based, that can:
Add multiple duplicate directories to the repository,
Test and eliminate/delete any 0 KB files in any of the directories loaded,
Find all "like-named" files, including File, File1, File2 as auto-named by Linux,
Do complete code comparisons before merging,
Hold the file with the newest date as the anchor/master file,
Merge all changes from other "like" files into the master file,
Delete all non-master files after merging to the master,
Hold non-mergeable files in a separate repository for review,
Suggest merge candidates for non-mergeable files, based on code comparison,
Provide a cron-based backup system for master and non-merged files,
Offer a graphical or browser interface for working the remaining non-mergeable files.
I've not yet found an app that does this, and I've read a lot of info. I understand git itself can do this, but I'm not git savvy, so I'm not sure how to implement it. I do have git installed!
From what I've read, rsync is only capable of about half of these features, but let me know if you see it differently.
Kinda wondering if I have to write a BASH script that gathers all the directories and files and then starts calling other apps like rsync to work the backend issues. To properly sync with versioning, a compare will have to be executed.
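Something like this rough, untested sketch is what I have in mind; the directory names are placeholders and the rsync flags are just my first guess:
Code:
#!/bin/bash
# Untested sketch: pull several duplicate directory trees into one "master"
# tree with rsync, preferring newer files and keeping a copy of anything
# that gets overwritten, so nothing is lost before the real compare/merge.

MASTER="$HOME/sync/master"
COLLISIONS="$HOME/sync/collisions"
mkdir -p "$MASTER" "$COLLISIONS"

for src in "$HOME/copies/dir1" "$HOME/copies/dir2" "$HOME/copies/dir3"; do
    # -a  archive (recursive, keeps times/perms)
    # -u  skip files that are already newer in the master
    # -c  compare by checksum rather than size + mod-time
    # -b  back up any file rsync would overwrite, into $COLLISIONS
    rsync -aucb -v --backup-dir="$COLLISIONS" "$src"/ "$MASTER"/
done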
Oh! On my list of features I also have to be able to:
Read and extract compressed files,
Sync to my own git repo and then to Dropbox's git repo.
Seriously, because there are so many ways to approach this, and it's all conceptually confusing, I'm thinking about writing a "SyncMasters Bible" to show and explain each method, with examples!
One funny thing about rsync flags is the flags that represent sequences of other flags, like -a (--archive), which is equivalent to -rlptgoD... so after trying heaps of options, I settled on good old -abcv (the -v added for verbosity), lol. My rsync command usually looks something like this:
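Something along these lines, with made-up paths (not my exact command):
Code:
# -a  archive mode (same as -rlptgoD)
# -b  make a backup of any destination file that would be overwritten
# -c  decide what to transfer by checksum instead of size and mod-time
# -v  verbose
rsync -abcv --suffix=".old" /local/photos/ user@nas:/backup/photos/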
Usually one of the paths is on a network, and the -c flag only transfers files whose md5 checksums differ, which really saves bandwidth (which I still pay for by the GB)... Some network locations won't allow the -c flag, as it creates more work for their CPUs, so I have to remove it in those situations.
The --suffix flag (it goes with -b/--backup, the b in -abcv) sets the suffix tacked onto the old copy, so both files are kept when there are duplicates with the same name.
If I don't put the trailing slash on the source path, it makes a directory inside the target path, such that repeat syncs can create a directory inside a directory inside a directory ad infinitum.
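For example (made-up paths again):
Code:
# trailing slash: the contents of photos land directly inside backup/
rsync -abcv /local/photos/ /mnt/nas/backup/

# no trailing slash: rsync creates backup/photos/ and copies into that
rsync -abcv /local/photos  /mnt/nas/backup/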
These don't cover all of your features; however, I'm still an rsync novice, and as pointed out, the man page is lengthy. Perhaps some of your other features are in there too.
OK, my biggest problem is that Dropbox has gone south; it's been total crap now for over 5 years. It errors out, requiring a new install, and it cannot use a currently existing path of /../,,/,,/Dropbox, so it writes a new /Dropbox folder somewhere else on your box. So now I have over 20 copies of the #$$^&^%%$ thing.
So I'm no longer using DBox, and I have to find all the copied files on my present box. To do that I have to:
Set a target folder/dir that I consider to hold the latest copies of the files,
Run a script to find all the files and record them in the DB,
Run another "diff" type script to determine which files are equal to the ones in the target dir,
Delete the extra copies that are exactly "like" or "equal" to the ones in the target dir (see the sketch after this list),
Mark the deleted files in the DB, so they will be overlooked for additional processing,
Run an additional "diff"-type script to record the lines or records that are not equal and place them in the DB,
Merge the files if it's a document type,
Update the DBs if the file is a DB type and records have been changed or added,
Mark the completion dates in the sync DB when the process is complete for the original file and its dupes,
Find all .sql files not existing in the target dir and move/copy them there, since this will be the actual data backup dir.
Back up the entire updated set of DBs to a dated .tar.gz file, so we have the latest backup since completing the sync,
Repeat this sync process for all files and dirs on the network-attached machines (20 of them).
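Here is a rough, untested illustration of the find-and-compare steps above; the directory names are placeholders, and the real version will feed the results into the DB instead of temp files:
Code:
#!/bin/bash
# Untested sketch: checksum every file under the target dir and the search
# area, then list any checksum that shows up more than once. Those lines are
# the exact duplicates that are safe to delete.

TARGET="$HOME/sql_master"      # dir holding the copies I want to keep
SEARCH="$HOME/old_copies"      # where the stray copies are hiding

find "$TARGET" "$SEARCH" -type f -print0 | xargs -0 md5sum > /tmp/all_sums.txt

# md5 hashes are 32 characters, so compare only the first 32 columns;
# -D prints every line that belongs to a duplicated checksum.
sort /tmp/all_sums.txt | uniq -w32 -D > /tmp/dupes.txt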
So, since MySQL and the other DBs are usually the hardest to back up and recover, I started with all the backed-up .sql files. Elsewhere in the post you'll see I was struggling with getting /etc/updatedb.conf and the disk mounts to work correctly so they'd see all the /*.sql files. I finally got that working right, so I have all the .sql files captured in /home/files/sql_dump.txt, which lists over 4,000 files. I wrote the additional db_syncs.sql file to create the DB for recording these:
Code:
-- Database: `db_syncs`
# DROP Database IF EXISTS `db_syncs`;
CREATE Database IF NOT EXISTS `db_syncs`;
USE `db_syncs`;
DROP TABLE IF EXISTS `dh_files`;
CREATE TABLE `dh_files` (
`fil_idx` INT(11) NOT NULL AUTO_INCREMENT COMMENT 'Unique Cat Key',
`fil_sfx` INT(11) NOT NULL COMMENT 'Xref to SameFile',
`fil_pth` VARCHAR(255) NOT NULL COMMENT 'File Path',
`fil_bnm` VARCHAR(125) NOT NULL COMMENT 'File BaseName',
`fil_nam` VARCHAR(125) NOT NULL COMMENT 'File Name',
`fil_ext` VARCHAR(12) NOT NULL COMMENT 'File Ext',
`fil_org` ENUM('Y','N') DEFAULT 'N' COMMENT 'File From Org Dir',
`fil_eql` ENUM('Y','N') DEFAULT 'N' COMMENT 'File Equal to Org File',
`fil_ddt` datetime COMMENT 'Delete Date',
`fil_cdt` datetime COMMENT 'Create Date',
`fil_mdt` datetime COMMENT 'Modify Date',
PRIMARY KEY (`fil_idx`));
DROP TABLE IF EXISTS `dh_same`;
CREATE TABLE `dh_same` (
`sam_idx` INT(11) NOT NULL AUTO_INCREMENT COMMENT 'Unique Match Key',
`sam_fdx` INT(11) NOT NULL COMMENT 'Xref to New File',
`sam_pdx` INT(11) NOT NULL COMMENT 'Xref to Org File',
`sam_nam` VARCHAR(255) NOT NULL COMMENT 'File Name',
`sam_cdt` datetime NOT NULL COMMENT 'Create Date',
`sam_mdt` datetime NOT NULL COMMENT 'Modify Date',
PRIMARY KEY (`sam_idx`));
Sure, I will have to add to this DB as I go along and learn what else I need. The 2nd table is the one where "same file" matches by filename are recorded.
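As a first stab, I'm thinking dh_same could be filled with something like the following; the query is completely hypothetical, it just follows the column names above (and assumes login credentials come from ~/.my.cnf):
Code:
#!/bin/bash
# Hypothetical: pair every stray copy with the copy from the original/target
# dir that shares its basename, and record the pair in dh_same.
mysql db_syncs <<'SQL'
INSERT INTO dh_same (sam_fdx, sam_pdx, sam_nam, sam_cdt, sam_mdt)
SELECT n.fil_idx, o.fil_idx, n.fil_bnm, NOW(), NOW()
FROM dh_files n
JOIN dh_files o
  ON  o.fil_bnm = n.fil_bnm
  AND o.fil_org = 'Y'   -- the copy that lives in the original/target dir
  AND n.fil_org = 'N';  -- the stray copy found elsewhere
SQL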
So now I'm writing a PHP script to run the process. I do over 90% of my coding in PHP, but I will also want to create a BASH version for those of you challenged in PHP.
Running my script I ran into errors, which I posted at:
I finally decided to use "git merge-file" to eliminate the dupes, but try as I might, I cannot get the command-line string in the right format to actually make it work!
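For reference, the shape the man page describes is below (the file names are only placeholders), though so far I can't get it to behave:
Code:
# git merge-file <current> <common-ancestor> <other>
# The merged result is written back into the first file.
git merge-file master_copy.txt oldest_copy.txt newer_copy.txt

# Or, with -p, print the merged result to stdout and leave the files alone:
git merge-file -p master_copy.txt oldest_copy.txt newer_copy.txt > merged.txt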
slac-in-the-box,
Your comment made more sense than most, but being a NEWBIE to the whole sync thing, it's still GREEK to me. I hope you have patience with me and can explain a little more!