LinuxQuestions.org
03-05-2019, 04:42 AM   #1
cantab
Member
 
Registered: Oct 2009
Location: England
Distribution: Kubuntu, Ubuntu, Debian, Proxmox.
Posts: 553

Rep: Reputation: 115
Software to maintain mirror of millions of files?


Hi all,

I have around 7 million files totaling 1.4 TB on one Linux server that I need to mirror onto another. I'll do the initial copy using cp -a onto a USB hard drive but I then need to keep the mirror up-to-date. The servers are linked via our site-to-site VPN with about 10 mb/s throughput and 50 ms ping.

(The files, by the way, are the BackupPC 4.0 pool, which holds the backups for our workstations. Fortunately BackupPC 4.0 doesn't use hardlinks like the older versions did.)

The amount of data that's new or changed could be very variable. Usually it won't be much, but every once in a while there could be hundreds of gigs to shift, possibly in a single file.

Requirements:
  • Changes only need to propagate one way.
  • It's OK for the transfer to be scheduled; it doesn't need to be real time. (I can use run-one to ensure duplicate transfer processes don't get started.)
  • It needs to be possible to interrupt the transfer, start it again later, and minimise repeat work. (So it doesn't get stuck in an endless loop.)
  • Ideally it can cope with the mirror source changing during the process, though if required I can ensure it stays unchanged during the weekend.
  • Either server can initiate the process.
  • Encryption is not required (since the VPN encrypts the data over the internet).
  • Bandwidth limiting would be good. (But if not native, I can use trickle for that.)
  • Source is Debian 9, destination is Ubuntu 18.04. I would prefer to use software in the repos.

I would have just gone with rsync, but I've heard reports of it struggling with millions of files, so I wondered if people had any other suggestions? I use unison for two-way syncs but I've found it to be temperamental.
 
03-05-2019, 04:45 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,078

Rep: Reputation: 7364
I would still give rsync a try.
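
For anyone wanting a starting point, here is a minimal sketch of a single rsync job that tries to cover the requirements from post #1 (one-way, resumable, bandwidth-limited). It only builds the command line. The paths, the host name `mirror-host` and the helper name `build_rsync_cmd` are made up for illustration, and it assumes rsync runs over ssh as usual, even though encryption isn't strictly needed here.

```python
import shlex


def build_rsync_cmd(src, dest, bwlimit_kib=None, delete=True):
    """Return an rsync argv list for a one-way, resumable mirror (hypothetical helper)."""
    cmd = [
        "rsync",
        "--archive",      # recurse and preserve permissions, times, owners, symlinks
        "--partial",      # keep partially transferred files so a rerun can reuse them
        "--numeric-ids",  # do not remap uid/gid by name between the two hosts
    ]
    if delete:
        cmd.append("--delete")  # remove files on the mirror that are gone from the source
    if bwlimit_kib is not None:
        cmd.append(f"--bwlimit={bwlimit_kib}")  # a plain number is KiB per second
    cmd += [src, dest]
    return cmd


if __name__ == "__main__":
    # Assumed pool location and host name; adjust to the real setup.
    cmd = build_rsync_cmd(
        "/var/lib/backuppc/", "mirror-host:/var/lib/backuppc/", bwlimit_kib=1000
    )
    print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from a scheduled job. `--partial` matters most for the occasional file of hundreds of gigs: an interrupted run leaves the partial file in place and the next run picks up from it instead of starting over.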
 
03-05-2019, 04:50 AM   #3
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,374
Blog Entries: 3

Rep: Reputation: 3771
I haven't heard of such problems but would be interested if you can verify or debunk that rumor.

Or if the volume really is a problem, try rsync on several smaller subsets of the data.

There's an OpenRsync in the works over at the OpenBSD project. It should be fully interoperable with the original rsync but is a clean-room re-implementation. I'm not sure how far along they are with it though and if others have been able to port it to other systems yet.
 
03-05-2019, 04:53 AM   #4
TenTenths
Senior Member
 
Registered: Aug 2011
Location: Dublin
Distribution: Centos 5 / 6 / 7
Posts: 3,486

Rep: Reputation: 1556
I would rsync in sections of the tree. So if you have a folder structure like

Code:
Root
Root - WS1
Root - WS2
Root - WS3
Then write a job to enumerate the folders under root and do them as individual jobs. That way you've smaller jobs and the ability to detect failure in a more structured manner.
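
A rough sketch of that idea, assuming the folders directly under the root are the natural units (WS1, WS2, ...). The function names `plan_jobs` and `run_jobs`, the rsync options and the destination string are placeholders, not anything from a real setup.

```python
import os
import subprocess


def plan_jobs(root, dest_root):
    """Return (name, argv) pairs: one rsync job per directory directly under root."""
    jobs = []
    with os.scandir(root) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if not entry.is_dir(follow_symlinks=False):
                continue  # files sitting directly in root would need a separate job
            # Trailing slashes: copy the contents of the folder into the same-named folder.
            src = os.path.join(root, entry.name) + "/"
            dest = dest_root.rstrip("/") + "/" + entry.name + "/"
            jobs.append((entry.name, ["rsync", "--archive", "--partial", src, dest]))
    return jobs


def run_jobs(jobs):
    """Run each job in turn and return the names of the jobs that failed."""
    failed = []
    for name, argv in jobs:
        result = subprocess.run(argv)
        if result.returncode != 0:
            failed.append(name)  # the other folders still get their turn
    return failed
```

Calling `run_jobs(plan_jobs("/srv/Root", "mirror-host:/srv/Root"))` (both paths hypothetical) gives back the list of folders that need another attempt, so a failure in one folder is easy to spot and retry without redoing the rest.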
 
03-05-2019, 06:17 AM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,157

Rep: Reputation: 4125
rsync has always had problems with massive numbers of files. The solution is snapshot, rather than having to read a bazillion inodes from disk. I long ago went with btrfs, but then I don't have a production environment to look after, and I'm anal about backups.
Snaps only track changes, and they are static (in the sense of "point-in-time"), so you can back them up at your leisure then delete them to recover the space. This is how I use them. With btrfs you can even send only the difference between two snaps (at the source) to save time/data. This is an *old* concept in the enterprise world.
I've even seen drivers at the VFS block level that will do similar for non-snapshot enabled filesystems (ext?, XFS ...) but have never tested them.
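
To make the btrfs part concrete, here is a minimal sketch of sending only the difference between two snapshots. It assumes both servers keep the data on btrfs (nothing in this thread says they do), that the snapshots are read-only, and that the parent snapshot already exists on the receiving side. The function names, paths and host name are invented for illustration.

```python
import subprocess


def build_snapshot_cmd(subvolume, snapshot_path):
    """argv for a read-only snapshot; btrfs send only accepts read-only snapshots."""
    return ["btrfs", "subvolume", "snapshot", "-r", subvolume, snapshot_path]


def build_send_pipeline(parent_snap, new_snap, remote_host, remote_dir):
    """Return (send_argv, recv_argv) for an incremental send piped over ssh."""
    # -p names the parent snapshot, so only the changes since then are sent.
    send_argv = ["btrfs", "send", "-p", parent_snap, new_snap]
    recv_argv = ["ssh", remote_host, "btrfs", "receive", remote_dir]
    return send_argv, recv_argv


def run_pipeline(send_argv, recv_argv):
    """Connect send's stdout to receive's stdin; True only if both exit with 0."""
    send = subprocess.Popen(send_argv, stdout=subprocess.PIPE)
    recv = subprocess.Popen(recv_argv, stdin=send.stdout)
    send.stdout.close()  # let send see a broken pipe if receive exits early
    recv_rc = recv.wait()
    send_rc = send.wait()
    return send_rc == 0 and recv_rc == 0
```

A scheduled job would take a new read-only snapshot, send it relative to the previous one, and delete the older snapshot once the receive has succeeded.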
 
03-05-2019, 06:34 AM   #6
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,374
Blog Entries: 3

Rep: Reputation: 3771
Quote:
Originally Posted by syg00
... The solution is snapshot, rather than having to read a bazillion inodes from disk. I long ago went with btrfs, but then I don't have a production environment to look after, and I'm anal about backups...
I've read that OpenZFS can do some kind of replication.
 
  


All times are GMT -5.