[SOLVED] Home Backup Server - Is this a good approach?
Linux - Server: This forum is for the discussion of Linux software used in a server-related context.
Hello, I'm trying to create a decent backup server for some personal files (documents, pictures, etc. for one user). I currently have two USB-enabled routers running OpenWrt in different cities, and a VPS (and I may add some more). What I'd like is a folder on my local Samba server that automatically and instantly backs up to the other remote sites. The goal is to protect myself against data loss in the event of theft, HDD failure, or VPS companies going bankrupt, but I'm not (ostensibly) preparing for anything like a Carrington Event. Here is the approach I was thinking of:
Create a ~32 GB file on each machine.
Use NFS (or NBD?) to share it with my local router. (Probably over the open internet.)
Use LUKS on the shared image. (The passwords are kept in RAM on the router I have physical control over.)
Use some form of RAID (RAID 1 with 3 servers, RAID 6 if I add some more VPSs) on the shared, encrypted images.
Format the RAID volume using Btrfs so I have snapshotting.
Script some monthly/weekly backups of all image files to my local drives (where I've got plenty of storage).
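The stack described above could be roughly sketched as follows. This is only an illustration of the idea, not a tested recipe: the hostnames, NBD export name, and device paths are all made up, `nbd-client` option syntax varies between versions, and everything must run as root.

```shell
# On each remote machine: create a ~32 GB sparse image file
# (assumed to be exported over NBD as "backup"; server config not shown).
truncate -s 32G /srv/backup.img

# On the local router: attach the remote images as block devices.
nbd-client -N backup remote1.example.com /dev/nbd0
nbd-client -N backup remote2.example.com /dev/nbd1

# Layer LUKS on each attached image; the passphrase stays on this box.
cryptsetup luksFormat /dev/nbd0
cryptsetup open /dev/nbd0 crypt0
# (repeat for crypt1, crypt2, ...)

# Assemble the decrypted mappings into a RAID 1 array...
mdadm --create /dev/md0 --level=1 --raid-devices=3 \
    /dev/mapper/crypt0 /dev/mapper/crypt1 /dev/mapper/crypt2

# ...and format with Btrfs for snapshotting.
mkfs.btrfs /dev/md0
mount /dev/md0 /mnt/backup
```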
Thoughts? Suggestions? Simpler methods of accomplishing essentially the same thing?
You bring up RAID... are you saying you would use the image file on each of the servers as a member of the RAID? Performance in that scenario is going to be pretty horrible, and it sounds much more fragile than the basic approach people have been using for ages.
I think you'd probably be much better off just using rsync to propagate changes, either to an encrypted image that you store on the remotes, or to the flat files themselves.
stupendoussteve has it right. Use rsync, and if you are worried about data theft at either end, use volume encryption at both ends; or take it slightly further and rsync an encrypted tarball from an encrypted volume to the remote encrypted volumes, so that not only is the volume encrypted but the data is as well.
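The tarball-plus-rsync idea might look something like this. The hostnames, paths, and passphrase file are placeholders, and on gpg >= 2.1 you may also need `--pinentry-mode loopback` for the passphrase file to be honored:

```shell
# Pack and symmetrically encrypt the data before it leaves the local box.
tar -czf - /mnt/data | gpg --symmetric --cipher-algo AES256 \
    --batch --passphrase-file /root/.backup-pass \
    -o /tmp/backup.tar.gz.gpg

# Push the same encrypted blob to each remote site (assumes ssh keys).
for host in vps1.example.com router2.example.com; do
    rsync -avz /tmp/backup.tar.gz.gpg "$host":/srv/backups/
done
```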
Trying to create a RAID over the WAN would be so slow that you would stand more chance of losing data than any RAID could recover from.
How dismal a performance are we talking here? My upload speed is only 1 Mb/s (roughly 125 KB/s raw), so I was anticipating something like 20-50 KB/s write speed, which is fine for what I plan to use it for.
The reason I was thinking RAID rather than rsync is twofold. The first is that running rsync against multiple remote filesystems seems like it would be even slower than RAID (i.e. simultaneously writing a 100 KB file to 3+ sites vs. scanning multiple 32 GB filesystems remotely and transferring the modified files). The second is that I'd rather not trust any specific HDD not to have read/write errors, including the local one. With rsync, if the local drive starts putting out garbage data, it'll propagate to the other servers until I notice and stop it.
Not sure about the first part, but I can address the second. It's simple: use a day-of-the-week folder to store up to 7 days' worth of your data at the remote locations. That gives you a fair amount of spread for data recovery; or go nuts and use the day of the month, so you have 28-31 days' worth of backups. It all depends on how relaxed you want to be about monitoring your local system.
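The rotation scheme above is just a matter of deriving the destination directory from `date`. A minimal sketch, with the remote host and paths made up:

```shell
# Keep 7 rotating copies, one per weekday (Mon, Tue, ...).
DOW=$(date +%a)
rsync -avz --delete /mnt/data/ backup@vps1.example.com:/srv/backups/"$DOW"/

# Or keep roughly a month of history by rotating on the day of month (01-31).
DOM=$(date +%d)
rsync -avz --delete /mnt/data/ backup@vps1.example.com:/srv/backups/"$DOM"/
```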
After reading your post I am still not very clear on what exactly you want to do. From what I've understood, you are trying to avoid data loss, and I don't understand why you are making things so complicated. If I were protecting my data against loss, I would simply stock up a NAS box at home configured with RAID 5 or RAID 1, keep an extra portable HDD with another copy of the data, and if I were to pay any third party to keep my data, I think a service like Dropbox or Google Drive is quite reliable.
Part of the complexity is taking advantage of what I have on hand. I use a VPS to run some undemanding stuff, so I have ~40 GB to play with, but I don't get great reliability (maybe two nines). I'm also using routers (essentially NAS boxes) and rather old USB hard drives, so I anticipate that one or more will fail before too much longer. Since one location will be off-site, I won't be around to swap out a faulty drive, so I'm trying to handle redundancy at the site level rather than the drive level. Furthermore, as MegaUpload proved, online file storage services can easily vanish overnight. (Less likely with Google, but not impossible.)
My current solution is to use my local router (NAS) and periodically copy the files to the variety of old hard drives I have lying around. Over the holidays I leave one such drive at my parents' house. This is time consuming, though, so my backups tend to be quite old. If my NAS hard drive dies, I'll probably lose a month of data. If my home burns down, I'd lose a year, with the exception of a few hundred megabytes on Google Drive. When I was researching a better method, the intrinsic error rate of hard drives made me worried enough to want insurance against the failure of two drives (or one whole drive plus some bad sectors on the others).
A confounding problem is that my next job will require me to keep some detailed logs, and losing a week's worth would be kinda bad. I'm also privy to some private data that I won't risk on the cloud or leave unencrypted for thieves. Unfortunately, the onus is on me to keep track of all this, so I'm looking to make something secure, resilient, and low maintenance. It's well above and beyond what most people do in my situation, but I needed something better anyway.
Why not something like SpiderOak or CrashPlan, or even one of the backup tools that target Amazon S3? Most allow you to generate your own encryption keys, so that not even the provider could recover your files unless you hand the keys over. That seems much easier if you're really aiming for nearly bulletproof. CrashPlan also does free backups to local folders as well as to other computers running the client (which could be your parents' machine with your external drive plugged in).
Quote:
The second is that I'd rather not trust any specific HDD to not have read/write errors, including the local one. With rsync, if the local drive starts putting out garbage data then it'll propagate to the other servers until I notice and stop it.
There is no reason to distrust the disk but trust the RAM, CPU, mainboard, and so on. If the RAM in the machine that handles the RAID goes faulty, there is a good chance that your complete RAID will get corrupted, leading to total data loss (other causes can of course be bugs in drivers, especially in something I wouldn't consider ready for production environments, like Btrfs). Here you see why a RAID is not a sufficient replacement for a proper backup strategy (and in fact RAID was never intended to be one).
You can only prevent something like that with regular backups to different machines in different locations (I back up my files to a local fileserver, a Dropbox account and a VPS). If you are worried about bandwidth usage, do incremental backups, so that you can restore from a given point in time without always having to make complete backups. Which interval you choose for your backups of course depends on the amount of data you produce per day/week/month.
Quote:
Originally Posted by TobiSGD
There is no reason to distrust the disk but trust the RAM, CPU, mainboard, ...
That is a good point which I hadn't thought about. Non-ECC RAM suffers between 2.5 and 7 bit flips per gigabit per hour (varying with altitude), and all the other components introduce ~1 bit flip per 20 terabits. It's a little distressing that home computers average up to one error per 20 MB of RAM per hour, but I'm not exactly sure how I could realistically do anything about it. Filesystems and such seem to work despite that, so I'm not going to worry too much.
Quote:
Originally Posted by TobiSGD
especially when it comes to something that I wouldn't consider good enough for productive environments, like btrfs
This was something I was wondering about, since I haven't used it. Snapshotting is nice, but I didn't know if it was stable enough to use yet. Some projects are rock solid even in alpha, while others... aren't.
I'm going to go ahead and mark this topic as solved, as the consensus seems to be a resounding "no". Not exactly what I was hoping to hear, but I just dabble in this stuff as a hobby so I'll trust everyone's judgement.
If you're seriously worried about filesystem corruption, I would recommend taking a look at FreeBSD and the ZFS filesystem (a ZFS port to Linux is also available - I have used it and had no problems, but in FreeBSD it is integrated and actually supported). ZFS is designed to resist corruption, including corruption from bit flips: it checksums every block, and it will scan for and repair corrupted files. It also includes its own RAID system that can survive up to two disk failures, plus snapshots. Many of the ZFS features are planned for Btrfs in the future, which isn't a surprise, as Oracle makes both.
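For reference, the self-healing features mentioned above look roughly like this on the command line. The pool name and disk device names are made up; the commands are the stock ZFS tools and require root:

```shell
# Create a pool that survives two simultaneous disk failures (raidz2).
zpool create tank raidz2 /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3

# Read every block, verify checksums, and repair from parity.
zpool scrub tank
zpool status tank   # reports any checksum errors found and repaired

# Cheap point-in-time snapshots, like the Btrfs feature discussed above.
zfs snapshot tank@2013-04-02
zfs list -t snapshot
```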
An easier way to get the same benefits is FreeNAS.
EDIT: I was wrong: ZFS now has a "raidz3" level that can survive the loss of up to three disks!
Last edited by Stupendoussteve; 04-02-2013 at 11:00 PM.