LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Software (https://www.linuxquestions.org/questions/linux-software-2/)
-   -   Need Backup Solution Advice (https://www.linuxquestions.org/questions/linux-software-2/need-backup-solution-advice-764829/)

C4talyst 10-27-2009 12:43 PM

Need Backup Solution Advice
 
Hello Board,

I'm a long-time lurker here and have been referred to LQ by Google for stuff more times than I can count. I signed up today to ask for advice on creating a simple backup solution for a server that contains a lot of data.

I've helped a friend put together a very nice system, Dell R710 w/ dual quad-core processors and 32GB of RAM. It's running CentOS 5.3 and is used to power a large government web site.

We need to back up approximately 1.5TB of data, and we'd prefer a daily incremental job plus a weekly full backup. I have two 2TB external drives attached to the server for this purpose.

I wanted to get some opinions from others, as I've never had to back up such a large amount of data. I was thinking about using a simple rsync script for the daily incremental, and just a full copy of /etc, /home, /var and /opt for the weekly full. Those are the only directories/partitions we're concerned about.

Any assistance in determining a solution is greatly appreciated.

Thanks,
C4t

Lordandmaker 10-27-2009 01:22 PM

What to backup depends on what data changes the most, and what's most valuable to you. Personally, I'd keep /etc under version control and use that to restore if I need a backup. I imagine the web site is somewhere under /var? Bear in mind there's probably a chunk of data in there, too, that you don't want taking up space on your backups.
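One concrete way to keep /etc under version control, as suggested above, is a plain git repository (the etckeeper tool automates this same idea). The commit messages and the httpd.conf path below are illustrative only:

```shell
# Put /etc under git version control (run as root).
cd /etc
git init
git add -A
git commit -m "baseline of /etc"

# After any config change, record it:
git add -A
git commit -m "tuned httpd.conf"

# Review history, or restore a file from the previous commit
# (the path here is just an example):
git log --oneline
git checkout HEAD~1 -- httpd/conf/httpd.conf
```

This gives per-file history and easy restores for config, though it is a complement to, not a replacement for, the off-site backup discussed above.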

I would suggest, though, that you don't keep your backups in the same room as the server: theft or fire is likely to affect both. Keeping the backups online (or able to be brought online automatically) also gives anyone who does break in the ability to hose them.
I'd advocate an off-site backup system, ideally one with some human interaction.

mesiol 10-27-2009 02:47 PM

Hi,

an intelligent backup has to handle various restore scenarios. Rsync is one possibility for the backup, but what does your restore look like?


There are dozens of questions you should ask yourself about your backup/recovery scenario.

- How much does your data change per day?
- Is a full backup really required every week, or is one per month enough?
- How long should the backup be available for restore? Is point-in-time recovery required?
- Is disk backup really what you want, or is a more reliable medium required?
- Are there any legal issues to be resolved?
- How often can you test the restore?
- What about fire or floodwater?
- What about handling filesystem links?
- What about files that are deleted and then re-created under the same filename?

First you should think about your strategy; this will be more difficult than finding a software solution. Plan your backup carefully. After that, look at the market and see which solutions fit your strategy.

chrism01 10-27-2009 08:06 PM

I agree: strategy first (as in the above posts), then tactics, i.e. technical solutions.
rsync sounds good, and if you give it a clean/empty target once a week, you'll get the full backup.
Consider several generations:

daily - incremental
weekly - full
monthly - save the last weekly backup of each month
yearly - save the last monthly backup of each year
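The promotion steps in that scheme can be sketched as a small cron script. The directory layout (dated `weekly-*` directories under one backup root) is an assumption; `cp -al` makes a hard-linked copy, so promoted generations cost almost no extra space:

```shell
#!/bin/sh
# Sketch: promote the newest weekly backup to a monthly copy on the
# 1st of the month, and to a yearly copy on Jan 1.
# Assumes weekly backups live in $BACKUP_ROOT/weekly-YYYYMMDD.
BACKUP_ROOT=/mnt/backup1

LATEST_WEEKLY=$(ls -1d "$BACKUP_ROOT"/weekly-* 2>/dev/null | sort | tail -n 1)
[ -n "$LATEST_WEEKLY" ] || exit 0

if [ "$(date +%d)" = "01" ]; then
    # Hard-link copy: shares file data with the weekly backup.
    cp -al "$LATEST_WEEKLY" "$BACKUP_ROOT/monthly-$(date +%Y%m)"
fi
if [ "$(date +%j)" = "001" ]; then
    cp -al "$LATEST_WEEKLY" "$BACKUP_ROOT/yearly-$(date +%Y)"
fi
```

For a financial-year cutoff as mentioned below, the `%j` check would be replaced with the relevant date.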

Also consider whether those definitions of month and year will work, or whether you want calendar months/years.
Also consider your country's end-of-financial-year date; you may want/need a full backup as of the last day of the financial/tax year.
Ask the Gov dept involved what their internal rules are; you need to fit in with them. They may even be governed by legislation.

You may want to consider duplicate copies of each backup in case one goes bad: for example, one stored on-site for speed and one off-site for DR.
Check whether the off-site copy should be encrypted; it's probably a good idea.
http://en.wikipedia.org/wiki/Remote_backup_service
There's a famous quote about un-encrypted backups making it too easy for people to copy the data; can't find it right now though.

choogendyk 10-27-2009 08:08 PM

rsync is one good approach, because it is efficient. However, the end result is a current copy of what's on your original drive. If you want to recover a configuration file you had yesterday, before you broke your system, you may be out of luck. One solution to that is snapshots, and there happens to be a way to use rsync to create snapshot-like backups: http://www.mikerubel.org/computers/rsync_snapshots/

The disadvantage of that approach is that a failure of the backup drive loses all of it. So make it RAID, mirror it, or make multiple copies of it in some way. Since you have two drives, you could rsync to one, take it off site, and rsync to the other the next day. Then let it run with rsync snapshots for a week. Then swap the drives and let the rsync snapshot procedure create an updated snapshot on the first drive, let it continue running snapshots for a week, and swap again. At some point you might run out of space; then you can start pruning older snapshots.

Another alternative is to go with a tape library. (You said this is a large government site, right? So budget shouldn't be a complete roadblock.) I found a relatively inexpensive one (as tape libraries go), the Sony LIB162 AIT5. It has 16 tape slots, and each tape holds 400G native, which might compress to more than double that depending on your data. Because it is a carousel changer, it is a simpler mechanism than most, and it runs about $5K. If you start looking at LTO4 robots, with typically 24 slots or more, the prices are typically $10K or more. But that's all just ballpark, and you also have to budget for tapes. The advantage is that you can then have a cycle with nightly backups, tapes going back, say, 6 weeks or more, and off-site archival tapes. I use Amanda to manage all that. Amanda has a planner that works out dump strategies to smooth the backup over the entire dump cycle (say, a week), so that you don't have the huge resource hog of once-a-week full backups of everything with the backup system semi-idle the rest of the week just doing incrementals -- http://wiki.zmanda.com/index.php/FAQ...da_use_them%3F. That's one of the main reasons I chose Amanda.

I actually like as much redundancy as I can manage. I have an external RAID array that is managed by ZFS. It uses raidz2 (roughly equivalent to RAID 6) with a hot spare, so it has 9 data drives, 2 parity drives, and 1 hot spare. It would have to experience 4 drive failures to actually lose data. Using ZFS snapshots, I run a snapshot every night, and I keep those for the semester. In addition to that, I run a 6-week tape cycle, periodic archives, and cycle tapes off site. I also have some large radmind directories containing images that allow us to configure large numbers of lab and desktop computers easily and automatically; I use rsync to keep an up-to-date copy of that directory on another server in another building. I also have a cron job that copies the Amanda configuration and index directories to a server in another building after the completion of each daily Amanda backup. So, gee, am I covered? Hmm, I'm sure I can come up with something else I ought to be doing.

Just spend some time imagining what can go wrong. Then think about how you would recover from that. Then think some more.

If you're interested in digging deeper, check out the O'Reilly Backup and Recovery book, and/or take a look at the companion web site http://www.backupcentral.com/.

C4talyst 10-28-2009 01:21 AM

Thanks for all the great replies, you've given me a lot to think about. We already know what needs to be backed up, and when; I'm also aware of how much the data will grow over time. I'm just not certain about which route to take to implement the actual backups. I need speed and efficiency for the daily incremental and the weekly full.

I plan to rotate the external drives offsite like I do with tapes on other machines.

