By jeremy at 2005-03-07 09:50
TECH SUPPORT
Quick and Dirty Backups
by Jeremy Garcia

Everyone in IT knows that a good backup strategy is critical. The unfortunate reality, however, is that all too many users -- from the home to the enterprise -- don't yet have an adequate backup system in place. If you can count yourself among that group, consider this New Year's resolution: "I will back up my data."

When formulating your backup plan, ask yourself these questions: What do I need to back up? How often does it need to be backed up? How long will I need to keep the data? What medium would I prefer to back up to? Armed with answers, you can begin to search for a solution that fits your needs.

Your search may lead you to applications such as Arkeia, Amanda, and dump, and a variety of storage media, such as DLT, DAT, and CD-R. Enterprise users will likely look into solutions such as Legato, ARCserve, and VERITAS. Your backup needs should dictate the solution. It's also extremely important that you fully test your backup (and restore!) procedure after it's put into place. The worst time to discover that your system is flawed is after data loss has already occurred.

Here, let's focus on using two standard Linux utilities, rsync and tar, that can quickly and easily back up data over a network (LAN or WAN) to a hard drive on a remote machine. While rsync and tar lack some of the more advanced features of other backup applications, the two tools are simple to configure, free to use, and readily available. Chances are that your Linux distribution includes both.

The first utility, rsync, synchronizes source and destination directories, either locally or remotely. One of rsync's greatest strengths is that it only transfers the differences between two sets of files, which saves bandwidth and transfer time. However, a major drawback to rsync is that if a file becomes corrupted or is accidentally deleted, rsync replicates the corruption or deletion. You can somewhat mitigate this problem by syncing to rotating directories, such as one directory for each day of the week.

The syntax for rsync is similar to that of cp. The basic command to replicate from a local machine to a remote one is:

Code:
$ rsync -e ssh -a --delete /usr/local/backup/ jeremy@backup.host:/home/backups
This command recursively replicates the entire contents of /usr/local/backup/ on the local machine to /home/backups/ on the remote host, while preserving symbolic links, permissions, file ownership, timestamps, and devices. (The trailing slash on the source tells rsync to copy the directory's contents rather than the directory itself.) -e tells rsync to use a secure ssh connection instead of rsh (the default), and --delete removes any file from the remote side that no longer exists on the local side.

So, to use rsync as a backup method, simply schedule the above command with cron, setting a frequency. Before you do that, though, make sure that the password-less logins we set up using ssh keys in the July 2004 "Tech Support" (http://www.linux-mag.com/2004-07/tech_support_01.html) are working. Without ssh keys, cron just hangs, waiting for your password.
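
As a rough sketch, a nightly crontab entry (added via crontab -e) might look like the following; the schedule and paths here are only examples:

Code:
# Run the rsync backup every night at 2:30 AM
30 2 * * * rsync -e ssh -a --delete /usr/local/backup/ jeremy@backup.host:/home/backups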

You can use the following script to rsync to a different destination directory based on the day of the week:

Code:
#!/bin/sh
BACKUPDIR=$(date +%A)   # day of week, e.g. Monday
rsync -e ssh -a --delete --backup \
  --backup-dir=/home/backups/$BACKUPDIR \
  /usr/local/backup \
  jeremy@backup.host:/home/backups/today
rsync is extremely flexible and has tons of options, so read its man page to tweak the examples to better suit your needs.
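
For example, here is an illustration of two options you'll reach for often; -n (dry run) previews what would change without transferring anything, and --exclude skips patterns you don't want copied (the patterns below are just examples):

Code:
# Preview the transfer without changing anything (-n = dry run);
# the exclude patterns are examples only
rsync -e ssh -an --delete --exclude '*.tmp' --exclude 'cache/' \
  /usr/local/backup/ jeremy@backup.host:/home/backups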

tar is a backup program designed to store and extract files from an archive file, better known as a tarfile. Using tar for backup is easy: just place everything you want to archive into the tarfile, and copy the tarfile to another machine for safekeeping. This technique stores every file every time, and lets you recover a file from an arbitrary point in time. You can also use gzip to make tarfiles smaller.

The following script creates a compressed tarfile of the local /usr/local/backup/ directory, places the archive in /tmp/ with a filename that contains the year, month, and day, and copies it to /home/backups on backup.host:

Code:
#!/bin/sh
DATE=$(date +%F)   # ISO date, e.g. 2005-03-07
tar zcpf /tmp/backupfile-$DATE.tar.gz \
  /usr/local/backup
scp /tmp/backupfile-$DATE.tar.gz \
  jeremy@backup.host:/home/backups
rm /tmp/backupfile-$DATE.tar.gz
While neither rsync nor tar alone constitutes a comprehensive backup strategy, both allow you to quickly and reliably back up content to a remote machine using standard tools. They also make restores trivial.
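
As a sketch of a restore using the script above (the date in the filename is just an example), copy the archive back and unpack it. Since tar strips the leading / when archiving, extracting with -C / puts files back in their original locations:

Code:
# Fetch a specific day's archive and restore it in place
scp jeremy@backup.host:/home/backups/backupfile-2005-03-07.tar.gz /tmp/
tar ztvf /tmp/backupfile-2005-03-07.tar.gz        # list the contents first
tar zxpf /tmp/backupfile-2005-03-07.tar.gz -C /   # then extract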

by Technoslave on Mon, 2005-03-07 10:48
Just did a cursory glance at the article; the thing I got out of it was ssh and rsync, maybe some tar thrown in there too. That's how I currently do my backups. I was doing it to two other boxes, one local and one off-site. I back up my home directory, minus a couple of directories that have MP3s and media files like movies and TV shows in them. I don't back up my /tmp dir. I back up all my stats-gathering programs and their data (vnstat, mrtg, awstats), as well as my dns/bind, apache, and sql directories, plus a few others.

All in all it's over a gig or so worth of data that I've backed up. The initial backup took a while to do, especially the offsite stuff. But my nightly backups now take around a minute to do, and that's mostly due to rsync checking sums to see if anything needs to be uploaded.

From a perspective of someone who has been doing administration for 12+ years, I don't find anything wrong with doing it this way, especially since I don't have access to a tape drive.

Before attempting this ... well, OK, after I'd already started, I found several other articles about how to do this in an even more efficient manner, such as using hard links and soft links and some other junk to actually keep a history of backups. It was all rather neat, but all I really need is just a backup, and that's what I have and it works.

Nothing wrong with using rsync to create an off-site or on-site/off-box backup.

by prn on Fri, 2006-03-10 11:56
Nice article. I am working in an environment where I have a couple of machines I need to back up this way and it works very nicely. The main thing I would strongly suggest is adding the -z switch to the rsync command line as in:

rsync -az -e /bin/ssh --delete ...

The -z switch will compress the data as it is sent over the network to the remote machine. That can result in considerable speedup in the data transfer, depending on the type of data being transferred.

Of course, I also advise anyone thinking of doing this to read the man pages on rsync. I have also found it useful to "exclude" certain files or types of files from the backup. The format for exclude-files can be a little tricky, but, again depending on your needs, can be worth quite a lot of savings. For example on one machine where I use this method to back up the entire machine, my exclude-file contains lines like:

Code:
+ /dev/
- /dev/**
+ /proc/
- /proc/**

These tell rsync to have a /dev directory and a /proc directory on the remote machine, but not to copy any of their contents. I'm sure other types of exclusions will occur to many readers.
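
For illustration (the file path and destination below are made up), you'd point rsync at the file with --exclude-from; lines beginning with "+ " act as includes even there:

Code:
# Back up the whole machine, honoring the include/exclude rules above;
# /etc/rsync-backup.excludes is a hypothetical path holding those lines
rsync -az -e ssh --delete --exclude-from=/etc/rsync-backup.excludes \
  / jeremy@backup.host:/home/backups/fullsystem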

Very useful material! Keep up the good work.

Paul


  


