Linux - Software
This forum is for software issues. Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.
10-22-2005, 01:59 AM | #1
Member | Registered: Jul 2005 | Location: West Coast South, USA | Distribution: debian 3.1 | Posts: 267
rsync and backing up
I've been all over the net reading tons of scripts and howtos about doing backups with rsync. There seems to be a common feature in all of these efforts: using cp -al to create hard links against which rsync then runs, in order to create incremental backups across a series of timescales. Really nice...
I'm starting to appreciate the value of doing it this way, but then I came across one write-up suggesting that rsync has been improved and can do incremental backups on its own now. And I realize that the write-ups I am reading are mostly a bit outdated.
So as not to re-invent the wheel... what is the most current, best way to accomplish a sweet backup (to disk) scheme?
Thx,
Danimal.
10-22-2005, 03:09 AM | #2
Member | Registered: Sep 2005 | Location: New Delhi | Distribution: RHEL 3.0/4.0 | Posts: 777
Daniel
rsync has always synchronised (mirrored) two directories incrementally, transferring only what has changed, unless you explicitly tell it to overwrite or delete existing files with the relevant options - and that's the key to its success.
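For example, a plain one-way mirror looks something like this (just an illustrative sketch, not from the post above; the paths are placeholders):
Code:
# Mirror /home into /data/mirror/home.
# -a preserves permissions, times and symlinks; -v is verbose;
# --delete removes files from the mirror that no longer exist in the source.
# Only new or changed files are actually transferred.
rsync -av --delete /home/ /data/mirror/home/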
Last edited by amitsharma_26; 10-22-2005 at 03:16 AM.
10-22-2005, 06:57 AM | #3
LQ Guru | Registered: Jan 2001 | Posts: 24,149
After I suffered a crash a few months back, I didn't have backups of a lot of recent updates (stupid me, and to think I'm a backup administrator at my current job), so afterwards I threw together a quick script to perform backups for my local webserver. It's simple: I have two scripts, one for a full weekly backup and one for a daily incremental, which write to a separate hard disk, and I have a weekly cron job that rsyncs the data offsite to a few shell accounts I have available. That way I have my local backups on their own dedicated hard disk and then copies of my full weekly backups remotely...
Again, I threw these together in a few minutes and haven't had time to improve them. I'm actually pondering starting a new project for myself: a cool console-based or maybe web-based backup program that uses nothing but Perl, PHP or Bash scripts and is easy to configure and set up.
Here's the full weekly script I use. (You'll notice I back up /var/log/packages; I use Slackware, and I back this up so that if I have a system crash I know what packages I had installed at the time. Everything else is a given: I only care about my configs in /etc and mainly my home directories, where all my web server content and user files reside.)
Code:
#!/bin/bash
#
# System Backup Script - full weekly backup.
HOST=myhostname
DATE=$(date +%F)
FILENAME=$HOST.$DATE
cd /
# Archive configs, local scripts, home directories and the package list, then compress.
tar cf /data/backup/weekly/$FILENAME.tar /etc /usr/local/bin /home /var/log/packages
gzip /data/backup/weekly/$FILENAME.tar
sleep 2
exit 0
Here is my daily incremental script, which currently only backs up my /home directories, as those files change daily unlike most of the others:
Code:
#!/bin/bash
PATH=/bin:/usr/bin
BACKUPDIR=/data/backup/daily/myhostname/
cd $BACKUPDIR
OPTS=" --delete --exclude cache --exclude Cache"
TODAY=`date +%d%b%y`
# The most recently modified directory in here is yesterday's backup
YESTERDAYDIR=`/bin/ls -lrt | grep ^d | tail -1 | awk '{print $NF}'`
#echo $YESTERDAYDIR
if [ "$YESTERDAYDIR" != "$TODAY" ]
then
    # Hard-link yesterday's tree to today's directory (fast, uses almost no space)
    cp -al "$YESTERDAYDIR" "$TODAY"
else
    echo Retrying Unfinished Backup
fi
rsync $OPTS -a /home $BACKUPDIR/$TODAY
# Make $TODAY the most recently modified directory
# so it will be easily found tomorrow
touch $BACKUPDIR/$TODAY
I also have cron jobs that clean out my backup directory, keeping the full weeklies for 2 months and the daily incrementals for 30 days, just so it doesn't keep growing and growing. And then there's my weekly cron job that rsyncs the latest backup to my remote hosts - I place it in 3 different remote locations: one here in town, a machine I have set up for my mother's business in Houston, and one in Atlanta. Oh how I love having friends that provide shell accounts with space...
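A sketch of what such a cleanup job might look like (not the actual script from this post; the paths and retention windows are assumptions based on the description above):
Code:
#!/bin/bash
# clean_backups - prune old backups so the disk doesn't keep growing.
# Full weeklies older than roughly 2 months (60 days):
find /data/backup/weekly -maxdepth 1 -type f -name '*.tar.gz' -mtime +60 -exec rm -f {} \;
# Daily incremental directories older than 30 days:
find /data/backup/daily/myhostname -maxdepth 1 -type d -mtime +30 -exec rm -rf {} \;
# Keep the parent directories themselves recent so they never match the -mtime test:
touch /data/backup/weekly /data/backup/daily/myhostname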
Oh, I also have MySQL databases, so I have a mysqldump plus rsync in place as well, but that runs daily, since the databases aren't that big.
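The database part might be as simple as something like this (again just a sketch; the credentials and paths are placeholders, not from the post):
Code:
#!/bin/bash
# Dump all MySQL databases, compress the dump, then push it offsite.
DATE=$(date +%F)
DUMPDIR=/data/backup/mysql
mysqldump --all-databases -u backupuser -pPASSWORD > $DUMPDIR/all-databases.$DATE.sql
gzip $DUMPDIR/all-databases.$DATE.sql
rsync -a $DUMPDIR/ user@remoteserver:/path/to/mysql-backups/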
10-22-2005, 09:55 AM | #4
Senior Member | Registered: Nov 2003 | Location: Orlando FL | Distribution: Debian | Posts: 1,765
trickykid, those are great and exactly what I am looking to do for my web/e-mail server.
I have not done any kind of coding since Apple BASIC back in 1980 on an old Apple ][+, so I am going to ask some very newbie-type questions here.
1. I understand that when you have #!/bin/bash it is a bash script.
2. That is basically the extent of my understanding.
So with that, how do I name the script so it is able to be run by cron?
Do I need to chmod a+x after I name it?
Does it have to be FILENAME.sh?
Does there have to be an extension?
With those scripts, I take it you have to create the directory you will be backing up to, or does your script make new ones every day/week?
Also a bit off topic, but can the same be done to and from a Mac running OS X?
10-22-2005, 10:56 AM | #5
LQ Guru | Registered: Jan 2001 | Posts: 24,149
So with that, how do I name the script so it is able to be run by cron?
You can name the script whatever you want. I place mine in /usr/local/bin; the full weekly is, surprisingly, called full_weekly, and the daily backup script is called incremental_daily.
Then my crontab looks like this to run them:
Code:
# Daily Incremental /HOME Backup
00 2 * * * /usr/local/bin/incremental_daily 1> /dev/null
#
# Full Weekly Backup Script
00 1 * * 0 /usr/local/bin/full_weekly 1> /dev/null
#
# Script to Rotate Backups.
00 2 * * 1 /usr/local/bin/clean_backups 1> /dev/null
I also included the cron entry that runs the script to clean out old backups.
Do I need to chmod a+x after I name it?
Yes, they need to be executable.
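For example (illustrative only, using the script names and location mentioned above):
Code:
chmod a+x /usr/local/bin/full_weekly /usr/local/bin/incremental_daily
crontab -e    # then add entries like the ones shown above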
Does it have to be FILENAME.sh?
Not sure what you mean by this. If you're referring to the variable in the script, you can change it to whatever you want: FILENAME=$HOST.$DATE can become WHATEVERYOUWANT=$HOST.$DATE, as long as you then change $FILENAME to $WHATEVERYOUWANT.
Does there have to be an extension?
If you're referring to the backup file's output name, it doesn't need any extension. If you're referring to the script name, no extension is necessary either.
With those scripts, I take it you have to create the directory you will be backing up to, or does your script make new ones every day/week?
The full weekly won't create the directory it writes into, since it just produces a tarred and gzipped file of whatever you specify. The daily incremental will create a directory named after the date for you, inside whatever directory you tell it to use.
Also a bit off topic, but can the same be done to and from a Mac running OS X?
If it's running bash and has rsync installed, I don't see a reason it couldn't.
10-22-2005, 12:08 PM | #6
Member | Registered: Jul 2005 | Location: West Coast South, USA | Distribution: debian 3.1 | Posts: 267 | Original Poster
Thanks for the recent posts and information...
But I'd really like to stay on point. There are TONS of usable backup scripts on the net. I am asking more specifically about using rsync in conjunction (or not) with cp -al.
The question really is: has rsync been updated to accomplish the same 'rolling' style of "snapshot" backups on its own, without using cp -al?
Thanks
10-22-2005, 02:51 PM | #7
LQ Guru | Registered: Jan 2001 | Posts: 24,149
Quote:
Originally posted by danimalz
Thanks for the recent posts and information...
But I'd really like to stay on point. There are TONS of usable backup scripts on the net. I am asking more specifically about using rsync in conjunction (or not) with cp -al.
The question really is: has rsync been updated to accomplish the same 'rolling' style of "snapshot" backups on its own, without using cp -al?
Thanks
I guess you didn't notice this line in my incremental daily backup script:
Code:
rsync $OPTS -a /home $BACKUPDIR/$TODAY
So yes, as far as I'm aware, rsync has been able to do that since the early days, ever since I've been using it.
10-23-2005, 10:38 PM | #8
Senior Member | Registered: Nov 2003 | Location: Orlando FL | Distribution: Debian | Posts: 1,765
trickykid, can you point me to a really basic place to learn about rsync that does not assume the user knows WTF they are talking about? I need a howto for dummies so I can learn.
10-24-2005, 12:29 AM | #9
LQ Guru | Registered: Jan 2001 | Posts: 24,149
Quote:
Originally posted by Lleb_KCir
trickykid, can you point me to a really basic place to learn about rsync that does not assume the user knows WTF they are talking about? I need a howto for dummies so I can learn.
I don't know of any good ones, as I just referred to the man pages when I learned to use it back in the day. These might be a start though:
http://everythinglinux.org/rsync/ - Quick getting started
http://rsync.samba.org/how-rsync-works.html - Overview of how it works.
But the man page has the best descriptions and examples:
http://rsync.samba.org/ftp/rsync/rsync.html
Any other questions that come up, feel free to ask.
10-24-2005, 01:51 AM | #10
Member | Registered: Jul 2005 | Location: West Coast South, USA | Distribution: debian 3.1 | Posts: 267 | Original Poster
Quote:
Originally posted by trickykid
I guess you didn't notice this line in my incremental daily backup script:
Code:
rsync $OPTS -a /home $BACKUPDIR/$TODAY
So yes, as far as I'm aware, rsync has been able to do that since the early days, ever since I've been using it.
Okay, I've looked more closely, but none of what you are doing appears to be what I am after. Tricky - you are way more experienced with unix than I am, for sure, absolutely, so please bear with me here. I'll share some links with you if you want; but before I do that, I want to see if I can get across what I mean, or have you show me how your scripts already accomplish it.
Okay - here goes.
Let's suppose you've got a LOT of data that you want backed up - say 100GB of music (bad example, I know - mine is on DVD, but let's pretend). You'd want to avoid backing up the whole damn thing just to ensure you've got the two most recent Eminem tracks, and that you don't keep on storing the five-album Nsynch set that you just deleted - right?
Under your weekly script, the one using rsync, you do accomplish a very basic incremental, in that "rsync -a" would only back up the files that are new or changed, or (with --delete) drop the ones that have been deleted. Pretty cool just at that. But let's now assume that you want to keep 15 weeks of data, only because your employer might want to go back and get that Nsynch stuff that he told you to delete but now wants back 'right away'. It looks to me that under your scheme that old stuff would be long gone 14 weeks ago. You could rotate the backups over 15 weeks, but you'd be buying EMC (here's where I beg your patience...).
The scheme I'm looking at (and trying to grasp) would provide this 15-week capability but would only require an insignificant amount of additional storage - an amount pretty much equal to the adds, deletes and changes over time.
The heart of the scheme is to rotate the backups and use 'cp -al' before running rsync. 'cp -al' creates a hard-link 'snapshot' of your previous backup - because it only re-links the files, it is very, very fast and takes hardly any space at all. Then rsync is run against the hard links and does what it normally does, but you never have to re-copy the entire data set. Over time, your "backup.current" directory comes to consist of a few new and changed files plus a large mix of hard links to files actually 'residing' in and amongst the other 14 directories. But to users, "backup.current" is a complete backup of last week's data, and "backup.week13" is a complete backup of the files from 13 weeks ago - all available in real time, to be grabbed with a simple copy command. The entire library of rotated backups ends up orders of magnitude smaller than 15 full sets of backups.
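For what it's worth, here's a bare-bones sketch of that kind of weekly rotation (not from any post in this thread; the directory names are made up, and it assumes all snapshots live under one directory). Note also that newer rsync releases have a --link-dest option that can create the hard-linked snapshot itself, which may be the "rsync can do it on its own now" improvement mentioned earlier:
Code:
#!/bin/bash
# Rotating weekly snapshots: backup.week14 (oldest) ... backup.week1, backup.current (newest)
SNAPDIR=/data/snapshots
SRC=/home

# Drop the oldest snapshot and shift the rest back one week
rm -rf $SNAPDIR/backup.week14
for i in `seq 13 -1 1`; do
    [ -d $SNAPDIR/backup.week$i ] && mv $SNAPDIR/backup.week$i $SNAPDIR/backup.week$((i+1))
done
[ -d $SNAPDIR/backup.current ] && mv $SNAPDIR/backup.current $SNAPDIR/backup.week1

# Hard-link last week's snapshot into place (cheap), then let rsync update it
[ -d $SNAPDIR/backup.week1 ] && cp -al $SNAPDIR/backup.week1 $SNAPDIR/backup.current
mkdir -p $SNAPDIR/backup.current
rsync -a --delete $SRC/ $SNAPDIR/backup.current/

# With a newer rsync, the cp -al step can instead be folded into rsync itself:
#   rsync -a --delete --link-dest=$SNAPDIR/backup.week1 $SRC/ $SNAPDIR/backup.current/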
I hope this makes some sense, and here's a link, because I'm sure I've confused you somewhat less than I'm confused myself - hard links are new to me and difficult...
Danimal
http://www.mikerubel.org/computers/rsync_snapshots/
10-24-2005, 02:50 AM | #11
LQ Guru | Registered: Jan 2001 | Posts: 24,149
Okay, my setup is a little different in the way I have it arranged to do backups.
Forget my full weekly, as it actually re-copies all of the files, not just the ones that have changed. It was designed to be a full backup of everything each week, with no deletion of any files whatsoever, and it simply gzips the contents. The way it's set up now, I'll have 8 full weeklies on hand at all times. My incremental works a little more the way you want, I think.
The basic:
rsync -av
would basically only copy new or changed files to the destination. I think you have that part down.
What you're really after is the cp -al before you rsync, like I have in my incremental backup. You'd just want to set up your backups using cp -al and then rsync them to your destination. Then probably the easiest way to keep the data for 15 weeks would be to set up a cron job that removes old files on the remote destination server. You could do the same on your source as well, unless you're just backing up directly from source to destination.
I have a dedicated drive on the source machine, and I don't make rsync copies of my incrementals. The only reason I have them is that if I really screw up some file, I can revert and pull it from a day or so back. If something happened to my incrementals, I'd have to fall back on my weeklies to pull the file, and hope I hadn't changed too much of its content; that's the only downside to my current setup. Like I said, I wrote mine in a few minutes after a really bad drive failure where I had zero backups of anything on the drive. I had months of website updates, all gone. So at least what I have now would at most set me back a week.
I think doing this with my incremental would probably work for you, if I'm understanding you correctly:
Code:
#!/bin/bash
PATH=/bin:/usr/bin
BACKUPDIR=/data/backup/daily/myhostname/
cd $BACKUPDIR
OPTS=" --delete --exclude cache --exclude Cache"
TODAY=`date +%d%b%y`
# The most recently modified directory in here is yesterday's backup
YESTERDAYDIR=`/bin/ls -lrt | grep ^d | tail -1 | awk '{print $NF}'`
#echo $YESTERDAYDIR
if [ "$YESTERDAYDIR" != "$TODAY" ]
then
    # Hard-link yesterday's tree to today's directory (fast, uses almost no space)
    cp -al "$YESTERDAYDIR" "$TODAY"
else
    echo Retrying Unfinished Backup
fi
rsync $OPTS -a /home $BACKUPDIR/$TODAY
# Make $TODAY the most recently modified directory
# so it will be easily found tomorrow
touch $BACKUPDIR/$TODAY
# Now rsync the contents to a remote server
# Set up SSH keys (and add the -e ssh option) if you don't want to be asked for a password
rsync $OPTS -a $BACKUPDIR/$TODAY user@remoteserver:/path/to/directory/
So basically, what this script does is first check that yesterday's directory name does not equal today's (the names are just dates). If they differ, it does the cp -al of the previous day's backup into a new directory named for today; if they are the same, it assumes an earlier run didn't finish and goes straight on to the rsync to complete that unfinished backup. The rsync then copies over anything that has changed. If the script is running for the very first time, it will do a full copy. From the next day onward it does the same thing, but because cp -al creates hard links, only the files that actually changed get copied rather than the whole data set. Finally, it touches the new $TODAY directory so it is definitely the most recently modified one tomorrow, just an in-case measure.
Then the last rsync copies the backup to the remote destination. What you'll probably want to do on the remote destination is have a cron job run the same cp -al step there, linking the newest backup directory to a new $TODAY directory just like the local script does. That way, when your local script runs the next day and rsyncs to the destination, it only transfers the newer files, since it sees that the unchanged ones are already there, and you don't use any unnecessary space.
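A rough sketch of what that remote-side cron job could look like (illustrative only; it assumes the same date format and the destination path used in the script above):
Code:
#!/bin/bash
# Runs on the remote server shortly before the local machine pushes its backup.
# Hard-links the newest backup directory to one named for today, so the incoming
# rsync only has to transfer the files that actually changed.
BACKUPDIR=/path/to/directory
cd $BACKUPDIR
TODAY=`date +%d%b%y`
LATEST=`/bin/ls -lrt | grep ^d | tail -1 | awk '{print $NF}'`
if [ "$LATEST" != "$TODAY" ]
then
    cp -al "$LATEST" "$TODAY"
fi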
Then, to keep everything around for 15 weeks, set up a cron job that runs a script looking something like this:
Code:
#!/bin/bash
# Removes daily incrementals older than 15 weeks (105 days)
find /path/to/backups -maxdepth 2 -type d -mtime +105 -exec rm -rf {} \;
It might be wise to add a touch command to this one as well, so it never deletes your actual /path/to/backups directory (and note the -type d, which matches directories only):
Code:
touch /path/to/backups
So I think this is something like what you're looking for, or I might have just confused myself even more.
10-24-2005, 03:53 AM | #12
Member | Registered: Jul 2005 | Location: West Coast South, USA | Distribution: debian 3.1 | Posts: 267 | Original Poster
Sheesh.
Well, I need to digest this and test some stuff.
Really, it's the hard links that are troubling me. I understand that a hard link is really just another reference to a file: a file exists if and only if there is at least one hard link to it. You do an ls -i and see the same inode; there seems to be no difference, and there can be many hard links to one file. Coming from a Windows world this is unsettling, yet cool in a usability way. Here's a troubling thing:
If I have a directory /test with the files file1, file2 and file3, each file 1 MB, and I copy /test to /test2 as hard links:
cp -al test/ test2/
then ls * shows both directories containing all three files at 1 MB each:
test: file1 1 MB, file2 1 MB, file3 1 MB
test2: file1 1 MB, file2 1 MB, file3 1 MB
But then if I run du, I get /test saying 0 MB and /test2 saying 3 MB, when intuitively it should be the opposite.
Maybe it's this: since the OS has stored the data in one place, it doesn't care who is pointing to it. When du runs, it reports the first instance it finds as a link plus the file size, then when it sees the second instance it reports just the link and not the size, since it has already counted it...?
The structure is what I'm failing to get, I guess. Are there some commands I can use to better see what's going on? Is there some way to tell what the original, unique (hard) link was?
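A small illustration of the commands that help here (a generic sketch, not from the thread; the file names are made up). ls -li and stat show the inode number and the link count, find -samefile lists every name pointing at the same inode, and du only counts each inode once per run. There is no "original" link: all hard links to an inode are equal.
Code:
mkdir test && cd test
dd if=/dev/zero of=file1 bs=1M count=1   # create a 1 MB file
ln file1 also-file1                      # second hard link to the same inode
ls -li                                   # first column: inode; link count is now 2
stat file1                               # shows "Links: 2" and the inode number
find . -samefile file1                   # every path that refers to that inode
du -sh .                                 # the 1 MB is counted only once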
The more I get into the woods with Linux, the more I seem not to know. I'd like to take a class, yet I saw the study materials from a friend who took one (an advanced class) and was appalled at the crap they were teaching. What's the best way to go...?
This is all fun and good, but I'm tired now...
10-24-2005, 12:01 PM | #13
Senior Member | Registered: Nov 2003 | Location: Orlando FL | Distribution: Debian | Posts: 1,765
Thank you, I'll start reading, and as I come up with more questions I'll be sure to post them.