Is my backup strategy overkill?
My main computer is my laptop. I perform all backups with a script I wrote around rsync.
Also in my house are a desktop, an external disk, and two USB flash drives. On all four devices I keep a one-to-one copy of the files I want backed up, in a folder called "today". I don't have any particular schedule; I just run the script as I see fit.

On the desktop and external disk I also keep folders called yesterday, week, month, six_month, and year. When the year folder is over a year old, it updates itself from six_month. When six_month is over six months old, it updates from month, and so on down the chain.

Also, on all four devices there is a folder called TRASH. When running a backup to the today folder, if a file has been changed or deleted, the old version gets the date appended to its name (filename.ext__YYYYMMDDHHMMSS) and is moved to the TRASH folder. The TRASH folder is NOT used during the rotations. Code:
/disk/user/today

However, it occurred to me that maybe having the TRASH folder AND the five snapshots isn't necessary. If there is a corrupted file on my laptop, will rsync view it as a change, thus storing the good version in the TRASH folder? If I had to pick between the trash system and the rotation system, I would choose the trash system... I'm aware that the major hole in my strategy right now is that all of these devices are located in the same building, but I'm going to get a remote setup running soon. Basically it would just be one more device with the same type of directory structure. Here is the script. It's pretty straightforward: all of the setup is at the top, the actual program is at the bottom. Sorry if it's a little messy, but that's because of all the output and colors. Code:
#!/bin/bash
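In case the rotation description is hard to follow, here's a stripped-down sketch of the idea (NOT my actual script; the temp-dir paths, the `.stamp` marker trick, and the day counts here are just for illustration):

```shell
#!/bin/sh
# Stripped-down sketch of the snapshot rotation (illustrative only).
# Each tier refreshes from the next-newer tier once it is older than its
# period; a ".stamp" marker file records when it was last refreshed.

base=$(mktemp -d)                       # demo root instead of /disk/user
mkdir -p "$base/today" "$base/yesterday" "$base/week"
echo "important data" > "$base/today/file.txt"

rotate() {  # usage: rotate <older> <newer> <max_age_days>
    older=$base/$1 newer=$base/$2 days=$3
    # Refresh if the tier has never been stamped, or the stamp is too old.
    if [ ! -e "$older/.stamp" ] ||
       [ -n "$(find "$older/.stamp" -mtime +"$days")" ]; then
        rsync -a --delete "$newer/" "$older/"
        touch "$older/.stamp"
    fi
}

# Oldest tier first, so each one pulls from the next-newer tier.
rotate week      yesterday 7
rotate yesterday today     1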
Depends on what you're backing up, really.
If you're a coder or developer then it would make sense. If it's pictures of your cat and the odd letter to your gran, then yes, it's overkill =)
FWIW, I realized a while ago that I can divide my personal files into three categories: those where I care about previous versions (because they are my own work), those where I don't (multimedia, mostly), and settings files. All my email is on a reliable IMAP service (Fastmail).
For the first type of file I currently use Subversion with the repositories on an off-site server, and keep a second set of working copies there as well. I'll be migrating those to Git shortly, which has strong integrity checks and doesn't require you to maintain a central repository. For the second type I just have rsync scripts so that I can keep copies on both my laptop and an external hard drive, to prevent a single drive failure from causing any losses. I think that it pays to be a little paranoid. The hard drive on my laptop started to act oddly, and failed totally two boots later. Since I knew that I hadn't actually lost any data it wasn't a big deal.
Ahhh, I forgot to say: I do have three categories. If you look at the backup definitions (the array called "backups"), there are numbers associated with each folder: 0, 1, and 2. 0 is for my Mozilla and Pidgin profiles; I only want a one-to-one copy of each and I don't care about deletions. 1 is for a one-to-one copy in the today folder, but deletions go to the TRASH folder. 2 is for large stuff like music and videos. Chances are I don't want to store EVERYTHING on my laptop, so files in category 2 aren't deleted on the backup location if they're removed from my laptop.
Also, the large stuff like music and videos is only backed up to the "today" folder; it isn't included in the snapshot rotations. You see, the rot variable holds the rsync options used for the rotations, and large stuff that can be replaced is excluded. Code:
rot="--delete-during --exclude=/Music/* --exclude=/Videos/* --exclude=/Downloads/* --exclude=/dat"

Code:
meth=(
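To show what those rot exclusions do in a rotation-style call, here's a tiny demo in temp directories (the source tree and file names are made up; this is not the real meth array):

```shell
#!/bin/sh
# Tiny demo of the "rot" exclusions in a rotation-style rsync call
# (temp dirs and file names invented for illustration).

rot="--delete-during --exclude=/Music/* --exclude=/Videos/* --exclude=/Downloads/* --exclude=/dat"

src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/Documents" "$src/Music"
echo notes > "$src/Documents/notes.txt"
echo song  > "$src/Music/song.mp3"

# Word-splitting on $rot is intentional: each option becomes its own
# argument. set -f stops the shell from globbing the * in the patterns.
set -f
rsync -a $rot "$src/" "$dst/"
set +f
```

After this, Documents/notes.txt is in the snapshot but the contents of Music never reach it (the exclude patterns are anchored at the transfer root, so /Music/* matches everything directly under Music).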
Whether or not it's overkill depends on how important the files are to you. If they are very important, the farther back you can reach to find a file which isn't corrupt, the better for you.
Which brings up a question I have: how do you verify the files are not corrupt when you make the backup, or when you try to restore from a previous backup? Do you just assume the files in the backups are not corrupt? Do you do any kind of checksum comparison between a backup and the files on disk to verify they are good? Files change over time. Some change because of edits, such as documents. Some change because of system updates. Some change because of file corruption. How do you protect yourself from unwanted changes (corruption)? It wouldn't do much good to have multiple backups if file corruption is carried forward through each rotation.
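One cheap way to answer that for yourself is to keep a checksum manifest next to each snapshot and verify it before trusting or rotating it. A sketch (the paths and file names here are invented, not part of anyone's actual script):

```shell
#!/bin/sh
# Sketch: detect silent corruption with a sha256 manifest
# (invented temp paths; illustrative only).

snap=$(mktemp -d)
echo "hello" > "$snap/doc.txt"

# After a backup, record a checksum for every file in the snapshot.
( cd "$snap" && find . -type f ! -name MANIFEST -exec sha256sum {} + > MANIFEST )

# Later, verify before trusting or rotating the snapshot.
( cd "$snap" && sha256sum -c --quiet MANIFEST >/dev/null 2>&1 ) \
    && before=OK || before=CORRUPT

echo "bit rot" > "$snap/doc.txt"       # simulate silent corruption

( cd "$snap" && sha256sum -c --quiet MANIFEST >/dev/null 2>&1 ) \
    && after=OK || after=CORRUPT

echo "before: $before, after: $after"
```

The second check fails even though nothing in the file's name or location changed, which is exactly the case size/time comparisons miss.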
Quote:
Three things got me off the fence:
- The Ruby community seems to have picked Git, and they are moving in numbers.
- Watching the video of Linus' talk at Google about Git. He is very articulate and absolutely passionate about the issues, particularly data integrity.
- I tried it and was amazed: the interface of the current version is fine, the docs are good (!), and it is *very* fast. Like apt-get, it can be so quick that you don't quite believe that it did what you asked.
Anyone who can talk for 70 minutes on stage about their software deserves to have it tried out.
No strategy for backing up data is overkill. Most people I talk to don't back up at all. They don't even save their Word documents until they've finished them.
Quote:
My laptop and two USB drives have only the most up-to-date copy of my data, so that's 3. The desktop and external RAID device each have a copy of the most up-to-date data as well as 5 old snapshots. That's 15 copies of my data over 4 devices, 21 over 5 if you count each disk in the RAID enclosure as a separate device. Then, in addition to that, each device has a trash folder: when backups are made to the most up-to-date folder, deletions are placed in the trash with a timestamp appended to the file name.

My real question was whether people thought that this trash system was redundant when combined with the rotation system. My backups are done by comparing file sizes and times. Error checking (fsck) is run on my laptop once every 27 boots (the Ubuntu default). I'm not sure about the desktop, and it's really never run on either USB key. But since I'm assuming that corruption will not affect file sizes or times, corruption on my laptop will not be pushed to the backups. Therefore the fact that I'm not backing up with a checksum comparison is actually providing a layer of protection. Is this a fair assumption?
Quote:
I still don't think it's overkill. I was always taught that you can't have too many copies of a backup. That becomes even more true if the data is valuable and/or irreplaceable. As you mention in your original post, keeping the older versions of files in TRASH while the others get updated gives you previous versions to fall back on if necessary. So, no, I don't think it's redundant. Although at some point you may want to put an age limit on the files in TRASH. But that's entirely up to you.
Quote:
Code:
find $trash/* -amin +$trash_ret -cmin +$trash_ret -type f -delete -print