Linux - General
This Linux forum is for general Linux questions and discussion. If it is Linux related and doesn't seem to fit in any other forum, then this is the place.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
I'm not rich, but my data and my time are important to me. I'm a home user with a bit of disposable income, but I'm certainly not an enterprise business user with megabucks to spend. Even so, I'm willing to spend a significant chunk of my home computing budget - whatever it takes - on this.
I have had hard drive failures before, and I've lost data permanently - I don't ever want to lose data again. I want to have the ability to recover from any data loss, but most especially from hard drive failures. All the means of backing up data that I know of are either fragile, slow, and unreliable (tape), or inadequate and slow for large amounts of data (DVD), or are slow and require too much dedicated down time (dd command to copy whole offline drives).
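The "dd command to copy whole offline drives" mentioned above looks something like the sketch below. The device names are hypothetical, and the echo makes it a dry run - both drives must be unmounted for the copy to be safe, which is exactly the dedicated downtime complained about here.

```shell
# Dry-run sketch of a whole-drive offline copy with dd.
# /dev/sdX (source) and /dev/sdY (target) are hypothetical device names;
# both must be unmounted, hence the downtime.
RUN=echo   # drop the echo to actually run it
clone_drive() {
    # conv=noerror,sync keeps going past unreadable sectors
    $RUN dd if="$1" of="$2" bs=64K conv=noerror,sync
}
clone_drive /dev/sdX /dev/sdY
```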
Discounting my laptops, which I don't require backups for, I have three desktop systems with about 1TB of total combined disk right now, with plans to expand by about 1TB more in the near future as I add a family home web server host that I will maintain for my rather large extended family.
My desktops run Linux, FreeBSD, and Windows XP. I have a Win2K and an OS/2 Warp 3 system I bring up occasionally, but I'm not too worried about backing up those partitions. Ideally, I'd like an OS and hardware architecture-independent solution, something that handles any kind of data from any type of partition from any OS.
I don't really know how to do this, but I'm hoping some of you can offer me some advice.
Is RAID 5 the answer? Would I have to implement it separately on each system, or is there a network backup solution that would let me put several terabytes of backup disk on a big honker data backup system running RAID 5?
Or is there another simpler solution to ensure recoverability of 2TB of data spread among three desktop systems? I don't want to spend all my spare time running backup programs and keeping track of media, and I can't afford to hire someone to do this for me. Automation and unattended backup is my goal.
I'm sick of losing data, and I have too much to back up by conventional home-user means. Can you help me?
My situation was that a 160GB unmirrored and unbacked up IDE drive housing my Linux OS and a lot of important data failed, and I lost it all. Yes, yes, shame on me. Spilt milk, and I'm not going to cry anymore. Over the last couple of weeks I was able to recover bits and pieces only, mostly stuff I didn't care about, using dd and dd_rescue. The disk died a rather hard and horrible death, I'm afraid.
I'm not going to let this happen to me again.
Here is my plan, thus far, for safeguarding my data from loss due to drive failure or other localized catastrophe. Call it phase 1 of my disaster recovery plan. I don't have any provision yet in my plan for safe offsite storage of backed up data, but I'll address that in phase 2.
My end state for phase 1 is to have each of my systems running fully on RAID 1 mirrored local disks, and also have all important data directories 100% backed up on a RAID 5 NAS system. If I can afford to, I'll backup the operating systems to the RAID 5 NAS also, as it is inconvenient to rebuild a system from scratch. But data is more important, and that will be the first priority.
I have a Linux system (my main system, Slackware 10.2), two WinXP desktops, two WinXP laptops (to be converted to Linux when I get brave enough), and a new FreeBSD system I'm just starting to play with.
I'll start with Linux first, then I'll attack the others. Since RAID support for SATA drives is not well standardized in general, Linux drivers are not available for most of the various SATA RAID chipsets, and support on Linux installation CDs for installing onto SATA RAID 1 is nonexistent, I've decided on this approach:
Install Slackware 10.2 with the test26.s kernel to an unmirrored IDE drive, then compile and upgrade to the latest 2.6 kernel to get SATA support. Add EVMS, add two 300GB SATA drives, and set up RAID 1 mirroring on them using EVMS (a software RAID solution; my research leads me to believe it is likely easier to implement under Linux than a driver for a native SATA RAID chipset). Once I get RAID 1 mirroring working on the pair of SATA drives, I'll migrate the system off of the IDE drive and onto the mirrored pair. The IDE drive can then be used for /tmp and similar transient stuff.
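For comparison, the RAID 1 step can also be done with mdadm, the standard Linux software RAID tool, instead of EVMS. This is a dry-run sketch, not the poster's actual procedure; the partition names /dev/sda1 and /dev/sdb1 on the two 300GB SATA drives are hypothetical.

```shell
# Dry-run sketch of software RAID 1 with mdadm (an alternative to EVMS).
# /dev/sda1 and /dev/sdb1 are hypothetical partitions on the SATA pair.
RUN=echo   # drop the echo to actually run it
setup_raid1() {
    # Build a two-disk mirror as /dev/md0
    $RUN mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    # The filesystem goes on the md device, not the raw partitions
    $RUN mkfs.ext3 /dev/md0
    # Print the array line to record in mdadm.conf so it assembles at boot
    $RUN mdadm --detail --scan
}
setup_raid1
```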
At this point, I'll be running fully on RAID 1 mirrored disks, with only volatile stuff on unmirrored IDE disk. Next step is to connect a NAS unit that supports RAID 5. I've chosen the Infrant ReadyNAS NV 1.0TB system, which gives about 750GB of usable RAID 5 storage across four 250GB SATA drives. I'll then back up only important data to the NAS unit. With the RAID 1 mirrored local disks, I don't think another layer of redundant backup for the OS itself is essential.
I expect it'll take me a while to get all this working. I don't have any experience with RAID or SATA or NAS, but I'm going to figure all this out and get it going. Once I get my Linux system squared away, I'll proceed down a similar path on my WinXP systems, and the FreeBSD system if I ever really start using it. Who knows, maybe I'll even use the NAS to backup my laptops.
How does this sound? I'm sure others have been down this path before, especially enterprise system maintainers. Does anyone have any helpful ideas, criticisms, cautions, encouragement for me? Think this is overkill for a home LAN? Well, how important is your data, and how diligent are you in backing it up?
Yes, I had document files, JPGs, MPGs, AVI files from captured family 8mm films (fortunately I have the original Kodachrome films), HTML documents for a personal web site I was in the process of building, C and Java source code, years of tax returns, scanned receipts, address books and check book registers, years of email archives ... you name it. All the kinds of things people tend to collect over the years that they might not maintain paper copies of.
I had some of it backed up separately, like the income tax information. I made a couple of DVD backups of my digital photo and scanned film libraries about six months ago, so I actually didn't lose much of that. I used dd_rescue to copy the bad drive to another drive, then used reiserfsck --rebuild-tree on the copy, and almost everything wound up in numbered file names under the lost+found directory. Thousands of files and directories, but it's really hard to tell what it all is.
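The recovery procedure described here can be sketched as below. It's a dry run with hypothetical device names; the key point is that reiserfsck --rebuild-tree is run on the copy, never on the failing original.

```shell
# Dry-run sketch of the dd_rescue + reiserfsck recovery described above.
# /dev/sdX (failing drive) and /dev/sdY (good drive) are hypothetical.
RUN=echo   # drop the echo to actually run it
recover_disk() {
    # Copy whatever is still readable, skipping bad sectors
    $RUN dd_rescue "$1" "$2"
    # Rebuild the ReiserFS tree on the COPY, never on the original;
    # orphaned files end up under lost+found with numbered names
    $RUN reiserfsck --rebuild-tree "$2"
}
recover_disk /dev/sdX /dev/sdY
```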
Look at what *needs* to be backed up.
For me, just "user data" - all the O/S's come on CD/DVD/download.
Separate things on that basis - keep all your data on partitions separate from your O/S(s). Non-changing data (e.g. JPEGs) doesn't need continual backing up; every 6 months or so, maybe.
Decide on a cycle for full backup (say once a month or so), and on other days just do differential backups.
For big backups maybe get a network drive.
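The full/differential cycle suggested above can be sketched with tar and find. The DATA and DEST paths are hypothetical, and the cron wiring (full once a month, differential on other days) is left out.

```shell
# Sketch of a monthly-full / daily-differential cycle, assuming
# hypothetical $DATA (what to save) and $DEST (where backups go).
full_backup() {
    tar -czf "$DEST/full-$(date +%Y%m%d).tar.gz" -C "$DATA" .
    touch "$DEST/last-full.stamp"   # reference point for differentials
}
diff_backup() {
    # Archive only files changed since the last full backup
    find "$DATA" -type f -newer "$DEST/last-full.stamp" -print0 |
        tar -czf "$DEST/diff-$(date +%Y%m%d).tar.gz" --null -T -
}
```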
Maybe RAID 1 could be a solution, so you always have a copy on the other disk.
This is not true. RAID is not a backup solution. RAID might save your butt for a disk failure, but it won't help you one bit for a human failure. Just try running "rm -rf *" on a RAID system and then try to recover your data "from the other disk". Won't happen.
I backup things to a second hard drive that contains nothing but backups for several computers on the LAN (three computers - one W2k, one WinXP, and one Debian). The Debian box is what does the backups. The backup drive is physically on the Debian box and is not normally mounted, except when actually running a backup (mounted RW, for root only) or restoring data (mounted RO). The Debian box has an APC UPS with automated shutdown software. Additionally, the Windows boxes cross-backup to each other (not involving Debian), although I really don't consider this a big part of my backup strategy. But it might save me partially if my Debian box were to catch on fire, nuke both its hard drives, and spread the flames to my backup DVDs stored in the same room!
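The "mounted only while backing up" pattern described here can be sketched as a dry run below. The device name, mount point, and paths are hypothetical.

```shell
# Dry-run sketch: the backup drive stays unmounted at rest and is
# mounted RW only for the duration of a backup run.
# /dev/hdb1 and /mnt/backup are hypothetical.
RUN=echo   # drop the echo to actually run it
nightly_backup() {
    $RUN mount -o rw /dev/hdb1 /mnt/backup     # RW only while backing up
    $RUN rsync -a /home/ /mnt/backup/home/
    $RUN umount /mnt/backup                    # back to unmounted at rest
}
nightly_backup
```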
Backups of "system stuff" are tarred and gzipped; backups of "user stuff" are rsynced, and multiple snapshot copies are kept, allowing me to go back in time and restore a previous version of a file that has been modified. Additionally, Debian's root and /boot filesystems are dd'ed, along with the MBRs of all computers. Then backups are burned to DVDs manually (whenever I get around to it - this is my weak point!) Things that are really irreplaceable (like family picture JPGs) are burned to CDs (I trust these more than DVDs - better error correction) and stored in a bank safe deposit box. My plan is to re-burn these precious photo CDs every few years to help guard against CD degradation (keeping the older burns as well), but I haven't actually done this yet ... oops!
I keep multiple LiveCD distros laying around for emergencies (Knoppix, Kanotix, and Slax). I have booted each of my computers using these LiveCDs to prove to myself that they are compatible, and I have practiced accessing all partitions on all computers (some EXT3, some XFS, LVM2 in places, NTFS, FAT32, etc.) I've also verified LiveCD access to my network (and the Internet) and USB flashdrives from each computer.
One point to remember: you are more likely to require file recovery because of some mistake a human makes than because of a failed disk. RAID certainly has its uses, but it is not a backup solution.
P.S. - your backup plan sounds quite ambitious and sophisticated. Don't forget to put your electronics on UPS power backup, and test your setup for clean shutdowns on power failure. Make sure you cover all the "common failure points". A lightning strike could fry all the electronics in your house. A house fire would be devastating. You typically cover this common failure point with off-site storage of backups. Depending on the sensitivity of what you're backing up, consider encrypting it. I use TrueCrypt, and the executables and source code are included on each backup DVD (when I encrypt). I chose TrueCrypt because (1) it works on Linux, (2) it works on Windows, and (3) it's open source and free, so I have the source code.

Oh, and one more thing ... just because you manage to back things up and burn a tar.gz file to a DVD, don't go patting yourself on the back quite yet. Sample those DVDs on occasion and make sure they are readable on a variety of different systems with different brand DVD readers. And do some test untarring and ungzipping to make sure things are as you expect.
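The "sample your backups" advice above can be partially automated. This sketch checks a tar.gz two ways before you trust the media: the gzip CRC, and whether the tar listing reads cleanly.

```shell
# Sketch of a backup verification pass: check gzip integrity, then
# make sure the tar contents can actually be listed.
verify_backup() {
    f="$1"
    gzip -t "$f" || { echo "BAD: $f"; return 1; }          # gzip CRC check
    tar -tzf "$f" >/dev/null || { echo "BAD: $f"; return 1; }  # listing must succeed
    echo "OK: $f"
}
```

This still isn't a full test restore - periodically untarring into a scratch directory and spot-checking files, as suggested above, is the real proof.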
A very good post, haertig. Thanks. You mentioned several things I hadn't thought of, like actually testing the backups on different hardware to make sure the media is readable and can be ungzipped or untarred.
I have two UPS's that my desktop systems are connected to. I'll be sure to plug in the NAS device to the UPS as well, AND test for a clean automated shutdown of all systems on power failure.
I just downloaded TrueCrypt and set up an encrypted volume for sensitive data on my WinXP system, as it seems like an excellent idea for added identity-theft protection. I was thinking it would be complicated, but it was really quite easy to set up. I now plan to do this on all systems from now on.
And I think you identified the weak spot in most people's backup and recovery strategies: "whenever I get around to it - this is my weak point!" and "but I haven't actually done this yet ... oops!" The human factor has got to be the hardest part of any backup/recovery strategy.
Here are some backup tools (more or less complicated/featureful/automated):
*faubackup (I use this one for my desktop computer; very, very simple, no dependencies, has to be integrated into cron)
*bacula (integrates a verification/recovery client; can store its catalog in PostgreSQL and other databases)
*dvbackup (backs up to a DV camcorder linked over FireWire)
Thanks, syg00 and nx5000. Bacula is impressive, sophisticated, and it seems quite comprehensive. The documentation is among the best I've ever seen for open source software. This might be an ideal backup solution if you have a high-capacity enterprise-quality autoloading tape drive to back up to.
Here is another interesting piece of software - a FreeBSD-based Network-Attached Storage server called FreeNAS. It could be used to build your own NAS system similar to the Infrant ReadyNAS or the Buffalo TeraStation NAS.