backing up? or versioning the file server 's data? a thought...

nass · 04-23-2009, 12:27 PM

hello everyone,
whats puzzling me is whether it is best to set up a backup utility in my file server ... or to set up a versioning utility instead, like git or svn...

the reason i am confused is because i think that a backup will blindly replicate your file operations to a storage unit.. as such if you delete a file today, and backup erases it from the storage as well, then you won't be able to retrieve your file again. if at a later date you realise you have accidentally deleted smth you wanted)...

so to this respect a versioning system seems more powerful...

on the other hand if it is not source code files that you want to version mainly, but you library of pics, docs, mp3s and movies and generally binary files ... then versioning seems like an overkill...

what are your thoughts...?

irishbitte · 04-23-2009, 02:01 PM

rsync is the best of both. You can configure it to sync one way only, so even if you delete a file on your computer, the backup will still have it.

archangel_617b · 04-23-2009, 03:14 PM

There's utilities like rsnapshot which basically manage that for your. rsnapshot is an rsync front-end, basically a script that handles creating incremental backups of your file system(s). There's rdiff as well, but I haven't used it. Or Amanda if you want a more heavy-weight solution.

- Arch

nass · 04-24-2009, 05:02 AM

curious,
if rsnapshot takes a snapshot of the data, then the very 1st snapshot will be an exact copy of the data , and as such, exactly the same size?
so you'll need double the HD size?

or is it that the backup makes a link to the data? so it doesn't take up this lot of extra space....?

choogendyk · 04-24-2009, 06:51 AM

If it just made a link, it certainly wouldn't be any kind of backup. And, if you put it on the same disk ("double the HD size"), then you have no protection at all against disk failure.

True backups (for the realistically paranoid) are going to have multiple copies over time on separate media with some stored in separate locations. The corporate tradition would be weekly full backups with nightly incremental backups all to separate tapes, with tapes running back a month, 6 weeks, or longer. Home users on a budget might typically get an external drive that is larger than their system drive. That's better than nothing, but still carries a lot of risk, since all the backups are on one disk.

You can also pay someone else to store your backups through the use of cloud backup -- http://www.readwriteweb.com/archives...e_services.php. There is some risk to that too. Apparently, both Yahoo and HP are shutting down their services -- http://www.backupcentral.com/index.p...=228&Itemid=47. But if you choose the bigger, more natural players in this field (e.g. Amazon S3), that would seem less likely. The open source backup application Amanda is one of the few that has built in support for backing up to Amazon S3. Using this method, you wouldn't want to be backing up, say, 300G. The bandwidth is much smaller, and you need to plan carefully.

nass · 04-24-2009, 10:31 AM

Quote:

If it just made a link, it certainly wouldn't be any kind of backup. And, if you put it on the same disk ("double the HD size"), then you have no protection at all against disk failure.

well thats true. but going one step further (really talking about the possible (?) technology involved) as soon as you take a snapshot , links get created... then as soon as a file gets modified or deleted (by any means) the link is destroyed the actual file (prior to modification/deletion) takes its place... then this sort of ghost backup would not require the 2x Size of the data-to-be-backed-up...

of course that would require much processing power being used so that whenever a deamon senses that a file is about to be
created/modified/deleted
it must take action prior to the operation and (respectively)
create a link in the backup folder /
break the link and save the last version of a file prior to modification in the backup folder/
break a link (if it exists) and save the last version of a file prior to its deletion in the back up...

as for the inevitable problem of a disk in failure, i guess a raid system would solve that...? perhaps storing the files in a raid and then a secondary raid on different physical disks this sort of ghost backup?

does this sound like a feasible solution and if it does... does it exist?

such a solution - if feasible - would not benefit big server farms or big companies in doing their backups... since eventually the data size of the backup would expand... and an initial investment would take that into account... thus having the required disks size for a normal backup from the start.
but for small office and home users, a backup solution where you wouldn't need to get double the targeted disk size... would perhaps look very appealing at least to your pocket..

what do you think?

archangel_617b · 04-24-2009, 12:33 PM

Quote:

Originally Posted by nass

what do you think?

Sounds like you're all over the place.

The first rule of Computer Club is: Backup your data.
The second rule of Computer Club is: Backup your bloody data!

If a drive fails or a user deletes the data, or a virus eats your system, or a sinkhole opens up and swallows your building, you need a backup. In every one of these cases.

RAID? Why worry? You don't even have a backup yet. The only thing RAID protects you against is a bad drive. What about viruses, user error, exploding power supplies, or any of the other million things that can go wrong with your system?

So yes, for a backup, you need to buy another hard drive. And no excuses about prohibitive costs for home users. Drives are about $50 each. If you've got a computer, you can afford to get a drive for backup purposes.

Now that we've agreed you should setup your backups, what else do you need? There's no such thing as a system so smart and pro-active as to keep copies of every single file change that happens. It's "too complicated" which translated into the real world means "error prone". Such a system can be as likely to destroy your data as protect it.

Most backup setups provide periodic snapshots. If you do rsnapshot across large numbers of files, you may find that the most frequently you can capture versions of files is every 4, 8, or 24 hours. If you spend six figures and get a NetApp, you can do it quite a bit more frequently than that.

But if you don't actually want backups (which we've agreed isn't the case) but just want to be able to restore files after a user mungs them, then just use some sort revision control system. Either CVS or Subversion, etc for source code, or a document management system like Clearspace, Sharepoint, etc for documents and potentially media.

But first thing is first, get a backup system setup. Then evaluate whether you want to spend the additional effort of checking all your files into some revision control system.

HTH!

- Arch

choogendyk · 04-25-2009, 09:37 AM

Archangel nailed it.

nass' made up term of "ghost backups" is just that, a ghost. First time you lose the system drive, or lose some important file, you would find that there is no backup. Only versioning.

Also, it has been said many times in these forums, RAID is not backup. It protects somewhat against disk failure, but you can still lose data quite easily. And, if you're talking about a home user, it would end up being as expensive or more so than a backup system.

On my enterprise systems, I use zfs raid pools with snapshots, and I still have nightly backups with Amanda to a tape library with a 6 week cycle and archives. For my home systems, I replicate data among systems with rsync and have a couple of external drives with duplicate backups. Some things are also (but not only) archived on CDs or DVDs. Some things are also in paper files and/or cms web archives (drupal) in the cloud. Reasonable paranoia dictates multiple backups of important things.

Repeating myself (and Archangel), just get (at least) an extra drive and do backups. Rsync with rsnapshot might do nicely -- http://www.rsnapshot.org/howto/. Or take a look at BackupPC -- http://www.zmanda.com/backuppc.html

If you are really interested in learning more about the topic of backups, get a copy of Curtis Preston's O'Reilly book "Backup and Recovery" -- http://oreilly.com/catalog/9780596102463/, and/or spend some time browsing through the companion website and links out from there -- http://www.backupcentral.com/.

irishbitte · 04-26-2009, 04:32 AM

Just a suggestion to OP, i reckon s/he has been reading up on something like this: http://en.wikipedia.org/wiki/Time_Ma...pple_software) but it has to be pointed out that this is an example of a complete backup with regular snapshots. The main advantage is it 'just works' the disadvantage is it's only one backup. But for home use, it may be exactly what is required. This is an example of something we should be trying to accomplish in the open source community, unfortunately it requires a different filesystem paradigm to the ones we currently have.

choogendyk · 04-26-2009, 11:15 AM

You are right that Time Machine is an example of a complete backup with regular snapshots. Similar sorts of things are, in fact, available for linux.

There have been recipes for using rsync to accomplish something like this for some time. The classic one was posted in 2004 -- http://www.mikerubel.org/computers/rsync_snapshots/, and here is another with specific reference to Time Machine -- http://blog.interlinked.org/tutorial...e_machine.html

rsnapshot with rsync accomplishes something similar -- http://www.rsnapshot.org/

There is TimeVault (listed as Time Machine for Linux) -- http://lifehacker.com/software/featu...nux-275399.php

There is flyback -- http://code.google.com/p/flyback/, or http://bernaz.wordpress.com/2008/01/...ity-for-linux/

Or how about Dirvish -- http://www.dirvish.org/

All of that was within the first page of a google search for "time machine for linux".

Taking a completely different approach, zfs can be set up for linux -- http://www.linuxworld.com/news/2007/...-on-linux.html. ZFS is "native" to Solaris, and I use it for file system snapshots on my Solaris 10 systems.