Old 03-27-2009, 04:41 AM   #1
Cyberman
Member
 
Registered: Aug 2005
Distribution: Debian Stable
Posts: 218

a better backup method?


I wasn't sure whether this belonged in Server or Programming, so I decided to put it here. I'd also like to say it would be ideal if things could be done in bash.

Now, I have an idea. It came from being highly annoyed with backup methods. I know there are some good ones, but I think they could be better, since symbolic links give us a cheap way to point at files that already exist in an earlier backup.

One of my issues with incremental backups is that they back up a file at its new location even if the file itself didn't change, which wastes space. Sure, the file might not have changed, but its location did; thus, the backup program, such as sbackup, thinks it's a new file and adds it to the incremental backup tar.

For example:

/home/blahblahblah/jackhandy.mpeg
was put into the full backup and the file was 2GB.

the next week it was moved to...
/home/blahblahblah/jackhandyfiles/jackhandy.mpeg

And yet typical backup methods put this file into the incremental backup again. That's annoying and wasteful.
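To see why, here's roughly how it plays out with GNU tar's incremental mode (just a sketch using the example paths above):

Code:
# Full backup: the snapshot file records what exists where.
tar --create --file=full.tar --listed-incremental=backup.snar /home/blahblahblah

# Move the 2GB file to a new directory; its contents don't change.
mkdir /home/blahblahblah/jackhandyfiles
mv /home/blahblahblah/jackhandy.mpeg /home/blahblahblah/jackhandyfiles/

# Incremental backup: tar tracks paths, not content, so the "new"
# path gets the full 2GB stored all over again.
tar --create --file=incr1.tar --listed-incremental=backup.snar /home/blahblahblah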

Anyone see a problem with that? I think there could be an improvement.

It would be ideal if the program checked the file's checksum/properties against files of the same name and simply linked to the copy already stored in a previous full or incremental backup.
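As a rough bash sketch of that check (everything here is invented for illustration: index.txt is a made-up index mapping checksums to paths already stored in earlier backup sets):

Code:
#!/bin/bash
# dedupe-check.sh - copy a file into the incremental backup only if its
# content isn't already stored; otherwise record a symlink to the old copy.
# index.txt lines look like: "<sha256>  <path inside an earlier backup>"

file="$1"            # file found at a new location since the full backup
index="index.txt"    # invented checksum index built during earlier runs
touch "$index"

sum=$(sha256sum "$file" | awk '{print $1}')
old=$(awk -v s="$sum" '$1 == s { print $2; exit }' "$index")

if [ -n "$old" ]; then
    # Same checksum: content already exists in an older backup,
    # so store only a symlink at the file's new location.
    mkdir -p "incremental/$(dirname "$file")"
    ln -s "$old" "incremental/$file"
else
    # Different checksum: genuinely new content, copy it and log it.
    cp --parents "$file" incremental/
    echo "$sum  $file" >> "$index"
fi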

So, I created a general outline of how backup methods could be improved. Tell me if any of you understand what I'm getting at and think backup methods should work like this in the future:

Code:
1) Full backup
2) Incremental backup
3) Restoration from last incremental backup
4) Restoration from any point

2) Incremental backup

Incremental backup attributes:
1. Logs every file that has changed since the last full backup.
   a. Logs if a file is no longer there.
   b. Logs if a file has moved.
      b1. Checks whether the moved file is the same file.
      b2. If the moved file is not the same file (different checksum),
          it is copied into the incremental backup and its new
          checksum is logged.
      b3. If the moved file is the same file (same checksum), only a
          symlink is created at its new location, pointing to the old
          copy in an earlier incremental or full backup. This prevents
          the file from being copied again, which would increase
          storage requirements.

Ways to make the checksum process easier:
1. Only log checksums for files over a certain size, such as 1MB or
   100KB (see the one-liner below).

3) Restoration from last incremental backup

1. The directory tree is created according to what the most recent
   version of the tree should look like.
2. Symlinks are replaced by the actual files they link to (sketched
   below).
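For the checksum shortcut above, something like this (GNU find and sha256sum assumed) would hash only files over 1MB:

Code:
find /home -type f -size +1M -exec sha256sum {} + > checksums.log

And here's a rough bash sketch of restoration step 2, replacing every symlink in the restored tree with the file it points to (assuming the links resolve to real files inside the older backup sets):

Code:
#!/bin/bash
# restore-links.sh - walk the restored tree and replace each symlink
# with the actual file it points to in an older backup set.

restore_root="$1"   # tree produced from the latest incremental backup

find "$restore_root" -type l | while read -r link; do
    target=$(readlink -f "$link")    # resolve to the old stored copy
    if [ -e "$target" ]; then
        rm "$link"
        cp "$target" "$link"         # materialize the real file
    else
        echo "warning: dangling link: $link" >&2
    fi
done

(cp -rL or rsync -aL would do much the same in one pass.)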

Last edited by Cyberman; 03-27-2009 at 04:45 AM.
 
Old 03-27-2009, 10:32 AM   #2
MensaWater
LQ Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, CoreOS, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 7,831
Blog Entries: 15

No - I think backups are "point in time" snapshots, so it makes perfect sense to me that a moved file should be backed up again, because its "fully qualified path name" is completely different from what it was before.

YOU may know where every single file on your system is at a given point, but I doubt many others want to keep that detail in their head.

You could of course avoid this hassle by not moving your files around all the time.

By the way, there is a thing in the world called "data deduplication" (see Data Domain, for example). The idea is that it keeps track of bytes and hashes, so it only backs up data that is truly unique from backup to backup (and usually compresses as well). Using this technology you might get something like 80-to-1 compression on backups of sparse files. This kind of solution makes sense especially for multiple servers with the same OS, as it will not copy all the same OS files for every one of them - it will only create pointers for the ones after the first.
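The same pointer idea can be faked in bash with a content-addressed store - a toy sketch, not how Data Domain actually works (that dedupes at the block level; this one only dedupes whole files), and it assumes the store and the snapshot live on the same filesystem so hard links work:

Code:
#!/bin/bash
# Toy whole-file deduplication: each unique file body is stored once
# under its hash; a backup is just a tree of hard links into the store.

store="/backups/store"          # one copy per unique file content
snap="/backups/$(date +%F)"     # today's backup tree
src="/home/blahblahblah"

mkdir -p "$store"

find "$src" -type f | while read -r f; do
    sum=$(sha256sum "$f" | awk '{print $1}')
    blob="$store/$sum"
    [ -e "$blob" ] || cp "$f" "$blob"   # store content only once
    dest="$snap/${f#/}"
    mkdir -p "$(dirname "$dest")"
    ln "$blob" "$dest"                  # pointer, not another copy
done

Moving a file between runs then costs only a new hard link instead of another 2GB copy.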
 
  

