LinuxQuestions.org
Old 09-19-2013, 06:59 PM   #1
haertig
Senior Member
 
Registered: Nov 2004
Distribution: Debian, Ubuntu, LinuxMint, Slackware, SysrescueCD
Posts: 2,032

Question: Incremental backups of large files that change frequently


I have done rsync "snapshot" incremental backups on my own systems for years. Here's a good explanation of what I mean by this: http://www.mikerubel.org/computers/rsync_snapshots/

I wish to implement this backup strategy on others' computers. If not this exact strategy, something functionally equivalent. I have run into a potential problem though. These other users are not so computer savvy, and they tend to have really huge email stores. For example, in Thunderbird, they have "Inbox", "Sent", and "Trash" files that combine to be over 2 GB. Now, only a very small portion of those huge files changes daily, but they do change daily, and because standard backup strategies are file-based, these huge files end up being backed up in their entirety day after day, consuming much disk space.
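To put rough numbers on it, here is a back-of-the-envelope sketch in Python. The 2 GB figure is from above; the 5 MB of new mail per day and the 30-day window are assumptions picked only for illustration, and the function names are made up.

```python
GB = 1000 ** 3
MB = 1000 ** 2

def full_copy_cost(file_bytes, days):
    """Bytes stored when the whole file is copied again on every daily run."""
    return file_bytes * days

def delta_cost(file_bytes, daily_change_bytes, days):
    """Bytes stored when one full copy is kept plus only the daily changes."""
    return file_bytes + daily_change_bytes * (days - 1)

# Assumed workload: a 2 GB mail store, about 5 MB of changes per day, 30 days kept.
print(full_copy_cost(2 * GB, 30) / GB)      # 60.0
print(delta_cost(2 * GB, 5 * MB, 30) / GB)  # 2.145
```

So under those assumptions a month of file-based backups costs about 60 GB per user, against a little over 2 GB if only the changes were kept.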

Other than training users to keep cleaner/smaller mail stores, is there a backup strategy/solution to handle this dilemma? Other than email, the rest of the users' data is quite static, and backups are easy for that other data. But for these users, pretty much computer=email, so that's what needs to be backed up.

I was toying with the idea of using some kind of source code control system on their huge email files, since things like rcs and sccs store only the file changes in their revisions (for text files). I'd have to find some way to break out the original file stored in the source code system from its revisions so that these things would be stored in separate files. I'd then have to have cron "check in, then check out" their huge email stores into source code control before a backup. And have the backup script exclude the original huge email store files and back up their source code control equivalent files instead. Utilizing source code control is just my way of being lazy and not writing my own "diff" scripting strategy.

I figure somebody has probably implemented some function like this and that would save me the time of re-inventing the wheel. Does anybody know of such a thing?

Alternately, I could give up on trying to parse their large email stores, just back them up in their grandiose magnitude each day, but limit each user to XXX amount of backup storage space. For users with clean/small email stores, they may get months and months of incremental backups. For the "dirty" users, they may only get a few days.
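A minimal sketch of that quota idea in Python, assuming each snapshot can be described by a label and the number of bytes it uniquely occupies, listed oldest first. The function name `prune_to_quota` and that data layout are hypothetical; nothing here touches the disk, it only decides which snapshots would go.

```python
def prune_to_quota(snapshots, quota_bytes):
    """Decide which snapshots to drop so a user stays within a storage quota.

    snapshots: list of (label, bytes_used) tuples, oldest first (assumed layout).
    Returns (kept, dropped_labels). The newest snapshot is always kept,
    even if it alone is larger than the quota.
    """
    kept = list(snapshots)
    dropped = []
    total = sum(size for _, size in kept)
    while total > quota_bytes and len(kept) > 1:
        label, size = kept.pop(0)  # discard the oldest snapshot first
        dropped.append(label)
        total -= size
    return kept, dropped


# A "dirty" user whose 2 GB mail store is copied in full every day, 5 GB quota:
GB = 1000 ** 3
history = [("day1", 2 * GB), ("day2", 2 * GB), ("day3", 2 * GB)]
kept, dropped = prune_to_quota(history, 5 * GB)
print([label for label, _ in kept], dropped)  # ['day2', 'day3'] ['day1']
```

With hard-linked snapshots the bytes "owned" by one snapshot are not trivial to measure, so the per-snapshot sizes would need care in a real script.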

Any ideas on how to implement this, or any alternate/better strategies? Thanks in advance.

Last edited by haertig; 09-19-2013 at 07:00 PM.
 
Old 09-19-2013, 07:19 PM   #2
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 1,619

There is a program called rdiff-backup that would be ideally suited for this. The history of the file is stored as a series of reverse diffs from the current state. The program (a set of Python scripts, actually) has unfortunately not seen any active development since 2009, but it works quite well. The outstanding bugs mainly have to do with keeping the archive on a Windows filesystem, and with sets of hard-linked files being broken (on restore) into 2 or more separate groups if names have been added to the set over time. About the only operational shortcoming is that there is no way to go into, say, an archive that has been updated daily for 2 years and keep only the monthly increments for the older dates.
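To illustrate what "reverse diffs from the current state" means, here is a toy sketch in Python. It is not rdiff-backup's code or on-disk format: it diffs lists of text lines with the standard `difflib` module, and the class and function names are invented for the example. The newest version is kept whole, and each older version is reachable by applying small deltas backwards.

```python
from difflib import SequenceMatcher

def make_delta(src, dst):
    """Return ops that rebuild dst from src (both are lists of lines)."""
    ops = []
    matcher = SequenceMatcher(None, src, dst, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))      # reuse lines already in src
        elif j2 > j1:                         # "replace" or "insert"
            ops.append(("data", dst[j1:j2]))  # store only the differing lines
        # "delete" needs no op: those src lines are simply not copied
    return ops

def apply_delta(src, ops):
    """Rebuild the target version from src and the ops made by make_delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(src[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out

class ReverseDiffArchive:
    """Keeps the newest version in full plus deltas that lead back in time."""

    def __init__(self):
        self.current = None
        self.reverse_deltas = []  # entry k turns version k+1 back into version k

    def backup(self, lines):
        if self.current is not None:
            self.reverse_deltas.append(make_delta(lines, self.current))
        self.current = list(lines)

    def restore(self, steps_back):
        """Return the version from steps_back backups ago (0 = newest)."""
        state = self.current
        start = len(self.reverse_deltas) - steps_back
        for ops in reversed(self.reverse_deltas[start:]):
            state = apply_delta(state, ops)
        return state
```

Restoring the newest state is a plain copy, while older states cost one delta application per step, which is the trade-off that comes with storing the history this way.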

(I've heard that close examination of the Python code has been associated with increased risk of cancer of the eyeballs in the state of California.)

Last edited by rknichols; 09-19-2013 at 07:23 PM. Reason: Add caution about examining the source
 
Old 09-19-2013, 10:00 PM   #3
Berhanie
Senior Member
 
Registered: Dec 2003
Location: phnom penh
Distribution: Fedora
Posts: 1,625

Having a central IMAP mailstore, where each mail corresponds to a file (e.g. maildir), makes it easier not only to back up the mail, but also to restore it. The users can continue to use Thunderbird, Outlook, etc., and freely switch among them.
 
Old 09-19-2013, 10:13 PM   #4
haertig
Senior Member
 
Registered: Nov 2004
Distribution: Debian, Ubuntu, LinuxMint, Slackware, SysrescueCD
Posts: 2,032

Original Poster
Quote:
Originally Posted by Berhanie
Having a central IMAP mailstore...
That would be nice, but I am not talking about a corporate setup here. These users are "my father-in-law", "my sister", "my parents", etc. Just personal PCs (I have installed Linux for them, to make my job of "free remote support" easier!). Their email providers are their individual ISPs, and are all POP. The thought is good though ... thanks!
 
Old 09-20-2013, 11:06 AM   #5
Habitual
Senior Member
 
Registered: Jan 2011
Distribution: Undecided
Posts: 3,624
Blog Entries: 1

+1 for IMAP and backups.
 
Old 09-20-2013, 11:48 AM   #6
haertig
Senior Member
 
Registered: Nov 2004
Distribution: Debian, Ubuntu, LinuxMint, Slackware, SysrescueCD
Posts: 2,032

Original Poster
Quote:
Originally Posted by rknichols
There is a program called rdiff-backup that would be ideally suited for this.
I was not aware of this program. I did a brief look into it and as you say, it may be just what I'm looking for. I will have to research it in more depth and do some testing with it when I get a little more free time. Thanks for the pointer!
 
Old 09-21-2013, 03:25 PM   #7
haertig
Senior Member
 
Registered: Nov 2004
Distribution: Debian, Ubuntu, LinuxMint, Slackware, SysrescueCD
Posts: 2,032

Original Poster
I am wondering if there is an email client that uses POP3/SMTP to interface with the ISP mail server, and that client stores the emails locally as individual files.

For example, Thunderbird stores multiple emails in large files ("Inbox", "Sent", etc.). Is there a client that, instead of having, say, an "Inbox" file, has an Inbox directory, which in turn contains individual files for the various emails? A structure like that would make local backups trivial.
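To show the kind of layout I mean, here is a rough sketch using Python's standard `mailbox` module, which can read an mbox-style file and write a Maildir directory with one file per message. It assumes the big "Inbox" file parses as ordinary mbox; the function name and the paths in the usage note are placeholders, and I have not tried it against a real Thunderbird profile.

```python
import mailbox

def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from one mbox file into a Maildir directory.

    The source file is only read. The Maildir (with its tmp/new/cur
    subdirectories) is created if it does not exist yet.
    Returns the number of messages copied.
    """
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for message in src:
            dst.add(message)  # each message ends up in its own file
            count += 1
    finally:
        src.close()
        dst.close()
    return count
```

Something like `mbox_to_maildir("Inbox", "Inbox.maildir")` run on a copy of the file would give a directory that a file-based incremental backup handles well, since old messages never change and only new files get copied. Running it twice into the same directory would duplicate messages, so it is only a one-shot illustration.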
 
Old 09-21-2013, 06:35 PM   #8
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 12,497

I thought rsync was designed for this very thing - that's why it uses delta-transfer if the file exists at the destination.

For an improvement on the system the OP uses, see rsnapshot - it was spawned from the very same article.
 
Old 09-21-2013, 09:27 PM   #9
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 1,619

rsync addresses the problem of transmission, but neither rsync nor rsnapshot solves the problem of storage when you have very large files that each change just a little each day and you need to be able to recover the state they were in on some arbitrary date in the past.
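The storage side can be seen with a small Python sketch of hard-link snapshots. This is a simplification, not what rsync or rsnapshot actually run: it handles a flat directory only, compares file contents byte for byte (rsync normally looks at size and modification time), and the function names are made up.

```python
import filecmp
import os
import shutil

def snapshot(source_dir, previous_snap, new_snap):
    """Make a snapshot: hard-link unchanged files, copy changed ones in full."""
    os.makedirs(new_snap)
    for name in os.listdir(source_dir):
        src = os.path.join(source_dir, name)
        dst = os.path.join(new_snap, name)
        old = os.path.join(previous_snap, name) if previous_snap else None
        if old and os.path.exists(old) and filecmp.cmp(src, old, shallow=False):
            os.link(old, dst)       # unchanged: shares the existing data blocks
        else:
            shutil.copy2(src, dst)  # changed or new: a complete second copy

def unique_bytes(snap_dirs):
    """Total file bytes across snapshots, counting each inode only once."""
    seen = {}
    for snap in snap_dirs:
        for name in os.listdir(snap):
            st = os.stat(os.path.join(snap, name))
            seen[(st.st_dev, st.st_ino)] = st.st_size
    return sum(seen.values())
```

If a 1000-byte "Inbox" grows by a single byte between two snapshots, `unique_bytes` over both snapshots reports roughly 2000 bytes for that file, not 1001: the second snapshot holds a whole new copy even though almost nothing changed.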

Here's a link to a pretty good comparison between rsnapshot and rdiff-backup.

Last edited by rknichols; 09-21-2013 at 09:32 PM. Reason: Add link to comparison article
 
  


All times are GMT -5.
