LinuxQuestions.org
Old 10-20-2015, 01:10 PM   #1
Lockywolf
Member
 
Registered: Jul 2007
Posts: 626

Rep: Reputation: 212
Need advice on making backups of large file storage.


Hello, everyone.

I need advice on the following matter:

I have a file storage area (a hard drive partition) which holds files of varying degrees of importance:
  • Important
  • Not so important
  • Rubbish

Certain actions might happen to this storage:
  1. New files are uploaded
  2. The directory structure changes
  3. Some rubbish is removed
  4. Some useful files are deleted by mistake
  5. Some files become corrupt due to block errors
  6. Filesystem errors happen

I also have a separate hard drive with 25% more capacity, on which I want to organize a backup of the first one, satisfying the following properties:

Actions 1, 2, and 3 should propagate to the backup. Action 4 should be undoable by restoring files from the backup for some period of time (desirably as long as possible, while space remains, but at least one month). Actions 5 and 6 should be detected and reported to the operator whenever a backup "task" runs (perhaps once every two days) and must under no circumstances propagate to the backup. They should also be undoable from the backup.

Is there any smart backup solution for this task?

I googled for some proposed backup solutions, but most of them are based either on rsync (which syncs the tree but, AFAIU, has nothing to prevent point 4) or on snapshots, which are just too big.

Any suggestions?

Last edited by Lockywolf; 10-20-2015 at 01:12 PM.
 
Old 10-20-2015, 07:12 PM   #2
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 7.7 (?), Centos 8.1
Posts: 18,237

Rep: Reputation: 2712
1,2,3: catered for by any backup system

4a: going back and retrieving old backups - again, any backup system, including rsync

4b: if you want to prevent (!) files being deleted you can

4b1: train your users
4b2: adjust ownerships/perms to reduce likelihood
4b3: use chattr http://linux.die.net/man/1/chattr
4b4: if you have a master list of important files, use http://linux.die.net/man/1/inotifywait to be notified when it happens (or run a cron job that effectively does the same thing - use sha1sum or similar to check content if file exists)

5 (& 6): if your disk is regularly throwing errors, replace it. Seriously, it's dying. You could look at http://linux.die.net/man/8/smartctl and recreate with mkfs, but it's not worth it.
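The 4b4 idea can be sketched with sha1sum alone; the manifest name and the demo files below are hypothetical stand-ins for a real master list. Record checksums of the important files once, then have cron re-verify them; a non-zero exit from `sha1sum -c` means a file is missing or its content changed:

```shell
# Build a throwaway demo tree; in practice this would be the real storage.
FILES=$(mktemp -d)
echo "payload" > "$FILES/important.dat"

# One-time step: record the checksums of the files on the master list.
( cd "$FILES" && sha1sum important.dat > MANIFEST.sha1 )

# Periodic step (e.g. from a cron job every two days): verify the manifest.
# sha1sum -c exits non-zero if any listed file is missing or altered.
( cd "$FILES" && sha1sum -c MANIFEST.sha1 )   # prints "important.dat: OK"
```

Wiring the second step into cron and mailing on non-zero exit gives exactly the "notify the operator" behavior described.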
 
Old 10-20-2015, 09:23 PM   #3
berndbausch
LQ Addict
 
Registered: Nov 2013
Location: Tokyo
Distribution: Mostly Ubuntu and Centos
Posts: 6,316

Rep: Reputation: 2002
On one hand you say that deleted files can be restored from the backup. On the other, that rsync is unable to prevent deletion. This is true, but I doubt there is any backup solution that prevents deletion; the purpose of a backup solution is to allow restoring files. And rsync can do that.

The fact that your backup drive is just 25% larger than the main drive is a bit worrying, though, unless you keep deleting old backups, or a large part of your data belongs in the rubbish category and will be deleted occasionally.

I see another problem. Points 3 and 4 seem to conflict with each other. How can the backup solution know that a file is rubbish, and how can it judge whether deletion was by mistake or by design?
 
Old 10-21-2015, 01:12 PM   #4
Lockywolf
Member
 
Registered: Jul 2007
Posts: 626

Original Poster
Rep: Reputation: 212
Quote:
Originally Posted by berndbausch View Post
On one hand you say that deleted files can be restored from the backup. On the other, that rsync is unable to prevent deletion. This is true, but I doubt there is any backup solution that prevents deletion; the purpose of a backup solution is to allow restoring files. And rsync can do that.

The fact that your backup drive is just 25% larger than the main drive is a bit worrying, though, unless you keep deleting old backups, or a large part of your data belongs in the rubbish category and will be deleted occasionally.

I see another problem. Points 3 and 4 seem to conflict with each other. How can the backup solution know that a file is rubbish, and how can it judge whether deletion was by mistake or by design?
Points 3 and 4 do not conflict! That's exactly why I need a smart backup solution.

The only way to detect that a file is NOT rubbish is that I start missing it after a week of absence. That's why all files deleted from the master storage should only be "marked" as deleted in the backup! The actual data should be purged from the disk if and only if: a) disk space is needed to back up legitimate files, and b) a month has passed since the file was deleted from the main storage. If a month has not passed and space is already needed, the backup process should pause and send me a notification.
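That purge rule can be sketched with find and df. The attic path and the free-space threshold below are hypothetical; a throwaway directory and an inflated threshold simulate the "space is needed" case:

```shell
ATTIC=$(mktemp -d)   # stands in for the deleted-files area on the backup drive
touch -d "40 days ago" "$ATTIC/expired.txt"   # "deleted" more than a month ago
touch "$ATTIC/recent.txt"                     # "deleted" recently: must be kept

FREE_KB=$(df -k --output=avail "$ATTIC" | tail -n 1 | tr -d ' ')
NEED_KB=$((1 << 40))   # absurd threshold so the demo takes the "space needed" branch

if [ "$FREE_KB" -lt "$NEED_KB" ]; then
    # Space is needed: purge only what was deleted more than 30 days ago.
    find "$ATTIC" -type f -mtime +30 -delete
    # If nothing were old enough, this is where the job should pause and
    # notify the operator instead of touching newer files.
fi
```

Here the mtime of the attic copy stands in for the "marked deleted" timestamp; a real implementation would track deletion dates explicitly.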

Quote:
5 (& 6): if your disk is regularly throwing errors, replace it
Of course it doesn't; otherwise I would have already replaced it. Of course I have smartd running, but those are all palliative measures. The backup process will be a regular event accessing the drive, so it's an obvious place to set up an error detector. And number 6 is just unavoidable when you access ext4 from Windows: all the drivers are terrible and create bogus files. These are easily fixed by fsck, but I want to be sure they do not propagate to the backup.
 
Old 11-02-2015, 10:43 AM   #5
Lockywolf
Member
 
Registered: Jul 2007
Posts: 626

Original Poster
Rep: Reputation: 212Reputation: 212Reputation: 212
Bump. So nobody has any suggestions?
 
Old 11-02-2015, 10:55 AM   #6
Smokey_justme
Member
 
Registered: Oct 2009
Distribution: Slackware
Posts: 534

Rep: Reputation: 203
Yes, I have one: btrfs...

Last edited by Smokey_justme; 11-02-2015 at 10:56 AM. Reason: edited the link to go directly to Incremental Backup part of the wiki
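For completeness, the btrfs approach usually means read-only snapshots plus btrfs send/receive for incremental transfer. This is a sketch only, with hypothetical mount points (/data and /backup, both btrfs, run as root); it is wrapped in a function because the commands cannot run outside an actual btrfs filesystem:

```shell
# Hypothetical layout: /data is the master subvolume, /backup/snapshots
# lives on the backup drive; both filesystems are btrfs.
snapshot_and_send() {
    day=$(date +%F)
    # Read-only snapshot of the live data; cheap, copy-on-write.
    btrfs subvolume snapshot -r /data "/data/.snapshots/$day"
    sync
    if [ -e /data/.snapshots/last ]; then
        # Incremental: ship only the blocks changed since the previous
        # snapshot, which must still exist on both sides.
        btrfs send -p /data/.snapshots/last "/data/.snapshots/$day" \
            | btrfs receive /backup/snapshots
    else
        # First run: full send.
        btrfs send "/data/.snapshots/$day" | btrfs receive /backup/snapshots
    fi
    ln -sfn "/data/.snapshots/$day" /data/.snapshots/last
}
# Run from cron, e.g. every two days:  snapshot_and_send
```

Because every snapshot on the backup side stays browsable, files deleted by mistake (point 4) remain restorable until old snapshots are pruned, and btrfs's data checksums surface the corruption in points 5 and 6 during scrubs.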
 
  

