LinuxQuestions.org


Lockywolf 10-20-2015 01:10 PM

Need advice on making backups of a large file storage.
 
Hello, everyone.

I need some advice on the following matter:

I have a file storage (a hard drive partition), which holds files of various degrees of importance:
  • Important
  • Not so important
  • Rubbish

Certain actions might happen to this storage:
  1. New files uploaded
  2. Directory structure change
  3. Some rubbish removed
  4. Some useful files deleted by mistake
  5. Some files become corrupt due to block errors
  6. Filesystem errors happen

I also have a separate hard drive with 25% more capacity, on which I want to set up a backup of the first one, satisfying the following properties:

Actions 1, 2 and 3 should propagate to the backup. Action 4 should be undoable by restoring files from the backup for some time afterwards: ideally for as long as free space allows, but for at least one month. Actions 5 and 6 should be detected and reported to the operator whenever a backup "task" runs (perhaps once every two days) and must under no circumstances propagate to the backup. They should also be undoable from the backup.

Is there any smart backup solution for this task?

I googled for backup solutions, but most of them are based either on rsync (which syncs the tree but, AFAIU, has nothing to handle point 4) or on snapshots, which are just too big.

Any suggestions?

chrism01 10-20-2015 07:12 PM

1,2,3: catered for by any backup system

4a: going back and retrieving old backups - again any backup system inc rsync

4b: if you want to prevent (!) files being deleted you can

4b1: train your users
4b2: adjust ownerships/perms to reduce likelihood
4b3: use chattr http://linux.die.net/man/1/chattr
4b4: if you have a master list of important files, use http://linux.die.net/man/1/inotifywait to be notified when it happens (or run a cron job that effectively does the same thing - use sha1sum or similar to check content if file exists)
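The cron-job variant of 4b4 could be sketched roughly as below. This is only an illustration, not a tested tool: the names `sha1_of` and `check_manifest` and the idea of keeping the master list as a path-to-SHA-1 mapping are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(manifest):
    """Compare files on disk against a {path: expected_sha1} mapping.

    Hypothetical helper: returns (missing, changed), two sorted lists
    of paths that no longer exist or whose content no longer matches.
    """
    missing, changed = [], []
    for name, expected in manifest.items():
        path = Path(name)
        if not path.is_file():
            missing.append(name)
        elif sha1_of(path) != expected:
            changed.append(name)
    return sorted(missing), sorted(changed)
```

A cron job would load the stored mapping, call `check_manifest`, and mail the operator if either list is non-empty. Note that a changed checksum cannot by itself tell a legitimate edit from corruption.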

5 (& 6): if your disk is regularly throwing errors, replace it. Seriously, it's dying. You could look at http://linux.die.net/man/8/smartctl and recreate with mkfs, but it's not worth it.

berndbausch 10-20-2015 09:23 PM

On one hand you say that deleted files can be restored from the backup. On the other, that rsync is unable to prevent deletion. This is true, but I doubt there is any backup solution that prevents deletion; the purpose of a backup solution is to allow restoring files. And rsync can do that.
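One way rsync can keep deleted files restorable is its `--backup` / `--backup-dir` pair: files that `--delete` would remove from the mirror (or that a changed source file would overwrite) are moved into a side directory instead. A minimal sketch that only builds the command line; the function name `rsync_command`, the dated "attic" layout and all paths are assumptions for illustration.

```python
from datetime import date


def rsync_command(source, mirror, attic_root, today):
    """Build an rsync argv that mirrors `source` into `mirror`.

    Files deleted or replaced in the mirror are moved by rsync into a
    dated directory under `attic_root` (hypothetical layout). Use an
    absolute attic path located outside the mirror tree.
    """
    attic = f"{attic_root.rstrip('/')}/{today.isoformat()}"
    return [
        "rsync",
        "-a",                      # archive mode: recurse, keep perms/times
        "--delete",                # propagate deletions to the mirror
        "--backup",                # keep what would be deleted/overwritten
        f"--backup-dir={attic}",   # ...and put it here
        source.rstrip("/") + "/",  # trailing slash: copy the contents
        mirror.rstrip("/") + "/",
    ]


if __name__ == "__main__":
    print(" ".join(rsync_command("/srv/storage", "/mnt/backup/mirror",
                                 "/mnt/backup/attic", date.today())))
```

Restoring a mistakenly deleted file then means copying it back from the dated attic directory; pruning old attic directories is left to a separate job.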

The fact that your backup drive is just 25% larger than the main drive is a bit worrying though, unless you keep deleting old backups or a large part of your data belongs in the rubbish category and will be deleted occasionally.

I see another problem. Points 3 and 4 seem to conflict with each other. How can the backup solution know that a file is rubbish, and how can it judge whether deletion was by mistake or by design?

Lockywolf 10-21-2015 01:12 PM

Quote:

Originally Posted by berndbausch (Post 5437775)
On one hand you say that deleted files can be restored from the backup. On the other, that rsync is unable to prevent deletion. This is true, but I doubt there is any backup solution that prevents deletion; the purpose of a backup solution is to allow restoring files. And rsync can do that.

The fact that your backup drive is just 25% larger than the main drive is a bit worrying though, unless you keep deleting old backups or a large part of your data belongs in the rubbish category and will be deleted occasionally.

I see another problem. Points 3 and 4 seem to conflict with each other. How can the backup solution know that a file is rubbish, and how can it judge whether deletion was by mistake or by design?

Points 3 and 4 do not conflict! That's exactly why I need a smart backup solution.

The only way to detect that a file is NOT rubbish is that I start missing it after a week of absence. That's why all files deleted from the master storage should only be "marked" as deleted in the backup! The actual data should be purged from the backup disk if and only if both of the following hold: a) disk space is needed to back up legitimate files; b) a month has passed since the file was deleted from the main storage. If space is needed but a month has not yet passed, the backup process should pause and send me a notification.
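The purge rule described above can be written down as a small decision function. This is a sketch under stated assumptions: the 30-day constant stands in for "a month", the tuple layout and the name `plan_purge` are invented for the example, and purging nothing when the expired files cannot free enough space is one conservative reading of "pause and notify".

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # assumed stand-in for "a month"


def plan_purge(deleted, bytes_needed, now):
    """Decide which marked-as-deleted files may be purged.

    deleted: list of (path, size_bytes, deleted_at) tuples.
    Returns (to_purge, must_pause). Nothing is purged unless space is
    actually needed; only files deleted at least RETENTION ago are
    candidates, oldest first. If they cannot free enough space,
    nothing is purged and must_pause is True.
    """
    if bytes_needed <= 0:
        return [], False
    expired = sorted(
        (entry for entry in deleted if now - entry[2] >= RETENTION),
        key=lambda entry: entry[2],
    )
    to_purge, freed = [], 0
    for path, size, _deleted_at in expired:
        if freed >= bytes_needed:
            break
        to_purge.append(path)
        freed += size
    if freed < bytes_needed:
        return [], True  # not enough expired data: pause and notify
    return to_purge, False
```

The caller would delete the returned paths from the backup, or stop and send a notification when `must_pause` is set.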

Quote:

5 (& 6): if your disk is regularly throwing errors, replace it

Of course it doesn't. Otherwise I would have already replaced it. Of course I have smartd running. But those are all palliative measures. The backup process will be a regular event that accesses the drive, so it's an obvious place to set up an error detector. And number 6 is just unavoidable when you access ext4 from Windows. All the drivers are terrible and create bogus files. These are easily fixed by fsck, but I want to be sure that they do not propagate to the backup.

Lockywolf 11-02-2015 10:43 AM

Bump. So nobody has any suggestions?

Smokey_justme 11-02-2015 10:55 AM

Yes, I have one: btrfs...


All times are GMT -5.