Why are backup programs so often poorly written?
Distribution: Solaris 9 & 10, Mac OS X, Ubuntu Server
OK, so part of the problem with this thread is that you hadn't defined what you really want to do until your most recent message. You were referring to "backup" software, but what you really seem to want is "imaging" software. They are very different things. For backup software I would be thinking of Amanda, Bacula, BackupPC, and so on; those clearly aren't intended for what you are finally asking for. Imaging software would be something like Acronis, PartImage, Disk Image, Clonezilla, and so on.
Another point is old (or different) hardware. When talking about speed and difficulty with backup software, those have to be taken into account. The same applies to imaging software. When you recover an image, it has to be compatible with the hardware you recover it to. On my Sun systems, a slight difference may mean that I have to rebuild the /dev directory, do a reconfigure reboot, and so on. If I didn't know that, I'd just be complaining that it didn't work.
On the lab systems we manage at work, we use RAdminD. We have to arrange separate transcripts and images for the different hardware and software configurations that exist in the labs. It works beautifully, but it takes a fair bit of effort to set it up.
If you want to comprehend the difficulties, dive in and try writing your own imaging software.
Quote:
Originally Posted by Completely Clueless
I don't recall saying that!
nor do I.
Quote:
Originally Posted by Completely Clueless
However, I have tried out a new backup program from TeraByte which somehow locks a running system so you can carry on working whilst the backup is made. It works, too!
Wait a minute. I thought all backup (actually imaging) software sucked!
aah, so you had a solution, but just wanted to do some complaining?
Sorry, I didn't realise bitching about crap software was against LQ policy.
Quote:
Since you weren't actually looking for a solution, have you picked up anything worthwhile from this thread?
Indeed, yes. That's why I left a "thank you" for TexMex after one of his manifold contributions. I'm also experimenting with "tar", which seems to be quite promising, plus the latest bug-fixed version of Clonezilla has recently been released. I think I have enough viable non-proprietary solutions available now.
Seriously, it seems like a simple problem but think about how hard it is.
You bring up some very good points. It's even more complex if you need automation and a remote connection. Here's an example of something I'm trying to figure out myself (just for illustrative purposes; not trying to take over the thread with my problems).
I've been tasked with creating a daily cron job to back up all of the data on a Linux server. It has >300GB of data in >300k files, and the backup must be sent to an smb-mounted location. The obvious problem is that it takes >20 hours to send that much data across the network. Not only that, but the backup device is only 1TB. Therefore: 1) daily backups are out of the question, 2) multiple full backups will be very limited, 3) 99.99% of the data won't need to be backed up every single time.
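For scale, that 20-hour figure works out to roughly 33 Mbit/s effective throughput, which is about what a busy 100Mbit network minus SMB overhead actually delivers (my back-of-the-envelope arithmetic, nothing measured beyond the numbers above):

```shell
# Effective transfer rate for 300GB in 20 hours (GB = 10^9 bytes here).
awk 'BEGIN {
    bytes = 300e9; hours = 20
    mbps  = bytes * 8 / (hours * 3600) / 1e6   # megabits per second
    printf "effective rate: %.1f Mbit/s\n", mbps
}'
# prints "effective rate: 33.3 Mbit/s"
```

So even a dedicated gigabit link would only cut the window to a few hours, not make a daily full backup trivial.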
I have a script that mounts the share (after checking whether it's already mounted and whether a backup is already in progress), then tars everything to the device, because a plain copy wouldn't retain the Unix attributes. Because it takes so long, absolutely nothing of relevance can be done with the machine while it's running. That not only rules out running it during the week, but also running e.g. intensive data analyses across a weekend.
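For concreteness, a stripped-down sketch of what such a script can look like (all paths and function names here are made up, and the smb details are elided; error handling is minimal on purpose):

```shell
# Hypothetical sketch of the guarded backup script described above.

# Succeed only if no other backup run already holds the lock file.
acquire_lock() {
    lock="$1"
    if [ -e "$lock" ]; then
        echo "backup already in progress" >&2
        return 1
    fi
    : > "$lock"
}

# Mount the share only when it is not already mounted.
ensure_mounted() {
    mnt="$1"
    mountpoint -q "$mnt" || mount "$mnt"
}

# tar -p keeps the Unix ownership and permission bits that a plain
# file copy onto a CIFS share would lose.
run_backup() {
    src="$1"; mnt="$2"
    tar -cpzf "$mnt/backup-$(date +%F).tar.gz" -C "$src" .
}
```

A cron entry would then call acquire_lock, ensure_mounted, and run_backup in order, removing the lock file on exit.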
So here I'm left with a few choices:
1. Lock out the data directories while I run the backup. This isn't very feasible, mostly because the data is on the root partition. Plus, does one init 1 to kick everyone off the machine at backup time?
2. Compare each file's mod time with the mod time in a "base" backup archive and create a daily archive of those that change, then merge the daily partial archives into the base archive as they get older (this can be done in the background). Probably the most feasible, but it still has problems.
3. Run the backup once a month for the entire set of data and keep only the most recent 3.
4. Put the RAID in a fire-proof media safe next to the machine.
5. Put a subversion repo on the backup device. But who will svn add the new files, etc.? Not only that, but how would it deal with a repo with an initial size of 300GB?
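For what it's worth, option 2 is close to what GNU tar's --listed-incremental mode already does, so a sketch of it (paths and the function name are made up; assumes GNU tar):

```shell
# Hypothetical sketch of option 2: a level-0 "base" archive plus daily
# differentials, using GNU tar's snapshot file instead of hand-rolled
# mtime comparisons.
daily_backup() {
    src="$1"; dest="$2"; snap="$dest/state.snar"
    if [ ! -f "$dest/base.tar.gz" ]; then
        # First run: full archive; tar records file states in $snap.
        tar -cpzf "$dest/base.tar.gz" -g "$snap" -C "$src" .
    else
        # Later runs: work on a copy of the snapshot so every daily
        # stays relative to the base, not to the previous daily.
        cp "$snap" "$snap.tmp"
        tar -cpzf "$dest/daily-$(date +%F).tar.gz" -g "$snap.tmp" -C "$src" .
        rm -f "$snap.tmp"
    fi
}
```

Merging the dailies back into the base as they age, as described above, would still have to be scripted separately.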
There's nothing simple about this situation, yet it's probably very common with servers that host a lot of working data.
Anyway, just something to think about.
Kevin Barry
Quote:
aah, so you had a solution, but just wanted to do some complaining?
Since you weren't actually looking for a solution, have you picked up anything worthwhile from this thread?
There is some merit to solutionless complaining. For example, if enough people realize they aren't the only ones unhappy with backup solutions, some of them might do something about it. Unfortunately silence is often implicit agreement.
Kevin Barry
Silence is implicit agreement in the mind of a fool who wants to fool himself.
Very profound, although you misunderstand what I meant. I'm talking about the psychological phenomenon where members of a group appear to be acting as such. For example, say a group of guys is walking down the street and one starts beating an old lady while the others stand around. Someone else walking by will probably think that a group of guys is beating up an old lady rather than one guy beating her up with several individuals standing around who might oppose what's going on. One might think, "if they were opposed to it they'd speak up." It isn't ignorance to see it as a group action; it's a natural tendency. If of the group of people using backup software no one complains except someone with a solution, there will be the inaccurate appearance that everyone is happy with their software except the few that speak up. Much like not everyone is capable of stopping someone from beating an old lady, not everyone is capable of rewriting backup software, although they might be opposed to the status quo. This is just an illustration; backing up data isn't beating old ladies. I strongly believe that analogy can never be evidence; only clarification.
Kevin Barry
Distribution: Debian /Jessie/Stretch/Sid, Linux Mint DE
Quote:
Originally Posted by ta0kira
(just for illustrative purposes; not trying to take over the thread with my problems.)
But you are not against reading some tips anyway?
Quote:
Originally Posted by ta0kira
It has >300GB of data, >300k files, and backup must be sent to a smbmounted location.
Been there, done that, although I did it from an smbmount. Take a look at the star program. It is upward-compatible with good ol' tar. I have used it for years to make incremental backups, and it is good at them.
Quote:
Originally Posted by ta0kira
Because it takes so long, absolutely nothing of relevance can be done with the machine while it's running. That not only excludes running it during the week, but also running e.g. intensive data analyses across a weekend.
What is your main problem, network traffic or processor load? For the first, the solution could be to install a second network adapter and route all backup traffic to your backup machine over that NIC. If it is processor load, can you nice it? When no-one is working on the machine, that won't affect your backup speed.
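If CPU or disk contention does turn out to matter, the niced run could look like this (function name and paths invented; ionice is Linux-specific):

```shell
# Hypothetical helper: archive at the lowest CPU priority, and at idle
# I/O priority where ionice is available, so interactive users are
# barely affected. The priorities change nothing about the archive itself.
low_prio_backup() {
    src="$1"; out="$2"
    nice -n 19 ionice -c 3 tar -cpzf "$out" -C "$src" . 2>/dev/null \
        || nice -n 19 tar -cpzf "$out" -C "$src" .   # fall back if ionice is absent
}
```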
Quote:
What is your main problem, network traffic or processor load? For the first the solution could be to install a second network adapter and route all traffic to your backup machine over that NIC. If it is processor load, can you nice it? When no-one is working on the machine it won't affect your backup speed.
Thanks for the suggestions. The backup process is regulated by the network and protocol speeds to the point where it doesn't really affect processor or memory usage (2.4GHz quad-core Xeon, 4GB RAM). The reason not much can be done is that all users of the machine, other than me, use nothing but MATLAB and highly specialized scientific software, all of which execute from and on the data being backed up. There isn't any predictability as to what exactly will be used during a given data process; the only certainty is that if someone besides me has something running, then it's altering the data being backed up.
Kevin Barry
Quote:
...The latest bug-fixed version of Clonezilla has recently been released...
And it transpires it's actually a 'disimprovement' over the versions released earlier this year! Please forgive me quoting myself, but it HAS to be said. I tried it out a few hours ago and it really IS a bloody awful release. I'm going back to the version I downloaded on the 31st of Jan, which unlike the current release, does actually work rather well (if you uncheck its ludicrous default settings and re-jig its equally stoopid default priorities).
Quote:
Originally Posted by ta0kira
I'm talking about the psychological phenomenon where members of a group appear to be acting as such. [...] I strongly believe that analogy can never be evidence; only clarification.
My my, what an analogy, lol
Well, in the case of the old lady I certainly agree, silence must be agreement or allowance. But, that is just about the most extreme case you could pick. In the case of software or argument, I think it's less extreme, and in such cases silence can mean indifference and even defiance.
I think it depends a lot on context. For example, if in a meeting the boss says at the end of his speech, "So, does everyone agree?" or "Are we good on that?" and is met with complete silence and people staring at him, he would be a fool to think it means agreement.
Quote:
I think it depends a lot on context, for example if it was in a meeting and the boss would say at the end of his speech: "So, does everyone agree ?" or "Are we good on that ?" and is followed by complete silence with people staring at him, he would be a fool to think it means agreement.
Lack of disagreement is easily exploited, as "the boss" well knows. If you keep doing your job, it doesn't matter if deep down you disagree.
Kevin Barry
Quote:
Originally Posted by ta0kira
The backup process is regulated by the network and protocol speeds to the point where it doesn't really effect processor or memory usage (2.4GHz Xeon quad, 4GB RAM.)
Upgrade some critical network connections? E.g., put a gigabit switch between the server with the data and the backup server, or move the two onto the same server.
I have a situation where I've rsynced data stores between servers in two buildings, and one of those servers also happens to be the backup server for that building. So it's a snap to back up the data from that server and exclude the other one from the other building's backup.
We're in the process of upgrading the network backbone in the other building to gigabit, so it won't be as big a problem for things that have to be backed up within that building.
Quote:
Originally Posted by ta0kira
There isn't any predictability as to what exactly will be used during a given data process; the only certainty is that if someone besides me has something running then it's altering the data being backed up.
Snapshots? On Solaris, I've always used fssnap to do snapshots before doing a ufsdump. I'm now moving to ZFS, which, so far, has been both amazingly cool and amazingly easy.
There are solutions for snapshots on Linux as well, though it might require some retooling of your data stores. The prime example would be LVM, though you could also get ZFS on Linux.
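To make the LVM suggestion concrete, here is a sketch of the snapshot-then-archive cycle (the volume group, LV name, snapshot size, and mount point are all hypothetical):

```shell
# Hypothetical sketch: freeze a consistent view of a busy filesystem
# with an LVM copy-on-write snapshot, archive from the frozen view,
# then drop the snapshot. Users keep writing to the live volume.
snapshot_backup() {
    vg="$1"; lv="$2"; dest="$3"
    snapdir="${SNAPDIR:-/mnt/snap}"
    # The -L size only has to hold blocks changed while the backup runs.
    lvcreate -L 10G -s -n "${lv}_snap" "/dev/$vg/$lv" || return 1
    mkdir -p "$snapdir"
    mount -o ro "/dev/$vg/${lv}_snap" "$snapdir" || return 1
    tar -cpzf "$dest" -C "$snapdir" .
    umount "$snapdir"
    lvremove -f "/dev/$vg/${lv}_snap"
}
```

The backup then sees one instant in time even while MATLAB jobs keep modifying the live data.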