LinuxQuestions.org


deathsfriend99 10-16-2014 08:39 AM

I need a new backup plan. Suggestions?
 
I currently have 2 file servers with about 20TB of data total on a Dell MD3600i. I expect this to grow to 30TB in the next year or two.

I am currently backing up incrementally with an rsync script to a few backup servers outside the datacenter, each with a JBOD (8x 2TB disks) attached, running software RAID on CentOS 6.x.

My problem is that these disks are somewhat unreliable, and I am having to replace them pretty often. On top of that, the RAID array keeps getting corrupted. I have to fsck the array every month or so.

I tried an online cloud backup service (CrashPlan), which offers unlimited space, but they throttle uploads, so six months later the initial backup still hasn't finished even though I have a 1Gb connection. Typical transfer speeds to CrashPlan have been 1-2Mbps.

I'm looking for fairly inexpensive (hopefully offsite) backup solutions that work well and are low maintenance. Is my JBOD really the best solution?

TB0ne 10-16-2014 08:59 AM

Quote:

Originally Posted by deathsfriend99 (Post 5254581)
I currently have 2 file servers with about 20TB of data total on a Dell MD3600i. I expect this to grow to 30TB in the next year or two.

I am currently backing up incrementally with an rsync script to a few backup servers outside the datacenter, each with a JBOD (8x 2TB disks) attached, running software RAID on CentOS 6.x.

Software RAID is ALWAYS going to be much slower than hardware RAID. And you're not really doing a backup...you're making a copy, and there IS a difference. A 'real' backup program does versioning, media rotation, and a host of other things. You need to check out programs like Bacula or Zmanda.
Quote:

My problem is that these disks are somewhat unreliable, and I am having to replace them pretty often. On top of that, the RAID array keeps getting corrupted. I have to fsck the array every month or so.
You shouldn't have to fsck things unless the system isn't being shut down properly. And disks should really last for years, so you may have larger issues. Power problems? Poor on-site admins at the data center who aren't doing things properly?
Quote:

I tried an online cloud backup service (CrashPlan), which offers unlimited space, but they throttle uploads, so six months later the initial backup still hasn't finished even though I have a 1Gb connection. Typical transfer speeds to CrashPlan have been 1-2Mbps.
First, check your connection. Here, it isn't unheard of to have a 60MB/s download speed but only a 2-4MB/s UPload speed. Your ISP connection may be the same, so that's the first thing to check. It could be fixed with a phone call, and they may be able to change your ratios on the fly.
Quote:

I'm looking for fairly inexpensive (hopefully offsite) backup solutions that work well and are low maintenance. Is my JBOD really the best solution?
No, it isn't. 30TB isn't that much data, really; one of my clients currently has over 4 PETABYTES of data on both disk (SAN/DDR for the current versions of files) and LTO tapes (older versions, and full backups that need to be kept for a long time). Since you solicited opinions, I'd suggest spending the $$$ on a decent LTO drive. A single LTO6 tape holds 2.5 TB natively, at about $40 per tape. Multiple versions and long-term storage are MUCH easier at those prices. Need offsite? Toss the tapes into a safe-deposit box, or use any commercial offsite storage service.
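
As a rough sanity check on the tape math (native capacity assumed, no compression):

Code:

30 TB / 2.5 TB per LTO6 tape = 12 tapes per full backup
12 tapes x $40               = ~$480 in media per full set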

You CAN go the hard drive route, but you will quickly spend a LOT more money. If you want to do long term archival, or keep multiple versions of a file, it quickly adds up, and you hit a limit on how many disks you can attach using JBOD/SATA type things. If you go the SAN route, then you will need a shedload of $$$ to make it work.

TenTenths 10-16-2014 09:17 AM

Are you using "consumer" grade disks in your backup server or are they from a known server manufacturer?

If you've got the budget available, I'd ditch the software RAID and get a hardware RAID card that supports RAID6, with an on-board battery for the write cache; that way you mitigate the loss of a single disk fairly easily, and you can have a hot-spare drive in the server. I'd also consider using an LTO6 tape autoloader to do a weekly rotating off-site backup, plus a monthly backup with a 1-year retention. A lot depends on the criticality of the data.

deathsfriend99 10-16-2014 09:45 AM

Thanks for the suggestions so far. I hadn't thought of an LTO drive. I'll look into that.
Let me clarify some things.

My backups to my offsite JBOD are slightly more sophisticated than a simple copy. I keep a version history of up to 6 versions, then drop the oldest. The rsync is done with an update option, and the backups are then shuffled so that older versions are kept. The data are mostly scientific in nature and not business critical, so it isn't essential that I keep version history long term. I only do this in case someone accidentally deletes or overwrites something that they need.
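
(For anyone curious, the rotation is roughly along the lines of the sketch below; the paths, retention count, and the --link-dest detail are illustrative, not my exact script.)

Code:

#!/bin/bash
# Keep 6 rotated snapshots; unchanged files become hard links to the previous run.
DEST=/backup/fileserver

rm -rf "$DEST/backup.6"                     # drop the oldest copy
for i in 5 4 3 2 1 0; do
    [ -d "$DEST/backup.$i" ] && mv "$DEST/backup.$i" "$DEST/backup.$((i+1))"
done

rsync -a --delete --link-dest="$DEST/backup.1" \
    user@fileserver:/export/data/ "$DEST/backup.0/"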

I agree that I shouldn't have to be doing fsck's on these arrays. There is certainly some underlying issue. It could be the SATA controller/port multiplier that came with these JBODs. They certainly weren't meant for this sort of abuse (http://www.newegg.com/Product/Produc...016R-_-Product), but it was what I was stuck with when I took this job. The systems are always on and have never lost power in the 3+ years they've been in use.

As for CrashPlan, I've read tons of complaints that they slow down your uploads. A couple of speed tests from my server show:
Code:

Retrieving speedtest.net configuration...
Retrieving speedtest.net server list...
Testing from Me (999.999.999.999)...
Selecting best server based on latency...
Hosted by AT&T (Atlanta, GA) [0.40 km]: 4.22 ms
Testing download speed........................................
Download: 284.47 Mbits/s
Testing upload speed..................................................
Upload: 235.02 Mbits/s


Yes, these are consumer grade disks in the JBOD. Simple Seagate 2TB SATA drives. This backup was done on the cheap for sure.

suicidaleggroll 10-16-2014 09:55 AM

Quote:

Originally Posted by deathsfriend99 (Post 5254616)
Yes, these are consumer grade disks in the JBOD. Simple Seagate 2TB SATA drives. This backup was done on the cheap for sure.

As long as they're cooled adequately they should still last a very, very long time. The only time I've experienced rapid hard disk failure is when there was no cooling and the drives were running at over 120 F regularly. Keep them under 90 F and they should last for many years, even the regular consumer drives.

I have a cron job on my systems that probes the hard disk temperature every 10 minutes. If any drive in the system is over 104 F (40 C) it emails a warning. If any drive in the system is over 113 F (45 C) it immediately shuts down the server. Out of ~100 HDDs that are up 24/7, I typically lose one every ~2 years, and these are just your regular consumer grade 7200 RPM drives. They're all running hardware RAID 6 or 60, and I have a few backup drives in a drawer so that when one fails I just swap it out, the RAID automatically rebuilds, and everything keeps ticking.
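
The check itself is nothing fancy; a stripped-down sketch of that kind of cron job, assuming smartmontools is installed and the drives report a Temperature_Celsius attribute (the mail address is obviously a placeholder):

Code:

#!/bin/bash
# Run from cron every 10 minutes: warn at 40 C, shut down at 45 C.
WARN=40
CRIT=45
for dev in /dev/sd?; do
    temp=$(smartctl -A "$dev" | awk '/Temperature_Celsius/ {print $10}')
    [ -z "$temp" ] && continue
    if [ "$temp" -ge "$CRIT" ]; then
        echo "$dev is at ${temp}C, shutting down" | mail -s "HDD overtemp" admin@example.com
        /sbin/shutdown -h now
    elif [ "$temp" -ge "$WARN" ]; then
        echo "$dev is at ${temp}C" | mail -s "HDD temperature warning" admin@example.com
    fi
done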

TenTenths 10-16-2014 10:03 AM

HP DL380 chassis with the additional 8-drive cage, 16 of these (1.2TB 5Gb SAS drives), 2 mirrored for the OS and 14 in a RAID6 for your 14TB of storage, sorted!

deathsfriend99 10-16-2014 10:10 AM

The JBODs currently reside in my office. It's pretty cool in here year round (a little too cool, as I sit in here with a jacket on at the moment). A quick check shows the HDs at about 84F.

suicidaleggroll 10-16-2014 10:17 AM

Then I don't think your problem is the HDs; it must be the JBOD. It is not normal for drives, even consumer drives, to give out that often under normal conditions, and it's definitely not normal to have to fsck an 8-drive RAID array every month (provided there aren't power issues).

suicidaleggroll 10-16-2014 10:21 AM

Quote:

Originally Posted by TenTenths (Post 5254625)
HP DL380 chassis with the additional 8-drive cage, 16 of these (1.2TB 5Gb SAS drives), 2 mirrored for the OS and 14 in a RAID6 for your 14TB of storage, sorted!

Good lord...$18k for 14 TB? I get that it's using enterprise SAS drives, but that's insanely excessive for a backup solution. If it were a mission-critical system then sure, but not a backup. For $18k you could build TWO 24-drive 80 TB hardware RAID 60 systems with extra drives out the wazoo. Back up to both, keep them mirrored, do whatever you like, swap out the occasional failed drive once every 2-3 years, and be on your merry way.
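
Rough math behind that 80 TB figure (the 4 TB drive size is my assumption for illustration):

Code:

24 drives in RAID 60 = two 12-drive RAID6 spans
(12 - 2) x 2         = 20 data drives
20 x 4 TB            = ~80 TB usable per system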

Nogitsune 10-16-2014 02:20 PM

Quote:

Originally Posted by deathsfriend99 (Post 5254616)

I agree that I shouldn't have to be doing fsck's on these arrays. There is certainly some underlying issue. It could be the SATA controller/port multiplier that came with these JBODs. They certainly weren't meant for this sort of abuse (http://www.newegg.com/Product/Produc...016R-_-Product), but it was what I was stuck with when I took this job. The systems are always on and have never lost power in the 3+ years they've been in use.

I ran a Linux (Gentoo) file server for a few years with a RAID6 array. It was old hardware built from scraps, and the motherboard had too few SATA ports. I tossed in a fuj:tech dual-port SATA II controller that uses the Sil3132 chip and, from what I can tell, a port multiplier. After some extended use, it would always trip off one or the other of the SATA disks connected to it, which basically showed up as a failed disk in the RAID. As long as I ran it with only one drive, everything was peachy and rock solid. So eventually I just bought another similar controller and installed them both (I had 2 available PCI-E slots), and with one disk on each, everything was fine.

Later, when I upgraded the kernel, I noticed there was an option to specifically support SATA port multipliers. I enabled it and afterwards tested running two drives on a single card; it did seem stable, but I never ran any extended tests, so I couldn't say for sure.

If you're running a card with a drive in both ports, then I'd suggest, if at all possible, trying to use only one port, and/or checking your kernel options to make sure port multiplier support is enabled (Device Drivers -> Serial ATA and Parallel ATA drivers -> SATA Port Multiplier support).
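
On a distro kernel you can usually check this without recompiling; in mainline the option is called CONFIG_SATA_PMP (assuming your distro ships its kernel config under /boot):

Code:

grep SATA_PMP /boot/config-$(uname -r)
# CONFIG_SATA_PMP=y means port multiplier support is built in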

As for backups, for my needs it suffices to run a Linux box with a motherboard that has 10 SATA ports. There are 3 PCI-E slots, so I could run up to 3 of those cards in it; assuming both ports are stable with the newer kernel, that's up to 16 SATA drives. I use 3TB drives (WD Caviar Green); in RAID6 that would be up to 42TB of space (minus the overhead, and the 1000 vs. 1024 difference). For versioning I use BTRFS snapshots. It's not an enterprise-class solution overall, but it suits me fine, and it's not very spendy.
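
The snapshot side is only a couple of commands per backup run; a minimal sketch (mount points and naming scheme are made up for illustration):

Code:

# After each rsync run, freeze the current state as a read-only snapshot
btrfs subvolume snapshot -r /backup/data /backup/snapshots/data-$(date +%Y%m%d)

# Drop an old snapshot once it's no longer needed
btrfs subvolume delete /backup/snapshots/data-20140801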

deathsfriend99 10-16-2014 02:39 PM

Quote:

Originally Posted by Nogitsune (Post 5254767)
I ran a Linux (Gentoo) file server for a few years with a RAID6 array. It was old hardware built from scraps, and the motherboard had too few SATA ports. I tossed in a fuj:tech dual-port SATA II controller that uses the Sil3132 chip and, from what I can tell, a port multiplier. After some extended use, it would always trip off one or the other of the SATA disks connected to it, which basically showed up as a failed disk in the RAID. As long as I ran it with only one drive, everything was peachy and rock solid. So eventually I just bought another similar controller and installed them both (I had 2 available PCI-E slots), and with one disk on each, everything was fine.

Later, when I upgraded the kernel, I noticed there was an option to specifically support SATA port multipliers. I enabled it and afterwards tested running two drives on a single card; it did seem stable, but I never ran any extended tests, so I couldn't say for sure.

If you're running a card with a drive in both ports, then I'd suggest, if at all possible, trying to use only one port, and/or checking your kernel options to make sure port multiplier support is enabled (Device Drivers -> Serial ATA and Parallel ATA drivers -> SATA Port Multiplier support).

As for backups, for my needs it suffices to run a Linux box with a motherboard that has 10 SATA ports. There are 3 PCI-E slots, so I could run up to 3 of those cards in it; assuming both ports are stable with the newer kernel, that's up to 16 SATA drives. I use 3TB drives (WD Caviar Green); in RAID6 that would be up to 42TB of space (minus the overhead, and the 1000 vs. 1024 difference). For versioning I use BTRFS snapshots. It's not an enterprise-class solution overall, but it suits me fine, and it's not very spendy.


That IS interesting. It sounds like the behavior I have been seeing. Sometimes it kicks a drive out; I pull the drive, run a Seagate diagnostic on it, and it comes up good. I slap it back in and it works for a while, then another drive fails.

Sometimes the array will just go read-only until I unmount it and run fsck.

Perhaps it's a port multiplier issue.
I was incorrect about the version of CentOS. These are running 5.7. Maybe I'll upgrade them to 6 and see what happens. I'd hate to go to CentOS 7. It's so awful!

Nogitsune 10-17-2014 01:24 AM

Quote:

Originally Posted by deathsfriend99 (Post 5254772)
That IS interesting. It sounds like the behavior I have been seeing. Sometimes it kicks a drive out; I pull the drive, run a Seagate diagnostic on it, and it comes up good. I slap it back in and it works for a while, then another drive fails.

Sometimes the array will just go read-only until I unmount it and run fsck.

Perhaps it's a port multiplier issue.
I was incorrect about the version of CentOS. These are running 5.7. Maybe I'll upgrade them to 6 and see what happens. I'd hate to go to CentOS 7. It's so awful!

If that's the problem, then it's just a matter of having a recent enough kernel with the needed options compiled in. If you can compile the kernel from source yourself, that would be my first stop. Well, that, or preferably just running the disks on only one port of the card/multiplier - if it stays stable like that, then you know where the problem is. One thing to keep an eye out for: when a disk fails, is it consistently one of the disks attached to a port on the multiplier?
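
A quick way to answer that is to map the drives to their controllers and watch what the kernel logs when one drops out (the device names below are just examples):

Code:

# Which physical controller/port each disk hangs off
ls -l /dev/disk/by-path/ | grep -v part

# Which member the md array kicked out, and the ATA errors around it
mdadm --detail /dev/md0
dmesg | grep -iE 'ata[0-9]+' | tail -50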

Quote:

Originally Posted by TenTenths (Post 5254625)
HP DL380 chassis with the additional 8-drive cage, 16 of these (1.2TB 5Gb SAS drives), 2 mirrored for the OS and 14 in a RAID6 for your 14TB of storage, sorted!

No, not sorted at all. We're talking about roughly 20TB of data that's expected to grow to 30TB. For backup you'd need a minimum of 40TB to allow for growth and some versioning, so you'd need 3 stations like that. And if you're going with that kind of hardware, you're presumably dealing with something so critical that you'll want either RAID 60 or two separate systems, so make that 6 stations. I didn't look at the cost, but based on the previous post it's $18k per system, so you're looking at about a $100,000 price tag.
...
Rejected. :tisk:

SCSIraidGURU 10-17-2014 11:44 AM

RAID is not a backup solution; it is a redundancy solution. 30TB requires a D2D (disk-to-disk) solution or a tape library solution. 30TB with disk deduplication requires a 10Gbps pipe to the D2D SAN for performance. Deduplication is copying block-level changes to your D2D system. You also need a powerful SAN for performance and reliability. You don't want software RAID on 30TB. You want an EMC or HP SAN with hardware controllers and 1GB of controller cache. A 30TB SAN and layer 3 10Gbps switches will set you back $150K+. My question is: why do you have 30TB of data? My data center with 30+ databases is 9TB. Our Exchange server with every e-mail/attachment for 20 years, plus 30 databases with 15+ years of data, is 9TB.

We have an HP C7000 blade center, HP 5800 series layer 3 switches, and an HP P2000 G3 SAN, with Microsoft Datacenter licensing, VMware, Citrix, etc. It only cost $260,000 to start. A 30TB EMC D2D system is over $500,000. I would not do this with open source. You need enterprise hardware.

suicidaleggroll 10-17-2014 12:19 PM

Quote:

Originally Posted by SCSIraidGURU (Post 5255145)
RAID is not a backup solution; it is a redundancy solution. 30TB requires a D2D (disk-to-disk) solution or a tape library solution. 30TB with disk deduplication requires a 10Gbps pipe to the D2D SAN for performance. Deduplication is copying block-level changes to your D2D system. You also need a powerful SAN for performance and reliability. You don't want software RAID on 30TB. You want an EMC or HP SAN with hardware controllers and 1GB of controller cache. A 30TB SAN and layer 3 10Gbps switches will set you back $150K+. My question is: why do you have 30TB of data? My data center with 30+ databases is 9TB. Our Exchange server with every e-mail/attachment for 20 years, plus 30 databases with 15+ years of data, is 9TB.

We have an HP C7000 blade center, HP 5800 series layer 3 switches, and an HP P2000 G3 SAN, with Microsoft Datacenter licensing, VMware, Citrix, etc. It only cost $260,000 to start. A 30TB EMC D2D system is over $500,000. I would not do this with open source. You need enterprise hardware.

Nobody said RAID itself is a backup solution; as you said, it's for redundancy. However, in this case the RAID array itself IS the backup (the backup is not the RAID's parity; the entire RAID machine is, itself, a backup for a completely different system). There is one primary machine with 30 TB, and a separate machine for backups that happens to be running a RAID.

Your solution is so ridiculously overkill that I don't even know where to begin. You're listing "requirements" and "needs" as if you have any idea what his needs actually are (other than the amount of storage required), and are then pulling $100k+ pieces of equipment out of the woodwork as if it's a necessity when it's clearly not.

Why does it matter why he has 30 TB of data? I currently have over 70 TB of data, with 150 TB of capacity (~80 TB free). Not everybody is in the same line of work as you.

I wouldn't begin to assume I know anything about setting up a datacenter for mission-critical applications at 40+ Gb/s bandwidths. Similarly, somebody in that field shouldn't assume that every single system ever built, regardless of the application, needs to conform to those standards. Believe it or not, there ARE people out there who simply need a system to reliably back up tens or hundreds of TB at a leisurely pace of 1 or 10 GbE, and it's not the end of the world if it goes down for 5 minutes for a RAID card change or a kernel update. There's absolutely no need to spend $500k on a system to do this when a $5k system will do it just fine, albeit with an occasional drive swap every couple of years.

TB0ne 10-17-2014 02:03 PM

Quote:

Originally Posted by suicidaleggroll (Post 5255163)
Nobody said RAID itself is a backup solution; as you said, it's for redundancy. However, in this case the RAID array itself IS the backup (the backup is not the RAID's parity; the entire RAID machine is, itself, a backup for a completely different system). There is one primary machine with 30 TB, and a separate machine for backups that happens to be running a RAID.

Your solution is so ridiculously overkill that I don't even know where to begin. You're listing "requirements" and "needs" as if you have any idea what his needs actually are (other than the amount of storage required), and are then pulling $100k+ pieces of equipment out of the woodwork as if it's a necessity when it's clearly not. Why does it matter why he has 30 TB of data? I currently have over 70 TB of data, with 150 TB of capacity (~80 TB free). Not everybody is in the same line of work as you.

I wouldn't begin to assume I know anything about setting up a datacenter for mission-critical applications at 40+ Gb/s bandwidths. Similarly, somebody in that field shouldn't assume that every single system ever built, regardless of the application, needs to conform to those standards. Believe it or not, there ARE people out there who simply need a system to reliably back up tens or hundreds of TB at a leisurely pace of 1 or 10 GbE, and it's not the end of the world if it goes down for 5 minutes for a RAID card change or a kernel update. There's absolutely no need to spend $500k on a system to do this when a $5k system will do it just fine, albeit with an occasional drive swap every couple of years.

Agree with the sentiment totally...reminds me of the saying, "When all you have is a hammer, every problem looks like a nail".

I tend towards the 'proper' solution myself, and usually have to reel myself back in quite a bit. Yes, it's *TECHNICALLY* a better solution, but overkill for what's needed. I still think an LTO drive with a decent backup package gives the most bang for the buck here, but the term 'backup' is a VERY hairy one. Between the number of versions needed, how quickly you need to access those files, and how often a full backup is done vs. an incremental, you have a full-time job just managing it. And all that goes into the mix before you even get STARTED figuring out how much it costs....yes, keeping 10 versions of every file, a full backup each week and month, along with nightly incrementals and a seven-year Sarbanes-Oxley retention plan IS the best way...but all those resources cost $$$. It's amazing how many department heads go from "We need xxx versions of EVERYTHING!!" to "Yes, two are fine" once they see the cost to their department. :)

OP, if you've been handed this system and need to replace it, I'd suggest the first and best step you can take is to meet with the heads of each of your departments and have an honest conversation about what their needs are. If THEY are OK with waiting until the next day to get a tape back from a vault, that's far different from them needing it *NOW*, and it affects your costs (and system complexity). Plan for something that can grow, too...not only from a resource standpoint, but also from a system standpoint. While you may not start out backing up desktop systems, that MAY happen in the future, and do you really want to migrate to a different platform/system/media to accommodate things in a year or two? I'd advise getting rid of the rsync setup in favor of something database-driven like Bacula. And don't be afraid to spend money on something commercial if it does a better job and meets your needs better. If your entire business is down, how much will you lose? Weigh that against the $$$ spent on decent backups.

As an analogy, I remember dealing with a department years ago. Their loan system ran on an old version of HP/UX, and the vendor kept telling them to upgrade, and go to Solaris. I talked with the vendor, got recommendations, prices, etc., and came up with a quote of $26k for TWO systems (they only had ONE old one), with the latest of everything, RAID drives, the works. The department head practically coughed up his skull, and said it was too much, and they were fine. Until a few months later, when their old system croaked, with parts they didn't even MAKE anymore, on a system that was unsupported. They were down for over a week, with things getting overnighted in, etc., and LOADS of overnight work, too. That week of downtime cost the company over $3 MILLION in revenue...because they didn't want to spend $26k. Your bosses may gag a bit at the in-the-door price, but it's cheap insurance in the long run.

