I need a new backup plan. Suggestions?
I currently have two file servers with about 20 TB of data total on a Dell MD3600i. I expect this to grow to 30 TB in the next year or two.
I am currently backing up incrementally with an rsync script to a few backup servers outside the datacenter. Each has a JBOD (8 x 2 TB disks) attached and runs software RAID on CentOS 6.x. My problem is that these disks are somewhat unreliable, and I am having to replace them pretty often. On top of that, the RAID array keeps getting corrupted; I have to fsck the array every month or so.

I tried an online cloud backup (CrashPlan), which offers unlimited space, but they throttle uploads, so six months later the initial backup still hasn't finished even though I have a 1 Gb/s connection. Typical transfer speeds to CrashPlan have been 1-2 Mbps.

I'm looking for a fairly inexpensive (ideally offsite) backup solution that works well and is low maintenance. Is my JBOD really the best option?
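For reference, a nightly incremental pull like the one described might look like this minimal sketch. The host name `fileserver` and both paths are hypothetical placeholders, not the poster's actual setup, and it assumes passwordless SSH to the file server:

```shell
#!/bin/sh
# Minimal incremental-pull sketch. "fileserver" and both paths are
# hypothetical placeholders; SRC/DEST can be overridden from the environment.
SRC="${SRC:-fileserver:/srv/data/}"
DEST="${DEST:-/backup/data/}"

backup() {
    # -a: preserve perms/owners/times   -H: preserve hard links
    # --delete: mirror deletions        --partial: resume large files
    rsync -aH --delete --partial "$1" "$2"
}

# Guarded so the function can be sourced or tested without touching
# the network; set RUN_BACKUP=1 in the cron job to actually run it.
if [ "${RUN_BACKUP:-0}" = 1 ]; then
    backup "$SRC" "$DEST"
fi
```

Run from cron nightly, e.g. `0 2 * * * RUN_BACKUP=1 /usr/local/sbin/pull-backup`.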
You CAN go the hard drive route, but you will quickly spend a LOT more money. If you want long-term archival, or to keep multiple versions of a file, it adds up quickly, and you hit a limit on how many disks you can attach with JBOD/SATA-type setups. If you go the SAN route, you will need a shedload of $$$ to make it work.
Are you using "consumer" grade disks in your backup server or are they from a known server manufacturer?
If you've the budget available, I'd ditch the software RAID and get a hardware RAID card that supports RAID 6, with an on-board battery for the write cache. That way you mitigate the loss of a single disk fairly easily, and you can keep a hot-spare drive in the server. I'd also consider an LTO-6 tape autoloader for a weekly rotating off-site backup, plus a monthly set with one-year retention. A lot depends on the criticality of the data.
Thanks for the suggestions so far. I hadn't thought of an LTO drive; I'll look into that.
Let me clarify some things. My backups to my offsite JBOD are slightly more sophisticated than a simple copy. I keep up to 6 versions, then dump the oldest. The rsync is done with the update option, and the backups are then shuffled so that older versions are kept. The data are mostly scientific in nature and not business critical, so long-term version history isn't essential; I only keep it in case someone accidentally deletes or overwrites something they need.

I agree that I shouldn't have to be running fsck on these arrays; there is certainly some underlying issue. It could be the SATA controller/port multiplier that came with these JBODs. They certainly weren't meant for this sort of abuse (http://www.newegg.com/Product/Produc...016R-_-Product), but it's what I was stuck with when I took this job. The systems are always on and have never lost power in the 3+ years they've been in use.

As for CrashPlan, I've read tons of complaints that they slow down your uploads. A couple of speed tests from my server show:
Code:
Retrieving speedtest.net configuration...

Yes, these are consumer-grade disks in the JBOD: plain Seagate 2 TB SATA drives. This backup was done on the cheap, for sure.
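The "shuffle so older versions are kept" step could be sketched roughly like this. The `data.N` layout and paths are hypothetical, a guess at the scheme described (newest copy in `data.1`, oldest in `data.6`):

```shell
#!/bin/sh
# Sketch of a "keep 6 versions, drop the oldest" rotation.
# data.1 is the newest snapshot, data.6 the oldest; paths are hypothetical.
KEEP=6

rotate() {
    # Rotate versions under $1: delete .$KEEP, then shift .N to .N+1
    # so slot .1 is free for the next backup run.
    base="$1"
    rm -rf "$base.$KEEP"
    i=$KEEP
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        [ -e "$base.$prev" ] && mv "$base.$prev" "$base.$i"
        i=$prev
    done
}

# Usage sketch: rotate, then pull a fresh copy into slot 1. With
# --link-dest, unchanged files become hard links into the previous
# run, so each "full" version costs only the changed blocks.
# rotate /backup/data
# rsync -aH --delete --link-dest=/backup/data.2 \
#     fileserver:/srv/data/ /backup/data.1/
```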
I have a cron job on my systems that probes hard disk temperature every 10 minutes. If any drive is over 104 F (40 C), it emails a warning; if any drive is over 113 F (45 C), it immediately shuts the server down. Out of ~100 HDDs running 24/7, I typically lose one every ~2 years, and these are just regular consumer-grade 7200 RPM drives. They all run hardware RAID 6 or 60, and I keep a few spare drives in a drawer, so when one fails I just swap it out, the RAID rebuilds automatically, and everything keeps ticking.
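A watchdog like that could be sketched as below. It assumes smartmontools is installed and that SMART attribute 194 holds the drive temperature, which is typical but not universal; the device glob and mail recipient are assumptions, and `DRY_RUN=1` (the default here) only prints what it would do:

```shell
#!/bin/sh
# Drive-temperature watchdog sketch. Assumes smartmontools and a working
# local `mail`; attribute 194 is the usual temperature field but some
# drives differ. With DRY_RUN=1 (default) actions are only printed.
WARN=40    # Celsius (104 F): mail a warning
HALT=45    # Celsius (113 F): power the box off
DRY_RUN="${DRY_RUN:-1}"

classify() {
    # Print ok / warn / halt for a temperature in Celsius.
    if [ "$1" -ge "$HALT" ]; then echo halt
    elif [ "$1" -ge "$WARN" ]; then echo warn
    else echo ok
    fi
}

for dev in /dev/sd?; do
    [ -b "$dev" ] || continue
    temp=$(smartctl -A "$dev" 2>/dev/null | awk '$1 == 194 {print $10; exit}')
    [ -n "$temp" ] || continue
    action=$(classify "$temp")
    if [ "$DRY_RUN" = 1 ]; then
        echo "$dev ${temp}C -> $action"
    elif [ "$action" = warn ]; then
        echo "$dev at ${temp}C" | mail -s "HDD temperature warning" root
    elif [ "$action" = halt ]; then
        shutdown -h now "HDD $dev overheated at ${temp}C"
    fi
done
```

Scheduled from cron every 10 minutes, e.g. `*/10 * * * * DRY_RUN=0 /usr/local/sbin/hdd-temp-watch`.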
An HP DL380 chassis with the additional 8-drive cage, 16 of these (1.2 TB 6 Gb/s SAS drives), 2 mirrored for the OS and 14 in a RAID 6 for your 14 TB of storage. Sorted!
The JBODs currently reside in my office. It's pretty cool in here year round (a little too cool, as I'm sitting here in a jacket at the moment). A quick check shows the HDs at about 84 F.
Then I don't think your problem is the HDs, it must be the JBOD. It is not normal for drives, even consumer drives, to give out that often under normal conditions, and it's definitely not normal to have to fsck an 8-drive RAID array every month (provided there aren't power issues).
Later, when I upgraded the kernel, I noticed there was an option to specifically support SATA port multipliers. I enabled it and tested running two drives on a single card afterwards, and it did seem stable, but I never ran any extended tests, so I couldn't say for sure. If you're running a card with a drive on both ports, I'd suggest, if at all possible, trying only one port in use, and/or checking your kernel options to make sure port multiplier support is enabled (Device Drivers -> Serial ATA and Parallel ATA drivers -> SATA Port Multiplier support).

As for backups, for my needs it suffices to run a Linux box with a motherboard that has 10 SATA ports. There are 3 PCI-E slots, so I could run up to 3 of those cards on it; assuming both ports are stable with the newer kernel, that's up to 16 SATA drives. I use 3 TB drives (WD Caviar Green); in RAID 6 that would be up to 42 TB of space (minus the overhead, and the 1000* vs 1024* stuff). For versioning I use BTRFS snapshots. It's not an enterprise-class solution overall, but it suits me fine, and it's not very spendy.
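The BTRFS-snapshot versioning mentioned above could look like this sketch. The subvolume and snapshot paths are hypothetical, and it assumes btrfs-progs is installed and the destination lives on a BTRFS filesystem:

```shell
#!/bin/sh
# Sketch of date-stamped read-only BTRFS snapshots for versioning.
# Paths are hypothetical; requires btrfs-progs and a BTRFS filesystem.
SUBVOL="${SUBVOL:-/backup/data}"
SNAPDIR="${SNAPDIR:-/backup/snapshots}"

snap_name() {
    # Deterministic snapshot name from a date string, e.g. 2014-03-01
    echo "data@$1"
}

# Guarded so the helper can be tested without a BTRFS mount; set
# RUN_SNAPSHOT=1 in the cron job that follows each backup run.
if [ "${RUN_SNAPSHOT:-0}" = 1 ]; then
    name=$(snap_name "$(date +%F)")
    # -r makes the snapshot read-only, which is what you want for backups
    btrfs subvolume snapshot -r "$SUBVOL" "$SNAPDIR/$name"
fi
```

Old snapshots can then be pruned with `btrfs subvolume delete` once they age out of the retention window.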
That IS interesting. That sounds like the behavior I have been seeing. Sometimes the array kicks a drive out; I pull it, run a Seagate diagnostic on it, and it comes up good. I slap it back in and it works for a while, then another drive fails. Sometimes the array just goes read-only until I unmount it and run fsck. Perhaps it's a port multiplier issue.

I was incorrect about the version of CentOS: these are running 5.7. Maybe I'll upgrade them to 6 and see what happens. I hate to go to CentOS 7. It's so awful!
... Rejected. :tisk:
RAID is not a backup solution; it is a redundancy solution. 30 TB requires a D2D (disk-to-disk) solution or a tape library. 30 TB with disk deduplication requires a 10 Gbps pipe to the D2D SAN for performance; deduplication copies block-level changes to your D2D system. You also need a powerful SAN for performance and reliability. You don't want software RAID on 30 TB. You want an EMC or HP SAN with hardware controllers and 1 GB of controller cache. A 30 TB SAN and layer 3 10 Gbps switches will set you back $150K+.

My question is: why do you have 30 TB of data? My data center with 30+ databases is 9 TB. Our Exchange server, every e-mail/attachment for 20 years, and 30 databases with 15+ years of data come to 9 TB.
We have an HP C7000 blade center, HP 5800-series layer 3 switches, and an HP P2000 G3 SAN with Microsoft datacenter licensing, VMware, Citrix, etc. It only cost $260,000 to start. A 30 TB EMC D2D system is over $500,000. I would not do this with open source. You need enterprise hardware.
Your solution is so ridiculously overkill that I don't even know where to begin. You're listing "requirements" and "needs" as if you have any idea what his needs actually are (other than the amount of storage required), and then pulling $100k+ pieces of equipment out of the woodwork as if they're a necessity when they're clearly not.

Why does it matter why he has 30 TB of data? I currently have over 70 TB of data, with 150 TB of capacity (~80 TB free). Not everybody is in the same line of work as you. I wouldn't begin to assume I know anything about setting up a datacenter for mission-critical applications at 40+ Gb/s bandwidths. Similarly, somebody in that field shouldn't assume that every single system ever built, regardless of the application, needs to conform to those standards.

Believe it or not, there ARE people out there who simply need a system to reliably back up tens or hundreds of TB at a leisurely pace of 1 or 10 GbE, and it's not the end of the world if it goes down for 5 minutes for a RAID card change or a kernel update. There's absolutely no need to spend $500k on a system to do this when a $5k system will do it just fine, albeit with an occasional drive swap every couple of years.
I tend towards the 'proper' solution myself, and usually have to reel myself back in quite a bit. Yes, it's *TECHNICALLY* a better solution, but overkill for what's needed. I still think the LTO drive with a decent backup package gives the most bang for the buck here, but just the term 'backup' is a VERY hairy one. Between the number of versions needed, how quickly you need to access those files, and how often a full backup is done vs. incremental, you have a full-time job just managing it. And all of that goes into the mix before you even get STARTED figuring out how much it costs. Yes, keeping 10 versions of every file, a full backup each week and month, along with nightly incrementals and a seven-year Sarbanes-Oxley plan, IS the best way... but all those resources cost $$$. Amazing how many department heads go from "We need xxx versions of EVERYTHING!!" to "Yes, two are fine" once they see the cost to their department. :)

OP, if you've been handed this system and need to replace it, I'd suggest the first and best step you can take is to meet with the heads of each of your departments and have an honest conversation about what their needs are. If THEY are OK with waiting until the next day to get a tape back from a vault, that's far different from them needing it *NOW*, and it affects your costs (and system complexity). Plan for something that can grow, too, not only from a resource standpoint but also from a system standpoint. While you may not start out backing up desktop systems, that MAY happen in the future, and do you really want to migrate to a different platform/system/media to accommodate things in a year or two?

I'd advise getting rid of the rsync setup in favor of something database-driven like Bacula. And don't be afraid to spend money on something commercial if it does a better job and meets your needs better. If your entire business is down, how much will you lose? Weigh that against the $$$ spent on decent backups.
As an analogy, I remember dealing with a department years ago. Their loan system ran on an old version of HP/UX, and the vendor kept telling them to upgrade and go to Solaris. I talked with the vendor, got recommendations, prices, etc., and came up with a quote of $26k for TWO systems (they only had ONE old one), with the latest of everything, RAID drives, the works. The department head practically coughed up his skull, said it was too much, and insisted they were fine. Until a few months later, when their old system croaked, with parts they didn't even MAKE anymore, on a system that was unsupported. They were down for over a week, with things getting overnighted in and LOADS of overnight work, too. That week of downtime cost the company over $3 MILLION in revenue... because they didn't want to spend $26k. Your bosses may gag a bit at the in-the-door price, but it's cheap insurance in the long run.