huge data archival
Our company has a requirement to archive all MySQL data and binlogs indefinitely, so we can do point-in-time recovery to any moment. Our compressed data and binlogs are approximately 300GB per day, or on the order of 100TB per year. This data will be archived in 2 separate locations: one in our current datacenter (an east coast colo) and one in another colo very far away, such as somewhere on the west coast.
If hard drives are used, we'd mirror pairs of drives, and JBOD those mirrors. The data is absolutely critical - our company's existence and a couple hundred jobs all depend on not losing 1 single byte, *EVER*. On 4TB SATA drives, that'd be 100 drives per year, plus the chassis to put them in. Add colo costs, bandwidth, etc., and we're looking at something like $100k to get started, and another $100k each year. Is there a more cost-efficient, equally safe method of archiving huge amounts of data like this?
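For anyone checking the drive count, here is a rough sketch of the arithmetic in Python. It assumes decimal units (1TB = 1000GB), 2-way mirroring, and 2 sites as described in the post; the function name `drives_per_year` and its parameters are made up for illustration, and it ignores filesystem overhead and spare drives.

```python
import math

def drives_per_year(gb_per_day, drive_tb=4, mirror_copies=2, sites=2, days=365):
    """Return (TB archived per year, drives needed per year across all sites)."""
    # Decimal units, the way drive vendors quote capacity.
    tb_per_year = gb_per_day * days / 1000
    # Every byte is stored once per mirror copy at each site.
    total_tb = tb_per_year * mirror_copies * sites
    return tb_per_year, math.ceil(total_tb / drive_tb)

tb, drives = drives_per_year(300)
print(f"{tb:.1f} TB/year, {drives} drives/year")
```

At exactly 300GB per day this comes out slightly above the rounded figures in the post (which start from 100TB per year), so the 100-drive estimate is in the right range.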
For that much data on corporate servers, in a somewhat large company, I would NOT trust what anyone here has to say. That is what paid and INSURED/bonded consultants are for.
I would look into something along the lines of Data Domain or something similar. We are using that where I work for DB backups. The Data Domain takes the backups uncompressed to an NFS share. It then deduplicates and compresses the data.
The Data Domain will also replicate. I don't know what the cost is, but it can't be cheap. Just offering up something to look at. I am not sure what other options are out there at the scale you are talking about.
Even with Sarbanes-Oxley rules in place, you only have to keep data for 7 years, and there are guidelines on WHAT data you need to keep. Add into the equation that old data is just that (old)...how often are you realistically going to need to access it?

And you don't mention a timeline for data access either. Keeping ALL the data online and accessible 24/7/365 (disaster-recovery center, manned, with equipment) is far different than having it offline and accessible next-business-day (data on tape in a vault w/courier service), or having it nearline, and being able to access it in a short time (robotic tape system w/indexer). And you also need to identify WHAT you need to bring back...if you only need one or two records from those databases, that's far different than bringing back a whole MySQL cluster, and that will modify your storage requirements as well.

To muddy the waters further, you also need to take personnel into account; if you have an IT staff of 25, and can dedicate someone to managing your backups/restorations, that's fine...if it's just you, you'll either need to hire someone to assist, or you need to spend the big $$$ on something VERY automated, with lots of hardware/software behind the scenes. It's a complex issue, with no easy answer. It's going to cost you big no matter what.

Last time I had to lay something like this out, I wound up doing an entire SAN replication via BCV snapshots to the disaster recovery site over an OC-12 line. From there, we had point-in-time backups set up to shovel things onto LTO tapes, and keep them rotated/filled. The DR site was hosted at an IBM facility, which provided staff to do physical tasks (like shoveling tapes in/out of the library). It was VERY expensive...but by setting realistic goals, we had everything needed to go back as far as we needed to, with minimal impact on the local admins.
When business-unit owners would complain that they wanted full backups every day, we'd show them the bottom-line cost of tapes/disk/SAN and BCV space/Data Domain space, and we had VERY few takers.
I'm just trying to figure out if there's a good tape-drive solution, or even if tapes are as reliable. I've never been a fan and have never used them in the past. Are they easy to damage? For the $$, can I get 10x as much storage in a tape carousel? Or is there a good hosted cloud storage solution with that much capacity? Might be worth it, I don't know; that's why I'm asking and investigating options.
Also, you need to determine how quickly they need their data, and what the expectations are. The difference in cost between offline/nearline/online storage is MASSIVE. And if all you're backing up is MySQL data, you need to have hot-backup agents running, or take DB dumps...then consider how you'll RESTORE those files. Databases aren't just files/data...you need to import these things back to the DB server. And in doing so, are you going to overwrite data in the CURRENT database? Modify things? Do you need to restore to a recovery server, to pull off what they want?
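To make the restore question a bit more concrete, here is a minimal sketch of how you might work out which archived pieces a point-in-time restore needs: the newest full backup taken at or before the target time, plus every binlog that overlaps the window between that backup and the target. The catalog format, file names, and the `plan_restore` function are all hypothetical; a real setup would more likely record the exact binlog file and position with each backup instead of relying on timestamps.

```python
from datetime import datetime

def plan_restore(full_backups, binlogs, target):
    """Pick the base backup and the binlogs needed to reach `target`.

    full_backups: list of (taken_at, name) tuples (hypothetical catalog).
    binlogs: list of (first_event_at, name) tuples (hypothetical catalog).
    """
    candidates = [b for b in full_backups if b[0] <= target]
    if not candidates:
        raise ValueError("no full backup at or before the target time")
    base_time, base_name = max(candidates)

    logs = sorted(binlogs)
    needed = []
    for i, (start, name) in enumerate(logs):
        # A binlog is assumed to end where the next one begins.
        end = logs[i + 1][0] if i + 1 < len(logs) else None
        if start > target:
            break  # this and all later binlogs begin after the target
        if end is not None and end <= base_time:
            continue  # entirely older than the base backup
        needed.append(name)
    return base_name, needed

backups = [(datetime(2013, 1, 1), "full-0101"), (datetime(2013, 1, 8), "full-0108")]
logs = [
    (datetime(2013, 1, 6), "bin.000101"),
    (datetime(2013, 1, 7, 12), "bin.000102"),
    (datetime(2013, 1, 9), "bin.000103"),
    (datetime(2013, 1, 10), "bin.000104"),
]
print(plan_restore(backups, logs, datetime(2013, 1, 9, 15)))
```

The selected binlogs would then be replayed onto a recovery server up to the target time, for example with `mysqlbinlog` and its `--stop-datetime` option. Restoring onto a separate server avoids overwriting the CURRENT database.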