LinuxQuestions.org

suprstar 04-05-2014 02:30 PM

huge data archival
 
Our company has a requirement to archive all MySQL data and bin-logs indefinitely, so we can do point-in-time recovery to any moment. Our compressed data and bin-logs come to approximately 300GB per day, or on the order of 100TB per year. This data will be archived in 2 separate locations: one in our current datacenter (an east coast colo) and one in some other colo very far away, like on the west coast somewhere.

If hard drives are used, we'd mirror pairs of drives, and JBOD those mirrors. The data is absolutely critical - our company's existence, a couple hundred jobs, all depend on not losing 1 single byte, *EVER*.

On 4TB SATA drives, that'd be 100 drives per year, plus the chassis to put them in. Add colo costs, bandwidth, etc., and we're looking at something like $100k to get started, and another $100k each year.
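
The drive count above can be reproduced with a rough sketch. This assumes decimal units (1TB = 1000GB), no filesystem overhead, and no spare drives; the function and parameter names are hypothetical and only illustrate the arithmetic.

```python
import math


def drives_per_year(daily_gb, drive_tb, sites=2, mirror_copies=2):
    """Rough count of drives consumed per year.

    Assumptions (not from the original post): decimal units,
    no filesystem overhead, no hot spares.
    """
    yearly_tb = daily_gb * 365 / 1000
    # Drives needed to hold one unmirrored copy of a year's data.
    drives_one_copy = math.ceil(yearly_tb / drive_tb)
    # Each site holds a mirrored pair for every data drive.
    return yearly_tb, drives_one_copy * mirror_copies * sites


yearly_tb, total = drives_per_year(daily_gb=300, drive_tb=4)
print(f"{yearly_tb:.1f} TB/year, {total} drives/year")
```

At exactly 300GB per day this comes out slightly above the rounded figure of 100 drives, because 300GB per day is closer to 110TB per year than to 100TB.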

Is there a more cost efficient, equally safe method of archiving huge amounts of data like this?

John VV 04-05-2014 05:03 PM

For that much data on corporate servers, in a somewhat large company, I would NOT!!!!! trust what anyone here has to say.

That is what paid and INSURED/bonded consultants are for.

That said:
Quote:

Is there a more cost efficient, equally safe method of archiving huge amounts of data like this?

Look into tape drives.

grim76 04-08-2014 03:17 PM

I would look into something along the lines of Data Domain or something similar. We are using that where I work for DB backups. The Data Domain takes the backups uncompressed over an NFS share, then deduplicates and compresses the data.

The Data Domain will also replicate. I don't know what the cost is, but it can't be cheap.

Just offering up something to look at. I am not sure what other options are out there on the scale that you are talking about.

TB0ne 04-08-2014 04:14 PM

Quote:

Originally Posted by suprstar (Post 5147242)
Our company has a requirement to archive all MySQL data and bin-logs indefinitely, so we can do point-in-time recovery to any moment. Our compressed data and bin-logs come to approximately 300GB per day, or on the order of 100TB per year. This data will be archived in 2 separate locations: one in our current datacenter (an east coast colo) and one in some other colo very far away, like on the west coast somewhere.

If hard drives are used, we'd mirror pairs of drives, and JBOD those mirrors. The data is absolutely critical - our company's existence, a couple hundred jobs, all depend on not losing 1 single byte, *EVER*.

On 4TB SATA drives, that'd be 100 drives per year, plus the chassis to put them in. Add colo costs, bandwidth, etc., and we're looking at something like $100k to get started, and another $100k each year.

Is there a more cost efficient, equally safe method of archiving huge amounts of data like this?

I think you need to have some more conversations with your company, to be honest. Yes, the 'want' is always to keep everything, forever...but I think setting realistic goals is better for everyone involved.

Even with Sarbanes-Oxley rules in place, you only have to keep data for 7 years, and there are guidelines on WHAT data you need to keep. Add into the equation that old data is just that (old)...how often are you realistically going to need to access it?

You also don't mention a timeline for data access. Keeping ALL the data online and accessible 24/7/365 (disaster-recovery center, manned, with equipment) is far different from having it offline and accessible next-business-day (data on tape in a vault w/courier service), or having it nearline and accessible in a short time (robotic tape system w/indexer).

You also need to identify WHAT you need to bring back...if you only need one or two records from those databases, that's far different from bringing back a whole MySQL cluster, and that will change your storage requirements as well. To muddy the waters further, you need to take personnel into account; if you have an IT staff of 25 and can dedicate someone to managing your backups/restorations, that's fine...if it's just you, you'll either need to hire someone to assist, or spend the big $$$ on something VERY automated, with lots of hardware/software behind the scenes.

It's a complex issue, with no easy answer. It's going to cost you big no matter what. Last time I had to lay something like this out, I wound up doing an entire SAN replication via BCV snapshots to the disaster recovery site over an OC-12 line. From there, we had point-in-time backups set up to shovel things onto LTO tapes, and keep them rotated/filled. The DR site was hosted at an IBM facility, which provided staff to do physical tasks (like shoveling tapes in/out of the library). It was VERY expensive...but by setting realistic goals, we had everything needed to go back as far as we needed to, with minimal impact on the local admins. When business-unit owners complained that they wanted full backups every day, we'd show them the bottom-line cost of tapes/disk/SAN and BCV space/Data Domain space, and we had VERY few takers.

And this:
Quote:

Originally Posted by suprstar
The data is absolutely critical - our company's existance, a couple hundred jobs, all depends on not losing 1 single byte, *EVER*.

...is absolutely ludicrous. If you really, truly think that a company that employs a couple hundred people can't survive if a single hard drive dies, you are either misinformed or mistaken. You mentioned the $100K figure above...if what you say here is true, then you are ALREADY spending FAR more than that on a backup solution for your in-house needs, not to mention the hardware RAID arrays in each workstation that you have, in case of failure, right?

suprstar 04-10-2014 10:12 AM

Quote:

Originally Posted by TB0ne (Post 5148993)
...is absolutely ludicrous.

No, it isn't. This isn't in-house backups, it's customer data for clients with deep pockets, and that's their requirement. We'll certainly be passing the cost down to them, and they aren't complaining about the cost. We'll be under contract to have this data; if we lose it, we'll be in breach of contract and have some very serious consequences to deal with.

I'm just trying to figure out if there's a good tape-drive solution, or even if tapes are as reliable. I've never been a fan, and have never used them in the past. Are they easy to damage? For the $$, can I get 10x as much storage in a tape carousel? Or is there a good hosted cloud storage solution with that much capacity? Might be worth it, I don't know; that's why I'm asking and investigating options.

TB0ne 04-10-2014 10:32 AM

Quote:

Originally Posted by suprstar (Post 5150122)
No, it isn't. This isn't in-house backups, it's customer data for clients with deep pockets, and that's their requirement. We'll certainly be passing the cost down to them, and they aren't complaining about the cost. We'll be under contract to have this data; if we lose it, we'll be in breach of contract and have some very serious consequences to deal with.

I understand the TOS standpoint...but "one single byte" is a standard that NO ONE, ANYWHERE can meet, nor is it realistic that losing one single byte would cause the company to go out of business. If that's their expectation, it can't realistically be met, and you're going to find yourselves in deep trouble, very quickly.

Also, you need to determine how quickly they need their data, and what the expectations are. The difference in cost between offline/nearline/online storage is MASSIVE. And if all you're backing up is MySQL data, you need to have hot-backup agents running, or take DB dumps...then consider how you'll RESTORE those files. Databases aren't just files/data...you need to import these things back to the DB server. And in doing so, are you going to overwrite data in the CURRENT database? Modify things? Do you need to restore to a recovery server, to pull off what they want?
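
To make the restore question concrete, here is a minimal sketch. It assumes the common MySQL approach of loading a full dump onto a separate recovery server and then replaying archived binary logs with `mysqlbinlog` up to the target time. `--stop-datetime` is a real `mysqlbinlog` option; the helper name and the binlog file names are hypothetical.

```python
import shlex


def binlog_replay_argv(binlog_files, stop_datetime):
    """Build the mysqlbinlog argument list that emits events up to stop_datetime.

    The output of this command would normally be piped into the mysql
    client on a recovery server that already holds the last full dump.
    """
    if not binlog_files:
        raise ValueError("at least one binary log file is required")
    return ["mysqlbinlog", f"--stop-datetime={stop_datetime}", *binlog_files]


argv = binlog_replay_argv(
    ["mysql-bin.000101", "mysql-bin.000102"],  # hypothetical file names
    "2014-04-01 12:00:00",
)
# Print a shell-quoted form for inspection; nothing is executed here.
print(shlex.join(argv))
```

The sketch only builds the command. Every binary log written between the full dump and the target time has to be retrievable, in order, which is why the access tier (online, nearline, or offline) matters so much for restore time.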
Quote:

I'm just trying to figure out if there's a good tape-drive solution, or even if tapes are as reliable. I've never been a fan, and have never used them in the past. Are they easy to damage? For the $$, can I get 10x as much storage in a tape carousel? Or is there a good hosted cloud storage solution with that much capacity? Might be worth it, I don't know; that's why I'm asking and investigating options.

If your client has deep pockets, then you should not build your own solution. You need to look at Tivoli Storage Manager (or NetBackup), with Data Domain, on a decent SAN platform (either IBM if you go the Tivoli route, or EMC for NetBackup). It will cost a LOT of money, but it is specifically designed to do exactly what you're after. That's why huge banks, NASA, JPL, and other similar institutions use them...they store LOTS of data, keep it available, and handle housekeeping. A real, enterprise-sized backup solution isn't something you cobble together, especially if you're entering into it with the 'one single byte' mindset.


All times are GMT -5.