LinuxQuestions.org
Old 04-05-2014, 02:30 PM   #1
suprstar
Member
 
Registered: Aug 2010
Location: Atlanta
Distribution: ubuntu, debian
Posts: 142
Blog Entries: 2

Rep: Reputation: 23
huge data archival


Our company has a requirement to archive all MySQL data and bin-logs indefinitely, so we can do point-in-time recovery to any moment. Our compressed data / bin-logs are approx. 300GB per day, or on the order of 100TB per year. This data will be archived in 2 separate locations - one in our current datacenter (an east coast colo) and another colo very far away, like on the west coast somewhere.
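Roughly, the daily pass would look something like this - just a minimal sketch in Python; the default /var/lib/mysql datadir and the two rsync-reachable archive host names are assumptions:

Code:
#!/usr/bin/env python3
# Minimal sketch of a daily bin-log archival pass. Assumptions: bin-logs
# live in the default /var/lib/mysql datadir, and "east-colo"/"west-colo"
# are hypothetical rsync-reachable archive hosts.
import hashlib
import subprocess
from pathlib import Path

BINLOG_DIR = Path("/var/lib/mysql")
ARCHIVES = ["east-colo:/archive/binlogs/", "west-colo:/archive/binlogs/"]

def sha256(path):
    # Checksum each file for the archive manifest, so both remote
    # copies can be verified later.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Rotate to a fresh bin-log so every file we ship is closed and immutable.
subprocess.run(["mysql", "-e", "FLUSH BINARY LOGS"], check=True)

# Ship every closed bin-log (all but the newest) to both colos.
for log in sorted(BINLOG_DIR.glob("mysql-bin.[0-9]*"))[:-1]:
    digest = sha256(log)
    for dest in ARCHIVES:
        subprocess.run(["rsync", "--checksum", str(log), dest], check=True)
    print(log.name, digest)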

If hard drives are used, we'd mirror pairs of drives and JBOD those mirrors together. The data is absolutely critical - our company's existence, a couple hundred jobs, all depend on not losing 1 single byte, *EVER*.
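As a sketch of that layout - mdadm RAID1 pairs concatenated with LVM; device names are hypothetical and this is untested, so try it on a scratch box first:

Code:
#!/usr/bin/env python3
# Sketch of the mirror-pairs-then-JBOD layout: each pair of drives
# becomes an mdadm RAID1 array, and the mirrors are concatenated
# (linear, not striped) into one LVM volume. Device names are
# hypothetical; run as root on a test box first.
import subprocess

PAIRS = [("/dev/sdb", "/dev/sdc"), ("/dev/sdd", "/dev/sde")]

mirrors = []
for i, (a, b) in enumerate(PAIRS):
    md = "/dev/md%d" % i
    subprocess.run(["mdadm", "--create", md, "--level=1",
                    "--raid-devices=2", a, b], check=True)
    mirrors.append(md)

# Linear concatenation simply appends the mirrors into one big volume,
# so more mirror pairs can be added to the VG as the archive grows.
subprocess.run(["vgcreate", "archive_vg"] + mirrors, check=True)
subprocess.run(["lvcreate", "-l", "100%FREE", "-n", "archive_lv",
                "archive_vg"], check=True)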

On 4TB SATA drives, that'd be about 100 drives per year (roughly 25 drives of raw capacity, doubled for mirroring and doubled again for the second site), plus the chassis to put them in. Add colo costs, bandwidth, etc., and we're looking at something like $100k to get started, and another $100k each year.

Is there a more cost-efficient, equally safe method of archiving huge amounts of data like this?

Last edited by suprstar; 04-05-2014 at 03:39 PM.
 
Old 04-05-2014, 05:03 PM   #2
John VV
LQ Muse
 
Registered: Aug 2005
Location: A2 area Mi.
Posts: 17,624

Rep: Reputation: 2651
For that much data, on corporate servers, in a somewhat large company, I would NOT trust what anyone here has to say.

That is what paid and INSURED/bonded consultants are for.


That said:
Quote:
Is there a more cost-efficient, equally safe method of archiving huge amounts of data like this?
Look into tape drives.
 
Old 04-08-2014, 03:17 PM   #3
grim76
Member
 
Registered: Jun 2007
Distribution: Debian, SLES, Ubuntu
Posts: 308

Rep: Reputation: 50
I would look into something along the lines of Data Domain or something similar. We're using one where I am for DB backups: the Data Domain takes the backups uncompressed to an NFS share, then deduplicates and compresses the data.
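To give an idea, here's a minimal sketch of a backup pass feeding such a share - the mount point and file naming are made up; the key point is writing the dump uncompressed so the appliance can deduplicate:

Code:
#!/usr/bin/env python3
# Sketch of feeding a dedup appliance: write the dump *uncompressed*
# to the NFS share, since compressing first would defeat deduplication.
# The mount point and file naming are hypothetical.
import datetime
import subprocess

SHARE = "/mnt/datadomain/mysql"
outfile = "%s/full-%s.sql" % (SHARE, datetime.date.today().isoformat())

with open(outfile, "w") as f:
    subprocess.run(["mysqldump", "--all-databases", "--single-transaction",
                    "--master-data=2"], stdout=f, check=True)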

The Data Domain will also replicate. I don't know what the cost is, but it can't be cheap.

Just offering up something to look at. I'm not sure what other options are out there at the scale you're talking about.
 
Old 04-08-2014, 04:14 PM   #4
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,632

Rep: Reputation: 7965
Quote:
Originally Posted by suprstar View Post
Our company has a requirement to archive all MySQL data and bin-logs indefinitely, so we can do point-in-time recovery to any moment. ...

Is there a more cost-efficient, equally safe method of archiving huge amounts of data like this?
I think you need to have some more conversations with your company, to be honest. Yes, the 'want' is always to keep everything, forever...but setting realistic goals is better for everyone involved.

Even with Sarbanes-Oxley rules in place, you only have to keep data for 7 years, and there are guidelines on WHAT data you need to keep. Add into the equation that old data is just that (old)...how often are you realistically going to need to access it? And you don't mention a timeline for data access either. Keeping ALL the data online and accessible 24/7/365 (disaster-recovery center, manned, with equipment) is far different from having it offline and accessible next-business-day (data on tape in a vault w/courier service), or having it nearline and being able to access it in a short time (robotic tape system w/indexer).

You also need to identify WHAT you need to bring back...if you only need one or two records from those databases, that's far different from bringing back a whole MySQL cluster, and that will change your storage requirements as well.

To muddy the waters further, you need to take personnel into account: if you have an IT staff of 25 and can dedicate someone to managing your backups/restorations, that's fine...if it's just you, you'll either need to hire someone to assist, or you'll need to spend the big $$$ on something VERY automated, with lots of hardware/software behind the scenes.

It's a complex issue, with no easy answer, and it's going to cost you big no matter what. The last time I had to lay something like this out, I wound up doing an entire SAN replication via BCV snapshots to the disaster-recovery site over an OC-12 line. From there, we had point-in-time backups set up to shovel things onto LTO tapes and keep them rotated/filled. The DR site was hosted at an IBM facility, which provided staff to do physical tasks (like shoveling tapes in/out of the library). It was VERY expensive...but by setting realistic goals, we had everything needed to go back as far as we needed to, with minimal impact on the local admins. When business-unit owners complained that they wanted full backups every day, we'd show them the bottom-line cost of tapes, disk, SAN/BCV space, and Data Domain space, and we had VERY few takers.

And this:
Quote:
Originally Posted by suprstar
The data is absolutely critical - our company's existence, a couple hundred jobs, all depend on not losing 1 single byte, *EVER*.
...is absolutely ludicrous. If you really, truly think that a company that employs a couple hundred people can't survive if a single hard drive dies, you are misinformed. You mentioned the $100K figure above...if what you say here is true, then you are ALREADY spending FAR more than that on a backup solution for your in-house needs, not to mention the hardware RAID arrays in each workstation you have, in case of failure, right?

Last edited by TB0ne; 04-09-2014 at 09:11 AM.
 
1 member found this post helpful.
Old 04-10-2014, 10:12 AM   #5
suprstar
Member
 
Registered: Aug 2010
Location: Atlanta
Distribution: ubuntu, debian
Posts: 142

Original Poster
Blog Entries: 2

Rep: Reputation: 23
Quote:
Originally Posted by TB0ne View Post
...is absolutely ludicrous.
No it isn't - this isn't in-house backups, it's customer data for clients with deep pockets, and that's their requirement. We'll certainly be passing the cost down to them, and they aren't complaining about the cost. We'll be under contract to have this data; if we lose it, we'll be in breach of contract and have some very serious consequences to deal with.

I'm just trying to figure out if there's a good tape-drive solution, or even whether tapes are as reliable... I've never been a fan and never used them in the past. Are they easy to damage? For the $$, can I get 10x as much storage in a tape carousel? Or is there a good hosted cloud storage solution with that much capacity? Might be worth it, I don't know - that's why I'm asking and investigating options.
 
Old 04-10-2014, 10:32 AM   #6
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,632

Rep: Reputation: 7965
Quote:
Originally Posted by suprstar View Post
No it isn't - this isn't in-house backups, it's customer data for clients with deep pockets, and that's their requirement. We'll certainly be passing the cost down to them, and they aren't complaining about the cost. We'll be under contract to have this data; if we lose it, we'll be in breach of contract and have some very serious consequences to deal with.
I understand the TOS standpoint...but "one single byte" is a standard that NO ONE, ANYWHERE can meet, nor is it realistic that one lost byte will put the company out of business. If that's their expectation, it can't be met, and you're going to find yourselves in deep trouble, very quickly.

Also, you need to determine how quickly they need their data, and what the expectations are. The difference in cost between offline/nearline/online storage is MASSIVE. And if all you're backing up is MySQL data, you need to have hot-backup agents running, or take DB dumps...then consider how you'll RESTORE those files. Databases aren't just files/data...you need to import these things back into the DB server. And in doing so, are you going to overwrite data in the CURRENT database? Modify things? Do you need to restore to a recovery server to pull off what they want?
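For illustration, a minimal sketch of what a point-in-time restore onto a separate recovery server looks like - host names, file names, and the cutoff time are all hypothetical:

Code:
#!/usr/bin/env python3
# Sketch of a point-in-time restore onto a *separate* recovery server:
# load the last full dump taken before the cutoff, then replay bin-logs
# up to the requested instant. Host names, file names, and the cutoff
# are hypothetical.
import subprocess

RECOVERY = ["mysql", "-h", "recovery-db"]   # never the production server
CUTOFF = "2014-04-01 12:00:00"

# 1. Load the most recent full dump from before the cutoff.
with open("/archive/full-2014-03-31.sql") as dump:
    subprocess.run(RECOVERY, stdin=dump, check=True)

# 2. Replay every subsequent bin-log, stopping at the requested moment.
binlogs = ["/archive/mysql-bin.000412", "/archive/mysql-bin.000413"]
replay = subprocess.Popen(
    ["mysqlbinlog", "--stop-datetime=" + CUTOFF] + binlogs,
    stdout=subprocess.PIPE)
subprocess.run(RECOVERY, stdin=replay.stdout, check=True)
replay.stdout.close()
replay.wait()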
Quote:
I'm just trying to figure out if there's a good tape-drive solution, or even whether tapes are as reliable... I've never been a fan and never used them in the past. Are they easy to damage? For the $$, can I get 10x as much storage in a tape carousel? Or is there a good hosted cloud storage solution with that much capacity? Might be worth it, I don't know - that's why I'm asking and investigating options.
If your client has deep pockets, then you shouldn't roll your own solution. Look at Tivoli Storage Manager (or NetBackup), with Data Domain, on a decent SAN platform (IBM if you go the Tivoli route, or EMC for NetBackup). It will cost a LOT of money, but it is specifically designed to do exactly what you're after. That's why huge banks, NASA, JPL, and other similar institutions use them...they store LOTS of data, keep it available, and handle the housekeeping. A real, enterprise-sized backup solution isn't something you cobble together, especially if you're entering into it with the "one single byte" mindset.
 
  

