LinuxQuestions.org
Old 07-01-2011, 02:37 PM   #1
MALDATA
Member
 
Registered: Mar 2005
Posts: 103

Rep: Reputation: 17
Solutions for huge data backups


I've completely run out of ideas here, so I'm hoping someone else can help. I work in a small lab that produces a lot of high-speed video files during experiments. We have a lot of data (~30 TB) sitting on our server, and we really should have it backed up offsite.

We tried an online service for the last year or so because it's only $5 a month, but their "unlimited storage" claim doesn't hold up in practice. Eventually the file the client uses to index your machine gets so huge that the program refuses to run anymore. So, after several months of continuous backups, we got something like half of it uploaded before it quit on us.

So, what do other people do? If you have a huge amount of data that you need backed up, what's the solution? Does everyone just cobble together their own in-house scheme? Is there a decent cloud service that isn't too limited or expensive? This can't be a problem unique to us.

Thanks
 
Old 07-01-2011, 04:35 PM   #2
TB0ne
Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack, CentOS
Posts: 13,787

Rep: Reputation: 2359
Quote:
Originally Posted by MALDATA View Post
I've completely run out of ideas here, so I'm hoping someone else can help. [...]
Sorry, but you've got limited options.

30TB won't stream anywhere with decent speed... unless you spring for an OC3 (or OC12!) line to your offsite location. And if you're spending that kind of money, then you shouldn't be cobbling together ANYTHING, but purchasing a real, robust enterprise backup solution like NetWorker or Tivoli Storage Manager.
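Just to put a number on "decent speed": an OC3 is roughly 155 Mbit/s, call it ~19 MB/s of payload on a good day. A quick sketch:

Code:
#!/bin/bash
# Rough time to push 30 TB through an OC3 at ~19 MB/s sustained.
DATA_MB=$((30 * 1024 * 1024))                  # 30 TB in MB
echo "scale=1; $DATA_MB / 19 / 3600 / 24" | bc # => ~19.2 days

Call it about three weeks for one full pass, assuming nothing else touches the pipe.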

A "cloud" service for company data is, in my opinion, a very shaky solution. One of your payments doesn't go through? Too bad, so sad...we've got your data, and you can't have it until you pay up. Our service got hacked? Oh well...limited liability, tell your story to someone who cares. Downtime? Hmm...are you SURE it was us? Be sure you get detailed logs from your ISP, phone company(s), etc., etc., before you complain...because if you can't PROVE it was our fault, it sure as heck wasn't.

That said, how you do it will depend on several factors. How much data is changed on a daily/weekly/monthly basis?? How many versions of a file do you want to keep? How big are your average files? How quickly do you need to recover all/some of your data if it's lost?? Does it need to be ONLINE 24/7, NEARLINE, or can it be OFFLINE? And how do you have it stored locally now?

If I needed to archive things, with a not-too-quick recovery time, I'd invest in some 1TB or so drives, with eSATA enclosures. Copy your data to them, take them off site, and park them, LABELED NEATLY, along with DVD copies of all the same data. Add more drives as needed. But that just deals with your storage issue on a very low budget.

Up a bit higher, you could shove a server or two into a rack and get some SAN disk attached to it. Get a DEDICATED net connection just for backups, and let rsync do the heavy lifting. An isolated network connection means it can go 24/7 without impacting your users, so who cares if you saturate the pipe?

Bear in mind that these are just suggestions. A REAL backup solution will keep track of what files are on what media, do versioning, disk/tape cleanup and housekeeping, etc. Lowballing it will only take you so far, and is really only good if you're talking about one copy of what you've got NOW. Check into Bacula or Zmanda; they're two good options.
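If you go the rsync route, a minimal nightly sketch; the hostname and paths here are made up, just to show the shape of it:

Code:
#!/bin/bash
# Nightly push to the offsite box over the dedicated backup link.
# backup.example.com and both paths are placeholders.
SRC="/srv/labdata/"
DEST="backup@backup.example.com:/backups/labdata/"

# -a         preserve perms/times/symlinks; -H keeps hard links
# --partial  keep partly-sent files so big videos resume, not restart
# --delete   mirror deletions (omit it if you want an archive instead)
rsync -aH --partial --delete \
      --log-file=/var/log/backup-rsync.log \
      -e ssh "$SRC" "$DEST"

Stick that in cron on the backup box and let it run overnight; since the link is dedicated, it doesn't matter if it saturates.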
 
Old 07-01-2011, 04:53 PM   #3
MALDATA
Member
 
Registered: Mar 2005
Posts: 103

Original Poster
Rep: Reputation: 17
Quote:
If I needed to archive things, with a not-too-quick recovery time, I'd invest in some 1TB or so drives, with ESATA enclosures. Copy your data to them, take them off site, and park them, LABELED NEATLY, along with DVD copies of all the same data. Add more drives as needed. But that just deals with your storage issue on a very low budget.
Well, at least we got that part right. That's exactly what we did. We got a couple of computers, hooked up some eSATA enclosures, and stuffed them with 16 TB of storage each. Most of the data isn't mine, so I don't know what needs to be constantly available... if some of it can go, I was thinking we could just stash those drives offsite, but like you said, that doesn't help with the daily backups.

I was just hoping this was common enough that there would be a "typical" solution...

Thanks for the help though, and if anyone has suggestions, please let me know!
 
Old 07-01-2011, 05:09 PM   #4
TB0ne
Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack, CentOS
Posts: 13,787

Rep: Reputation: 2359
Quote:
Originally Posted by MALDATA View Post
Well, at least we got that part right. That's exactly what we did. [...]
Yeah, unless you do a good data survey and identify what you need and how long you need to keep it, you're just guessing. And I don't know how many users you have, but DO NOT ask them directly! You'll get "I need EVERY FILE I'VE EVER TOUCHED, to be kept FOREVER, and backed up 27 times daily!!!". Ask the managers, if you're driving this.

Also, do some homework and come up with a good estimate of cost per GB stored. Do it for multiple media (e.g. DVD for 'offline/may-take-a-day-to-restore-it', online disk, etc.), then double it. That should give you a good starting point, since the doubling factors in costs like power, floor space, network, and labor. When someone hands you an outlandish requirement, figure out how much they want to store and hand them a bill for storing it. You'll be amazed at how quickly it drops.
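To make the math concrete; every number below is a made-up placeholder, not a quote:

Code:
#!/bin/bash
# Back-of-the-envelope cost per GB, doubled to cover power, floor
# space, network, and labor. All figures are placeholders.
DATA_GB=30000              # ~30 TB
COST_PER_GB=0.10           # hypothetical media cost, $/GB

raw=$(echo "$DATA_GB * $COST_PER_GB" | bc)
total=$(echo "$raw * 2" | bc)
echo "Raw media:        \$$raw"
echo "Doubled estimate: \$$total"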

Seriously, good luck. It's a tough nut to crack.
 
Old 07-02-2011, 12:21 AM   #5
MALDATA
Member
 
Registered: Mar 2005
Posts: 103

Original Poster
Rep: Reputation: 17
Thanks for your thoughts, TB0ne. Sometimes I just need to hear someone else say these things so I know I'm not missing something blindingly obvious...
 
Old 07-04-2011, 11:55 AM   #6
choogendyk
Senior Member
 
Registered: Aug 2007
Location: Massachusetts, USA
Distribution: Solaris 9 & 10, Mac OS X, Ubuntu Server
Posts: 1,189

Rep: Reputation: 105
30TB, offsite as well as daily backups -- if the managers really care about their data and getting redundancy in backups with daily incrementals, then they need to be willing to consider spending some money on it. DVDs just aren't going to do it (do the math: 30 TB is more than 6,000 single-layer DVDs). Duplicate sets of disk drives aren't going to give you your dailies or real redundancy. What you need is an LTO5 tape library and real backup software that can handle real backup policies. You also need serious network speed on the back end, redundancy, and a decent machine for the backup server. LTO5 is fast (max 140MB/s native), but it will choke (shoe-shine) if your other components can't keep up with it.
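To see why the feed rate matters, quick arithmetic (140 MB/s is LTO-5's max native rate; the slower figures are just illustrations):

Code:
#!/bin/bash
# Hours for one full 30 TB pass at various sustained feed rates.
DATA_MB=$((30 * 1024 * 1024))     # 30 TB in MB
for rate in 140 80 40; do         # MB/s
    hours=$(echo "scale=1; $DATA_MB / $rate / 3600" | bc)
    echo "${rate} MB/s -> ${hours} hours"
done

If the disks and network can only feed the drive at a fraction of its native rate, the tape keeps stopping and repositioning; that's the shoe-shining.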

I like Amanda, because its backup planner will spread fulls over a cycle you choose (e.g. a week), with incrementals interspersed. So, for example, if you had 100 users with home directories, you could break them down alphabetically, so that A-C were in one group, D-F in another, etc. (chosen based on the number of users and volume usage). Amanda would then do fulls on some of the groups one day and on other groups another day, stretching it out over the week. This means the peak backup demand is smoothed, and you may only have a few TB per day that need to go to tape, even though every group is getting backed up every day (either full or incremental). When you're setting it up, add one group per day to Amanda's disk list so that you aren't trying to crank up fulls of absolutely everything at once on the first run (obviously the planner can't schedule an incremental until it has a full).

How you break down your 30TB into manageable pieces obviously depends on how you have it organized. I just used the home directories as an example.
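As a sketch of the mechanics (the host, path, and "DailySet1" config name are made up; user-tar is a dumptype from Amanda's sample configuration):

Code:
#!/bin/bash
# Append one group per day to the disklist so the first fulls are
# staggered; then check the config and kick off the nightly run.
cat >> /etc/amanda/DailySet1/disklist <<'EOF'
fileserver  /home/a-c  user-tar
EOF

amcheck DailySet1   # sanity-check config, holding disk, and tape
amdump DailySet1    # normally launched from cron as the backup user

With dumpcycle and runspercycle set in amanda.conf (say, 7 days and 7 runs), the planner guarantees each entry a full at least once per cycle and fills the rest with incrementals.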
 
Old 07-04-2011, 12:33 PM   #7
Latios
Member
 
Registered: Dec 2010
Distribution: Arch
Posts: 115

Rep: Reputation: 21
You can back up however you want. When the users want to restore the data, they won't need it all at once. You'll be able to restore the files _being requested at the moment_ to the local site over a moderate-speed network connection.
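For instance, pulling back just the directory someone asked for (host and paths made up, assuming the offsite copy is reachable over ssh):

Code:
# Restore only what was requested, not all 30 TB.
rsync -aH --partial -e ssh \
    backup@backup.example.com:/backups/labdata/exp-2011-06-14/ \
    /srv/labdata/exp-2011-06-14/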
 
Old 07-05-2011, 11:37 AM   #8
MALDATA
Member
 
Registered: Mar 2005
Posts: 103

Original Poster
Rep: Reputation: 17
Quote:
30TB, offsite as well as daily backups -- if the managers really care about their data and getting redundancy in backups with daily incrementals, then they need to be willing to consider spending some money on it.
I agree completely.

We're having a meeting this afternoon, so hopefully I can convince everyone that we need to survey the data and decide what really needs to be readily available. Maybe I can get everyone onboard with a tape library. I am not optimistic, but maybe if I sound like I'm panicking, I can get everyone to understand...

Thanks!
 
  

