LinuxQuestions.org
Old 08-28-2013, 02:50 PM   #1
lpallard
Senior Member
 
Registered: Nov 2008
Posts: 1,045

Rep: Reputation: Disabled
Suggestions for large data backups


Hey guys,

so I have been debating what would be the best way to protect my media collection. I just upgraded my server (hardware) and until now, I have been using this strategy to protect my data:

3X 2TB drives assembled as a single large volume (LVM), and another 3X 2TB also assembled as a large volume for backups (hot-plugged once a month to run backups with rsync).

While this method worked relatively well, I am concerned about some of its drawbacks:
  • I have 6X 2TB drives, so a "gross" storage capacity of 12TB, but a real usable capacity of only 6TB, so the ratio usable/gross is not so good (50%)...
  • Backups to the hotswap LVM are VERY slow. Backing up 4TB takes days..
  • If I lose a single drive on the source LVM (the one that runs 24/7), I lose EVERYTHING (the logical volume will degrade). Same goes for the backup LVM.
  • My data is important to me but losing all of it wouldn't kill me.. I can always retrieve/re-encode/re-rip everything but it would take months. For critical data, my SQL databases, OS files, etc, a real offline backup is essential.
  • And finally, my 6TB LVM is almost full. Soon I will need to add 2 drives, one for the source LVM, one for the backup LVM. Since I always need 2 drives at a time, my SATA ports will all be used up soon. I want to keep the number of drives as low as possible.

I was thinking about a RAID5 or 6 to replace the LVM strategy.

Let's assume a RAID6 array with 6 drives of 2TB each. There are some issues with that too. I understand RAID6 requires all drives to be the same size, otherwise it will not use the extra space on the larger ones and will truncate every member to the size of the smallest drive. So in the future, if I install a 3TB drive in the RAID6 array, only 2TB of that 3TB drive will be used.
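
The capacity arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming the usual rule that a parity array uses no more of each member than the smallest drive provides; the function name `usable_tb` is made up for this example.

```python
def usable_tb(drive_sizes_tb, parity_drives):
    """Usable capacity of a parity RAID array (RAID5: 1 parity, RAID6: 2).

    Every member is truncated to the size of the smallest drive, and the
    equivalent of `parity_drives` members is spent on redundancy.
    """
    if len(drive_sizes_tb) < parity_drives + 2:
        raise ValueError("not enough drives for this RAID level")
    smallest = min(drive_sizes_tb)
    return (len(drive_sizes_tb) - parity_drives) * smallest


six_by_two = [2] * 6
print(usable_tb(six_by_two, parity_drives=1))           # RAID5: 10
print(usable_tb(six_by_two, parity_drives=2))           # RAID6: 8
# Swapping one 2TB member for a 3TB drive changes nothing:
print(usable_tb([2, 2, 2, 2, 2, 3], parity_drives=2))   # 8
```

Out of 12TB gross, that is 10TB usable with RAID5 and 8TB with RAID6, compared with 6TB for the current source-plus-backup LVM layout.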

Also, if I lose 2 drives, the array is GONE. What I fear the most is losing a drive, swapping it with a new one, letting the array rebuild itself, then BAM! another drive goes WHILE the array rebuilds... It would be fatal.

Reliability speaking, if I lose 2 drives on my LVM strategy (one on each LVM), I lose everything. If I lose 2 drives on the RAID6 strategy, I also lose everything. So this is somehow similar... The only difference is the total runtime of the drives in LVM modes. The drives in the backup LVM have probably around 200 hours of runtime, while the drives on the running LVM are running 24/7 for 4 years now.

Ideally speaking, I'd use a RAID5 or 6 array to benefit from the largest capacity possible while providing a certain degree of data protection, and use a tape backup to back up my stuff. Unfortunately, tape backups are WAY OUT of my budget...

I'm sure I'm not the only one in this situation. I read on lots of forums (Freenas, Hardforum, and half a dozen other forums) that people are building crazy NAS machines with 20TB, 30TB and sometimes MORE!! How do they back up their data???? For 30TB of data, you need at least 10 drives (3TB each) and likely more than that. So unless they have 20 to 30 drives in their machines, they wouldn't be able to back up everything.

Do they even think about that? Seriously, when you have so much data, you have to back it up, and with so much data, backups are very slow..

I'm curious to hear what you guys think and have to suggest...

Thanks!!!
 
Old 08-28-2013, 06:32 PM   #2
dt64
Member
 
Registered: Sep 2012
Distribution: RHEL5/6, CentOS5/6
Posts: 218

Rep: Reputation: 38
Backups of large amounts of data always have cost implications. You have to decide depending on the value of your data.

You referenced some NAS builders having 30TB worth of storage space. This might well be true, but I don't believe that these guys will have weekly full backups and daily incrementals as long as they are home users. Most of these data would be movies, images etc and it should be easy enough to get it back if the storage failed.

In professional environments things look a bit different, but here we are maybe talking about SANs & Co, which are most likely out of your range anyway (You said a backup tape solution wasn't possible).

If you do the math, a tape solution might well fit in your plans. You don't need a tape robot; a normal tape drive and some tapes would possibly be the cheapest and easiest-to-handle option. Or you just go for an online backup solution like CrashPlan (I believe they offered unlimited backup space for one machine for about $5/month, but there are mixed reports about the service quality. Check it out...).

If I needed that much storage at home nowadays, I'd go for a RAID 10 or RAID 5, depending on whether I need more speed or more capacity, maybe even a JBOD. This I'd back up to a tape system if slow backups are ok and swapping tapes is no issue. An external LTO4 drive you could get for less than 1500 bucks, a pack of 5 1600GB media for less than 100 bucks.
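
As a rough sketch of how many tapes that would mean: the 1600GB figure on LTO-4 media is the compressed capacity, which assumes 2:1 compression, while the native capacity is 800GB per cartridge. Movies and music are usually already compressed, so the native figure is the safer planning number. The 6000GB data size is the original poster's volume size; the helper name `tapes_needed` is made up for this example.

```python
import math

LTO4_NATIVE_GB = 800        # native capacity of one LTO-4 cartridge
LTO4_COMPRESSED_GB = 1600   # marketing figure, assumes 2:1 compression


def tapes_needed(data_gb, tape_gb):
    """Number of cartridges needed, rounding up to whole tapes."""
    return math.ceil(data_gb / tape_gb)


data_gb = 6000  # the 6TB media volume from the first post
print(tapes_needed(data_gb, LTO4_NATIVE_GB))       # 8
print(tapes_needed(data_gb, LTO4_COMPRESSED_GB))   # 4
```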

If you need instant backups you should set up a 2nd low-carb machine, just enough CPU/RAM to do the backup stuff, but same HDD capacity as your main machine. This should be located off-site and have a RAID too. This in turn should be backed up to a tape system...

That all costs money, electricity etc... only you can tell the value of your data.
 
Old 08-28-2013, 06:59 PM   #3
lpallard
Senior Member
 
Registered: Nov 2008
Posts: 1,045

Original Poster
Rep: Reputation: Disabled
Yeah, this all makes sense..

Quote:
but I don't believe that these guys will have weekly full backups and daily incrementals as long as they are home users
Same here. I back up my 6TB LVM monthly with incremental backups.. Why recopy everything if it hasn't changed!?

Really, it's more an archive system than a backup solution that I am looking for. Like you said, if sh** hits the fan and I lose 2 drives on a RAID6 array, so be it.. I suppose.

Quote:
This I'd back up to a tape system if slow backups are ok and swapping tapes is no issue. An external LTO4 drive you could get for less than 1500 bucks, a pack of 5 1600GB media for less than 100 bucks.
I eliminated the tape backups at first because I was under the impression that these (even ultra low end ones) were $1000+.. I am totally OK with slow backups, the current backup LVM is mounted in Vantec hotswap enclosures connected to a Vantec SATA1 (!) PCI card (not PCIe!). I guess I get at BEST 30MB/s.. Perhaps not even that.
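
To put the 30MB/s guess in perspective, here is a back-of-the-envelope estimate of how long a full copy takes at a sustained rate. It uses decimal units and ignores per-file overhead, so real runs will be slower; the 100MB/s comparison rate is purely hypothetical and the function name is made up for this example.

```python
def transfer_hours(data_tb, rate_mb_s):
    """Hours needed to copy `data_tb` terabytes at `rate_mb_s` MB/s."""
    data_mb = data_tb * 1_000_000   # decimal units: 1 TB = 1,000,000 MB
    return data_mb / rate_mb_s / 3600


print(round(transfer_hours(4, 30), 1))    # 37.0 hours at the guessed 30MB/s
print(round(transfer_hours(4, 100), 1))   # 11.1 hours at a hypothetical 100MB/s
```

About 37 hours for 4TB matches the "takes days" experience from the first post once real-world overhead is added.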

I'm also OK with swapping tapes. Who cares if it takes 2 hours to back up and swap drives, I'll probably end up backing up during week nights when I am stuck at home anyways..

The price of those tape machines is putting me off quite a bit. Let's say I was to buy a tape solution, I'd be willing to spend $350 to $500 at most. I guess that leaves me with very old (in the GB range not TB) or very low end (unreliable) tape solutions..

Maybe a simple RAID5 or 6 array would do?! More importantly, what are people doing!? I'm curious.

Last edited by lpallard; 08-28-2013 at 07:07 PM.
 
Old 08-28-2013, 07:12 PM   #4
dt64
Member
 
Registered: Sep 2012
Distribution: RHEL5/6, CentOS5/6
Posts: 218

Rep: Reputation: 38
In that case you could possibly go for 2nd hand professional tape drives and new tapes. In the long run (especially when you want to archive things) tapes might be the best solution.

Just had a quick look at the world's largest electronic 2nd hand market: over here you could get used external LTO4 drives for a few hundred...

Depending on your archiving needs (how long to archive, file size, how often you need access) you might even just back up files to tapes (more than one, stored at different locations) and free space on your HDDs.
 
Old 08-29-2013, 07:04 AM   #5
itlb
LQ Newbie
 
Registered: Aug 2013
Distribution: Debian
Posts: 29

Rep: Reputation: Disabled
Quote:
Originally Posted by lpallard

Reliability speaking, if I lose 2 drives on my LVM strategy (one on each LVM), I lose everything. If I lose 2 drives on the RAID6 strategy, I also lose everything. So this is somehow similar... The only difference is the total runtime of the drives in LVM modes. The drives in the backup LVM have probably around 200 hours of runtime, while the drives on the running LVM are running 24/7 for 4 years now.
With RAID6 you can lose 2 drives without data loss...RAID6 is what the big storage vendors use generally for their RAID groups, I would have thought it would be more than adequate. Alternatively ZFS RAIDZ2 (or even RAIDZ3 to tolerate triple disk failure).
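
The difference between the layouts discussed in this thread comes down to how many simultaneous drive failures each one survives. A minimal sketch, with a made-up helper name, and treating the plain LVM volume as a linear (non-redundant) volume as described in the first post:

```python
# Number of simultaneous drive failures each layout survives.
TOLERATED_FAILURES = {
    "linear LVM": 0,   # no redundancy: one dead drive breaks the volume
    "RAID5": 1,
    "RAID6": 2,
    "RAIDZ2": 2,
    "RAIDZ3": 3,
}


def survives(layout, failed_drives):
    """True if the data is still readable after `failed_drives` failures."""
    return failed_drives <= TOLERATED_FAILURES[layout]


print(survives("RAID6", 2))        # True
print(survives("RAID6", 3))        # False
print(survives("linear LVM", 1))   # False
```

So with RAID6 the feared scenario (a second drive dying while the first replacement rebuilds) is still survivable; only a third failure before the rebuild finishes would be fatal.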
 
Old 09-15-2013, 10:17 AM   #6
lpallard
Senior Member
 
Registered: Nov 2008
Posts: 1,045

Original Poster
Rep: Reputation: Disabled
OK, I've decided to move my data to a RAID5 array using 6 identical 2TB hard drives.. but before I do, I have to ask about the best way to do so, since I have a problem with moving the data. Let me explain:

I currently have a logical volume using 3 physical drives. All my data is on that LV (5.2TB). I have used another set of 3 identical disks to create a RAID5 array (two drives' worth of data capacity, one drive's worth of parity). The array is up and doing well.

I would have preferred to use all 6 drives at the same time to create the RAID5 array and size the FS, but I can't since 3 of the drives are nearly full and I cannot move my data to temporary storage.

To circumvent this problem, I was thinking to do this:

1. Use 3 of the 6 drives (the empty ones) to create the RAID5 array (2 drives for array, the 3rd for parity)
2. Create the XFS filesystem on the array

3. Move the equivalent of 1 drive (around 2TB) to the array
3a. Remove one drive from the LVM
3b. Add this drive to the array by expanding the array and the FS

4. Move the equivalent of 1 drive (around 2TB) to the array
4a. Remove one drive from the LVM
4b. Add this drive to the array by expanding the array and the FS

5. Move the equivalent of 1 drive (around 2TB) to the array
5a. Remove one drive from the LVM
5b. Add this drive to the array by expanding the array and the FS

At this point, the LVM will have no drives and will be deleted safely. All data will be moved to the new RAID5 array.
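
The plan above can be sanity-checked with a small simulation. This is a sketch only: it assumes 2TB drives, 5.2TB of data, RAID5 usable capacity of (n - 1) drives, and it ignores filesystem overhead and the time each reshape takes. All names are made up for this example.

```python
DRIVE_TB = 2.0


def raid5_usable(n_drives):
    """Usable RAID5 capacity: one drive's worth goes to parity."""
    return (n_drives - 1) * DRIVE_TB


def simulate(data_tb=5.2, lvm_drives=3, raid_drives=3):
    """Walk through the move/remove/expand cycle and check it never overflows."""
    on_lvm, on_raid = data_tb, 0.0
    history = []
    while lvm_drives > 0:
        move = min(DRIVE_TB, on_lvm)               # move about one drive's worth
        if move > raid5_usable(raid_drives) - on_raid:
            raise RuntimeError("array too small for this step")
        on_lvm -= move
        on_raid += move
        if on_lvm > (lvm_drives - 1) * DRIVE_TB:
            raise RuntimeError("remaining data does not fit on fewer LVM drives")
        lvm_drives -= 1                            # pull one drive out of the LVM
        raid_drives += 1                           # and grow the array with it
        history.append((lvm_drives, raid_drives, round(on_lvm, 1), round(on_raid, 1)))
    return history


for lvm, raid, left, moved in simulate():
    print(f"LVM drives: {lvm}, RAID drives: {raid}, "
          f"still on LVM: {left}TB, on array: {moved}TB "
          f"(array usable: {raid5_usable(raid)}TB)")
```

Under these assumptions every step fits: the array always has room for the next 2TB, and what is left on the LVM always fits on one fewer drive. The final state is a 6-drive array with 10TB usable holding 5.2TB.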

The problem I have is with tuning XFS for the RAID5 array. Best tuning requires setting parameters that depend heavily on the array chunk size and the number of drives in the array. If I optimize the FS for 3 drives, as the initial array will have, then when I expand it to 4 drives, 5 drives and in the end 6 drives, the FS will no longer be optimized for the number of drives, etc.
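
For reference, the stripe geometry XFS cares about is the stripe unit (the array chunk size) and the stripe width (the number of data-bearing drives, which for RAID5 is the drive count minus one). The sketch below computes the values for each stage of the expansion, in the form taken by the `su`/`sw` suboptions of `mkfs.xfs -d` and by the `sunit`/`swidth` mount options, which are expressed in 512-byte sectors. The 512KiB chunk size is an assumption; the real value is shown by `mdadm --detail`. The function name is made up for this example.

```python
def xfs_stripe(chunk_kib, n_drives, parity_drives=1):
    """Stripe settings for an XFS filesystem on a parity RAID array."""
    data_disks = n_drives - parity_drives
    sunit = chunk_kib * 1024 // 512        # stripe unit in 512-byte sectors
    return {
        "mkfs": f"su={chunk_kib}k,sw={data_disks}",
        "mount": f"sunit={sunit},swidth={sunit * data_disks}",
    }


CHUNK_KIB = 512  # assumed chunk size, check the real one before using this
for n in (3, 4, 5, 6):
    print(n, "drives:", xfs_stripe(CHUNK_KIB, n))
```

Only the stripe width changes as drives are added; the stripe unit stays tied to the chunk size. Whether new `sunit`/`swidth` values are accepted at mount time after a reshape depends on the existing filesystem geometry, so treat the mount strings as something to verify against the XFS documentation rather than a guaranteed fix.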

Another problem is the sync time the array will require to rebuild (sync) itself every time I add a new drive. It took 5 hours to sync the array when I created it at first with 3 drives.

How should I migrate my data safely to a new raid array??
 
  

