Quote:
Originally Posted by deathsfriend99
That IS interesting. That sounds like the behavior I have been seeing. Sometimes it kicks a drive out, I pull the drive, run a Seagate Diagnostic on it, and it comes up good. I'd slap it back in and it'd work for a while, then another drive would fail.
Sometimes the array will just go read-only until I unmount it and run fsck.
Perhaps it's a port multiplier issue.
I was incorrect about the version of CentOS. These are running 5.7. Maybe I'll upgrade them to 6 and see what happens. I hate to do CentOS 7. It's so awful!
I've spent the past few days recreating my file server. I started out with OpenSUSE, but quickly got fed up with systemd and ended up going back to installing Gentoo. Most things were fine, but my SiI 3132 controllers were stubbornly uppity with me. The symptoms weren't as bad as they used to be - the drives kept working - but the controllers still kept giving me these glares, like:
Code:
Oct 25 02:54:44 [kernel] [ 5410.276611] ata12.00: exception Emask 0x0 SAct 0x20 SErr 0x0 action 0x6 frozen
Oct 25 02:54:44 [kernel] [ 5410.276618] ata12.00: failed command: WRITE FPDMA QUEUED
Oct 25 02:54:44 [kernel] [ 5410.276627] ata12.00: cmd 61/01:28:08:08:00/00:00:00:00:00/40 tag 5 ncq 512 out
Oct 25 02:54:44 [kernel] [ 5410.276627] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 25 02:54:44 [kernel] [ 5410.276631] ata12.00: status: { DRDY }
Oct 25 02:54:44 [kernel] [ 5410.276636] ata12: hard resetting link
Oct 25 02:54:50 [kernel] [ 5415.748990] ata12: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Oct 25 02:54:50 [kernel] [ 5415.758754] ata12.00: configured for UDMA/100
Oct 25 02:54:50 [kernel] [ 5415.758760] ata12.00: device reported invalid CHS sector 0
Oct 25 02:54:50 [kernel] [ 5415.758765] ata12: EH complete
And I don't like it. It didn't knock the disks offline, didn't break the RAID, and didn't even force the filesystem read-only, but something was still wrong. After reading around the forums a bit more, it seems that SiI 3132 revision 1 is buggy and won't work properly with port multipliers. Apparently revision 5 of the same card should work without issues.
Code:
06:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
According to lspci, what I have is a rev 01 card. It had worked fine for years with only one SATA drive attached, back when the port multiplier code wasn't in the kernel yet, so that's the configuration I decided to go back to: I turned off port multiplier support, recompiled the kernel, and connected just one drive to the card. Time will tell how it ends up, but at least the system booted fine and found the drive without issues.
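For reference, here's roughly what that kernel change looks like. This is just a sketch assuming a typical in-tree kernel build; the /usr/src/linux path and the exact menuconfig location may differ between kernel versions:
Code:
# Disable SATA port multiplier support (CONFIG_SATA_PMP) and rebuild.
# In menuconfig it lives under:
#   Device Drivers -> Serial ATA and Parallel ATA drivers -> SATA Port Multiplier support
cd /usr/src/linux
make menuconfig
grep SATA_PMP .config    # should now show: # CONFIG_SATA_PMP is not set
make && make modules_install && make install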
The SiI 3132 seems to be a pretty common chip for eSATA cards in JBOD configurations, so it's worth checking what chip lspci shows for you - and if it's a 3132, which revision it is.
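Checking only takes a second; the revision is the (rev XX) at the end of the lspci line, and the bus address (06:00.0 in my output above) lets you query that one device directly:
Code:
lspci | grep -i "Silicon Image"
# For full details on just that device, using its bus address:
lspci -v -s 06:00.0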
Here are a couple of threads that may be relevant:
http://ubuntuforums.org/showthread.php?t=2061555
http://www.linuxquestions.org/questi...26-4175445070/
--- EDIT ---
After running a full RAID resync on 3TB drives (it took around 10 hours), the SiI 3132 Rev 1 showed no errors whatsoever with just one of the drives connected to it and the port multiplier code disabled in the kernel. Unfortunately, if I've understood correctly, a common setup in a JBOD environment is to have the 3132's two ports (plus port multiplier) providing the external SATA connectivity, with each port connecting to a JBOD array. If you're running only one JBOD system, and thus using only one port on the SiI 3132, then I imagine this would work fine with the Rev 1 chip and the multiplier code turned off - assuming the JBOD array behind the 3132 shows up as a single disk. If it shows up as an array of individual disks, then the port multiplier may be necessary - I don't know, I've never run such a configuration.
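If you want to keep an eye on a resync the same way I did, this is a sketch assuming Linux software RAID (mdadm); /dev/md0 is a placeholder for whatever your array is called:
Code:
cat /proc/mdstat                  # shows resync progress, speed, and ETA
mdadm --detail /dev/md0           # a "Rebuild Status" line appears during a resync
watch -n 10 cat /proc/mdstat      # refresh the progress every 10 seconds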
Either way, if you need both ports on the SiI 3132, then I think the Rev 1 chip is not entirely stable. In that case I'd try to replace the card with a Rev 5 card, or with another model that's known to be stable. Another option might be to leave the multiplier code disabled and use two SiI 3132 Rev 1 cards, each connected to only one JBOD - if a card came with each JBOD system, and provided the host system has two free PCI-E slots that can support them.
--- END EDIT ---