Quote:
Originally Posted by sigint-ninja
ok thanks to everyone who posted...i have learned a lot over the last few weeks...i just have a last question regarding rsync...here goes
if i have server A and Server B located 20km's away...and im using Rsync to backup +- 200GB of data from A to B (only backing up changes) , should I also have a 2TB hdd connected to server B that takes a local full backup everyday??? cause surely if its only backing up the changes and a file becomes corrupt (ie it cant be opened) the changes will copy over to my only copy of the data? whats the best practice?
thanks again
Sorry for the slow response. A few things to keep in mind:
1. Bandwidth limitation. Since the two sites are only 20km apart, I would make a local backup to your external HDD, then physically transport it to the other location and restore it into your backup directory there. Seeding the remote copy this way drastically reduces your overhead once you start backing up via the internet, because rsync then only has to move the daily changes.
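A minimal sketch of that seed-then-sync flow (the paths, the `serverB` hostname, and the bandwidth cap are hypothetical; adjust to your own layout):

```shell
# 1. Full local copy onto the external HDD attached to server A:
rsync -a /srv/data/ /mnt/external/seed/

# 2. Physically move the HDD to site B and restore the seed there:
#    rsync -a /mnt/external/seed/ /srv/backup/

# 3. From then on, only changes cross the wire. --bwlimit (KiB/s)
#    keeps the link usable, -z compresses, --delete mirrors removals:
rsync -az --delete --bwlimit=5000 /srv/data/ serverB:/srv/backup/
```

The first remote run will still compare everything, but because the seed is already in place it transfers almost nothing.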
2. The more places you can remotely store data of any value, the better. One reason for remote backups is physical harm to the original data: fire, water, electrical faults, etc. The same things that can happen to the original data can also happen to the remote copy. With the cost of storage per TB dropping to a much more reasonable level, more copies are safer from this point of view.
3. More is NOT safer if the data is "sensitive": personal information, medical records, SSNs (in the USA), financial data, etc. For this type of data, consider secured physical remote backup locations. Again, as long as you have multiple secured locations, the data will be more "reliable" for data recovery.
4. For my personal files and my web site I keep 7 days' worth of backups in multiple locations. For a business you may wish to keep 2 weeks, 30 days, or even longer, but keep in mind that even your 200G of data will quickly start to add up. On top of the base 200G, the amount of data should be increasing daily, if not weekly, and definitely monthly. With that in mind, I personally follow this guideline for calculating storage needs.
4a. Take the maximum amount of data to back up today. Increase that number by 75-100% to allow for growth (I typically run the numbers at both 75% and 100% growth for cost comparison).
4b. Multiply the number from 4a by however many days you wish to store the data without overwriting.
4c. Increase the number from 4b by no less than 25% for overhead. You want to keep roughly 20% free space at a minimum, or you will start to suffer disk performance issues.
So, for quick numbers with your 200G of data:
a. 200 + (200 × 0.75) = 350G ; 200 + (200 × 1.00) = 400G. These are your DAILY minimums for storage.
b. 350 × {7, 14, 30, 360} = 2450G, 4900G, 10500G, 126,000G ; 400 × {7, 14, 30, 360} = 2800G, 5600G, 12,000G, 144,000G
c. (you can finish the math for the other windows; I'll do the 7-day case)
2450 + (2450 × 0.25) ≈ 3063G ≈ 3.1T ; 2800 + (2800 × 0.25) = 3500G = 3.5T
So for only 7 days' worth of storage you can get away with 4TB, and that should cover you for about a year before you have to expand your backup storage. This is where LVM comes into play on top of RAID 10: best performance, fair price, and easy to manage and expand under Linux.
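The 4a-4c arithmetic can be scripted so you can rerun it as the numbers change (values below are just this thread's 200G example; shell integer math rounds down, hence 3062G instead of the hand-rounded 3063G):

```shell
#!/bin/sh
# Backup storage sizing per steps 4a-4c (integer GB).
base=200       # GB to back up today
growth=75      # % growth allowance (4a) -- also try 100
days=7         # retention window in days (4b)
overhead=25    # % free-space overhead (4c)

daily=$(( base + base * growth / 100 ))          # 4a: 350 GB/day
window=$(( daily * days ))                       # 4b: 2450 GB
total=$(( window + window * overhead / 100 ))    # 4c: 3062 GB, ~3.1 TB
echo "daily=${daily}G window=${window}G total=${total}G"
```

Swap `days` for 14, 30, or 360 to reproduce the other columns above.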
As for the Red Hat line, that is what I personally use and support. I have used Debian and a few of its forks and really like plain Debian as a server, but one of my last contracts was supporting roughly 3500 RHEL servers in the field. I used Fedora as my desktop and have stuck with it. The new RHEL 7 that should be out later this year (in a month or two; it's slated for an end-of-2013 release) will be a HUGE upgrade in both performance and maintenance, due to the newer kernel (the same one used in Fedora 19) and the addition of systemd, which replaces the old SysV init scripts (Debian still uses those but is starting to move to systemd as well).
If you do not already have a server in place, it would be worth waiting for RHEL 7 to be released and then starting to use it about a month later. CentOS is the free (as in beer) rebuild of RHEL, and as long as you don't start adding 3rd-party repositories it will remain 1:1 binary-compatible with RHEL for its entire general life cycle, typically 5 years.
The people who require long-term support for 3rd-party applications (typically customers of a business) are the ones who really get the most benefit from RHEL, as it will never move off its current kernel for the entire 5 years. That does mean folks who demand newer/bleeding-edge technology typically start having fits with RHEL as a release nears the end of its life cycle. The current RHEL 6.x is only 2 years old but is still using the 2.6.32 kernel, and a lot of newer applications will not even run on a kernel that old. A good example is Google Chrome: Google decided over a year ago to stop supporting the 2.6.x kernel line. But if you are running a GUI with a web browser on a server, you deserve some pain. Servers are not workstations and should never be treated as such.
As for ease of maintenance, I have found that with the improvements to yum, the RHEL line of servers is as easy to maintain and update as Debian, if not a bit easier. Debian still has one large advantage: if you don't need long-term support for 3rd-party applications, you can easily upgrade from version to version. That is currently not possible with the RHEL line, where moving from version to version typically requires much more work.
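For day-to-day patching, either family boils down to a one-liner (run as root; exact tool names vary slightly by release):

```shell
# RHEL/CentOS/Fedora:
yum update -y
# Debian:
apt-get update && apt-get upgrade -y
```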
Hope that all helps.