Sorry for the confusion. The configuration I am talking about would be 12 1TB HDDs in RAID 5. While it won't be exactly 12TB (RAID 5 gives up one drive's worth of capacity to parity, so about 11TB usable), it would be close to it. The good news is that with that much space you won't have to worry about running out on a personal file server for a good 20 years, unless your stuff is Pr0n. Hosting movies just for your own personal theater would fill it sooner, though; it would likely take about 4 years, depending on what you rip (i.e. movie only, or movie and extra features).
Me, I am going to build this, and yes, I will have a ton of space, but at least I will not have to worry about running out for a long time. I was just curious to know what most other people would use it for. Majority rules in favor of movies.
Guess the RIAA and Copyright Goons are going to have to step things up. LOL
But I doubt I will fill it with movies as I don't watch them that much. But there are a few favorites and some older shows that are nice to have. Especially those that don't yet exist on DVD.
The first thing I'd do is reformat the drives. Instead of a RAID array, I'd configure it into two aufs arrays on two different computers, preferably in two different cities. That means a total of only 6TB of space, or maybe 7-8TB if one array is made bigger than the other (i.e. only back up 4TB on the second computer; leaving the other 4TB with no backup).
A RAID5 array can have more capacity, but performance will be truly sad if a drive ever fails. With that poor performance, it will be imperative to replace the drive ASAP. It's a pain to deal with, especially if the OS drive is the one which goes out. And adding/removing drives from the array? Eh, no thanks! Too many headaches.
In contrast, aufs is easy to administer. In aufs, the data partitions are "merged" together on a file system level instead of a block device level. That means the drives can be freely mixed and matched. Drives can be added or removed with ease. If a drive fails, then all of the files which were on it are no longer available, but the rest of the files are still accessible with no performance loss. Restoring from the backup is as simple as using rsync to copy from the backup computer.
As for what I'd actually put on the computers--ripped DVDs and CDs can go in a non-backed up directory. The non-backed up directory can also be used for MythTV recordings; I can manually save any TV recordings I want to keep long term in the backed up directory. In the backed up directory will go my personal files, which notably includes photos and digicam videos. My digicam takes 640x480x30fps motion jpeg videos, which consume space at a frightful rate. Currently, I'm quite frugal with video clips, because of this. But if I had 5TB of space to devote to home videos, I'd be gobbling up a 2gig memory card all the time!
Cool. Anyone else want to share their thoughts on how to best use a 12TB array?
If you don't know how you're going to use it, I wouldn't suggest you buy it, unless the point is that you can simply brag that you've got one. It's just a waste of money.
You can't expect the hardware itself to last 20 years. You can expect each drive to last maybe 5 years before you begin seeing failures. The more drives, the more replacements, more possibility for failure, and the more money down the drain. And that doesn't include the required RAID hardware to make it all work.
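To put a rough number on "the more drives, the more possibility for failure", here is a minimal sketch. It assumes every drive fails independently with the same annual failure rate; the 3% figure in the usage line is a made-up illustration, not a measured value for any real drive, and real failures are often correlated (same batch, same heat, same PSU).

```python
def prob_at_least_one_failure(drives: int, annual_failure_rate: float,
                              years: float = 1.0) -> float:
    """Chance that at least one of `drives` fails within `years`.

    Assumption: failures are independent and the annual rate is constant.
    """
    # Probability that a single drive survives the whole period.
    survive_one = (1.0 - annual_failure_rate) ** years
    # All drives must survive for the array to see no failure at all.
    return 1.0 - survive_one ** drives


if __name__ == "__main__":
    # Hypothetical 3% annual failure rate, 1 drive versus 12 drives.
    for n in (1, 12):
        print(n, "drive(s):", round(prob_at_least_one_failure(n, 0.03), 3))
```

Under those assumptions a 12-drive box is far more likely to need a replacement in any given year than a single drive is, which is the point being made above.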
Besides, I bet we'll have converted to flash technology, or similar, and gotten rid of mechanical drives well before 20 years from now. SATA is somewhat new now, but it might be old hat in 5 years, and tech will have moved on to something better and faster.
Consider how much properly used space you think you'd need for the next 5 years. Maybe double that for things you didn't consider, and buy a hard drive to match.
I, however, will likely be using this for my hosting company that I plan to get off the ground soon, but I need to have the hardware before I can start the company. It will start small and then grow from there. The array will be external to the system, and I can split the array between 3 systems if I want to, giving each of them 4TB. But even at this rate I will hit the hosting limit of the system well before I hit the limit of the disk space (I think). Of course I am going to try to give the web servers at least 16GB of memory, if I can afford it. Splitting the space between three servers, each with their own array, might be more economical.
The other option is to split the array between 3 servers doing different things. While the external array is nice and helps me keep my rack size small by not having to install 4U or 5U servers, the bad thing about this is that even though the array will have a redundant power supply, if the power supply completely dies, I lose three systems instead of one. But with 12 drives, I could also use this as a backup system for all my systems. Kind of a D2D2T system, and while the array size is quite large, I don't think it will be that hard to manage.
This would be the first RAID array that I actually own, and it isn't about bragging rights; I just want to ensure that space is available for whatever I may need. One person suggested leasing out the space for people to use. While this is a good idea that I may use, I would make sure that people who use it realize that I won't back up the MP3s and movies that they put on there. Plus, if they do that I would be looking at some legal issues, and I would just have to try my best to manage it. So I doubt I would lease out space.
Mostly I would use it for hosting my own stuff for my company, web hosts and such. Web servers rarely take up space unless they are storing information in a database; then that begins to grow quite fast, especially when hosting multiple sites. So I could use it as a database storage array. But then again, I would likely hit the hosting limits of the system before I run out of space. You never know.
The more drives, the more replacements, more possibility for failure, and the more money down the drain. And that doesn't include the required RAID hardware to make it all work.
However, if I only had one or two drives fail after 5 years and the rest of the hardware continued to function, I would consider it an achievement. When solid state drives become better and cheaper I will likely move over to them, but for now I will just have to maintain it as best I can. It may not be best to use a full 12TB in one array; it may be better to split the array into three 4TB sections, as I explained in my previous post.
Seriously, don't use RAID5. If continuous uptime is critical, use RAID1 and redundant duplicate servers. Assuming one primary server and two backup servers, that means a total of 2 terabytes of space.
Your customers will remember and care if your service goes down and they lose data as a result. They won't remember or care if their bill was reduced a few bucks because you cut corners on hard drive expenditures.
RAID 1 for the OS disks to keep the systems from going down is always smart and is the plan that I intend to utilize. But for data storage, RAID 1 isn't going to help when I have two 1TB disks and, on the slim chance that both go out, I am screwed anyway. Also, with RAID 1 you can't keep your costs down when you are trying to expand the amount of space you have.
The one thing that I can do, however, is take two RAID 5 arrays and mirror them. That would be redundant enough, I would hope. Say I have two 4TB arrays in RAID 5. I take the first 4TB array and mirror it to the second array. This is much smarter than taking two 1TB disks, mirroring them, and hoping both disks don't go out at once.
To bring down costs, I can use the same concept with smaller disks. Example: take four 40GB HDDs and put them in RAID 5 for a total of 120GB usable, then build a second array in the same configuration and mirror the two arrays. This is much better and smarter when it comes to redundancy, in my opinion.
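The capacity arithmetic for these layouts is easy to get wrong, so here is a small sketch of it. It assumes equal-sized disks and ignores filesystem overhead and the decimal-versus-binary gigabyte difference; the function names are made up for illustration.

```python
def raid5_usable(disks: int, disk_size_gb: float) -> float:
    """Usable space of one RAID 5 array: one disk's worth goes to parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * disk_size_gb


def raid1_usable(disk_size_gb: float) -> float:
    """Usable space of a mirror: one copy's worth, however many copies."""
    return disk_size_gb


def mirrored_raid5_usable(disks_per_array: int, disk_size_gb: float) -> float:
    """Two identical RAID 5 arrays mirrored: capacity of a single array."""
    return raid5_usable(disks_per_array, disk_size_gb)


if __name__ == "__main__":
    print(raid5_usable(12, 1000))          # twelve 1TB disks in one RAID 5
    print(mirrored_raid5_usable(4, 40))    # two mirrored arrays of four 40GB disks
```

By this arithmetic, four 40GB disks in RAID 5 come to 120GB usable, and mirroring two such arrays uses eight disks for that same 120GB.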
When it comes to DB servers, I would take two servers and configure them exactly alike, say using the 40GB example above, except that instead of mirroring them I would set up replication between the two and then put a heartbeat and load balancer in front of them. This way, should one server stop working, the service would still be running, as the other server would take over.
I am still in the planning stages of this, but I do have contingency plans and am trying to maximize uptime. While I could back up my customers' data to tape and so forth, that creates much more overhead, and starting out it would increase my costs and force me to raise prices for my customers. That may be a good thing, but being only 1 person until I can get the revenue to hire employees, I also have to think about what is going to be best for my startup.
However, if I were to use this for my personal storage system, I wouldn't be too concerned about RAID 5 and not having a mirror, as long as my important stuff is going to be dumped to tape. A mirror would still be nice, though, as it would fit into a nice D2D2T backup plan.
Last edited by richinsc; 08-28-2008 at 11:01 AM.
Reason: Added Part about DB Servers
Again, don't use RAID5. If any drive goes down, then performance will plummet. The only use for RAID5 is for cheapskate personal users who can't bear to sacrifice 50% of their storage space to either a backup or RAID1.
For practical purposes, a drive failure in RAID5 means significant downtime. Performance is horrible when a drive fails and it gets even worse during the time that you're bringing in a replacement drive.
In contrast, RAID1 performance is not harmed when a drive goes down, and performance is not impacted too badly while a replacement is brought up. Note that when a RAID5 drive is replaced, it's necessary to read THE ENTIRE ARRAY in order to build the replacement drive. When a RAID1 drive is replaced, it's only necessary to read one drive--its mirror.
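As a rough illustration of that difference, the sketch below counts how much data has to be read to rebuild one replaced drive. It assumes a plain full-disk rebuild (every surviving drive read end to end for RAID5, the single mirror partner for RAID1); controllers that track which regions are in use may read less. The function name and level strings are made up for the example.

```python
def rebuild_read_gb(level: str, disks: int, disk_size_gb: float) -> float:
    """Data read from surviving drives to rebuild ONE replaced drive.

    Assumption: a full rebuild with no bitmap or used-space shortcuts.
    """
    if level == "raid5":
        # Every surviving member is read to recompute the missing blocks.
        return (disks - 1) * disk_size_gb
    if level == "raid1":
        # Only the surviving mirror partner is read.
        return disk_size_gb
    raise ValueError("unsupported level: " + level)


if __name__ == "__main__":
    print(rebuild_read_gb("raid5", 12, 1000))  # 12 x 1TB RAID5 array
    print(rebuild_read_gb("raid1", 2, 1000))   # 2 x 1TB mirror
```

For the 12 x 1TB array discussed in this thread, that is eleven drives' worth of reads for RAID5 versus one drive's worth for a RAID1 pair, all while the array is still trying to serve requests.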
Personally, I don't use RAID for anything. I greatly prefer using rsync to maintain backups. If the primary server goes down for any reason, then I just switch to a backup server. I generally don't like RAID1 because any problem caused by a changed or deleted or corrupted file instantly gets duplicated on the mirror.
Note that when a RAID5 drive is replaced, it's necessary to read THE ENTIRE ARRAY in order to build the replacement drive.
Then I guess I would use RAID5 only for my personal stuff, where it would be alright to have the whole array re-read to build the drive back. But once the drive is back up, performance would resume as normal, I assume?
And I doubt RAID 6 would do me any better? So from a hosting standpoint, what RAID level do you use? You say RAID1, but if the largest disk I can afford is 1TB and I have two of them, then when I need to expand to 2TB I have to create another array, which seems like a bit more of a hassle. Any suggestions would be welcome. If you were going to be a hosting provider for, say, websites, even database-driven ones, what would be your suggestion?
With everyone's posts I have now had to really stop and think about how this would best be set up. I also have to ask whether there is a possibility that I would ever use a full 12TB for 1 server. No, highly unlikely, unless I was Google or YouTube. So I find myself pondering my configuration: if I am going to do this, what should I do, and how would I best utilize the space? I don't want to get smaller disks, but most of this idea stemmed from an array chassis that would hold 12 HDDs and could be split 3 ways or join all 12 disks as 1. I do need the redundancy and the storage, but I want to use the 1TB HDDs as well as make full use of the chassis.
So much thinking and planning to do, but it's exciting just thinking about it. I have also thought about taking the array at 12TB and using it as a media server, but that is undetermined at this point.
Last edited by richinsc; 08-28-2008 at 01:35 PM.
Reason: More stuff that I don't want to create a new post for.
Honestly, I don't have the knowledge for how to provide a truly high availability web hosting service. So I'd start by learning what I need to know.
Doing a web search on "apache cluster" brings up a lot of interesting information...
Looking around, it looks like DRBD is the way to go for high availability storage. This is a bit like RAID1, except the individual mirrors are on completely independent file server computers. I trust that a lot more than any RAID which can fail due to a bad mobo or CPU or memory or PSU.
DRBD seems to be designed with LVM in mind, so it's a good idea to learn LVM inside and out also.
For my personal use, all of the above is rather complex and requires a lot of initial effort. My personal solution is very simplistic; I just use rsync to backup from my "live" file server to various backup drives. I accept that if my file server goes down then I've got some downtime before I swap in a backup, and that if a hard drive fails then my last rsync backup may be somewhat out of date.
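For anyone wanting to script that kind of rsync backup, here is a minimal sketch. The paths and the host name "backuphost" are hypothetical placeholders, and this is just one way to wrap it, not the exact setup described above. It uses rsync's archive mode (-a) and defaults to --dry-run so nothing is copied until you have checked what it would do. It deliberately leaves out --delete, so a file removed by mistake on the live server stays on the backup.

```python
import subprocess


def build_rsync_command(source: str, destination: str,
                        dry_run: bool = True) -> list:
    """Build the rsync argument list for a one-way backup copy."""
    cmd = ["rsync", "-a"]          # -a: archive mode (recursive, keeps perms/times)
    if dry_run:
        cmd.append("--dry-run")    # only report what would be transferred
    # A trailing slash makes rsync copy the directory's contents,
    # not the directory itself, into the destination.
    if not source.endswith("/"):
        source += "/"
    cmd += [source, destination]
    return cmd


def run_backup(source: str, destination: str, dry_run: bool = True) -> None:
    """Run rsync and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_rsync_command(source, destination, dry_run), check=True)


if __name__ == "__main__":
    # Hypothetical paths; prints the command instead of running it.
    print(" ".join(build_rsync_command("/srv/files", "backuphost:/srv/files")))
```

Calling run_backup(..., dry_run=False) from a nightly cron job would do the real copy; how out of date the backup can get is then simply the interval between runs.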
I haven't had to put this to the test yet--I haven't had a hard drive fail on me in years. Ever since I had the brilliant realization that I had been killing my hard drives by overheating them, I've given them plenty of airflow to keep them cool. Before that, I did have a lot of hard drive failures, but I always had audible forewarning days in advance. My hard drives failed by starting to make unhappy "tok-tok" seek noises some days before total failure. During that time, I always transferred my data elsewhere in paranoia while hoping in vain that the hard drive would stay alive anyway. (God, I was stupid.)