Best way to share files between clustered servers?
I'm going to be setting up a load balanced cluster consisting of between 2 and 4 front end servers, a DB server, and a few other servers. I have everything figured out except keeping the files synced.
I was thinking of putting the web root directories on a NAS share on all the servers, but I fear this will not be fast enough and will not scale.
From what I understand of SAN, it won't work, because all of the servers will need to be able to write the data...
I don't really want to rsync between the servers, because that creates a PITA situation: I would need to designate one of them as the primary for all the others to rsync from, in which case syncing stops if that one fails
(otherwise it gets really complicated, having to rsync between every pair of servers)...
The other thing I thought of is to mount a NAS share (say as /www_dist), and rsync from that to the physical server itself (say, /www on sda). So if I want to add a server, I don't need to touch any of the others (and there will be no performance hit, since the HTTP servers are reading off the local SCSI drive instead of the NAS). (Oh, I would have a hot backup for the NAS server anyway)...
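A minimal sketch of that NAS-to-local sync, assuming /www_dist is the NFS mount of the NAS share and /www is the local webroot (both paths are placeholders, adjust to your layout); you could run it from cron or trigger it on deploy:

```shell
#!/bin/sh
# Hypothetical sketch: pull the shared webroot from the NAS mount down
# to the local disk, deleting anything removed upstream.
SRC="${SRC:-/www_dist/}"   # NFS mount of the NAS share (trailing / matters to rsync)
DST="${DST:-/www/}"        # local webroot on the SCSI drive

# Only sync when the NAS mount is actually present, so a NAS outage
# doesn't wipe the local copy.
if [ -d "$SRC" ]; then
    rsync -a --delete "$SRC" "$DST"
fi
```

The -a flag preserves permissions and timestamps, and --delete keeps the local copy from accumulating files that were removed from the share; the trailing slash on the source makes rsync copy the directory's contents rather than the directory itself.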
In other words: realistic, probably very good, but not terribly straightforward and not low-complexity :-)
The closest thing I have found, I think, is probably GFS, under Red Hat Enterprise Linux 5 only. It will let you write to the same disk (iSCSI perhaps, or even plain SCSI) from two independent servers. If you look up "Clustered file system" on Wikipedia there is more input of this kind.
I think what you want is Heartbeat (AKA Linux HA) and DRBD
for HA, yes... I'm doing load balancing here as well. I'm actually going to be using Heartbeat and DRBD on the NAS server (and a hot standby). The front end servers are all fully redundant, so if one fails, it'll automatically switch over to use the others. Plus it will distribute all incoming requests properly among all operating servers. I think I am going to go with the mounted NAS share being rsynced to the local drive...
Ok wait, I think I see what you're doing here now.
What I would do (and this isn't mine, so to each their own) is set up heartbeat and drbd on my storage systems, much as you've described. But I would then just NFS the drives to the servers rather than rsync them. You already have a 1-1 redundancy on your storage, after all.
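As a sketch of the NFS approach described above (all paths, hostnames, and the subnet are assumptions, not anything from the thread), the export on the storage server and the matching mount on each web server might look like:

```
# Hypothetical /etc/exports entry on the storage server (NAS1):
/export/www  192.168.1.0/24(rw,sync,no_subtree_check)

# Matching /etc/fstab line on each web server:
nas1:/export/www  /www  nfs  rw,hard,intr  0 0
```

With Heartbeat moving a floating IP between NAS1 and the backup, the web servers would mount via that floating address so a failover is transparent to them.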
If you're dealing with database intensive stuff, and your databases aren't prohibitively huge, you'd do better to leave this out of the setup and just do Master/Master replication on all of the servers directly. The reason I say this is it'll allow each server to talk to localhost directly for database queries. If you try to network database queries you'll see a huge lag in query response times. For a backup, you might do Master/Slave to your storage device.
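A rough sketch of what two-node Master/Master MySQL replication looks like in my.cnf (server IDs and offsets are illustrative, not from the thread); the staggered auto-increment settings keep the two masters from generating colliding primary keys:

```
# Hypothetical /etc/my.cnf fragment on server 1:
[mysqld]
server-id                = 1
log-bin                  = mysql-bin
auto-increment-increment = 2
auto-increment-offset    = 1

# Server 2 would use server-id = 2 and auto-increment-offset = 2,
# and each server is pointed at the other with CHANGE MASTER TO.
```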
Realistically speaking, if you're doing more than 1-1 redundancy (1 live, 1 hot standby), you're not going to see any real benefit until you start implementing geographic redundancy. This is because, generally speaking, if both of your servers go down at the same time for some reason, you probably have bigger issues that are affecting all machines on that network, such as a power outage or dropped internet connection (which, if you're setting things up right, would require at least 2 firewalls to go down too). Since you've already got a 1-1 redundancy on your storage, and you'll be setting up 1-1 redundancy on your servers, I don't see where you'd need rsync to come into the picture.
As I said, this isn't mine, and everyone has their own way of doing things, but offhand, I think this is the way I'd go with it (given the current level of information).
Quote:
But I would then just NFS the drives to the servers rather than rsync them.
The only concern I have with this is whether it will be fast enough under web load. Meaning, will it be able to support 2000+ HTTP requests per second. That's why I was thinking of rsyncing between the NFS mount and the physical webroot on each webserver. Simple, but powerful...
Quote:
If you're dealing with database intensive stuff, and your databases aren't prohibitively huge, you'd do better to leave this out of the setup and just do Master/Master replication on all of the servers directly.
Actually, I am planning to use a dedicated MySQL server, doing master/slave rep to the storage server for backup.
Quote:
Realistically speaking, if you're doing more than 1-1 redundancy (1 live, 1 hot standby), you're not going to see any real benefit until you start implementing geographic redundancy.
Well, it's not really more than 1-1 redundancy... Here's the layout
          Internet
             |
    Load Balancer (redundant)
      |      |      |
    Web1   Web2   Web3
      \______|______/
      |      |      |
     DB1     |     NAS1
       \_____|_____/
             |
           Backup
Where the backup uses Heartbeat to monitor NAS1 and take over the NAS functions should it go down, and it also monitors DB1 and switches from slave to master should it go down. The DB, NAS and Backup servers will not be connected to the internet at all (they will be on a private VPN).
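For a Heartbeat v1 setup like the one described, the failover services live in /etc/ha.d/haresources, identical on both NAS1 and Backup. A hedged sketch, where the node name, floating IP, DRBD resource name, and mount point are all assumptions:

```
# Hypothetical /etc/ha.d/haresources on both NAS1 and Backup:
# preferred node, floating IP, then the resources Heartbeat manages in order:
# promote the DRBD device, mount it, then start the NFS server.
nas1 IPaddr::192.168.1.20/24 drbddisk::r0 Filesystem::/dev/drbd0::/export::ext3 nfs-kernel-server
```

On failover, Heartbeat runs the same list in reverse on the dead node (where possible) and forward on the survivor, so the backup ends up holding the floating IP with the DRBD volume mounted and NFS serving.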
Redundancy here is an afterthought; the primary reason for this setup is to deal with high load. The traffic on the site presently reaches 800 req/s, and with the planned additions should generate up to 2,000 req/s.
Last edited by ircmaxell; 10-31-2007 at 10:59 AM.
Reason: Image Correction
I would definitely do as I suggested with the database servers or you're going to create a huge bottleneck there. At the very least, set up another DB server to share the load.
Rsync will work, but you'll find DRBD to be faster with replication. With that level of load, I wouldn't use NFS and I'd also be converting data to static pages wherever possible.
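To make the DRBD side of this concrete, here's a hedged sketch of a two-node resource for the NAS pair (hostnames, devices, and addresses are placeholders for whatever the actual pair uses):

```
# Hypothetical /etc/drbd.conf resource mirroring the NAS disk to the backup:
resource r0 {
  protocol C;            # synchronous replication: writes ack only after both nodes have them
  on nas1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.1.10:7788;
    meta-disk internal;
  }
  on backup {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.1.11:7788;
    meta-disk internal;
  }
}
```

In the classic DRBD 0.7/8.0 setup only one node is Primary at a time (Heartbeat promotes the survivor on failover), which is the usual pairing with an NFS export on top.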
All of the back end servers should be on a "real" private network, preferably fiber or at least gigabit. Otherwise you'll flood your own network with backup traffic.
Quote:
I would definitely do as I suggested with the database servers or you're going to create a huge bottleneck there. At the very least, set up another DB server to share the load.
I've thought about this, but right now MySQL isn't really hit all that much (I have database query caching in effect). The reason I don't really want to have multiple MySQL servers is the headache of restarting a failed server. I can get a master-master-master setup running fine, but if one of them goes down, it's a nightmare to bring back up. One thing I considered was using MySQL Cluster on each of the front end machines... It'll use more RAM, but at least it'll be redundant...
Quote:
Rsync will work, but you'll find DRBD to be faster with replication. With that level of load, I wouldn't use NFS and I'd also be converting data to static pages wherever possible.
Does DRBD handle multiple (3 or 4) primary servers? That's why I was going to use NFS only as an intermediary between servers; the servers themselves will use local copies...
Quote:
All of the back end servers should be on a "real" private network, preferably fiber or at least gigabit. Otherwise you'll flood your own network with backup traffic.
I am looking into using a 24 port GigE managed switch. I'll set up a VLAN for the backend stuff, and use the other ports for the front end traffic...