LinuxQuestions.org


ircmaxell 10-31-2007 08:41 AM

Best way to share files between clustered servers?
 
I'm going to be setting up a load-balanced cluster consisting of between 2 and 4 front-end web servers, a DB server, and a few other servers. I have everything figured out except how to keep the files in sync.

I was thinking of putting the web root directories on a NAS share mounted on all the servers, but I fear this won't be fast enough and won't scale.

From what I understand of SAN, it won't work, because all of the servers need to be able to write the data, and without a cluster filesystem only one host can safely mount a shared block device read-write...

I don't really want to rsync between the servers, because that creates a PITA situation: I'd have to designate one of them as the primary for all the others to rsync from, and if that one fails, syncing stops (otherwise it gets really complicated, with every server rsyncing against every other)...

The other thing I thought of is to mount a NAS share on every web server (say, as /www_dist) and rsync from that to the server's local disk (say, /www on sda). That way, if I want to add a server, I don't need to touch any of the others, and there's no performance hit, since each HTTP server reads off its local SCSI drive instead of the NAS. (I'd have a hot backup for the NAS server anyway.)
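
Something like this is what I have in mind for the sync job (paths as above; the flags, lock file, and cron interval are just a first guess):

Code:

#!/bin/sh
# sync-webroot.sh -- pull the webroot from the NAS mount to local disk.
# /www_dist is the NAS mount, /www is the local copy (paths from above).
# --delete keeps removed files from lingering in the local copy.
LOCKFILE=/var/run/sync-webroot.lock
[ -e "$LOCKFILE" ] && exit 0   # previous run still going; skip this one
touch "$LOCKFILE"
rsync -a --delete /www_dist/ /www/
rm -f "$LOCKFILE"

# crontab entry, every 5 minutes (the interval is a guess):
# */5 * * * * /usr/local/bin/sync-webroot.sh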


Any input?

this213 10-31-2007 10:12 AM

I think what you want is Heartbeat (AKA Linux HA) and DRBD
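
A rough sketch of the DRBD side, assuming two storage boxes called nas1 and nas2 (hostnames, disks, and addresses are all placeholders):

Code:

# /etc/drbd.conf (sketch -- hostnames, disks, and IPs are placeholders)
resource r0 {
    protocol C;                      # synchronous replication
    on nas1 {
        device    /dev/drbd0;
        disk      /dev/sda7;
        address   192.168.1.10:7788;
        meta-disk internal;
    }
    on nas2 {
        device    /dev/drbd0;
        disk      /dev/sda7;
        address   192.168.1.11:7788;
        meta-disk internal;
    }
}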

jonathanbdcs 10-31-2007 10:19 AM

Clustering file systems...DRBD...
 
I am hoping to learn real, good, straightforward, low-complexity ways to do this which are not horribly expensive! So far, my input has been:

http://en.wikipedia.org/wiki/DRBD

http://www.howtoforge.com/high_avail...drbd_heartbeat

http://sourceforge.net/projects/crablfs

In other words: real, probably very good, not terribly straightforward, and not low-complexity :-)

The closest thing I have found, I think, is probably GFS, under Red Hat Enterprise Linux 5 only. It lets you write to the same disk (iSCSI perhaps, or even plain SCSI) from two independent servers. If you look up "clustered file systems" on Wikipedia, there is more input of this kind.
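
From what I've read, once the Red Hat Cluster Suite itself is configured, the GFS part boils down to something like this (cluster name, filesystem name, journal count, and device are all placeholders):

Code:

# make a GFS filesystem with one journal per node that will mount it
gfs_mkfs -p lock_dlm -t mycluster:webfs -j 4 /dev/sdb1
# then on each node:
mount -t gfs /dev/sdb1 /www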

J.E.B.

ircmaxell 10-31-2007 10:22 AM

Quote:

Originally Posted by this213 (Post 2943533)
I think what you want is Heartbeat (AKA Linux HA) and DRBD

For HA, yes... but I'm doing load balancing here as well. I'm actually going to be using Heartbeat and DRBD on the NAS server (and a hot standby). The front-end servers are all fully redundant, so if one fails, the load balancer will automatically switch over to the others and distribute all incoming requests properly among the servers still up. I think I'm going to go with the mounted NAS share being rsynced to the local drive...

this213 10-31-2007 10:44 AM

Ok wait, I think I see what you're doing here now.

What I would do (and this is just me, so to each their own) is set up Heartbeat and DRBD on the storage systems, much as you've described. But I would then just NFS-mount the storage on the web servers rather than rsyncing to them. You already have 1-1 redundancy on your storage, after all.
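
The NFS side would be roughly this (export path, subnet, and hostname are placeholders):

Code:

# /etc/exports on the active storage box (subnet is a placeholder)
/export/www  192.168.1.0/24(rw,sync,no_subtree_check)

# /etc/fstab line on each web server ("nas1" would be the floating
# Heartbeat address, so the mount follows a failover)
nas1:/export/www  /www  nfs  rw,hard,intr  0 0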

If you're dealing with database-intensive stuff, and your databases aren't prohibitively huge, you'd do better to leave the database out of this setup and just do master/master replication across all of the servers directly. The reason I say this is that it lets each server talk to localhost for its database queries; send queries over the network and you'll see a huge lag in query response times. For a backup, you might do master/slave to your storage device.
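
The my.cnf side of master/master is basically just this on each box, with the id and offset swapped on the second one (this assumes MySQL 5.0; names are placeholders):

Code:

# /etc/my.cnf on DB server 1 (server 2 uses server-id = 2, offset = 2)
[mysqld]
server-id                = 1
log-bin                  = mysql-bin
auto_increment_increment = 2   # two masters, so step AUTO_INCREMENTs by 2
auto_increment_offset    = 1   # keeps the two masters from colliding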

Realistically speaking, if you're doing more than 1-1 redundancy (1 live, 1 hot standby), you're not going to see any real benefit until you start implementing geographic redundancy. Generally speaking, if both of your servers go down at the same time, you have bigger issues affecting every machine on that network, such as a power outage or a dropped internet connection (which, if you're setting things up right, would require at least 2 firewalls to go down too). Since you've already got 1-1 redundancy on your storage, and you'll be setting up 1-1 redundancy on your servers, I don't see where rsync needs to come into the picture.

As I said, this isn't mine, and everyone has their own way of doing things, but offhand, I think this is the way I'd go with it (given the current level of information).

ircmaxell 10-31-2007 10:56 AM

Quote:

Originally Posted by this213 (Post 2943569)
But I would then just NFS-mount the storage on the web servers rather than rsyncing to them.

The only concern I have with this is whether it will be fast enough for web load, meaning: will it be able to support 2,000+ HTTP requests per second? That's why I was thinking of rsyncing between the NFS mount and the physical webroot on each web server. Simple, but powerful...
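
I suppose the honest answer is to benchmark it both ways; something like ApacheBench against the same page served off the NFS mount and then off a local copy (hostname, path, and numbers are made up):

Code:

# compare requests/sec for the NFS-backed docroot vs. the local one
ab -n 10000 -c 100 http://web1/test.html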
Quote:

If you're dealing with database-intensive stuff, and your databases aren't prohibitively huge, you'd do better to leave the database out of this setup and just do master/master replication across all of the servers directly.
Actually, I'm planning to use a dedicated MySQL server, doing master/slave replication to the storage server for backup.
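
The slave side on the storage box would be something like this (credentials and log coordinates are placeholders, taken from SHOW MASTER STATUS after snapshotting the master):

Code:

# on the storage box, after giving it a unique server-id in my.cnf
# and loading a snapshot of DB1 (all values are placeholders)
mysql -u root -p <<'EOF'
CHANGE MASTER TO
    MASTER_HOST='db1',
    MASTER_USER='repl',
    MASTER_PASSWORD='secret',
    MASTER_LOG_FILE='mysql-bin.000001',
    MASTER_LOG_POS=4;
START SLAVE;
EOF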

Quote:

Realistically speaking, if you're doing more than 1-1 redundancy (1 live, 1 hot standby), you're not going to see any real benefit until you start implementing geographic redundancy.
Well, it's not really more than 1-1 redundancy... Here's the layout:

         Internet
             |
   Load Balancer (redundant)
     |       |       |
   Web1    Web2    Web3
     \_______|_______/
       |     |     |
      DB1    |    NAS1
       \_____|_____/
             |
          Backup
The Backup server uses Heartbeat to monitor NAS1 and takes over the NAS functions should it go down; it also monitors DB1 and switches from slave to master should that go down. The DB, NAS, and Backup servers will not be connected to the internet at all (they'll be on a private back-end network).
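
The Heartbeat side of that would be roughly a v1-style haresources line like this (node name, DRBD resource, mount point, and IP are placeholders; the NFS init script name varies by distro):

Code:

# /etc/ha.d/haresources on the NAS pair -- nas1 is the preferred node;
# Backup takes over the DRBD disk, the mount, the floating IP, and the
# NFS daemon if nas1 dies
nas1 drbddisk::r0 Filesystem::/dev/drbd0::/export/www::ext3 IPaddr::192.168.1.10 nfs-kernel-server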

Redundancy here is an afterthought; the primary reason for this setup is to deal with high load. Traffic on the site presently reaches 800 req/sec, and with the planned additions it should reach up to 2,000 req/sec.

this213 10-31-2007 11:21 AM

I would definitely do as I suggested with the database servers or you're going to create a huge bottleneck there. At the very least, set up another DB server to share the load.

Rsync will work, but you'll find DRBD faster for replication. At that level of load, I wouldn't use NFS, and I'd also be converting data to static pages wherever possible.

All of the back-end servers should be on a "real" private network, preferably fiber or at least gigabit; otherwise you'll flood your own network with backup traffic.

ircmaxell 10-31-2007 11:28 AM

Quote:

Originally Posted by this213 (Post 2943620)
I would definitely do as I suggested with the database servers or you're going to create a huge bottleneck there. At the very least, set up another DB server to share the load.

I've thought about this, but right now MySQL isn't really hit all that much (I have query caching in effect). The reason I don't really want multiple MySQL servers is the headache of restarting a failed one: I can get a master-master-master setup going fine, but if one of them goes down, it's a nightmare to bring back up. One thing I considered was running MySQL Cluster on each of the front-end machines... It'll use more RAM, but at least it'll be redundant...
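
(For reference, the query caching I mentioned is just this in my.cnf; the size here is only an example:)

Code:

# /etc/my.cnf -- MySQL query cache (size is just an example)
[mysqld]
query_cache_type = 1
query_cache_size = 64M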

Quote:

Rsync will work, but you'll find DRBD faster for replication. At that level of load, I wouldn't use NFS, and I'd also be converting data to static pages wherever possible.
Does DRBD handle multiple (3 or 4) primary servers? That's why I was going to use NFS only as an intermediary between the servers; the servers themselves will serve from local copies...
Quote:

All of the back end servers should be on a "real" private network, preferably fiber or at least gigabit. Otherwise you'll flood your own network with backup traffic.
I'm looking into a 24-port GigE managed switch. I'll set up a VLAN for the back-end stuff and use the other ports for front-end traffic...

