There must be sysadmins out there who deploy high-availability clusters serving Apache.
I can do this; I even have a very expensive load balancer. The caveat, however, is real-time (not batched like rsync or unison) file replication. I have been down various roads. I have demoed commercial products (and spent a lot of time and toil doing so):
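To illustrate what I mean by "real-time" versus batched: I don't want a scheduled rsync pass, I want changes mirrored as they happen. Here's a toy Python sketch of the idea (polling file mtimes; the paths and interval are made up, and a real solution would hook the kernel, e.g. via inotify, rather than poll):

```python
# Toy one-way mirror: copy files as soon as their mtime changes,
# instead of running one big batched sync on a schedule.
# Not production code -- just to show the difference in approach.
import os
import shutil
import time

def sync_changed(src, dst, seen):
    """Copy any file whose mtime differs from what we saw last pass."""
    for root, _dirs, files in os.walk(src):
        for name in files:
            spath = os.path.join(root, name)
            rel = os.path.relpath(spath, src)
            dpath = os.path.join(dst, rel)
            mtime = os.path.getmtime(spath)
            if seen.get(rel) != mtime:
                os.makedirs(os.path.dirname(dpath), exist_ok=True)
                shutil.copy2(spath, dpath)   # preserves mtime/permissions
                seen[rel] = mtime

def watch(src, dst, interval=0.5):
    """Loop forever, mirroring changes within ~interval seconds."""
    seen = {}
    while True:
        sync_changed(src, dst, seen)
        time.sleep(interval)
```

The point is the latency: a batched tool replays a whole tree on a timer, while the replication I need propagates each write within a second or so of it happening.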
PeerFS: Great concept. It uses a proprietary filesystem and kernel module to create a network drive.
CONS: It was literally an oxymoron; I almost got fired because our cluster was down so often. When one server went down (for no apparent reason), the other's I/O would freeze up. That bug was eventually dealt with, but after that, bringing a failed server back up (oh yes, inexplicable crashes) triggered an initial data check that brought the working peer's I/O to a screeching halt for one to three minutes (I had 50 GB to sync).

What I really didn't like happened when I added a remote peer (I was about to bring it into a GSLB cluster together with the primary peers). When the peers at my primary production site went down (inexplicably; the developers constantly assured me this was not *normal* behavior. *Maybe commercial developers hate Debian... I dunno*), the remote peer became the data initiator (the big kahuna). That kicked off a data consistency check over remote networks, which meant my primary peers mounted their PeerFS drives in a "remote" state. In other words, all my files were being pulled from the remote peer to serve Apache requests during the 20-30 hours the consistency check took to complete. I could do nothing; the servers had already been down for an unacceptable amount of time. The end result: SUPER slow website performance for almost a day! This final straw prompted me to consider another vendor: Constant Replicator.
Constant Replicator (CR) doesn't use a proprietary filesystem, which seems to make it much more stable. Getting it to work on Debian took a day's worth of tech support, but I was able to get it done. Then I asked for a quote. Let's just say I could buy four more servers for the price of putting CR on three. It was roughly 20-25 times more expensive than PeerFS costs for Windows servers (which makes my bosses think *NIX is inconvenient, as we are moving closer and closer to a total Microsoft shop all the time).
Does anyone out there have experience setting up real-time file replication (Debian experience preferred) for HA clusters? I know there are hardware solutions, but they are just a little more expensive than CR, and that's already too much money. Plus, the hardware solutions look like a single point of failure to me as well (that, and the cost would never be feasible for my bosses). My bosses want to turn my same-LAN secondary peer and remote peer into rainy-day paperweights (hot standby).
Another question: why hasn't the open source community embraced this? I know file replication can be a can of worms and all, but I'm surprised there aren't more midsized companies (like us) with this business requirement. DRBD was the closest project I found; however, secondary peers could not be mounted read/write, or even read-only (making load balancing useless).
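For what it's worth, I've read that newer DRBD releases (the 8.x series) add an `allow-two-primaries` mode which, combined with a cluster filesystem such as OCFS2 or GFS, would let both peers mount the device read/write. I haven't verified this myself; a hedged sketch of what the resource definition might look like (hostnames, disks, and addresses below are made up):

```
resource r0 {
  protocol C;
  net {
    allow-two-primaries;    # dual-primary; requires a cluster FS on top
  }
  on web1 {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on web2 {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```

If that actually works as advertised, it would address exactly the read/write-secondary limitation that made DRBD a non-starter for load balancing.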
I'd appreciate any help. I'd even consider paying consultation fees.