There must be sysadmins out there who deploy high-availability clusters serving Apache.
I can do this; I even have a very expensive load balancer. The caveat, however, is real-time (not batched like rsync or unison) file replication. I have been down various roads. I have demoed commercial products (and spent a lot of time and toil doing so):
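To illustrate what I mean by "real-time" versus batched: I don't want a scheduled rsync pass, I want changes mirrored as they happen. Here's a toy Python sketch of the idea (polling file mtimes; the paths and interval are made up, and a real solution would hook the kernel, e.g. via inotify, rather than poll):

```python
# Toy one-way mirror: copy files as soon as their mtime changes,
# instead of running one big batched sync on a schedule.
# Not production code -- just to show the difference in approach.
import os
import shutil
import time

def sync_changed(src, dst, seen):
    """Copy any file whose mtime differs from what we saw last pass."""
    for root, _dirs, files in os.walk(src):
        for name in files:
            spath = os.path.join(root, name)
            rel = os.path.relpath(spath, src)
            dpath = os.path.join(dst, rel)
            mtime = os.path.getmtime(spath)
            if seen.get(rel) != mtime:
                os.makedirs(os.path.dirname(dpath), exist_ok=True)
                shutil.copy2(spath, dpath)   # preserves mtime/permissions
                seen[rel] = mtime

def watch(src, dst, interval=0.5):
    """Loop forever, mirroring changes within ~interval seconds."""
    seen = {}
    while True:
        sync_changed(src, dst, seen)
        time.sleep(interval)
```

The point is the latency: a batched tool replays a whole tree on a timer, while the replication I need propagates each write within a second or so of it happening.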
PeerFS: Great concept. It uses a proprietary filesystem and kernel module to create a network drive.
CONS: It was literally an oxymoron; I almost got fired because our cluster was down so often. When one server went down (for no apparent reason), the other's I/O would freeze up. That bug was eventually dealt with, but after that, bringing a failed server back up (oh yes, inexplicable crashes) triggered an initial data check that brought the working peer's I/O to a screeching halt for one to three minutes (I had 50 GB to sync).

What I really didn't like happened when I added a remote peer (I was about to bring it into a GSLB cluster together with the primary peers). When the peers at my primary production site went down (inexplicably; the developers constantly assured me this was not *normal* behavior. *Maybe commercial developers hate Debian... I dunno*), the remote peer became the data initiator (the big kahuna). That kicked off a data consistency check over remote networks, which meant my primary peers mounted their PeerFS drives in a "remote" state. In other words, all my files were being pulled from the remote peer to serve Apache requests during the 20-30 hours the consistency check took to complete. I could do nothing; the servers had already been down for an unacceptable amount of time. The end result: SUPER slow website performance for almost a day! This final straw prompted me to consider another vendor: Constant Replicator.
Constant Replicator (CR) doesn't use a proprietary filesystem, which seems to make it much more stable. Getting it to work on Debian took a day's worth of tech support, but I was able to get it done. Then I asked for a quote. Let's just say I could buy four more servers for the price of putting CR on three. It was roughly 20-25 times more expensive than PeerFS costs for Windows servers (which makes my bosses think *NIX is inconvenient, as we are moving closer and closer to a total Microsoft shop all the time).
Does anyone out there have experience setting up real-time file replication (Debian experience preferred) for HA clusters? I know there are hardware solutions, but they are just a little more expensive than CR, and that's already too much money. Plus, the hardware solutions look like a single point of failure to me as well (that, and the cost would never be feasible for my bosses). My bosses want to turn my same-LAN secondary peer and remote peer into rainy-day paperweights (hot standby).
Another question: why hasn't the open source community embraced this? I know file replication can be a can of worms and all, but I'm surprised there aren't more midsized companies (like us) with this business requirement. DRBD was the closest project I found; however, secondary peers could not be mounted read/write, or even read-only (making load balancing useless).
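For what it's worth, I've read that newer DRBD releases (the 8.x series) add an `allow-two-primaries` mode which, combined with a cluster filesystem such as OCFS2 or GFS, would let both peers mount the device read/write. I haven't verified this myself; a hedged sketch of what the resource definition might look like (hostnames, disks, and addresses below are made up):

```
resource r0 {
  protocol C;
  net {
    allow-two-primaries;    # dual-primary; requires a cluster FS on top
  }
  on web1 {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on web2 {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```

If that actually works as advertised, it would address exactly the read/write-secondary limitation that made DRBD a non-starter for load balancing.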
I'd appreciate any help. I'd even consider paying consultation fees.