I'm working with a guy who has a Joomla 2.5 site with a lot of customized code. Thankfully, most of this customized code takes care to use the Joomla database access layer (JDatabase). His site is currently growing rapidly, and it's looking like we'll eventually need to move to a replicated/clustered database to handle the traffic.
Having just read the MySQL documentation's Using Replication for Scale-Out page, it would appear that I'm going to need to make some modifications to the code to use the replication scheme described therein for this reason:
(I'm so *glad* for this "opportunity".)
While Joomla 2.5 was smart enough to build in a few methods to distinguish inserts and updates from queries (e.g., insertObject), the code I'm dealing with has many instances where developers neglected to use these distinct methods and instead inserted or updated records by manually constructing SQL:
$sql = "INSERT INTO some_table (col1, col2, col3, col4, col5) VALUES (1, 2, 3, 4, 5)";
So I'm wondering a few things:
Q1: Is there a MySQL replication/clustering scheme where the client doesn't need to route reads to a slave and writes to the master?
I.e., is there some other type of replication/clustering (hopefully free) where I can just leave all those manual SQL inserts/updates/deletes alone and just let Joomla speak to whichever database it is using without worrying about reads vs. writes?
Q2: Is it feasible to reliably sniff a query to determine if it should go to the master?
I.e., might it be possible to modify the execute method of the JDatabase object so that it uses pattern matching or parsing somehow to distinguish which queries should go to the master and which to a slave? While it seems simple enough to check a query to see if the first word is INSERT or UPDATE or DELETE, I expect there might be some INSERT...SELECT queries or even more bizarre selects with inserts and joins or something that might require a more nuanced approach. Any thoughts about an effective pattern-matching scheme would be much appreciated.
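For illustration, a naive version of such a check might look like the sketch below. The function name queryNeedsMaster is hypothetical and is not part of Joomla or JDatabase. It assumes that anything it cannot positively identify as a plain read should go to the master, and it does not account for replication lag, open transactions, or SELECTs that call stored functions with side effects.

```php
<?php
// Hypothetical helper (not a Joomla API): returns true when the query
// should be routed to the master, false when a slave could serve it.
function queryNeedsMaster($sql)
{
    // Strip leading whitespace, opening parentheses and ordinary comments.
    // MySQL "/*! ... */" executable comments are deliberately NOT stripped,
    // so queries that start with one fall through to the safe default.
    $sql = preg_replace(
        '~^(?:\s+|\(|/\*(?!!).*?\*/|(?:--[ \t]|#)[^\n]*(?:\n|$))+~s',
        '',
        $sql
    );

    // No recognisable leading keyword: be conservative, use the master.
    if (!preg_match('/^([a-z]+)/i', $sql, $m)) {
        return true;
    }

    // Only these statement types are treated as candidate reads.
    $readOnly = array('SELECT', 'SHOW', 'DESCRIBE', 'DESC', 'EXPLAIN');
    if (!in_array(strtoupper($m[1]), $readOnly, true)) {
        return true; // INSERT, UPDATE, DELETE, REPLACE, DDL, etc.
    }

    // SELECTs that take locks or write somewhere still go to the master.
    // This may also match the phrase inside a string literal, which only
    // errs on the safe side.
    $writes = '/\bFOR\s+UPDATE\b|\bLOCK\s+IN\s+SHARE\s+MODE\b'
            . '|\bINTO\s+(?:OUTFILE|DUMPFILE|@)/i';
    return (bool) preg_match($writes, $sql);
}
```

With this sketch, INSERT...SELECT is classified by its leading INSERT, so it goes to the master without any special handling; the harder cases are the ones listed as unhandled above.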
Q3: Must there always be a one-to-one relationship between slave db servers and application servers?
The diagram in the page I linked above has one application server for each db slave. I can imagine a scenario where one application server is enough but the database is getting worked to death so it might be helpful to have one application server connect to one of a few slaves. Is that right? If so, would the db slave be chosen at random or can someone recommend a more enlightened scheme (e.g., based on load average or something) which doesn't hamper performance?
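To make the "chosen at random" option concrete, here is a minimal sketch of weighted random selection. The function name pickSlave, the host names, and the integer weights are all assumptions for illustration. The weights are static configuration values, so this does not react to actual load on the slaves.

```php
<?php
// Hypothetical helper: picks one slave host at random, in proportion to
// its integer weight. $slaves maps host name => non-negative int weight.
function pickSlave(array $slaves)
{
    $total = array_sum($slaves);
    if ($total <= 0) {
        return null; // no usable slave configured
    }

    // Roll a number in [1, total] and walk the list until it is used up.
    $roll = mt_rand(1, $total);
    foreach ($slaves as $host => $weight) {
        $roll -= $weight;
        if ($roll <= 0) {
            return $host;
        }
    }
    return null; // not reached when all weights are non-negative integers
}

// Example (host names are made up): db-slave-2 gets roughly twice the reads.
$host = pickSlave(array('db-slave-1' => 1, 'db-slave-2' => 2));
```

A host with weight 0 is never picked, which gives a simple way to take a slave out of rotation without removing it from the list.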