LinuxQuestions.org - DB clustering/replication scheme where each node handles reads & writes?

- Linux - Server (https://www.linuxquestions.org/questions/linux-server-73/)

- - DB clustering/replication scheme where each node handles reads & writes? (https://www.linuxquestions.org/questions/linux-server-73/db-clustering-replication-scheme-where-each-node-handles-reads-and-writes-4175444301/)

DB clustering/replication scheme where each node handles reads & writes?

I'm working with a guy who has a Joomla 2.5 site with a lot of customized code. Thankfully, most of this customized code takes care to use the Joomla database access layer (JDatabase). His site is currently growing rapidly and it's looking like we'll need to eventually move to a replicated/clustered database to handle the traffic.

Having just read the MySQL documentation's Using Replication for Scale-Out page, it would appear that I'm going to need to make some modifications to the code to use the replication scheme described therein for this reason:

(I'm so *glad* for this "opportunity" :p)

While Joomla 2.5 was smart enough to build in a few methods to distinguish inserts and updates from queries (e.g., insertObject and updateObject), The code I'm dealing with has many instances where developers neglected to use these distinct methods and instead inserted or updated records by manually constructing SQL:

PHP Code:




    $sql = "INSERT INTO some_table (col1, col2, col3, col4, col5) VALUES (1, 2, 3, 4, 5)"; 
$jdbo->setQuery($sql); 
$jdbo->execute();

So I'm wondering a few things:
Q1: Is there a MySQL replication/clustering scheme where the client doesn't need to route reads to a slave and writes to the master?
I.e., is there some other type of replication/clustering (hopefully free) where I can just leave all those manual SQL inserts/updates/deletes alone and just let Joomla speak to whichever database it is using without worrying about reads vs. writes?

Q2: Is it feasible to reliably sniff a query to determine if it should go to the master?
I.e., might it be possible to modify the execute method of the JDatabase object so that it uses pattern matching or parsing somehow to distinguish which queries should go to the master and which to a slave? While it seems simple enough to check a query to see if the first word is INSERT or UPDATE or DELETE, I expect there might be some INSERT...SELECT queries or even more bizarre selects with inserts and joins or something that might require a more nuanced approach. Any thoughts about an effective pattern-matching scheme would be much appreciated.

Q3: Must there always be a one-to-one relationship between slave db servers and application servers?
The diagram in the page I linked above has one application server for each db slave. I can imagine a scenario where one application server is enough but the database is getting worked to death so it might be helpful to have one application server connect to one of a few slaves. Is that right? If so, would the db slave be chosen at random or can someone recommend a more enlightened scheme (e.g., based on load average or something) which doesn't hamper performance?

Are you sure his site conforms to

Quote:

works best in an environment where you have a high number of reads and low number of writes/updates.

?
You might be better off with a cluster...

Quote:

Originally Posted by chrism01 (Post 4864242)

Are you sure his site conforms to
?
You might be better off with a cluster...

I'm fairly sure the number of reads far outweights the number of writes -- at least as far as user traffic is concerned, but there is an import script that performs a fairly massive data import at night. This data import is currently handled by a single dual-core machine in about an hour. But thank you for drawing attention to that. I wonder if there is any quantitative threshold where the read/write ratio makes sense? The concept is so maddenginly vague.

What do you mean I might be better off with a cluster? Could you elaborate?

Forgive me if these have already been considered, but there's no specific mention of the current configuration. There may be performance gains without having to go to a master/slave replication scheme such as looking at:

If there is a single server running both Apache and MySQL then having MySQL data stored on a physically separate volume from the "static" website content such as the PHP code, graphics etc. Note that this should be a physically separate volume with its own disks rather than a partition on a RAID.
If this is already the case then consider using something like SSD drives for MySQL, the speed improvement is worth it.
Upgrade the RAM in the server, sometimes even an extra 2Gb can help.
If there are a large number of uploaded files / attachments then see if they are sored in the file system or the database. In general storing in the file system will increase performance. If there's a LOT of this kind of static content then consider off-loading that on to a CDN (Content Delivery Network).
Split Apache and MySQL, have a separate web server and database server. If going with this option then ensure that your data-center can give you a high-speed connection between the two machines. I've seen data-centers where they have simply provided two machines in separate parts of the data-center and used the "public" NICs to share all the traffic and with a speed limitation (10Mb!) enforced at the switch ports. If going down this route then ensure both machines have a "private" 1Gb/10Gb NIC and appropriate switching/cabling.
Check the MySQL engine to ensure that the tables are optimising storage. For example, for a transient tables (session/cache) you might want to try "MEMORY" rather than something to disk. Same with main tables, are they using MyISAM where InnoDB may be more appropriate.
Consider if Joomla really is the best platform for the job, take a look at other CMS platforms and see what's on offer. However, if there's a lot of custom software then you may end up fixed on that platform.

If going down the master/slave route then I believe (but could be wrong!) that some load-balancing devices such as Citrix NetScaler can do query analysis "on the fly" and direct writes to a single server and reads to slaves.

Also, you may want to take in to account the fact that normal master/slave replication in MySQL is loosely coupled, so a write to the master database will not necessarily be available to read immediately on all slaves. This will become important if the site is processing anything that is sensitive to the information being accurate to all visitors at all times. Or, alternatively, the website code will have to check that all slaves are consistent before finishing a "transaction".

Good options above.

What I meant by cluster is that if setup that way, all systems/disk are equally used, so you don't have to worry about read/write ratios.
You should get good throughput.
Note that a cluster doesn't absolve you from doing backups and ideally RAIDing your disks if you want good resiliency.

Quote:

Originally Posted by chrism01 (Post 4867624)

What I meant by cluster is that if setup that way, all systems/disk are equally used, so you don't have to worry about read/write ratios.
You should get good throughput.

From what I recall, last time I checked the official MySQL cluster solution kept the entire database in MEMORY, shared between machines, just something to be aware of.

Quote:

Originally Posted by chrism01 (Post 4867624)

Note that a cluster doesn't absolve you from doing backups and ideally RAIDing your disks if you want good resiliency.

Indeed, no level of clustering / RAID should ever be used as an excuse to avoid doing regular backups.