LinuxQuestions.org
Review your favorite Linux distribution.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Server
User Name
Password
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Notices

Reply
 
Search this Thread
Old 01-05-2013, 04:52 PM   #1
sneakyimp
Member
 
Registered: Dec 2004
Posts: 795

Rep: Reputation: 50
DB clustering/replication scheme where each node handles reads & writes?


I'm working with a guy who has a Joomla 2.5 site with a lot of customized code. Thankfully, most of this customized code takes care to use the Joomla database access layer (JDatabase). His site is currently growing rapidly and it's looking like we'll need to eventually move to a replicated/clustered database to handle the traffic.

Having just read the MySQL documentation's Using Replication for Scale-Out page, it would appear that I'm going to need to make some modifications to the code to use the replication scheme described therein for this reason:

(I'm so *glad* for this "opportunity" )

While Joomla 2.5 was smart enough to build in a few methods to distinguish inserts and updates from queries (e.g., insertObject and updateObject), The code I'm dealing with has many instances where developers neglected to use these distinct methods and instead inserted or updated records by manually constructing SQL:
PHP Code:
    $sql "INSERT INTO some_table (col1, col2, col3, col4, col5) VALUES (1, 2, 3, 4, 5)";
$jdbo->setQuery($sql);
$jdbo->execute(); 
So I'm wondering a few things:
Q1: Is there a MySQL replication/clustering scheme where the client doesn't need to route reads to a slave and writes to the master?
I.e., is there some other type of replication/clustering (hopefully free) where I can just leave all those manual SQL inserts/updates/deletes alone and just let Joomla speak to whichever database it is using without worrying about reads vs. writes?

Q2: Is it feasible to reliably sniff a query to determine if it should go to the master?
I.e., might it be possible to modify the execute method of the JDatabase object so that it uses pattern matching or parsing somehow to distinguish which queries should go to the master and which to a slave? While it seems simple enough to check a query to see if the first word is INSERT or UPDATE or DELETE, I expect there might be some INSERT...SELECT queries or even more bizarre selects with inserts and joins or something that might require a more nuanced approach. Any thoughts about an effective pattern-matching scheme would be much appreciated.

Q3: Must there always be a one-to-one relationship between slave db servers and application servers?
The diagram in the page I linked above has one application server for each db slave. I can imagine a scenario where one application server is enough but the database is getting worked to death so it might be helpful to have one application server connect to one of a few slaves. Is that right? If so, would the db slave be chosen at random or can someone recommend a more enlightened scheme (e.g., based on load average or something) which doesn't hamper performance?
 
Old 01-06-2013, 08:31 PM   #2
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,261

Rep: Reputation: 2028Reputation: 2028Reputation: 2028Reputation: 2028Reputation: 2028Reputation: 2028Reputation: 2028Reputation: 2028Reputation: 2028Reputation: 2028Reputation: 2028
Are you sure his site conforms to
Quote:
works best in an environment where you have a high number of reads and low number of writes/updates.
?
You might be better off with a cluster...
 
Old 01-07-2013, 02:57 PM   #3
sneakyimp
Member
 
Registered: Dec 2004
Posts: 795

Original Poster
Rep: Reputation: 50
Quote:
Originally Posted by chrism01 View Post
Are you sure his site conforms to
?
You might be better off with a cluster...
I'm fairly sure the number of reads far outweights the number of writes -- at least as far as user traffic is concerned, but there is an import script that performs a fairly massive data import at night. This data import is currently handled by a single dual-core machine in about an hour. But thank you for drawing attention to that. I wonder if there is any quantitative threshold where the read/write ratio makes sense? The concept is so maddenginly vague.

What do you mean I might be better off with a cluster? Could you elaborate?
 
Old 01-08-2013, 10:43 AM   #4
TenTenths
Senior Member
 
Registered: Aug 2011
Location: Dublin
Distribution: Centos 5 / 6
Posts: 1,449

Rep: Reputation: 438Reputation: 438Reputation: 438Reputation: 438Reputation: 438
Forgive me if these have already been considered, but there's no specific mention of the current configuration. There may be performance gains without having to go to a master/slave replication scheme such as looking at:
  • If there is a single server running both Apache and MySQL then having MySQL data stored on a physically separate volume from the "static" website content such as the PHP code, graphics etc. Note that this should be a physically separate volume with its own disks rather than a partition on a RAID.
  • If this is already the case then consider using something like SSD drives for MySQL, the speed improvement is worth it.
  • Upgrade the RAM in the server, sometimes even an extra 2Gb can help.
  • If there are a large number of uploaded files / attachments then see if they are sored in the file system or the database. In general storing in the file system will increase performance. If there's a LOT of this kind of static content then consider off-loading that on to a CDN (Content Delivery Network).
  • Split Apache and MySQL, have a separate web server and database server. If going with this option then ensure that your data-center can give you a high-speed connection between the two machines. I've seen data-centers where they have simply provided two machines in separate parts of the data-center and used the "public" NICs to share all the traffic and with a speed limitation (10Mb!) enforced at the switch ports. If going down this route then ensure both machines have a "private" 1Gb/10Gb NIC and appropriate switching/cabling.
  • Check the MySQL engine to ensure that the tables are optimising storage. For example, for a transient tables (session/cache) you might want to try "MEMORY" rather than something to disk. Same with main tables, are they using MyISAM where InnoDB may be more appropriate.
  • Consider if Joomla really is the best platform for the job, take a look at other CMS platforms and see what's on offer. However, if there's a lot of custom software then you may end up fixed on that platform.
If going down the master/slave route then I believe (but could be wrong!) that some load-balancing devices such as Citrix NetScaler can do query analysis "on the fly" and direct writes to a single server and reads to slaves.

Also, you may want to take in to account the fact that normal master/slave replication in MySQL is loosely coupled, so a write to the master database will not necessarily be available to read immediately on all slaves. This will become important if the site is processing anything that is sensitive to the information being accurate to all visitors at all times. Or, alternatively, the website code will have to check that all slaves are consistent before finishing a "transaction".
 
Old 01-11-2013, 04:07 AM   #5
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,261

Rep: Reputation: 2028Reputation: 2028Reputation: 2028Reputation: 2028Reputation: 2028Reputation: 2028Reputation: 2028Reputation: 2028Reputation: 2028Reputation: 2028Reputation: 2028
Good options above.

What I meant by cluster is that if setup that way, all systems/disk are equally used, so you don't have to worry about read/write ratios.
You should get good throughput.
Note that a cluster doesn't absolve you from doing backups and ideally RAIDing your disks if you want good resiliency.
 
Old 01-11-2013, 04:25 AM   #6
TenTenths
Senior Member
 
Registered: Aug 2011
Location: Dublin
Distribution: Centos 5 / 6
Posts: 1,449

Rep: Reputation: 438Reputation: 438Reputation: 438Reputation: 438Reputation: 438
Quote:
Originally Posted by chrism01 View Post
What I meant by cluster is that if setup that way, all systems/disk are equally used, so you don't have to worry about read/write ratios.
You should get good throughput.
From what I recall, last time I checked the official MySQL cluster solution kept the entire database in MEMORY, shared between machines, just something to be aware of.
Quote:
Originally Posted by chrism01 View Post
Note that a cluster doesn't absolve you from doing backups and ideally RAIDing your disks if you want good resiliency.
Indeed, no level of clustering / RAID should ever be used as an excuse to avoid doing regular backups.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Java File reads and writes manolakis Programming 2 04-08-2009 12:52 PM
Ratio of Reads/Writes on disk erimar77 Linux - Server 2 08-22-2006 11:16 PM
CD R/RW: reads or writes but not both hatha Linux - General 6 04-17-2005 07:12 PM
editor that reads & writes msword doc files leobing Linux - General 3 04-08-2004 11:35 AM
cdburner reads not writes NGraphiX Linux - Hardware 1 02-28-2003 04:29 AM


All times are GMT -5. The time now is 11:21 PM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration