I have configured two MySQL servers (MySQL-1, MySQL-2) in master-master replication in the same datacenter, using a local backend connection, with the following options:
innodb_flush_log_at_trx_commit=1
sync_binlog=1
We use a load balancer to round-robin MySQL requests equally between the two MySQL servers. This works great, but I am concerned about replication lag: for example, if user A inserts a row into MySQL-1 and then selects from MySQL-2, the data may not have been replicated yet.
Basically, my question is: how much lag should be expected (milliseconds? seconds?), and are there any more MySQL options I can set to prevent or reduce the lag?
This depends on your servers' performance, which in turn depends on how many queries each server has to process, how big your tables are, and so on. To be safe, a replication setup like this should be synchronous, which necessarily adds some delay to every transaction: a transaction should not be considered fully committed until it has been committed on both nodes.
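For what it's worth, ordinary MySQL replication is asynchronous; the closest built-in approximation of that commit-on-both-nodes behaviour is the semi-synchronous replication plugin (MySQL 5.5+), where the master waits until at least one replica has received (though not necessarily applied) the transaction before acknowledging the commit. A minimal sketch, assuming a Linux build with the bundled plugin and that both servers act as master and replica to each other:
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000; -- ms to wait before falling back to async
Run the same statements on both servers. Because the replica only has to receive the event, not apply it, a read on the other node immediately after a commit can still be slightly stale, so this reduces but does not eliminate the problem you describe.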
I think a safer option is to balance requests based on the client's source IP, if your load balancer supports that. That way, all requests coming from the same client will be forwarded to the same DB server.
I encountered exactly this problem with a master -> many-slaves setup. It was even worse than your situation, because reads were guaranteed to come from a slave rather than having a 50/50 chance of hitting the master.

Every time a user wrote to the database (posting to a forum, clicking a "like" button, and so on), they would get an HTTP redirect to the page that should have displayed their post, but it never did: the round-trip time of the redirect plus the follow-up request was shorter than the replication lag.
Watching the lag with SHOW SLAVE STATUS showed that it was nearly always under a second, though it did occasionally get higher than that. Since the replication SQL thread is single-threaded, a 10-second slow query will cause a 10-second lag in your replication (more on reading that output at the end of this answer).

The solution for us ended up being to modify all of our "just posted" pages to always read from the master instead of a slave. In your case, making sure every web server knows which database the previous request just wrote to will be difficult.
A better solution might be to stick recently written data into a memcached instance. Even if your memcached data has a 10 second expiry time, that should be enough to cover for replication lag.
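For reference, the number to watch in that SHOW SLAVE STATUS output is the Seconds_Behind_Master field; run it on each server (in master-master, both are replicas of each other):
SHOW SLAVE STATUS\G
-- the relevant line of the (vertical) output looks like:
--     Seconds_Behind_Master: 0
-- NULL here means the replica threads are not running at all.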
The lag depends primarily on your servers' load and performance. It is impossible to calculate it in advance; I can only suggest running some tests that are intensive for your workload, with concurrent write and read operations.
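If you want a quick way to generate that kind of load, mysqlslap (shipped with MySQL) can drive concurrent clients against one server while you watch Seconds_Behind_Master on the other. A rough sketch; the host, user, concurrency and query counts below are placeholders to adjust to your real workload:
# run from a client machine against one of the two masters
mysqlslap --host=MySQL-1 --user=bench --password \
  --concurrency=50 --iterations=10 --number-of-queries=10000 \
  --auto-generate-sql --auto-generate-sql-load-type=mixed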