We have a MySQL 5.0.77 master-slave replication setup. Replication had not been running properly for the past few weeks; it kept failing with a `Duplicate entry` error (1062). The `SET GLOBAL sql_slave_skip_counter` option didn't help, so I had to skip error 1062 by adding it to the `/etc/my.cnf` file, after which it reported a "table doesn't exist" error in one particular database.
I then took a `mysqldump` of that database and restored it on the slave last weekend. After that, both the slave IO thread and the slave SQL thread started running fine, and it looked like replication was back on track. The `Seconds_Behind_Master` value was very high, but it had been decreasing over the past 4 days.
When I checked the slave replication status today, I found that the `Seconds_Behind_Master` value has kept increasing since this morning. I stopped the slave IO thread once, and `Seconds_Behind_Master` became NULL; after I restarted the IO thread, the value came back and kept on increasing.
I can see one process that has been running since this morning:

    system user | gss-app | Connect | 9535736 | copy to tmp table | ALTER TABLE queue_clicks ADD INDEX(puid)
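That row comes from the server's process list; it can be re-checked (to see whether the `ALTER TABLE` is still in the *copy to tmp table* state) with:

```sql
-- Shows all running threads, including the replication SQL thread
-- and the long-running ALTER TABLE.
SHOW FULL PROCESSLIST;
```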
Please help me to fix this issue. Thanks.
mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 203.x.x.x
Master_User: replication
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000990
Read_Master_Log_Pos: 185674180
Relay_Log_File: mysqld-relay-bin.000224
Relay_Log_Pos: 9286354
Relay_Master_Log_File: mysql-bin.000774
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 472142385
Relay_Log_Space: 112995681998
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 9533355
1 row in set (0.00 sec)
I wouldn't worry about it as long as the IO and SQL threads are running and `Relay_Master_Log_File` is catching up with `Master_Log_File`. I believe the delay comes from the fact that your total relay log is huge, approximately 105G (`Relay_Log_Space: 112995681998`). Considering that the slave is executing binlog 000774 while the master is on 000990, there are roughly 216 binary logs of about 500M each (105G/216) waiting to be replayed on the slave.

My advice is to keep an eye on `Relay_Master_Log_File` and make sure it is going up and catching up with `Master_Log_File`. I also see that the master host is on a public IP address: is this replication taking place over a public network or a slow WAN? That could be introducing delay; the faster the link, the better.

When you stop the IO thread and the SQL thread catches up with the available relay log data, `Seconds_Behind_Master` turns to NULL and hence does not reflect the real delay under those circumstances. You can find the right way to interpret `Seconds_Behind_Master` under different scenarios here: https://scalegrid.io/blog/mysql-tutorial-understanding-the-seconds-behind-master-value/
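A simple way to keep that eye on it is to re-run the status command periodically and compare the two file names; this is just a sketch of what to watch:

```sql
-- Run on the slave every few minutes. The slave is catching up as long as
-- Relay_Master_Log_File moves toward Master_Log_File and
-- Exec_Master_Log_Pos keeps growing between runs.
SHOW SLAVE STATUS\G
```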
I've experienced the exact same issue before. Make the following change to stop `Seconds_Behind_Master` from climbing once the slave has started.

First, check the value of `innodb_flush_log_at_trx_commit` on the slave server. If it is `1`, as below:

    Variable_name                             | Value
    innodb_api_trx_level                      | 0
    innodb_flush_log_at_trx_commit            | 1
    innodb_use_global_flush_log_at_trx_commit | ON

then change the value to `2`. With a value of `1`, the InnoDB log is flushed to disk on every transaction commit, which slows down the slave's SQL thread; with `2`, it is flushed roughly once per second, at the cost of some durability if the server crashes.

Once you've changed this in your mysql client, also change it in your `my.cnf` file so the value persists after a restart: search for `innodb_flush_log_at_trx_commit=1` and change its value to `2` as well.
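The change described above can be applied like this (a sketch; the runtime change takes effect immediately but is lost on restart, which is why the `my.cnf` edit is also needed):

```sql
-- Check the current value on the slave
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';

-- Relax flushing: flush the InnoDB log roughly once per second
-- instead of on every commit (small durability trade-off on crash).
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
```

And in the config file:

```ini
# /etc/my.cnf -- make the setting persist across restarts
[mysqld]
innodb_flush_log_at_trx_commit = 2
```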
This should resolve your `Seconds_Behind_Master` issue.