Ping a Specific Port

Question

B14D3

Asked: 2013-09-28 07:56:41 +0800 CST2013-09-28 07:56:41 +0800 CST 2013-09-28 07:56:41 +0800 CST

Large lag on mysql replication (Relay_Log_Pos and Exec_Master_Log_Pos does not increase)

772

Today my two slave's (one mysql 5.1 and second MariaDB 5.5, master is mysql 5.1) started lagging. Similar situation are quite often with lags rises to even 10000 seconds, because slaves have worse hardware configuration then master but now I'm quite stressed. Lags on both server are still rising and at this point it reches 25K seconds behind master. So I started investigating what is going wrong. Getting through mysql logs on master and slave gived me nothing. Servers are on Centos 5 and Mariadb is on Centos 6.

This is output from MariaDB slave status:

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: masterserevr
                  Master_User: slaveuser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysqld-bin.006778
          Read_Master_Log_Pos: 401041447
               Relay_Log_File: relay-bin.020343
                Relay_Log_Pos: 14867924
        Relay_Master_Log_File: mysqld-bin.006777
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: ses,phar
           Replicate_Do_Table: 
       Replicate_Ignore_Table: portal.aaa_jm_tmp,portal.newsletter
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 14867639
              Relay_Log_Space: 1474785535
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 26484
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
1 row in set (0.00 sec)

From few outputs I noticed that Relay_Log_Pos and Exec_Master_Log_Pos does not increase. I tried to restart slave processes but that changed nothing and lags still increase. Next step was to see on what query replication has stopped.

Using mysqlbinlog

mysqlbinlog relay-bin.020343 > /root/RelayLogQueries1.txt

In RelayLogQueries1.txt I founded position 14867924:

# at 14867924
#130927 10:03:21 server id 1  end_log_pos 14867709      Query   thread_id=160780134     exec_time=3     error_code=0
SET TIMESTAMP=1380269001/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=9/*!*/;
BEGIN
/*!*/;
# at 14867994
# at 14868101
# at 14868669
# at 14869417
# at 14869873
# at 14870663
# at 14871697
# at 14872055
# at 14872845
# at 14873747
# at 14874591
# at 14875387
# at 14876265
# at 14877039
# at 14877985
# at 14878299
# at 14879091
# at 14879853
# at 14880255
# at 14881029
.
.
.
# at 117398235
# at 117399219
# at 117400203
# at 117401191
# at 117402179
# at 117403167
# at 117403969
# at 117404957
# at 117405945
# at 117406933
# at 117407921
# at 117408909
# at 117409897
# at 117410885
# at 117411873
# at 117412861
# at 117413849
# at 117414837
# at 117415785
# at 117416797
# at 117417839
# at 117418595
# at 117419585
#130927 10:03:21 server id 1  end_log_pos 14867816      Table_map: `test`.`pac_list` mapped to number 216570427
#130927 10:03:21 server id 1  end_log_pos 14868384      Update_rows: table id 216570427
#130927 10:03:21 server id 1  end_log_pos 14869132      Update_rows: table id 216570427
#130927 10:03:21 server id 1  end_log_pos 14869588      Update_rows: table id 216570427
#130927 10:03:21 server id 1  end_log_pos 14870378      Update_rows: table id 216570427
#130927 10:03:21 server id 1  end_log_pos 14871412      Update_rows: table id 216570427
#130927 10:03:21 server id 1  end_log_pos 14871770      Update_rows: table id 216570427
#130927 10:03:21 server id 1  end_log_pos 14872560      Update_rows: table id 216570427
#130927 10:03:21 server id 1  end_log_pos 14873462      Update_rows: table id 216570427
.
.
.

Now I'm confused because first I have no idea how to interpret this log (is it ok or wrong), and second don't know how to fix this.

Sometimes when I get some replication errors this trick was helpfull :

SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;

But now I have no errors and both IO and SQL slave proceses are running.

Could setting SQL_SLAVE_SKIP_COUNTER=1 bring back replication on??

What can I do to diagnose more this problem and fix it without setting replica from scratch (that last one scenario I want avoid )

EDIT: The lag startet when one of developers accidentaly copied one of tables pac_list (200MB with 600000 records) and he copy named it test.pac_list (it has dot in table name) he want to create copy in database test but he did something wrong and createt table test.pac_list in the same database the orginal table is. After he find out his mistake he droped table test.pac_list and created tables pac_list in new database. Could this be reason of such big lag?

2 Answers

Voted

nandoP · Answer 1 · 2013-09-28T09:13:23+08:00

nandoP

2013-09-28T09:13:23+08:002013-09-28T09:13:23+08:00

do a show full processlist to see what query is hanging replication.

also, i see there is a begin:

14867924
SET TIMESTAMP=1380269001/!/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=9/!/;
BEGIN

so it may be that replication is locked until END?

http://dev.mysql.com/doc/refman/5.0/en/begin-end.html

0

Gordan Bobić · Answer 2 · 2020-04-14T15:29:13+08:00

Gordan Bobić

2020-04-14T15:29:13+08:002020-04-14T15:29:13+08:00

You have a huge transaction that the slaves are chewing through. You will need to wait for it to make it's way through. Don't skip the transaction or you may have to re-initialise the slaves.

There are various things you can do to speed up slaves if you are willing to accept some risk of having to re-initialise a slave if it suffers an OS crash / power failure.

You should also be rather concerned about how far beyond EOL on 5.1 you are.

0

Large lag on mysql replication (Relay_Log_Pos and Exec_Master_Log_Pos does not increase)

Can you pass user/pass for HTTP Basic Authentication in URL parameters?

Ping a Specific Port

Check if port is open or closed on a Linux server?

How to automate SSH login with password?

How do I tell Git for Windows where to find my private RSA key?

What's the default superuser username/password for postgres after a new install?

What port does SFTP use?

Command line to list users in a Windows Active Directory group?

What is a Pem file and how does it differ from other OpenSSL Generated Key File Formats?

How to determine if a bash variable is empty?