Last night one of the developer ran a wrong mySQL statement (Query inserting a row with duplicate primary key) on master machine. Suddenly the replication to its slave stopped. Slave started writing it to mysql_relay_logs and before we could monitor and fix it, the disk ran out of space.
Master is configured to save bin_logs for last 3 days so we still have not completely lost the replication. Can someone help me fix this mess now.
My Master Status:
mysql> SHOW MASTER STATUS\G
*************************** 1. row ***************************
File: mysql-bin.000017
Position: 30844254
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)
Slave Status: mysql
> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 192.168.140.110
Master_User: replication_user
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000014
Read_Master_Log_Pos: 61132382
Relay_Log_File: mysqld-relay-bin.000037
Relay_Log_Pos: 1405976
Relay_Master_Log_File: mysql-bin.000014
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1062
Last_Error: Error 'Duplicate entry '13676-1-2' for key 1' on query. Default database: 'XXX'. Query: 'YYY'
Skip_Counter: 0
Exec_Master_Log_Pos: 1405839
Relay_Log_Space: 8974037551
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
1 row in set (0.00 sec)
Can someone give me exact commands to make everything back to normal? as I said I am sure we have lost data on slave as it ran out of disk but we must be having data on master.
My next question is: How to ensure this doesnt happen again.
Thanks a ton Sparsh Gupta
The straight-forward course of action is to wipe out the data on the slave and start fresh with a new copy of the master. The master can be copied by using mysqldump or through something like rsync. Depending on how large your database is, you may experience downtime. This will always work.
There's a chance that if you clear space off the slave drive you can start replication from where the error occurred by using (from inside mysql):
start slave
If the master's bin.log file is in place, there's a chance the replication will start right up. I haven't had very many experiences where this has happened.
The last time I did this was when a developer added a new table to a database, and didn't create it on the slave first. I created the table and had to use sql_slave_skip_counter to skip over the bits that were lost from the bin.log. I then manually inserted the 20 rows of data I had skipped over.
just stop slave, and skip the sql statement that makes duplicate entry using (http://dev.mysql.com/doc/refman/5.0/en/set-global-sql-slave-skip-counter.html)
if the error is different after you are running the above command, just repeat it again until the slave working. but if this way not working, you must recreate slave from beginning.