The title is a bit confusing, but I can't think of a better one.
What I have is a simple vanilla MySQL replication, with the slave occasionally failing, with this error: Error 'Can't find record in 'my_tbl'' on query. Default database: 'my_db'. Query: 'UPDATE my_tbl SET ... WHERE ...'
(columns omitted for clarity).
What I'm assuming this error means, is that the slave sql thread executed this update, and received 0 rows affected
. This was not what it expected when comparing the result of 1 rows affected
from the relay log, thus generating an error.
When running this same update transaction manually, it works. Same thing when running START SLAVE
- it just starts working, and goes back to normal.
This doesn't make sense to me at all - if all it takes is a "retry" to fix this, how could this happen in the first place? Everything is executed in a serialized fashion, and nothing else is writing to the slave mysql server.
Can someone provide an explanation?
Some technicalities - this is a mixed replication setup from 5.5.7-rc to 5.5.12.
There is a filed MySQL bug #60091 regarding the replication of InnoDB tables that may meet your conditions - take a look at it, check if your version is affected and update it eventually to check if it helps matters.
Another explanation for this would be out-of-order execution - when the
UPDATE my_tbl SET ... WHERE ...
is run, the WHERE condition can not yet be met by any row since it has still to happen. I can't think of a reason for that though - this would be something to ask about on MySQL mailing lists.I've discovered the reason behind this problem - an event which was running on the master and on the slave as well. The solution is simple -
alter event event_name disable on slave;
Something to keep in mind when creating a slave withmysqldump
.