One of my MySQL slaves will report 57 seconds behind master at one moment and 0 the next. I am also monitoring with mk-heartbeat, which shows an average of less than 1 second. MySQL and system dates are correct. How exactly does MySQL calculate slave lag, and what could possibly be causing this reporting error?
To be clear, running show slave status will report 57 seconds, and running show slave status again (within 1 second) will show 0. This keeps flip-flopping indefinitely until the slave thread is restarted. Typically, the server would take at least 10 seconds to recover from a one-minute lag.
It's not a reporting error; that's just how MySQL replication works. The slave transfers the master's query log, then runs the queries. The "seconds behind master" figure is based on the timestamp of the event the slave's SQL thread is currently executing, and it drops to 0 as soon as that thread has caught up with what has been transferred. If one of the queries takes a long time to run, all the other queries clog up behind it until it completes. That's why you see the spikes. The fact that mk-heartbeat shows a low average just means that it's not a general overload problem, just a few big queries (or, less likely, an occasional monster load spike on the slave).
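For reference, the Seconds_Behind_Master value in show slave status can be modeled roughly like this. This is a simplified sketch, not MySQL's actual source; the function name and parameters are made up for illustration, and it assumes timestamps in whole seconds.

```python
from typing import Optional


def seconds_behind_master(slave_now: int,
                          event_ts: Optional[int],
                          clock_diff: int = 0) -> int:
    """Simplified model of Seconds_Behind_Master (hypothetical helper).

    slave_now  -- current time on the slave, in seconds
    event_ts   -- master timestamp of the event the SQL thread is
                  executing, or None if the SQL thread has caught up
    clock_diff -- slave clock minus master clock, in seconds
    """
    if event_ts is None:
        # Nothing left to apply: the slave reports 0.
        return 0
    # Age of the event being executed, corrected for clock skew.
    # Negative results are clamped to 0.
    return max(0, slave_now - clock_diff - event_ts)


# While an event stamped 57 seconds ago is being applied:
print(seconds_behind_master(1000, 943))   # 57
# A moment later, once the SQL thread has caught up:
print(seconds_behind_master(1000, None))  # 0
```

Under this model the value can jump between a large number and 0 from one call to the next, because it reflects the age of a single event rather than a smoothed measurement.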
Instantaneous "seconds behind master" is a fairly useless figure (except when you want to know Right Now how far behind you are). The mk-heartbeat stats are far better for getting an idea of how overloaded your replication is -- anything higher than about 2-3 seconds on average over a dozen or so pings and you're pooched.
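If you want to turn that rule of thumb into a check, something along these lines would do. It's only a sketch: the class name, the window of 12 samples and the 3-second threshold are my own picks based on the numbers above, and you'd have to feed it the lag values from your heartbeat monitoring yourself.

```python
from collections import deque
from statistics import mean


class LagWindow:
    """Rolling average over the most recent lag samples (hypothetical helper)."""

    def __init__(self, size: int = 12, threshold: float = 3.0):
        self.samples = deque(maxlen=size)  # oldest sample drops off automatically
        self.threshold = threshold         # average lag, in seconds, considered too high

    def add(self, lag_seconds: float) -> None:
        self.samples.append(lag_seconds)

    def average(self) -> float:
        return mean(self.samples) if self.samples else 0.0

    def overloaded(self) -> bool:
        # Only judge once the window is full, so one early spike
        # does not trigger an alert on its own.
        full = len(self.samples) == self.samples.maxlen
        return full and self.average() > self.threshold


window = LagWindow()
for lag in [0.2] * 12:       # a dozen healthy heartbeat readings
    window.add(lag)
print(window.overloaded())   # False
```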