I have a three node Percona XtraDB Cluster (5.5) setup.
Every night, we shut down MySQL on one randomly selected node in order to take backups of the data directory.
When our traffic is reasonably busy, this causes a couple (2-4) error alerts along the lines of SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '140577' for key 'PRIMARY'
. Primary key conflict, obviously, except we're using auto_increment
columns as the primary key on these tables. As each node has an offset assigned by the cluster engine, this shouldn't be occurring.
My suspicion is that taking the node out of the cluster causes the other two nodes to change their auto increment offsets, in a way that's causing them to potentially conflict while the change takes place. I'm at a loss as to why this wouldn't be an atomic action as well as how I might fix it.
Has anyone encountered this? Is there a way to temporarily freeze the auto increment settings in the cluster so they don't shuffle around during the backup process or some other solution I'm not thinking of?
From top of my head I'd say add slave to the mix and have backups done on slave without removing node from cluster.
What happens is when you put node back in it does not know how the numbers of primary key advanced on active nodes before it manages to get a new write because it will not be cooling down until it catches up with other two replicas. So it writes new record and gives it a primary key ID which already exists on two replicas that were not out. In essence node should not receive writes until it's in sync.
I know it's an old post and you may already have done it but to someone that sees this behaviour with PXC (Percona XtraDB Cluster) it seems you hit the bug https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1366997
Update your PXC to the latest version you're currently in. For instance if you are on 5.5.19 to the latest 5.5 update. If you're on 5.6.x then upgrade to the most recent update.
Had this problem on production.