Our replica set (two read nodes and one write node, i.e. two secondaries and a primary) recently held an election that changed the primary. Curious about why this happened, I went through the logs.
It appears that mongoNode2 lost contact with mongoNode3, and that this caused the mongo services on both mongoNode2 and mongoNode3 to restart. Once the services came back up, a new primary was elected.
Thu Jun 23 08:27:28 [ReplSetHealthPollTask] DBClientCursor::init call() failed
Thu Jun 23 08:27:28 [ReplSetHealthPollTask] replSet info mongoNode3:27017 is down (or slow to respond): DBClientBase::findOne: transport error: mongoNode3:27017 query: { replSetHeartbeat: "myReplSet", v: 3, pv: 1, checkEmpty: false, from: "mongoNode2:27017" }
Thu Jun 23 08:27:29 got kill or ctrl c or hup signal 15 (Terminated), will terminate after current cmd ends
Thu Jun 23 08:27:29 [interruptThread] now exiting
Thu Jun 23 08:27:29 dbexit:
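When reading logs like these, it can help to pull out just the events that matter and line them up by timestamp. The following is a minimal sketch, assuming the mongod log uses the same "Thu Jun 23 08:27:28 ..." prefix shown above; the function names `classify`, `timeline` and `termination_signal` are hypothetical helpers, not part of any MongoDB tool. Note that the "signal 15 (Terminated)" line records that mongod received SIGTERM, so the sketch reports that signal number separately.

```python
import re

# Matches the log prefix seen in the excerpt, e.g. "Thu Jun 23 08:27:28 <message>".
# " +" tolerates a double space before single-digit days.
LINE_RE = re.compile(r"^(?P<ts>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<msg>.*)$")
SIGNAL_RE = re.compile(r"got kill or ctrl c or hup signal (\d+)")


def classify(msg):
    """Map a log message to a coarse event kind, or None if uninteresting."""
    if "is down (or slow to respond)" in msg:
        return "heartbeat_down"
    if "DBClientCursor::init call() failed" in msg:
        return "cursor_init_failed"
    if SIGNAL_RE.search(msg):
        return "signal"
    if msg.startswith("dbexit"):
        return "exit"
    return None


def timeline(lines):
    """Return (timestamp, kind) pairs for the interesting lines, in log order."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines without the expected timestamp prefix
        kind = classify(m.group("msg"))
        if kind:
            events.append((m.group("ts"), kind))
    return events


def termination_signal(lines):
    """Return the first termination signal number mongod logged, or None."""
    for line in lines:
        m = SIGNAL_RE.search(line)
        if m:
            return int(m.group(1))
    return None
```

Run against the excerpt above, it lists the cursor failure and the heartbeat-down entry at 08:27:28, followed by the signal and the exit at 08:27:29, and reports signal 15.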
Is there any reason that the mongo service would restart due to a DBClientCursor::init call() failure? Is this a known bug?
It should be noted that mongoNode2 and mongoNode3 are VMs on the same VMware host. mongoNode1 is not on that host, and its service had no issues. However, I have had no reports of problems with any other VMs on that VMware host.
Yes. In client/dbclient.cpp there's a uassert() call, which is likely what resulted in the process restart. The root cause was the transport error that occurred while the replSet code was checking heartbeats (the assertion is raised in findN()). The code here differs greatly between the version I checked out in March and what's currently on GitHub [1], so include your version information when you file the bug report in JIRA [2].
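In MongoDB's C++ source, uassert(code, msg, expr) throws a user-assertion exception when expr is false rather than aborting on the spot, so what happens next depends on which caller catches it. The following is a rough Python model of that pattern, not MongoDB's code: `UserAssertion`, `uassert`, `find_n` and `poll_member` are hypothetical names, the numeric error code the real macro takes is omitted, and `send_query` returning None stands in for a transport error.

```python
class UserAssertion(Exception):
    """Hypothetical stand-in for MongoDB's C++ user-assertion exception."""


def uassert(msg, condition):
    # Like the C++ macro (minus the error code): raise, rather than abort,
    # when the condition does not hold.
    if not condition:
        raise UserAssertion(msg)


def find_n(send_query):
    # Hypothetical findN: send_query returns a reply, or None on a transport error.
    reply = send_query()
    uassert("transport error", reply is not None)
    return reply


def poll_member(name, send_query):
    # A health-poll style caller that catches the assertion and reports the
    # member as down instead of letting the exception escape.
    try:
        find_n(send_query)
        return f"{name} is up"
    except UserAssertion as e:
        return f"replSet info {name} is down (or slow to respond): {e}"
```

In this model the caller catches the exception and turns it into a log line, much like the "is down (or slow to respond)" entry in the question; an exception that no caller catches would instead propagate further up.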
There are a couple of similar reports in MongoDB's bug tracking system, but it doesn't look like anyone has followed through by providing enough information.
[1] https://github.com/mongodb/mongo/blob/master/client/dbclient.cpp
[2] https://jira.mongodb.org/secure/Dashboard.jspa