Currently I've got server A and server B running with the following mongod instances:
Server A
- mongod server (usually PRIMARY)
- mongod arbiter
Server B
- mongod server (usually SECONDARY)
When server A goes down, server B fails to elect itself primary and take over, so my entire application goes offline because the database is unavailable.
My question is: without increasing the number of physical servers, how can I make sure that server B takes over properly when server A goes down?
Would the following be a good idea?
Server A
- mongod server 1A
- mongod server 2A
- mongod arbiterA
Server B
- mongod server 1B
- mongod server 2B
I have not added an arbiter to B because that would make the total number of mongod instances even. Is this the most efficient way to let server B take over when server A powers off, or can I remove some instances to save RAM/CPU/HDD?
It's not really effective to have an arbiter on the same machine as one of your data-bearing mongod processes, because the arbiter goes down together with that machine. Do you have a third, unrelated server to run the arbiter on?
(Documented here: http://www.mongodb.org/display/DOCS/Replica+Set+Tutorial#ReplicaSetTutorial-Runningwithtwonodes)
Running multiple mongod processes on the same server is going to cause performance problems, since they all compete for the same RAM, CPU and disk. Besides, it doesn't fix the failover: in your proposed layout Server A still holds three of the five voting members, so when A goes down the two members left on B cannot form a majority and no primary is elected.
The most efficient way to have Server B take over when A powers off is to move your current arbiter to B. However, that would mean that A would stop being primary if B failed because it would be unable to form a majority.
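To make the majority arithmetic concrete, here is a small sketch in plain Python (no MongoDB involved). It assumes the usual replica set rule that a primary can only be elected when a strict majority of all voting members is reachable and at least one of them holds data. The layout dictionaries and the function name are made up for illustration.

```python
def can_elect_primary(layout, down_server):
    """Return True if the members that survive `down_server` can elect a primary.

    `layout` maps a physical server name to the list of voting members it
    hosts, each described as "data" or "arbiter" (illustrative labels only).
    """
    total_voters = sum(len(members) for members in layout.values())
    alive = [
        role
        for server, members in layout.items()
        if server != down_server
        for role in members
    ]
    # A strict majority of ALL voting members is required, and an arbiter
    # can vote but can never become primary itself.
    return len(alive) > total_voters // 2 and "data" in alive


# Current setup: arbiter lives on A.
current = {"A": ["data", "arbiter"], "B": ["data"]}
# Suggested change: arbiter moved to B.
moved = {"A": ["data"], "B": ["data", "arbiter"]}

for name, layout in (("current", current), ("moved", moved)):
    for down in ("A", "B"):
        print(name, "layout,", down, "down ->", can_elect_primary(layout, down))
```

Whichever machine hosts the arbiter is the one the set cannot afford to lose, which is why moving the arbiter only shifts the problem from one server to the other.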
There are two workarounds, both manual: (1) keep a spare arbiter instance ready to go on each machine and, when one machine fails, reconfigure the set on the survivor to add the local arbiter and remove the unreachable one; or (2) relaunch the surviving mongod as a standalone instance outside of the replica set while the other is down, and reconfigure once things are healthy again.
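The first option (swapping in a standby arbiter) could look roughly like the sketch below. It is only an illustration under assumptions: the host names, ports and set name are hypothetical, the standby arbiter is assumed to be already running on the surviving machine, and the helper names are mine. The config document is built as a plain dictionary; the function that actually sends it uses pymongo's `replSetReconfig` command with `force=True`, which is needed because no primary exists while the majority is down.

```python
import copy


def swap_arbiter(config, dead_arbiter_host, standby_arbiter_host):
    """Return a new replica set config with the unreachable arbiter replaced.

    `config` is a replica set config document such as the one returned by
    the replSetGetConfig command; the input is left unmodified.
    """
    new_config = copy.deepcopy(config)
    kept = [m for m in new_config["members"] if m["host"] != dead_arbiter_host]
    # Use a fresh _id so the new member is not confused with the removed one.
    next_id = max(m["_id"] for m in config["members"]) + 1
    kept.append({"_id": next_id, "host": standby_arbiter_host, "arbiterOnly": True})
    new_config["members"] = kept
    new_config["version"] = config["version"] + 1
    return new_config


def apply_forced_reconfig(uri, new_config):
    """Send the config to the surviving member (requires pymongo and a live mongod)."""
    from pymongo import MongoClient  # imported lazily so the sketch runs without it

    # directConnection lets us talk to a member of a set that has no primary.
    client = MongoClient(uri, directConnection=True)
    return client.admin.command("replSetReconfig", new_config, force=True)


# Hypothetical config: data node and arbiter on serverA, data node on serverB.
example_config = {
    "_id": "rs0",
    "version": 3,
    "members": [
        {"_id": 0, "host": "serverA:27017"},
        {"_id": 1, "host": "serverA:27018", "arbiterOnly": True},
        {"_id": 2, "host": "serverB:27017"},
    ],
}

new_config = swap_arbiter(example_config, "serverA:27018", "serverB:27018")
print([m["host"] for m in new_config["members"]])
# To apply it on the survivor (assumed URI):
# apply_forced_reconfig("mongodb://serverB:27017", new_config)
```

Once the failed machine is back, the set would need to be reconfigured again to return to the original layout, so this remains a manual procedure rather than automatic failover.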
With only two machines you will always end up having to intervene manually to get the set back up. Every automatic solution I can think of involves having another machine.