I have a Pulsar cluster of 3 machines, each one running a Pulsar broker, ZooKeeper and BookKeeper. I have the following in my broker.conf:
managedLedgerDefaultEnsembleSize=2
managedLedgerDefaultWriteQuorum=2
managedLedgerDefaultAckQuorum=2
So I should be able to take any one of the 3 machines down for a while without any disruption in service, right? And when I bring it back up, will it get copies of all the messages it missed? I just want to make sure I am understanding things correctly before I do this to our live cluster. I don’t want to have a very bad weekend!
Sorry, in my previous answer I missed that your (EnsembleSize, WriteQuorum, AckQuorum) setting is (2,2,2). With only 3 bookies, a quorum of (3,3,2) would not tolerate one machine being down.
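As a rough illustration of the difference between the two settings, here is a minimal sketch of the arithmetic. It assumes the only requirement is that the number of bookies still running is at least the ensemble size; the function name `writes_survive` is made up for this example and is not part of Pulsar or BookKeeper, and real clusters have other constraints (disk space, placement policy) that this ignores.

```python
def writes_survive(total_bookies: int, ensemble_size: int, bookies_down: int) -> bool:
    """Hypothetical helper: True if enough bookies remain to form a full ensemble.

    Simplifying assumption: a ledger can keep accepting writes only while
    at least `ensemble_size` bookies are available.
    """
    return total_bookies - bookies_down >= ensemble_size


# Compare the two settings discussed above on a 3-bookie cluster with 1 bookie down.
for ensemble, write_quorum, ack_quorum in [(2, 2, 2), (3, 3, 2)]:
    ok = writes_survive(total_bookies=3, ensemble_size=ensemble, bookies_down=1)
    status = "enough bookies remain" if ok else "not enough bookies remain"
    print(f"({ensemble},{write_quorum},{ack_quorum}): {status}")
```

Under that assumption, (2,2,2) still has the 2 bookies it needs, while (3,3,2) needs 3 and only has 2.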
But even with quorum (2,2,2), before taking a machine offline, be sure to turn BookKeeper auto-recovery off with this command:
bin/bookkeeper shell autorecovery -disable
and turn it back on when the machine comes back with:
bin/bookkeeper shell autorecovery -enable
If it is not turned off, BookKeeper will start auto-recovery as soon as a machine goes offline, because BookKeeper expected to have 3 data copies but now only has 2. And since it cannot find a third available machine to place the recovered copy on, auto-recovery will fail.
For more information on BookKeeper auto-recovery, you can check this link. Here is part of the content: