I am looking at a Cassandra cluster, but the administration effort seems to be quite high. Is there any way I can configure Cassandra to rebalance nodes automatically as new machines are added, turned off, or become temporarily unavailable?
Cassandra actually does rebalance nodes automatically as you add new ones; it's just not a very sophisticated approach. It picks the node with the highest "load" (see nodetool ring output) and places the new node on the ring to take over around half of the heaviest-loaded node's work. This doesn't perform a rebalancing of the cluster overall, but it does minimize the streaming load necessary for cluster expansion. This auto-balancing strategy tends to work best if you nearly double the cluster's size with each expansion.
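To see why this works best when you roughly double the cluster, here is a small simulation of the bisecting behaviour described above. It is only a sketch under some assumptions: "load" is approximated by the size of the token range a node owns (Cassandra looks at the actual data load), the token space is taken to be 0 to 2**127 as with RandomPartitioner, and the function names are made up for illustration.

```python
RING_SIZE = 2 ** 127  # assumed token space (RandomPartitioner)


def owned_ranges(tokens):
    """Return {token: size of the range ending at that token}."""
    ordered = sorted(tokens)
    sizes = {}
    for idx, tok in enumerate(ordered):
        prev = ordered[idx - 1]  # index -1 wraps around the ring
        # A single node owns the whole ring, hence the "or RING_SIZE".
        sizes[tok] = (tok - prev) % RING_SIZE or RING_SIZE
    return sizes


def add_node(tokens):
    """Place a new node halfway through the largest range (assumed heuristic)."""
    sizes = owned_ranges(tokens)
    heaviest = max(sizes, key=sizes.get)
    new_token = (heaviest - sizes[heaviest] // 2) % RING_SIZE
    return sorted(list(tokens) + [new_token])


def shares(tokens):
    """Fraction of the ring owned by each node, in token order."""
    sizes = owned_ranges(tokens)
    return [sizes[t] / RING_SIZE for t in sorted(sizes)]


if __name__ == "__main__":
    ring = [i * RING_SIZE // 4 for i in range(4)]  # 4 balanced nodes
    ring = add_node(ring)
    print("5 nodes:", shares(ring))  # uneven: some nodes own twice as much
    for _ in range(3):
        ring = add_node(ring)
    print("8 nodes:", shares(ring))  # even again after doubling
```

Adding a single node to a balanced four-node ring leaves three nodes owning twice as much as the other two; only after four additions is the ring even again.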
If you need more nuanced rebalancing, you can move a node's position on the ring with the "nodetool move" command (which is really a wrapper for decommissioning and re-adding the node).
Not yet, since token assignment is currently static. You have a choice of scripting the balancing act following http://wiki.apache.org/cassandra/Operations#Ring_management or doubling the size of the cluster at once with auto-bootstrap. Neither option is exactly appealing at the moment, but it's not horrendous to add nodes and move tokens around as long as you allow enough time for data to migrate.
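If you go the scripting route, the first step is usually computing evenly spaced tokens for the target cluster size. A minimal sketch, assuming RandomPartitioner (token space 0 to 2**127); the function name is just for illustration:

```python
RING_SIZE = 2 ** 127  # assumed token space (RandomPartitioner)


def balanced_tokens(node_count):
    """Return evenly spaced tokens for a ring of node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Print one token per node for a 6-node cluster.
    for i, token in enumerate(balanced_tokens(6)):
        print("node %d: %d" % (i, token))
```

Each value can then be handed to "nodetool move" on the corresponding node, one node at a time, waiting for the data to finish migrating before moving the next.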
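One thing to watch for: "nodetool loadbalance" is not going to do what you think. It relocates only the node it is run on, picking a new position with the same heuristic used at bootstrap; it does not rebalance the ring as a whole.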