I have a basic Cassandra deployment that contains a single node. I want to add a second node to the deployment, and I want clients to have access to the same data regardless of which node they happen to be talking with (i.e. inside of a given keyspace, a particular query should produce the same result on any node, unless there are recent updates that haven't fully propagated yet).
My keyspace has a replication factor of 2.
So anyway, I followed the instructions here (though I'm not sure whether I'm using 'virtual' nodes or not; I should be using whatever is the default under Cassandra 2.1), and the nodes appear to be communicating with each other:
# nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN xxx.xxx.234.252 563.02 MB 1024 ? xxxxxxxx-0b3e-4fd3-9e63-xxxxxxxxxxxx RAC1
UN xxx.xxx.194.188 923.45 KB 1024 ? xxxxxxxx-84cb-4260-84df-xxxxxxxxxxxx RAC2
However, I'm not really seeing any evidence of data propagating to the new node. For instance, its cfstats look like this:
Read Count: 290
Read Latency: 0.1124551724137931 ms.
Write Count: 35
Write Latency: 0.12919999999999998 ms.
Pending Flushes: 0
Table: assetproperties
SSTable count: 0
Space used (live): 0
Space used (total): 0
Space used by snapshots (total): 0
Off heap memory used (total): 0
SSTable Compression Ratio: 0.0
Number of keys (estimate): 34
...while on the original node, they look like this:
Read Count: 90
Read Latency: 1.674811111111111 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Flushes: 0
Table: assetproperties
SSTable count: 3
Space used (live): 305561510
Space used (total): 305561510
Space used by snapshots (total): 0
Off heap memory used (total): 773076
SSTable Compression Ratio: 0.22460684186840507
Number of keys (estimate): 416712
And if I connect to the new node using cqlsh, I get very inconsistent results. Querying for keys that I know are present in the dataset produces either no results or variable results: sometimes a row is returned with the correct data, and sometimes Cassandra informs me that there are no rows that match the query. If I connect to the original node, everything works as it should.
Is this just a side-effect of Cassandra's 'eventual consistency'? And if so, approximately how long should it take for the new node to start reliably returning useful data?
Or are there some additional steps that need to be done manually in order to get the new node working in a more reasonable/consistent way?
I suspect I might get better results if I set consistency all in cqlsh, but attempting to do so just gives me the following error:
ReadTimeout: code=1200 [Coordinator node timed out waiting for replica nodes' responses]
message="Operation timed out - received only 1 responses."
info={'received_responses': 1, 'required_responses': 2,
'consistency': 'ALL'
}
Would that be because the data hasn't been replicated onto the new node yet?
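For reference, the 'required_responses': 2 in that error follows from the replication factor. Here is a minimal sketch of the arithmetic, assuming a single-datacenter keyspace; the function name required_responses is hypothetical and is not part of Cassandra or any driver:

```python
def required_responses(consistency: str, replication_factor: int) -> int:
    """Number of replica responses the coordinator waits for on a read.

    Illustrative only: covers three common levels and assumes one datacenter.
    """
    level = consistency.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        # Every replica must answer.
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {consistency}")


if __name__ == "__main__":
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, required_responses(level, 2))
```

With a replication factor of 2, both QUORUM and ALL need both replicas to answer, so a read at either level can only succeed once both nodes respond in time.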
I believe I've found the answer. It was necessary to run nodetool repair on the original node in order to get the new node working correctly.
Running nodetool repair on the new node may seem more intuitively correct, but attempting to do that just caused the repair process to hang forever with no log output.
Once the repair process completed, the data was consistently available on the new node, and setting consistency all in cqlsh also started working properly.
I also got a bunch of "Lost notification" messages when running nodetool repair. Those appear to be harmless, however.
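One rough way to check whether the repair actually moved data is to compare the Load column of nodetool status across nodes; with two nodes and a replication factor of 2, each node should end up holding a full copy, so the loads should be of the same order of magnitude. The following is only a sketch: it assumes the column layout shown in the output above, the helper names are made up for illustration, and the unit table (including the KiB-style spellings) is an assumption about what the tool may print.

```python
# Assumed unit spellings; the output above uses "KB" and "MB".
UNIT_BYTES = {
    "bytes": 1,
    "KB": 1024, "KiB": 1024,
    "MB": 1024 ** 2, "MiB": 1024 ** 2,
    "GB": 1024 ** 3, "GiB": 1024 ** 3,
    "TB": 1024 ** 4, "TiB": 1024 ** 4,
}


def parse_status_line(line):
    """Return (address, load_in_bytes) for a node row, or None for other lines."""
    fields = line.split()
    # Node rows start with a two-letter state such as UN or DN.
    if len(fields) < 4 or len(fields[0]) != 2 or fields[0][0] not in "UD":
        return None
    address, value, unit = fields[1], fields[2], fields[3]
    if unit not in UNIT_BYTES:
        return None
    return address, float(value) * UNIT_BYTES[unit]


def load_by_node(status_output):
    """Map each node address to its reported load in bytes."""
    loads = {}
    for line in status_output.splitlines():
        parsed = parse_status_line(line)
        if parsed is not None:
            address, load = parsed
            loads[address] = load
    return loads


if __name__ == "__main__":
    # Rows copied from the nodetool status output in the question.
    sample = """\
-- Address Load Tokens Owns Host ID Rack
UN xxx.xxx.234.252 563.02 MB 1024 ? xxxxxxxx-0b3e-4fd3-9e63-xxxxxxxxxxxx RAC1
UN xxx.xxx.194.188 923.45 KB 1024 ? xxxxxxxx-84cb-4260-84df-xxxxxxxxxxxx RAC2
"""
    loads = load_by_node(sample)
    for address, load in loads.items():
        print(address, int(load), "bytes")
    print("largest/smallest load ratio:", max(loads.values()) / min(loads.values()))
```

A ratio in the hundreds, as in the pre-repair output, suggests the new node holds almost none of the data. Load is on-disk size, so the two nodes will not match exactly even when fully replicated.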