I'm having a hard time restoring a snapshot on Apache Cassandra (version 3.0.9). As far as I can tell, I'm following the procedure described on the DataStax blog, along with several others (for instance: http://datascale.io/cloning-cassandra-clusters-fast-way/). Yet I must be missing something, because every time I run a restore, data is missing.
Setup: a 6-node cluster (1 DC, 3 racks with 2 nodes each) with a replication factor of 3. The machines are hosted on AWS.

Backup procedure (on each node):
nodetool snapshot mykeyspace
cqlsh -e 'DESCRIBE KEYSPACE mykeyspace' > /tmp/mykeyspace.cql
nodetool ring | grep "$(ifconfig | awk '/inet /{print $2}' | head -1)" | awk '{print $NF ","}' | xargs > /tmp/tokens
I take the files generated by the `nodetool snapshot` command and back them up, along with the tokens and the CQL schema, to S3.
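The token-extraction one-liner above is the fragile part of this step. Here's a minimal, self-contained sketch of what it does, run against a hypothetical excerpt of `nodetool ring` output (real output has more columns, but the token is always the last field; the IPs and token values below are made up):

```shell
# Hypothetical sample of `nodetool ring` output for illustration only.
ring_sample='10.0.0.5  rack1  Up  Normal  256.3 KB  33.3%  -9131180082361658421
10.0.0.6  rack1  Up  Normal  250.1 KB  33.3%  -3074457345618258603
10.0.0.5  rack1  Up  Normal  256.3 KB  33.3%  3074457345618258602'

node_ip="10.0.0.5"

# Keep only this node's lines, take the last field (the token),
# append a comma to each, and join everything onto one line.
tokens=$(printf '%s\n' "$ring_sample" \
  | grep "^$node_ip " \
  | awk '{print $NF ","}' \
  | xargs)

# Strip the trailing comma (the original one-liner leaves it in)
# so the list can go straight into cassandra.yaml.
tokens=${tokens%,}
echo "$tokens"
```

Running this prints `-9131180082361658421, 3074457345618258602`, i.e. only the tokens owned by `10.0.0.5`, comma-separated on one line.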
Restore procedure (on each node unless otherwise specified), after having created the new VMs:

- Download the snapshots, tokens and keyspace schema
- Stop the cassandra service
- Delete `/var/lib/cassandra/commitlog/*` and `/var/lib/cassandra/system/`
- Insert the tokens into `cassandra.yaml`
- Start the cassandra service
- Restore mykeyspace from `mykeyspace.cql` on one node only
- Wait for replication, then stop the cassandra service
- Delete the `.db` files in `/var/lib/cassandra/data/mykeyspace/`
- For each table, copy the snapshot files (`.db`, `.crc32`, `.txt`) into `/var/lib/cassandra/data/mykeyspace/$table/`
- Restart the cassandra service
- Run `nodetool repair mykeyspace -full`, one node at a time
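The "insert the tokens into cassandra.yaml" step can be sketched as below, against a hypothetical config fragment (the demo path and token values are made up, not from a real cluster). Note the space after the colon, which the YAML parser requires:

```shell
# Hypothetical cassandra.yaml fragment for demonstration.
demo=/tmp/cassandra.yaml.demo
cat > "$demo" <<'EOF'
cluster_name: 'Test Cluster'
# initial_token:
num_tokens: 256
EOF

# Tokens as saved by the backup step (made-up values here).
tokens="-9131180082361658421, 3074457345618258602"

# Uncomment the initial_token line and fill in the saved tokens.
# The space after the colon matters: "initial_token:<values>" is
# rejected, "initial_token: <values>" is accepted.
sed -i "s|^# *initial_token:.*|initial_token: $tokens|" "$demo"

grep '^initial_token' "$demo"
```

This prints `initial_token: -9131180082361658421, 3074457345618258602`. (GNU `sed -i` is assumed; BSD sed needs `-i ''`.)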
Result:

There are always missing rows, in approximately the same quantity for each table, but never the same ones. I tried to mix up the procedure a bit, for instance restoring the keyspace before the tokens, or running `nodetool refresh` before the repair, but I hit the same issue every time.

Since I'm not far from a "good" restore, I think I'm missing something pretty obvious. Analyzing the logs didn't really help, as they don't show any error or failure messages.

Any help would be welcome :) I can of course give more information if needed.
Edit: no one? I updated the question with the Cassandra version (3.0.9), which I forgot in the first place. I tried the restore again, but no luck. I'm really out of ideas :(
The `sed` command in that blog post, which is supposed to append `-Dcassandra.load_ring_state=false` to `$JVM_OPTS`, has no effect in its current form. If you were copying that command directly from the blog post, it's possible that's the issue. You could try this one instead, which places the option at the bottom of the file:
sudo sed -i '$ a\JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"' /etc/cassandra/cassandra-env.sh
Also, you'll need to run `nodetool repair -pr <ks>` on each node, one by one, after following this procedure.

Ok, end of story, stupid me! The `initial_token` line in cassandra.yaml was wrongly "sed-ed" during my restore procedure. If there is no space after the ':' for the `initial_token` key, Cassandra fails to launch; the line was therefore left commented out and the tokens were never interpreted!

tl;dr:

`initial_token:<values>` = WRONG
`initial_token: <values>` = GOOD

Thanks a lot to Josh Purvis for insisting on the high importance of this parameter :-)
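Given how easy that mistake is to make, a cheap sanity check after editing the file can catch it before Cassandra even starts. A minimal sketch (the path is made up, and the bad line is written deliberately so the check fires):

```shell
conf=/tmp/cassandra.yaml.check
# Deliberately write the broken form to demonstrate the check.
printf 'initial_token:-9131180082361658421\n' > "$conf"

# Flag an initial_token line with no space after the colon.
if grep -qE '^initial_token:[^ ]' "$conf"; then
  result="BAD: missing space after initial_token:"
else
  result="OK"
fi
echo "$result"
```

On the broken line above this prints `BAD: missing space after initial_token:`; dropping such a check into the restore script would have surfaced the problem immediately.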