Wednesday, January 23, 2019

Configure a Multi-Node Cluster with Cassandra

How-To Configure a Multi-Node Cluster with Cassandra

Introduction:
This article helps you set up a multi-node cluster with Cassandra, a highly scalable open-source database system that can be used as the datastore for OTK. To do so, we first make sure the JDK is properly installed, then install Cassandra on each node, and finally configure the cluster.

Background:
Apache Cassandra is a free and open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients. Cassandra is used as the non-SQL based datastore for OAuth toolkit on CA API gateway.

Environment:
Cassandra is supported on Mac OS X and a variety of Linux distributions. The Oracle Java VM is highly recommended for Cassandra. I have used CentOS Linux release 7.2.1511 (Core) and Oracle JDK 1.8.0_1-1 in this article.
Instructions:
0- The Cassandra documentation highly recommends the Oracle Java VM 8. Installing the Oracle Java VM on single nodes is detailed in TEC1354854.
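
Before moving on, it is worth confirming that the JDK Cassandra will pick up is the Oracle one you installed; a quick check:

# Should report an Oracle/HotSpot build of Java 1.8
$java -version

# Shows which java binary is first on the PATH
$which java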



1- Installing Cassandra on single nodes

Download and extract the package:

$cd /tmp

$wget https://archive.apache.org/dist/cassandra/2.2.8/apache-cassandra-2.2.8-bin.tar.gz



$tar -zxf apache-cassandra-2.2.8-bin.tar.gz



Move it to a proper folder:

$sudo mv apache-cassandra-2.2.8/ /opt/



Next, make sure that the folders Cassandra accesses, such as the data and log folders, exist and that Cassandra has permission to write to them:

$sudo mkdir /var/lib/cassandra

$sudo mkdir /var/log/cassandra

$sudo chown -R $USER:$GROUP /var/lib/cassandra

$sudo chown -R $USER:$GROUP /var/log/cassandra



To set up the Cassandra environment variables, add the following lines to /etc/profile.d/cassandra.sh using vi or cat:



export CASSANDRA_HOME=/opt/apache-cassandra-2.2.8

export PATH=$PATH:$CASSANDRA_HOME/bin
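
If you prefer cat to vi, the sketch below writes the same two lines; sudo tee is used because /etc/profile.d is not writable by a regular user:

# Create /etc/profile.d/cassandra.sh with the two export lines
$sudo tee /etc/profile.d/cassandra.sh > /dev/null << 'EOF'
export CASSANDRA_HOME=/opt/apache-cassandra-2.2.8
export PATH=$PATH:$CASSANDRA_HOME/bin
EOF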





You should now reboot the node, so everything is updated:

$sudo reboot



Log back in and confirm that everything is set properly:

$sudo sh $CASSANDRA_HOME/bin/cassandra

$sudo sh $CASSANDRA_HOME/bin/cqlsh
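
As a quick sanity check before moving on to the cluster configuration, you can query the single node (the commands below assume Cassandra came up cleanly with its default settings):

# The default system keyspaces should be listed on a healthy node
$$CASSANDRA_HOME/bin/cqlsh -e "DESCRIBE KEYSPACES;"

# Ring and gossip state for the single node
$$CASSANDRA_HOME/bin/nodetool status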



2- Setting up Cassandra cluster

Before configuring each node, make sure Cassandra is not running:

$pkill cassandra



You'll also need to clear data:

$sudo rm -rf /var/lib/cassandra/*



As an example, I'll use a 3-node, single data center, single seed Cassandra cluster:

cas1.ca.com

cas2.ca.com

cas3.ca.com



At this point, calculate the initial token for each cluster node. For a single data center with 3 nodes, using the default Murmur3Partitioner:

$python -c 'number_of_tokens=3; print [str(((2**64 / number_of_tokens) * i) - 2**63) for i in range(number_of_tokens)]'

['-9223372036854775808', '-3074457345618258603', '3074457345618258602']
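
The one-liner above uses Python 2 syntax (the default python on CentOS 7). If only Python 3 is available, an equivalent sketch is:

# Same calculation with Python 3 integer division and the print() function
$python3 -c 'n=3; print([str(((2**64 // n) * i) - 2**63) for i in range(n)])'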



Configuration on each node is done by customizing the cassandra.yaml file:

$vi $CASSANDRA_HOME/conf/cassandra.yaml



The settings you'll need to edit are either the same for all nodes (cluster_name, seed_provider, rpc_address and endpoint_snitch) or different for each node (initial_token, listen_address, broadcast_address and broadcast_rpc_address). Note that listen_address and broadcast_rpc_address must be each node's own address; if they are left as localhost the nodes cannot reach one another. Choose a node to be your seed node, look in the configuration file for the lines that refer to each of these attributes, and modify them to your needs:



cluster_name: 'clusterName'

seed_provider:

- seeds: "seedNode1IP, seedNode2IP, ..."

rpc_address: 0.0.0.0

endpoint_snitch: snitchTopology





Same on all 3 example nodes:

cluster_name: 'Hecuba'

seed_provider:

- seeds: "cas1.ca.com"

rpc_address: 0.0.0.0

endpoint_snitch: SimpleSnitch





For cas1.ca.com:

initial_token: -9223372036854775808

listen_address: cas1.ca.com

broadcast_rpc_address: cas1.ca.com



For cas2.ca.com:

initial_token: -3074457345618258603

listen_address: cas2.ca.com

broadcast_rpc_address: cas2.ca.com



For cas3.ca.com:

initial_token: 3074457345618258602

listen_address: cas3.ca.com

broadcast_rpc_address: cas3.ca.com
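
If you would rather script these edits than open vi on each node, the sketch below shows the idea for cas2.ca.com. It assumes the stock cassandra.yaml shipped with 2.2.8, where endpoint_snitch already defaults to SimpleSnitch, listen_address is set to localhost, and initial_token and broadcast_rpc_address are commented out; verify the patterns against your file before relying on it:

# Values shared by all nodes
$sed -i "s/^cluster_name:.*/cluster_name: 'Hecuba'/" $CASSANDRA_HOME/conf/cassandra.yaml
$sed -i 's/- seeds:.*/- seeds: "cas1.ca.com"/' $CASSANDRA_HOME/conf/cassandra.yaml
$sed -i 's/^rpc_address:.*/rpc_address: 0.0.0.0/' $CASSANDRA_HOME/conf/cassandra.yaml

# Values specific to cas2.ca.com
$sed -i 's/^# initial_token:.*/initial_token: -3074457345618258603/' $CASSANDRA_HOME/conf/cassandra.yaml
$sed -i 's/^listen_address:.*/listen_address: cas2.ca.com/' $CASSANDRA_HOME/conf/cassandra.yaml
$sed -i 's/^# broadcast_rpc_address:.*/broadcast_rpc_address: cas2.ca.com/' $CASSANDRA_HOME/conf/cassandra.yaml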



Once you have adjusted cassandra.yaml on all the nodes, start Cassandra on each node, beginning with the seed node:

$sudo sh $CASSANDRA_HOME/bin/cassandra
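
Startup can take a minute or two per node. One way to confirm a node has joined the ring is to follow its system.log and wait for the gossip messages showing it handshaking with the other nodes and reaching normal state (depending on how cassandra.logdir is set, the log may live under $CASSANDRA_HOME/logs instead of /var/log/cassandra):

# Follow the Cassandra system log during startup
$tail -f /var/log/cassandra/system.log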



At this point you have a functional multi-node Cassandra cluster.

Additional Information:
Testing the cluster

Get a description of the cluster:

$$CASSANDRA_HOME/bin/nodetool describecluster



Confirm that all the nodes are up:

$$CASSANDRA_HOME/bin/nodetool status



You should get something like this:



Datacenter: datacenter1

=======================

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack

UN  10.151.21.124  278.38 KB  256     100.0%            d194719f-6e81-4d68-a3d0-d3d94c6a056f  rack1

UN  10.151.20.139  193.29 KB  256     100.0%            eabf4e11-35f9-4de2-a3ad-d2de51ebefdf  rack1

UN  10.151.21.137  253.45 KB  256     100.0%            775edae0-db3c-4f16-8bc5-2c41cef8b445  rack1





Create a test keyspace and table on the first node:

$$CASSANDRA_HOME/bin/cqlsh

cqlsh> CREATE KEYSPACE Hector WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : '3' }; 

cqlsh> USE Hector;

cqlsh:Hector> CREATE TABLE planet( catalog int PRIMARY KEY, name text, mass double, density float, albedo float );

cqlsh:Hector> INSERT INTO planet (catalog, name, mass, density, albedo) VALUES ( 3, 'Earth', 5.9722E24, 5.513, 0.367);



Confirm you have the table replicated by running the following command on all nodes:

$$CASSANDRA_HOME/bin/cqlsh -e "SELECT * FROM Hector.planet;"



 catalog | albedo | density | mass       | name

---------+--------+---------+------------+-------

       3 |  0.367 |   5.513 | 5.9722e+24 | Earth



Notes

Updating the replication factor:

Update a keyspace in the cluster and change its replication strategy options:
ALTER KEYSPACE "Hector" WITH REPLICATION =  { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };

On each affected node, repair the node:
$$CASSANDRA_HOME/bin/nodetool repair

Wait until repair completes on a node, then move to the next node.
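
If you manage the nodes over SSH, the sketch below runs the repairs one node at a time; it assumes key-based SSH access and the same install path on every node:

# nodetool repair blocks until the repair finishes, so the nodes are repaired sequentially
$for node in cas1.ca.com cas2.ca.com cas3.ca.com; do ssh "$node" /opt/apache-cassandra-2.2.8/bin/nodetool repair; done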


Replication factor describes how many copies of your data exist. Consistency level describes the behavior seen by the client. To have one copy on each node, you need a replication factor equal to the number of nodes. Consistency level deals with number of replicas, not number of nodes.
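
To see the difference on the example cluster (replication factor 3 on 3 nodes), you can set the consistency level in cqlsh before querying; QUORUM needs 2 of the 3 replicas to answer, ALL needs every replica, ONE needs a single replica:

$$CASSANDRA_HOME/bin/cqlsh

cqlsh> CONSISTENCY QUORUM;

cqlsh> SELECT * FROM Hector.planet;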



SimpleStrategy is used only for a single datacenter and one rack. SimpleStrategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering topology (rack or datacenter location). NetworkTopologyStrategy is used when you have (or plan to have) your cluster deployed across multiple datacenters. This strategy specifies how many replicas you want in each datacenter.
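
As an illustration, a keyspace intended for a two-datacenter deployment could be created as follows. DC1 and DC2 are hypothetical datacenter names and must match what your snitch reports (with SimpleSnitch there is only the single datacenter1, so this is for reference only):

cqlsh> CREATE KEYSPACE multi_dc_example WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 2 };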



For writes with a replication factor of 1 and consistency level ONE, a write succeeds as long as the node that owns the single replica is up; whichever node you are connected to acts as coordinator and forwards the write to it. If the node that owns the replica is down, the write fails.



The way Cassandra deletes data differs from the way a relational database deletes data. A relational database might spend time scanning through data looking for expired data and throwing it away or an administrator might have to partition expired data by month, for example, to clear it out faster. Data in a Cassandra column can have an optional expiration date called TTL (time to live). Use CQL to set the TTL in seconds for data. Cassandra marks TTL data with a tombstone after the requested amount of time has expired. Tombstones exist for a period of time defined by gc_grace_seconds. After data is marked with a tombstone, the data is automatically removed during the normal compaction process.
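
A short sketch of TTL in CQL, reusing the example planet table (the 60-second TTL is arbitrary; the second query shows how many seconds remain before the value is tombstoned):

cqlsh> INSERT INTO Hector.planet (catalog, name, mass, density, albedo) VALUES ( 4, 'Mars', 6.4171E23, 3.933, 0.170) USING TTL 60;

cqlsh> SELECT TTL(name) FROM Hector.planet WHERE catalog = 4;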



Facts about deleted data to keep in mind:

Cassandra does not immediately remove data marked for deletion from disk. The deletion occurs during compaction.
If you use the size-tiered or date-tiered compaction strategy, you can drop data immediately by manually starting the compaction process. Before doing so, understand the documented disadvantages of the process.
A deleted column can reappear if you do not run node repair routinely.


Cassandra processes data at several stages on the write path, starting with the immediate logging of a write and ending in compaction:

Logging data in the commit log
Writing data to the memtable
Flushing data from the memtable
Storing data on disk in SSTables
Compaction
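
Parts of this path can be observed from the command line, for example by forcing a flush and then checking compaction activity on a node of the example cluster:

# Flush the memtables of the example keyspace to SSTables on disk
$$CASSANDRA_HOME/bin/nodetool flush hector

# Show active and pending compactions
$$CASSANDRA_HOME/bin/nodetool compactionstats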


Compaction options are configured at the table level via CQL. This allows each table to be optimized based on how it will be used. If a compaction strategy is not specified, SizeTieredCompactionStrategy is used.



 ALTER TABLE mytable

  WITH compaction =

  { 'class' : 'SizeTieredCompactionStrategy' };



This is the default compaction strategy. This strategy triggers a minor compaction when there are a number of similar sized SSTables on disk as configured by the table subproperty min_threshold. A minor compaction does not involve all the tables in a keyspace. Additional parameters allow STCS to be tuned to increase or decrease the number of compactions it performs and how tombstones are handled.
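
For example, the sketch below adjusts two of those subproperties on the hypothetical mytable table: min_threshold is the number of similar-sized SSTables that triggers a minor compaction, and tombstone_threshold is the ratio of garbage-collectable tombstones that makes a single SSTable eligible for tombstone compaction:

cqlsh> ALTER TABLE mytable WITH compaction = { 'class' : 'SizeTieredCompactionStrategy', 'min_threshold' : 6, 'tombstone_threshold' : 0.2 };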



You can manually start compaction using the nodetool compact command.
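
For example, to compact just the table created earlier (the keyspace name is lowercase because it was created unquoted; omit the table name to compact the whole keyspace):

$$CASSANDRA_HOME/bin/nodetool compact hector planet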