Wednesday, January 23, 2019

Configure a Multi-Node Cluster with Cassandra

How-To Configure a Multi-Node Cluster with Cassandra

Introduction:
This article helps you set up a multi-node cluster with Cassandra, a highly scalable open-source database system that can be used as the datastore for OTK. To do so, we first make sure the JDK is properly installed, then install Cassandra on each node, and finally configure the cluster.

Background:
Apache Cassandra is a free and open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients. Cassandra is used as the non-SQL based datastore for OAuth toolkit on CA API gateway.

Environment:
Cassandra is supported on Mac OS X and a variety of Linux distributions. The Oracle Java VM is highly recommended for Cassandra. I have used CentOS Linux release 7.2.1511 (Core) and Oracle JDK 1.8.0_1-1 in this article.
Instructions:
0- The Cassandra documentation highly recommends the Oracle Java VM 8. Installing the Oracle Java VM on single nodes is detailed in TEC1354854.
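
Before moving on, it is worth confirming that the JDK Cassandra will pick up is the Oracle one you installed; a quick check:

# Should report an Oracle/HotSpot build of Java 1.8
$java -version

# Shows which java binary is first on the PATH
$which java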



1- Installing Cassandra on single nodes

Download and extract the package:

$cd /tmp

$wget https://archive.apache.org/dist/cassandra/2.2.8/apache-cassandra-2.2.8-bin.tar.gz



$tar -zxf apache-cassandra-2.2.8-bin.tar.gz



Move it to a proper folder:

$sudo mv apache-cassandra-2.2.8/ /opt/



Next, make sure that the folders Cassandra accesses, such as the data and log folders, exist and that Cassandra has permission to write to them:

$sudo mkdir /var/lib/cassandra

$sudo mkdir /var/log/cassandra

$sudo chown -R $USER:$GROUP /var/lib/cassandra

$sudo chown -R $USER:$GROUP /var/log/cassandra



To set up the Cassandra environment variables, add the following lines to /etc/profile.d/cassandra.sh using vi or cat:



export CASSANDRA_HOME=/opt/apache-cassandra-2.2.8

export PATH=$PATH:$CASSANDRA_HOME/bin
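
If you prefer cat to vi, the sketch below writes the same two lines; sudo tee is used because /etc/profile.d is not writable by a regular user:

# Create /etc/profile.d/cassandra.sh with the two export lines
$sudo tee /etc/profile.d/cassandra.sh > /dev/null << 'EOF'
export CASSANDRA_HOME=/opt/apache-cassandra-2.2.8
export PATH=$PATH:$CASSANDRA_HOME/bin
EOF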





You should now reboot the node, so everything is updated:

$sudo reboot



Log back in and confirm that everything is set properly:

$sudo sh $CASSANDRA_HOME/bin/cassandra

$sudo sh $CASSANDRA_HOME/bin/cqlsh
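
As a quick sanity check before moving on to the cluster configuration, you can query the single node (the commands below assume Cassandra came up cleanly with its default settings):

# The default system keyspaces should be listed on a healthy node
$$CASSANDRA_HOME/bin/cqlsh -e "DESCRIBE KEYSPACES;"

# Ring and gossip state for the single node
$$CASSANDRA_HOME/bin/nodetool status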



2- Setting up Cassandra cluster

Before configuring each node, make sure Cassandra is not running:

$pkill cassandra



You'll also need to clear data:

$sudo rm -rf /var/lib/cassandra/*



As an example, I'll use a 3-node, single data center, single seed Cassandra cluster:

cas1.ca.com

cas2.ca.com

cas3.ca.com



At this point, calculate the initial token for each cluster node. For a single data center with 3 nodes, using the default Murmur3Partitioner:

$python -c 'number_of_tokens=3; print [str(((2**64 / number_of_tokens) * i) - 2**63) for i in range(number_of_tokens)]'

['-9223372036854775808', '-3074457345618258603', '3074457345618258602']
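
The one-liner above uses Python 2 syntax (the default python on CentOS 7). If only Python 3 is available, an equivalent sketch is:

# Same calculation with Python 3 integer division and the print() function
$python3 -c 'n=3; print([str(((2**64 // n) * i) - 2**63) for i in range(n)])'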



Configuration on each node is done by customizing the cassandra.yaml file:

$vi $CASSANDRA_HOME/conf/cassandra.yaml



The settings you'll need to edit are either the same for all nodes (cluster_name, seed_provider, rpc_address and endpoint_snitch) or different for each node (initial_token, listen_address, broadcast_address and broadcast_rpc_address). Note that listen_address and broadcast_rpc_address must be each node's own address; if they are left as localhost the nodes cannot reach one another. Choose a node to be your seed node, look in the configuration file for the lines that refer to each of these attributes, and modify them to your needs:



cluster_name: 'clusterName'

seed_provider:

- seeds: "seedNode1IP, seedNode2IP, ..."

rpc_address: 0.0.0.0

endpoint_snitch: snitchTopology





Same on all 3 example nodes:

cluster_name: 'Hecuba'

seed_provider:

- seeds: "cas1.ca.com"

rpc_address: 0.0.0.0

endpoint_snitch: SimpleSnitch





For cas1.ca.com:

initial_token: -9223372036854775808

listen_address: cas1.ca.com

broadcast_rpc_address: cas1.ca.com



For cas2.ca.com:

initial_token: -3074457345618258603

listen_address: cas2.ca.com

broadcast_rpc_address: cas2.ca.com



For cas3.ca.com:

initial_token: 3074457345618258602

listen_address: cas3.ca.com

broadcast_rpc_address: cas3.ca.com
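
If you would rather script these edits than open vi on each node, the sketch below shows the idea for cas2.ca.com. It assumes the stock cassandra.yaml shipped with 2.2.8, where endpoint_snitch already defaults to SimpleSnitch, listen_address is set to localhost, and initial_token and broadcast_rpc_address are commented out; verify the patterns against your file before relying on it:

# Values shared by all nodes
$sed -i "s/^cluster_name:.*/cluster_name: 'Hecuba'/" $CASSANDRA_HOME/conf/cassandra.yaml
$sed -i 's/- seeds:.*/- seeds: "cas1.ca.com"/' $CASSANDRA_HOME/conf/cassandra.yaml
$sed -i 's/^rpc_address:.*/rpc_address: 0.0.0.0/' $CASSANDRA_HOME/conf/cassandra.yaml

# Values specific to cas2.ca.com
$sed -i 's/^# initial_token:.*/initial_token: -3074457345618258603/' $CASSANDRA_HOME/conf/cassandra.yaml
$sed -i 's/^listen_address:.*/listen_address: cas2.ca.com/' $CASSANDRA_HOME/conf/cassandra.yaml
$sed -i 's/^# broadcast_rpc_address:.*/broadcast_rpc_address: cas2.ca.com/' $CASSANDRA_HOME/conf/cassandra.yaml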



Once you have adjusted cassandra.yaml on all the nodes, start Cassandra on each node, beginning with the seed node:

$sudo sh $CASSANDRA_HOME/bin/cassandra
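
Startup can take a minute or two per node. One way to confirm a node has joined the ring is to follow its system.log and wait for the gossip messages showing it handshaking with the other nodes and reaching normal state (depending on how cassandra.logdir is set, the log may live under $CASSANDRA_HOME/logs instead of /var/log/cassandra):

# Follow the Cassandra system log during startup
$tail -f /var/log/cassandra/system.log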



At this point you have a functional multi-node Cassandra cluster.

Additional Information:
Testing the cluster

Get a description of the cluster:

$$CASSANDRA_HOME/bin/nodetool describecluster



Confirm that all the nodes are up:

$$CASSANDRA_HOME/bin/nodetool status



You should get something like this:



Datacenter: datacenter1

=======================

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack

UN  10.151.21.124  278.38 KB  256     100.0%            d194719f-6e81-4d68-a3d0-d3d94c6a056f  rack1

UN  10.151.20.139  193.29 KB  256     100.0%            eabf4e11-35f9-4de2-a3ad-d2de51ebefdf  rack1

UN  10.151.21.137  253.45 KB  256     100.0%            775edae0-db3c-4f16-8bc5-2c41cef8b445  rack1





Create a test keyspace and table on the first node:

$$CASSANDRA_HOME/bin/cqlsh

cqlsh> CREATE KEYSPACE Hector WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : '3' }; 

cqlsh> USE Hector;

cqlsh:Hector> CREATE TABLE planet( catalog int PRIMARY KEY, name text, mass double, density float, albedo float );

cqlsh:Hector> INSERT INTO planet (catalog, name, mass, density, albedo) VALUES ( 3, 'Earth', 5.9722E24, 5.513, 0.367);



Confirm you have the table replicated by running the following command on all nodes:

$$CASSANDRA_HOME/bin/cqlsh -e "SELECT * FROM Hector.planet;"



 catalog | albedo | density | mass       | name

---------+--------+---------+------------+-------

       3 |  0.367 |   5.513 | 5.9722e+24 | Earth



Notes

Updating the replication factor:

Update a keyspace in the cluster and change its replication strategy options:
ALTER KEYSPACE "Hector" WITH REPLICATION =  { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };

On each affected node, repair the node:
$$CASSANDRA_HOME/bin/nodetool repair

Wait until repair completes on a node, then move to the next node.
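
If you manage the nodes over SSH, the sketch below runs the repairs one node at a time; it assumes key-based SSH access and the same install path on every node:

# nodetool repair blocks until the repair finishes, so the nodes are repaired sequentially
$for node in cas1.ca.com cas2.ca.com cas3.ca.com; do ssh "$node" /opt/apache-cassandra-2.2.8/bin/nodetool repair; done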


Replication factor describes how many copies of your data exist. Consistency level describes the behavior seen by the client. To have one copy on each node, you need a replication factor equal to the number of nodes. Consistency level deals with number of replicas, not number of nodes.
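
To see the difference on the example cluster (replication factor 3 on 3 nodes), you can set the consistency level in cqlsh before querying; QUORUM needs 2 of the 3 replicas to answer, ALL needs every replica, ONE needs a single replica:

$$CASSANDRA_HOME/bin/cqlsh

cqlsh> CONSISTENCY QUORUM;

cqlsh> SELECT * FROM Hector.planet;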



SimpleStrategy is used only for a single datacenter and one rack. SimpleStrategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering topology (rack or datacenter location). NetworkTopologyStrategy is used when you have (or plan to have) your cluster deployed across multiple datacenters. This strategy specifies how many replicas you want in each datacenter.
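
As an illustration, a keyspace intended for a two-datacenter deployment could be created as follows. DC1 and DC2 are hypothetical datacenter names and must match what your snitch reports (with SimpleSnitch there is only the single datacenter1, so this is for reference only):

cqlsh> CREATE KEYSPACE multi_dc_example WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 2 };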



For writes with a replication factor of 1 and consistency level ONE, a write succeeds as long as the node that owns the single replica is up; whichever node you are connected to acts as coordinator and forwards the write to it. If the node that owns the replica is down, the write fails.



The way Cassandra deletes data differs from the way a relational database deletes data. A relational database might spend time scanning through data looking for expired data and throwing it away or an administrator might have to partition expired data by month, for example, to clear it out faster. Data in a Cassandra column can have an optional expiration date called TTL (time to live). Use CQL to set the TTL in seconds for data. Cassandra marks TTL data with a tombstone after the requested amount of time has expired. Tombstones exist for a period of time defined by gc_grace_seconds. After data is marked with a tombstone, the data is automatically removed during the normal compaction process.
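
A short sketch of TTL in CQL, reusing the example planet table (the 60-second TTL is arbitrary; the second query shows how many seconds remain before the value is tombstoned):

cqlsh> INSERT INTO Hector.planet (catalog, name, mass, density, albedo) VALUES ( 4, 'Mars', 6.4171E23, 3.933, 0.170) USING TTL 60;

cqlsh> SELECT TTL(name) FROM Hector.planet WHERE catalog = 4;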



Facts about deleted data to keep in mind:

Cassandra does not immediately remove data marked for deletion from disk. The deletion occurs during compaction.
If you use the size-tiered or date-tiered compaction strategy, you can drop data immediately by manually starting the compaction process. Before doing so, understand the documented disadvantages of the process.
A deleted column can reappear if you do not run node repair routinely.


Cassandra processes data at several stages on the write path, starting with the immediate logging of a write and ending in compaction:

Logging data in the commit log
Writing data to the memtable
Flushing data from the memtable
Storing data on disk in SSTables
Compaction
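
Parts of this path can be observed from the command line, for example by forcing a flush and then checking compaction activity on a node of the example cluster:

# Flush the memtables of the example keyspace to SSTables on disk
$$CASSANDRA_HOME/bin/nodetool flush hector

# Show active and pending compactions
$$CASSANDRA_HOME/bin/nodetool compactionstats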


Compaction options are configured at the table level via CQL. This allows each table to be optimized based on how it will be used. If a compaction strategy is not specified, SizeTieredCompactionStrategy is used.



 ALTER TABLE mytable

  WITH compaction =

  { 'class' : 'SizeTieredCompactionStrategy' };



This is the default compaction strategy. This strategy triggers a minor compaction when there are a number of similar sized SSTables on disk as configured by the table subproperty min_threshold. A minor compaction does not involve all the tables in a keyspace. Additional parameters allow STCS to be tuned to increase or decrease the number of compactions it performs and how tombstones are handled.
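
For example, the sketch below adjusts two of those subproperties on the hypothetical mytable table: min_threshold is the number of similar-sized SSTables that triggers a minor compaction, and tombstone_threshold is the ratio of garbage-collectable tombstones that makes a single SSTable eligible for tombstone compaction:

cqlsh> ALTER TABLE mytable WITH compaction = { 'class' : 'SizeTieredCompactionStrategy', 'min_threshold' : 6, 'tombstone_threshold' : 0.2 };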



You can manually start compaction using the nodetool compact command.
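
For example, to compact just the table created earlier (the keyspace name is lowercase because it was created unquoted; omit the table name to compact the whole keyspace):

$$CASSANDRA_HOME/bin/nodetool compact hector planet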