How-To Configure a Multi-Node Cluster with Cassandra
Introduction:
This article aims at helping you with setting up a multi-node cluster with Cassandra, a highly scalable open source database system that could be used as DB for OTK. In order to do so, first we have to make sure JDK is properly installed, then install Cassandra on each node, and finally configure the cluster.
Background:
Apache Cassandra is a free and open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients. Cassandra is used as the non-SQL based datastore for OAuth toolkit on CA API gateway.
Environment:
Cassandra is supported on Mac OS X, and a variety of Linux distributions. Oracle Java VM is highly recommended for Cassandra. I have used CentOS Linux release 7.2.1511 (Core), and Oracle JDK 1.8.0_1-1 in this article.
Instructions:
0- The Cassandra documentation highly recommends the Oracle Java VM 8. Installing the Oracle JAVA VM on single nodes is detailed in TEC1354854.
1- Installing Cassandra on single nodes
Download and extract the package:
$cd /tmp
$wget https://archive.apache.org/dist/cassandra/2.2.8/apache-cassandra-2.2.8-bin.tar.gz
$tar -zxf apache-cassandra-2.2.8-bin.tar.gz
Move it to a proper folder:
$sudo mv apache-cassandra-2.2.8/ /opt/
Next, make sure that the folders Cassandra accesses, such as the log folder, exists and that Cassandra has the right to write on it:
$sudo mkdir /var/lib/cassandra
$sudo mkdir /var/log/cassandra
$sudo chown -R $USER:$GROUP /var/lib/cassandra
$sudo chown -R $USER:$GROUP /var/log/cassandra
To setup Cassandra environment variables, add the following lines to /etc/profile.d/cassandra.sh using vi or cat:
export CASSANDRA_HOME=/opt/apache-cassandra-2.2.8
export PATH=$PATH:$CASSANDRA_HOME/bin
You should now reboot the node, so everything is updated:
$sudo reboot
Log back in and and confirm that everything is set properly:
$sudo sh $CASSANDRA_HOME/bin/cassandra
$sudo sh $CASSANDRA_HOME/bin/cqlsh
2- Setting up Cassandra cluster
Before configuring each node, make sure Cassandra is not running:
$pkill cassandra
You'll also need to clear data:
$sudo rm -rf /var/lib/cassandra/*
As an example, I'll use a 3 node. single data center, single seed Cassandra cluster:
cas1.ca.com
cas2.ca.com
cas3.ca.com
At this point you will calculate token numbers for all cluster nodes. For a single data center with 3 nodes, using the default Murmur3Partitioner:
$python -c 'number_of_tokens=3; print [str(((2**64 / number_of_tokens) * i) - 2**63) for i in range(number_of_tokens)]'
['-9223372036854775808', '-3074457345618258603', '3074457345618258602']
Configuration on nodes is done through customizing cassandra.yaml file:
$vi $CASSANDRA_HOME/conf/cassandra.yaml
The information you'll need to edit either can be the same for all nodes (cluster_name, seed_provider, rpc_address and endpoint_snitch) or different for each one (initial_token, listen_address, broadcast_address and broadcast_rpc_address). Choose a node to be your seed one, and look in the configuration file for the lines that refer to each of these attributes, and modify them to your needs:
cluster_name: 'clusterName'
seed_provider:
- seeds: "seedNode1IP, seedNode2IP, ..."
rpc_address: 0.0.0.0
endpoint_snitch: snitchTopology
Same on all 3 example nodes:
cluster_name: 'Hecuba'
seed_provider:
- seeds: "cas1.ca.com"
rpc_address: 0.0.0.0
endpoint_snitch: SimpleSnitch
For cas1.ca.com:
initial_token: -9223372036854775808
listen_address: localhost
broadcast_rpc_address: localhost
For cas2.ca.com:
initial_token: -3074457345618258603
listen_address: localhost
broadcast_rpc_address: localhost
For cas3.ca.com:
initial_token: 3074457345618258602
listen_address: localhost
broadcast_rpc_address: localhost
Once you have adjusted cassandra.yaml on all the nodes, start cassandra on nodes, doing it on the seed node first:
$sudo sh $CASSANDRA_HOME/bin/cassandra
At this point you have a functional multi-node Cassandra cluster.
Additional Information:
Testing the cluster
Get a description of the cluster:
$$CASSANDRA_HOME/bin/nodetool describecluster
Confirm that all the nodes are up:
$$CASSANDRA_HOME/bin/nodetool status
You should get something like this:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.151.21.124 278.38 KB 256 100.0% d194719f-6e81-4d68-a3d0-d3d94c6a056f rack1
UN 10.151.20.139 193.29 KB 256 100.0% eabf4e11-35f9-4de2-a3ad-d2de51ebefdf rack1
UN 10.151.21.137 253.45 KB 256 100.0% 775edae0-db3c-4f16-8bc5-2c41cef8b445 rack1
Create a test keyspace and table on the first node:
$$CASSANDRA_HOME/bin/cqlsh
cqlsh> CREATE KEYSPACE Hector WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : '3' };
cqlsh> USE Hector;
cqlsh:Hector> CREATE TABLE planet( catalog int PRIMARY KEY, name text, mass double, density float, albedo float );
cqlsh:Hector> INSERT INTO planet (catalog, name, mass, density, albedo) VALUES ( 3, 'Earth', 5.9722E24, 5.513, 0.367);
Confirm you have the table replicated by running the following command on all nodes:
$$CASSANDRA_HOME:/bin/cqlsh -e "SELECT * FROM Hector.planet;"
catalog | albedo | density | mass | name
---------+--------+---------+------------+-------
3 | 0.367 | 5.513 | 5.9722e+24 | Earth
Notes
Updating the replication factor:
Update a keyspace in the cluster and change its replication strategy options:
ALTER KEYSPACE "Hector" WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };
On each affected node, repair the node:
$$CASSANDRA_HOME/bin/nodetool repair
Wait until repair completes on a node, then move to the next node.
Replication factor describes how many copies of your data exist. Consistency level describes the behavior seen by the client. To have one copy on each node, you need a replication factor equal to the number of nodes. Consistency level deals with number of replicas, not number of nodes.
SimpleStrategy is used only for a single datacenter and one rack. SimpleStrategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering topology (rack or datacenter location). NetworkTopologyStrategy is used when you have (or plan to have) your cluster deployed across multiple datacenters. This strategy specifies how many replicas you want in each datacenter.
For writes, with WC=ONE, if the node for the single replica is up and is the one you're connect to, the write will succeed. If it's for the other node, the write will fail.
The way Cassandra deletes data differs from the way a relational database deletes data. A relational database might spend time scanning through data looking for expired data and throwing it away or an administrator might have to partition expired data by month, for example, to clear it out faster. Data in a Cassandra column can have an optional expiration date called TTL (time to live). Use CQL to set the TTL in seconds for data. Cassandra marks TTL data with a tombstone after the requested amount of time has expired. Tombstones exist for a period of time defined by gc_grace_seconds. After data is marked with a tombstone, the data is automatically removed during the normal compaction process.
Facts about deleted data to keep in mind:
Cassandra does not immediately remove data marked for deletion from disk. The deletion occurs during compaction.
If you use the sized-tiered or date-tiered compaction strategy, you can drop data immediately by manually starting the compaction process. Before doing so, understand the documented disadvantages of the process.
A deleted column can reappear if you do not run node repair routinely.
Cassandra processes data at several stages on the write path, starting with the immediate logging of a write and ending in compaction:
Logging data in the commit log
Writing data to the memtable
Flushing data from the memtable
Storing data on disk in SSTables
Compaction
Compaction options are configured at the table level via CQLSH. This allows each table to be optimised based on how it will be used. If a compaction strategy is not specified, SizeTieredCompactionStrategy will be used.
ALTER TABLE mytable
WITH compaction =
{ 'class' : 'SizeTieredCompactionStrategy' }
This is the default compaction strategy. This strategy triggers a minor compaction when there are a number of similar sized SSTables on disk as configured by the table subproperty min_threshold. A minor compaction does not involve all the tables in a keyspace. Additional parameters allow STCS to be tuned to increase or decrease the number of compactions it performs and how tombstones are handled.
You can manually start compaction using the nodetool compact command.
Introduction:
This article aims at helping you with setting up a multi-node cluster with Cassandra, a highly scalable open source database system that could be used as DB for OTK. In order to do so, first we have to make sure JDK is properly installed, then install Cassandra on each node, and finally configure the cluster.
Background:
Apache Cassandra is a free and open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients. Cassandra is used as the non-SQL based datastore for OAuth toolkit on CA API gateway.
Environment:
Cassandra is supported on Mac OS X, and a variety of Linux distributions. Oracle Java VM is highly recommended for Cassandra. I have used CentOS Linux release 7.2.1511 (Core), and Oracle JDK 1.8.0_1-1 in this article.
Instructions:
0- The Cassandra documentation highly recommends the Oracle Java VM 8. Installing the Oracle JAVA VM on single nodes is detailed in TEC1354854.
1- Installing Cassandra on single nodes
Download and extract the package:
$cd /tmp
$wget https://archive.apache.org/dist/cassandra/2.2.8/apache-cassandra-2.2.8-bin.tar.gz
$tar -zxf apache-cassandra-2.2.8-bin.tar.gz
Move it to a proper folder:
$sudo mv apache-cassandra-2.2.8/ /opt/
Next, make sure that the folders Cassandra accesses, such as the log folder, exists and that Cassandra has the right to write on it:
$sudo mkdir /var/lib/cassandra
$sudo mkdir /var/log/cassandra
$sudo chown -R $USER:$GROUP /var/lib/cassandra
$sudo chown -R $USER:$GROUP /var/log/cassandra
To setup Cassandra environment variables, add the following lines to /etc/profile.d/cassandra.sh using vi or cat:
export CASSANDRA_HOME=/opt/apache-cassandra-2.2.8
export PATH=$PATH:$CASSANDRA_HOME/bin
You should now reboot the node, so everything is updated:
$sudo reboot
Log back in and and confirm that everything is set properly:
$sudo sh $CASSANDRA_HOME/bin/cassandra
$sudo sh $CASSANDRA_HOME/bin/cqlsh
2- Setting up Cassandra cluster
Before configuring each node, make sure Cassandra is not running:
$pkill cassandra
You'll also need to clear data:
$sudo rm -rf /var/lib/cassandra/*
As an example, I'll use a 3 node. single data center, single seed Cassandra cluster:
cas1.ca.com
cas2.ca.com
cas3.ca.com
At this point you will calculate token numbers for all cluster nodes. For a single data center with 3 nodes, using the default Murmur3Partitioner:
$python -c 'number_of_tokens=3; print [str(((2**64 / number_of_tokens) * i) - 2**63) for i in range(number_of_tokens)]'
['-9223372036854775808', '-3074457345618258603', '3074457345618258602']
Configuration on nodes is done through customizing cassandra.yaml file:
$vi $CASSANDRA_HOME/conf/cassandra.yaml
The information you'll need to edit either can be the same for all nodes (cluster_name, seed_provider, rpc_address and endpoint_snitch) or different for each one (initial_token, listen_address, broadcast_address and broadcast_rpc_address). Choose a node to be your seed one, and look in the configuration file for the lines that refer to each of these attributes, and modify them to your needs:
cluster_name: 'clusterName'
seed_provider:
- seeds: "seedNode1IP, seedNode2IP, ..."
rpc_address: 0.0.0.0
endpoint_snitch: snitchTopology
Same on all 3 example nodes:
cluster_name: 'Hecuba'
seed_provider:
- seeds: "cas1.ca.com"
rpc_address: 0.0.0.0
endpoint_snitch: SimpleSnitch
For cas1.ca.com:
initial_token: -9223372036854775808
listen_address: localhost
broadcast_rpc_address: localhost
For cas2.ca.com:
initial_token: -3074457345618258603
listen_address: localhost
broadcast_rpc_address: localhost
For cas3.ca.com:
initial_token: 3074457345618258602
listen_address: localhost
broadcast_rpc_address: localhost
Once you have adjusted cassandra.yaml on all the nodes, start cassandra on nodes, doing it on the seed node first:
$sudo sh $CASSANDRA_HOME/bin/cassandra
At this point you have a functional multi-node Cassandra cluster.
Additional Information:
Testing the cluster
Get a description of the cluster:
$$CASSANDRA_HOME/bin/nodetool describecluster
Confirm that all the nodes are up:
$$CASSANDRA_HOME/bin/nodetool status
You should get something like this:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.151.21.124 278.38 KB 256 100.0% d194719f-6e81-4d68-a3d0-d3d94c6a056f rack1
UN 10.151.20.139 193.29 KB 256 100.0% eabf4e11-35f9-4de2-a3ad-d2de51ebefdf rack1
UN 10.151.21.137 253.45 KB 256 100.0% 775edae0-db3c-4f16-8bc5-2c41cef8b445 rack1
Create a test keyspace and table on the first node:
$$CASSANDRA_HOME/bin/cqlsh
cqlsh> CREATE KEYSPACE Hector WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : '3' };
cqlsh> USE Hector;
cqlsh:Hector> CREATE TABLE planet( catalog int PRIMARY KEY, name text, mass double, density float, albedo float );
cqlsh:Hector> INSERT INTO planet (catalog, name, mass, density, albedo) VALUES ( 3, 'Earth', 5.9722E24, 5.513, 0.367);
Confirm you have the table replicated by running the following command on all nodes:
$$CASSANDRA_HOME:/bin/cqlsh -e "SELECT * FROM Hector.planet;"
catalog | albedo | density | mass | name
---------+--------+---------+------------+-------
3 | 0.367 | 5.513 | 5.9722e+24 | Earth
Notes
Updating the replication factor:
Update a keyspace in the cluster and change its replication strategy options:
ALTER KEYSPACE "Hector" WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };
On each affected node, repair the node:
$$CASSANDRA_HOME/bin/nodetool repair
Wait until repair completes on a node, then move to the next node.
Replication factor describes how many copies of your data exist. Consistency level describes the behavior seen by the client. To have one copy on each node, you need a replication factor equal to the number of nodes. Consistency level deals with number of replicas, not number of nodes.
SimpleStrategy is used only for a single datacenter and one rack. SimpleStrategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering topology (rack or datacenter location). NetworkTopologyStrategy is used when you have (or plan to have) your cluster deployed across multiple datacenters. This strategy specifies how many replicas you want in each datacenter.
For writes, with WC=ONE, if the node for the single replica is up and is the one you're connect to, the write will succeed. If it's for the other node, the write will fail.
The way Cassandra deletes data differs from the way a relational database deletes data. A relational database might spend time scanning through data looking for expired data and throwing it away or an administrator might have to partition expired data by month, for example, to clear it out faster. Data in a Cassandra column can have an optional expiration date called TTL (time to live). Use CQL to set the TTL in seconds for data. Cassandra marks TTL data with a tombstone after the requested amount of time has expired. Tombstones exist for a period of time defined by gc_grace_seconds. After data is marked with a tombstone, the data is automatically removed during the normal compaction process.
Facts about deleted data to keep in mind:
Cassandra does not immediately remove data marked for deletion from disk. The deletion occurs during compaction.
If you use the sized-tiered or date-tiered compaction strategy, you can drop data immediately by manually starting the compaction process. Before doing so, understand the documented disadvantages of the process.
A deleted column can reappear if you do not run node repair routinely.
Cassandra processes data at several stages on the write path, starting with the immediate logging of a write and ending in compaction:
Logging data in the commit log
Writing data to the memtable
Flushing data from the memtable
Storing data on disk in SSTables
Compaction
Compaction options are configured at the table level via CQLSH. This allows each table to be optimised based on how it will be used. If a compaction strategy is not specified, SizeTieredCompactionStrategy will be used.
ALTER TABLE mytable
WITH compaction =
{ 'class' : 'SizeTieredCompactionStrategy' }
This is the default compaction strategy. This strategy triggers a minor compaction when there are a number of similar sized SSTables on disk as configured by the table subproperty min_threshold. A minor compaction does not involve all the tables in a keyspace. Additional parameters allow STCS to be tuned to increase or decrease the number of compactions it performs and how tombstones are handled.
You can manually start compaction using the nodetool compact command.