Tech Blog

Creating a Cassandra Database Cluster for Local Development

Background

Cassandra is an open source NoSQL database in large-scale production use at companies such as Apple, Netflix and eBay. One of Cassandra's main advantages is that it can run as a multi-node cluster in which data is replicated across several nodes. This makes the system highly resilient: nodes can be added to or removed from the cluster without downtime or data loss.

It is often useful to experiment with and test your code locally before deploying it to a company-wide development environment (if you have one). This blog post describes how to set up a local two-node Cassandra cluster on your own machine using VirtualBox.

Creating the first node

If you don’t already have it, start by downloading VirtualBox from the downloads page.

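To create the first virtual machine (VM), open VirtualBox and click the “New” button on the toolbar. In the dialog that appears, fill in the name, type and version of the VM you want to create (this post uses the name cassandra_demo1; for Lubuntu, the type is Linux and the version is Ubuntu (64-bit)).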

Work through the remaining configuration options:

2GB of RAM should be sufficient
“Create a virtual hard disk now”
- VDI
- Dynamically allocated
Name the hard disk (the default is the VM name)
8GB should be sufficient for the disk size

You should now have a new virtual machine in the “Powered Off” state.
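For readers who prefer the command line, the wizard steps above can be approximated with VirtualBox's VBoxManage tool. The function below is a minimal sketch, not the method used in this post: the default name cassandra_demo1 matches the VMs used later, but the OS type Ubuntu_64, the explicit NAT first adapter, the controller name SATA and storing the disk in the current directory are all assumptions.

```shell
# Sketch: create a VM roughly equivalent to the wizard settings above.
# Assumptions: OS type Ubuntu_64, NAT on adapter 1, a storage controller
# named "SATA", and the .vdi disk written to the current directory.
create_node_vm() {
  local name=${1:-cassandra_demo1}
  local disk="$PWD/${name}.vdi"

  # Register a new, empty VM; give it 2GB (2048 MB) of RAM and a NAT adapter.
  VBoxManage createvm --name "$name" --ostype Ubuntu_64 --register || return
  VBoxManage modifyvm "$name" --memory 2048 --nic1 nat || return

  # 8GB (8192 MB) VDI disk; the "Standard" variant is dynamically allocated.
  VBoxManage createmedium disk --filename "$disk" --size 8192 \
    --format VDI --variant Standard || return

  # Add a SATA controller and attach the new disk to its first port.
  VBoxManage storagectl "$name" --name SATA --add sata --controller IntelAhci || return
  VBoxManage storageattach "$name" --storagectl SATA --port 0 --device 0 \
    --type hdd --medium "$disk"
}
```

Calling create_node_vm cassandra_demo1 should leave a powered-off VM comparable to the one created through the GUI.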

Install the operating system

You will also need to download an ISO for a GNU/Linux operating system to run inside the virtual machine. Lubuntu is a good choice: it is relatively lightweight but has enough out-of-the-box functionality to be used immediately. Lubuntu can be downloaded from the Lubuntu website.

Double-click the newly created VM; you will be prompted to select a start-up disk. Using the folder icon next to the drop-down menu, navigate to and select the ISO you downloaded.
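The start-up disk dialog can also be replaced by a couple of VBoxManage calls. This is a hedged sketch: the controller name IDE and the secondary-master slot (port 1, device 0) are assumptions, and the ISO path you pass in is whatever file you downloaded.

```shell
# Sketch: attach an installer ISO to an existing VM and boot it.
# Assumption: the optical drive lives on a controller named "IDE".
attach_installer_iso() {
  local name=$1 iso=$2
  if [ ! -f "$iso" ]; then
    echo "ISO not found: $iso" >&2
    return 1
  fi
  # Add an IDE controller unless the VM already has one named "IDE"
  # (VMs created through the GUI often do).
  if ! VBoxManage showvminfo "$name" --machinereadable |
      grep '^storagecontrollername[0-9]*="IDE"$' >/dev/null; then
    VBoxManage storagectl "$name" --name IDE --add ide || return
  fi
  # Insert the ISO into a virtual DVD drive on the IDE secondary master.
  VBoxManage storageattach "$name" --storagectl IDE --port 1 --device 0 \
    --type dvddrive --medium "$iso" || return
  # Boot the VM in a normal GUI window so the installer can be used.
  VBoxManage startvm "$name" --type gui
}
```

For example, attach_installer_iso cassandra_demo1 ~/Downloads/lubuntu.iso (the ISO file name here is hypothetical).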

Select “Install Lubuntu” (or the equivalent option for your chosen OS) and follow the instructions. It is safe to choose “Erase disk and install Lubuntu”, as this only affects the VM's virtual disk, not your computer.

Remember the username and password you choose, as we will need them to log in later.

Once the installation has finished and the VM has rebooted, we should have a working GNU/Linux system.

Installing and configuring Cassandra

Log into the VM and open up the terminal by clicking ‘Start’ → ‘System Tools’ → ‘LXTerminal’. Perform the following steps to install Cassandra:

Update the package lists: sudo apt-get update
Install Java 8 (required by Cassandra 3.11): sudo apt-get install openjdk-8-jdk
Install Cassandra (following the Apache Cassandra installation docs):
echo "deb http://www.apache.org/dist/cassandra/debian 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
curl https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -
sudo apt-key adv --keyserver pool.sks-keyservers.net --recv-key A278B781FE4B2BDA
sudo apt-get update
sudo apt-get install cassandra
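Because the 311x repository installs Cassandra 3.11, which needs Java 8, it is worth checking which Java the default java command actually runs, in case another package has installed a newer JDK as the default. The helpers below are a sketch; their names are hypothetical, and they only parse the standard java -version output.

```shell
# java_major_version: read the output of `java -version` on stdin and print
# the major version ("1.8.0_292" style -> 8, "11.0.11" style -> 11).
java_major_version() {
  awk -F'"' '/ version "/ {
    split($2, part, ".")
    major = (part[1] == "1") ? part[2] : part[1]
    print major
    exit
  }'
}

# require_java8: succeed only if the default `java` on the PATH is Java 8.
require_java8() {
  local major
  # java prints its version banner on stderr, hence 2>&1.
  major=$(java -version 2>&1 | java_major_version)
  if [ "$major" != "8" ]; then
    echo "default java is version ${major:-unknown}; Cassandra 3.11 needs Java 8" >&2
    echo "switch with: sudo update-alternatives --config java" >&2
    return 1
  fi
}
```

If require_java8 fails, sudo update-alternatives --config java lets you pick the OpenJDK 8 installation as the default.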

Copying the VirtualBox image

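We should now have a working Cassandra installation on the virtual machine. To create more nodes for the cluster, power down the machine we have just created, right-click its name and select “Clone”.

Choose a new name for the VM (this post uses cassandra_demo2) and perform a “full clone”. If your VirtualBox version offers a MAC address option, choose to generate new MAC addresses for all network adapters, so that the two VMs are not given the same IP address.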
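Cloning can also be scripted with VBoxManage clonevm, which makes a full clone unless a linked clone is explicitly requested and, by default, reinitialises the MAC addresses of the clone's network cards. The sketch below assumes the VM names used in this post and refuses to clone a VM that is not powered off.

```shell
# clone_node [SRC] [DST]: full-clone SRC into a new, registered VM named DST.
clone_node() {
  local src=${1:-cassandra_demo1} dst=${2:-cassandra_demo2}
  # Only clone a powered-off VM, as in the GUI steps above.
  if ! VBoxManage showvminfo "$src" --machinereadable |
      grep '^VMState="poweroff"' >/dev/null; then
    echo "$src is not powered off" >&2
    return 1
  fi
  # Full clone; --register makes the new VM appear in the VirtualBox list.
  VBoxManage clonevm "$src" --name "$dst" --register
}
```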

Configuring networking

Each VM needs a network adapter through which it can communicate both with the other VM and with the host computer. For each VM, open ‘Settings’ → ‘Network’ → ‘Adapter 2’, tick ‘Enable Network Adapter’ and select ‘Host-only Adapter’ from the ‘Attached to’ drop-down.
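The same adapter change can be made with VBoxManage modifyvm while the VMs are powered off. This is a sketch: the host-only interface name (vboxnet0 in the usage example) is an assumption, since names differ between host operating systems and VirtualBox versions. VBoxManage list hostonlyifs shows the interfaces on your machine, and VBoxManage hostonlyif create makes a new one.

```shell
# add_host_only_adapter HOSTIF VM...: put adapter 2 of each VM on the
# host-only interface HOSTIF. Adapter 1 (NAT by default) is left untouched.
add_host_only_adapter() {
  local hostif=$1 vm
  shift
  for vm in "$@"; do
    VBoxManage modifyvm "$vm" --nic2 hostonly --hostonlyadapter2 "$hostif" || return
  done
}
```

For example: add_host_only_adapter vboxnet0 cassandra_demo1 cassandra_demo2.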

Configuring Cassandra

To configure Cassandra, the following steps are required on each of the VMs.

Log into the VM
Get the IP address assigned to the VM
- run ip addr show | grep 192 and note the IP address just after ‘inet’
- in this example, cassandra_demo1 has 192.168.99.100 and cassandra_demo2 has 192.168.99.101
Open the Cassandra configuration file
- sudo nano /etc/cassandra/cassandra.yaml
- Update listen_address to the IP address of the VM obtained in the previous step
- Update rpc_address to the IP address of the VM obtained in the previous step
- Update the seeds entry (under seed_provider) with the IP addresses of both VMs, comma-separated, e.g. "192.168.99.100,192.168.99.101"
Save and close the file
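If you would rather not edit the file by hand on each VM, the steps above can be sketched with awk and sed. This assumes the stock cassandra.yaml from the Debian package, where listen_address and rpc_address are top-level keys and the seed list is a single line of the form - seeds: "127.0.0.1"; the function names are hypothetical.

```shell
# host_only_ip: read `ip addr show` output on stdin and print the first IPv4
# address starting with 192.168. (the same filter as `grep 192` above).
host_only_ip() {
  awk '$1 == "inet" && $2 ~ /^192\.168\./ { split($2, a, "/"); print a[1]; exit }'
}

# configure_cassandra_yaml FILE NODE_IP SEEDS
# Rewrites listen_address, rpc_address and the seeds list in place,
# keeping a copy of the original file as FILE.bak.
configure_cassandra_yaml() {
  local file=$1 node_ip=$2 seeds=$3
  sed -i.bak \
    -e "s/^listen_address:.*/listen_address: ${node_ip}/" \
    -e "s/^rpc_address:.*/rpc_address: ${node_ip}/" \
    -e "s/^\([[:space:]]*- seeds:\).*/\1 \"${seeds}\"/" \
    "$file"
}
```

From a root shell on each VM (for example after sudo -i), something like configure_cassandra_yaml /etc/cassandra/cassandra.yaml "$(ip addr show | host_only_ip)" "192.168.99.100,192.168.99.101" should apply the same three changes.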

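Because the second VM is a clone, its Cassandra node starts up with the same stored identity (such as its host ID and tokens) as the first node, which causes errors when it tries to join the ring. To avoid this, stop Cassandra on the second VM and clear its data and commit log directories:

sudo systemctl stop cassandra

sudo rm -rf /var/lib/cassandra/data/*

sudo rm -rf /var/lib/cassandra/commitlog/*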
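The two rm commands can be wrapped in a small guarded function so that a mistyped path does not get deleted. This is a sketch under the assumption that the data lives under /var/lib/cassandra (the Debian package layout); the base directory is a parameter mainly so it can be tried out on a scratch directory first. It does not stop Cassandra itself, so run sudo systemctl stop cassandra beforehand and call it from a root shell.

```shell
# reset_cloned_node [BASE_DIR]: empty BASE_DIR/data and BASE_DIR/commitlog.
# BASE_DIR defaults to /var/lib/cassandra. Cassandra must already be stopped.
reset_cloned_node() {
  local base=${1:-/var/lib/cassandra} dir
  # Bail out if the expected layout is missing, rather than rm -rf a wrong path.
  for dir in data commitlog; do
    if [ ! -d "$base/$dir" ]; then
      echo "missing directory: $base/$dir" >&2
      return 1
    fi
  done
  # ${base:?} aborts if base is somehow empty, so this never expands to /data/*.
  rm -rf "${base:?}"/data/* "${base:?}"/commitlog/*
}
```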

Starting up Cassandra

Start up Cassandra on the first node with sudo systemctl restart cassandra and follow the logs with tail -f /var/log/cassandra/system.log.

There should be a message JOINING: Finished joining ring to indicate success.

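Do the same on the second node; again, a message containing JOINING: Finished joining ring should appear in the logs. You can quickly check for it with grep JOINING /var/log/cassandra/system.log. Bear in mind that the log may also contain messages from earlier start-ups (on the clone, including ones copied over from the first VM), so check that the timestamp of the match is recent.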
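Because system.log is not cleared when Cassandra restarts, a plain grep can match an old "Finished joining ring" line. The sketch below records how many lines the log has before the restart and then polls only the newer lines. The 5-second poll interval and 120-second default timeout are arbitrary assumptions.

```shell
# wait_for_join LOG [SINCE_LINE] [TIMEOUT_SECONDS]
# Poll LOG, ignoring its first SINCE_LINE lines, until a
# "JOINING: Finished joining ring" message appears or the timeout expires.
wait_for_join() {
  local log=$1 since=${2:-0} timeout=${3:-120} waited=0
  until tail -n "+$((since + 1))" "$log" 2>/dev/null |
      grep 'JOINING: Finished joining ring' >/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "no 'Finished joining ring' in $log after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "joined"
}
```

A typical use on either node might be: since=$(wc -l < /var/log/cassandra/system.log), then sudo systemctl restart cassandra, then wait_for_join /var/log/cassandra/system.log "$since".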

To confirm that both nodes are in the ring, run

nodetool status

on either node. Both IP addresses should be listed, each with the status UN (Up/Normal).
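For a scripted check, the nodetool status output can be filtered for rows whose first column is UN. This is a sketch that assumes the usual tabular layout, where the status/state code is the first field and the node address the second; the function names are hypothetical.

```shell
# up_normal_nodes: read `nodetool status` output on stdin and print the
# address of every node reported as UN (Up/Normal).
up_normal_nodes() {
  awk '$1 == "UN" { print $2 }'
}

# cluster_is_up [EXPECTED]: succeed if exactly EXPECTED nodes (default 2) are UN.
cluster_is_up() {
  local expected=${1:-2} up
  up=$(nodetool status | up_normal_nodes | wc -l)
  [ "$up" -eq "$expected" ]
}
```

On either VM, cluster_is_up 2 && echo "both nodes are up" should print the message once both nodes have joined.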

Accessing the cluster from the host machine

To access the cluster from the host machine, point cqlsh at either node's IP address (cqlsh must be installed on the host; it ships with the Cassandra distribution). For example:

cqlsh 192.168.99.100
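As a quick end-to-end check of replication, a keyspace with a replication factor of 2 places a copy of every row on both nodes. The function below is a minimal sketch using cqlsh's -e option; the keyspace demo and table greetings are hypothetical names, and SimpleStrategy is chosen only because it suits a single-datacenter development cluster.

```shell
# smoke_test [HOST]: create a keyspace replicated to both nodes, then write
# and read back one row through the given node.
smoke_test() {
  local host=${1:-192.168.99.100}
  cqlsh "$host" -e "
    CREATE KEYSPACE IF NOT EXISTS demo
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
    CREATE TABLE IF NOT EXISTS demo.greetings (id int PRIMARY KEY, msg text);
    INSERT INTO demo.greetings (id, msg) VALUES (1, 'hello');
    SELECT * FROM demo.greetings WHERE id = 1;"
}
```

Running smoke_test 192.168.99.100 and then smoke_test 192.168.99.101 should return the same row through either node, and nodetool getendpoints demo greetings 1 on a VM should list both IP addresses as replicas.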

Conclusion

This blog post has outlined how to get a Cassandra cluster up and running for local development. If you have any comments about the post or would like to know more about Cassandra, please contact us.