MayaData Blog

Deploy DataStax Cassandra on EKS using OpenEBS LVM LocalPV

Written by Naresh Deshaveni | Aug 2, 2021 1:15:34 PM

This guide explores how OpenEBS LocalPV volumes can be used to provision storage for data generated by DataStax Cassandra, an enterprise NoSQL database solution.

Most modern applications constantly generate large amounts of data, and managing and accessing that data requires flexible schemas and purpose-built data models. A NoSQL database offers greater ease of development, scaling, functionality and flexibility than a traditional relational database. These databases are built to offer analytics over unstructured or semi-structured data using a wide variety of data access patterns built for specific purposes.

DataStax Cassandra is a cloud-native NoSQL database built on Apache Cassandra that offers proven reliability, enterprise-grade scalability, and availability. DataStax Cassandra simplifies NoSQL operations by leveraging simple APIs that make Cassandra easier to adopt, extend and use. The platform also adds capabilities like indexing, JSON support, ACID and joins to the Cassandra Query Language, bringing an SQL-like experience into Cassandra.

Being Kubernetes-ready, DataStax Cassandra allows organizations to work with open-source, cloud and cloud-native tooling for better data handling. To help with this, the OpenEBS storage solution dynamically provisions storage for containers and manages Local Persistent Volumes. OpenEBS LocalPV is a highly scalable, flexible Container Attached Storage (CAS) solution that creates Persistent Storage on Kubernetes worker nodes using host paths or local disks. With Logical Volume Management (LVM), OpenEBS also lets administrators provision LocalPV storage for Cassandra containers on nodes in a Kubernetes cluster.

This guide demonstrates how to deploy Cassandra on OpenEBS LocalPV volumes using LVM. 

Getting Started with DataStax Cassandra

DataStax extends the proven Apache Cassandra database using the Stargate open-source API platform to offer organizations the simplicity, freedom of choice and scalability of a cloud-native solution. This offers a more stable, supported version of Cassandra with improved management, better replication, search analytics, performance and security. DataStax Cassandra allows organizations to store, process and manage terabytes of data with enterprise support and much lower latencies. Benefits of adopting DataStax Cassandra include:

  • The DataStax suite works beyond the database layer, serving the organization’s end-to-end big data needs
  • Modern APIs that allow developers to experiment with innovations like Serverless computing
  • DataStax Cassandra is cloud-native and Kubernetes-ready, allowing organizations to work with open-source, cloud and cloud-native tooling for better data handling 
  • Detailed documentation that allows any team to get started with NoSQL databases
  • Simple to install, set up and start using
  • DataStax Enterprise (DSE) includes specialized tools, quick response customer teams and great support 

DataStax Cassandra vs Apache Cassandra

Apache Cassandra is an open-source Distributed Database Management System designed for storage and management of large data volumes across multiple deployment environments. Written in Java, Apache Cassandra is a popular NoSQL database that provides unique scaling, availability and fault tolerance capabilities. DataStax Cassandra is a managed database platform that is built on top of Apache Cassandra. The platform makes Cassandra easier to operate by making the storage engine pluggable, and integrates with other plugins to make the database work on different deployment environments. 

There are, however, a few differences between the two NoSQL solutions, which include:

  • Apache Cassandra is open-source while DataStax is a commercial offering
  • Apache Cassandra supports Windows, Linux, BSD and OS X server operating systems while DataStax only supports Linux and OS X
  • DataStax supports secondary indexes which are restricted in Apache Cassandra
  • Apache Cassandra provides replication with selectable factors, while DataStax replication is datacenter aware, configurable and includes advanced options for edge computing
  • DataStax offers in-memory storage capabilities while Apache Cassandra doesn’t

Kubernetes with DataStax Cassandra

Kubernetes orchestration matches well with Cassandra’s big data processing capabilities. Both solutions can run across multiple deployment environments and are highly scalable. As Kubernetes has evolved to support containerization of stateful workloads and persisting storage in the data plane, organizations use operators that make it easy to deploy and manage Cassandra in Kubernetes.

The DataStax Kubernetes Operator for Cassandra

DataStax collaborated with the Cassandra community to develop K8ssandra, the operator that simplifies lifecycle management for Cassandra clusters in Kubernetes. The operator lets Cassandra architecture concepts, such as datacenters, be expressed as resources within Kubernetes. The operator also includes a data plane controller that lets Kubernetes administrators monitor and maintain Cassandra clusters. This makes it easy to run Cassandra on managed Kubernetes services, self-managed Kubernetes clusters or even on local machines.
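As a rough illustration of how a Cassandra datacenter can be expressed as a Kubernetes resource, the Python sketch below builds a CassandraDatacenter manifest and prints it as JSON, which kubectl accepts alongside YAML. The field layout follows the cass-operator custom resource as we understand it. The names "dc1" and "cluster1", the storage class name "openebs-lvmpv", the server version and the volume size are placeholder assumptions, not values taken from this article.

```python
import json


def cassandra_datacenter(name, cluster, size, storage_class, storage):
    """Build a CassandraDatacenter manifest as a plain dict.

    All argument values are chosen by the caller; nothing here is
    prescribed by the operator beyond the field names.
    """
    return {
        "apiVersion": "cassandra.datastax.com/v1beta1",
        "kind": "CassandraDatacenter",
        "metadata": {"name": name},
        "spec": {
            "clusterName": cluster,
            "serverType": "cassandra",
            # Assumed version; use one that your operator release supports.
            "serverVersion": "3.11.7",
            # Number of Cassandra nodes in this datacenter.
            "size": size,
            "storageConfig": {
                # Each Cassandra pod gets its own claim from this template.
                "cassandraDataVolumeClaimSpec": {
                    "storageClassName": storage_class,
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                }
            },
        },
    }


# Placeholder values for illustration only.
manifest = cassandra_datacenter("dc1", "cluster1", 3, "openebs-lvmpv", "10Gi")
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with kubectl once the operator and a matching storage class exist in the cluster.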

Installing DataStax Cassandra Operators on OpenEBS LocalPV Devices

OpenEBS offers Logical Volume Management (LVM) capabilities that allow for the provision of flexible and scalable Persistent Storage. The OpenEBS LVM CSI driver creates various layers of abstraction between physical storage and the volume abstractions presented to Kubernetes. 

This means that physical storage can be rearranged to cater for growing data needs without having to reconfigure the volumes and storage classes for workloads in Pods.
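To make the storage class side of this concrete, here is a minimal Python sketch that generates a StorageClass manifest for the OpenEBS LVM CSI driver. The provisioner name and the "storage" and "volgroup" parameters reflect the LVM LocalPV driver as we understand it. The class name "openebs-lvmpv" and the volume group name "lvmvg" are assumptions chosen for illustration.

```python
import json


def lvm_storage_class(name, volume_group):
    """Build a StorageClass manifest for OpenEBS LVM LocalPV as a dict."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # CSI driver name registered by OpenEBS LVM LocalPV.
        "provisioner": "local.csi.openebs.io",
        # Needed so that claims using this class can be grown later.
        "allowVolumeExpansion": True,
        # Delay binding until a pod is scheduled, since volumes are node-local.
        "volumeBindingMode": "WaitForFirstConsumer",
        "parameters": {
            "storage": "lvm",
            # Must match an LVM volume group that already exists on the nodes.
            "volgroup": volume_group,
        },
    }


# "openebs-lvmpv" and "lvmvg" are placeholder names.
storage_class = lvm_storage_class("openebs-lvmpv", "lvmvg")
print(json.dumps(storage_class, indent=2))
```

Workloads only refer to the storage class by name, so the disks behind the volume group can change without touching the workload definitions.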

DataStax Cassandra with OpenEBS LVM

Installing DataStax Cassandra Operators on OpenEBS LocalPV offers several benefits for a dynamic data store like Cassandra, including:

  • Dynamic resizing of volumes can handle workloads that scale up and down frequently
  • LVM provides isolation between volumes, because each Persistent Volume is backed by its own logical volume
  • LVM helps create redundant partitions that enable backup and disaster recovery
  • With Logical Volume Management, teams can easily create smaller storage environments for testing and development 
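The dynamic resizing mentioned in the list above comes down to raising the storage request on a PersistentVolumeClaim whose storage class allows volume expansion. The Python sketch below builds the merge patch and the matching kubectl command without running it. The claim name and namespace are hypothetical, and the sketch covers expansion only, since Kubernetes does not allow shrinking a claim's storage request.

```python
import json
import shlex


def pvc_resize_patch(new_size):
    """Return a merge patch that raises a claim's storage request."""
    return {"spec": {"resources": {"requests": {"storage": new_size}}}}


def kubectl_patch_command(pvc_name, namespace, new_size):
    """Return the kubectl argument list that would apply the resize patch."""
    patch = json.dumps(pvc_resize_patch(new_size))
    return [
        "kubectl", "patch", "pvc", pvc_name,
        "-n", namespace,
        "--type", "merge",
        "-p", patch,
    ]


# Hypothetical claim name and namespace; list the real ones with
# `kubectl get pvc` in the namespace where Cassandra runs.
command = kubectl_patch_command(
    "server-data-cluster1-dc1-default-sts-0", "cass-operator", "20Gi"
)
print(shlex.join(command))
```

Building the command as a list keeps the JSON patch intact as a single argument, and shlex.join quotes it safely for pasting into a shell.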

Provisioning OpenEBS LocalPV Storage for DataStax

OpenEBS has developed a solution guide outlining how to set up a DataStax Cassandra database using LVM. The guide covers everything from basic LVM configuration, attaching disks to nodes and installing the OpenEBS LocalPV operator to creating storage classes. It also explores the procedures for installing DataStax Cassandra, accessing the database and resizing Persistent Volumes for Cassandra. Finally, it outlines the steps to deploy a functional DataStax Cassandra cluster, store and retrieve data, and monitor the deployment using Prometheus operators.
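For the basic LVM configuration step, each worker node needs a volume group for the driver to carve logical volumes from. The Python sketch below only prints the standard LVM commands for that step and does not execute them. The device path "/dev/xvdf" and the volume group name "lvmvg" are assumptions; the actual device depends on how the disk was attached to the node.

```python
import shlex


def volume_group_commands(device, volume_group):
    """Return the LVM commands that prepare a disk for LocalPV on one node."""
    return [
        # Mark the attached disk as an LVM physical volume.
        ["pvcreate", device],
        # Create a volume group on that physical volume.
        ["vgcreate", volume_group, device],
    ]


# Assumed device path and volume group name, for illustration only.
commands = volume_group_commands("/dev/xvdf", "lvmvg")
for cmd in commands:
    print(shlex.join(cmd))
```

These commands need root privileges on each node, and the volume group name has to match the one referenced by the storage class.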

Summary

Kubernetes can be used with Cassandra to enable seamless replication of data across various deployment locations. This allows organizations to create flexible, scalable and highly available NoSQL databases for big data applications. OpenEBS LVM helps extend the flexibility of DataStax Cassandra through dynamic resizing, allowing for expansion that is not limited by physical storage. This article outlined how the OpenEBS LVM CSI driver can be used to provide scalable storage for Cassandra clusters.

To understand more on how OpenEBS LocalPV volumes can help manage Cassandra for your organization, drop us a message.
