Provisioning High-Performance Storage for NoSQL databases with OpenEBS

Provisioning High-Performance Storage for NoSQL databases with OpenEBS

Modern applications rely on multiple data models that generate different data types. To help such use cases, NoSQL (Not only SQL) databases allow for the storage and processing of different data types in a non-tabular fashion. NoSQL databases process unstructured data using flexible schemas to enable efficient storage and analysis for distributed, data-driven applications. By relaxing data consistency restrictions of SQL-based databases, NoSQL databases enable low latency, scalability and high performance for data access.  

The performance of SQL databases varies with the cluster size, the application’s configuration and network latency. This means that developers don’t have to worry about optimizing data structures, indexes and queries to achieve peak performance from the storage subsystem. NoSQL databases are popular with modern cloud applications since they use APIs for the storage and retrieval of data structures. This makes it easy to interface NoSQL databases with a host of microservice-based applications for agile, cloud-native deployment.

OpenEBS is a leading Container Attached Storage (CAS) that helps developers deploy workloads efficiently by turning storage available on worker nodes into dynamically provisioned volumes. With OpenEBS, developers can implement granular policies per workload, reduce storage costs, avoid vendor lock-in and develop persistent storage for applications with high availability. This post delves into how OpenEBS can be used to provision high-performance storage for NoSQL databases. 

A Deep Dive into NoSQL Databases 

Before the mid-2000s, relational databases dominated application development. The Structured Query Language (SQL) was used to store and retrieve data structures, with popular databases being MySQL, Oracle, PostgreSQL, DB2 and SQL Server. The decrease in the cost of storage options eliminated the need to use strict, complex data models allowing for data applications to store and query more types of data. NoSQL databases evolved to handle the flexible schema needed to process different combinations of structured, semi-structured and polymorphic data. This section explores various features, types and popular distributions of NoSQL databases.

Features of NoSQL Databases

Though there are various flavours of NoSQL databases, here’s a list of common features that distinguish them from SQL databases:

Multi-Model

NoSQL databases are built for flexibility to handle large amounts of data. Unlike SQL-based databases that access & analyze data in tables and columns, NoSQL databases ingest all types of data with relative ease. NoSQL databases enable the creation of specific data models for each application, enhancing agility in handling different data types without having to manage separate databases.

Schema Agnostic

Schemas are used to instruct a relational database on what data to expect and how to store the data. Any changes in data structures or data paths require extensive re-architecting of the database using a modified schema. On the contrary, NoSQL databases require no upfront design work before data is stored. These databases shorten development time by allowing developers to start coding and accessing data without having to understand the internal implementation of the database.

Non-Relational

Relational databases have strict restrictions on how tables associate with each other while relying on traditional master-slave architecture. On the other hand, NoSQL databases do not rely on tabularized data with fixed row and column records while running on peer-to-peer networks with no concept of relationships between their records. The databases, therefore, facilitate easy storage & retrieval and fast query speed for data-driven applications.

Distributable

NoSQL database systems are designed to use multiple locations involving different data centres/regions for global scalability. Thanks to their masterless architecture, NoSQL databases can maintain continuous availability through the replication of data in multiple read/write locations.

Easily Scalable

While relational databases are also scalable, their scalability is costly and complex since their architecture requires the addition of larger, more powerful hardware for scaling. NoSQL databases allow for linear scalability both vertically and horizontally, so developers can either provide more processors or powerful ones for increased workloads.

Types of NoSQL Databases

Document Databases

In this database, data is represented as a JSON-type document or object for efficient and intuitive data modelling. Document-type NoSQL databases simplify application development since developers can query and store data with the same document model format used in application source code. Document databases are flexible, hierarchical and semi-structured, so they scale to match with an application’s evolving needs.

Graph NoSQL Databases 

These databases store data in edges and nodes. A node stores information about entities while the edge stores information about the relationship between these entities. These databases are mostly used in applications that analyze relationships between users and other entities to identify patterns. These include social networking, knowledge graphs, recommendation engines and fraud detection.

Key-Value Databases

These store data in simple key-value pairs. Values are retrieved by referencing their keys, which simplifies how developers query for specific data entries. These databases are highly applicable in applications that need to store large amounts of data without needing complex query operations for retrieval. The most common use-case for key-value databases is the storage of user preferences

Wide-Column Stores

These databases use dynamic columns, rows and tables to store data. They are more flexible than relational databases since each row can have a different number of columns. Wide column stores are used for large amounts of data with predictable query patterns.

Popular NoSQL Databases

Some popular NoSQL Database management solutions include:

Cassandra

A free, open-source, distributed, wide-column store that powers mission-critical deployments with fault-tolerant replication, hybrid cloud support and audit logging. Apache Cassandra is trusted by major brands, such as Facebook, Netflix, Macy’s and MobilePay for scalability, high availability and improved performance. Follow this guide to explore the steps to deploy Cassandra StatefulSets with OpenEBS storage.

MongoDB

MongoDB is popular with modern application development as it relies on the document data model to simplify database management for developers. MongoDB includes functionality such as data-geolocation, horizontal scaling and automatic failover to accelerate development and adapt to application changes. Read this guide to discover how to provision Persistent Volumes for MongoDB StatefulSets in Kubernetes with OpenEBS.

Redis

An open-source, distributed, in-memory key-value data store that can be used in application development as a message broker, cache and database. Redis supports multiple types of abstract data structures and provides high availability through automatic disk partitioning and the Redis Sentinel framework. This guide demonstrates how OpenEBS can be used to provide persistent storage for Redis StatefulSets. 

OpenEBS for NoSQL Databases

OpenEBS simplifies the deployment of stateful Kubernetes workloads using a collection of data engines to implement persistent volumes. The OpenEBS control plane is deeply integrated with Kubernetes, and uses Kubernetes-friendly constructs to manage how volumes are provisioned, scheduled and maintained. With OpenEBS, cluster administrators can take advantage of dynamic provisioning for local and distributed volumes, depending on workload and cluster size. These features make OpenEBS a popular choice to orchestrate storage for stateful applications. The following section explores the steps taken to provision OpenEBS storage for NoSQL databases.

Why Adopt OpenEBS for NoSQL?

Many organizations and users have adopted OpenEBS to deploy and provision storage for their stateful workloads, including those who use NoSQL. Some of the following are reasons to adopt OpenEBS for NoSQL databases include:

Open-Source, Kubernetes-Centric Cloud Native Storage

OpenEBS follows a loosely coupled Container Attached Storage (CAS) architecture. OpenEBS itself is deployed as a workload on Kubernetes nodes, bringing the DevOps benefits of container orchestration in the application layer to the data layer. This allows developers to leverage the cloud-native benefits of Kubernetes such as agility and scalability for developing reliable, effective data-driven applications.

No Cloud Lock-in

With OpenEBS, application data is written into storage engines, creating a data abstraction layer. This allows developers to move data easily between multiple Kubernetes environments. OpenEBS can be deployed on-premises, local storage or managed cloud services -- thereby allowing NoSQL applications to simultaneously access data stored on different deployment platforms.

OpenEBS Enables Granular Deployment and Management of Workloads

The cloud-native, loosely-coupled architecture of OpenEBS clusters enable multiple teams to deliver faster since they are free of cross-functional dependencies. OpenEBS also makes it easier to declare policies and settings on a per-workload or per-volume basis, with constant monitoring to ensure workloads achieve desired results. Granularity makes it easier for developers to segregate large amounts of data based on data type, structure or use-case.

Reduced Storage Costs

The dynamic nature of NoSQL data often necessitates the over-provision of cloud storage resources to achieve higher performance and a lower risk of disruption. To help with this, OpenEBS relies on thin provisioning mechanisms to pool storage and grow data volumes when the NoSQL database needs it. While adjusting storage on the fly without disrupting volumes attached to workloads, OpenEBS enables cost savings of up to 60% leveraging a thin, dynamic provisioning.

Configuration Workflow

To provision high-performance storage for NoSQL databases using OpenEBS, the following steps are undertaken:

Installing OpenEBS - OpenEBS integrates seamlessly into the Kubernetes workflow and is installed into the cluster using installation modes available to Kubernetes applications, such as helm and with YAML files via kubectl. This activates a declarative storage control plane that can be managed from within the cluster.

  1. Selecting the OpenEBS storage engine - The storage engine represents OpenEBS data plane components. OpenEBS includes a number of storage engines and may automatically choose one that suits the storage available on nodes and application requirements. OpenEBS comes with three storage engines: cStor, Jiva, Local PV and Mayastor.
  2. Creating a Storage Class - The StorageClass is used to provision volumes on physical storage devices. These classes consume storage pools created on the disks attached to cluster nodes.
  3. Launching the NoSQL database - The database is then deployed into the cluster either using operators or helm charts

NoSQL database with OpenEBS

A typical architecture of NoSQL database with OpenEBS Persistent Volumes

Monitoring Deployment Metrics

OpenEBS includes post-deployment recommendations to ensure a successful deployment for NoSQL workloads. While developers are advised to allocate sufficient volume size during initial configuration, the volume size should be constantly monitored to ensure seamless operation. Additionally, developers and administrators should watch pool capacity and add more physical disks once the workload hits an 80% threshold. OpenEBS integrates with Kubernetes centric monitoring & logging tools such as Prometheus and Grafana for easier metric collection, analysis and visualization.

NoSQL databases enable data-driven application development since they facilitate global scalability, flexibility and delivery agility. Cloud-native application architectures are considered perfect for NoSQL databases since they deliver on-demand infrastructure for dynamic workloads. 

As an essential enabler, OpenEBS can orchestrate stateful Kubernetes workloads including NoSQL databases, offering multiple benefits such as reduced storage costs and zero lock-in. To know more on how OpenEBS can help to manage your organization’s stateful workloads, contact us here.

Murat Karslioglu
VP @OpenEBS & @MayaData_Inc. Murat Karslioglu is a serial entrepreneur, technologist, and startup advisor with over 15 years of experience in storage, distributed systems, and enterprise hardware development. Prior to joining MayaData, Murat worked at Hewlett Packard Enterprise / 3PAR Storage in various advanced development projects including storage file stack performance optimization and the storage management stack for HPE’s Hyper-converged solution. Before joining HPE, Murat led virtualization and OpenStack integration projects within the Nexenta CTO Office. Murat holds a Bachelor’s Degree in Industrial Engineering from the Sakarya University, Turkey, as well as a number of IT certifications. When he is not in his lab, he loves to travel, advise startups, and spend time with his family. Lives to innovate! Opinions my own!
Ranjith Raveendran
Ranjith is a Software Engineer in MayaData and has worked on the OpenEBS project from its beginning. He has 5+ years of experience in the Storage industry. Ranjith is interested in different solution approaches and has excellent knowledge of LocalPV and disk management. In his free time, he listens to music, watches movies, and goes for bike rides.
Naresh Deshaveni
Naresh Deshaveni is the Director of Solution Engineering, MayaData. He is passionate about developing, deployment solutions for application clusters by taking advantage of Kubernetes and providing scalable, highly available application stack. His past experience include building database scalability solution and building application and network security solution.