Back before Kubernetes supported StatefulSets, a number of engineers, including the founders of the OpenEBS project, anticipated the potential of Kubernetes as a means to build and operate scalable, easy-to-use storage and data orchestration software. Fast forward a few years and, according to recent surveys, OpenEBS has become the most popular open source Container Attached Storage; we estimate that nearly 1 million nodes of OpenEBS are deployed per week¹. OpenEBS has become a particular favorite for deploying and operating resilient workloads on Kubernetes such as Cassandra and Elasticsearch, with reference users including ByteDance / TikTok, Flipkart, and Bloomberg.
In this blog, I am pleased to announce the release of OpenEBS 3.0 - a major release that includes several enhancements and new features at both the storage layer and the data orchestration layer.
Before I explain more about OpenEBS 3.0, let me share some background on the use of Kubernetes for data.
First, it is worth emphasizing that Kubernetes is popular in part because it enables small teams to be largely autonomous and to remain loosely coupled from the rest of the organization, much as the cloud native workloads they build remain loosely coupled from other workloads.
Secondly, the desire for small team and workload autonomy has led to the widespread use of Direct Attached Storage and similar patterns for cloud native stateful workloads. Workloads such as Cassandra, Elasticsearch, and dozens of others - and the teams that run them - would rather have their own systems dedicated to these workloads than rely on the vagaries and shared blast radius of traditional shared storage. Shared storage is in many ways redundant or moot when the workloads themselves are highly resilient and loosely coupled - especially when these workloads run on Kubernetes, which provides a common approach to scalable deployment and operations.
Container Attached Storage, of which OpenEBS is the leading open source example, emerged in response to the need for autonomy and control felt by these small teams and workloads. You can think of Container Attached Storage as providing "just enough" per-workload capabilities without introducing the sort of shared dependencies that today's organizations, and API-first cloud native architectures, are built to avoid.
¹ Container pulls of OpenEBS exceed 600k per week; larger OpenEBS users in particular do not deploy directly from the internet, instead proxying container pulls, running them through a secure pipeline, and hosting them in their own secure repositories.
And yet, the expertise of the engineers who built shared storage systems, and of the operators who have kept them running, remains invaluable. One common pattern we see at the forefront of the use of Kubernetes for data is the need to address use cases that were common when discrete shared storage systems were the foundation of architectures. Many of these use cases, such as capacity management and building policies around the requirements of workloads, are at least as important as before. One theme of OpenEBS 3.0 is that these and related use cases are being reimagined and implemented in a way that I see as extremely promising - delivering both the autonomy needed by users of Kubernetes for data and the control and visibility required by enterprises that increasingly rely on Kubernetes to build and operate the software that runs their business.
In the sections to follow, I highlight the various enhancements in OpenEBS 3.0 and the benefits of using OpenEBS for Kubernetes Storage.
We are grateful for the support and contributions OpenEBS has received from its vibrant open-source community. We are also thankful to the Cloud Native Computing Foundation (CNCF) for including OpenEBS as one of its storage projects. And a special thanks to the CNCF for being a reference user of OpenEBS as well - you can read about their experience, and that of others including TikTok / ByteDance and Verizon / Yahoo, in ADOPTERS.md. Collectively, this support has helped my team notice challenges and opportunities and, of course, resolve bugs and improve the polish of OpenEBS with each release.
One can visualize the additions to OpenEBS as proceeding along two dimensions:
Horizontally - enabling the use of resilient workloads at significant scale, often using LVM or a variety of similar alternatives on local nodes. Here we see users like Flipkart and Bloomberg helping us understand what it is like to run dozens of workloads on thousands of nodes with the help of Kubernetes.
Vertically - pushing down to the layers of NVMe, io_uring, SPDK, and so on to provide a software-defined data storage layer for replicated storage that is refactored, written in Rust, and performant. This is of course a multi-year project - which we call OpenEBS Mayastor. We are a couple of years into this project and, so far, so good. Use cases include the Kubernetes edge, including running on Arm form factors, as well as clouds and data centers where workloads need additional resilience and data services.
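To give a feel for the vertical dimension, here is a minimal sketch of a StorageClass that requests Mayastor volumes exposed over NVMe-oF, assuming the Mayastor CSI driver is installed; the class name and replica count are illustrative choices, not defaults:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mayastor-nvmf        # illustrative name
provisioner: io.openebs.csi-mayastor
parameters:
  repl: "2"                  # number of synchronous replicas to keep
  protocol: "nvmf"           # expose the volume to the node over NVMe-oF
```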
Advances in OpenEBS 3.0 along the horizontal dimension - local node management, capacity-based scheduling, and other operational improvements - include:
OpenEBS uses LocalPV provisioners to connect applications directly with storage from a single node. This storage object, known as a LocalPV, is subject to the availability of the node on which it is mounted, making it a handy feature for fault-tolerant applications that prefer local storage over traditional shared storage. The OpenEBS LocalPV provisioner enables Kubernetes-based stateful applications to leverage several types of local storage, ranging from raw block devices to the capabilities of filesystems layered on top of those devices, such as LVM and ZFS.
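As a concrete example, a hostpath LocalPV can be requested through a StorageClass along these lines - a minimal sketch assuming a default OpenEBS install, with the BasePath a placeholder to adapt to your nodes:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-hostpath
  annotations:
    openebs.io/cas-type: local
    cas.openebs.io/config: |
      - name: StorageType
        value: "hostpath"
      - name: BasePath
        value: "/var/openebs/local"
provisioner: openebs.io/local
# Delay binding so the volume is created on the node where the pod lands.
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
```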
OpenEBS 3.0 includes the following enhancements to the LocalPV provisioner:
OpenEBS can also use ReplicatedPV provisioners to connect applications to volumes whose data is synchronously replicated across multiple storage nodes. This storage object, known as a ReplicatedPV, is highly available and can be mounted from multiple nodes in the cluster. OpenEBS supports three types of ReplicatedPVs: Jiva (based on Longhorn and iSCSI), cStor (based on ZFS and iSCSI), and Mayastor (based on SPDK and NVMe).
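To make the replicated case concrete, here is a minimal sketch of a cStor StorageClass, assuming the cStor CSI driver is installed and a CStorPoolCluster named cstor-disk-pool already exists (both names are placeholders):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cstor-csi
provisioner: cstor.csi.openebs.io
allowVolumeExpansion: true
parameters:
  cas-type: cstor
  cstorPoolCluster: cstor-disk-pool   # pre-created pool spanning node disks
  replicaCount: "3"                   # synchronous copies across nodes
```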
Some enhancements to replicated storage engines in OpenEBS 3.0 include:
Advances in OpenEBS 3.0 along the vertical dimension - additional resilience with performance via Mayastor (beta) - include:
Beyond the improvements to the data engines and their corresponding control planes, there are several new enhancements that improve the ease of use of the OpenEBS engines:
OpenEBS is used as a persistent storage solution for many stateful Kubernetes applications as it offers benefits such as:
Open-source cloud-native storage
Built fully in Kubernetes, OpenEBS follows a loosely coupled architecture that brings the benefits of cloud-native computing to storage. The solution runs in userspace, which makes it portable enough to run on any platform or operating system.
Eliminate vendor lock-in
When an OpenEBS storage engine (such as cStor, Mayastor, or Jiva) is used, it acts as a data abstraction layer. Data can easily be moved between different Kubernetes environments, whether on-premises, on traditional storage, or in the cloud.
Granular policies for stateful workloads
Since OpenEBS volumes are managed independently, organizations can enable collaboration between small, loosely coupled teams. Storage parameters for each volume and workload can be set and monitored independently, allowing for granular management of storage policies.
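As an illustration, such policies live in the StorageClass, so each workload can get its own - a sketch using the legacy (non-CSI) Jiva provisioner, where the class name and replica count are hypothetical choices for a workload like Cassandra that already replicates its own data:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cassandra-jiva       # hypothetical per-workload class
  annotations:
    openebs.io/cas-type: jiva
    cas.openebs.io/config: |
      - name: ReplicaCount
        value: "1"
provisioner: openebs.io/provisioner-iscsi
```

Because Cassandra replicates at the application layer, a single storage replica may suffice here, while another team's StorageClass might choose three.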
Reduced Storage Costs
Microservices-based storage orchestration allows for thin provisioning of pooled storage, and data volumes can be grown as needed. This also means that storage volumes can be added instantaneously without disrupting the applications or volumes already exposed to workloads.
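For instance, with a CSI-based engine and a StorageClass created with allowVolumeExpansion: true (such as the cstor-csi sketch above), a volume can start small and be grown in place by resizing its claim; the claim name and sizes here are hypothetical:

```yaml
# Request a modest volume up front...
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-vol             # hypothetical claim name
spec:
  storageClassName: cstor-csi
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
# ...and later grow it without disrupting the workload, e.g.:
# kubectl patch pvc demo-vol -p \
#   '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
```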
High Availability
With Container Attached Storage (CAS), storage controllers are rescheduled when nodes fail. This allows OpenEBS to survive pod restarts, while the stored data is protected through the data engines' synchronous replication. When a node fails, only the volume replicas on that node are lost.
Disks Managed Natively on Kubernetes
OpenEBS includes the Node Disk Manager (NDM) that enables administrators to manage disks using inherent Kubernetes constructs. NDM also allows administrators to automate storage needs such as performance planning, volume management, and capacity planning using efficient pool and volume policies.
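For example, NDM surfaces each disk as a BlockDevice custom resource (visible via kubectl get blockdevices -n openebs), and a specific disk can be reserved with a BlockDeviceClaim - a sketch in which the claim and device names are hypothetical values you would read from your own cluster:

```yaml
apiVersion: openebs.io/v1alpha1
kind: BlockDeviceClaim
metadata:
  name: reserve-disk-example   # hypothetical claim name
  namespace: openebs
spec:
  # Hypothetical device name; list real ones with kubectl get blockdevices.
  blockDeviceName: blockdevice-example-0001
```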
With an upgrade to OpenEBS 3.0, you also get:
Storage orchestration in Kubernetes requires a novel approach, since the platform was initially built to manage stateless, ephemeral containers. This was one of the most critical problems on our minds when we started developing OpenEBS. We helped build various projects, such as NDM, LocalPV, and Read-Write-Many (RWX) PVCs, to ensure the platform helps Kubernetes administrators handle common storage challenges - including the use of Kubernetes for cloud native, resilient applications. Beyond this, the idea behind leveraging the CAS architecture was to enrich Kubernetes storage with additional benefits: a lower blast radius, vendor- and cloud-provider-agnostic storage, small team agility and control, and granular storage policies that embrace the specific and highly varied needs of workloads.
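To show what an RWX PVC looks like in practice, here is a minimal sketch assuming the OpenEBS Dynamic NFS provisioner is installed and backs a StorageClass named openebs-rwx (both names are assumptions):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data              # hypothetical claim name
spec:
  storageClassName: openebs-rwx  # assumed NFS-backed class
  accessModes: ["ReadWriteMany"] # mountable read-write by many pods at once
  resources:
    requests:
      storage: 5Gi
```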
With its 3.0 release, OpenEBS is further enhanced and more mature in its support for stateful applications. But the journey doesn't stop here, and there is more to come!
You can also join me on September 30th at the CNCF On-Demand Webinar: OpenEBS 3.0: What's in it, where I will talk about the new features, upgrades, and bug fixes, and give a glimpse of what is coming in OpenEBS 3.1 and 4.0. Feel free to register and attend: https://community.cncf.io/events/details/cncf-cncf-online-programs-presents-cncf-on-demand-webinar-openebs-30-whats-in-it/
To learn more about recent updates, developer documentation, or scheduled releases, please feel free to use any of the following resources.
Read the release notes here: https://github.com/openebs/openebs/releases/tag/v3.0.0
To read Adopter use-cases or contribute your own, visit: https://github.com/openebs/openebs/blob/master/ADOPTERS.md.
Several OpenEBS users are active on the Kubernetes Slack #openebs channel. Feel free to join us, ask questions, and share your experience.
For those interested in Rust and the performance of data on Kubernetes, there is also a Discord server dedicated to OpenEBS Mayastor: https://discord.gg/zsFfszM8J2