Kubernetes supports a powerful storage architecture, but it can be complex to implement well. The orchestrator relies on volumes, abstracted storage resources that persist and share data between ephemeral containers. Because these resources abstract the underlying infrastructure, volumes enable dynamic provisioning of storage for containerized workloads.
In Kubernetes, shared storage is typically achieved by mounting volumes that connect to an external filesystem or block storage solution. Container Attached Storage (CAS) is a newer approach that allows Kubernetes administrators to deploy storage as containerized microservices within a cluster. The CAS architecture makes workloads more portable and makes it simpler to adapt storage to application needs. Because CAS is deployed per workload or per cluster, it also eliminates the cross-workload and cross-cluster blast radius of traditional shared storage.
This article compares CAS with traditional shared storage, exploring their architectures, similarities, and differences.
Container Attached Storage (CAS) is a solution for stateful workloads that deploys storage inside the cluster itself, whether that cluster runs in the cloud or on-premises. Unlike traditional options, where storage is a shared filesystem or block storage system running externally, CAS provides storage controllers that are managed by Kubernetes. These storage controllers can run anywhere a Kubernetes distribution runs, whether on top of traditional shared storage systems or managed storage services such as Amazon EBS. Data stored in CAS is accessed directly from containers within the cluster, significantly reducing read/write latency.
CAS leverages the container orchestrator’s environment to provide persistent storage. The CAS software runs storage targets as containerized services. Where desired, these targets are backed by microservice-based storage replicas that can be scheduled and scaled independently of one another. CAS services can be orchestrated as containerized workloads by Kubernetes or any other orchestration platform, preserving the autonomy and agility of software development teams.
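As a minimal sketch of how a workload consumes CAS-provided storage, the following Python snippet (using the official Kubernetes client) creates a PersistentVolumeClaim against a CAS-backed StorageClass. The StorageClass name openebs-hostpath is an assumption based on a typical OpenEBS LocalPV setup; substitute whatever class your CAS provisioner exposes.

```python
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running inside a Pod

# PVC requesting 5Gi from a CAS-backed StorageClass (class name assumed; adjust to your cluster)
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="cas-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="openebs-hostpath",  # assumed CAS StorageClass
        resources=client.V1ResourceRequirements(requests={"storage": "5Gi"}),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
```

The CAS control components watch for such claims and spin up the containerized storage target (and any replicas) that will serve the volume to the workload.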
For any CAS solution, the cluster is typically divided into two layers: a control plane that handles provisioning, volume policies, and metrics, and a data plane made up of the storage controllers and replicas that serve I/O to the workloads.
Container Attached Storage is steadily becoming the de-facto standard for persistent storage of stateful Kubernetes workloads. CAS most closely resembles the Direct Attached Storage that many modern workloads expect, such as NoSQL databases, logging stacks, machine-learning pipelines, Kafka, and Pulsar. Many workload communities and users have embraced CAS, and it allows small teams to retain control over their own workloads. In short, CAS may be preferred where:
Shared storage was designed to allow multiple users and machines to access and store data in a common pool of devices. It provided additional availability to workloads that could not ensure their own availability, and it worked around the poor performance of the underlying disks, which at the time could deliver no more than about 150 I/O operations per second. Today’s drives can be as much as 10,000 times more performant, massively exceeding the performance requirements of most workloads.
A shared storage infrastructure typically consists of block storage systems in Storage Area Networks (SANs) or file system based storage devices in Network Attached Storage (NAS) configurations.
The storage industry was once growing rapidly, at rates in excess of 30%–50% year over year in the late 1990s and early 2000s. In the 2010s this growth moderated, and in certain years stopped entirely. In the 2020s growth resumed, but at a rate much slower than the exponential growth in the amount of data being stored. Meanwhile, Direct Attached Storage and cloud storage each grew more quickly in both capacity shipped and overall spending.
In traditional shared storage, all nodes in a network share the same physical storage resources but have their own private memory and processing devices. Files and other data can be accessed by any machine connected to the central storage.
For a Kubernetes application, traditional shared storage is typically implemented by using monolithic storage software to virtualize physical storage resources, which could be bare-metal servers, SAN/NAS networks, or block storage solutions. The software then exposes Persistent Volumes (PVs) that store cluster data. Each PV is bound to a Persistent Volume Claim (PVC), which application Pods use to request a portion of the shared storage, as sketched below.
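The following Python snippet (again using the official Kubernetes client) is a minimal sketch of that consumption path: a Pod mounts a volume backed by an existing PVC. The claim name shared-pvc and the container image are illustrative, and the PVC is assumed to already be bound to a PV carved out of the shared storage pool.

```python
from kubernetes import client, config

config.load_kube_config()

# Pod that mounts an existing PVC ("shared-pvc", assumed bound to a PV from
# the shared storage pool) at /data.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="app"),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="app",
                image="nginx:1.25",
                volume_mounts=[client.V1VolumeMount(name="data", mount_path="/data")],
            )
        ],
        volumes=[
            client.V1Volume(
                name="data",
                persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                    claim_name="shared-pvc"
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```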
Both CAS and shared storage can utilize the Container Storage Interface (CSI). CSI is used to issue commands to the underlying storage, such as provisioning a PV, expanding it, or snapshotting its capacity.
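Either backend plugs into Kubernetes the same way: a StorageClass names the CSI driver that should provision and manage volumes. The sketch below, using the Python Kubernetes client, creates such a class; the provisioner string csi.example.com and the parameters are placeholders for whatever driver your CAS or shared storage vendor ships.

```python
from kubernetes import client, config

config.load_kube_config()

# StorageClass delegating provisioning to a CSI driver (driver name is a placeholder).
sc = client.V1StorageClass(
    metadata=client.V1ObjectMeta(name="csi-backed"),
    provisioner="csi.example.com",            # hypothetical CSI driver name
    parameters={"fsType": "ext4"},            # passed through to the driver
    allow_volume_expansion=True,              # lets bound PVCs grow via the CSI expand call
    volume_binding_mode="WaitForFirstConsumer",
)

client.StorageV1Api().create_storage_class(body=sc)
```

PVCs that reference this class are then provisioned, expanded, or snapshotted through the named CSI driver, regardless of whether the backend is a CAS engine or a traditional shared array.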
A typical Traditional Shared Storage architecture
Scaling up traditional shared storage requires deploying additional storage devices and configuring them into the existing array.
Shared storage is used to manage large amounts of data generated and accessed by many different machines, since it can deliver high performance for large files with few bottlenecks or interruptions. Shared storage is also the go-to solution for organizations that depend on collaboration between teams: because data and files are managed centrally, it allows efficient version control and consolidated information management. Traditional shared storage also eliminates the need for multiple drives containing the same information, which reduces redundancy and frees storage capacity.
The two storage options vary greatly in how they persist application data. While traditional shared storage relies on an external array of storage devices to persist data, CAS uses containers within an orchestrated environment.
Following are a few similarities and differences between CAS and Traditional Shared Storage:
Summary
Designed for Kubernetes, CAS enables agility, granularity, and linear scalability, making it a favourite for cloud-native applications. Traditional shared storage offers a mature stack of storage technology, but it falls short for persisting stateful application data largely because it lacks linear scalability. CAS is a newer approach that runs storage controllers in userspace, allowing greater scalability.
OpenEBS, a popular CAS-based storage solution, has helped several enterprises run stateful workloads. Originally developed by MayaData, OpenEBS is now a CNCF project with a vibrant community of organizations and individuals alike. This is also evident from CNCF’s 2020 Survey Report, which listed MayaData (OpenEBS) among the top five most popular storage solutions.
Canonical definition of Container Attached Storage:
https://www.cncf.io/blog/2020/09/22/container-attached-storage-is-cloud-native-storage-cas/
To read Adopter use-cases or contribute your own, visit: https://github.com/openebs/openebs/blob/master/ADOPTERS.md.
CNCF 2020 Survey Report:
https://www.cncf.io/wp-content/uploads/2020/11/CNCF_Survey_Report_2020.pdf
OpenEBS LocalPV Quick Start Guide:
https://docs.openebs.io/docs/next/localpv.html