An Overview of Disaggregated Storage
Infrastructure disaggregation allows organizations to use computing resources more effectively by enabling scalability and flexibility. Disaggregation involves decoupling data center resources (memory, compute, and storage) so that each resource can be scaled and provisioned independently. Disaggregation is already widely adopted in cloud computing platforms, with various clouds offering storage systems that are completely independent of compute instances.
Disaggregated storage is a version of composable disaggregated infrastructure that offers scalable storage by connecting various physical storage devices over a network fabric to form a logical storage pool, in some cases on demand. Disaggregated storage enables the creation of dynamic environments where the compute and storage resources scale elastically based on the application’s workload. Separating storage and compute instances also means that storage can be scaled and managed without interfering with the availability of an application’s services.
Disaggregated Storage Architecture
Traditionally, disaggregated storage decouples storage from compute by combining multiple storage devices into a logical pool and then provisioning storage to server instances. The storage devices are connected over a network fabric, emulating a Storage Area Network (SAN) and allowing flexible scaling of the storage resources available to an application. However, unlike a traditional SAN, which can tightly couple workloads together through a shared storage dependency, today's disaggregated storage can offer non-local storage that retains the per-workload design and scalability of Direct Attached Storage while also offering the resource utilization, manageability, and other benefits of non-local storage.
One prominent trend in disaggregated storage is the use of high-speed NVMe over Fabrics (NVMe-oF), and in particular NVMe over TCP, to connect storage devices over a network. NVMe improves the speed and performance of flash-based solid-state drives (SSDs) by connecting them to servers over the PCI Express bus. Storage disaggregation with NVMe-oF decouples high-performance SSDs from server CPUs and makes them available to remote compute nodes over a low-latency, low-jitter protocol. NVMe over TCP support was merged into the Linux kernel in early 2019 (kernel 5.0); details, including its latency characteristics, can be found in the NVMe/TCP transport specification.
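On a Linux host with the nvme-cli package installed, attaching a remote NVMe/TCP namespace typically looks like the following sketch; the IP address, port, and subsystem NQN shown are placeholders, not values from any real deployment:

```shell
# Load the NVMe/TCP initiator module (kernel 5.0 or later)
modprobe nvme-tcp

# Discover subsystems exported by a remote NVMe/TCP target (placeholder address)
nvme discover -t tcp -a 192.0.2.10 -s 4420

# Connect to a discovered subsystem by its NQN (placeholder NQN)
nvme connect -t tcp -a 192.0.2.10 -s 4420 \
  -n nqn.2020-01.io.example:disagg-pool

# The remote namespace now appears as a local block device (e.g. /dev/nvme1n1)
nvme list
```

From this point the device can be partitioned, formatted, and mounted like any directly attached SSD.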
Disaggregated Storage Adoption
The growing emphasis on high performance and low latency for modern applications has led many workloads to favor Direct Attached Storage, including machine learning, NoSQL databases such as Cassandra, and log analytics systems such as Elasticsearch. Disaggregated storage over NVMe-oF promises to increase the use of disaggregated storage for the reasons explained above. Using shared Kubernetes infrastructure extended with per-workload storage, it is possible to assign just the right amount of storage and compute resources to each workload as needed. For software platforms using disaggregated, pooled storage, it is possible to borrow storage or CPU resources from lower-priority applications so that high-performance applications can scale seamlessly and automatically with changes in workloads.
Requirements for Optimum Performance of Disaggregated Storage:
To allow independent application optimization and resource scaling, certain storage disaggregation requirements should be met. These include:
- High-Speed Network Fabric - Disaggregated storage should meet stringent Quality of Service (QoS) requirements with regard to access speeds and latency. The network connecting pooled storage to compute servers should be highly scalable, high-performance, and resistant to congestion so that multiple hosts can access the storage quickly.
- Fast Storage and Networking Protocols - SSDs can be under-utilized when they are directly attached to compute nodes. Disaggregated storage requires efficient, fast transfer protocols such as NVMe and NVMe-oF to extract the maximum Input/Output Operations Per Second (IOPS) from SSDs at much lower latency than earlier protocols such as iSCSI.
- Fast, Secure, Extensible I/O Controllers - The storage controllers should be able to quickly and securely perform read/write operations on the underlying SSDs and, ideally, should scale horizontally just as the workloads they support do, using a loosely coupled architecture to ensure both scalability and resilience.
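To make the latency requirement concrete, Little's law (throughput = concurrency / latency) gives a rough upper bound on the IOPS a single queue can sustain. The latency figures below are illustrative assumptions for this sketch, not measurements:

```python
def max_iops(queue_depth: int, avg_latency_us: float) -> float:
    """Upper bound on per-queue IOPS via Little's law:
    throughput = outstanding requests / average latency."""
    return queue_depth * 1_000_000 / avg_latency_us

# Local NVMe SSD: assume ~80 us round trip at queue depth 32
local = max_iops(32, 80)     # 400,000 IOPS ceiling
# NVMe/TCP: assume ~20 us of added fabric overhead
remote = max_iops(32, 100)   # 320,000 IOPS ceiling
# Legacy iSCSI path: assume ~400 us round trip
iscsi = max_iops(32, 400)    # 80,000 IOPS ceiling
```

The point of the arithmetic is that keeping fabric latency within tens of microseconds of local access preserves most of an SSD's IOPS potential, while a slower protocol stack forfeits it regardless of how fast the drive is.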
Types of Storage Disaggregation
There are several forms of storage disaggregation which include:
Configuration Disaggregation
This form of disaggregation does not require constantly running controller monitors, because storage abstraction is performed during service-level configuration. The storage pool can be reconfigured to handle different workloads either during deployment or when a storage rack is rebuilt for a different application.
Failover Disaggregation
This is another non-dynamic form of disaggregation, in which storage drives are reallocated to different hosts only when an application fails. Reconfiguration happens rarely, and this type of disaggregation primarily improves fault tolerance in applications.
Dynamic Elastic Disaggregation
In this case, drives are pooled and then connected to multiple input/output (I/O) controllers, and each server can connect to more than one drive at a time. Storage reconfiguration happens frequently as server requests and workloads vary, with different storage drives provisioned as often as every few hours.
Complete Disaggregation
This form of disaggregation assumes complete abstraction of storage resources, meaning any host can connect to any storage drive through any I/O controller. This also means that infrastructure reconfiguration happens dynamically, as server-storage connections readjust to fit each I/O request. Kubernetes can be extended to host the I/O controllers, for example, enabling them to scale horizontally while delivering capabilities via disaggregation to workloads on demand.
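Because any volume can be reached through any controller under complete disaggregation, a scheduler is free to rebalance volume-to-controller mappings as load shifts. The sketch below illustrates the idea with a greedy least-loaded assignment; the volume names, controller names, and IOPS figures are invented for the example:

```python
def assign_controllers(volumes: dict, controllers: list):
    """Greedily map each volume to the currently least-loaded controller.

    volumes: {volume_name: expected_iops}; controllers: list of names.
    Returns (mapping, per-controller load). A real scheduler would rerun
    this as demand changes, which complete disaggregation makes possible.
    """
    load = {c: 0 for c in controllers}
    mapping = {}
    # Place the heaviest volumes first for a better balance
    for vol, iops in sorted(volumes.items(), key=lambda kv: -kv[1]):
        target = min(load, key=load.get)  # least-loaded controller so far
        mapping[vol] = target
        load[target] += iops
    return mapping, load

volumes = {"vol-a": 5000, "vol-b": 3000, "vol-c": 2000, "vol-d": 2000}
mapping, load = assign_controllers(volumes, ["ctl-1", "ctl-2"])
```

In a configuration- or failover-based scheme, by contrast, the mapping is fixed at deployment time and cannot be recomputed this way while the application runs.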
Benefits of Disaggregated Storage:
Disaggregated storage offers various improvements for computing and infrastructure provision. These include:
- Improved Resource Utilization - Disaggregated storage enables dynamic allocation of storage resources based on priority and application needs. It also lets organizations take advantage of the fast I/O speeds offered by SSDs. This means organizations can put all available storage resources to work, provisioned as needed based on the best fit between application requirements and the I/O, capacity, and throughput capabilities of the devices.
- Flexible, Scalable SSD Capacity - With disaggregated storage, organizations can allocate as many SSDs as they want to an application, then grow or shrink that capacity as application requirements change.
- Simplified Scaling - Disaggregated storage allows firms to create a storage architecture that scales dynamically to meet changes in resource requirements using a shared-nothing architecture.
- High-Performance Applications - Disaggregated storage allows throughput and read/write speeds to be allocated as needed to meet workload demand. This means all application users can access their stored data with low latency, making applications fast and efficient.
Emerging Trends in Disaggregated Storage
In recent years, several technologies have spurred the development and adoption of disaggregated storage as an alternative to Direct Attached Storage. Non-Volatile Memory Express (NVMe) and NVMe over Fabrics (NVMe-oF) have enabled better utilization of SSDs through high-speed I/O and networking.
Public cloud storage services such as Amazon EBS and Azure Blob Storage are built from large fleets of compute instances optimized for storage capacity and throughput. They use purpose-built hardware and software infrastructure to provide remote storage to an enormous number of distributed servers.
How Kubernetes enables Disaggregated Storage:
Disaggregated storage and Kubernetes are a natural match, since Kubernetes creates a flexible, highly scalable deployment environment capable of scaling and orchestrating both workloads and, potentially, storage controllers. Kubernetes offers flexible storage to clusters through Persistent Volumes and Persistent Volume Claims, abstractions that attach Pods to physical storage based on container storage needs. Through the Container Storage Interface (CSI), Kubernetes allows third-party storage providers to deliver block and file storage solutions by extending volume functionality.
With CSI, organizations can virtually separate compute and storage layers, enabling disaggregated storage for applications. Kubernetes CSI Plugins typically come in two flavors:
- Storage Drivers - CSI drivers can be developed and maintained outside the core Kubernetes codebase, allowing applications to consume storage resources dynamically through Storage Classes and Persistent Volume Claims.
- Container Attached Storage (CAS) - This model enables per-workload storage by assigning containerized storage controllers to workloads as needed. Storage runs in clusters, with control plane elements on master nodes while data plane workloads run on worker nodes. The data plane nodes can either be local nodes or disaggregated storage targets that can be scheduled and scaled independently by controllers on the master nodes. Detailed advantages of the CAS architecture are discussed in a CNCF blog post.
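Consuming CSI-provisioned storage follows the same pattern regardless of vendor: an administrator defines a StorageClass referencing the CSI driver, and applications request capacity through a PersistentVolumeClaim. The manifest below is a minimal sketch; the driver name `csi.example.com`, the `protocol` parameter, and the claim name are hypothetical placeholders, not a real driver's API:

```yaml
# Hypothetical CSI driver exposing disaggregated NVMe storage.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: disaggregated-nvme
provisioner: csi.example.com      # placeholder: substitute your CSI driver
allowVolumeExpansion: true
parameters:
  protocol: nvme-tcp              # placeholder: driver-specific parameter
---
# An application claims capacity without knowing where the drives live.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cassandra-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: disaggregated-nvme
  resources:
    requests:
      storage: 100Gi
```

Because the claim names only a StorageClass, the same application manifest works whether the volume is backed by a local disk or a remote NVMe-oF target.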
Recent surveys confirm that OpenEBS is one of the most popular CAS storage projects. OpenEBS is simple to operate and use, in large part because it looks and functions like other cloud-native, Kubernetes-friendly projects, avoiding the shared dependencies and scalability challenges of traditional shared-everything storage architectures.
Under the CAS model, each volume has a dedicated controller Pod and a set of one or more replica Pods.
OpenEBS relies on a control plane to provision volumes and perform associated volume actions. It includes a PV provisioner that dynamically creates the specific deployment requirements for volume replica Pods and target controller Pods on the right nodes. The OpenEBS data plane includes a storage engine that implements the actual input/output (I/O) path for cluster volumes, or in LocalPV mode can enable local or disaggregated access to storage devices. The storage engines run as microservices in user space and can be flexibly provisioned and scaled to match workload requirements.
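From an application's point of view, consuming an OpenEBS volume is an ordinary PersistentVolumeClaim. The sketch below assumes a default OpenEBS install, which ships an `openebs-hostpath` StorageClass for LocalPV volumes; the claim name and size are placeholders:

```yaml
# PVC backed by the OpenEBS LocalPV hostpath engine
# (storage class name assumes a default OpenEBS installation).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-local-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: openebs-hostpath
  resources:
    requests:
      storage: 20Gi
```

The control plane resolves this claim into the controller and replica Pods described above and schedules them alongside or apart from the workload as the engine dictates.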
To learn more about the different components of CAS-based OpenEBS and their functions, please refer to the OpenEBS documentation.
Disaggregated storage enables high-speed, flexible, and highly scalable applications by separating compute from storage. With disaggregated storage, organizations benefit from the low latency and enhanced responsiveness of workloads backed by high-speed NVMe SSDs. The speed, flexibility, and low latency of disaggregated storage offer a unique storage solution, optimizing resource consumption while reducing total cost of ownership (TCO).
Kubernetes enables disaggregated storage through CSI connectivity and the ability to scale and orchestrate storage software. MayaData's OpenEBS Container Attached Storage (CAS) deploys a data management plane that architecturally mirrors Kubernetes' application management plane. OpenEBS unifies disaggregated storage into a component of the Kubernetes application layer. For Kubernetes applications, OpenEBS creates a unified storage infrastructure on top of heterogeneous hardware and software deployed in an enterprise data center. This approach simplifies developers' lives, gives control to DevOps, and provides complete usage visibility to CxOs. OpenEBS makes the management of stateful applications across an enterprise data center easy, predictable, and resilient. What Kubernetes does for distributed application management, OpenEBS does for distributed data management, especially when enabled with disaggregated storage.
To learn more about how OpenEBS implements disaggregated storage for Kubernetes, please join the OpenEBS community.