Introduction- The NVMe Protocol
Non-Volatile Memory express (NVMe) is a storage access protocol that lets the CPU access SSD memory through the Peripheral Component Interconnect Express (PCIe). Through a set of protocols and technologies, NVMe dramatically accelerates the way data is transmitted, stored and retrieved. With NVMe, the CPU accesses data on SSDs directly, enabling maximum SSD utilization and flexible scalability. NVMe allows for Storage Disaggregation and can be combined with Kubernetes for scale-out applications.
This blog explores how NVMe redefines storage orchestration in Kubernetes.
Advantages of NVMe for Distributed Storage:
By using the PCIe interface to connect CPUs to SSDs, NVMe removes layers connecting compute to storage, allowing efficient storage abstraction and disaggregation. This offers various benefits for modern data centers, including:
- Efficient Memory Transfer - NVMe uses one ring per CPU to communicate directly with SSD storage, reducing the internal locking speeds for Input-Output controllers. NVMe also supports message signaled interrupts to prevent CPU bottlenecks, making storage efficient and scalable. NVMe reduces latency by combining message signaled interrupts with the large number of cores in CPUs to enable I/O parallelism.
- NVMe offers massive Queue Parallelism - Unlike SATA which supports a maximum 32 commands per queue, NVMe utilizes a private queuing which provides up to 64 thousand commands per queue over 64 thousand queues. This is because Each I/O controller gets its own set of queues, which linearly increases throughput with the number of CPU cores available.
- NVMe offers improved Security - The NVMe over Fabric specification supports secure tunneling protocols produced by reputable security communities such as the Trusted Computing Group (TCG). This means that NVMe enables enterprise-grade security features such as Access Control, Data Encryption at REST, Purge-Level Erase and Crypto-erase among others.
- NVMe relies on an efficient command set - The protocol relies on a simple, streamlined command set which halves the number of CPU instructions needed to process I/O requests. Besides offering lower latencies, this scheme enables advanced features such as power management and reservations, which extends the benefits beyond input-output operations.
Non Volatile Memory express-over Fabrics (NVMe-oF) is a specification that allows CPUs to connect to SSD Storage devices across a network fabric. This is designed to harness the benefits of the NVMe protocol over a Storage Area Network (SAN). The host computer can target an SSD storage device using an MSI-X based command while the network can be implemented using various networking protocols, including Fiber Channel, Ethernet or Infiniband.
NVMe-oF has found wider popularity in modern networks since it allows software organizations to implement scaled out storage for highly-distributed, highly-available applications. By extending the NVMe protocol to SAN devices, NVMe-oF makes CPU usage efficient while improving connection speeds between applications on servers and storage.
NVMe-oF supports various data transfer mechanisms, such as:
- Fiber Channel
NVMe-oF interfaces networked flash storage with compute servers, enabling applications to run on shared network storage, thereby providing additional network consolidation for data centers. The SSD targets can be shared dynamically among application workloads, allowing for the efficient consumption of resources, flexibility and scalability.
Kubernetes Orchestration and Storage Persistence
While containers are transient, Kubernetes enables stateful applications by providing abstractions that reference a physical storage device. A containerized application is virtually isolated from other processes and applications running on other containers. This makes the Kubernetes environment highly flexible and scalable, as it allows applications to run in virtual machines, bare metal systems, supported cloud systems, or a combination of various deployments. While there are benefits to this approach, it also presents a challenge when there is the need to store and share data between containers.
Kubernetes offers various abstractions and options for attaching container PODs to physical storage, such as:
- Persistent Volumes & Persistent Volume Claims
- Storage Classes
- The Container Storage Interface (CSI) and Storage Plugins
Challenges of Orchestration using Direct Attached Storage (DAS)
While Direct Attached Storage (DAS) offers a simple, highly available and quick storage, DAS alone is not sufficient to run Kubernetes clusters. This is because DAS devices have a limited storage capacity that cannot be dynamically provisioned to match stateful Kubernetes workloads. Additionally, DAS doesn’t incorporate networking capabilities or facilitate data access by different user groups since storage is only directly accessible to individual servers/desktop machines, while Kubernetes orchestrates on distributed clusters.
NVMe for Kubernetes
NVMe extends the low latency of DAS to Network Attached Storage devices by connecting servers to SSDs over a high-speed PCIe-oF interface. This makes NVMe an efficient option to provide storage for dynamic, extensible and flexible stateful applications running on Kubernetes. The Container Storage Interface (CSI) standard connects these pooled NVMe devices to Kubernetes clusters running stateful applications. By combining the low-latency networked storage offered by NVMe-oF and the flexibility of the CSI plugin, organizations can provide an efficient, agile and demand driven storage solution for Kubernetes applications.
NVMe-oF Persistent Volumes
To avoid the bottlenecks of running NVMe SSDs on a single, local server, several organizations are working to enable an NVMe-oF plugin for Kubernetes storage. Kubernetes enables the use of REST APIs to allow control of the storage provisioner through the NVMe-oF protocol. The storage provisioner then creates standard Volume API objects that can be used to attach a portion of pooled NVMe SSDs to a POD. Kubernetes PODs and other resources can then read and write data onto this pooled storage like any persistent volume object.
OpenEBS created by MayaData is a popular agile storage stack for stateful Kubernetes applications that need minimal latency. The software infrastructure and plugins from OpenEBS integrate perfectly with the rapid, disaggregated physical storage offered by NVMe-oF. Integrating NVMe SSDs with OpenEBS plugins allows for simpler storage configurations for loosely coupled applications with stateful workloads.
OpenEBS is one of the popular open-source, agile storage stacks for performance-sensitive databases orchestrated by Kubernetes. Mayastor, OpenEBS’s latest storage engine, delivers very low overhead versus the performance capabilities of underlying devices. While OpenEBS Mayastor does not require NVMe devices and does not require the workloads to access data via NVMe, an end to end deployment from a workload running a container supporting NVMe over TCP through the low overhead OpenEBS Mayastor and ultimately NVMe devices will understandably perform as close as possible to the theoretical maximum performance of the underlying devices.To learn more about how OpenEBS Mayastor, leveraging NVMe as a protocol, performs when leveraging some of the fastest NVMe devices currently available on the market, visit this article.
OpenEBS Mayastor builds a foundational layer that enables workloads to coalesce and control storage as needed in a declarative, Kubernetes-native way. While doing so, the user can focus on what's important, that is, deploying and operating stateful workloads.
If you’re interested in trying out Mayastor for yourself, instructions for how to set up your own cluster, and run a benchmark like `fio` may be found at https://docs.openebs.io/docs/next/mayastor.html.
Mayastor NVMe-oF TCP performance - https://openebs.io/blog/mayastor-nvme-of-tcp-performance/
Lightning-fast storage solutions with OpenEBS Mayastor and Intel Optane -