How to provision Direct Attached Storage (DAS) for a Kubernetes Cluster

Direct Attached Storage

Introduction: What is Direct Attached Storage?

Computers and servers come with internal storage devices that are accessible only to the machine they are installed in. In personal computers, these could be Hard Disk Drives (HDDs) or Solid State Drives (SSDs). In servers, such local drives are known as Direct Attached Storage (DAS).

DAS can also refer to external systems whose drives are directly attached to a server. In this case, DAS may use protocols that are often routed over a network, but without any switches or routers in the path. For example, you can connect to devices and directly attached systems through:

  • Serial Advanced Technology Attachment (SATA)
  • Fibre Channel (FC)
  • Small Computer System Interface (SCSI)
  • iSCSI
  • Serial-Attached SCSI (SAS)

Again, DAS is non-networked storage. External drives and enclosures connected through USB, SATA, or other interfaces can also be part of DAS. A key aspect of DAS is that every storage device is controlled by the single machine to which it is attached. Any other device that needs to access data on the DAS has to be authenticated and authorized by that machine.

Types of DAS Storage Devices:

There are various types of devices available to expose DAS storage to servers and workstations. These include:

Single Spinning Hard Drive:

A Hard Disk Drive (HDD) is an electromechanical data storage device that stores and retrieves information on spinning magnetic platters. Hard disks are affordable, high-capacity devices that let enterprises keep large amounts of data at low cost. These drives are, however, prone to mechanical damage and should only be used in areas with little movement. Because they rely on moving parts, HDDs are also prone to wear, and typically last 5-10 years.

Solid State Drive (SSD):

These drives use persistent electronic memory to store data. Electronic memory offers very fast read/write times, so throughput is limited mainly by the interface the DAS device uses to access data. There are two common types of SSD devices:

  • SATA SSDs are rugged, quiet, and portable, with speeds of up to about 550 MBps, making them particularly useful for high-definition media transfers. These disks are, however, costly compared to HDDs, and typically have smaller capacities.
  • NVMe SSDs, which attach over PCIe, are silent, powerful, and rugged, with read/write speeds of up to 2500 MBps or even higher. These disks get expensive at larger capacities.

HDDs and SSDs can be grouped together to form a Redundant Array of Inexpensive Disks (RAID). A RAID can be thought of as a single disk composed of many smaller disks. This type of storage virtualization can improve capacity and performance, and can also provide data redundancy to recover from the loss of a single disk.

RAID can be enabled for both SSDs and HDDs: 

HDD RAIDs:

This configuration is the least expensive way to obtain a large storage capacity. RAID can help address the relative fragility of HDDs by keeping systems resilient despite single or even multiple disk failures. However, as disk drives have grown larger without a substantial increase in their average speed, rebuild times have become a major concern. Rebuilds can take days for multi-TB disks, which has increased the use of approaches such as RAID-6, whose two parity disks reduce or eliminate the risk of data loss when a second disk fails during a rebuild.
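The rebuild-time concern can be made concrete with rough arithmetic. The sketch below assumes a hypothetical 8-disk RAID-6 array of 16 TB drives and a sustained rebuild rate of 100 MB/s. These figures are illustrative assumptions, not measurements, and real rebuild rates vary with the controller and the workload running during the rebuild.

```shell
#!/bin/sh
# Illustrative assumptions (not taken from a real system); adjust to your hardware.
disks=8            # total drives in the RAID-6 array
disk_tb=16         # capacity of each drive, in decimal TB
rebuild_mb_s=100   # assumed sustained rebuild rate, in MB/s

# RAID-6 spends the equivalent of two drives on parity.
usable_tb=$(( (disks - 2) * disk_tb ))

# Rebuilding one failed drive means rewriting its full capacity.
rebuild_seconds=$(( disk_tb * 1000 * 1000 / rebuild_mb_s ))
rebuild_hours=$(( rebuild_seconds / 3600 ))

echo "Usable capacity: ${usable_tb} TB"
echo "Estimated rebuild time: about ${rebuild_hours} hours"
```

Under these assumptions the rebuild runs for nearly two days, and during that window the array depends on its second parity drive to survive another failure.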

SSD RAIDs:

SSD-based RAID can be much faster, which shortens rebuilds and reduces the risk of a second disk loss during a rebuild; some configurations achieve speeds of 3000 MBps. Since SSDs have no moving parts, these RAIDs are rugged. They can support any form of content, from light files to large, high-definition multimedia editing. These RAIDs are, however, more expensive to set up and offer less storage capacity than HDD RAIDs.

Features of DAS

Benefits:

  • DAS offers high-performance I/O operations since the storage is directly connected to the machine consuming it. DAS storage is not directly affected by network latency or connectivity issues. 
  • Internal and external direct storage is simple to set up, configure, and access. Even external storage is typically ‘plug-and-play’ so long as the appropriate interface exists.
  • DAS storage is also easier to characterize: its performance is straightforward to project, especially when any RAID configuration is simple. By comparison, it can be extremely complex to model the performance of shared storage under load, with many workloads competing for its capabilities.
  • DAS storage is perceived as less resilient than shared storage systems; however, the blast radius of DAS is typically far smaller than that of shared storage. When a shared storage system fails, it can cause the failure of all workloads, even if those workloads have been written in a cloud native manner in an effort to improve resilience.
  • DAS storage is affordable since the architecture consists only of the storage device and a connector, which makes it cost-effective compared to traditional shared storage, which often requires specialized software and infrastructure to run.

Drawbacks:

  • DAS relies on a bank of physical storage devices, so scaling is limited to the capacity and number of available drives. Scaling requires a shutdown and physical upgrades before the system gets back up.
  • To share data with other computers, a network connection is required, and this introduces the possibility of poor performance.
  • With DAS, it is difficult to manage data centrally, or to create backups or restore applications easily when the system goes down.
  • DAS also makes it more difficult to place multiple workloads on the same storage systems, resulting in lower resource utilization and therefore higher data center and system costs.

Setting up DAS Storage for Kubernetes using OpenEBS

The CNCF project OpenEBS is among the most popular ways to deploy different types of Local Persistent Volumes, which provide direct storage that is accessible from a single node of a cluster. For PODs to use a Local Volume for data storage, they are scheduled on the node on which the volume has been provisioned.

To get started with DAS storage for Kubernetes nodes using OpenEBS Local Persistent Volumes, install OpenEBS in the cluster using the command:

$ kubectl apply -f https://openebs.github.io/charts/openebs-operator.yaml

This command installs OpenEBS in default mode, which is enough for an experimental cluster.
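One way to confirm that the installation succeeded is to list the OpenEBS pods and the StorageClasses registered in the cluster. The helper below is a minimal sketch: the function name check_openebs_install is hypothetical, and it assumes that kubectl is configured for the target cluster and that the operator manifest installs into the openebs namespace.

```shell
# Hypothetical helper; assumes kubectl points at the target cluster
# and that OpenEBS was installed into the "openebs" namespace.
check_openebs_install() {
  # The OpenEBS control-plane pods should reach the Running state.
  kubectl get pods -n openebs
  # The default installation is expected to register StorageClasses
  # such as openebs-hostpath.
  kubectl get sc
}

# Usage: check_openebs_install
```

If the pods are not yet Running, waiting a short while and calling the function again is usually enough on an experimental cluster.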

To install OpenEBS in custom mode, additional details on advanced configuration options can be found in the OpenEBS documentation.

OpenEBS uses a Dynamic Provisioner to create Persistent Volumes on Kubernetes cluster nodes. By default, persistent data is stored using OpenEBS Local hostPath Volumes.

To learn more about the different flavors of LocalPV available from OpenEBS and how users like Flipkart and ByteDance (TikTok) utilize them, please read this blog from MayaData co-founder and OpenEBS project lead Kiran Mova:  https://openebs.io/blog/how-are-tiktok-flipkart-kubesphere-and-others-using-openebs-for-local-volumes/

In the default installation mode, hostPath volumes are created under the /var/openebs/local directory.

For a custom installation, a StorageClass manifest named local-hostpath-sc.yaml is created with a custom BasePath, using values similar to:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-hostpath
  annotations:
    openebs.io/cas-type: local
    cas.openebs.io/config: |
      - name: StorageType
        value: hostpath
      - name: BasePath
        value: /var/local-hostpath
provisioner: openebs.io/local
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

Edit the name and BasePath values to match your production environment.

This StorageClass is then created using the command:

$ kubectl apply -f local-hostpath-sc.yaml

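A PersistentVolumeClaim manifest, local-hostpath-pvc.yaml, that the PODs will use to request hostPath storage from the provisioner is created with a specification similar to the one below. This example references the default openebs-hostpath StorageClass; to use the custom class defined above, set storageClassName to local-hostpath.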

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: local-hostpath-pvc
spec:
  storageClassName: openebs-hostpath
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5G

The PVC is then created using the command:

$ kubectl apply -f local-hostpath-pvc.yaml

A POD is then created to access data from OpenEBS storage using the hostPath. The POD’s manifest file, local-hostpath-pod.yaml, contains a specification similar to:

apiVersion: v1
kind: Pod
metadata:
  name: hello-local-hostpath-pod
spec:
  volumes:
  - name: local-storage
    persistentVolumeClaim:
      claimName: local-hostpath-pvc
  containers:
  - name: hello-container
    image: busybox
    command:
      - sh
      - -c
      - 'while true; do echo "`date` [`hostname`] Hello from OpenEBS Local PV." >> /mnt/store/greet.txt; sleep $(($RANDOM % 5 + 300)); done'
    volumeMounts:
    - mountPath: /mnt/store
      name: local-storage

Using the specification above, a POD is created using the command:

$ kubectl apply -f local-hostpath-pod.yaml

When run in order, these steps create a Persistent Local Volume on the node that the POD hello-local-hostpath-pod can read data from and write data to.
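The result of these steps can be checked from the command line. The helper below is a sketch rather than an official procedure: the function name verify_local_pv is hypothetical, the resource names are the ones used in the manifests above, and it assumes the StorageClass uses volumeBindingMode: WaitForFirstConsumer, so the claim stays Pending until a POD consumes it.

```shell
# Hypothetical helper; resource names match the manifests in this article.
verify_local_pv() {
  # With WaitForFirstConsumer, the PVC is Pending until a POD uses it,
  # after which it should report Bound.
  kubectl get pvc local-hostpath-pvc
  # The POD should be Running on the node that holds the volume.
  kubectl get pod hello-local-hostpath-pod -o wide
  # Read back the file the container appends to on the local volume.
  kubectl exec hello-local-hostpath-pod -- cat /mnt/store/greet.txt
}

# Usage: verify_local_pv
```

Seeing greeting lines in greet.txt indicates that the container is writing to the hostPath-backed volume on its node.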

OpenEBS for Provisioning Direct Storage

DAS offers an efficient storage solution that can provide higher throughput and reduced latency. Additionally, DAS can reduce the complexity and the blast radius of managing stateful workloads. In Kubernetes, Local PVs represent a directly attached local disk or cloud volume that is attached to a single Kubernetes Node.

OpenEBS extends the agility and affordability of DAS to Kubernetes through the Persistent Volume Dynamic Provisioner. Through the dynamic provisioner, OpenEBS provisions volumes for application PODs and implements the Kubernetes specification for PVs. This solution persists data on a node so that PODs scheduled to it can consume data stored in the volume.

Users who have adopted OpenEBS as their storage solution for Kubernetes stateful workloads have reported several benefits, including:

  • Immediate provisioning: Provisioning can take less than one minute, thanks in part to integration with Helm Charts.
  • Per workload and per team control: Each workload and team has its own OpenEBS with its own storage policies. This approach is consistent with DevOps governance and culture. 
  • Benefits of an orchestrated architecture: These include simpler upgrades, higher velocity of development, independent scaling, and cloud independence. 

OpenEBS also offers replicated storage and other capabilities for users who want more from their Container Attached Storage than DAS or LocalPV can offer, yet do not want to use traditional shared storage.

Kiran Mova
Kiran evangelizes open culture and open-source execution models and is a lead maintainer and contributor to the OpenEBS project. Passionate about Kubernetes and Storage Orchestration. Contributor and Maintainer OpenEBS projects. Co-founder and Chief Architect at MayaData Inc.
Murat Karslioglu
VP @OpenEBS & @MayaData_Inc. Murat Karslioglu is a serial entrepreneur, technologist, and startup advisor with over 15 years of experience in storage, distributed systems, and enterprise hardware development. Prior to joining MayaData, Murat worked at Hewlett Packard Enterprise / 3PAR Storage in various advanced development projects including storage file stack performance optimization and the storage management stack for HPE’s Hyper-converged solution. Before joining HPE, Murat led virtualization and OpenStack integration projects within the Nexenta CTO Office. Murat holds a Bachelor’s Degree in Industrial Engineering from the Sakarya University, Turkey, as well as a number of IT certifications. When he is not in his lab, he loves to travel, advise startups, and spend time with his family. Lives to innovate! Opinions my own!
Ranjith Raveendran
Ranjith is a Software Engineer in MayaData and has worked on the OpenEBS project from its beginning. He has 5+ years of experience in the Storage industry. Ranjith is interested in different solution approaches and has excellent knowledge of LocalPV and disk management. In his free time, he listens to music, watches movies, and goes for bike rides.