ZFS based cStor- the Storage Engine of OpenEBS (built on Kubernetes)

A small introduction about myself, I work for a company named MayaData who develops an Open Source software called OpenEBS (CNCF Sandbox project) that simplifies the deployment of stateful applications on Kubernetes. Haven't checked it out yet? Click here


OpenEBS, the leading block storage solution on K8s and CAS architecture, uses ‘cStor’ [pronunciation: see-stor] as one of its storage engines. This solution has been available in the field for more than half a year now.

cStor uses ZFS behind the scenes by running it in the user space. In this post, I will highlight a few challenges that our team faced while building this data engine.

So, when running in a user space, ZFS utilizes a user space binary called ‘ztest.’ Before we move on, let’s discuss a few details about this exciting user space binary. This binary is created by linking the libraries libzpool, libzfs, etc., which are based out of the ZFS code that runs in Kernel. These libraries contain transactional, pooled storage layers that are compiled from the code nearly the same as Kernel code. This means that all the major functionality available in kernal ZFS, such as creating/configuring pools and volumes, performing IOs, disk space usage management etc., are part of ztest binary. However, it is worth noting that accessibility of the volumes, i.e., ZPL/ZVOL are not part of ztest.

To perform configuration-related operations, ztest directly calls the handlers, whereas, in the case of kernel space ZFS, zpool/zfs, CLI binaries are used. Another interesting item is disk operations. ZFS in Kernel calls a disk driver, or its wrapper related APIs. This is in contrast to ztest, which performs system library calls, or IOCTLs.

Disk operations

                                                                                      Disk operations

In cStor, we followed a similar approach to create a binary called ‘zrepl’ that is part of cStor. It has been built using the libraries similar to what is used for ztest and contains transactional, pooled storage layers. It also must perform various tasks, such as:

  • Creation/configuration of pools and zvols
  • Supporting zvols to perform IOs on it
  • AIO support
  • Accessibility of the created zvols
  • Replication
  • Rebuilding of missing data, etc.

 

The first two items are related to the support and configuration of ZVOLs in cStor, which we will cover in this blog.

The first challenge is to create/configure pools and zvols for the user space binary ‘zrepl.’

As mentioned above, zpool/zfs CLI binaries are used for the kernel space ZFS. These CLI binaries send IOCTLs to kernel space ZFS to create/configure pools. However, this approach won’t work for user space’ zrepl.’

In the user space, the best way to accomplish this is by writing REST/gRPC server in `zrepl`. This listens for config requests and performs the tasks handled by current zpool/zfs CLI binaries. However, current zpool/zfs CLI binaries have evolved over a period of decades and replicating its functionality would be an immense task. This also moves the project out-of-sync with the upstream of ZoL, which likely would not be a good idea.

We came up with the idea of performing IOCTL redirection using unix domain sockets. The ‘zrep’ binary creates a server on unix domain socket. Upon receiving a message with IOCTL information that needs to be executed, this server will call the handler to be executed by kernel space ZFS after receiving that IOCTL. Rather than performing IOCTL calls, zpool / zfs CLI binaries are modified to connect to the IOCTL server and send a message with the IOCTL information that needs to be executed.

IOCTL redirection

                                                                                IOCTL redirection

Now we come to the next task of supporting zvols in user space. We implemented this by linking the same ZVOL code of kernel ZFS that creates datasets, objsets, objects to zrepl. We were able to avoid the device creation over these zvols in the user space. The 'zvol_state' structure similar to the existing one has been added, and we overwrote functions to create zvols. During the pool import process, we also added wrappers over the DMU layer to read/write IOs onto zvols and enabled ZIL related APIs to log/replay. Aside from the device creation, all features provided by ZoL over zvols, such as snapshotting/cloning/send/receive, etc., work with this approach.

Before I finish this post, I want to give credit to ZOL, from which this project has been forked.

Don Williams
Don is the CEO of MayaData and leading the company for last one year. He has an exceptional record of accomplishments leading technology teams for organizations ranging from private equity-backed start-ups to large, global corporations. He has deep experience in engineering, operations, and product development in highly technical and competitive marketplaces. His extensive professional network in several industries, large corporations and government agencies is a significant asset to early stage businesses, often essential to achieve product placement, growth and position for potential exit strategies.
Kiran Mova
Kiran evangelizes open culture and open-source execution models and is a lead maintainer and contributor to the OpenEBS project. Passionate about Kubernetes and Storage Orchestration. Contributor and Maintainer OpenEBS projects. Co-founder and Chief Architect at MayaData Inc.
Murat Karslioglu
VP @OpenEBS & @MayaData_Inc. Murat Karslioglu is a serial entrepreneur, technologist, and startup advisor with over 15 years of experience in storage, distributed systems, and enterprise hardware development. Prior to joining MayaData, Murat worked at Hewlett Packard Enterprise / 3PAR Storage in various advanced development projects including storage file stack performance optimization and the storage management stack for HPE’s Hyper-converged solution. Before joining HPE, Murat led virtualization and OpenStack integration projects within the Nexenta CTO Office. Murat holds a Bachelor’s Degree in Industrial Engineering from the Sakarya University, Turkey, as well as a number of IT certifications. When he is not in his lab, he loves to travel, advise startups, and spend time with his family. Lives to innovate! Opinions my own!