ZFS based cStor- the Storage Engine of OpenEBS (built on Kubernetes)

A small introduction about myself, I work for a company named MayaData who develops an Open Source software called OpenEBS (CNCF Sandbox project) that simplifies the deployment of stateful applications on Kubernetes. Haven't checked it out yet? Click here


OpenEBS, the leading block storage solution on K8s and CAS architecture, uses ‘cStor’ [pronunciation: see-stor] as one of its storage engines. This solution has been available in the field for more than half a year now.

cStor uses ZFS behind the scenes by running it in the user space. In this post, I will highlight a few challenges that our team faced while building this data engine.

So, when running in a user space, ZFS utilizes a user space binary called ‘ztest.’ Before we move on, let’s discuss a few details about this exciting user space binary. This binary is created by linking the libraries libzpool, libzfs, etc., which are based out of the ZFS code that runs in Kernel. These libraries contain transactional, pooled storage layers that are compiled from the code nearly the same as Kernel code. This means that all the major functionality available in kernal ZFS, such as creating/configuring pools and volumes, performing IOs, disk space usage management etc., are part of ztest binary. However, it is worth noting that accessibility of the volumes, i.e., ZPL/ZVOL are not part of ztest.

To perform configuration-related operations, ztest directly calls the handlers, whereas, in the case of kernel space ZFS, zpool/zfs, CLI binaries are used. Another interesting item is disk operations. ZFS in Kernel calls a disk driver, or its wrapper related APIs. This is in contrast to ztest, which performs system library calls, or IOCTLs.

Disk operations

                                                                                      Disk operations

In cStor, we followed a similar approach to create a binary called ‘zrepl’ that is part of cStor. It has been built using the libraries similar to what is used for ztest and contains transactional, pooled storage layers. It also must perform various tasks, such as:

  • Creation/configuration of pools and zvols
  • Supporting zvols to perform IOs on it
  • AIO support
  • Accessibility of the created zvols
  • Replication
  • Rebuilding of missing data, etc.

 

The first two items are related to the support and configuration of ZVOLs in cStor, which we will cover in this blog.

The first challenge is to create/configure pools and zvols for the user space binary ‘zrepl.’

As mentioned above, zpool/zfs CLI binaries are used for the kernel space ZFS. These CLI binaries send IOCTLs to kernel space ZFS to create/configure pools. However, this approach won’t work for user space’ zrepl.’

In the user space, the best way to accomplish this is by writing REST/gRPC server in `zrepl`. This listens for config requests and performs the tasks handled by current zpool/zfs CLI binaries. However, current zpool/zfs CLI binaries have evolved over a period of decades and replicating its functionality would be an immense task. This also moves the project out-of-sync with the upstream of ZoL, which likely would not be a good idea.

We came up with the idea of performing IOCTL redirection using unix domain sockets. The ‘zrep’ binary creates a server on unix domain socket. Upon receiving a message with IOCTL information that needs to be executed, this server will call the handler to be executed by kernel space ZFS after receiving that IOCTL. Rather than performing IOCTL calls, zpool / zfs CLI binaries are modified to connect to the IOCTL server and send a message with the IOCTL information that needs to be executed.

IOCTL redirection

                                                                                IOCTL redirection

Now we come to the next task of supporting zvols in user space. We implemented this by linking the same ZVOL code of kernel ZFS that creates datasets, objsets, objects to zrepl. We were able to avoid the device creation over these zvols in the user space. The 'zvol_state' structure similar to the existing one has been added, and we overwrote functions to create zvols. During the pool import process, we also added wrappers over the DMU layer to read/write IOs onto zvols and enabled ZIL related APIs to log/replay. Aside from the device creation, all features provided by ZoL over zvols, such as snapshotting/cloning/send/receive, etc., work with this approach.

Before I finish this post, I want to give credit to ZOL, from which this project has been forked.

Kiran Mova
Kiran Mova is a Passionate Technologist with 17 years of extensive experience working for product companies like Cisco, Lucent, Novell. Kiran has led efforts in performance engineering, simplifying the products in terms of usability, operation and deployment complexities, introducing multi-tenancy and enabling SAAS services in domains like IP/Optical Networks, Storage and Access Management. At MayaData, Kiran leads overall architecture and is responsible for architecting, solution design and customer adoption of OpenEBS and related software. Kiran evangelizes open culture and open source execution models and is a lead maintainer and contributor to the OpenEBS project Passionate about Storage Orchestration. Contributor and Maintainer OpenEBS projects. Chief Architect MayaData Inc. Open Source Dreamer
Akhil Mohan
Building Mayadata one commit at a time.
Uma Mukkara
Umasankar Mukkara (Uma) has over 20 years of experience as a hands-on developer, architecting scalable products and building innovative teams. Uma led product development in the early days of MayaData (CloudByte). He has led multiple innovations in multi-tenant storage and has contributed to more than 10 patents. Prior to CloudByte, he has contributed significantly to the development of Access Management solutions and has a sound understanding of cloud storage and security architecture. Uma also spends significant time in building chaos engineering practices in OpenEBS development eco-system, community engagement, and partner eco-system development. Uma holds a Masters degree in Telecommunications and software engineering from Illinois Institute of Technology, Chicago and a bachelor’s degree in Communications from S.V.University, Tirupati, India. Contributor at openebs.io, Co-founder& COO@MayaData