In this post, I will provide a quick summary of the changes that were released as part of OpenEBS version 1.1 and share some thoughts on the evolving project management process in OpenEBS and how it is helping to maintain a faster release cadence.
OpenEBS Release 1.1 focuses on fixing and documenting the cross-platform usability issues reported by users. It also lays the foundation for some long-overdue backlog items such as the CSI driver, automated upgrades, and Day 2 operations.
Before we dive into the specifics of the current release, let’s discuss the last three OpenEBS releases, which have set an interesting precedent towards attaining a monthly release cadence.
OpenEBS was built by adopting cloud-native and microservices principles, and it is only natural to also reap the benefits of a true DevOps product, including faster releases. This is easier said than done though! After experimenting with several tools and studying various open-source projects including Kubernetes, we arrived at the following process, which helps us maintain a release cadence and remain responsive to user requirements.
- Responsiveness — Almost all active contributors and maintainers of the OpenEBS project are reachable and actively participate in the OpenEBS Community Slack. OpenEBS has been credited with being one of the most responsive CNCF community projects, and thanks to the community, OpenEBS developers get feedback directly from end users. This removes the layers that usually sit between requirements and implementation and tightens the feedback loop.
- Clarity of criteria for alpha and beta — Recently, we clarified that our release gates are defined by Litmus-based GitLab pipelines that run end-to-end tests on multiple platforms and stateful workloads. Perhaps this goes without saying, but we use these pipelines to catch any regressions. What is more, a feature is marked as Beta only after it has been added to the test pipelines. For example, LocalPV is Beta as of OpenEBS 1.1 because it is passing these tests and is seeing a lot of production usage as well.
- Backlog grooming — At the start of each release, we review the backlog items on GitHub and select them based on contributor availability. We balance developing new features, fixing existing ones, updating and improving documentation, improving e2e coverage, and hardening OpenEBS on new platforms. As an example of a new platform, we have seen a significant amount of usage of the low-footprint Jiva engine on ARM, and we are now releasing container images built for the ARM64 architecture. This makes OpenEBS operational on the RPi4 as well as Amazon A1 instances and Packet's powerful ARM compute servers. As another example, we are hardening the use of OpenEBS with Konvoy from our friends at Day2IQ, and Konvoy will shortly appear on OpenEBS.ci. As a reminder, OpenEBS.ci publicly shows that every commit to OpenEBS master is tested against a set of workloads and platforms. OpenEBS also now appears in the OpenShift OperatorHub and on the AWS Marketplace.
- Tracking items — The list of selected items is tracked for the current release using these Google Sheets. This obviously isn't fancy, but it gets all the collaborators together and provides a low-barrier, objective way for the release manager, leads, and reviewers to follow up. The format of the sheet is a modified version of the one used by Kubernetes sig-storage.
- Role of core committers — As core contributors, our responsibility is to detail the design and list the implementation tasks, including the integration and upgrade tests. Each granular task is updated in the project sheet above, and we then ask the community for help with some of these items. The designs themselves are discussed and maintained as GitHub PRs here.
- Role of RC1 and RC2 — Functionality must be checked into master before RC1 builds begin. Post RC1, the work is mostly about corner cases, integration, and upgrade tests. Only features that can complete upgrade testing within the RC2 timelines are considered for the current release.
- Role of release manager — Follows up on pending items via daily stand-ups and mitigates risks by seeking additional help or by removing a feature from the release.
- The final two weeks — As we reach the end of a one-month release cycle, the focus turns from new features to stabilizing the proposed features through refactoring and additional test cases. The last two weeks focus on polishing documentation and reaching out to users whose requests have been incorporated into the product to get early feedback.
- What else? I haven’t spoken about the role of beta tests or about dogfooding the releases by using OpenEBS in our own hosted services such as OpenEBS Director. Perhaps I’ll dig into those in a future blog. Bookkeeping tasks that start after the release also consume quite a bit of time. For example, OpenEBS can be deployed via different partner platforms, each of which maintains its own repositories for its Helm charts. Each of these partners is constantly evolving new guidelines for check-ins, and they tend to move at their own pace. There is definitely room for improvement here, and hopefully the way Kubernetes apps are delivered will be standardized so that these bookkeeping tasks can be reduced.
How do you run your open-source projects? What tools do you use to improve productivity? Please share your approach in a comment below. We would love to hear from you and help improve the care and feeding of the OpenEBS community.
Now let’s get back to OpenEBS 1.1. The major features, enhancements and bug fixes in this release include:
- Upgrades! We added support for upgrading OpenEBS storage pools and volumes through a Kubernetes Job. As a user, you no longer need to download scripts to execute an upgrade. The procedure to upgrade via Kubernetes Job is provided here, and a sketch of such a Job follows this list. Job-based upgrades are a step towards completely automating upgrades in upcoming releases. We would love to hear your feedback on the proposed design. Note: Upgrade Jobs make use of a new container image called quay.io/openebs/m-upgrade:1.1.0.
- CSI — The CSI driver reached Alpha with initial functionality for provisioning and de-provisioning cStor volumes. Once you have OpenEBS 1.1 installed, take the CSI driver for a spin on your development clusters using the instructions provided here. The move to CSI also changes how StorageClass parameters are passed to the driver; a rough sketch follows this list. We want to keep this as seamless as possible, so please let us know what you notice as we shift towards the CSI driver.
- Day 2 automation ongoing — There is a tremendous amount of ongoing work to further automate Day 2 operations of the cStor storage engine. Most of these changes did not make the current release because the schema changes involved were larger than could be absorbed within one release cycle. The feature is under active development, and if you are interested in providing feedback on how it is shaping up, you can find the proposed design here. Thank you to everyone who has already chipped in with ideas and feedback!
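To give a flavor of the Job-based upgrade mentioned above, here is a minimal sketch of what such a Job might look like. The Job name, namespace, service account, PVC name, and arguments below are placeholders; take the exact spec from the linked upgrade procedure rather than from this sketch.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: cstor-volume-upgrade   # placeholder name
  namespace: openebs
spec:
  backoffLimit: 4
  template:
    spec:
      # A service account with permissions to patch OpenEBS resources
      # (placeholder name; use the one from the upgrade procedure).
      serviceAccountName: openebs-maya-operator
      containers:
      - name: upgrade
        # The new upgrade image shipped with 1.1.
        image: quay.io/openebs/m-upgrade:1.1.0
        # Illustrative arguments: the kind of resource being upgraded,
        # the version range, and the specific volume to act on.
        args:
        - "cstor-volume"
        - "--from-version=1.0.0"
        - "--to-version=1.1.0"
        - "pvc-0000-example"    # placeholder PV name
      restartPolicy: OnFailure
```

Once applied with kubectl, the pod logs of the Job should show the upgrade progress, and the Job can simply be deleted after it completes.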
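Similarly, for those trying out the Alpha CSI driver, the paradigm shift in StorageClass configuration looks roughly like the sketch below: engine settings that were previously passed as annotations move into the parameters map. The provisioner name and parameter keys here are illustrative assumptions; check the linked CSI instructions for the exact values.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-csi-cstor
# Name of the OpenEBS CSI driver (an assumption; verify against the CSI docs).
provisioner: openebs-csi.openebs.io
parameters:
  # With CSI, engine configuration moves from StorageClass annotations
  # into this parameters map. The keys below are placeholders.
  cas-type: cstor
  replicaCount: "3"
```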
Perhaps the greatest highlight of this release is the increased involvement of the OpenEBS user community, both in pitching in on GitHub issues and in providing contributions.
Here are some issues that were raised and subsequently fixed within the current release.
- We addressed an issue where backup and restore of cStor volumes using the OpenEBS velero-plugin was failing when OpenEBS was installed through Helm. @gridworkz
- We fixed an issue in NDM where the kubernetes.io/hostname label for Block Devices on AWS instances was being set to the nodeName. This resulted in cStor Pools not being scheduled to the node, because hostname and nodeName differ on AWS instances. @obeyler
- We fixed an issue where NDM would intermittently crash on nodes with NVMe devices attached. There was a bug in the handling of NVMe devices with write cache support, resulting in a segfault. [Private User]
- We added support to disable the generation of default storage configurations such as StorageClasses, for administrators who would like to run a customized OpenEBS configuration (see the sketch after this list). @nike38rus
- We fixed an issue where the cStor Target would fail to start when the NDM sparse path is customized. @obeyler
- We corrected a regression introduced into the cStor Sparse Pool that caused the entire Volume Replica to be recreated upon the restart of a cStor Sparse Pool. The fix ensures the data is rebuilt from the peer Sparse Pools instead of being recreated from scratch. Test cases have been added to the e2e pipeline to catch this behavior with Sparse Pools. Note that this doesn’t impact cStor Pools created on Block Devices. @vishnuitta
- For Jiva volumes, we created a utility that can clear the internal snapshots created during replica restart and rebuild. For long-running volumes that have gone through multiple restarts, the number of internal snapshots can hit the maximum supported value of 255, after which the replica will fail to start. The utility to check and clear the snapshots is available here. @rgembalik @amarshaw
- We enhanced the velero-plugin to allow users to specify a backupPathPrefix for storing volume snapshots in a custom location. This lets users keep their configuration backups and volume snapshot data under the same location rather than in different places (a sketch follows this list). @amarshaw
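To illustrate the backupPathPrefix enhancement from the last item, here is a minimal sketch of a Velero VolumeSnapshotLocation using the OpenEBS plugin. The bucket, region, and prefix values are placeholders; consult the velero-plugin documentation for the authoritative set of config keys.

```yaml
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: default
  namespace: velero
spec:
  # The OpenEBS cStor block-store plugin for Velero.
  provider: openebs.io/cstor-blockstore
  config:
    bucket: my-backup-bucket        # placeholder object-store bucket
    provider: aws                   # object-store provider (placeholder)
    region: us-east-1               # placeholder region
    # New in 1.1: store volume snapshots under a custom path so that
    # configuration backups and volume data can live side by side.
    backupPathPrefix: cluster1/prod
```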
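And for the item above about disabling the default storage configuration, the switch is an environment variable on the maya-apiserver deployment. A minimal sketch, assuming the variable name OPENEBS_IO_CREATE_DEFAULT_STORAGE_CONFIG (verify it against the 1.1 release notes):

```yaml
# Excerpt from the maya-apiserver Deployment in openebs-operator.yaml.
# Setting this to "false" skips creating the default StorageClasses and
# related storage configuration. The variable name is an assumption;
# check the release notes for the exact key.
env:
- name: OPENEBS_IO_CREATE_DEFAULT_STORAGE_CONFIG
  value: "false"
```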
For a detailed change summary, steps to upgrade from a previous version, or to get started with v1.1, please refer to the Release 1.1 Change Summary.
In short, OpenEBS 1.1 demonstrates that OpenEBS development is accelerating, delivering more features, fixes, and supported platforms with each release.
As always, if you have any feedback or inputs regarding the OpenEBS project or project management — please reach out to me on Slack or GitHub, or submit comments here.