MayaData Blog

LitmusChaos is now a CNCF Sandbox project

Written by Uma Mukkara | Jun 30, 2020 2:00:00 PM



OpenEBS was the first contribution from MayaData to the cloud-native community. Now, with great enthusiasm, we are announcing that we are contributing one more project to CNCF. LitmusChaos has been accepted as a Sandbox project by the CNCF TOC. The Litmus community has been growing under the CNCF radar for the last two years and has now reached a stage where many users have adopted the project for doing Chaos Engineering in a cloud-native way. With Litmus becoming a sandbox project, we are hopeful that the community will grow both in adoption and contributions.

Litmus is the Chaos Engineering framework for Kubernetes SREs to help find weaknesses in their applications and Kubernetes platform implementation. Litmus is often used by the same teams as Kubera - our SaaS solution for the operations of Kubernetes as a data layer. 

In this blog, I will discuss how the LitmusChaos project started, its current status, and what we plan to do next under CNCF.

The beginning

A couple of years ago, we were developing chaos experiments to test the resilience of OpenEBS. This meant introducing chaos not only into the various components of OpenEBS, but also at the Kubernetes platform level and into the applications that use OpenEBS. We looked for easy-to-use chaos experiments on Kubernetes and realized that we would either have to develop them from scratch or force-fit non-Kubernetes-native tools onto Kubernetes. We decided to write the Chaos Engineering infrastructure for Kubernetes in the open and keep the community at the center of this effort. Hence the LitmusChaos project was born.

At the center of the architectural goals of Litmus was a cloud-native way of doing chaos. We started by defining chaos CRDs, an operator to manage the lifecycle of those CRDs, and an open place to store chaos experiments (or ChaosExperiment CRs). The initial version of LitmusChaos had those three aspects.
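To make the CRD-driven approach a little more concrete, here is a minimal Python sketch that builds a ChaosEngine-style custom resource as a plain dictionary and prints it as JSON (which is also valid YAML). The field names (appinfo, chaosServiceAccount, experiments) reflect our understanding of the litmuschaos.io/v1alpha1 schema and should be checked against the Litmus documentation; the helper name build_chaos_engine and all the example values are hypothetical.

```python
import json


def build_chaos_engine(name, namespace, app_label, experiment_name, service_account):
    """Return a ChaosEngine-style manifest as a plain dict.

    The engine ties a target application to a named experiment; the
    operator is what would act on it once applied to a cluster.
    Field names are assumptions based on the v1alpha1 schema.
    """
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Which application the chaos is aimed at.
            "appinfo": {
                "appns": namespace,
                "applabel": app_label,
                "appkind": "deployment",
            },
            # Service account the experiment runs under.
            "chaosServiceAccount": service_account,
            # Experiments are referenced by name, e.g. one pulled from the hub.
            "experiments": [{"name": experiment_name}],
        },
    }


if __name__ == "__main__":
    engine = build_chaos_engine(
        name="nginx-chaos",               # hypothetical engine name
        namespace="default",
        app_label="app=nginx",            # hypothetical target label
        experiment_name="pod-delete",
        service_account="pod-delete-sa",  # hypothetical service account
    )
    print(json.dumps(engine, indent=2))
```

The printed document could be saved to a file and applied with kubectl on a cluster where the Litmus CRDs and operator are installed. The sketch only builds the manifest and does not talk to a cluster.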

We then also defined a set of principles for the community that elaborate on cloud-native Chaos Engineering. The details of this architecture and approach are published in this CNCF blog.

Another important aspect that we considered for the framework was to allow custom chaos logic into Litmus. For example, suppose an SRE has developed a detailed experiment to kill a Kubernetes platform node in a certain way. It already works - why should they have to change it or even throw it out to use Litmus? Instead, this experiment can be brought smoothly into the LitmusChaos framework. The SRE simply builds the logic of the experiment into a Docker container and wraps the ChaosExperiment CR around the container. Litmus then orchestrates and operates the experiment seamlessly. This is called BYOC, or Bring-Your-Own-Chaos, or plug-and-play chaos. We have observed many community users who started with their own Kubernetes chaos experiments and have moved them into the Litmus framework this way.
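As an illustration of the BYOC idea, the following Python sketch wraps an existing container image in a ChaosExperiment-style manifest. It is a sketch under assumptions, not the definitive format: the spec.definition fields (scope, image, command, args, env, labels) follow our understanding of the litmuschaos.io/v1alpha1 schema, the RBAC permissions a real experiment needs are left out for brevity, and the helper name wrap_custom_chaos, the image name, and the script path are all hypothetical.

```python
import json


def wrap_custom_chaos(name, image, command, args, env=None):
    """Wrap an existing chaos container in a ChaosExperiment-style manifest.

    The container image already holds the SRE's own chaos logic; the CR
    only describes how to launch it. Field names are assumptions based on
    the v1alpha1 schema, and RBAC permissions are omitted for brevity.
    """
    env = env or {}
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosExperiment",
        "metadata": {"name": name},
        "spec": {
            "definition": {
                # Node-level chaos is not tied to a single namespace.
                "scope": "Cluster",
                "image": image,
                "command": list(command),
                "args": list(args),
                # Tunables are passed to the container as string env vars.
                "env": [{"name": k, "value": str(v)} for k, v in env.items()],
                "labels": {"name": name},
            }
        },
    }


if __name__ == "__main__":
    experiment = wrap_custom_chaos(
        name="custom-node-kill",                # hypothetical experiment name
        image="example.com/sre/node-kill:0.1",  # hypothetical image
        command=["/bin/bash"],
        args=["-c", "./node-kill.sh"],          # hypothetical entry script
        env={"TOTAL_CHAOS_DURATION": 60},
    )
    print(json.dumps(experiment, indent=2))
```

The point of the design is that the existing experiment logic stays untouched inside the image; only the thin CR wrapper is new.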

The basic experiments required for general Chaos Engineering are already available on the Litmus Hub. There are also application-specific experiments for OpenEBS, Kafka, CoreDNS, Cassandra, etc., and the list is expected to grow with Litmus becoming a sandbox project. To all CNCF projects especially, and to other projects such as the databases Cassandra, Yugabyte, and Greenplum, and of course Elastic and others: please know that this community is your community. Pitch in, or just invite Litmus to contribute to your project in any way possible. Either way, you are helping us all come to trust Kubernetes more for your workloads.


Current status

LitmusChaos is well underway in terms of community adoption. The project recently crossed 10K installations and 50K experiment runs.

Community collaboration happens on the #litmus channel on Kubernetes Slack. The community is 175 members strong and growing.

A few notable features that we recently added are:

  • The ability to orchestrate chaos workflows using the CNCF Argo project, and
  • GitHub Actions that enable developers on GitHub to inject LitmusChaos into their pipelines.

Roadmap

With the chaos operator, scheduler, and Hub in place, the next major item on the roadmap is to develop a central console to help visualize chaos workflows in action, orchestrate them, and provide analytics. Litmus-portal is the portal to manage and view chaos-in-action with a GitOps backend. With this, Litmus becomes a complete take-and-use framework for Chaos Engineering needs on Kubernetes. Take a look at the roadmap here.

A continuous part of our roadmap is to add more chaos experiments into the Hub. Application-specific chaos experiments are useful for injecting application-specific chaos and ascertaining the resilience in that context. We will be working with the community to seek contributions in this area. 

Conclusion

We are thrilled about being part of CNCF in the cloud-native Chaos Engineering journey. We at MayaData will continue to contribute to Litmus and drive the project along with many other ecosystem partners. We are looking forward to great community collaboration.

You can use the Demo package of Litmus to get started quickly. Go for it and let us know what you think.