MayaData Blog

LitmusChaos 0.7 Streamlines Kubernetes Chaos Engineering

Written by Karthik Satchitanand | Oct 17, 2019 12:30:00 PM

As a project, Litmus has grown significantly and its vibrant community has provided sustained feedback over the last few months.

 

Introduction

I would like to express my heartfelt gratitude to both contributors and users for that feedback. To date, we have garnered 350+ GitHub stars & 138+ forks! I have also learned, through many interactions with developer and DevOps teams at meetups and events such as GitLab Commit and DevOpsDays, that the need for chaos engineering practices (and for a firm commitment to the Litmus architecture) has never been greater. The consensus view is “As applications turn more cloud-native (read: Kubernetes-native), the practices and tooling around chaos engineering should too. Chaos CRDs are key!” To help advocate this message more broadly, we have created a channel on Kubernetes Slack called #litmus.

The Litmus 0.7 release equips our users with more experiments and integrates infrastructure components to facilitate easier onboarding into the world of open, collaborative chaos engineering. In this blog post, we will delve into some of my favorite features & peek at the immediate roadmap for subsequent releases. You can find the full list of changes here.

Override Experiment Tunables via ChaosEngine

The chaos charthub was introduced as part of version 0.6. It allows users to browse for the chaos experiment custom resources of their choice and install them on a cluster, and then create a ChaosEngine CR to execute them against the desired application. The chaos-experiment CRs play the role of base specifications for chaos parameters and are available to a given namespace. Since more than one application in a given namespace can be subjected to chaos, there was a need to isolate the tunables for each instance of chaos without changing them at a namespace-wide level. To reinforce the status of the ChaosEngine as “the” single source of truth (the one resource the user needs to edit), the chaos executor now has the ability to override the experiment defaults with values set in the ChaosEngine.

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: chaos
  namespace: default
spec:
  monitoring: false
  appinfo:
    appkind: deployment
    applabel: app=nginx
    appns: default
  chaosServiceAccount: nginx
  experiments:
  - name: container-kill
    spec:
      components:
      - name: TARGET_CONTAINER
        value: nginx
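
To make the override behaviour concrete, here is a minimal sketch of how engine-level components could be layered on top of an experiment's defaults. It is not the actual chaos executor code: the function name `merge_tunables` and the default values are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List


def merge_tunables(defaults: Dict[str, str],
                   overrides: List[Dict[str, str]]) -> Dict[str, str]:
    """Return the experiment tunables with engine-level overrides applied."""
    # Copy first so the namespace-wide defaults are never mutated.
    merged = dict(defaults)
    for item in overrides or []:
        # Each override is a name/value pair, as under spec.experiments[].spec.components.
        merged[item["name"]] = str(item["value"])
    return merged


# Hypothetical defaults carried by a container-kill ChaosExperiment CR.
experiment_defaults = {"TARGET_CONTAINER": "app", "EXAMPLE_TUNABLE": "10"}

# Overrides taken from the ChaosEngine shown above.
engine_components = [{"name": "TARGET_CONTAINER", "value": "nginx"}]

effective = merge_tunables(experiment_defaults, engine_components)
print(effective)
```

Because the defaults are copied rather than edited, a second ChaosEngine in the same namespace can apply a different set of overrides to the same experiment without the two instances interfering.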

 

Experiment Results Available as Status on ChaosEngine  

The status of chaos experiments executed by the chaos operator is now published in the status field of the ChaosEngine. Note that the ChaosResult CR continues to exist, with scope for further schema development on result specifics.

kind: ChaosEngine
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: | <stripped>
  creationTimestamp: 2019-10-09T11:27:09Z
  generation: 1
  name: engine-nginx
  namespace: default
  resourceVersion: "6854030"
  selfLink: /apis/litmuschaos.io/v1alpha1/namespaces/default/chaosengines/engine-nginx
  uid: bb48b201-ea87-11e9-bb68-0050569846e3
spec:
  appinfo:
    applabel: run=nginx
    appns: default
  chaosServiceAccount: nginx
  experiments:
  - name: pod-delete
    spec:
      components: null
status:
  experimentStatuses:
  - instance: pod-delete-792363
    name: pod-delete
    status:
    verdict: pass
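
Having the verdicts on the ChaosEngine makes it easy to gate a pipeline on them. The following is a rough sketch that assumes the engine has already been loaded into a plain dictionary (for example, from the JSON form of the resource). The helper name `failed_experiments` is hypothetical, and treating anything other than `pass` as a failure is an assumption made for this example.

```python
from typing import Any, Dict, List


def failed_experiments(engine: Dict[str, Any]) -> List[str]:
    """List the names of experiments whose verdict is anything other than 'pass'."""
    # The status block may be absent until the operator has reported results.
    statuses = (engine.get("status") or {}).get("experimentStatuses") or []
    return [
        entry.get("name", "<unknown>")
        for entry in statuses
        if str(entry.get("verdict", "")).lower() != "pass"
    ]


# Trimmed-down copy of the ChaosEngine shown above.
engine = {
    "kind": "ChaosEngine",
    "metadata": {"name": "engine-nginx", "namespace": "default"},
    "status": {
        "experimentStatuses": [
            {"instance": "pod-delete-792363", "name": "pod-delete", "verdict": "pass"},
        ]
    },
}

failures = failed_experiments(engine)
print("failed experiments:", failures)
```

An empty list means every reported experiment passed. Note that an engine with no status at all also yields an empty list here, so a real check would likely want to treat "no results yet" as its own case.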

Integration with PowerfulSeal

Litmus is inherently a community-driven chaos engineering project. It aims to reuse the many excellent tools already available for inflicting chaos, while orchestrating them all in a Kubernetes-native way. PowerfulSeal is one such chaos tool. With Litmus 0.7, you can choose to kill pods randomly via PowerfulSeal.

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: chaos
  namespace: default
spec:
  monitoring: false
  appinfo:
    appkind: deployment
    applabel: app=nginx
    appns: default
  chaosServiceAccount: nginx
  experiments:
  - name: pod-delete
    spec:
      components:
      - name: FORCE
        value: "true"
      - name: LIB
        value: powerfulseal
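
As a rough illustration of what these two tunables express, the sketch below turns them into a plan for a random pod kill. It does not reflect how Litmus or PowerfulSeal are implemented: the function `plan_pod_delete`, the `litmus` fallback for `LIB`, and the pod names are all assumptions made for this example.

```python
import random
from typing import Dict, List, Optional

# Library names this sketch accepts; "litmus" as the fallback is an assumption.
SUPPORTED_LIBS = ("litmus", "powerfulseal")


def plan_pod_delete(env: Dict[str, str], pods: List[str],
                    rng: Optional[random.Random] = None) -> Dict[str, object]:
    """Pick a random victim pod and record which chaos library should kill it."""
    rng = rng or random.Random()
    lib = env.get("LIB", "litmus")
    if lib not in SUPPORTED_LIBS:
        raise ValueError(f"unsupported chaos lib: {lib!r}")
    if not pods:
        raise ValueError("no candidate pods match the application label")
    # Tunables arrive as strings, so the flag is parsed rather than used directly.
    force = env.get("FORCE", "false").strip().lower() == "true"
    return {"lib": lib, "pod": rng.choice(pods), "force": force}


# Tunables as set in the ChaosEngine above; the pod names are made up.
env = {"FORCE": "true", "LIB": "powerfulseal"}
pods = ["nginx-7d9c-abcde", "nginx-7d9c-fghij", "nginx-7d9c-klmno"]

plan = plan_pod_delete(env, pods, random.Random(42))
print(plan)
```

Passing a seeded `random.Random` keeps the choice reproducible in tests, while omitting it gives a different victim on each run.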

Increased Chaos Experiments

Additional chaos charts enable injecting pod-level “network” chaos (packet loss & latency) and have been added to the “generic” experiment category. In addition, this release adds OpenEBS data plane chaos (storage target and storage pool pods) experiments.

 

Improved CI for LitmusChaos Components

Litmus 0.7 improves upon the existing CI with increased unit-test & BDD-test coverage (chaos-operator, chaos-exporter), while also putting CI in place for the charthub & chaos-charts repos (the latter being the canonical place/backend for the CRs listed on the hub).

PASS: TestNewRunnerPodForCR/Test_Positive-2 (0.00s)
PASS: TestNewRunnerPodForCR/Test_Negative-1 (0.00s)
PASS: TestNewRunnerPodForCR/Test_Negative-2_ (0.00s)
PASS: TestNewRunnerPodForCR/Test_Negative-3_ (0.00s)
PASS: TestNewRunnerPodForCR/Test_Positive-1 (0.00s)
PASS: TestNewMonitorServiceForCR (0.00s)
PASS: TestNewMonitorServiceForCR/Test_Positive (0.00s)
PASS: TestNewMonitorServiceForCR/Test_Negative (0.00s)
PASS: TestNewMonitorPodForCR (0.00s)
PASS: TestNewMonitorPodForCR/Test_Positive (0.00s)
PASS: TestNewMonitorPodForCR/Test_Negative (0.00s)
PASS: TestInitializeApplicationInfo (0.00s)
PASS: TestInitializeApplicationInfo/Test_Negative (0.00s)
PASS: TestInitializeApplicationInfo/Test_Positive (0.00s)

RUN   TestChaos
Running Suite: BDD test
=======================
Random Seed: 1571131836
Will run 2 of 2 specs

chaos-operator created successfully
ChaosExperiment created successfully...
Chaosengine created successfully...
name :  engine-nginx-runner
• [SLOW TEST:100.090 seconds]

Ran 2 of 2 Specs in 140.025 seconds
SUCCESS! -- 2 Passed | 0 Failed | 0 Pending | 0 Skipped
PASS: TestChaos (140.03s)

Improved Documentation

Importantly, this release includes completely rewritten user documentation, with simpler getting-started guides, improved examples, and an upgraded Docusaurus version to help users start their chaos engineering journey with Litmus.

https://docs.litmuschaos.io

Conclusion

The strength of any open source project is in its community. I would like to give a huge shoutout to @jayadeepkm, @aswathkk, and a host of other contributors for helping us roll out this release.

Here is a quick peek into the 0.8 release. Some of the high-level backlog features include:

  • Increased chaos experiment charts
  • Upgraded chaos-operator with ability to select job cleanup/retention, executor image selection, etc.
  • Improved developer docs for chaos chart contributors
  • Improved project maintenance guidelines

Do try out Litmus charts. As always, we look forward to your valuable feedback & comments.