Set Up GitHub Chaos Action Using a KinD Cluster [Part-2]

In this blog I’ll walk through setting up GitHub Chaos Action with a KinD cluster in your GitHub workflow. To learn more about GitHub Chaos Actions and how to get started with them, please refer to Part-1 of this series. This post automates the workflow by creating a KinD cluster inside the CI workflow itself, instead of connecting to an external cluster as we did in Part-1.

Before jumping in, let's do a quick recap of Litmus. Litmus is a framework for practicing chaos engineering in cloud-native environments. It provides a chaos operator, a large set of chaos experiments on its hub, detailed documentation, and a friendly community. Litmus is easy to use, and you can quickly set up a demo environment to install and run Litmus experiments.


Pre-requisites:

  • Go
  • Docker
Note: These are only required when you're not using the default runner; the ubuntu-latest image already has all of these dependencies installed.

Brief Introduction to GitHub Chaos Action

GitHub Chaos Action helps you automate chaos testing in a cloud-native way using GitHub Actions. It contains a number of LitmusChaos experiments that help you find weaknesses in your application and areas that need improvement.

We can also say that GitHub Chaos Action is used to create custom software development life cycle (SDLC) workflows directly in your GitHub repository. To learn more about setting up a sample workflow using GitHub Chaos Action, please visit Part-1 of this blog.

Why KinD Cluster?

KinD (Kubernetes in Docker) was primarily designed for testing Kubernetes itself, but it has become very popular for local development and CI. But why KinD? You might ask yourself this question while setting up GitHub CI with the chaos action. To answer it, let’s do a small analysis of what is required.

  • Easy setup: We mostly use the default GitHub runner, as it has almost everything needed to run a simple workflow already installed, and we don’t need an external runner unless we have larger requirements. The GitHub Ubuntu image of the default runner comes with kind preinstalled, so we just need to spin up a cluster, either with a configuration file or with default values. In this blog I will use the default values, which create a single-node cluster named ‘kind’.
  • Compatible: We need a cluster that is lightweight and easy to use, and KinD fits best. KinD is a tool for running local Kubernetes clusters using Docker container “nodes”.

These are reasons enough to use kind in our CI workflows.

New trend in Continuous Integration/Deployment:

One of the new trends in Continuous Integration/Deployment is to:

  • Create an application image.
  • Run tests against the created image.
  • Push image to a remote registry.
  • Deploy to a server from the pushed image.

This approach is especially useful when your application already has a Dockerfile that can be used to create and test an image.

We will create a GitHub CI workflow covering the above stages for a particular PR. To do that, we need to follow these ten simple steps:

Step-1: Set up a fresh workflow in case you don’t have one

Let’s get started by writing a GitHub CI YAML file if you don’t have one already. We are already familiar with the fields and attributes of the CI YAML from Part-1, so we will move directly to the steps involved in setting up the actions, starting from the following template:

name: Litmus-CI
on:
  # Trigger the workflow on push,
  # but only for the master branch
  push:
    branches:
      - master
jobs:
  # Job name
  chaos-tests:
    runs-on: ubuntu-latest
    steps:

Step-2: Checkout the latest commit on the Pull Request

When you commit code to your repository, you can continuously build and test it to make sure that the commit doesn't introduce errors. An error could be a security, functional, or performance issue, which can be caught using custom tests, linters, or existing actions. This is where Chaos Actions come in: they run a chaos test on the application for a particular commit, which in turn helps track the application's performance at the commit level. The following lines will help you check out the latest commit on the PR:

      #Using the last commit id of pull request
      - uses: octokit/request-action@v2.x
        id: get_PR_commits
        with:
          route: GET /repos/:repo/pulls/:pull_number/commits
          repo: $
          pull_number: $
        env:
          GITHUB_TOKEN: $

      - name: set commit to output
        id: getcommit
        run: | 
           prsha=$(echo $response | jq '.[-1].sha'  | tr -d '"')
           echo "::set-output name=sha::$prsha" 
        env: 
          response:  $

      - uses: actions/checkout@v2
        with:
          ref: $

Just add these lines under steps, in list format, and the workflow will check out the latest commit on the PR.
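
On newer runners the `::set-output` workflow command is deprecated in favor of writing to the file referenced by `$GITHUB_OUTPUT`. Below is a minimal sketch of the same three steps in that style. The expression values (`github.repository`, `github.event.pull_request.number`, `secrets.GITHUB_TOKEN`, and the two step-output references) are assumptions, not values taken from this post. In particular, the PR number is read from a `pull_request` event payload here, whereas a comment-triggered run would use `github.event.issue.number` instead.

```yaml
      # Sketch only: every ${{ ... }} expression below is an assumption
      - uses: octokit/request-action@v2.x
        id: get_PR_commits
        with:
          route: GET /repos/:repo/pulls/:pull_number/commits
          repo: ${{ github.repository }}
          pull_number: ${{ github.event.pull_request.number }}
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: set commit to output
        id: getcommit
        run: |
          # jq -r prints the raw string, so no tr is needed to strip the quotes
          prsha=$(echo "$response" | jq -r '.[-1].sha')
          # Step outputs are written as name=value lines to $GITHUB_OUTPUT
          echo "sha=$prsha" >> "$GITHUB_OUTPUT"
        env:
          response: ${{ steps.get_PR_commits.outputs.data }}

      - uses: actions/checkout@v2
        with:
          ref: ${{ steps.getcommit.outputs.sha }}
```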

Step-3: Create an Image of the application

In this stage we will Dockerize the application from the repository. For this we need a Dockerfile in the repository that can be used to create the application image. Here, let us suppose that we have a Dockerfile under build/ in the repository root. To create a Docker image of the application, we need to run the following command in the CI YAML:

      - name: Build docker image
        run: |
          sudo docker build -f build/Dockerfile -t image-name:image-tag .

Step-4: Install and configure KinD Cluster

As discussed, running Chaos Actions requires a cluster, and we will create a KinD cluster inside the CI workflow itself. Since we’re using the ubuntu-latest image, which already contains a kind installation, we only need to spin up a cluster. To spin up the latest version of the KinD cluster we just need to run the kind create cluster command; for a specific version, we need to add an action. For this blog I’ll be using kind version 0.7.0, which can be installed with the following lines:

      #Install and configure a kind cluster
      - name: Installing Prerequisites (KinD Cluster)
        uses: engineerd/setup-kind@v0.4.0
        with:
          version: "v0.7.0"

      - name: Configuring and testing the Installation
        run: |
          kubectl cluster-info --context kind-kind
          kind get kubeconfig --internal >$HOME/.kube/config
          kubectl get nodes 
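
Right after the cluster is created, the node may still be starting up. As an optional addition, the sketch below blocks until the node reports Ready. It assumes kubectl can reach the cluster with the kubeconfig written in the step above, and the 120s timeout is an arbitrary value chosen for illustration.

```yaml
      - name: Wait for the KinD node to become Ready
        run: |
          # Blocks until every node has condition Ready, or fails after the timeout
          kubectl wait --for=condition=Ready nodes --all --timeout=120s
```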

Step-5: Load the image inside the node of the cluster

After creating the cluster, we need to load the application image built in Step-3 onto the cluster node so that it can be used locally for testing.

      - name: Load image on the nodes of the cluster
        run: |
          kind load docker-image --name=kind image-name:image-tag
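
To confirm that the image actually landed on the node, you could list the images known to the node's container runtime. This sketch assumes the default cluster name `kind`, whose single node runs in a Docker container named `kind-control-plane`. The step name is made up for illustration.

```yaml
      - name: Verify the image is present on the node
        run: |
          # crictl is available inside the kind node container;
          # grep exits non-zero (failing the step) if the image is missing
          docker exec kind-control-plane crictl images | grep image-name
```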

Step-6: Run a pod with the application container

Now we will run the application container in a pod. Different chaos tests at the application level and at the node level will be performed, and the health of the pod will be verified after inducing chaos. You can start from a sample pod manifest, replace the nginx image with your image, and place it at some location (say /path/to/application) in your repository. In this step we create the application pod with the following command and wait a few seconds for it to become ready.

      - name: Deploy a sample application for chaos injection
        run: |
          kubectl apply -f /path/to/application
          sleep 30
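
For reference, a manifest for this step might look like the sketch below. It is a hypothetical example, not the sample manifest this post refers to. The resource name, the `app: sample-app` label, and the container port are placeholders, and a Deployment is used on the assumption that you want deleted pods to be recreated. `imagePullPolicy: IfNotPresent` lets the kubelet use the copy of `image-name:image-tag` loaded onto the node in Step-5 instead of pulling from a registry.

```yaml
# Hypothetical manifest; names, labels, and port are placeholders
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  labels:
    app: sample-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
        - name: sample-app
          # Image built in Step-3 and loaded into the KinD node in Step-5
          image: image-name:image-tag
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 80
```

With a Deployment like this, the fixed `sleep 30` could be swapped for `kubectl wait --for=condition=Available deployment/sample-app --timeout=120s`, which returns as soon as the application is up. Whatever label you pick, the chaos experiment has to be pointed at it through the ENV described on the Chaos Action page.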

Step-7: Setup kubeconfig ENV for GitHub Actions

Now we need to export KUBE_CONFIG_DATA, containing the kubeconfig of the KinD cluster encoded in base64, so that it can be used by the Chaos Action.

      - name: Setting up kubeconfig ENV for Github Chaos Action
        run: echo ::set-env name=KUBE_CONFIG_DATA::$(base64 -w 0 ~/.kube/config)
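
GitHub has since disabled the `::set-env` workflow command, so on current runners the step above fails. Below is a sketch of the equivalent step using the `$GITHUB_ENV` environment file, which makes the variable available to the steps that follow.

```yaml
      - name: Setting up kubeconfig ENV for Github Chaos Action
        run: |
          # Append NAME=value to the file referenced by $GITHUB_ENV;
          # base64 -w 0 keeps the encoded kubeconfig on a single line
          echo "KUBE_CONFIG_DATA=$(base64 -w 0 ~/.kube/config)" >> "$GITHUB_ENV"
```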

Step-8: Run GitHub Chaos Actions on the application

Here comes the step where we use GitHub Chaos Actions for chaos testing. The experiments and their parameters are controlled by the action's ENV, as explained on the Chaos Action page.

Example:

      - name: Running Litmus pod delete chaos experiment
        if: startsWith(github.event.comment.body, '/run-e2e-pod-delete') || startsWith(github.event.comment.body, '/run-e2e-all')
        uses: mayadata-io/github-chaos-actions@v0.1.1
        env:
          INSTALL_LITMUS: true
          EXPERIMENT_NAME: pod-delete
          EXPERIMENT_IMAGE: litmuschaos/ansible-runner
          EXPERIMENT_IMAGE_TAG: ci
          IMAGE_PULL_POLICY: IfNotPresent
          LITMUS_CLEANUP: true
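
The `if` condition in this example reads `github.event.comment.body`, which is only populated for comment events. The sketch below shows a trigger that would let a PR comment such as `/run-e2e-pod-delete` start the workflow. It assumes you want comment-driven runs rather than the push trigger from Step-1.

```yaml
on:
  # issue_comment also fires for comments on pull requests
  issue_comment:
    types: [created]
```

Workflows triggered by `issue_comment` run against the default branch, which is one reason to check out the latest commit of the PR explicitly, as in Step-2.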

Step-9: Push the application image to the registry

This step executes only when all previous steps succeed. This means our application has passed the chaos test and we are good to push the image to the registry.

      - name: Publish to Docker Repository
        uses: elgohr/Publish-Docker-Github-Action@master
        with:
          Name: image-name
          username: $
          password: $
          dockerfile: build/Dockerfile
          tags: "image-tag"
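
Registry credentials are normally passed in from repository secrets rather than written into the workflow file. The sketch below assumes secrets named `DOCKER_USERNAME` and `DOCKER_PASSWORD` have been defined in the repository settings. Both names are hypothetical.

```yaml
      - name: Publish to Docker Repository
        uses: elgohr/Publish-Docker-Github-Action@master
        with:
          name: image-name
          # Hypothetical secret names; define your own in the repository settings
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
          dockerfile: build/Dockerfile
          tags: "image-tag"
```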

Step-10: Delete KinD Cluster

This is basically a cleanup step where we remove the cluster, so the application pod and chaos components also get removed. This stage should execute even if some experiment fails.

      - name: Deleting KinD cluster
        if: $
        run: kind delete cluster
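
One way to express "run even if an earlier step failed" is the `always()` status check function. The sketch below assumes that is the intended condition for the cleanup step.

```yaml
      - name: Deleting KinD cluster
        # always() makes the step run regardless of earlier failures or cancellation
        if: always()
        run: kind delete cluster
```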

So the complete workflow YAML looks like:

main.yml

name: Litmus-CI
on:
  push:
    branches: [ master ]

jobs:
  tests:
    runs-on: ubuntu-latest
    steps:

      #Using the last commit id of pull request
      - uses: octokit/request-action@v2.x
        id: get_PR_commits
        with:
          route: GET /repos/:repo/pulls/:pull_number/commits
          repo: $
          pull_number: $
        env:
          GITHUB_TOKEN: $

      - name: set commit to output
        id: getcommit
        run: | 
           prsha=$(echo $response | jq '.[-1].sha'  | tr -d '"')
           echo "::set-output name=sha::$prsha" 
        env: 
          response:  $

      - uses: actions/checkout@v2
        with:
          ref: $      

      - name: Build docker image
        run: |
          sudo docker build -f build/ansible-runner/Dockerfile -t litmuschaos/ansible-runner:ci .

      #Install and configure a kind cluster
      - name: Installing Prerequisites (KinD Cluster)
        uses: engineerd/setup-kind@v0.4.0
        with:
            version: "v0.7.0"

      - name: Configuring and testing the Installation
        run: |
          kubectl cluster-info --context kind-kind
          kind get kubeconfig --internal >$HOME/.kube/config
          kubectl get nodes   

      - name: Load image on the nodes of the cluster
        run: |
          kind load docker-image --name=kind litmuschaos/ansible-runner:ci

      - name: Deploy a sample application for chaos injection
        run: |
          kubectl apply -f https://raw.githubusercontent.com/mayadata-io/chaos-ci-lib/master/app/nginx.yml
          sleep 30

      - name: Setting up kubeconfig ENV for Github Chaos Action
        run: echo ::set-env name=KUBE_CONFIG_DATA::$(base64 -w 0 ~/.kube/config)

      - name: Running node-memory-hog chaos experiment
        uses: mayadata-io/github-chaos-actions@v0.1.1
        env:
          INSTALL_LITMUS: true
          EXPERIMENT_NAME: node-memory-hog
          EXPERIMENT_IMAGE: litmuschaos/ansible-runner
          EXPERIMENT_IMAGE_TAG: ci
          IMAGE_PULL_POLICY: IfNotPresent
          LITMUS_CLEANUP: true


      - name: Publish to Docker Repository
        uses: elgohr/Publish-Docker-Github-Action@master
        with:
          Name: image-name
          username: $
          password: $
          dockerfile: build/Dockerfile
          tags: "image-tag"

      - name: Deleting KinD cluster
        if: $
        run: kind delete cluster

View the Result:

Now, to view the result of the run, navigate to the main page of the repository. Under your repository name, click Actions.

A list of all workflows will appear. Select the latest one to view the logs and get an idea of what happened during the execution of the chaos test.

[Screenshot: the complete log of the workflow run]

If you want to create and use your own experiment/test in the GitHub Chaos Action, you can create it using Litmus with the help of its SDK. For more help, you can also raise an issue in the Litmus repository. The community is extremely active and will surely get back to you with some good help.

NOTE: If you're new to GitHub Actions and want to create a simple GitHub workflow, please also read Part-1 of this series.

 

Conclusion:

By using GitHub Chaos Actions along with KinD in the CI workflow, you can automate, customize, and execute your software development workflows right in your repository. You can use the Chaos Action in a completely customized way to check the performance of your application for every commit coming in through a Pull Request. This will help both developers and testers maintain the quality of the software. So, what do you think? What else can we achieve using Chaos Actions, and what other tests can be performed apart from what we have now? Kubernetes experts are welcome to comment and make suggestions!

Are you an SRE or a Kubernetes enthusiast? Does Chaos Engineering excite you?
Join Our Community On Slack For Detailed Discussion, Feedback & Regular Updates On Chaos Engineering For Kubernetes: https://kubernetes.slack.com/messages/CNXNB0ZTN
(#litmus channel on the Kubernetes workspace)
Check out the Litmus Chaos GitHub repo and do share your feedback: https://github.com/litmuschaos/litmus
Submit a pull request if you identify any necessary changes.

Kiran Mova
Kiran evangelizes open culture and open-source execution models and is a lead maintainer and contributor to the OpenEBS project. Passionate about Kubernetes and Storage Orchestration. Contributor and Maintainer OpenEBS projects. Co-founder and Chief Architect at MayaData Inc.
Murat Karslioglu
VP @OpenEBS & @MayaData_Inc. Murat Karslioglu is a serial entrepreneur, technologist, and startup advisor with over 15 years of experience in storage, distributed systems, and enterprise hardware development. Prior to joining MayaData, Murat worked at Hewlett Packard Enterprise / 3PAR Storage in various advanced development projects including storage file stack performance optimization and the storage management stack for HPE’s Hyper-converged solution. Before joining HPE, Murat led virtualization and OpenStack integration projects within the Nexenta CTO Office. Murat holds a Bachelor’s Degree in Industrial Engineering from the Sakarya University, Turkey, as well as a number of IT certifications. When he is not in his lab, he loves to travel, advise startups, and spend time with his family. Lives to innovate! Opinions my own!