OpenEBS Jiva Volumes are a great way to provide highly available persistent volumes to Kubernetes stateful workloads by making use of the local storage available on each node. Installing and setting up OpenEBS is very easy, almost like running just another application in a Kubernetes cluster. Of course, the best part is that OpenEBS Jiva volumes are completely Kubernetes-native and can be managed directly via kubectl.
For setting up OpenEBS and using Jiva volumes, please refer to this documentation.
In this blog, I will focus on some of the common questions that come up in the OpenEBS Users Slack channel, related to running Jiva volumes in production.
Before we get into the specifics, OpenEBS Jiva volumes are implemented with microservices/containers and comprise two main components, as depicted in the following diagram.
The health of a Jiva Volume can be checked by using mayactl, which provides a detailed status of the controller and the replicas. A sample output looks as follows:
More information on how to run mayactl is available in the OpenEBS Docs.
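As a minimal sketch, assuming OpenEBS is installed in the openebs namespace and that <maya-apiserver-pod> and <pv-name> are placeholders for your cluster (exact flags can vary across OpenEBS releases, so check the docs for your version), mayactl is typically invoked from inside the maya-apiserver pod:

# find the maya-apiserver pod that ships mayactl
kubectl get pods -n openebs | grep maya-apiserver
# fetch the controller and replica status for the volume
kubectl exec -it <maya-apiserver-pod> -n openebs -- mayactl volume describe --volname <pv-name>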
A replica is considered healthy if it is in the RW state.
The OpenEBS Jiva Controller checks that a minimum of 51% of its replicas are healthy (i.e., in the RW state) before serving data to the application. So if there are three replicas, then for the volume to be available, at least two out of three replicas must be in the RW state.
Due to cluster conditions like a node shutdown or loss of network connectivity, a replica can go into one of the following states: RW (healthy and serving data), WO (write-only, rebuilding data), or NA (not available to the controller).
WO is a valid state, and a replica will remain in this state until all the data is rebuilt. In the current version, if a replica is lagging behind the healthy replicas, a full rebuild is initiated. The time taken to rebuild depends on the amount of data, the speed of the network, and the storage.
The replicas that are stuck in the NA state need some manual intervention. The NA state can be due to the following scenarios:
Recover from NA if replica pod is in the pending state
Note: We use Node Affinity to pin the replicas to the desired worker nodes, both to avoid spreading copies of the data across more nodes than the desired number of replicas and to optimize the rebuilding process.
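Before making any changes, it helps to confirm which replica pod is stuck in Pending and what node affinity is currently pinned on its deployment. A minimal sketch, with the namespace, PV name, and deployment name as placeholders:

# list the replica pods of the volume and spot the one stuck in Pending
kubectl get pods -n <namespace> -o wide | grep <pv-name>
# check why the pod is Pending; with a lost node this is usually an unsatisfiable node affinity
kubectl describe pod <pending-replica-pod> -n <namespace>
# inspect the node affinity currently set on the replica deployment
kubectl get deploy <replica-deployment> -n <namespace> -o yaml | grep -A 12 nodeAffinity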
Prerequisites:
Here are the steps to schedule the replica on another node (a consolidated command sketch follows after step 5):
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - N1
          - N2
          - N3 # replace with N4
5. Now save and exit, and you will notice that a new replica pod is scheduled on the new worker node.
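Putting the steps together, here is a minimal sketch, assuming <replica-deployment>, <namespace>, and <pv-name> are placeholders for your setup and that N4 is the replacement for the failed node N3:

# open the replica deployment for editing
kubectl edit deploy <replica-deployment> -n <namespace>
#   in the editor, update the nodeAffinity values as shown above,
#   replacing the failed node N3 with the new node N4, then save and exit
# verify that a new replica pod is scheduled on N4 and eventually reaches Running
kubectl get pods -n <namespace> -o wide | grep <pv-name>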
Another multi-failure scenario to consider: if multiple replicas remain in a non-healthy state for a long duration, the application node can mark the volume as read-only. After using the resolution steps above to bring the volume back to a healthy state, a manual step needs to be performed on the application node to make the volume read-write again. The steps for recovering from the read-only state are available in the OpenEBS Troubleshooting section.
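As a rough sketch only (the OpenEBS Troubleshooting section has the authoritative procedure), the manual step on the application node is essentially a remount, performed only after all replicas are back in the RW state; the mount point below is a placeholder:

# on the application node, locate the mount that has gone read-only
mount | grep <pv-name>
# once the volume is healthy again, remount it read-write
mount -o remount,rw <mount-point>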
Work is in progress on optimizing the rebuild to reduce the time taken for a replica to go from the WO to the RW state. Also, I am actively working on a Jiva Operator that can help automatically recover replicas that are stuck in the NA state due to a complete node failure.
Feel free to reach out to me to learn about the internals of Jiva, or to provide feedback and suggestions to improve your experience with OpenEBS Jiva volumes.
Thank you, Kiran Mova, for your valuable suggestions and edits.