The Story of the Pod — Manifest to Running

Have you ever wondered what happens behind the scenes when you create a pod in Kubernetes from a manifest YAML? I will try to tell the story of a pod, from the manifest YAML through its journey to becoming an actual running pod on a node.
This story needs a little background on Kubernetes and how a pod is created. I have provided links to the Kubernetes docs and an external link to GitHub for studying the topic in more detail. I would recommend first going through the blog to understand the journey, and then getting into the details.

The Story of the Pod

To create a pod in Kubernetes, we first write its manifest YAML, which contains quite a bit of information. This information is nothing but a guideline on what the actual running pod should look like. You can consider this analogous to drawing up a house map before building the actual house. The house map is read by several people who are involved in building the house as specified in the design. The story of creating a pod is similar: here, instead of several people, there are several Kubernetes components that help the pod towards its running state.

I will paste a very simple pod manifest yaml here to elaborate on this example in simple terms. Whether the pod manifest is very basic or complex, what happens under the hood is the same. Take a look at the following pod manifest yaml.

  apiVersion: v1
  kind: Pod
  metadata:
    name: example-pod
    labels:
      env: story-telling
  spec:
    containers:
    - name: example-pod
      image: sonasingh46/node-web-app:latest
      ports:
      - containerPort: 8000

Part 1: (Kube-apiserver:Authentication) Authentication to kube-apiserver to submit pod manifest

  1. For the very first step we usually write a pod manifest yaml file and input the key and values according to our requirements.

  2. Create this pod using the kubectl client tool. To create the pod from the manifest above, save it in a YAML file, for instance story-teller-pod.yaml, and execute the command kubectl apply -f story-teller-pod.yaml.

  3. The above command sends a request to kube-apiserver for pod creation, but the kube-apiserver requires authentication. So, the request first needs to pass authentication.

  4. Your kubectl client tool uses the kubeconfig file, which holds the credentials used to authenticate to the kube-apiserver. kubectl converts the YAML manifest into a JSON payload and passes it to the kube-apiserver through a POST request to /api/v1/namespaces/{namespace}/pods. As a pod is a namespaced resource in Kubernetes, we need to provide the namespace where it will be created.

  5. Now, the JSON object which describes a pod has been handed over to the kube-apiserver.

  6. Here, we humans have created the pod manifest and kept our authentication credentials in a file called kubeconfig. So, we are a client/user of kube-apiserver, and this type of user/account is generally called a normal user account. There is another type of account, known as a service account, which is used by Kubernetes pods.
    Please go through the following link to understand more on service accounts.
    https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
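
To make steps 4 and 5 concrete, here is a minimal Python sketch of what the client side conceptually does: turn the manifest into JSON and work out the URL path of the POST request. The function name build_create_request and the "default" namespace are assumptions for illustration; no request is actually sent, and authentication is left out.

```python
import json

# The example manifest from above, written as the Python dict that a YAML
# parser would produce for it.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "example-pod", "labels": {"env": "story-telling"}},
    "spec": {
        "containers": [
            {
                "name": "example-pod",
                "image": "sonasingh46/node-web-app:latest",
                "ports": [{"containerPort": 8000}],
            }
        ]
    },
}


def build_create_request(manifest, namespace="default"):
    """Return (method, path, body) for a pod creation request.

    Hypothetical helper: it only builds the pieces of the request,
    it does not talk to any cluster.
    """
    # Pods are namespaced, so the namespace is part of the URL path.
    path = f"/api/v1/namespaces/{namespace}/pods"
    body = json.dumps(manifest)
    return "POST", path, body


method, path, body = build_create_request(pod_manifest)
print(method, path)
print(body)
```

The namespace appears in the path rather than only in the body, which is why a namespaced resource such as a pod always needs one.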

It should be noted that there are several authentication modules in kube-apiserver, and the request is authenticated as soon as any one of them accepts it.
To learn more about how authentication works in kube-apiserver, please review the following document:
https://kubernetes.io/docs/reference/access-authn-authz/controlling-access/#authentication
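
The "any one module is enough" behaviour can be pictured with a toy Python sketch. The two authenticators, the token table and the request fields below are all made up for illustration; this is not how kube-apiserver is implemented, only the shape of the logic.

```python
# Hypothetical authenticators: each returns a username on success, or None
# when it cannot identify the caller.
def client_cert_authenticator(request):
    return request.get("client_cert_user")


def bearer_token_authenticator(request):
    known_tokens = {"secret-token": "jane"}  # made-up token table
    return known_tokens.get(request.get("token"))


AUTHENTICATORS = [client_cert_authenticator, bearer_token_authenticator]


def authenticate(request):
    """Try each module in turn; the first one that recognises the caller wins."""
    for authenticator in AUTHENTICATORS:
        user = authenticator(request)
        if user is not None:
            return user
    # No module accepted the request; a real server would answer HTTP 401.
    return None


print(authenticate({"token": "secret-token"}))  # jane
print(authenticate({"token": "wrong"}))         # None
```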

In short, we can summarise that a normal user account must authenticate to kube-apiserver and submit the pod manifest to it, so that further steps can be taken and the pod can be created successfully.

Part 2: (Kube-apiserver: Authorization) Checking the authority of the user to create a pod.

  1. After passing the authentication at the kube-apiserver layer, now it is the turn of the authorisation module of kube-apiserver to check whether the user who is trying to create a pod has permission to do so.

  2. The request (to create a pod) contains the username of the requester and is authorized if an existing policy declares that the user has permissions to complete the requested action. Policies for users can be created that give them different types of roles.

  3. At this point, if the existing policy allows the request to be authorized, further steps are taken or else the request is rejected.

It should be noted that there are several authorization modules in kube-apiserver. If any one module authorizes the request then it can proceed.
To learn more about authorization in kube-apiserver, please go through the following link:
https://kubernetes.io/docs/reference/access-authn-authz/controlling-access/#authorization
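
As a rough illustration of the idea, here is a small Python sketch in which each authorization module holds a set of allowed (user, verb, resource, namespace) combinations and the request proceeds if any module allows it. The module factory, user names and policies are invented for the example; real authorizers such as RBAC are far richer.

```python
def make_policy_authorizer(allowed):
    """Build a hypothetical authorizer from a set of allowed
    (user, verb, resource, namespace) tuples."""
    def authorizer(user, verb, resource, namespace):
        return (user, verb, resource, namespace) in allowed
    return authorizer


# Two made-up modules, each with its own policy set.
AUTHORIZERS = [
    make_policy_authorizer({("jane", "create", "pods", "default")}),
    make_policy_authorizer({("admin", "create", "pods", "default"),
                            ("admin", "delete", "pods", "default")}),
]


def authorize(user, verb, resource, namespace):
    """The request is authorized if any one module allows it."""
    # If every module says no, a real server would answer HTTP 403.
    return any(a(user, verb, resource, namespace) for a in AUTHORIZERS)


print(authorize("jane", "create", "pods", "default"))  # True
print(authorize("jane", "delete", "pods", "default"))  # False
```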

In short, we can summarise by saying that once the pod request passes the authentication layer, the request is checked for authorization. Again, if the request passes here further steps are taken.

Part 3: (Kube-apiserver: Admission Control) Admitting a pod to the database

  1. Here, the pod object is checked against various admission controls before persisting to the etcd database.
    Let us explain this with an example. NamespaceExists is an admission control that checks whether the namespace where we are trying to create the pod exists or not. If the namespace exists, the request passes this admission control; otherwise, it is rejected. Similarly, a few more admission control checks are performed.

  2. Once the pod has passed all the required admission controls, the request is accepted and finally persisted to the etcd database.

It should be noted that if any one of the admission control modules in the list rejects the request, the entire request is rejected and an error is returned.
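
Note the difference from the previous two stages: there, one accepting module was enough, whereas here a single rejection fails the whole request. A toy Python sketch of such a chain is below. The set of existing namespaces and the second check are assumptions made up for the example; only the namespace check mirrors the NamespaceExists idea described above.

```python
EXISTING_NAMESPACES = {"default", "kube-system"}  # assumed cluster state


def namespace_exists(pod):
    """Reject the pod if its namespace is unknown (NamespaceExists-style check)."""
    namespace = pod["metadata"].get("namespace", "default")
    if namespace not in EXISTING_NAMESPACES:
        return f'namespace "{namespace}" not found'
    return None


def has_containers(pod):
    """Made-up second check, only here to show a chain of several controls."""
    if not pod.get("spec", {}).get("containers"):
        return "pod must have at least one container"
    return None


ADMISSION_CONTROLS = [namespace_exists, has_containers]


def admit(pod):
    """Return (True, None) if every control passes, else (False, reason)."""
    for control in ADMISSION_CONTROLS:
        error = control(pod)
        if error is not None:
            return False, error  # one rejection rejects the whole request
    return True, None  # only now would the object be persisted to etcd


good_pod = {"metadata": {"name": "example-pod"},
            "spec": {"containers": [{"name": "example-pod"}]}}
bad_pod = {"metadata": {"name": "example-pod", "namespace": "missing"},
           "spec": {"containers": [{"name": "example-pod"}]}}

print(admit(good_pod))
print(admit(bad_pod))
```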

Great! At this point, the pod object has been persisted to the database, provided all the steps above passed successfully.

In general, these three stages are executed for most API requests to kube-apiserver.

Part 4: (Scheduler) Scheduling the pod to a node

  1. kube-scheduler is a master (control plane) component of Kubernetes that constantly watches for pods with an empty spec.nodeName.

  2. After this, the scheduler finds a suitable node where this pod can be run by executing an algorithm.

  3. Once a node has been selected for the pod, the scheduler sends a POST request to the kube-apiserver to create a binding resource. A binding resource ties one object to another; for example, a pod is bound to a node by the scheduler.

  4. Basically, kube-scheduler fills the spec.nodeName field of the pod by creating this binding resource.
    (Question: Why create a binding resource and not directly update the spec.nodeName field on the pod?
    Answer: It is something related to RBAC, which has resource/sub-resource granularity. TBH, I also am a bit confused on this. Feel free to comment, share or write regarding this. I may try to put this in a separate blog)
    But in simple terms, we can understand that after a node has been selected for a pod to run on, the spec.nodeName field of the pod will contain the selected node name.
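
The steps above can be sketched in a few lines of Python. This is a toy under stated assumptions: pods and nodes are plain dicts, "most free CPU" stands in for the real scheduling algorithm, and the apiserver applying the binding is simulated by setting nodeName directly. The names pick_node and schedule are hypothetical.

```python
pods = [
    {"name": "example-pod", "nodeName": ""},         # not yet scheduled
    {"name": "already-placed", "nodeName": "node-a"},
]
nodes = {"node-a": 1, "node-b": 3}  # node name -> free CPU cores (made up)


def pick_node(nodes):
    """Toy stand-in for the scheduling algorithm: the node with the most free CPU."""
    return max(sorted(nodes), key=lambda name: nodes[name])


def schedule(pods, nodes):
    """Create a Binding-shaped object for every pod with an empty nodeName."""
    bindings = []
    for pod in pods:
        if pod["nodeName"]:
            continue  # already bound to a node, nothing to do
        node = pick_node(nodes)
        bindings.append({
            "apiVersion": "v1",
            "kind": "Binding",
            "metadata": {"name": pod["name"]},
            "target": {"apiVersion": "v1", "kind": "Node", "name": node},
        })
        # Simulates the effect of the binding being accepted by the apiserver.
        pod["nodeName"] = node
    return bindings


bindings = schedule(pods, nodes)
print(bindings)
print(pods[0]["nodeName"])  # node-b
```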

To learn more about how scheduling in Kubernetes works, please review the following link:
https://github.com/kubernetes/community/blob/master/contributors/devel/scheduler.md

Part 5: (Kubelet) Running the pod on the selected node

  1. The kubelet watches for pods that have been bound to the node on which it is running.

  2. Now, after performing a few tasks (What are those? At the end of the blog, I will provide a link to more specifics, not only for the kubelet but for the other components as well), the kubelet asks the container runtime to start the pod's containers.
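
A very rough Python sketch of this last hop follows. FakeRuntime and sync_node are invented names, the runtime only records what it was asked to start, and all of the kubelet's intermediate tasks are skipped; it is meant only to show that each kubelet acts on the pods whose spec.nodeName matches its own node.

```python
class FakeRuntime:
    """Stand-in for a container runtime; it only records start requests."""

    def __init__(self):
        self.started = []

    def start_container(self, pod_name, container):
        self.started.append((pod_name, container["name"], container["image"]))


def sync_node(node_name, pods, runtime):
    """Start containers for pods bound to this node that are not running yet."""
    for pod in pods:
        if pod["spec"].get("nodeName") != node_name:
            continue  # bound to another node, or not scheduled yet
        if pod.get("status", {}).get("phase") == "Running":
            continue  # already handled on an earlier pass
        for container in pod["spec"]["containers"]:
            runtime.start_container(pod["metadata"]["name"], container)
        pod.setdefault("status", {})["phase"] = "Running"


cluster_pods = [
    {"metadata": {"name": "example-pod"},
     "spec": {"nodeName": "node-b",
              "containers": [{"name": "example-pod",
                              "image": "sonasingh46/node-web-app:latest"}]}},
    {"metadata": {"name": "other-pod"},
     "spec": {"nodeName": "node-a",
              "containers": [{"name": "other", "image": "busybox"}]}},
]

runtime = FakeRuntime()
sync_node("node-b", cluster_pods, runtime)
print(runtime.started)
```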

Wow! This was a high-level picture of a pod's journey in Kubernetes. I have tried to keep it simple and have framed the content based on my experiments and on asking questions on the Kubernetes Slack.

If you have any corrections or feedback, feel free to comment.
Also, you can review the following link to learn more specifics of what happens in each part specified above.
https://github.com/jamiehannaford/what-happens-when-k8s

I hope this helps! Thank you and see you in the next post!


This article was first published on Dec 14, 2018 on MayaData's Medium Account

 
