Kubernetes Taints And Tolerations And Node Affinity

Behdad Kardgar
14 min read · Jul 27, 2023


Before jumping into the discussion of Taints, Tolerations, and Node Affinity in Kubernetes, it is essential to establish a solid understanding of the foundational concepts that support Kubernetes architecture. These foundational elements include nodes, pods, and pod scheduling.

In a Kubernetes cluster, you have multiple nodes, which are the underlying virtual or physical machines that run containers. These nodes together form the cluster’s computing resources. Kubernetes organizes containers into logical units called “pods,” where each pod can contain one or more tightly coupled containers that share resources and network namespaces.

Nodes are single compute instances in a Kubernetes cluster. Each node can be a virtual machine or a physical server. Nodes are the worker machines responsible for running containers and executing tasks as instructed by the Kubernetes control plane. They are the foundation of the cluster’s computing power.

Pods are the smallest deployable units in Kubernetes and represent one or more containers that are scheduled together on the same host. Containers within a pod share the same network namespace and can communicate with each other via localhost. Pods provide a way to encapsulate and manage one or more closely related containers as a single unit. This tight coupling allows them to share resources and be co-located on the same node.

In a Kubernetes cluster, the control plane manages the overall state and configuration of the cluster, while the nodes run the workloads (pods) according to the control plane’s instructions. The control plane includes components like the API server, etcd (the cluster’s data store), the scheduler, and the controller manager.

For further reading, you can refer to the official Kubernetes documentation: https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/

Pod Scheduling

Pod scheduling is a critical aspect of Kubernetes, where the system determines which nodes in the cluster should run specific pods. The scheduling process involves the following steps:

  1. Pod Creation: When you create a pod, you define its resource requirements, such as CPU and memory requests and limits, along with any affinity or anti-affinity rules that specify preferences or restrictions regarding the nodes where the pod can be scheduled.
  2. Scheduling Decisions: The Kubernetes scheduler is a component of the control plane responsible for making scheduling decisions. It continuously monitors the cluster’s state and identifies suitable nodes for placing pods based on the pod’s requirements, resource availability, and other constraints.
  3. Node Scoring: The scheduler assigns a “score” to each node based on various factors, such as resource availability, node health, and any affinity/anti-affinity rules. The node with the highest score that meets the pod’s requirements is selected for scheduling.
  4. Binding the Pod: Once the scheduler chooses a node, it binds the pod to that node, ensuring that the pod’s containers run on the chosen node.
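Once a pod has been bound, you can see the scheduler's decision yourself (assuming a pod named my-pod):

kubectl get pod my-pod -o wide   # the NODE column shows which node the pod was bound to
kubectl describe pod my-pod      # the Events section shows the "Scheduled" event from the scheduler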

Keep in mind that several factors influence pod scheduling in Kubernetes:

  1. Resource Requests and Limits: Pods can specify resource requests and limits for CPU and memory. The scheduler uses these values to ensure that a node has enough resources available to run the pod efficiently.
  2. Node Capacity: The scheduler considers the available resources (CPU, memory) on each node and ensures that the node has sufficient capacity to accommodate the pod’s resource requirements.
  3. Affinity and Anti-Affinity Rules: Pod affinity and anti-affinity rules define preferences and constraints for pod placement. For example, affinity rules can be used to schedule pods on nodes with specific labels or in the same node group, while anti-affinity rules can prevent pods from co-locating on the same node.
  4. Taints and Tolerations: Taints are applied to nodes to repel certain pods, while tolerations are set on pods to allow them to tolerate specific taints. This mechanism helps control which pods can be scheduled on specific nodes.
For example, here is how a pod declares CPU and memory requests and limits (factor 1 above):

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: nginx:latest
    resources:
      requests:
        cpu: "100m"      # 100 milliCPUs (0.1 CPU)
        memory: "256Mi"  # 256 mebibytes
      limits:
        cpu: "200m"      # 200 milliCPUs (0.2 CPU)
        memory: "512Mi"  # 512 mebibytes

In this example, the pod specifies CPU and memory requests and limits, helping the scheduler make appropriate decisions during pod scheduling.

Feel free to check out the official documentation for further information: https://kubernetes.io/docs/concepts/scheduling-eviction/

Taints and Tolerations

It's time to discuss Taints and Tolerations in Kubernetes. Taints are a feature in Kubernetes that allows a node to reject certain pods. When a node is tainted, it has a special attribute that prevents pods from being scheduled on it by default. Taints are useful when you want to reserve certain nodes for specific workloads or when you want to mark a node as unsuitable for running certain pods.

Tolerations: Tolerations, on the other hand, are set on pods, allowing them to tolerate (i.e., ignore) specific taints on nodes. When a pod has a toleration that matches a taint applied to a node, the pod can be scheduled on that tainted node. Tolerations enable pods to bypass the taint and be scheduled on nodes where they might not have been allowed otherwise.

Now you might say SO WHAT? When and why should I use Taints? Taints are typically used to ensure that certain nodes are not overloaded with specific types of workloads or to mark nodes for special purposes. For example:

  1. Dedicating Nodes: You might have a set of nodes with specialized hardware or high-performance capabilities. By tainting those nodes with a specific taint, you can reserve them for specific workloads that require those resources.
  2. Avoiding Scheduling on Critical Nodes: In a cluster, you might have some nodes that are reserved for critical system components (e.g., monitoring, networking, storage). By tainting those nodes, you can prevent regular application pods from being scheduled on them, ensuring that system-critical pods have the resources they need.

Here’s an example of how to taint a node in Kubernetes:

kubectl taint nodes node-name key=value:taint-effect

In this example, "node-name" is the name of the node you want to taint, "key=value" is the key-value pair for the taint, and "taint-effect" specifies the effect of the taint, which can be one of "NoSchedule," "NoExecute," or "PreferNoSchedule." Below is a short explanation of each taint effect.

NoSchedule: This means that no new pods will be scheduled on nodes with this taint. Existing pods on the node will continue to run unless they are evicted for other reasons.

NoExecute: Similar to “NoSchedule,” no new pods will be scheduled on nodes with this taint. Additionally, existing pods on the node that do not tolerate the taint will be evicted.

PreferNoSchedule: This indicates that Kubernetes will try to avoid scheduling new pods on nodes with this taint, but it is not a strict prohibition like “NoSchedule.” It will be taken into account if other suitable nodes are available.
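For example, assuming a node named node1 and a hypothetical dedicated=gpu taint, the three effects look like this, and a trailing "-" removes a taint again:

kubectl taint nodes node1 dedicated=gpu:NoSchedule        # repel new pods that do not tolerate the taint
kubectl taint nodes node1 dedicated=gpu:NoExecute         # also evict running pods that do not tolerate it
kubectl taint nodes node1 dedicated=gpu:PreferNoSchedule  # soft preference rather than a hard rule
kubectl get node node1 -o jsonpath='{.spec.taints}'       # inspect the taints currently set on the node
kubectl taint nodes node1 dedicated=gpu:NoSchedule-       # the trailing "-" removes that taint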

When you create a pod, you can add tolerations to it by specifying the taint key and value that the pod should tolerate. When the Kubernetes scheduler evaluates potential nodes for the pod, it checks the pod’s tolerations against the taints present on each node.

If the pod’s tolerations match a taint on a node, the scheduler considers that node as a candidate for scheduling the pod. If no taints match the pod’s tolerations, the node is not considered as a scheduling option.

Here’s an example of how to add a toleration to a pod in a YAML file:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: nginx:latest
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"

In this example, the pod has a toleration for a taint with the key “key,” value “value,” and effect “NoSchedule.” If a node has a matching taint, the pod can be scheduled on that node.
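If you only care that a taint key is present, regardless of its value, you can use the "Exists" operator instead of "Equal". A minimal sketch, reusing the hypothetical key and effect from above (no value field is given when the operator is Exists):

apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod          # hypothetical pod name
spec:
  containers:
  - name: my-container
    image: nginx:latest
  tolerations:
  - key: "key"
    operator: "Exists"        # tolerates the taint whatever its value is
    effect: "NoSchedule"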

Let's have a look at the following image.

In this example, the first three nodes from the left are associated with Blue, Red, and Green Taints, respectively. The last two nodes, however, have no taints applied to them. Additionally, there are five pods in the cluster, but only three of these pods have tolerations set that match the taints on the nodes.

Ideally, based on the taints and tolerations defined on the pods and nodes, we would expect the pods to be placed on the nodes with the matching taint color. However, in reality, this is not always how scheduling works. As you can see, the green pod is not sitting on the green node.

Taints ensure that only pods with matching tolerations can be scheduled on the corresponding node, but they do not guarantee that pods with tolerations will always end up on nodes with the matching taint. The scheduler looks for suitable nodes based on many factors, including resource availability, affinity/anti-affinity rules, and node conditions, so a tolerating pod may still be placed on a different node if the preferred node is unavailable or other factors influence the decision. Later in this article we will solve this issue.

Have a look at the following official documentation to get more information about taints and tolerations: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/

Labels in Kubernetes

Labels are a fundamental concept in Kubernetes used to tag or mark objects such as pods, nodes, services, and deployments with key-value pairs. These labels are used to identify and group related resources, enabling easy and flexible organization and selection of resources.

Labels are lightweight and can be applied to various Kubernetes objects. They are intended to be used for specifying attributes, characteristics, or roles of the resources rather than for controlling resource behavior. Labels are primarily used for operational and organizational purposes.

To apply a label to a Kubernetes object, you specify a set of key-value pairs in the object’s metadata. For example, in a pod’s YAML definition, you can add labels under the “metadata” section:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  labels:
    app: webapp
    tier: frontend
spec:
  containers:
  - name: my-container
    image: nginx:latest

In this example, we have defined a pod named “my-pod” with two labels: “app: webapp” and “tier: frontend.” These labels can be used to organize and identify the pod.
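Labels can also be added, listed, and removed from the command line. For example, using the pod above and a hypothetical environment label:

kubectl label pod my-pod environment=production   # add a label to an existing pod
kubectl get pods --show-labels                     # list pods together with their labels
kubectl label pod my-pod environment-              # the trailing "-" removes the label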

Once labels are applied to objects, you can use them for various purposes:

Grouping and Organizing: Labels allow you to group related resources together based on their attributes. For instance, you can label all frontend pods with the “tier: frontend” label and all backend pods with the “tier: backend” label.

Service Discovery: Labels play a crucial role in service discovery. When you create a service in Kubernetes, you can use label selectors to define which pods the service should route traffic to.

Filtering and Selection: Label selectors are used to identify sets of objects that have specific labels. For example, you can use label selectors to select all pods with the “app: webapp” label or all nodes with the “disk: ssd” label.
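For instance, the following kubectl queries use label selectors to filter resources (app=webapp, tier, and disk=ssd are the labels mentioned above):

kubectl get pods -l app=webapp                     # equality-based selection
kubectl get pods -l 'tier in (frontend,backend)'   # set-based selection
kubectl get nodes -l disk=ssd                      # selectors work for nodes as well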

Labels are a powerful tool for managing Kubernetes resources and provide a flexible way to categorize and organize objects in your cluster.

Here are some practical examples of how labels can be used in Kubernetes:

Grouping Related Pods: You can label pods based on their purpose or component. For example, label frontend pods with “tier: frontend” and backend pods with “tier: backend.”

Service Discovery: When creating a Kubernetes service, use label selectors to specify which pods should be part of the service. For instance, a service that exposes frontend pods can use the label selector “tier: frontend.”

Applying Policies: Labels can be used to apply policies or configurations to specific sets of pods. For instance, you can label certain pods with “environment: production” to apply production-level configurations.

Rolling Updates: When performing rolling updates, you can use labels to ensure that only pods with specific labels are updated, allowing you to control the update process more precisely.

Node Selection: You can use node labels to group nodes based on hardware capabilities or availability. Then, when scheduling pods, you can use node selectors to ensure pods are deployed on nodes with specific labels.

For further reading on labels in Kubernetes, you can refer to the official documentation: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/

Label Selectors

Label selectors in Kubernetes are a powerful mechanism to identify sets of objects (such as pods, services, deployments, and more) that have specific labels attached to them. By using label selectors, you can efficiently group, filter, and manage resources based on their associated labels.

In Kubernetes, label selectors come in two types:

Equality-Based Selectors: These selectors are used to match objects that have labels with specific key-value pairs. An equality-based selector uses the equality operator (=) to match labels exactly.

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: webapp
  ports:            # a Service needs at least one port; the values here are illustrative
  - port: 80
    targetPort: 80

In this example, the service will route traffic to pods that have the label app: webapp.
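To check which pods the Service actually selected, you can compare its endpoints with a direct label query (assuming the Service above has been created):

kubectl get endpoints my-service   # the pod IPs matched by the Service's label selector
kubectl get pods -l app=webapp     # the same set of pods, selected directly by label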

Set-Based Selectors: Set-based selectors allow more complex label matching using operators such as In, NotIn, Exists, and DoesNotExist. These operators provide more flexibility in selecting objects with various label combinations.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  selector:
    matchExpressions:
    - {key: app, operator: In, values: [webapp, backend]}
    - {key: tier, operator: NotIn, values: [test]}
  # replicas and the pod template are omitted here to keep the focus on the selector

In this example, the deployment will manage pods with labels app equal to either webapp or backend, and tier not equal to test.
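The same set-based expressions can be used directly on the command line, for example:

kubectl get pods -l 'app in (webapp,backend),tier notin (test)'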

Label selectors are used in various Kubernetes resources, such as services, deployments, and replica sets. They are essential for grouping related resources and controlling the behavior of services, load balancing, and routing traffic to the correct pods.

Let's walk through another example. In this example, we'll use labels and label selectors to schedule a pod to specific nodes based on their labels.

Step 1: Apply Labels to Nodes

kubectl label nodes node-name key1=value1 key2=value2

Replace node-name with the name of the node, and key1=value1 key2=value2 with the labels you want to apply to the node.

Step 2: Create a Pod with Label Selector

apiVersion: v1
kind: Pod
metadata:
  name: my-labeled-pod
spec:
  containers:
  - name: my-container
    image: nginx:latest
  nodeSelector:
    key1: value1

In this example, we created a pod named my-labeled-pod with one container running the latest Nginx image. The pod has a nodeSelector of key1: value1, meaning it will be scheduled only on nodes that carry the matching label.
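To verify the placement (and to see which nodes were eligible), you can run:

kubectl get pod my-labeled-pod -o wide   # the NODE column shows where the pod landed
kubectl get nodes -l key1=value1         # the nodes the pod is allowed to land on

If no node carries the key1=value1 label, the pod stays in Pending until such a node appears.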

Node Affinity

Node affinity is part of Kubernetes’ advanced scheduling techniques, and it complements other features like taints and tolerations. While taints and tolerations allow nodes to repel or tolerate certain pods, node affinity provides a way to attract specific pods to specific nodes. This is particularly useful when you have nodes with specialized hardware or when you want to segregate workloads based on node characteristics.

Node affinity is configured using node affinity rules, which combine node selector terms (match expressions over node labels) with a rule type that determines how strictly the scheduler must honor them.

There are two rule types:

requiredDuringSchedulingIgnoredDuringExecution: Pods with this rule must be scheduled onto nodes with matching labels. If no node matches the affinity rule when the pod is created, the pod remains unscheduled until a suitable node becomes available.

preferredDuringSchedulingIgnoredDuringExecution: Pods with this rule prefer to be scheduled onto nodes with matching labels, but they can still be scheduled on nodes without the matching labels if necessary. Kubernetes will try to satisfy the preference, but it is not mandatory.

In both cases, the "IgnoredDuringExecution" suffix means that if a node's labels change after a pod has been scheduled, the pod keeps running on that node. A third variant, requiredDuringSchedulingRequiredDuringExecution, which would evict running pods once their node's labels no longer satisfy the rule, has been planned but is not yet implemented in Kubernetes.
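The required variant is used in the deployment example below. Here is a minimal sketch of the preferred variant, assuming a hypothetical disk=ssd node label; the weight (1 to 100) is added to a matching node's score during scheduling, so such nodes are favored but not mandatory:

apiVersion: v1
kind: Pod
metadata:
  name: prefer-ssd-pod                  # hypothetical pod name
spec:
  containers:
  - name: my-container
    image: nginx:latest
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80                      # higher weight means a stronger preference
        preference:
          matchExpressions:
          - key: disk
            operator: In
            values:
            - ssd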

Let’s consider a scenario where you have a Kubernetes cluster with three nodes, and one of the nodes is labeled as “red.” You have a deployment with three pods, and you want these pods to be scheduled only on nodes with the label “red.”

kubectl label nodes <node-name> color=red

Next, you can create a deployment YAML file with node affinity rules:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: red-pod-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: red-pod
  template:
    metadata:
      labels:
        app: red-pod
    spec:
      containers:
      - name: red-container
        image: your-red-image:latest
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: color
                operator: In
                values:
                - red

In this example, we use node affinity with requiredDuringSchedulingIgnoredDuringExecution. The requiredDuringScheduling part ensures that the pods are scheduled only on nodes that have the label "color=red." The IgnoredDuringExecution part means that if the node label changes after the pod is running, the pod will continue to run on the node.

Now, when you create the deployment, Kubernetes will ensure that all the pods are scheduled only on nodes with the label “color=red.”
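You can confirm the placement once the deployment is created:

kubectl get pods -l app=red-pod -o wide   # the NODE column should only list nodes labeled color=red
kubectl get nodes -l color=red            # the nodes that satisfy the affinity rule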

Let's have a look at the image below.

In this example, the first three nodes from the left are labeled as Blue, Red, and Green, respectively. The last two nodes, however, have no labels assigned to them. Additionally, there are five pods in the cluster, but only three of these pods have labels set that match the labels on the nodes.

Based on the node affinity and the labels assigned to pods, our expectation is that only pods with the same label as a node should be scheduled on that specific node. However, in reality, this is not always the case.

Node affinity ensures that pods carrying a matching affinity rule (or nodeSelector) are scheduled only on nodes with the corresponding labels. However, it does not prevent pods without any such rule from being scheduled on labeled nodes.

In this specific example, we can observe that a pod with no matching label was scheduled on the node labeled "Green." This is because the scheduler also weighs the other available nodes and their capacities before making the scheduling decision.

In short, node affinity constrains where labeled pods may go, but it does not reserve labeled nodes for those pods. The final scheduling decision is influenced by a combination of factors, and unlabeled pods may still be placed on nodes that carry labels.

If you want more information about label selectors in Kubernetes, you can refer to the official documentation: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors

Taint and Toleration and Node Affinity

By combining taints, tolerations, and Node Affinity, we can precisely control and restrict pod scheduling to enforce specific rules for each node. So far, we have learned that taints ensure only pods with corresponding tolerations can be scheduled on nodes with matching taints, but they do not guarantee that pods with tolerations will always be scheduled on nodes with the same taint.

Node Affinity, on the other hand, relies on labels and guarantees that pods with a matching affinity rule are scheduled only on nodes that carry the required label. However, it does not prevent pods without such rules from being scheduled on labeled nodes.

To achieve the desired outcome, where only pods with a specific color should be scheduled on nodes with the same color label, and no other pods without color labels should be placed on those nodes, we need to employ a combination of taints, tolerations, and Node Affinity.

By setting taints on nodes and corresponding tolerations on pods, we can ensure that only pods with matching tolerations are allowed to run on nodes with the appropriate taints. Then, by implementing Node Affinity with labels, we can guarantee that pods with the required color label will always be scheduled on nodes with the corresponding color label.

In summary, the combination of taints, tolerations, and Node Affinity offers a powerful mechanism to enforce strict scheduling rules, allowing us to control precisely which pods are scheduled on specific nodes based on their attributes, such as the color labels in this example.

First, use taints and tolerations to keep pods without a color off the colored nodes; then use Node Affinity to keep the colored pods off the nodes without a color.
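Putting it all together, here is a minimal sketch for one color, assuming a node named node1 and using color=red both as the taint and as the label:

kubectl taint nodes node1 color=red:NoSchedule   # keep non-red pods off the node
kubectl label nodes node1 color=red              # let node affinity attract red pods to it

Then the pod both tolerates the taint and requires the label:

apiVersion: v1
kind: Pod
metadata:
  name: red-pod
spec:
  containers:
  - name: red-container
    image: nginx:latest
  tolerations:                                   # lets the pod onto the tainted red node
  - key: "color"
    operator: "Equal"
    value: "red"
    effect: "NoSchedule"
  affinity:                                      # keeps the pod off every node not labeled red
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: color
            operator: In
            values:
            - red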
