---
title: "Taints, Tolerations, and Node Affinity: Controlling Pod Placement"
description: "How taints repel pods from nodes, how tolerations override them, and how node affinity targets specific nodes for workload placement."
url: https://agent-zone.ai/knowledge/kubernetes/taints-tolerations-and-node-affinity/
section: knowledge
date: 2026-02-21
categories: ["kubernetes"]
tags: ["taints","tolerations","node-affinity","node-selector","scheduling","gpu","dedicated-nodes"]
skills: ["pod-scheduling","node-management","workload-isolation"]
tools: ["kubectl"]
levels: ["intermediate"]
word_count: 1135
formats:
  json: https://agent-zone.ai/knowledge/kubernetes/taints-tolerations-and-node-affinity/index.json
  html: https://agent-zone.ai/knowledge/kubernetes/taints-tolerations-and-node-affinity/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Taints%2C+Tolerations%2C+and+Node+Affinity%3A+Controlling+Pod+Placement
---


# Taints, Tolerations, and Node Affinity

Pod scheduling in Kubernetes defaults to "run anywhere there is room." In production, that is rarely what you want. GPU workloads should land on GPU nodes. System components should not compete with application pods. Nodes being drained should stop accepting new work. Taints, tolerations, and node affinity give you control over where pods run and where they do not.

## Taints: Repelling Pods from Nodes

A taint is applied to a node and tells the scheduler "do not place pods here unless they explicitly tolerate this taint." Taints have three parts: a key, a value, and an effect.

```bash
# Add a taint to a node
kubectl taint nodes gpu-node-1 gpu=true:NoSchedule

# Remove a taint (trailing hyphen)
kubectl taint nodes gpu-node-1 gpu=true:NoSchedule-

# View taints on a node
kubectl describe node gpu-node-1 | grep -A5 Taints
```

### Taint Effects

There are three taint effects, and each behaves differently:

| Effect | Behavior |
|---|---|
| **NoSchedule** | New pods without a matching toleration will not be scheduled on the node. Existing pods are not affected. |
| **PreferNoSchedule** | Soft version of NoSchedule. The scheduler tries to avoid the node but will place pods there if no other option exists. |
| **NoExecute** | Pods without a matching toleration are evicted from the node immediately. New pods are also blocked. |

`NoExecute` is the most aggressive. When you add a `NoExecute` taint to a node, every pod that does not tolerate it is evicted immediately. This is the mechanism Kubernetes itself uses when it detects node problems: the node lifecycle controller applies `NoExecute` taints such as `node.kubernetes.io/not-ready` and `node.kubernetes.io/unreachable`. (A `kubectl drain`, by contrast, cordons the node with a `NoSchedule` taint and evicts pods through the Eviction API.)
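The `kubectl taint` command is shorthand for editing the taint list in the Node object's spec. The `gpu=true:NoSchedule` taint from the earlier example looks like this in the Node manifest:

```yaml
# Excerpt of the Node object after running:
#   kubectl taint nodes gpu-node-1 gpu=true:NoSchedule
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-1
spec:
  taints:
    - key: gpu
      value: "true"
      effect: NoSchedule
```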

## Tolerations: Allowing Pods on Tainted Nodes

A toleration in a pod spec says "I can run on nodes with this taint." Tolerations do not force a pod onto a tainted node -- they just remove the restriction.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-training-job
spec:
  template:
    spec:
      tolerations:
        # Exact match: key, value, and effect must all match
        - key: "gpu"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      containers:
        - name: trainer
          image: training-job:latest
```

### Match Operators

There are two operators for tolerations:

- **Equal**: The key, value, and effect must all match the taint exactly. This is the default.
- **Exists**: Only the key must match. The value is ignored. Useful when you do not care about the taint value.

```yaml
tolerations:
  # Exists operator: matches any taint with key "gpu" regardless of value
  - key: "gpu"
    operator: "Exists"
    effect: "NoSchedule"

  # Tolerate ALL taints (use with extreme caution)
  - operator: "Exists"
```

### tolerationSeconds for NoExecute

When a `NoExecute` taint is applied, you can use `tolerationSeconds` to control how long a pod stays before eviction:

```yaml
tolerations:
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300  # Stay for 5 minutes, then get evicted
```

Without `tolerationSeconds`, a pod that tolerates a `NoExecute` taint stays indefinitely. With it, the pod is evicted after the specified number of seconds.

## Built-in Taints

Kubernetes automatically applies taints to nodes in certain conditions. These are the most common:

| Taint | When Applied |
|---|---|
| `node.kubernetes.io/not-ready` | Node condition is not Ready |
| `node.kubernetes.io/unreachable` | Node is unreachable from the control plane |
| `node.kubernetes.io/memory-pressure` | Node is running low on memory |
| `node.kubernetes.io/disk-pressure` | Node is running low on disk space |
| `node.kubernetes.io/pid-pressure` | Node is running low on PIDs |
| `node.kubernetes.io/unschedulable` | Node has been cordoned (`kubectl cordon`) |
| `node.kubernetes.io/network-unavailable` | Node network is not configured |

Kubernetes adds default tolerations to every pod for `not-ready` and `unreachable` with `tolerationSeconds: 300`. This means pods survive brief node hiccups (up to 5 minutes) before being evicted and rescheduled.
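These defaults are injected by an admission controller, so you rarely write them yourself. In a pod spec they look roughly like this:

```yaml
# Tolerations Kubernetes injects automatically into most pods
tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300
```

To fail over faster (or slower) than 5 minutes, set these tolerations explicitly in your pod spec with a different `tolerationSeconds` value.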

## Node Affinity: Attracting Pods to Nodes

While taints repel pods, node affinity attracts pods to specific nodes based on node labels. There are two types:

### Required Affinity (Hard Rule)

`requiredDuringSchedulingIgnoredDuringExecution` means the pod will only be scheduled on nodes matching the expression. If no matching node exists, the pod stays Pending.

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-type
                operator: In
                values:
                  - compute
                  - general
```

### Preferred Affinity (Soft Rule)

`preferredDuringSchedulingIgnoredDuringExecution` tells the scheduler to try to place the pod on matching nodes but does not block scheduling if none are available.

```yaml
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 80
          preference:
            matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - us-east-1a
        - weight: 20
          preference:
            matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - us-east-1b
```

The `weight` (1-100) influences how strongly the scheduler prefers matching nodes. Higher weight means stronger preference. The scheduler sums weights from all matching preferred rules when scoring nodes.

The `IgnoredDuringExecution` part means that if a node's labels change after a pod is already running, the pod is not evicted. Affinity rules only apply at scheduling time.
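Required and preferred rules can be combined in a single `nodeAffinity` block. A sketch, reusing the hypothetical `node-type` label and the well-known zone label from above: require compute nodes, and among those prefer one zone.

```yaml
spec:
  affinity:
    nodeAffinity:
      # Hard rule: only nodes labeled node-type=compute are eligible
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-type
                operator: In
                values:
                  - compute
      # Soft rule: among eligible nodes, prefer us-east-1a
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - us-east-1a
```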

## Node Selectors vs Node Affinity

`nodeSelector` is the simpler, older mechanism. It takes a flat map of label key-value pairs and only schedules on nodes matching all of them:

```yaml
spec:
  nodeSelector:
    node-type: compute
    disk: ssd
```

Use `nodeSelector` when you need simple exact-match placement. Use node affinity when you need `In`, `NotIn`, `Exists`, `DoesNotExist`, or `Gt`/`Lt` operators, or when you want preferred (soft) rules with weights.
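The `Gt` and `Lt` operators compare a label's value as an integer and take exactly one value. For example, assuming nodes carry a hypothetical `gpu-count` label, this requires nodes with more than 4 GPUs:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: gpu-count   # hypothetical custom node label
              operator: Gt
              values:
                - "4"          # parsed as an integer; must be a single value
```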

## Label Strategies for Nodes

Consistent node labeling makes affinity rules manageable:

```bash
# Well-known topology labels (set automatically by cloud providers)
topology.kubernetes.io/zone=us-east-1a
topology.kubernetes.io/region=us-east-1
kubernetes.io/arch=arm64
kubernetes.io/os=linux

# Custom labels for workload isolation
kubectl label nodes node-3 node-type=compute
kubectl label nodes node-4 gpu=nvidia-a100
kubectl label nodes node-5 disk=nvme
```

## Combining Taints and Node Affinity

For dedicated node pools, use both taints and node affinity together. Taints keep unwanted pods off the node. Affinity directs the right pods to it. This is the belt-and-suspenders approach:

```yaml
# Step 1: Taint the GPU nodes
# kubectl taint nodes gpu-node-1 workload=gpu:NoSchedule
# kubectl label nodes gpu-node-1 workload=gpu

# Step 2: Pod spec with both toleration and affinity
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-training
spec:
  template:
    spec:
      tolerations:
        - key: "workload"
          operator: "Equal"
          value: "gpu"
          effect: "NoSchedule"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: workload
                    operator: In
                    values:
                      - gpu
      containers:
        - name: trainer
          image: ml-trainer:latest
          resources:
            limits:
              nvidia.com/gpu: 1
```

Without the taint, general workloads could land on the GPU node and waste expensive resources. Without the affinity, the GPU pod might schedule on a non-GPU node (the toleration alone does not direct it anywhere).

## Common Gotchas

**DaemonSets and taints.** DaemonSets that run monitoring agents, log collectors, or network plugins must tolerate the taints on every node they need to run on. If you taint GPU nodes and your Prometheus node-exporter DaemonSet does not have a matching toleration, those nodes will have no metrics. Always check your DaemonSets when adding new taints.

```yaml
# DaemonSet that runs everywhere, even on tainted nodes
tolerations:
  - operator: "Exists"  # Tolerate everything
```

**NoExecute eviction surprises.** Adding a `NoExecute` taint evicts non-tolerating pods immediately, and taint-based evictions are performed directly by the controller rather than through the Eviction API, so a PodDisruptionBudget will not slow them down. If your application has no matching toleration, all replicas on that node can be killed at once. Add `tolerationSeconds` to give pods a graceful window, and keep PDBs in place for drain-style evictions, which do respect them.
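A PodDisruptionBudget caps how many replicas the Eviction API (used by `kubectl drain`) may remove at once. A minimal sketch, assuming the pods carry a hypothetical `app: ml-training` label:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ml-training-pdb    # illustrative name
spec:
  minAvailable: 2          # keep at least 2 replicas up during voluntary disruptions
  selector:
    matchLabels:
      app: ml-training     # assumes pods are labeled this way
```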

**Taint typos are silent.** If you taint a node with `gpuu=true:NoSchedule` (typo), no pods tolerate it, and the scheduler just avoids the node. Nothing errors out. Verify taints with `kubectl describe node` after applying them.

