---
title: "Pod Affinity and Anti-Affinity: Co-locating and Spreading Workloads"
description: "How to use pod affinity to co-locate related pods and pod anti-affinity to spread replicas across nodes and zones."
url: https://agent-zone.ai/knowledge/kubernetes/pod-affinity-and-anti-affinity/
section: knowledge
date: 2026-02-21
categories: ["kubernetes"]
tags: ["pod-affinity","anti-affinity","scheduling","topology","high-availability","zone-spreading"]
skills: ["pod-scheduling","high-availability-design","workload-distribution"]
tools: ["kubectl"]
levels: ["intermediate"]
word_count: 1096
formats:
  json: https://agent-zone.ai/knowledge/kubernetes/pod-affinity-and-anti-affinity/index.json
  html: https://agent-zone.ai/knowledge/kubernetes/pod-affinity-and-anti-affinity/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Pod+Affinity+and+Anti-Affinity%3A+Co-locating+and+Spreading+Workloads
---


# Pod Affinity and Anti-Affinity

Node affinity controls which nodes a pod can run on. Pod affinity and anti-affinity go further -- they control whether a pod should run near or away from other specific pods. This is how you co-locate a frontend with its cache for low latency, or spread database replicas across failure domains for high availability.

## Pod Affinity: Schedule Near Other Pods

Pod affinity tells the scheduler "place this pod in the same topology domain as pods matching a label selector." The topology domain is defined by `topologyKey` -- it could be the same node, the same zone, or any other node label.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - redis-cache
              topologyKey: kubernetes.io/hostname
      containers:
        - name: frontend
          image: web-frontend:latest
```

This places each `web-frontend` pod on a node that already has a pod labeled `app=redis-cache`. If no node runs the matching pod, the frontend pod stays Pending.

## topologyKey: Defining "Near"

The `topologyKey` is a node label that defines the scope of co-location or separation. The scheduler groups nodes by the value of this label and treats each group as a topology domain.

| topologyKey | Meaning | Use Case |
|---|---|---|
| `kubernetes.io/hostname` | Same node | Co-locate for lowest latency, separate for node-level HA |
| `topology.kubernetes.io/zone` | Same availability zone | Zone-level co-location or spreading |
| `topology.kubernetes.io/region` | Same region | Regional affinity |
| Custom label (e.g., `rack`) | Same rack/custom group | Rack-aware placement |

When you say `topologyKey: topology.kubernetes.io/zone`, the scheduler groups all nodes by their zone label value. Two pods with affinity will be placed in the same zone but not necessarily on the same node.
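For example, relaxing the earlier frontend rule from node-level to zone-level co-location is a one-line change to `topologyKey` (a sketch, assuming the same `app=redis-cache` label):

```yaml
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: redis-cache
        # Any node in the same zone as a redis-cache pod now qualifies,
        # not only the node actually running it.
        topologyKey: topology.kubernetes.io/zone
```

The zone-scoped version gives the scheduler many more candidate nodes, at the cost of potentially crossing a node boundary (and its network hop) within the zone.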

## Pod Anti-Affinity: Schedule Away from Other Pods

Anti-affinity is the opposite -- it tells the scheduler "do not place this pod in the same topology domain as pods matching a label selector." This is critical for spreading replicas.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - postgres
              topologyKey: kubernetes.io/hostname
      containers:
        - name: postgres
          image: postgres:16
```

This ensures every `postgres` pod runs on a different node. With 3 replicas, you need at least 3 nodes or the extra pods stay Pending.

## Required vs Preferred

Just like node affinity, pod affinity/anti-affinity has hard and soft variants:

**Required** (`requiredDuringSchedulingIgnoredDuringExecution`): The pod will not schedule if the rule cannot be satisfied. Use this for hard requirements like "database replicas must be on different nodes."

**Preferred** (`preferredDuringSchedulingIgnoredDuringExecution`): The scheduler tries to satisfy the rule but will schedule the pod elsewhere if it cannot. Each preferred rule has a weight from 1 to 100.

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - web-api
          topologyKey: topology.kubernetes.io/zone
```

The weight system works like scoring. When the scheduler evaluates candidate nodes, it sums the weights of all satisfied preferred rules. A node where the weight-100 anti-affinity preference holds (no `app=web-api` pods in its zone) scores 100 points higher from this rule than one where it does not. With multiple preferred rules, those sums feed into the scheduler's overall node score, and the highest-scoring node wins.
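A sketch of how multiple weighted rules combine (the `batch-worker` label is hypothetical): a node satisfying both preferences earns 100 + 30 = 130 points from these terms; a node satisfying only the second earns 30.

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100            # strong preference: avoid nodes already running web-api
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: web-api
          topologyKey: kubernetes.io/hostname
      - weight: 30             # weaker preference: avoid zones running batch workers
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: batch-worker
          topologyKey: topology.kubernetes.io/zone
```

Choosing distinct weights encodes priority: here the scheduler will sacrifice zone separation from batch workers before it doubles up `web-api` replicas on a node.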

## Namespace Scoping

By default, pod affinity/anti-affinity only matches pods in the same namespace as the pod being scheduled. You can expand or restrict this with `namespaceSelector` and `namespaces`:

```yaml
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
          - key: app
            operator: In
            values:
              - shared-cache
      topologyKey: kubernetes.io/hostname
      # Match pods in namespaces with this label
      namespaceSelector:
        matchLabels:
          team: platform
      # Or explicitly list namespaces
      # namespaces:
      #   - cache-namespace
      #   - shared-services
```

If you set an empty `namespaceSelector: {}`, it matches pods in all namespaces. You need one of these mechanisms -- an empty selector, a label-matching `namespaceSelector`, or an explicit `namespaces` list -- whenever your affinity target lives in a different namespace.
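The all-namespaces form is a minimal sketch (the `shared-cache` label is assumed):

```yaml
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: shared-cache
      topologyKey: kubernetes.io/hostname
      # Empty selector: consider matching pods in every namespace.
      namespaceSelector: {}
```

Use this sparingly -- matching across all namespaces means any team's pods with that label can influence your scheduling.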

## Practical Use Cases

### Spread Replicas Across Zones

The most common anti-affinity pattern: ensure a stateless service has replicas in multiple availability zones.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-api
  template:
    metadata:
      labels:
        app: web-api
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: web-api
                topologyKey: topology.kubernetes.io/zone
      containers:
        - name: api
          image: web-api:latest
```

Using `preferred` here means if you only have 2 zones but 3 replicas, the third replica still schedules (it just doubles up in one zone). With `required`, the third replica would stay Pending.

### Co-locate Frontend with Cache

Place frontend pods on the same node as Redis for minimal network latency:

```yaml
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: redis
          topologyKey: kubernetes.io/hostname
```

Using `preferred` with a weight avoids blocking scheduling when the Redis node is full.

### Combined Pattern: Spread and Co-locate

A 3-replica stateless service that spreads across zones while preferring to be near its cache:

```yaml
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: web-api
          topologyKey: kubernetes.io/hostname
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: redis
            topologyKey: topology.kubernetes.io/zone
```

This says: each replica must be on a different node (hard anti-affinity), and preferably in the same zone as a Redis pod (soft affinity).

## Performance Considerations

Pod affinity and anti-affinity are significantly more expensive for the scheduler to evaluate than node affinity. For every candidate node, the scheduler must check which pods are already running on nodes in the same topology domain. In large clusters (500+ nodes), this can noticeably slow scheduling.

Mitigations:
- Use `preferredDuringSchedulingIgnoredDuringExecution` instead of `required` when possible -- an unsatisfiable required rule leaves pods Pending and forces the scheduler to re-evaluate the expensive rule on every retry.
- Keep `topologyKey` scopes small (`kubernetes.io/hostname` is cheaper to evaluate than `topology.kubernetes.io/zone` because each hostname domain contains a single node, so far fewer existing pods need checking per domain).
- Consider topology spread constraints (covered in a separate article) as a more efficient alternative for even distribution.
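For reference, the zone-spreading deployment above could be expressed as a topology spread constraint instead of anti-affinity (a sketch, reusing the `app: web-api` label):

```yaml
spec:
  topologySpreadConstraints:
    - maxSkew: 1                                # zones may differ by at most 1 replica
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway          # soft, like "preferred" anti-affinity
      labelSelector:
        matchLabels:
          app: web-api
```

Unlike anti-affinity, which only says "not in the same domain," `maxSkew` bounds how uneven the distribution may become, which is usually what zone spreading actually wants.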

## Common Gotchas

**Required anti-affinity with not enough nodes.** If you have 5 replicas with required anti-affinity on hostname but only 4 nodes, the fifth replica is Pending forever. Use `preferred` unless you genuinely need the hard constraint.

```bash
# Find pending pods and see why
kubectl get pods --field-selector=status.phase=Pending
kubectl describe pod <pending-pod-name>
# Events will show: "0/4 nodes are available: 4 node(s) didn't match pod anti-affinity rules."
```

**Zone spreading with uneven capacity.** If you have 3 zones but zone-c is full, required anti-affinity across zones can leave pods Pending even though other zones have room. The scheduler cannot place a pod in zone-c if there are no available nodes there.

**Label selector must match existing pods.** If your `labelSelector` does not match any running pods, affinity has no effect (no pods to be "near"), and anti-affinity is trivially satisfied (no pods to avoid). Double-check your labels with `kubectl get pods --show-labels`.

**Self-referencing anti-affinity.** When a Deployment uses anti-affinity with its own labels, the first pod schedules fine (no existing pods to conflict with). The second pod then avoids the first pod's node. This is the expected behavior, but it means your anti-affinity is tested starting with the second replica, not the first.

