---
title: "Spot Instances and Preemptible Nodes: Running Kubernetes on Discounted Compute"
description: "How to run Kubernetes workloads on spot/preemptible instances for 60-90% cost savings while handling interruptions gracefully."
url: https://agent-zone.ai/knowledge/kubernetes/spot-and-preemptible-nodes/
section: knowledge
date: 2026-02-21
categories: ["kubernetes"]
tags: ["spot-instances","preemptible","cost-optimization","karpenter","node-termination-handler","eks","gke","aks"]
skills: ["cost-optimization","capacity-planning","node-management","fault-tolerance"]
tools: ["kubectl","karpenter","aws-node-termination-handler","helm"]
levels: ["intermediate"]
word_count: 1401
formats:
  json: https://agent-zone.ai/knowledge/kubernetes/spot-and-preemptible-nodes/index.json
  html: https://agent-zone.ai/knowledge/kubernetes/spot-and-preemptible-nodes/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Spot+Instances+and+Preemptible+Nodes%3A+Running+Kubernetes+on+Discounted+Compute
---


# Spot Instances and Preemptible Nodes

Spot instances are unused cloud capacity sold at a steep discount -- typically 60-90% off on-demand pricing. The trade-off: the cloud provider can reclaim them with minimal notice -- AWS gives a 2-minute warning, while GCP and Azure give roughly 30 seconds. Running Kubernetes workloads on spot instances is one of the most effective cost reduction strategies available, but it requires an architecture that tolerates sudden node loss.

## Terminology Across Providers

| Provider | Product | Warning Time | Max Lifetime |
|----------|---------|-------------|--------------|
| AWS | Spot Instances | 2 minutes | No limit |
| GCP | Spot VMs | 30 seconds | No limit |
| GCP | Preemptible VMs (legacy) | 30 seconds | 24 hours |
| Azure | Spot VMs | 30 seconds (configurable) | No limit |

GCP Preemptible VMs are the older product with a mandatory 24-hour lifetime. GCP Spot VMs replaced them and have no maximum lifetime. Use Spot VMs for new deployments.

## When to Use Spot

Good candidates for spot:
- **Stateless web services** with multiple replicas behind a load balancer
- **Batch processing** and data pipeline jobs
- **CI/CD runners** and build agents
- **Dev/staging environments** (entire environments can run on spot)
- **Scale-out workers** (queue consumers, stream processors)
- **Machine learning training** with checkpointing

Workloads to keep on on-demand:
- **Databases and stateful singletons** -- data loss risk on sudden termination
- **Control plane components** -- etcd, API server, critical operators
- **Anything that cannot tolerate a 2-minute shutdown window**
- **Pods with PVCs in a single AZ** -- reclamation can strand volumes (discussed in gotchas)

## Architecture Pattern: Mixed Node Pools

The standard pattern is to run two classes of node pools:

```
On-demand pool:  baseline capacity for critical workloads
Spot pool:       burst capacity for fault-tolerant workloads
```

Use taints on spot nodes to ensure only spot-tolerant pods are scheduled there:

```yaml
# Taint on spot node pool
taints:
  - key: kubernetes.io/spot
    value: "true"
    effect: NoSchedule
```

Pods that can run on spot add a matching toleration and a node affinity preference:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
spec:
  replicas: 10
  template:
    spec:
      tolerations:
        - key: kubernetes.io/spot
          operator: Equal
          value: "true"
          effect: NoSchedule
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: karpenter.sh/capacity-type
                    operator: In
                    values: ["spot"]
      terminationGracePeriodSeconds: 90  # must be < 120s for AWS spot
      containers:
        - name: worker
          image: myapp/worker:latest
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
```

Pods without the toleration are blocked from spot nodes and remain on on-demand nodes.

## Spot Interruption Handling

### AWS: Node Termination Handler

The AWS Node Termination Handler (NTH) watches for EC2 spot interruption notices and automatically cordons and drains the affected node:

```bash
helm install aws-node-termination-handler \
  eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSpotInterruptionDraining=true \
  --set enableScheduledEventDraining=true \
  --set enableRebalanceMonitoring=true
```

NTH operates in two modes:
- **IMDS mode** (Instance Metadata Service): runs as a DaemonSet on each node, polls the instance metadata endpoint for interruption notices.
- **Queue mode**: uses an SQS queue to receive EC2 events. More reliable and supports additional event types (rebalance recommendations, scheduled maintenance).
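In IMDS mode, what NTH polls is the standard EC2 spot `instance-action` metadata endpoint, which any process on the instance can also query directly. A sketch using IMDSv2, run from inside the instance:

```shell
# Probe the EC2 spot interruption notice endpoint (IMDSv2).
# Returns HTTP 404 until an interruption is actually scheduled.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/spot/instance-action"
# Once a notice exists, the payload looks like:
# {"action": "terminate", "time": "2026-02-21T12:34:56Z"}
```

This only works from inside an EC2 instance (the endpoint is link-local), but it is a quick way to verify what NTH is reacting to.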

Queue mode is recommended for production:

```bash
helm install aws-node-termination-handler \
  eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSqsTerminationDraining=true \
  --set queueURL=https://sqs.us-east-1.amazonaws.com/123456789/spot-interruption-queue
```

### GKE: Built-in Handling

GKE handles spot node preemption automatically. When a Spot VM is reclaimed, GKE marks the node for deletion and drains pods. No additional components are needed, but you should still set appropriate `terminationGracePeriodSeconds` and PodDisruptionBudgets.
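GKE labels spot nodes with `cloud.google.com/gke-spot=true`, so a fault-tolerant workload can be steered onto them with a nodeSelector. A pod spec fragment as a sketch (the 25-second grace period is an illustrative choice, kept just inside the ~30-second preemption notice):

```yaml
# Pod spec fragment: target GKE spot nodes via the label GKE
# applies automatically to Spot VM nodes
spec:
  nodeSelector:
    cloud.google.com/gke-spot: "true"
  terminationGracePeriodSeconds: 25  # must fit inside the ~30s notice
```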

### AKS: Spot Node Pools

AKS spot node pools handle eviction at the VMSS level. Configure the eviction policy:

```bash
az aks nodepool add \
  --resource-group myRG \
  --cluster-name myCluster \
  --name spotnodepool \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --node-count 3 \
  --node-taints kubernetes.azure.com/scalesetpriority=spot:NoSchedule
```

`--eviction-policy Delete` removes the VM entirely on eviction. `Deallocate` stops the VM but keeps its disks (which continue to accrue storage charges); the running workload is lost either way, so `Delete` is simpler and avoids confusion.
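Pods intended for this pool need a toleration matching the taint set by `--node-taints` above:

```yaml
# Toleration matching the AKS spot node pool taint
# kubernetes.azure.com/scalesetpriority=spot:NoSchedule
tolerations:
  - key: kubernetes.azure.com/scalesetpriority
    operator: Equal
    value: "spot"
    effect: NoSchedule
```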

## Graceful Shutdown

Your application must handle `SIGTERM` and shut down within the interruption window. For AWS spot, you have approximately 2 minutes from notice to termination, minus the time NTH takes to cordon and begin draining (typically 10-15 seconds).

```go
// Go example: graceful shutdown on SIGTERM
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080"}
	done := make(chan struct{})

	go func() {
		sigCh := make(chan os.Signal, 1)
		signal.Notify(sigCh, syscall.SIGTERM, syscall.SIGINT)
		<-sigCh

		log.Println("received shutdown signal, draining connections...")
		ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
		defer cancel()
		srv.Shutdown(ctx) // blocks until in-flight requests complete
		close(done)
	}()

	// ListenAndServe returns ErrServerClosed as soon as Shutdown begins;
	// wait for the drain to finish before the process exits.
	srv.ListenAndServe()
	<-done
}
```

Set `terminationGracePeriodSeconds` to less than the interruption window:

```yaml
spec:
  terminationGracePeriodSeconds: 90  # 90 seconds, well under 2-minute AWS warning
```

### PodDisruptionBudgets

PDBs protect against too many pods being evicted simultaneously:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: worker-pdb
spec:
  minAvailable: "60%"
  selector:
    matchLabels:
      app: worker
```

During a spot reclamation, the drain process respects PDBs. If evicting a pod would violate the PDB, the drain blocks until other replicas are available. However, if the node is forcibly terminated (after the 2-minute window), PDBs are bypassed -- the VM simply disappears.

## Instance Type Diversification

The biggest risk with spot is capacity unavailability. If you request only `m5.xlarge` spot instances and that specific type is in high demand, you get no capacity. Diversifying across multiple instance types and availability zones dramatically improves availability.

### AWS with Karpenter

Karpenter automatically selects from a wide range of instance types based on your constraints:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-workers
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]  # prefer spot, fall back to on-demand
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]  # general, compute, memory families
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]  # 5th gen and newer
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["xlarge", "2xlarge", "4xlarge"]
      taints:
        - key: kubernetes.io/spot
          value: "true"
          effect: NoSchedule
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
  limits:
    cpu: "200"
    memory: "400Gi"
```

With this configuration, Karpenter might launch a `c5.2xlarge` in `us-east-1a`, an `m6i.xlarge` in `us-east-1b`, and an `r5.xlarge` in `us-east-1c` -- whatever has the best spot availability and pricing at that moment. When a spot instance is reclaimed, Karpenter automatically launches a replacement from the available pool.

### AWS Managed Node Groups

If not using Karpenter, configure managed node groups with multiple instance types; EKS uses the `capacity-optimized` allocation strategy for spot capacity:

```yaml
# eksctl nodegroup configuration
managedNodeGroups:
  - name: spot-workers
    instanceTypes: ["m5.xlarge", "m5a.xlarge", "m6i.xlarge", "c5.xlarge", "c5a.xlarge", "r5.xlarge"]
    spot: true
    desiredCapacity: 5
    minSize: 0
    maxSize: 20
```

The `capacity-optimized` strategy selects instance types from pools with the most available capacity, reducing the frequency of interruptions.

### GKE Spot Node Pools

```bash
gcloud container node-pools create spot-pool \
  --cluster=my-cluster \
  --spot \
  --num-nodes=3 \
  --machine-type=e2-standard-4 \
  --node-taints=cloud.google.com/gke-spot=true:NoSchedule
```

GKE does not support mixed instance types within a single node pool the same way AWS does. Use multiple spot node pools with different machine types for diversification.

## Cost Tracking

Spot savings are often invisible in basic Kubernetes monitoring because the cluster does not know what you are paying per node. Use cloud-native cost tools:

```bash
# AWS: check spot pricing history for the last 24 hours
# (GNU date shown; on macOS/BSD use: date -u -v-1d +%Y-%m-%dT%H:%M:%S)
aws ec2 describe-spot-price-history \
  --instance-types m5.xlarge m5a.xlarge c5.xlarge \
  --product-descriptions "Linux/UNIX" \
  --start-time "$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%S)" \
  --query 'SpotPriceHistory[*].{Type:InstanceType,AZ:AvailabilityZone,Price:SpotPrice}' \
  --output table
```

Kubecost can distinguish spot vs on-demand costs when given access to your cloud billing data, showing the actual savings achieved.

## Common Gotchas

**Mass reclamation during capacity crunches.** When a cloud region runs low on capacity, many spot instances are reclaimed simultaneously. If all your spot nodes disappear at once, the remaining on-demand nodes face a thundering herd of rescheduling pods. Mitigate with: PDBs, topology spread constraints, and enough on-demand baseline capacity to absorb the critical workloads.
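One of the mitigations above, a topology spread constraint, might look like this as a sketch -- spreading replicas across zones, since mass reclamations often hit one pool or AZ (the `app: worker` selector is illustrative):

```yaml
# Pod spec fragment: spread replicas across zones so a reclamation
# concentrated in one AZ cannot take out every replica at once
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway  # soft: degrade rather than block
      labelSelector:
        matchLabels:
          app: worker
```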

**PVCs stuck in the wrong AZ.** When a spot node in `us-east-1a` is reclaimed, any PVC attached to pods on that node stays bound to the `us-east-1a` zone. If the replacement node lands in `us-east-1b`, the pod cannot mount the volume. Solutions: use topology-aware scheduling (`volumeBindingMode: WaitForFirstConsumer`), or use EFS/Filestore (cross-AZ storage) for spot workloads.
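The `WaitForFirstConsumer` fix is a StorageClass setting. A sketch for the EBS CSI driver (the class name and `gp3` type are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-wait
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
# Delay binding until a pod is scheduled, so the volume is created
# in whichever AZ the scheduler actually placed the pod
volumeBindingMode: WaitForFirstConsumer
```

Note this helps at initial provisioning; an already-created volume stays pinned to its AZ, which is why cross-AZ storage is the fallback for existing data.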

**Spot interruption during deployment rollout.** If a spot node is reclaimed mid-rollout, the new and old ReplicaSets both lose pods. Combined with a tight PDB, this can stall the rollout. Set rollout `maxUnavailable` and `maxSurge` with spot interruptions in mind.
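A conservative sketch of those rollout settings, relying on surge capacity rather than voluntarily taking old pods down (values illustrative):

```yaml
# Deployment spec fragment: roll out by surging new pods first, so
# spot evictions are the only thing consuming the PDB's budget
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
```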

## Practical Example: EKS with On-Demand Baseline and Spot Overflow

A production EKS cluster running a web application with background workers:

- **On-demand node pool** (3x `m6i.2xlarge`): runs the API servers, databases, Redis, and Prometheus. These pods have no spot toleration.
- **Spot node pool** (Karpenter-managed, 5-15 nodes): runs background workers, batch processors, and non-critical services. Karpenter selects from 15+ instance types across 3 AZs.

Monthly cost breakdown:
- On-demand: 3 nodes at $0.384/hr = $829/month
- Spot: average 8 nodes at $0.10/hr (avg spot price for mixed types) = $576/month
- Running the same workload fully on-demand would cost $2,650/month
- **Total savings: ~47%** compared to all on-demand

The spot nodes experience 2-3 interruptions per week. Karpenter replaces each within 60 seconds. Application-level retries handle the in-flight requests that are lost during the interruption window.

