---
title: "Temporal High Availability: Multi-Component Cluster on Kubernetes"
description: "Deploy a production-grade Temporal cluster on Kubernetes with multi-replica services, PostgreSQL persistence, Elasticsearch visibility, and health monitoring."
url: https://agent-zone.ai/knowledge/workflow-orchestration/temporal-ha-cluster/
section: knowledge
date: 2026-02-22
categories: ["workflow-orchestration"]
tags: ["temporal","high-availability","kubernetes","helm","postgresql","elasticsearch","production"]
skills: ["temporal-ha-deployment","production-temporal","kubernetes-resource-management"]
tools: ["temporal","helm","kubectl","postgresql","elasticsearch"]
levels: ["intermediate"]
word_count: 775
formats:
  json: https://agent-zone.ai/knowledge/workflow-orchestration/temporal-ha-cluster/index.json
  html: https://agent-zone.ai/knowledge/workflow-orchestration/temporal-ha-cluster/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Temporal+High+Availability%3A+Multi-Component+Cluster+on+Kubernetes
---


# Temporal High Availability

A single-replica Temporal deployment works for development, but any pod going down takes the workflow engine offline. This guide configures a multi-replica cluster with proper resource allocation, Elasticsearch visibility, and health monitoring.

For the single-replica setup this builds on, see [Running Temporal Server on Minikube](../temporal-minikube-setup/).

## Why HA Matters

| Component | What Breaks When It Goes Down |
|---|---|
| **Frontend** | No client can start, signal, query, or cancel workflows. Workers cannot poll. |
| **History** | Running workflows stall. No state transitions. Timers do not fire. |
| **Matching** | Tasks queue up but never dispatch. Workflows appear frozen. |
| **Worker** | Internal system workflows stop (archival, replication). Application workflows unaffected. |

With multiple replicas, losing a pod triggers a brief rebalance (seconds), not an outage.
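
You can rehearse this failure mode before it happens in production. A minimal drill, assuming the release and namespace used throughout this guide (the label selector is illustrative and may vary by chart version):

```bash
# Kill one history pod and confirm the cluster absorbs it.
# Label selector is an assumption -- check your chart's labels first.
kubectl delete pod -n temporal \
  "$(kubectl get pods -n temporal \
      -l app.kubernetes.io/component=history \
      -o jsonpath='{.items[0].metadata.name}')"

# Watch the replacement come up; running workflows should resume in seconds.
kubectl get pods -n temporal -w
```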

## HA Architecture

Each service runs as a separate Deployment with 3+ replicas. **Frontend** is stateless and load-balances trivially. **History** partitions workflow state into shards (default 512); when a pod dies, its shards rebalance to survivors. **Matching** partitions task queue dispatch similarly. **Worker** runs Temporal internals and needs only 2 replicas.
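
Once deployed, each role is visible as its own Deployment. A quick topology check (deployment names are illustrative; the chart derives them from the release name):

```bash
kubectl get deployments -n temporal
# NAME                READY   UP-TO-DATE   AVAILABLE
# temporal-frontend   3/3     3            3
# temporal-history    3/3     3            3
# temporal-matching   3/3     3            3
# temporal-worker     2/2     2            2
# temporal-web        2/2     2            2
```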

## HA Helm Values

```yaml
# values-temporal-ha.yaml
server:
  config:
    persistence:
      default:
        driver: sql
        sql:
          driver: postgres12
          host: temporal-ha-postgresql
          port: 5432
          database: temporal
          user: postgres
          password: temporal
          maxConns: 40
      visibility:
        driver: sql
        sql:
          driver: postgres12
          host: temporal-ha-postgresql
          port: 5432
          database: temporal_visibility
          user: postgres
          password: temporal
          maxConns: 20
    numHistoryShards: 512

  frontend:
    replicaCount: 3
    resources:
      requests: { cpu: 500m, memory: 512Mi }
      limits: { cpu: "1", memory: 1Gi }
  history:
    replicaCount: 3
    resources:
      requests: { cpu: 500m, memory: 1Gi }
      limits: { cpu: "2", memory: 2Gi }
  matching:
    replicaCount: 3
    resources:
      requests: { cpu: 250m, memory: 256Mi }
      limits: { cpu: "1", memory: 512Mi }
  worker:
    replicaCount: 2
    resources:
      requests: { cpu: 250m, memory: 256Mi }
      limits: { cpu: 500m, memory: 512Mi }

cassandra: { enabled: false }
mysql: { enabled: false }
postgresql: { enabled: false }
elasticsearch: { enabled: false }
schema: { setup: { enabled: true }, update: { enabled: true } }
web:
  replicaCount: 2
  service: { type: ClusterIP, port: 8080 }
```

```bash
helm upgrade --install temporal temporal/temporal \
  --namespace temporal -f values-temporal-ha.yaml --timeout 600s
```
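
Before routing traffic, wait for every role to settle. A sketch, assuming the deployment names the chart derives from the `temporal` release (the admintools pod ships the `temporal` CLI):

```bash
# Block until each service reaches its desired replica count.
for svc in frontend history matching worker; do
  kubectl rollout status "deployment/temporal-$svc" -n temporal --timeout=300s
done

# Ask the frontend whether the cluster reports healthy.
kubectl exec -n temporal deploy/temporal-admintools -- \
  temporal operator cluster health
```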

## PostgreSQL for HA

With 11 service replicas at `maxConns: 40`, Temporal can open up to 440 connections against the default store alone; SQL visibility adds up to another 220 until you move it to Elasticsearch (next section). PostgreSQL defaults to 100. Configure it with headroom:

```yaml
# External PostgreSQL chart values -- keep max_connections above Temporal's combined pools
primary:
  extendedConfiguration: |
    max_connections = 600
    shared_buffers = 512MB
    effective_cache_size = 1536MB
  resources:
    requests: { cpu: "1", memory: 2Gi }
    limits: { cpu: "2", memory: 4Gi }
  persistence:
    size: 20Gi
```

For high-throughput clusters, deploy PgBouncer between Temporal and PostgreSQL to pool connections. At minimum, configure automated pg_dump backups -- Temporal's PostgreSQL is the system of record for all running workflows.
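
A minimal backup sketch, assuming the PostgreSQL pod name used in this guide and the plaintext credentials from the values above (use a Secret and a CronJob in practice):

```bash
# Dump both Temporal databases in pg_dump's custom format.
# Pod name and inline password are illustrative.
for db in temporal temporal_visibility; do
  kubectl exec -n temporal temporal-ha-postgresql-0 -- \
    env PGPASSWORD=temporal pg_dump -U postgres -Fc "$db" \
    > "backup-$db-$(date +%Y%m%d).dump"
done
```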

## Elasticsearch Visibility

SQL-based visibility works for small deployments but struggles with complex queries. Elasticsearch provides indexed custom search attributes and fast filtering.

Enable it by updating the Temporal values:

```yaml
server:
  config:
    persistence:
      visibility:
        driver: elasticsearch
        elasticsearch:
          version: v7
          url: { scheme: http, host: "temporal-elasticsearch:9200" }
          indices: { visibility: temporal_visibility_v1 }
```
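
Apply the change with another `helm upgrade`, then confirm the index exists. A sketch, assuming an in-cluster Elasticsearch service named `temporal-elasticsearch`:

```bash
# Re-apply the release with the Elasticsearch visibility driver.
helm upgrade temporal temporal/temporal \
  --namespace temporal -f values-temporal-ha.yaml

# Spot-check the visibility index from a throwaway curl pod.
kubectl run es-check --rm -i --restart=Never -n temporal \
  --image=curlimages/curl -- \
  curl -s "http://temporal-elasticsearch:9200/_cat/indices/temporal_visibility_v1"
```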

Register custom search attributes to make workflows queryable by business fields:

```bash
temporal operator search-attribute create \
  --namespace default --name CustomerId --type Keyword

temporal operator search-attribute create \
  --namespace default --name OrderAmount --type Double
```
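
Confirm registration before querying on the new attributes:

```bash
# CustomerId and OrderAmount should appear with their types.
temporal operator search-attribute list --namespace default
```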

Set them from workflow code:

```go
func OrderWorkflow(ctx workflow.Context, order Order) error {
    _ = workflow.UpsertSearchAttributes(ctx, map[string]interface{}{
        "CustomerId":  order.CustomerID,
        "OrderAmount": order.Amount,
    })
    // ... workflow logic
    return nil
}
```

Query with the CLI:

```bash
temporal workflow list \
  --query 'CustomerId = "cust-123" AND OrderAmount > 100.0'
```

## Health Monitoring

Temporal exposes Prometheus metrics on port 9090. The critical ones:

| Metric | Meaning |
|---|---|
| `temporal_persistence_latency` | Database response time. Spikes indicate PostgreSQL issues. |
| `schedule_to_start_latency` | Time from task creation to worker pickup. High means workers cannot keep up. |
| `persistence_errors` | Database errors. Any sustained increase needs investigation. |
| `history_size` | Workflow event count. Histories above 50K events impact performance. |
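
To spot-check these series before wiring up Prometheus, port-forward to a server pod (deployment name is illustrative):

```bash
# Forward the metrics port from a history pod and grep the key series.
kubectl port-forward -n temporal deploy/temporal-history 9090:9090 &
sleep 2
curl -s localhost:9090/metrics | grep -E 'persistence_latency|schedule_to_start'
kill %1
```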

Alert on these conditions:

```yaml
groups:
- name: temporal
  rules:
  - alert: TemporalPersistenceLatencyHigh
    expr: histogram_quantile(0.99, rate(temporal_persistence_latency_bucket[5m])) > 1
    for: 5m
    annotations:
      summary: "Temporal persistence p99 above 1 second"
  - alert: TemporalScheduleToStartHigh
    expr: histogram_quantile(0.99, rate(schedule_to_start_latency_bucket[5m])) > 30
    for: 5m
    annotations:
      summary: "Tasks waiting 30s+ for worker pickup"
```
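
If the rules live in a plain file rather than an operator CRD, `promtool` can validate them before they ship (assuming the block above is saved as `temporal-alerts.yaml`):

```bash
promtool check rules temporal-alerts.yaml
```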

## Scaling Guidelines

Scale **frontend** when gRPC latency rises (stateless, simple to add). Scale **history** when workflow task latency grows or shard rebalancing is slow. Scale **matching** when `schedule_to_start_latency` is high but workers are idle.
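
Any of these is a one-line replica bump; a sketch, assuming chart-derived deployment names (persist the change in your Helm values so the next upgrade does not revert it):

```bash
# Add history capacity when workflow task latency climbs.
kubectl scale deployment/temporal-history -n temporal --replicas=5
```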

`numHistoryShards` is fixed at cluster creation and cannot be changed without a data migration. Choose carefully: 512 for most production workloads, 1024 for high-throughput (>10K concurrent workflows per namespace), 128 for development.

## Comparison: Standard vs HA

| Dimension | Standard (Dev) | HA (Production) |
|---|---|---|
| Service replicas | 1 each | 2-3 each |
| CPU total | ~1.5 cores | ~6 cores |
| Memory total | ~2 GB | ~10 GB |
| Visibility | SQL-based | Elasticsearch |
| Pod disruption tolerance | None | Survives loss of 1 pod per service |
| Recovery time | Minutes (pod restart) | Seconds (shard rebalance) |

## Next Steps

- [Namespaces and Task Queues](../temporal-namespaces-task-queues/) -- organize workflows with proper isolation
- [Temporal Multi-Cluster on Minikube](../temporal-multi-cluster-minikube/) -- multi-cluster setups spanning profiles

