Temporal High Availability
A single-replica Temporal deployment works for development, but every service is then a single point of failure: losing one pod takes part or all of the workflow engine offline. This guide configures a multi-replica cluster with proper resource allocation, Elasticsearch visibility, and health monitoring.
For the single-replica setup this builds on, see Running Temporal Server on Minikube.
Why HA Matters
| Component | What Breaks When It Goes Down |
|---|---|
| Frontend | No client can start, signal, query, or cancel workflows. Workers cannot poll. |
| History | Running workflows stall. No state transitions. Timers do not fire. |
| Matching | Tasks queue up but never dispatch. Workflows appear frozen. |
| Worker (internal service) | Internal system workflows stop (archival, replication). Application workflows are unaffected. |
With multiple replicas, losing a pod triggers a brief rebalance (seconds), not an outage.
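As a rough sketch of what "multiple replicas" can look like, the fragment below sets a replica count for each of the four services in the table. It assumes the deployment uses the official temporalio Helm chart, where each service takes a `replicaCount` under `server`; the file name and the counts are illustrative assumptions, not values taken from this guide, and key names may differ between chart versions.

```yaml
# values-ha.yaml (hypothetical file name)
# Replica counts are illustrative; size them for your own workload.
server:
  frontend:
    replicaCount: 2   # clients and workers keep a reachable endpoint if one pod dies
  history:
    replicaCount: 3   # shards are redistributed across the surviving pods
  matching:
    replicaCount: 2   # task queue partitions move to the remaining pod
  worker:
    replicaCount: 2   # internal system workflows keep running
```

With at least two replicas per service, the failure of one pod leaves another instance available to take over its share of the work.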