{"page":{"agent_metadata":{"content_type":"guide","outputs":["deploy-ha-temporal","configure-multi-replica-temporal","monitor-temporal-health"],"prerequisites":["temporal-basics","kubernetes-intermediate","helm-intermediate"]},"categories":["workflow-orchestration"],"content_plain":"Temporal High Availability# A single-replica Temporal deployment works for development, but any pod going down takes the workflow engine offline. This guide configures a multi-replica cluster with proper resource allocation, Elasticsearch visibility, and health monitoring.\nFor the single-replica setup this builds on, see Running Temporal Server on Minikube.\nWhy HA Matters# Component What Breaks When It Goes Down Frontend No client can start, signal, query, or cancel workflows. Workers cannot poll. History Running workflows stall. No state transitions. Timers do not fire. Matching Tasks queue up but never dispatch. Workflows appear frozen. Worker Internal system workflows stop (archival, replication). Application workflows unaffected. With multiple replicas, losing a pod triggers a brief rebalance (seconds), not an outage.\nHA Architecture# Each service runs as a separate Deployment with 3+ replicas. Frontend is stateless and load-balances trivially. History partitions workflow state into shards (default 512); when a pod dies, its shards rebalance to survivors. Matching partitions task queue dispatch similarly. 
Worker runs Temporal internals and needs only 2 replicas.\nHA Helm Values# # values-temporal-ha.yaml server: config: persistence: default: driver: sql sql: driver: postgres12 host: temporal-ha-postgresql port: 5432 database: temporal user: postgres password: temporal maxConns: 40 visibility: driver: sql sql: driver: postgres12 host: temporal-ha-postgresql port: 5432 database: temporal_visibility user: postgres password: temporal maxConns: 20 numHistoryShards: 512 frontend: replicaCount: 3 resources: requests: { cpu: 500m, memory: 512Mi } limits: { cpu: \"1\", memory: 1Gi } history: replicaCount: 3 resources: requests: { cpu: 500m, memory: 1Gi } limits: { cpu: \"2\", memory: 2Gi } matching: replicaCount: 3 resources: requests: { cpu: 250m, memory: 256Mi } limits: { cpu: \"1\", memory: 512Mi } worker: replicaCount: 2 resources: requests: { cpu: 250m, memory: 256Mi } limits: { cpu: 500m, memory: 512Mi } cassandra: { enabled: false } mysql: { enabled: false } postgresql: { enabled: false } elasticsearch: { enabled: false } schema: { setup: { enabled: true }, update: { enabled: true } } web: replicaCount: 2 service: { type: ClusterIP, port: 8080 }\nhelm upgrade --install temporal temporal/temporal \\ --namespace temporal -f values-temporal-ha.yaml --timeout 600s\nPostgreSQL for HA# With 11 service replicas at maxConns: 40, the default store alone can open up to 440 connections (11 x 40). PostgreSQL's default max_connections is 100. Configure it with headroom:\nprimary: extendedConfiguration: | max_connections = 600 shared_buffers = 512MB effective_cache_size = 1536MB resources: requests: { cpu: \"1\", memory: 2Gi } limits: { cpu: \"2\", memory: 4Gi } persistence: size: 20Gi\nFor high-throughput clusters, deploy PgBouncer between Temporal and PostgreSQL to pool connections. 
At minimum, configure automated pg_dump backups: Temporal's PostgreSQL is the system of record for all running workflows.\nElasticsearch Visibility# SQL-based visibility works for small deployments but struggles with complex queries. Elasticsearch provides indexed custom search attributes and fast filtering.\nEnable it by updating the Temporal values:\nserver: config: persistence: visibility: driver: elasticsearch elasticsearch: version: v7 url: { scheme: http, host: \"temporal-elasticsearch:9200\" } indices: { visibility: temporal_visibility_v1 }\nRegister custom search attributes to make workflows queryable by business fields:\ntemporal operator search-attribute create \\ --namespace default --name CustomerId --type Keyword temporal operator search-attribute create \\ --namespace default --name OrderAmount --type Double\nSet them from workflow code:\nfunc OrderWorkflow(ctx workflow.Context, order Order) error { _ = workflow.UpsertSearchAttributes(ctx, map[string]interface{}{ \"CustomerId\": order.CustomerID, \"OrderAmount\": order.Amount, }) // ... workflow logic return nil }\nQuery with the CLI:\ntemporal workflow list \\ --query 'CustomerId = \"cust-123\" AND OrderAmount > 100.0'\nHealth Monitoring# Temporal exposes Prometheus metrics on port 9090. The critical ones:\nMetric Meaning temporal_persistence_latency Database response time. Spikes indicate PostgreSQL issues. schedule_to_start_latency Time from task creation to worker pickup. High means workers cannot keep up. persistence_errors Database errors. Any sustained increase needs investigation. history_size Workflow event count. Histories above 50K events impact performance. 
Alert on these conditions:\ngroups: - name: temporal rules: - alert: TemporalPersistenceLatencyHigh expr: histogram_quantile(0.99, rate(temporal_persistence_latency_bucket[5m])) > 1 for: 5m annotations: summary: \"Temporal persistence p99 above 1 second\" - alert: TemporalScheduleToStartHigh expr: histogram_quantile(0.99, rate(schedule_to_start_latency_bucket[5m])) > 30 for: 5m annotations: summary: \"Tasks waiting 30s+ for worker pickup\"\nScaling Guidelines# Scale frontend when gRPC latency rises (stateless, simple to add). Scale history when workflow task latency grows or shard rebalancing is slow. Scale matching when schedule_to_start_latency is high but workers are idle.\nnumHistoryShards is set at cluster creation and cannot be changed without data migration. Choose carefully: 512 for most production workloads, 1024 for high-throughput (>10K concurrent workflows per namespace), 128 for development.\nComparison: Standard vs HA# Dimension Standard (Dev) HA (Production) Service replicas 1 each 2-3 each CPU total ~1.5 cores ~6 cores Memory total ~2 GB ~10 GB Visibility SQL-based Elasticsearch Pod disruption tolerance None Tolerates losing 1 pod per service Recovery time Minutes (pod restart) Seconds (shard rebalance)\nNext Steps# Namespaces and Task Queues: organize workflows with proper isolation Temporal Multi-Cluster on Minikube: multi-cluster setups spanning profiles ","date":"2026-02-22","description":"Deploy a production-grade Temporal cluster on Kubernetes with multi-replica services, PostgreSQL persistence, Elasticsearch visibility, and health monitoring.","lastmod":"2026-02-22","levels":["intermediate"],"reading_time_minutes":4,"section":"knowledge","skills":["temporal-ha-deployment","production-temporal","kubernetes-resource-management"],"tags":["temporal","high-availability","kubernetes","helm","postgresql","elasticsearch","production"],"title":"Temporal High Availability: 
Multi-Component Cluster on Kubernetes","tools":["temporal","helm","kubectl","postgresql","elasticsearch"],"url":"https://agent-zone.ai/knowledge/workflow-orchestration/temporal-ha-cluster/","word_count":775}}