Multi-Stage Temporal Workflows: Activities, Child Workflows, and Error Propagation

Multi-Stage Temporal Workflows#

The HelloWorkflow from Temporal Go Workflow Basics calls one activity and returns. Real workflows are not that simple. A deployment pipeline provisions infrastructure, configures networking, deploys the application, runs health checks, and updates DNS. Each step depends on the previous one. Any step can fail. Some failures require undoing earlier steps.

This article covers the patterns you need for production multi-stage workflows: sequential activities with data passing, retry policies, timeouts, child workflows, error propagation, and compensation.

Temporal Workflow Example: Container Lifecycle Management with Docker

Container Lifecycle Workflow#

This article builds a complete Temporal workflow that manages Docker container lifecycle operations: inspect a container, stop it if running, create a snapshot (commit), and handle failures by restarting the container. It demonstrates every pattern from Multi-Stage Temporal Workflows in a concrete, runnable example.

The full source is in the companion repo under container-lifecycle/.

The Use Case#

You need to automate container maintenance: take a snapshot of a running container for backup or migration purposes. The sequence is:

Temporal Signals: Human-in-the-Loop and Manual Approval Workflows

Temporal Signals#

Workflows often need input after they have started. A deployment workflow pauses for human approval. An expense workflow waits for a manager’s signature. An incident response workflow escalates after a timeout. Temporal signals are the mechanism for delivering external input to a running workflow.

A signal is a message sent to a workflow from outside – from another workflow, from a CLI command, from an HTTP endpoint, or from any system that has the Temporal client. The workflow receives the signal, processes it, and continues execution. Signals are durable: if the worker crashes after a signal is sent but before the workflow processes it, the signal is replayed when the worker restarts.

Temporal Signals for Automated Coordination: Locking, Blocking, and Cross-Workflow Communication

Temporal Signals for Automated Coordination#

In Temporal Signals for Manual Interaction, you learned how external systems and humans send signals to running workflows. Signals are not limited to human input. They are a general-purpose communication channel between workflows, and they become powerful coordination primitives when workflows signal each other programmatically.

This article covers automated signal patterns: cross-workflow signaling, distributed mutexes built on signals, blocking semantics, and the anti-patterns that will burn you.

Temporal Cross-Cluster Communication: Architecture and Patterns

Temporal Cross-Cluster Communication#

When you operate multiple Temporal clusters – whether for regional deployment, compliance isolation, or blast radius reduction – workflows in one cluster eventually need to trigger work in another. This article examines three architectural approaches for cross-cluster communication, their tradeoffs, and guidance on choosing the right one for your situation.

This is an architecture guide. It establishes the concepts and patterns. The next article, Building a Worker Bridge, provides the full implementation.

Building a Temporal Worker Bridge: Cluster A Jobs Executed in Cluster B

Building a Temporal Worker Bridge#

The architecture article evaluated three cross-cluster communication patterns and identified the worker bridge as the best fit for most open-source Temporal deployments. This article builds the bridge.

The worker bridge is a single binary that holds connections to two Temporal clusters. It polls Cluster A for tasks on a dedicated queue and executes those tasks using Cluster B’s resources – its Temporal client, databases, APIs, and services. From Cluster A’s perspective, the bridge is just another worker. From Cluster B’s perspective, the bridge is just another client starting workflows.

Verifying LLM-Written SDK Code Against the Pinned Version: A Recipe Against Type Hallucination

An agent writes a 200-line streaming-client implementation against your project’s pinned SDK. It compiles cleanly in the model’s head. The test code references SomeStreamEvent, the streaming function signature is func NewStreaming(ctx, params) (stream, error), and the iteration loop uses stream.Recv(). The reviewer skims it, sees plausible naming, approves. CI fails with “undefined: SomeStreamEvent”. The agent escalates: “the SDK is broken — package not found.” Hours later, somebody figures out that the SDK they’re pinned to has none of those symbols. The import path is different. The function returns one value not two. The iteration pattern is Next() / Current() / Err(), not Recv(). The model invented the API.

Building LLM Harnesses: Orchestrating Local Models into Workflows with Scoring, Retries, and Parallel Execution

Building LLM Harnesses#

A harness is the infrastructure that wraps LLM calls into a reliable, testable, and observable workflow. It handles the concerns that a raw API call does not: input preparation, output validation, error recovery, model routing, parallel execution, and quality scoring. Without a harness, you have a script. With one, you have a tool.

Harness Architecture#

Input
  │
  ├── Preprocessing (validate input, select model, prepare prompt)
  │
  ├── Execution (call Ollama with timeout, retry on failure)
  │
  ├── Post-processing (parse output, validate schema, score quality)
  │
  ├── Routing (if quality too low, escalate to larger model or flag)
  │
  └── Output (structured result + metadata)

Core Harness in Python#

import ollama
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class LLMResult:
    content: str
    model: str
    tokens_in: int
    tokens_out: int
    duration_ms: int
    ttft_ms: int
    success: bool
    retries: int = 0
    score: float | None = None
    metadata: dict = field(default_factory=dict)

@dataclass
class HarnessConfig:
    model: str = "qwen2.5-coder:7b"
    temperature: float = 0.0
    max_tokens: int = 1024
    json_mode: bool = False
    timeout_seconds: int = 120
    max_retries: int = 2
    retry_delay_seconds: float = 1.0

def call_llm(
    messages: list[dict],
    config: HarnessConfig,
) -> LLMResult:
    """Make a single LLM call with timing metadata."""
    start = time.monotonic()

    kwargs = {
        "model": config.model,
        "messages": messages,
        "options": {
            "temperature": config.temperature,
            "num_predict": config.max_tokens,
        },
        "stream": False,
    }
    if config.json_mode:
        kwargs["format"] = "json"

    try:
        response = ollama.chat(**kwargs)
        duration = int((time.monotonic() - start) * 1000)

        return LLMResult(
            content=response["message"]["content"],
            model=config.model,
            tokens_in=response.get("prompt_eval_count", 0),
            tokens_out=response.get("eval_count", 0),
            duration_ms=duration,
            ttft_ms=int(response.get("prompt_eval_duration", 0) / 1_000_000),
            success=True,
        )
    except Exception as e:
        duration = int((time.monotonic() - start) * 1000)
        return LLMResult(
            content=str(e),
            model=config.model,
            tokens_in=0,
            tokens_out=0,
            duration_ms=duration,
            ttft_ms=0,
            success=False,
        )

Retry with Validation#

Do not retry blindly. Retry only when the output fails validation:

FIPS 140 Compliance: Validated Cryptography, FIPS-Enabled Runtimes, and Kubernetes Deployment

FIPS 140 Compliance#

FIPS 140 (Federal Information Processing Standard 140) is a US and Canadian government standard for cryptographic modules. If you sell software to US federal agencies, process federal data, or operate under FedRAMP, you must use FIPS 140-validated cryptographic modules. Many regulated industries (finance, healthcare, defense) also require or strongly prefer FIPS compliance.

FIPS 140 does not tell you which algorithms to use — it validates that a specific implementation of those algorithms has been tested and certified by an accredited lab (CMVP — Cryptographic Module Validation Program).

Rate Limiting Implementation Patterns

Rate Limiting Implementation Patterns#

Rate limiting controls how many requests a client can make within a time period. It protects services from overload, ensures fair usage across clients, prevents abuse, and provides a mechanism for graceful degradation under load. Every production API needs rate limiting at some layer.

Algorithm Comparison#

Fixed Window#

The simplest algorithm. Divide time into fixed windows (e.g., 1-minute intervals) and count requests per window. When the count exceeds the limit, reject requests until the next window starts.