Saga Pattern: Choreography, Orchestration, and Compensating Transactions

Saga Pattern#

In a monolith, a single database transaction can span multiple operations atomically. In microservices, each service owns its database. There is no distributed transaction that works reliably across services. The saga pattern solves this by breaking a transaction into a sequence of local transactions, each with a corresponding compensating transaction that undoes its work if a later step fails.

The Problem: No Distributed ACID#

Consider an order placement that must (1) reserve inventory, (2) charge payment, and (3) create a shipment. In a monolith, this is one transaction. In microservices, these are three services with three databases. Two-phase commit (2PC) across them is fragile and slow, and most message brokers and modern databases do not support it across service boundaries.
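
A minimal sketch of the pattern's mechanics in Python. The step and compensation functions here are in-memory placeholders standing in for real calls to the inventory, payment, and shipping services:

from dataclasses import dataclass
from typing import Callable

@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], None]       # the local transaction in one service
    compensate: Callable[[dict], None]   # undoes that work if a later step fails

def run_saga(steps: list[SagaStep], ctx: dict) -> bool:
    """Run each local transaction in order; on failure, compensate in reverse."""
    completed: list[SagaStep] = []
    for step in steps:
        try:
            step.action(ctx)
            completed.append(step)
        except Exception:
            # This step failed: undo every previously completed step, newest first.
            for done in reversed(completed):
                done.compensate(ctx)
            return False
    return True

# Order placement as a saga: reserve inventory, charge payment, create shipment.
order_saga = [
    SagaStep("reserve_inventory",
             lambda ctx: ctx.update(reserved=True),
             lambda ctx: ctx.update(reserved=False)),
    SagaStep("charge_payment",
             lambda ctx: ctx.update(charged=True),
             lambda ctx: ctx.update(charged=False)),
    SagaStep("create_shipment",
             lambda ctx: ctx.update(shipped=True),
             lambda ctx: ctx.update(shipped=False)),
]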

Building LLM Harnesses: Orchestrating Local Models into Workflows with Scoring, Retries, and Parallel Execution

Building LLM Harnesses#

A harness is the infrastructure that wraps LLM calls into a reliable, testable, and observable workflow. It handles the concerns that a raw API call does not: input preparation, output validation, error recovery, model routing, parallel execution, and quality scoring. Without a harness, you have a script. With one, you have a tool.

Harness Architecture#

Input
  │
  ├── Preprocessing (validate input, select model, prepare prompt)
  │
  ├── Execution (call Ollama with timeout, retry on failure)
  │
  ├── Post-processing (parse output, validate schema, score quality)
  │
  ├── Routing (if quality too low, escalate to larger model or flag)
  │
  └── Output (structured result + metadata)

Core Harness in Python#

import ollama
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class LLMResult:
    content: str
    model: str
    tokens_in: int
    tokens_out: int
    duration_ms: int
    ttft_ms: int
    success: bool
    retries: int = 0
    score: float | None = None
    metadata: dict = field(default_factory=dict)

@dataclass
class HarnessConfig:
    model: str = "qwen2.5-coder:7b"
    temperature: float = 0.0
    max_tokens: int = 1024
    json_mode: bool = False
    timeout_seconds: int = 120
    max_retries: int = 2
    retry_delay_seconds: float = 1.0

def call_llm(
    messages: list[dict],
    config: HarnessConfig,
) -> LLMResult:
    """Make a single LLM call with timing metadata."""
    start = time.monotonic()

    kwargs = {
        "model": config.model,
        "messages": messages,
        "options": {
            "temperature": config.temperature,
            "num_predict": config.max_tokens,
        },
        "stream": False,
    }
    if config.json_mode:
        kwargs["format"] = "json"

    try:
        response = ollama.chat(**kwargs)
        duration = int((time.monotonic() - start) * 1000)

        return LLMResult(
            content=response["message"]["content"],
            model=config.model,
            tokens_in=response.get("prompt_eval_count", 0),
            tokens_out=response.get("eval_count", 0),
            duration_ms=duration,
            # prompt_eval_duration is reported in nanoseconds; with stream=False
            # it is a reasonable proxy for time-to-first-token.
            ttft_ms=int(response.get("prompt_eval_duration", 0) / 1_000_000),
            success=True,
        )
    except Exception as e:
        duration = int((time.monotonic() - start) * 1000)
        return LLMResult(
            content=str(e),
            model=config.model,
            tokens_in=0,
            tokens_out=0,
            duration_ms=duration,
            ttft_ms=0,
            success=False,
        )
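
A minimal usage sketch, assuming a local Ollama server with the configured model already pulled:

config = HarnessConfig(model="qwen2.5-coder:7b", json_mode=True)
result = call_llm(
    [{"role": "user", "content": 'Return {"status": "ok"} as JSON.'}],
    config,
)
if result.success:
    print(result.model, result.duration_ms, "ms,", result.tokens_out, "tokens out")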

Retry with Validation#

Do not retry blindly. Retry only when the output fails validation:
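
A minimal sketch of such a wrapper, building on call_llm, HarnessConfig, and the imports above. The validator interface (a callable that returns an error message, or None when the output is acceptable) is an assumption for illustration, not a fixed API:

def call_with_validation(
    messages: list[dict],
    config: HarnessConfig,
    validate: Callable[[str], str | None],
) -> LLMResult:
    """Call the model; retry only while the call fails or the output fails validation."""
    attempt = 0
    while True:
        result = call_llm(messages, config)
        result.retries = attempt
        error = validate(result.content) if result.success else result.content
        result.metadata["validation_error"] = error
        if error is None or attempt >= config.max_retries:
            return result
        if result.success:
            # Feed the rejection back so the retry is not a blind repeat of the same prompt.
            messages = messages + [
                {"role": "assistant", "content": result.content},
                {"role": "user", "content": f"Your previous output was rejected: {error}. Fix it and respond again."},
            ]
        attempt += 1
        time.sleep(config.retry_delay_seconds)

# Example validator for json_mode: the output must parse and contain a "summary" key.
def require_summary(text: str) -> str | None:
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return f"not valid JSON: {exc}"
    if "summary" not in data:
        return 'missing required key "summary"'
    return None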

Multi-Agent Coordination: Patterns for Dividing and Conquering Infrastructure Tasks

Multi-Agent Coordination#

A single agent can read files, call APIs, and reason about results. But some tasks are too broad, too slow, or too dangerous for one agent to handle alone. Debugging a production outage might require one agent analyzing logs, another checking infrastructure state, and a third reviewing recent deployments – simultaneously. Multi-agent coordination is how you split work across agents without them stepping on each other.

The hard part is not spawning multiple agents. The hard part is deciding which coordination pattern fits the task, how agents share information, and what happens when they disagree.

Tool Use Patterns: Choosing, Chaining, and Validating Agent Tools

Tool Use Patterns#

An agent with access to 30 tools is not automatically more capable than one with 5. What matters is how it selects, sequences, and validates tool use. Poor tool use wastes tokens, introduces latency, and produces wrong results that look right.

Choosing the Right Tool#

When multiple tools could handle a task, the agent must pick the best one. This is harder than it sounds because tool descriptions are imperfect and tasks are ambiguous.
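
A small illustration of that ambiguity, with tool names, descriptions, and schema shape invented for the example. For a request like "find where the retry limit is configured", either tool plausibly applies, and the descriptions alone do not say which is cheaper or more reliable:

# Two hypothetical tool definitions whose coverage overlaps.
tools = [
    {
        "name": "grep_repo",
        "description": "Search the repository for a string or regex and return matching lines.",
        "parameters": {"pattern": "string", "path": "string, optional subdirectory"},
    },
    {
        "name": "read_file",
        "description": "Read a file and return its full contents.",
        "parameters": {"path": "string"},
    },
]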