Schema Evolution and Compatibility

Game-Day-Facilitation, Scenario-Design, Runbook-Validation, Resilience-Testing

Game-Day, Tabletop-Exercise, Fault-Injection, Disaster-Recovery, Resilience, Runbook-Validation, Sre

Chaos-Mesh, Litmus-Chaos, Gremlin, Pagerduty, Slack, Grafana

Why Run Exercises

Runbooks that have never been tested are fiction. Failover procedures that have never been executed are hopes. Game days and tabletop exercises convert assumptions about system resilience into verified facts – or reveal that those assumptions were wrong before a real incident does.

The value is not just finding technical gaps. Exercises expose process gaps: unclear escalation paths, missing permissions, outdated contact lists, communication breakdowns between teams. These are invisible until a simulated failure forces people to actually follow the documented procedure.

Distributed Data Consistency Patterns

February 22, 2026

Microservices

Consistency-Pattern-Design, Outbox-Pattern-Implementation, Cdc-Setup, Conflict-Resolution-Strategy

Data-Consistency, Cap-Theorem, Eventual-Consistency, Outbox-Pattern, Cdc, Debezium, Two-Phase-Commit, Conflict-Resolution

Debezium, Kafka, Kafka-Connect, Postgresql, Mongodb


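In a monolith with a single database, you get ACID transactions. In microservices, each service owns its database, so no single transaction spans them. Cross-service consistency requires explicit patterns because distributed ACID transactions (for example, two-phase commit across service databases) are impractical in most real systems: they hold locks across network round trips and can leave every participant blocked if the coordinator fails.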

CAP Theorem: Practical Implications

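The CAP theorem states that a distributed system can provide at most two of three guarantees: Consistency (every read sees the most recent write), Availability (every request to a non-failed node gets a response), and Partition tolerance (the system keeps operating when the network drops messages between nodes). Since network partitions are inevitable in practice, the real choice is between consistency and availability while a partition lasts.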

Reliability Review Process

Sre

Reliability-Assessment, Error-Budget-Review, Incident-Trend-Analysis, Risk-Assessment

Reliability-Review, Error-Budget, Incident-Trends, Dependency-Risk, Sre, Metrics-Review

Grafana, Prometheus, Datadog, Jira, Confluence, Pagerduty, Opsgenie

Why Regular Reviews Matter

Reliability does not improve by accident. Without a structured review cadence, teams operate on vibes – “things feel okay” or “we’ve been having a lot of incidents lately.” Reliability reviews replace gut feelings with data. They surface slow-burning problems before they become outages, hold teams accountable for improvement actions, and create a shared understanding of system health across engineering and leadership.

Weekly Reliability Review

The weekly review is a 30-minute tactical meeting focused on what happened this week and what needs attention next week. Attendees: on-call engineers, team leads, SRE. Keep it tight.

Multi-Stage Temporal Workflows: Activities, Child Workflows, and Error Propagation

February 22, 2026

Multi-Stage-Workflow-Design, Child-Workflow-Patterns, Error-Compensation, Timeout-Strategy

Temporal, Workflows, Child-Workflows, Error-Handling, Compensation, Timeouts, Retry-Policies

Temporal, Go

Multi-Stage Temporal Workflows

The HelloWorkflow from Temporal Go Workflow Basics calls one activity and returns. Real workflows are not that simple. A deployment pipeline provisions infrastructure, configures networking, deploys the application, runs health checks, and updates DNS. Each step depends on the previous one. Any step can fail. Some failures require undoing earlier steps.

This article covers the patterns you need for production multi-stage workflows: sequential activities with data passing, retry policies, timeouts, child workflows, error propagation, and compensation.

Service Decomposition Anti-Patterns

February 22, 2026

Microservices

Service-Boundary-Design, Anti-Pattern-Detection, Decomposition-Evaluation, Service-Consolidation

Anti-Patterns, Distributed-Monolith, Nano-Services, Service-Decomposition, Shared-Database, Coupling, Migration, Service-Boundaries

Kubernetes, Istio, Jaeger


Splitting a monolith into microservices is a common architectural goal. But bad decomposition creates systems that are harder to operate than the monolith they replaced. These anti-patterns are disturbingly common and often unrecognized until the team is deep in operational pain.

The Distributed Monolith

The distributed monolith looks like microservices from the outside – separate repositories, separate deployments, separate CI pipelines – but behaves like a monolith at runtime. Services cannot be deployed independently because they are tightly coupled.

Temporal Workflow Example: Container Lifecycle Management with Docker

February 22, 2026

Container-Lifecycle-Management, Workflow-Compensation, Docker-Api-Integration, Snapshot-Management

Temporal, Docker, Container-Lifecycle, Workflow-Example, Compensation, Child-Workflow, Azure-Vm, Snapshots

Temporal, Go, Docker

Container Lifecycle Workflow

This article builds a complete Temporal workflow that manages Docker container lifecycle operations: inspect a container, stop it if running, create a snapshot (commit), and handle failures by restarting the container. It demonstrates every pattern from Multi-Stage Temporal Workflows in a concrete, runnable example.

The full source is in the companion repo under container-lifecycle/.

The Use Case

You need to automate container maintenance: take a snapshot of a running container for backup or migration purposes. The sequence is:

Temporal Signals: Human-in-the-Loop and Manual Approval Workflows

February 22, 2026

Temporal-Signal-Handling, Approval-Workflow-Design, Human-in-the-Loop-Patterns

Temporal, Signals, Human-in-the-Loop, Approval, Workflow-Communication, Timeouts

Temporal, Go

Temporal Signals

Workflows often need input after they have started. A deployment workflow pauses for human approval. An expense workflow waits for a manager’s signature. An incident response workflow escalates after a timeout. Temporal signals are the mechanism for delivering external input to a running workflow.

A signal is a message sent to a workflow from outside – from another workflow, from a CLI command, from an HTTP endpoint, or from any system that has the Temporal client. The workflow receives the signal, processes it, and continues execution. Signals are durable: Temporal records each signal in the workflow's event history when it is accepted, so if the worker crashes after a signal is sent but before the workflow processes it, the signal is delivered when the workflow resumes on a restarted worker.

Temporal Signals for Automated Coordination: Locking, Blocking, and Cross-Workflow Communication

February 22, 2026

Distributed-Mutex-Design, Cross-Workflow-Signaling, Resource-Coordination, Signal-Based-Locking

Temporal, Signals, Distributed-Mutex, Locking, Cross-Workflow, Coordination, Automated-Signals

Temporal, Go

Temporal Signals for Automated Coordination

In Temporal Signals for Manual Interaction, you learned how external systems and humans send signals to running workflows. Signals are not limited to human input. They are a general-purpose communication channel between workflows, and they become powerful coordination primitives when workflows signal each other programmatically.

This article covers automated signal patterns: cross-workflow signaling, distributed mutexes built on signals, blocking semantics, and the anti-patterns that will burn you.

Platform Engineering Maturity Model

February 22, 2026

Platform-Engineering