Schema Evolution and Compatibility

Every service contract changes over time. New fields get added, old fields get removed, types change. In a monolith, you update the schema and redeploy. In microservices, producers and consumers deploy independently. A schema change that breaks consumers causes production failures. Schema evolution rules and tooling exist to prevent this.

Compatibility Modes

There are four compatibility modes. Understanding them is essential for operating any schema registry.
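As a rough illustration of what a registry checks, here is a minimal sketch for flat record schemas. The mode names (BACKWARD, FORWARD, FULL, NONE) follow the convention used by Confluent Schema Registry and are an assumption here, since the excerpt does not name the modes. The `Field` type and the functions `can_read` and `is_compatible` are hypothetical helpers, not any registry's API.

```python
from dataclasses import dataclass

_NO_DEFAULT = object()  # sentinel: distinguishes "no default" from a default of None


@dataclass(frozen=True)
class Field:
    type: str
    default: object = _NO_DEFAULT

    @property
    def has_default(self) -> bool:
        return self.default is not _NO_DEFAULT


def can_read(reader: dict, writer: dict) -> bool:
    """True if code using `reader` can decode data written with `writer`."""
    for name, rfield in reader.items():
        wfield = writer.get(name)
        if wfield is None:
            # The writer never produced this field, so the reader needs a default.
            if not rfield.has_default:
                return False
        elif wfield.type != rfield.type:
            # This sketch treats any type change as breaking (no type promotion).
            return False
    # Fields present only in the writer are ignored by the reader.
    return True


def is_compatible(old: dict, new: dict, mode: str) -> bool:
    backward = can_read(reader=new, writer=old)  # new consumers, old data
    forward = can_read(reader=old, writer=new)   # old consumers, new data
    if mode == "BACKWARD":
        return backward
    if mode == "FORWARD":
        return forward
    if mode == "FULL":
        return backward and forward
    if mode == "NONE":
        return True
    raise ValueError(f"unknown mode: {mode}")


v1 = {"id": Field("long"), "email": Field("string")}
v2_optional = {**v1, "nickname": Field("string", default="")}
v2_required = {**v1, "age": Field("int")}
v2_dropped = {"id": Field("long")}
```

Under these assumptions, adding a field with a default is safe in both directions, adding a required field breaks new consumers reading old data, and dropping a required field breaks old consumers reading new data.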

Game Day and Tabletop Exercise Planning

Why Run Exercises

Runbooks that have never been tested are fiction. Failover procedures that have never been executed are hopes. Game days and tabletop exercises convert assumptions about system resilience into verified facts – or reveal that those assumptions were wrong before a real incident does.

The value is not just finding technical gaps. Exercises expose process gaps: unclear escalation paths, missing permissions, outdated contact lists, communication breakdowns between teams. These are invisible until a simulated failure forces people to actually follow the documented procedure.

Distributed Data Consistency Patterns

In a monolith with a single database, you get ACID transactions. In microservices, each service owns its database. Cross-service consistency requires explicit patterns because distributed ACID transactions are impractical in most real systems.

CAP Theorem: Practical Implications

The CAP theorem states that a distributed system can provide at most two of three guarantees: Consistency, Availability, and Partition tolerance. Since network partitions are inevitable, you choose between consistency and availability during partitions.

Reliability Review Process

Why Regular Reviews Matter

Reliability does not improve by accident. Without a structured review cadence, teams operate on vibes – “things feel okay” or “we’ve been having a lot of incidents lately.” Reliability reviews replace gut feelings with data. They surface slow-burning problems before they become outages, hold teams accountable for improvement actions, and create a shared understanding of system health across engineering and leadership.

Weekly Reliability Review

The weekly review is a 30-minute tactical meeting focused on what happened this week and what needs attention next week. Attendees: on-call engineers, team leads, SRE. Keep it tight.

Multi-Stage Temporal Workflows: Activities, Child Workflows, and Error Propagation

Multi-Stage Temporal Workflows

The HelloWorkflow from Temporal Go Workflow Basics calls one activity and returns. Real workflows are not that simple. A deployment pipeline provisions infrastructure, configures networking, deploys the application, runs health checks, and updates DNS. Each step depends on the previous one. Any step can fail. Some failures require undoing earlier steps.

This article covers the patterns you need for production multi-stage workflows: sequential activities with data passing, retry policies, timeouts, child workflows, error propagation, and compensation.
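The core shape of these patterns (sequential steps with data passing, bounded retries, and compensation in reverse order on failure) can be sketched in plain Python. This is not the Temporal SDK: `Step`, `PipelineError`, `run_pipeline`, and the demo functions are hypothetical names used only to show the control flow, and nothing here is durable across process restarts.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Step:
    name: str
    action: Callable[[Any], Any]                        # receives the previous step's output
    compensate: Optional[Callable[[Any], None]] = None  # undoes this step's result
    max_attempts: int = 1


class PipelineError(Exception):
    def __init__(self, step: str, cause: Exception):
        super().__init__(f"step {step!r} failed: {cause}")
        self.step = step
        self.cause = cause


def run_pipeline(steps, initial):
    completed = []  # (step, result) pairs, in execution order
    value = initial
    for step in steps:
        last_error = None
        for _ in range(step.max_attempts):
            try:
                value = step.action(value)
                last_error = None
                break
            except Exception as exc:  # a real retry policy would filter error types
                last_error = exc
        if last_error is not None:
            # Undo earlier steps in reverse order, then propagate the failure.
            for done, result in reversed(completed):
                if done.compensate is not None:
                    done.compensate(result)
            raise PipelineError(step.name, last_error) from last_error
        completed.append((step, value))
    return value


audit_log = []


def provision(_):
    audit_log.append("provision")
    return "infra-1"


def deprovision(resource):
    audit_log.append(f"deprovision {resource}")


def deploy(resource):
    raise RuntimeError("image pull failed")


def demo():
    steps = [
        Step("provision", provision, compensate=deprovision),
        Step("deploy", deploy, max_attempts=2),
    ]
    try:
        run_pipeline(steps, None)
    except PipelineError as err:
        return err.step
```

In the demo, the deploy step fails on both attempts, so the provisioned resource is rolled back and the error names the failing step.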

Service Decomposition Anti-Patterns

Splitting a monolith into microservices is a common architectural goal. But bad decomposition creates systems that are harder to operate than the monolith they replaced. These anti-patterns are disturbingly common and often unrecognized until the team is deep in operational pain.

The Distributed Monolith

The distributed monolith looks like microservices from the outside – separate repositories, separate deployments, separate CI pipelines – but behaves like a monolith at runtime. Services cannot be deployed independently because they are tightly coupled.

Temporal Workflow Example: Container Lifecycle Management with Docker

Container Lifecycle Workflow

This article builds a complete Temporal workflow that manages Docker container lifecycle operations: inspect a container, stop it if running, create a snapshot (commit), and handle failures by restarting the container. It demonstrates every pattern from Multi-Stage Temporal Workflows in a concrete, runnable example.

The full source is in the companion repo under container-lifecycle/.

The Use Case

You need to automate container maintenance: take a snapshot of a running container for backup or migration purposes. The sequence is:

Temporal Signals: Human-in-the-Loop and Manual Approval Workflows

Temporal Signals

Workflows often need input after they have started. A deployment workflow pauses for human approval. An expense workflow waits for a manager’s signature. An incident response workflow escalates after a timeout. Temporal signals are the mechanism for delivering external input to a running workflow.

A signal is a message sent to a workflow from outside – from another workflow, from a CLI command, from an HTTP endpoint, or from any system that has the Temporal client. The workflow receives the signal, processes it, and continues execution. Signals are durable: the Temporal service records each signal in the workflow’s event history, so if the worker crashes after a signal is sent but before the workflow processes it, the signal is delivered when the worker restarts and replays the workflow.
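The wait-for-approval-or-escalate shape can be approximated with an in-memory queue. This sketch uses only `asyncio` and is not the Temporal API: `ApprovalWorkflow`, its `signal` method, and `demo` are hypothetical, and the queue is not durable, which is exactly the property Temporal adds.

```python
import asyncio


class ApprovalWorkflow:
    """In-memory stand-in for a workflow that blocks on an approval signal."""

    def __init__(self):
        self._signals = asyncio.Queue()  # buffers signals until the workflow reads them
        self.status = "pending"

    def signal(self, payload: dict) -> None:
        # Called from outside the workflow; never blocks the sender.
        self._signals.put_nowait(payload)

    async def run(self, timeout_seconds: float) -> str:
        try:
            payload = await asyncio.wait_for(self._signals.get(), timeout_seconds)
        except asyncio.TimeoutError:
            # Nobody answered in time: escalate instead of waiting forever.
            self.status = "escalated"
            return self.status
        self.status = "approved" if payload.get("approved") else "rejected"
        return self.status


async def demo(send_signal: bool, timeout_seconds: float = 0.05) -> str:
    workflow = ApprovalWorkflow()
    if send_signal:
        # The signal arrives before the workflow starts waiting and is still delivered.
        workflow.signal({"approved": True})
    return await workflow.run(timeout_seconds)
```

Because the signal is buffered, the sender does not need to know whether the workflow has reached its wait point yet.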

Temporal Signals for Automated Coordination: Locking, Blocking, and Cross-Workflow Communication

Temporal Signals for Automated Coordination

In Temporal Signals for Manual Interaction, you learned how external systems and humans send signals to running workflows. Signals are not limited to human input. They are a general-purpose communication channel between workflows, and they become powerful coordination primitives when workflows signal each other programmatically.

This article covers automated signal patterns: cross-workflow signaling, distributed mutexes built on signals, blocking semantics, and the anti-patterns that will burn you.
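One way to picture a signal-based mutex is as a small state machine that receives "acquire" and "release" signals and answers with a "lock-granted" signal. The sketch below models only that state machine in plain Python. `MutexWorkflow`, its handler names, the `"lock-granted"` signal name, and the `send_signal` callback are all assumptions for illustration, not Temporal SDK calls.

```python
from collections import deque
from typing import Callable, Optional


class MutexWorkflow:
    """Grants a lock to one requester at a time, in arrival order."""

    def __init__(self, send_signal: Callable[[str, str], None]):
        self._send = send_signal   # callback(workflow_id, signal_name)
        self._waiters = deque()    # requesters blocked behind the current holder
        self.holder: Optional[str] = None

    def on_acquire(self, requester_id: str) -> None:
        if self.holder is None:
            self.holder = requester_id
            self._send(requester_id, "lock-granted")
        else:
            self._waiters.append(requester_id)

    def on_release(self, requester_id: str) -> None:
        if requester_id != self.holder:
            return  # ignore stale or duplicate releases from non-holders
        self.holder = self._waiters.popleft() if self._waiters else None
        if self.holder is not None:
            self._send(self.holder, "lock-granted")
```

A requesting workflow would send "acquire", block until it receives "lock-granted", do its work, and then send "release". A production version would also need a lease timeout so that a crashed holder cannot block the queue indefinitely.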

Platform Engineering Maturity Model

Why a Maturity Model

Platform engineering investments fail when organizations skip levels. A team that cannot maintain shared Terraform modules reliably has no business building a self-service portal. The maturity model provides an honest assessment of where you are and what must be true before advancing.

This is not a five-year roadmap. Some organizations reach Level 2 and stay there because it serves their needs. The model helps you identify what level you need, what level you are at, and what is blocking progress.