Contract Testing for Microservices

Contract Testing for Microservices#

Integration tests verify that services work together, but they require all services to be running. In a system with 20 services, running full integration tests is slow, brittle, and blocks deployments. Contract testing solves this by verifying service interactions independently – each service tests against a contract instead of against a live instance of every dependency.

The Problem with Integration and E2E Tests#

A traditional integration test for “order service calls payment service” requires both services running, along with their databases, message brokers, and any other dependencies. Problems:

Service Catalog Management and Design

Why a Service Catalog Exists#

A service catalog answers: “What do we have, who owns it, and what state is it in?” Without one, this information lives in tribal knowledge and stale wiki pages. When an incident hits at 3 AM, the on-call engineer needs to know who owns the failing service, what it depends on, and where to find the runbook. The catalog provides this in seconds.

The catalog is also the foundation for other platform capabilities. Golden paths register outputs in it. Scorecards evaluate catalog entities. Self-service workflows provision resources linked to catalog entries.

SLO Practical Implementation Guide

Sre

From Theory to Running SLOs#

Every SRE resource explains what SLOs are. Few explain how to actually implement them from scratch – the Prometheus queries, the error budget math, the alerting rules, and the conversations with product managers when the budget runs out. This guide covers all of it.

Step 1: Choose Your SLIs#

SLIs must measure what users experience. Internal metrics like CPU usage or queue depth are useful for debugging but are not SLIs because users do not care about your CPU – they care whether the page loaded.

AWS CodePipeline and CodeBuild: Pipeline Structure, ECR Integration, ECS/EKS Deployments, and Cross-Account Patterns

AWS CodePipeline and CodeBuild#

AWS CodePipeline orchestrates CI/CD workflows as a series of stages. CodeBuild executes the actual build and test commands. Together they provide a fully managed pipeline that integrates natively with S3, ECR, ECS, EKS, Lambda, and CloudFormation. No servers to manage, no agents to maintain – but the trade-off is less flexibility than self-hosted systems and tighter coupling to the AWS ecosystem.

Pipeline Structure#

A CodePipeline has stages, and each stage has actions. Actions can run in parallel or sequentially within a stage. The most common pattern is Source -> Build -> Deploy:

Developer Self-Service Workflows

The Cost of Not Having Self-Service#

A developer needs a PostgreSQL database. They file a ticket. It sits in a backlog for two days. A DBA provisions it, sends credentials via Slack DM. Elapsed time: 3 days. Actual need: 5 minutes of configuration. Multiply across every database, cache, queue, and namespace, and manual provisioning becomes the single largest drag on velocity. Self-service lets developers provision pre-approved resources directly, within guardrails the platform team defines.

Schema Evolution and Compatibility

Schema Evolution and Compatibility#

Every service contract changes over time. New fields get added, old fields get removed, types change. In a monolith, you update the schema and redeploy. In microservices, producers and consumers deploy independently. A schema change that breaks consumers causes production failures. Schema evolution rules and tooling exist to prevent this.

Compatibility Modes#

There are four compatibility modes. Understanding them is essential for operating any schema registry.

Game Day and Tabletop Exercise Planning

Sre

Why Run Exercises#

Runbooks that have never been tested are fiction. Failover procedures that have never been executed are hopes. Game days and tabletop exercises convert assumptions about system resilience into verified facts – or reveal that those assumptions were wrong before a real incident does.

The value is not just finding technical gaps. Exercises expose process gaps: unclear escalation paths, missing permissions, outdated contact lists, communication breakdowns between teams. These are invisible until a simulated failure forces people to actually follow the documented procedure.

Distributed Data Consistency Patterns

Distributed Data Consistency Patterns#

In a monolith with a single database, you get ACID transactions. In microservices, each service owns its database. Cross-service consistency requires explicit patterns because distributed ACID transactions are impractical in most real systems.

CAP Theorem: Practical Implications#

The CAP theorem states that a distributed system can provide at most two of three guarantees: Consistency, Availability, and Partition tolerance. Since network partitions are inevitable, you choose between consistency and availability during partitions.

Reliability Review Process

Sre

Why Regular Reviews Matter#

Reliability does not improve by accident. Without a structured review cadence, teams operate on vibes – “things feel okay” or “we’ve been having a lot of incidents lately.” Reliability reviews replace gut feelings with data. They surface slow-burning problems before they become outages, hold teams accountable for improvement actions, and create a shared understanding of system health across engineering and leadership.

Weekly Reliability Review#

The weekly review is a 30-minute tactical meeting focused on what happened this week and what needs attention next week. Attendees: on-call engineers, team leads, SRE. Keep it tight.

Service Decomposition Anti-Patterns

Service Decomposition Anti-Patterns#

Splitting a monolith into microservices is a common architectural goal. But bad decomposition creates systems that are harder to operate than the monolith they replaced. These anti-patterns are disturbingly common and often unrecognized until the team is deep in operational pain.

The Distributed Monolith#

The distributed monolith looks like microservices from the outside – separate repositories, separate deployments, separate CI pipelines – but behaves like a monolith at runtime. Services cannot be deployed independently because they are tightly coupled.