The Five-Agent Research Pattern: Surveying a New LLM Provider Before You Tune It

The Five-Agent Research Pattern#

Adopting a new LLM provider for a coding-agent role looks easy from the docs. Read the model card, copy the partner adapter’s defaults, ship. A week later you find out the provider rejects tool_choice=required in thinking mode, the docs lied about reasoning_content echoing, and your retry loop multiplies the per-turn timeout by 3x because the rate-limit response isn’t JSON.

The docs miss what was patched after release. The community catches what the docs miss. Partner adapters encode lived defaults nobody published. Your own adapter has bugs you can’t see from inside it. Reading any one of these in isolation gets you to “I think I understand this provider.” Reading all five in parallel gets you a knob list, an open-contradictions list, and a list of bugs to fix before the matrix runs. The pattern: spawn 5 parallel research sub-agents, one per angle, then synthesize.

Buildkite Pipeline Patterns: Dynamic Pipelines, Agents, Plugins, and Parallel Builds

Buildkite Pipeline Patterns#

Buildkite splits CI/CD into two parts: a hosted web service that manages pipelines, builds, and the UI, and self-hosted agents that execute the actual work. This architecture means your code, secrets, and build artifacts never touch Buildkite’s infrastructure. The agents run on your machines – EC2 instances, Kubernetes pods, bare metal, laptops.

Why Teams Choose Buildkite#

The question usually comes up against Jenkins and GitHub Actions.

Over Jenkins: Buildkite eliminates the Jenkins controller as a single point of failure. There is no plugin compatibility hell, no Groovy DSL, no Java memory tuning. Agents are stateless binaries that poll for work. Scaling is adding more agents. Jenkins requires careful capacity planning of the controller itself.

CQRS and Event Sourcing

CQRS and Event Sourcing#

CQRS (Command Query Responsibility Segregation) and event sourcing are frequently discussed together but are independent patterns. You can use either one without the other. Understanding each separately is essential before combining them.

CQRS: Separate Read and Write Models#

Most applications use the same data model for reads and writes. The same orders table serves the API that creates orders and the API that lists them. This works until read and write requirements diverge significantly.

Platform Team Structure and Operating Model

Why the Operating Model Matters#

The platform team’s operating model determines whether the platform becomes a force multiplier or a bottleneck. A ticket-driven, gatekeeper-oriented team produces a platform developers route around. A product-oriented, self-service team produces a platform developers adopt voluntarily. Organizational structure shapes developer experience more than technology choices.

Team Topologies and Interaction Modes#

The Team Topologies framework (Skelton & Pais) defines four team types relevant to platform engineering:

Production Readiness Reviews

Why Services Need a Gate Before Production#

Every production outage caused by a service that launched without monitoring, without runbooks, without capacity planning, without anyone knowing who owns it at 3 AM – every one of those was preventable. A production readiness review is the gate between “it works on my machine” and “it is ready for real users.” Google formalized this as the PRR process. You do not need Google-scale infrastructure to benefit from it.

Azure DevOps Pipelines: YAML Pipelines, Templates, Service Connections, and AKS Integration

Azure DevOps Pipelines#

Azure DevOps Pipelines uses YAML files stored in your repository to define build and deployment workflows. The pipeline model has three levels: stages contain jobs, jobs contain steps. This hierarchy maps directly to how you think about CI/CD – build stage, test stage, deploy-to-staging stage, deploy-to-production stage – with each stage containing one or more parallel jobs.

Pipeline Structure#

A complete pipeline in azure-pipelines.yml:

trigger:
  branches:
    include:
      - main
      - release/*
  paths:
    exclude:
      - docs/**
      - README.md

pool:
  vmImage: 'ubuntu-latest'

variables:
  - group: common-vars
  - name: buildConfiguration
    value: 'Release'

stages:
  - stage: Build
    jobs:
      - job: BuildApp
        steps:
          - task: GoTool@0
            inputs:
              version: '1.22'
          - script: |
              go build -o $(Build.ArtifactStagingDirectory)/myapp ./cmd/myapp
            displayName: 'Build binary'
          - publish: $(Build.ArtifactStagingDirectory)
            artifact: drop

  - stage: Test
    dependsOn: Build
    jobs:
      - job: UnitTests
        steps:
          - task: GoTool@0
            inputs:
              version: '1.22'
          - script: go test ./... -v -coverprofile=coverage.out
            displayName: 'Run tests'
          - task: PublishCodeCoverageResults@2
            inputs:
              summaryFileLocation: coverage.out
              codecoverageTool: 'Cobertura'

  - stage: DeployStaging
    dependsOn: Test
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
    jobs:
      - deployment: DeployToStaging
        environment: staging
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: drop
                - script: echo "Deploying to staging"

trigger controls which branches and paths trigger the pipeline. dependsOn creates stage ordering. condition adds logic – succeeded() checks the previous stage passed, and you can combine it with variable checks to restrict certain stages to specific branches.

Contract Testing for Microservices

Contract Testing for Microservices#

Integration tests verify that services work together, but they require all services to be running. In a system with 20 services, running full integration tests is slow, brittle, and blocks deployments. Contract testing solves this by verifying service interactions independently – each service tests against a contract instead of against a live instance of every dependency.

The Problem with Integration and E2E Tests#

A traditional integration test for “order service calls payment service” requires both services running, along with their databases, message brokers, and any other dependencies. Problems:

Service Catalog Management and Design

Why a Service Catalog Exists#

A service catalog answers: “What do we have, who owns it, and what state is it in?” Without one, this information lives in tribal knowledge and stale wiki pages. When an incident hits at 3 AM, the on-call engineer needs to know who owns the failing service, what it depends on, and where to find the runbook. The catalog provides this in seconds.

The catalog is also the foundation for other platform capabilities. Golden paths register outputs in it. Scorecards evaluate catalog entities. Self-service workflows provision resources linked to catalog entries.

SLO Practical Implementation Guide

From Theory to Running SLOs#

Every SRE resource explains what SLOs are. Few explain how to actually implement them from scratch – the Prometheus queries, the error budget math, the alerting rules, and the conversations with product managers when the budget runs out. This guide covers all of it.

Step 1: Choose Your SLIs#

SLIs must measure what users experience. Internal metrics like CPU usage or queue depth are useful for debugging but are not SLIs because users do not care about your CPU – they care whether the page loaded.

AWS CodePipeline and CodeBuild: Pipeline Structure, ECR Integration, ECS/EKS Deployments, and Cross-Account Patterns

AWS CodePipeline and CodeBuild#

AWS CodePipeline orchestrates CI/CD workflows as a series of stages. CodeBuild executes the actual build and test commands. Together they provide a fully managed pipeline that integrates natively with S3, ECR, ECS, EKS, Lambda, and CloudFormation. No servers to manage, no agents to maintain – but the trade-off is less flexibility than self-hosted systems and tighter coupling to the AWS ecosystem.

Pipeline Structure#

A CodePipeline has stages, and each stage has actions. Actions can run in parallel or sequentially within a stage. The most common pattern is Source -> Build -> Deploy: