MCP Server Development: Building Servers from Scratch

MCP Server Development#

This reference covers building MCP servers from scratch – the server lifecycle, defining tools with proper JSON Schema, exposing resources, choosing transports, handling errors, and testing the result. If you want to understand when to use MCP versus alternatives, see the companion article on MCP Server Patterns. This article focuses on how to build one.

Server Lifecycle#

An MCP server goes through four phases: initialization, capability negotiation, operation, and shutdown.

Monolith to Microservices: When and How to Decompose

Monolith to Microservices#

The decision to break a monolith into microservices is one of the most consequential architectural choices a team makes. Get it right and you unlock independent deployment, team autonomy, and targeted scaling. Get it wrong and you trade a manageable monolith for a distributed monolith – all the complexity of microservices with none of the benefits.

When to Stay with a Monolith#

Microservices are not an upgrade from monoliths. They are a different set of tradeoffs. A well-structured monolith is the right choice in many situations.

Multi-Account Cloud Architecture with Terraform: AWS Organizations, Azure Management Groups, and GCP Organizations

Multi-Account Cloud Architecture with Terraform#

Single-account cloud deployments work for learning and prototypes. Production systems need multiple accounts (AWS), subscriptions (Azure), or projects (GCP) for isolation — security boundaries, blast radius control, billing separation, and compliance requirements.

Terraform manages multi-account architectures well, but the patterns differ significantly from single-account work. Provider configuration, state isolation, cross-account references, and IAM trust relationships all need explicit design.

Why Multiple Accounts#

ReasonSingle Account ProblemMulti-Account Solution
Blast radiusMisconfigured IAM affects everythingDamage limited to one account
BillingCannot attribute costs to teamsPer-account billing and budgets
CompliancePCI data mixed with dev workloadsSeparate accounts for regulated workloads
Service limitsVPC limit of 5 per region sharedEach account has its own limits
Access controlComplex IAM policies to isolate teamsAccount boundary is the strongest isolation
TestingDev resources can affect productionImpossible for dev to touch prod resources

AWS Organizations#

Organization Structure#

Organization Root
├── Core OU
│   ├── Management Account (billing, org management)
│   ├── Security Account (GuardDuty, SecurityHub, audit logs)
│   └── Networking Account (Transit Gateway, shared VPCs)
├── Workload OU
│   ├── Production OU
│   │   ├── App-A Production Account
│   │   └── App-B Production Account
│   └── Non-Production OU
│       ├── App-A Development Account
│       └── App-A Staging Account
└── Sandbox OU
    └── Developer Sandbox Accounts

Terraform for AWS Organizations#

resource "aws_organizations_organization" "main" {
  feature_set = "ALL"

  enabled_policy_types = [
    "SERVICE_CONTROL_POLICY",
    "TAG_POLICY",
  ]
}

resource "aws_organizations_organizational_unit" "core" {
  name      = "Core"
  parent_id = aws_organizations_organization.main.roots[0].id
}

resource "aws_organizations_organizational_unit" "workloads" {
  name      = "Workloads"
  parent_id = aws_organizations_organization.main.roots[0].id
}

resource "aws_organizations_organizational_unit" "production" {
  name      = "Production"
  parent_id = aws_organizations_organizational_unit.workloads.id
}

# Create a workload account
resource "aws_organizations_account" "app_production" {
  name      = "app-a-production"
  email     = "aws+app-a-prod@example.com"
  parent_id = aws_organizations_organizational_unit.production.id
  role_name = "OrganizationAccountAccessRole"  # cross-account admin role

  lifecycle {
    prevent_destroy = true  # accounts cannot be easily recreated
  }
}

Service Control Policies (SCPs)#

SCPs set permission boundaries for entire OUs:

Multi-Agent Coordination: Patterns for Dividing and Conquering Infrastructure Tasks

Multi-Agent Coordination#

A single agent can read files, call APIs, and reason about results. But some tasks are too broad, too slow, or too dangerous for one agent to handle alone. Debugging a production outage might require one agent analyzing logs, another checking infrastructure state, and a third reviewing recent deployments – simultaneously. Multi-agent coordination is how you split work across agents without them stepping on each other.

The hard part is not spawning multiple agents. The hard part is deciding which coordination pattern fits the task, how agents share information, and what happens when they disagree.

Multi-Cloud Networking Patterns

Multi-Cloud Networking Patterns#

Multi-cloud networking connects workloads across two or more cloud providers into a coherent network. The motivations vary – vendor redundancy, best-of-breed service selection, regulatory requirements – but the challenges are the same: private connectivity between isolated networks, consistent service discovery, and traffic routing that handles failures.

VPN Tunnels Between Clouds#

IPsec VPN tunnels are the simplest way to connect two cloud networks. Each provider offers managed VPN gateways that terminate IPsec tunnels, encrypting traffic between VPCs without exposing it to the public internet.

Multi-Cloud vs Single-Cloud Strategy Decisions

Multi-Cloud vs Single-Cloud Strategy#

Multi-cloud is one of the most oversold strategies in infrastructure. Vendors, consultants, and conference speakers promote it as the default approach, but the reality is that most organizations are better served by a single cloud provider used well. This framework helps you determine whether multi-cloud is actually worth the cost for your situation.

The Default Answer Is Single-Cloud#

Start with single-cloud unless you have a specific, concrete reason to go multi-cloud. Here is why.

Multi-Tenancy Patterns: Namespace Isolation, vCluster, and Dedicated Clusters

Multi-Tenancy Patterns: Namespace Isolation, vCluster, and Dedicated Clusters#

Multi-tenancy in Kubernetes means running workloads for multiple teams, customers, or environments on shared infrastructure. The core tension is always the same: sharing reduces cost, but isolation prevents blast radius. Choosing the wrong model creates security gaps or wastes money. This guide provides a framework for selecting the right approach and implementing it correctly.

The Three Models#

Every Kubernetes multi-tenancy approach falls into one of three categories, each with different isolation guarantees:

Post-Mortem Action Item Tracking

The Action Item Problem#

Post-mortem reviews produce action items. Teams agree on what needs to change. Then weeks pass, priorities shift, and items quietly decay into a backlog nobody checks. The next incident hits the same root cause, and the post-mortem produces the same action items again.

Studies of recurring incidents consistently show the root cause was identified in a previous post-mortem, and the corresponding action item was never completed. Action item tracking is the mechanism by which incidents make systems more reliable instead of just more documented.

Rate Limiting Implementation Patterns

Rate Limiting Implementation Patterns#

Rate limiting controls how many requests a client can make within a time period. It protects services from overload, ensures fair usage across clients, prevents abuse, and provides a mechanism for graceful degradation under load. Every production API needs rate limiting at some layer.

Algorithm Comparison#

Fixed Window#

The simplest algorithm. Divide time into fixed windows (e.g., 1-minute intervals) and count requests per window. When the count exceeds the limit, reject requests until the next window starts.

Refactoring Terraform: When and How to Restructure Growing Infrastructure Code

Refactoring Terraform#

Terraform configurations grow organically. A project starts with 10 resources in one directory. Six months later it has 80 resources, 3 levels of modules, and a state file that takes 2 minutes to plan. Changes feel risky because everything is interconnected. New team members (or agents) cannot understand the structure without reading every file.

Refactoring addresses this — but Terraform refactoring is harder than code refactoring because the state file maps resource addresses to real infrastructure. Rename a resource and Terraform thinks you want to destroy the old one and create a new one. Move a resource into a module and Terraform plans to recreate it. Every structural change requires corresponding state manipulation.