Builder Pool Naming: The (role, tier, replica) Coordinate Decouples Identity From Model

Builder Pool Naming: The (role, tier, replica) Coordinate#

Naming agent pools after the model they run today (kimi-N, deepseek-N, flash-N, lite-N) felt natural when each pool ran one model. It stopped feeling natural the third time a pool’s model churned — when the lite-tier swapped through qwen → gemma → gemini in six weeks and every rename cascaded through K8s manifests, secret names, MM bot accounts, Gitea identities, and helm values. The fix was to make pool names model-independent: builder-lite-0 runs whatever model the pool config says it runs today.

Platform Team Structure and Operating Model

Why the Operating Model Matters#

The platform team’s operating model determines whether the platform becomes a force multiplier or a bottleneck. A ticket-driven, gatekeeper-oriented team produces a platform developers route around. A product-oriented, self-service team produces a platform developers adopt voluntarily. Organizational structure shapes developer experience more than technology choices.

Team Topologies and Interaction Modes#

The Team Topologies framework (Skelton & Pais) defines four team types relevant to platform engineering:

Service Catalog Management and Design

Why a Service Catalog Exists#

A service catalog answers: “What do we have, who owns it, and what state is it in?” Without one, this information lives in tribal knowledge and stale wiki pages. When an incident hits at 3 AM, the on-call engineer needs to know who owns the failing service, what it depends on, and where to find the runbook. The catalog provides this in seconds.

The catalog is also the foundation for other platform capabilities. Golden paths register outputs in it. Scorecards evaluate catalog entities. Self-service workflows provision resources linked to catalog entries.

Service Decomposition Anti-Patterns

Service Decomposition Anti-Patterns#

Splitting a monolith into microservices is a common architectural goal. But bad decomposition creates systems that are harder to operate than the monolith they replaced. These anti-patterns are disturbingly common and often unrecognized until the team is deep in operational pain.

The Distributed Monolith#

The distributed monolith looks like microservices from the outside – separate repositories, separate deployments, separate CI pipelines – but behaves like a monolith at runtime. Services cannot be deployed independently because they are tightly coupled.

Crossplane for Platform Abstractions

What Crossplane Does#

Crossplane extends Kubernetes to provision and manage cloud infrastructure using the Kubernetes API. Instead of writing Terraform and running apply, you write Kubernetes manifests and kubectl apply them. Crossplane controllers reconcile the desired state with the actual cloud resources.

The real value is not replacing Terraform — it is building abstractions. Platform teams define custom resource types (like DatabaseClaim) that developers consume without knowing whether they are getting RDS, CloudSQL, or Azure Database. The composition layer maps the simple claim to the actual cloud resources.

Port vs Backstage: Developer Portal Comparison

Two Approaches to the Same Problem#

Both Port and Backstage solve the same core problem: giving developers a single interface to discover services, provision infrastructure, and understand the operational state of their systems. They take fundamentally different approaches to getting there.

Backstage is an open-source framework (CNCF Incubating) originally built by Spotify. You deploy and operate it yourself. It provides a plugin architecture and core primitives — you build the portal your organization needs by assembling and configuring plugins.

Self-Hosted CI Runners at Scale: GitHub Actions Runner Controller, GitLab Runners on K8s, and Autoscaling

Self-Hosted CI Runners at Scale#

GitHub-hosted and GitLab SaaS runners work until they do not. You hit limits when you need private network access to deploy to internal infrastructure, specific hardware like GPUs or ARM64 machines, compliance requirements that prohibit running code on shared infrastructure, or cost control when you are burning thousands of dollars per month on hosted runner minutes.

Self-hosted runners solve these problems but introduce operational complexity: you now own runner provisioning, scaling, security, image updates, and cost management.

An Autonomous PR-to-Deploy Loop: CI Gate, Dual Approval, Auto-Merge, Versioned Deploy

An Autonomous PR-to-Deploy Loop#

The goal: a contributor (human or agent) opens a PR; if it passes CI and gets the required approvals, it merges and deploys itself with no human clicking buttons. The loop:

PR → CI gate (required status) → N approvals → auto-merge → auto-tag → build image:<tag> → deploy (pin tag)

This is buildable on plain Jenkins/Gitea/Kubernetes (or GitHub/Actions/Argo equivalents). The pieces are independent; wire them in order.

Heterogeneous A/B/C/D Pool Dispatch: Real Model Comparison Without an Eval Harness

You need to know whether model-X is worth deploying for your real workload. The benchmarks suggest yes, but benchmarks are static and your workload is not. The standard answer — build an eval harness — runs into two structural problems: harnesses are expensive to build well, and they tend to over-fit to the inputs you remembered to include in the corpus, missing the real production failure modes you discover only later.

Cloud-Native vs Portable Infrastructure: A Decision Framework

Cloud-Native vs Portable Infrastructure#

Every infrastructure decision sits on a spectrum between portability and fidelity. On one end, you have generic Kubernetes running on minikube or kind – it works everywhere, costs nothing, and captures the behavior of the Kubernetes API itself. On the other end, you have cloud-native managed services – EKS with IRSA and ALB Ingress Controller, GKE with Workload Identity and Cloud Load Balancing, AKS with Azure AD Pod Identity and Azure Load Balancer. These capture the behavior of the actual platform your workloads will run on.