Platform Team Structure and Operating Model

Why the Operating Model Matters#

The platform team’s operating model determines whether the platform becomes a force multiplier or a bottleneck. A ticket-driven, gatekeeper-oriented team produces a platform developers route around. A product-oriented, self-service team produces a platform developers adopt voluntarily. Organizational structure shapes developer experience more than technology choices.

Team Topologies and Interaction Modes#

The Team Topologies framework (Skelton & Pais) defines four team types relevant to platform engineering:

Service Catalog Management and Design

Why a Service Catalog Exists#

A service catalog answers: “What do we have, who owns it, and what state is it in?” Without one, this information lives in tribal knowledge and stale wiki pages. When an incident hits at 3 AM, the on-call engineer needs to know who owns the failing service, what it depends on, and where to find the runbook. The catalog provides this in seconds.

The catalog is also the foundation for other platform capabilities. Golden paths register outputs in it. Scorecards evaluate catalog entities. Self-service workflows provision resources linked to catalog entries.

Service Decomposition Anti-Patterns

Service Decomposition Anti-Patterns#

Splitting a monolith into microservices is a common architectural goal. But bad decomposition creates systems that are harder to operate than the monolith they replaced. These anti-patterns are disturbingly common and often unrecognized until the team is deep in operational pain.

The Distributed Monolith#

The distributed monolith looks like microservices from the outside – separate repositories, separate deployments, separate CI pipelines – but behaves like a monolith at runtime. Services cannot be deployed independently because they are tightly coupled.

Crossplane for Platform Abstractions

What Crossplane Does#

Crossplane extends Kubernetes to provision and manage cloud infrastructure using the Kubernetes API. Instead of writing Terraform and running apply, you write Kubernetes manifests and kubectl apply them. Crossplane controllers reconcile the desired state with the actual cloud resources.

The real value is not replacing Terraform — it is building abstractions. Platform teams define custom resource types (like DatabaseClaim) that developers consume without knowing whether they are getting RDS, CloudSQL, or Azure Database. The composition layer maps the simple claim to the actual cloud resources.

Port vs Backstage: Developer Portal Comparison

Two Approaches to the Same Problem#

Both Port and Backstage solve the same core problem: giving developers a single interface to discover services, provision infrastructure, and understand the operational state of their systems. They take fundamentally different approaches to getting there.

Backstage is an open-source framework (CNCF Incubating) originally built by Spotify. You deploy and operate it yourself. It provides a plugin architecture and core primitives — you build the portal your organization needs by assembling and configuring plugins.

Self-Hosted CI Runners at Scale: GitHub Actions Runner Controller, GitLab Runners on K8s, and Autoscaling

Self-Hosted CI Runners at Scale#

GitHub-hosted and GitLab SaaS runners work until they do not. You hit limits when you need private network access to deploy to internal infrastructure, specific hardware like GPUs or ARM64 machines, compliance requirements that prohibit running code on shared infrastructure, or cost control when you are burning thousands of dollars per month on hosted runner minutes.

Self-hosted runners solve these problems but introduce operational complexity: you now own runner provisioning, scaling, security, image updates, and cost management.

Cloud-Native vs Portable Infrastructure: A Decision Framework

Cloud-Native vs Portable Infrastructure#

Every infrastructure decision sits on a spectrum between portability and fidelity. On one end, you have generic Kubernetes running on minikube or kind – it works everywhere, costs nothing, and captures the behavior of the Kubernetes API itself. On the other end, you have cloud-native managed services – EKS with IRSA and ALB Ingress Controller, GKE with Workload Identity and Cloud Load Balancing, AKS with Azure AD Pod Identity and Azure Load Balancer. These capture the behavior of the actual platform your workloads will run on.

Data Classification and Handling: Labeling, Encryption Tiers, Retention Policies, and DLP Patterns

Data Classification and Handling#

Data classification assigns sensitivity levels to data and maps those levels to specific handling requirements — who can access it, how it is encrypted, where it can be stored, how long it is retained, and how it is disposed of. Without classification, every piece of data gets the same (usually insufficient) protection, or security is applied inconsistently based on individual judgment.

Defining Classification Tiers#

Most organizations need four tiers. Fewer leads to overly broad categories. More leads to confusion about which tier applies.

Data Sovereignty and Residency: Jurisdictional Requirements, GDPR, and Multi-Region Architecture

Data Sovereignty and Residency#

Data sovereignty is the principle that data is subject to the laws of the country where it is stored or processed. Data residency is the requirement to keep data within a specific geographic boundary. These are not abstract legal concepts — they dictate where you deploy infrastructure, how you replicate data, and what services you can use.

Get this wrong and the consequences are regulatory fines, contract violations, and loss of customer trust. GDPR fines alone have exceeded billions of euros since enforcement began.

Designing Internal Developer Platforms

What an Internal Developer Platform Actually Is#

An Internal Developer Platform (IDP) is the set of tools, workflows, and self-service capabilities that a platform team builds and maintains so application developers can ship code without filing tickets or waiting on other teams. It is not a single product. It is a curated layer on top of your existing infrastructure that abstracts complexity while preserving the ability to go deeper when needed.