Terraform Core Concepts and Workflow

Providers, Resources, and Data Sources#

Terraform has three core object types. Providers are plugins that talk to APIs (AWS, Azure, GCP, Kubernetes, GitHub). Resources are the things you create and manage. Data sources read existing objects without managing them.

# providers.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  required_version = ">= 1.5.0"
}

provider "aws" {
  region = var.region
}
# A resource Terraform creates and manages
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
  tags = { Name = "main-vpc" }
}

# A data source that reads an existing AMI
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = var.instance_type
  subnet_id     = aws_vpc.main.id
}

Resources create, update, and delete. Data sources only read. If you need information about something Terraform does not manage, use a data source.

Terraform Modules: Structure, Composition, and Reuse

What Modules Are#

A Terraform module is a directory containing .tf files. Every Terraform configuration is already a module (the “root module”). When you call another module from your root module, that is a “child module.” Modules let you encapsulate a set of resources behind a clean interface of input variables and outputs.

Module Structure#

A well-organized module looks like this:

modules/vpc/
  main.tf           # resource definitions
  variables.tf      # input variables
  outputs.tf        # output values
  versions.tf       # required providers and terraform version
  README.md         # usage documentation

The module itself has no backend, no provider configuration, and no hardcoded values. Everything configurable comes in through variables. Everything downstream consumers need comes out through outputs.

Terraform Safety for Agents: Plans, Applies, and the Human Approval Gate

Terraform Safety for Agents#

Terraform is the most dangerous tool most agents have access to. A single terraform apply can create, modify, or destroy real infrastructure — databases with production data, networking that carries live traffic, security groups that protect running services. There is no undo button. terraform destroy is not an undo — it is a different destructive action.

This article defines the safety protocols agents must follow when working with Terraform: what to check before every plan, how to read plan output for danger, how to present plans to humans, when to apply vs when to stop, and how to handle state conflicts.

Testing Infrastructure Code: The Validation Pyramid from Lint to Integration

Testing Infrastructure Code#

Infrastructure code has a unique testing challenge: the thing you are testing is expensive to instantiate. You cannot spin up a VPC, an RDS instance, and an EKS cluster for every pull request and tear it down 5 minutes later without significant cost and time. But you also cannot ship untested infrastructure changes to production without risk.

The solution is the same as in software engineering: a testing pyramid. Fast, cheap tests at the bottom catch most errors. Slower, expensive tests at the top catch the rest. The key is knowing what to test at which level.

Validation Path Selection: Choosing the Right Approach for Infrastructure Testing

Validation Path Selection#

Not every infrastructure change needs a full Kubernetes cluster to validate. Some changes can be verified with a linter in under a second. Others genuinely need a multi-node cluster with ingress, persistent volumes, and network policies. The cost of choosing wrong is real in both directions: too little validation lets broken configs reach production, while too much wastes minutes or hours on environments you did not need.

Choosing a Database Strategy: On Kubernetes vs Managed Service, and PostgreSQL vs MySQL vs CockroachDB

Choosing a Database Strategy#

Every Kubernetes-based platform eventually faces two questions: should the database run inside the cluster or as a managed service, and which database engine fits the workload? These decisions are difficult to reverse. A database migration is one of the highest-risk operations in production. Getting the initial decision roughly right saves months of future pain.

Where to Run: Kubernetes vs Managed Service#

This is not a technology question. It is an organizational question about who owns database operations and what tradeoffs the team will accept.

Choosing an Infrastructure as Code Tool: Terraform vs Pulumi vs CloudFormation/Bicep vs Crossplane

Choosing an Infrastructure as Code Tool#

Infrastructure as Code tools differ in language, state management, provider ecosystem, and operational model. The choice affects how your team writes, reviews, tests, and maintains infrastructure definitions for years. Switching IaC tools mid-project is possible but expensive – it typically means rewriting all definitions and carefully importing existing resources into the new tool’s state.

Decision Criteria#

Before comparing tools, establish what matters to your organization:

Cloud Networking Fundamentals: VPCs, Subnets, Security Groups, and Connectivity

VPC Concepts#

A Virtual Private Cloud is an isolated virtual network inside a cloud provider. Every resource you launch – EC2 instances, RDS databases, Lambda functions with VPC access – lives inside a VPC. The VPC defines an IP address range using CIDR notation, and all resources within it get addresses from that range.

The most common mistake is giving every VPC a /16 (65,536 addresses). This wastes IP space and causes problems later when you need to peer VPCs – overlapping CIDR blocks cannot be peered. Plan your IP allocation before building anything.

EKS vs AKS vs GKE: Choosing a Managed Kubernetes Provider

EKS vs AKS vs GKE: Choosing a Managed Kubernetes Provider#

All three major managed Kubernetes services run certified, conformant Kubernetes. The differences lie in networking models, identity integration, node management, upgrade experience, cost, and ecosystem strengths. Your choice should be driven by where the rest of your infrastructure lives, your team’s existing expertise, and specific feature requirements.

Feature Comparison#

Control Plane#

GKE has the most polished upgrade experience. Release channels (Rapid, Regular, Stable) provide automatic upgrades with configurable maintenance windows. Surge upgrades handle node pools with minimal disruption. Google invented Kubernetes, and GKE reflects that pedigree in control plane operations.

Load Balancer Patterns: L4 vs L7, Health Checks, Session Affinity, and Cloud LB Selection

L4 vs L7 Load Balancing#

The distinction between Layer 4 and Layer 7 load balancing determines what the load balancer can see and what routing decisions it can make.

Layer 4 (Transport) load balancers work at the TCP/UDP level. They see source/destination IPs and ports but not the content of the traffic. They forward raw TCP connections to backends. This makes them fast (no protocol parsing), protocol-agnostic (works for HTTP, gRPC, database connections, custom protocols), and transparent (the backend sees the original packets, mostly). Use L4 for database connections, raw TCP services, and when you need maximum throughput with minimum latency.