Terraform Workspaces vs Directories: Choosing an Environment Isolation Strategy

Terraform Workspaces vs Directories#

You have dev, staging, and production. Same infrastructure, different sizes and settings. How do you organize Terraform? Two competing approaches:

  • Workspaces: One set of .tf files, multiple state files selected by terraform workspace select
  • Directories: Separate directories per environment, each with their own .tf files and state

Both work. Both have tradeoffs. The right choice depends on how similar your environments are, who manages them, and how much isolation you need.

Testing Infrastructure Code: The Validation Pyramid from Lint to Integration

Testing Infrastructure Code#

Infrastructure code has a unique testing challenge: the thing you are testing is expensive to instantiate. You cannot spin up a VPC, an RDS instance, and an EKS cluster for every pull request and tear it down 5 minutes later without significant cost and time. But you also cannot ship untested infrastructure changes to production without risk.

The solution is the same as in software engineering: a testing pyramid. Fast, cheap tests at the bottom catch most errors. Slower, expensive tests at the top catch the rest. The key is knowing what to test at which level.

TLS Certificate Lifecycle Management

Certificate Basics#

A TLS certificate binds a public key to a domain name. The certificate is signed by a Certificate Authority (CA) that browsers and operating systems trust. The chain goes: your certificate, signed by an intermediate CA, signed by a root CA. All three must be present and valid for a client to trust the connection.

Self-Signed Certificates for Development#

For local development and testing, generate a self-signed certificate. Clients will not trust it by default, but you can add it to your local trust store.

Ansible Role Structure and Patterns

Standard Role Directory Structure#

An Ansible role is a directory with a fixed layout. Each subdirectory serves a specific purpose:

roles/
  webserver/
    tasks/
      main.yml          # Entry point — task list
    handlers/
      main.yml          # Service restart/reload triggers
    templates/
      nginx.conf.j2     # Jinja2 templates
    files/
      index.html        # Static files copied as-is
    vars/
      main.yml          # Internal variables (high precedence)
    defaults/
      main.yml          # Default variables (low precedence, meant to be overridden)
    meta/
      main.yml          # Role metadata, dependencies

Generate a skeleton with:

Choosing a Kubernetes Backup Strategy: Velero vs Native Snapshots vs Application-Level Backups

Choosing a Kubernetes Backup Strategy#

Kubernetes clusters contain two fundamentally different types of state: cluster state (the Kubernetes objects themselves – Deployments, Services, ConfigMaps, Secrets, CRDs) and application data (the contents of Persistent Volumes). A complete backup strategy must address both. Most backup failures happen because teams back up one but not the other, or because they never test the restore process.

What Needs Backing Up#

Before choosing tools, inventory what your cluster contains:

Choosing an Infrastructure as Code Tool: Terraform vs Pulumi vs CloudFormation/Bicep vs Crossplane

Choosing an Infrastructure as Code Tool#

Infrastructure as Code tools differ in language, state management, provider ecosystem, and operational model. The choice affects how your team writes, reviews, tests, and maintains infrastructure definitions for years. Switching IaC tools mid-project is possible but expensive – it typically means rewriting all definitions and carefully importing existing resources into the new tool’s state.

Decision Criteria#

Before comparing tools, establish what matters to your organization:

Cloud Networking Fundamentals: VPCs, Subnets, Security Groups, and Connectivity

VPC Concepts#

A Virtual Private Cloud is an isolated virtual network inside a cloud provider. Every resource you launch – EC2 instances, RDS databases, Lambda functions with VPC access – lives inside a VPC. The VPC defines an IP address range using CIDR notation, and all resources within it get addresses from that range.

The most common mistake is giving every VPC a /16 (65,536 addresses). This wastes IP space and causes problems later when you need to peer VPCs – overlapping CIDR blocks cannot be peered. Plan your IP allocation before building anything.

DNS Deep Dive: Record Types, Resolution, Troubleshooting, and Cloud DNS Management

How DNS Resolution Works#

When a client requests api.example.com, the resolution follows a chain of queries. The client asks its configured recursive resolver (often the ISP’s, or a public one like 8.8.8.8). The recursive resolver does the heavy lifting: it asks a root name server for .com, the .com TLD server for example.com, and the authoritative name server for example.com returns the answer for api.example.com. Each level caches the result according to the record’s TTL, so subsequent requests short-circuit the chain.

Linux Performance Tuning: sysctl, ulimits, I/O Schedulers, and Kernel Parameters

sysctl: Kernel Parameter Tuning#

The sysctl interface exposes kernel parameters that control how Linux manages memory, networking, file systems, and processes. Changes take effect immediately but are lost on reboot unless persisted.

Memory Parameters#

# Reduce swap aggressiveness (default is 60, range 0-100)
# Lower values make the kernel prefer reclaiming page cache over swapping
# Set to 10 for database servers -- swapping destroys database performance
sysctl -w vm.swappiness=10

# Overcommit behavior
# 0 = heuristic overcommit (default, kernel estimates if there is enough memory)
# 1 = always overcommit (never refuse malloc -- dangerous but used by Redis)
# 2 = strict overcommit (never allocate more than swap + ratio*physical)
sysctl -w vm.overcommit_memory=0

The vm.swappiness parameter is one of the most impactful settings for database servers. The default of 60 means the kernel will fairly aggressively swap application memory to disk in favor of filesystem cache. For databases that manage their own caching (PostgreSQL shared_buffers, MySQL innodb_buffer_pool), this is counterproductive – the database’s carefully managed cache gets swapped out to make room for OS-level cache the database does not use.

Linux Troubleshooting: A Systematic Approach to Diagnosing System Issues

The USE Method: A Framework for Systematic Diagnosis#

The USE method, developed by Brendan Gregg, provides a structured approach to system performance analysis. For every resource on the system – CPU, memory, disk, network – you check three things:

  • Utilization: How busy is the resource? (e.g., CPU at 90%)
  • Saturation: Is work queuing because the resource is overloaded? (e.g., CPU run queue length)
  • Errors: Are there error events? (e.g., disk I/O errors, network packet drops)

This method prevents the common trap of randomly checking things. Instead, you systematically walk through each resource and check all three dimensions. If you find high utilization, saturation, or errors on a resource, you have found your bottleneck.