Agent Context Preservation for Long-Running Workflows: Checkpoints, Sub-Agent Delegation, and Avoiding Context Pollution#

The context window is the single most important constraint in agent-driven work. A single-turn task uses a fraction of it. A multi-hour project fills it, overflows it, and degrades the agent’s reasoning quality long before the task is complete. Agents that work effectively on ambitious projects are not smarter – they manage context better.

This article covers practical, battle-tested patterns for preserving context across long sessions, delegating to sub-agents without losing coherence, and avoiding context pollution – the gradual degradation that happens when irrelevant information accumulates in the working context.

Agent Debugging Patterns: Tracing Decisions in Production#

When an agent produces a wrong answer, the question is always the same: why did it do that? In traditional software you read a stack trace; agent failures are buried in a chain of LLM decisions, tool calls, and context accumulation. Debugging agents requires specialized observability that captures not just what happened, but what the agent was thinking at each step.

Tracing Agent Decision Chains#

Every agent action follows a decision chain: the model reads its context, decides which tool to call (or whether to respond directly), processes the result, and decides again. To debug failures, you need to see this chain as a structured trace.
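
The sketch below shows one way to capture that chain, assuming hypothetical call_model() and execute_tool() stand-ins for the model call and the tool dispatcher. Each step records the decision, the context size it was made against, and the tool result that fed the next decision.

```python
import json
import time
import uuid

def run_traced(task, call_model, execute_tool, max_steps=20):
    """Run a simple agent loop, recording every decision as a trace step."""
    trace = {"trace_id": str(uuid.uuid4()), "task": task, "steps": []}
    context = [{"role": "user", "content": task}]
    for step_no in range(max_steps):
        # decision is either {"answer": "..."} or {"tool": "...", "args": {...}}
        decision = call_model(context)
        record = {
            "step": step_no,
            "ts": time.time(),
            "context_chars": sum(len(m["content"]) for m in context),
            "decision": decision,
        }
        trace["steps"].append(record)
        if "answer" in decision:            # model chose to respond directly
            break
        result = execute_tool(decision["tool"], decision["args"])
        record["tool_result"] = result      # capture exactly what the model sees next
        context.append({"role": "tool", "content": json.dumps(result)})
    return trace
```

Dumped as JSON, a trace like this answers "what was the agent thinking at step N" directly: each decision sits next to the context size it saw and the tool output that preceded it.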

Agent Evaluation and Testing: Measuring What Matters in Agent Performance#

You cannot improve what you cannot measure. Agent evaluation is harder than traditional software testing because agents are non-deterministic, their behavior depends on prompt wording, and the same input can produce multiple valid outputs. But “it is hard” is not an excuse for not doing it. This article provides a step-by-step framework for building an agent evaluation pipeline that catches regressions, compares configurations, and quantifies real-world performance.
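
As a taste of what that framework looks like, here is a minimal sketch of an eval case format that tolerates non-determinism: each case supplies a checker function rather than a single expected string, and each case runs several times so a flaky pass shows up as a pass rate instead of a green checkmark. The cases and the run_agent() callable are illustrative stand-ins, not from the article.

```python
def contains_all(*needles):
    """Checker factory: pass if every needle appears in the output."""
    return lambda output: all(n.lower() in output.lower() for n in needles)

CASES = [
    {"input": "Which HTTP status does our rate limiter return?", "check": contains_all("429")},
    {"input": "List the regions we deploy to.", "check": contains_all("us-east-1", "eu-west-1")},
]

def evaluate(run_agent, cases=CASES, trials=3):
    """Score each case as a pass rate across repeated trials."""
    results = []
    for case in cases:
        passes = sum(bool(case["check"](run_agent(case["input"]))) for _ in range(trials))
        results.append({"input": case["input"], "pass_rate": passes / trials})
    return results
```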

Agent Sandboxing: Isolation Strategies for Execution Environments#

An AI agent that can execute code, run shell commands, or call APIs needs a sandbox. Without one, a single bad tool call – whether from a bug, a hallucination, or a prompt injection attack – can read secrets, modify production data, or pivot to other systems. This article is a decision framework for choosing the right sandboxing strategy based on your trust level, threat model, and performance requirements.
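
At the lightweight end of that spectrum, even plain process-level isolation removes the most common failure modes. A minimal sketch for Linux follows, assuming tool commands arrive as argument lists; this is a first layer only, not a boundary against a determined attacker, and containers or microVMs sit above it.

```python
import resource
import subprocess

def _limit_resources():
    # Applied in the child process just before exec (Linux rlimits).
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))              # 5 s of CPU time
    resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20,) * 2)   # 512 MiB address space

def run_sandboxed(argv, workdir):
    return subprocess.run(
        argv,                              # e.g. ["python", "snippet.py"]
        cwd=workdir,                       # a scratch directory, not the repo root
        env={"PATH": "/usr/bin:/bin"},     # no inherited secrets or tokens
        preexec_fn=_limit_resources,       # rlimits apply to the child only
        capture_output=True,
        timeout=30,                        # hard wall-clock cutoff
        text=True,
    )
```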

Agent-Oriented Terraform: Linear Patterns for Machine-Managed Infrastructure#

Most Terraform code is written by humans for humans. It favors abstraction, DRY principles, and deep module nesting – patterns that make sense when a human maintains a mental model of the codebase. Agents do not maintain mental models. They read code fresh each time, trace references to resolve dependencies, and reason about the full resource graph in a single context window.

The patterns that make Terraform elegant for humans make it expensive for agents. Deep module nesting multiplies the files an agent must read. Variable threading through three layers of modules hides dependencies behind indirection. Complex for_each over maps of objects creates resources that are invisible until runtime. The agent spends most of its context on navigation, not comprehension.

API Gateway Patterns: Selection, Configuration, and Routing#

An API gateway sits between clients and your backend services. It handles cross-cutting concerns – authentication, rate limiting, request transformation, routing – so your services do not have to. Choosing the right gateway and configuring it correctly is one of the first decisions in any microservices architecture.

Gateway Responsibilities#

Before selecting a gateway, clarify which responsibilities it should own:

  • Routing – directing requests to the correct backend service based on path, headers, or method.
  • Authentication and authorization – validating tokens, API keys, or certificates before requests reach backends.
  • Rate limiting – protecting backends from traffic spikes and enforcing usage quotas.
  • Request/response transformation – modifying headers, rewriting paths, converting between formats.
  • Load balancing – distributing traffic across service instances.
  • Observability – emitting metrics, logs, and traces for every request that passes through.
  • TLS termination – handling HTTPS so backends can speak plain HTTP internally.

No gateway does everything equally well. The right choice depends on which of these responsibilities matter most in your environment.
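
To make the division of labor concrete, here is a toy sketch of the first three responsibilities composed in front of hypothetical backends. Production gateways (NGINX, Envoy, Kong, cloud-managed offerings) implement the same layers as configuration rather than code; the key store, rates, and routes here are illustrative.

```python
import time

ROUTES = {"/users": "user-service", "/orders": "order-service"}
API_KEYS = {"k-123"}                                  # illustrative key store
_bucket = {"tokens": 10.0, "last": time.time()}       # shared token bucket

def handle(path, headers):
    # 1. Authentication: reject before any backend work happens.
    if headers.get("x-api-key") not in API_KEYS:
        return 401, "missing or invalid API key"
    # 2. Rate limiting: refill at 5 tokens/s, burst up to 10.
    now = time.time()
    _bucket["tokens"] = min(10.0, _bucket["tokens"] + (now - _bucket["last"]) * 5)
    _bucket["last"] = now
    if _bucket["tokens"] < 1:
        return 429, "rate limit exceeded"
    _bucket["tokens"] -= 1
    # 3. Routing: longest matching prefix wins.
    for prefix, service in sorted(ROUTES.items(), key=lambda kv: -len(kv[0])):
        if path.startswith(prefix):
            return 200, f"forwarded to {service}"     # a real gateway proxies the request
    return 404, "no route"
```

Calling handle("/users/42", {"x-api-key": "k-123"}) returns (200, "forwarded to user-service"); drop the key and the request dies at the first layer, which is the point – the backends never see it.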

Automating Operational Runbooks#

The Manual-to-Automated Progression#

Not every runbook should be automated, and automation does not happen in a single jump. The progression builds confidence at each stage.

Level 0 – Tribal Knowledge: The procedure exists only in someone’s head. Invisible risk.

Level 1 – Documented Runbook: Step-by-step instructions a human follows, including commands, expected outputs, and decision points. Every runbook starts here.

Level 2 – Scripted Runbook: Manual steps encoded in a script that a human triggers and monitors. The script handles tedious parts; the human handles judgment calls.
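
A Level 2 script in sketch form, with illustrative commands for a hypothetical stuck-worker procedure (the service name and endpoints are invented). The script runs the mechanical steps and stops for the one judgment call – does the output match the known failure mode? – before doing anything destructive.

```python
import subprocess
import sys

def run(cmd):
    print(f"$ {cmd}")
    return subprocess.run(cmd, shell=True, capture_output=True, text=True)

def confirm(question):
    return input(f"{question} [y/N] ").strip().lower() == "y"

def restart_stuck_worker():
    # Tedious part: gather the evidence the human needs to decide.
    status = run("systemctl status worker.service")
    print(status.stdout[-500:])
    # Judgment call stays with the operator.
    if not confirm("Does this match the known stuck state?"):
        sys.exit("Aborted: operator judged this is a different failure.")
    # Tedious part: the fix and the verification.
    run("systemctl restart worker.service")
    health = run("curl -fsS localhost:8080/healthz")
    print("healthy" if health.returncode == 0 else "still failing: escalate")

if __name__ == "__main__":
    restart_stuck_worker()
```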

CDN and Edge Computing Patterns#

A CDN (Content Delivery Network) caches content at edge locations close to users, reducing latency and offloading traffic from origin servers. Edge computing extends this by running custom code at those edge locations, enabling request transformation, authentication, A/B testing, and dynamic content generation without round-tripping to an origin server.

CDN Cache Fundamentals#

Cache-Control Headers#

The origin server controls CDN caching behavior through HTTP headers. Getting these right is the single most impactful CDN optimization.
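
The sketch below shows typical origin-side policies as plain header maps; in a real application these are set on the HTTP response. The directive values are illustrative starting points, not universal recommendations, and stale-while-revalidate support varies by CDN.

```python
# Fingerprinted assets (e.g. app.3f9c2a.js) never change: cache them
# everywhere, essentially forever, and tell clients revalidation is pointless.
STATIC_ASSET = {"Cache-Control": "public, max-age=31536000, immutable"}

# HTML: browsers revalidate on every visit (max-age=0), the CDN keeps a
# shared copy for 60 s (s-maxage) and may briefly serve it stale while
# refetching in the background.
HTML_PAGE = {"Cache-Control": "public, max-age=0, s-maxage=60, stale-while-revalidate=30"}

# Per-user responses must never land in a shared cache.
USER_SPECIFIC = {"Cache-Control": "private, no-store"}
```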

Change Management for Infrastructure#

Why Change Management Matters#

Most production incidents trace back to a change. Code deployments, configuration updates, infrastructure modifications, database migrations – each introduces risk. Change management reduces that risk through structure, visibility, and accountability. The goal is not to prevent change but to make change safe, visible, and reversible.

Change Request Process#

Every infrastructure change flows through a structured request. The formality scales with risk, but the basic elements remain constant.
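
As a sketch, those constant elements might be captured in a record type like the one below. The field names and the example request are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class ChangeRequest:
    title: str
    description: str         # what is changing and why
    risk: str                # "low" | "medium" | "high" – drives the formality
    rollback_plan: str       # how to reverse the change if it goes wrong
    verification: str        # how you will know the change succeeded
    window: str              # when the change is allowed to run
    approvers: list[str] = field(default_factory=list)

cr = ChangeRequest(
    title="Resize payments DB instance",
    description="db.r6g.xlarge -> db.r6g.2xlarge to relieve CPU saturation",
    risk="high",
    rollback_plan="Resize back down; ~10 min of degraded writes during failover",
    verification="CPU < 60% at peak; p99 write latency back under 50 ms",
    window="Sat 02:00-04:00 UTC",
)
```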

Cloud Behavioral Divergence Guide: Where AWS, Azure, and GCP Actually Differ#

Running the “same” workload on AWS, Azure, and GCP does not produce the same behavior. The Kubernetes API is portable, application containers are portable, and SQL queries are portable. Everything else – identity, networking, storage, load balancing, DNS, and managed service behavior – diverges in ways that matter for production reliability.

This guide documents the specific divergence points with practical examples. Use it when translating infrastructure from one cloud to another, when debugging behavior that differs between environments, or when assessing migration risk.