Agent Security Patterns: Defending Against Injection, Leakage, and Misuse

Agent Security Patterns#

An AI agent with tool access is a program that can read files, call APIs, execute code, and modify systems – driven by natural language input. Every classic security concern applies, plus new attack surfaces unique to LLM-powered systems. This article covers practical defenses, not theoretical risks.

Prompt Injection Defense#

Prompt injection is the most agent-specific security threat. An attacker embeds instructions in data the agent processes – a file, a web page, an API response – and the agent follows those instructions as if they came from the user.

Agent-Friendly API Design: Building APIs That Agents Can Consume

Agent-Friendly API Design#

Most APIs are designed for human developers who read documentation, interpret ambiguous error messages, and adapt their approach based on experience. Agents do not have these skills. They parse structured responses, follow explicit instructions, and fail on ambiguity. An API that is pleasant for humans to use may be impossible for an agent to use reliably.

This reference covers practical patterns for designing APIs – or modifying existing ones – so that agents can consume them effectively.

Agentic Workflow Patterns: Plan-Execute-Observe Loops, ReAct, and Task Decomposition

Agentic Workflow Patterns#

An agent without a workflow pattern is a chatbot. What separates an agent from a single-turn LLM call is the loop: observe the environment, reason about what to do, act, observe the result, and decide whether to continue. The loop structure determines everything – how the agent plans, how it recovers from errors, when it stops, and whether it can handle tasks that take minutes or hours.

Building LLM Harnesses: Orchestrating Local Models into Workflows with Scoring, Retries, and Parallel Execution

Building LLM Harnesses#

A harness is the infrastructure that wraps LLM calls into a reliable, testable, and observable workflow. It handles the concerns that a raw API call does not: input preparation, output validation, error recovery, model routing, parallel execution, and quality scoring. Without a harness, you have a script. With one, you have a tool.

Harness Architecture#

Input
  │
  ├── Preprocessing (validate input, select model, prepare prompt)
  │
  ├── Execution (call Ollama with timeout, retry on failure)
  │
  ├── Post-processing (parse output, validate schema, score quality)
  │
  ├── Routing (if quality too low, escalate to larger model or flag)
  │
  └── Output (structured result + metadata)

Core Harness in Python#

import ollama
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class LLMResult:
    content: str
    model: str
    tokens_in: int
    tokens_out: int
    duration_ms: int
    ttft_ms: int
    success: bool
    retries: int = 0
    score: float | None = None
    metadata: dict = field(default_factory=dict)

@dataclass
class HarnessConfig:
    model: str = "qwen2.5-coder:7b"
    temperature: float = 0.0
    max_tokens: int = 1024
    json_mode: bool = False
    timeout_seconds: int = 120
    max_retries: int = 2
    retry_delay_seconds: float = 1.0

def call_llm(
    messages: list[dict],
    config: HarnessConfig,
) -> LLMResult:
    """Make a single LLM call with timing metadata."""
    start = time.monotonic()

    kwargs = {
        "model": config.model,
        "messages": messages,
        "options": {
            "temperature": config.temperature,
            "num_predict": config.max_tokens,
        },
        "stream": False,
    }
    if config.json_mode:
        kwargs["format"] = "json"

    try:
        response = ollama.chat(**kwargs)
        duration = int((time.monotonic() - start) * 1000)

        return LLMResult(
            content=response["message"]["content"],
            model=config.model,
            tokens_in=response.get("prompt_eval_count", 0),
            tokens_out=response.get("eval_count", 0),
            duration_ms=duration,
            ttft_ms=int(response.get("prompt_eval_duration", 0) / 1_000_000),
            success=True,
        )
    except Exception as e:
        duration = int((time.monotonic() - start) * 1000)
        return LLMResult(
            content=str(e),
            model=config.model,
            tokens_in=0,
            tokens_out=0,
            duration_ms=duration,
            ttft_ms=0,
            success=False,
        )

Retry with Validation#

Do not retry blindly. Retry only when the output fails validation:

Choosing a Local Model: Size Tiers, Task Matching, and Cost Comparison with Cloud APIs

Choosing a Local Model#

The most expensive mistake in local LLM adoption is running a 70B model for a task that a 3B model handles at 20x the speed for equivalent quality. The second most expensive mistake is running a 3B model on a task that requires 32B-level reasoning and getting garbage output.

Matching model size to task complexity is the core skill. This guide provides a framework grounded in empirical benchmarks, not marketing claims.

Designing Agent-Ready Projects: Structure That Benefits Humans and Agents Equally

Designing Agent-Ready Projects#

An “agent-ready” project is just a well-documented project. Every practice that helps an agent — clear conventions, explicit commands, tracked progress, documented decisions — also helps a new team member, a future-you who forgot the details, or a contractor picking up the project for the first time.

The difference is that humans can ask follow-up questions and gradually build context through conversation. Agents cannot. They need it written down, in the right place, at the right level of detail. Projects that meet this bar are better for everyone.

Detecting Infrastructure Knowledge Gaps: What Agents Don't Know They Don't Know

Detecting Infrastructure Knowledge Gaps#

The most dangerous thing an agent can do is confidently produce a deliverable based on wrong assumptions. An agent that assumes x86_64 when the target is ARM64, that assumes PostgreSQL 14 behavior when the target runs 15, or that assumes AWS IAM patterns when the target is Azure – that agent produces a runbook that will fail in ways the human did not expect and may not understand.

How Agents Communicate: Explaining Plans, Risks, and Trade-offs to Humans

How Agents Communicate#

The most capable agent in the world is useless if the human does not understand what it is doing, why it is doing it, or what it needs. Poor communication is the single largest cause of failed agent-human collaboration — not poor reasoning, not incorrect code, but the human losing confidence because the agent did not explain itself well enough.

This article covers communication patterns that build trust: how to present plans, explain risks, frame trade-offs, report progress, and escalate problems. It is written for agents to follow and for humans to know what good agent communication looks like.

Human-in-the-Loop Patterns: Approval Gates, Escalation, and Progressive Autonomy

Human-in-the-Loop Patterns#

The most common failure mode in agent-driven work is not a wrong answer – it is a correct action taken without permission. An agent that deletes a file to “clean up,” force-pushes a branch to “fix history,” or restarts a service to “apply changes” can cause more damage in one unauthorized action than a dozen wrong answers.

Human-in-the-loop design is not about limiting agent capability. It is about matching autonomy to risk. Safe, reversible actions should proceed without interruption. Dangerous, irreversible actions should require explicit approval. The challenge is building this classification into the workflow without turning every action into a confirmation dialog.

Long-Running Workflow Orchestration: State Machines, Checkpointing, and Resumable Multi-Agent Execution

Long-Running Workflow Orchestration#

Most agent examples show single-turn or single-session tasks: answer a question, write a function, debug an error. Real projects are different. Building a feature, migrating a database, setting up a monitoring stack – these take hours, span multiple sessions, involve parallel work streams, and must survive context window resets, session timeouts, and partial failures.

This article covers the architecture for workflows that last hours or days: how to model progress as a state machine, how to checkpoint for reliable resumption, how to delegate to parallel sub-agents without losing coherence, and how to recover when things fail partway through.