---
title: "Autonomy Tiers and Escalation as Runtime Contracts, Not Prompt Instructions"
description: "How to make autonomous agents fail loudly and safely without humans-in-loop on every action. Treat defer_to_human, escalate_to_architect, and dispatch_paused_until as first-class runtime tools and database state, not as prompt suggestions. Covers the three-tier autonomy model, the silent-give-up failure that prompt-only autonomy keeps producing, the terminal_tools guardrail, and the F1 dispatch-pause pattern."
url: https://agent-zone.ai/knowledge/agent-tooling/autonomy-tiers-runtime-contracts/
section: knowledge
date: 2026-05-18
categories: ["agent-tooling"]
tags: ["autonomy","escalation","agent-failure-modes","human-in-the-loop","defer-pattern","dispatch-control","agent-runtime","guardrails"]
skills: ["autonomy-tier-design","escalation-contract-design","agent-failure-mode-engineering"]
tools: ["mcp","prometheus"]
levels: ["advanced"]
word_count: 1912
formats:
  json: https://agent-zone.ai/knowledge/agent-tooling/autonomy-tiers-runtime-contracts/index.json
  html: https://agent-zone.ai/knowledge/agent-tooling/autonomy-tiers-runtime-contracts/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Autonomy+Tiers+and+Escalation+as+Runtime+Contracts%2C+Not+Prompt+Instructions
---


An agent is dispatched on a task it cannot complete. The spec is broken. The dependency is missing. The credentials are wrong. What happens next determines whether you have an autonomous fleet or a fleet that quietly fails.

The most common answer — instructing the agent in its prompt to "ask for help if stuck" — does not survive contact with production. Agents either keep grinding and produce broken work, or output text that looks like a question but never reaches a human, or politely "complete" the task by writing nothing and reporting success. None of these failure modes are visible from the outside until the dashboards have been lying for hours.

The fix is to treat autonomy boundaries as runtime contracts: tools the agent must call, state the runtime enforces, and observable events that escalate up the chain. Prompts can describe the contract but cannot enforce it. The runtime has to.

## The three failure modes of prompt-only autonomy

Every team building autonomous agents discovers these in roughly this order:

**Silent give-up.** The model emits text, no tool calls, and the cycle ends with `status="ok"`. From the runtime's perspective the task succeeded. From the user's perspective nothing happened. Common when the agent decides the problem is too hard but the prompt taught it not to refuse loudly.

**Polite loop.** The agent keeps trying the same approach with minor variations until it hits a turn limit or token budget. Each cycle costs real money. The task never ships. From dashboards, the agent looks busy.

**Pseudo-completion.** The agent writes "Done — the task has been completed" in text but never called the tool that actually does the work (no `push_branch`, no `apply_patch`, no `submit_form`). The runtime sees text output and reports success. The downstream state was never changed.

All three are forms of the same root cause: the runtime has no way to distinguish "the agent successfully completed the task" from "the agent stopped". Prompts that say "use the defer tool if you cannot proceed" depend on the model choosing to call the tool. Sometimes it does. Sometimes it doesn't. The cost of "sometimes" is too high to leave to the model.

## The contract: tools, state, observability

A runtime-enforced autonomy boundary has three layers:

**Tools** — specific functions the agent calls to express "I cannot proceed", "I need human attention", or "I'm done in a way the runtime can verify". These are first-class MCP tools (or equivalent), not prompt patterns. The agent's tool list literally includes them.

**State** — database columns or persistent values that record the autonomy decision. `dispatch_paused_until`, `accepted_by`, `escalation_reason`. These survive pod restarts and are queryable from outside the agent.

**Observability** — events emitted to logs, metrics, and message channels whenever an autonomy boundary is crossed. Alerts fire on patterns ("agent paused >30min", "escalation rate >5/hour"). The fleet operator sees the boundary cross even when nobody asked.

The pattern below uses three concrete tools and the corresponding state to cover the three failure modes.

## The autonomy tier ladder

Different situations warrant different responses. A three-tier ladder captures the practical distinctions:

| Tier | Tool | Semantics | Recovery |
|---|---|---|---|
| **Soft** | `defer_to_human` | "I attempted but cannot complete this specific task. Reassign or clarify." | Item rotates back to triage; agent stays available for other work |
| **Medium** | `escalate_to_architect` | "Something is wrong that's bigger than this task. Architect should investigate." | Posts to coordination channel; architect agent or human reads + decides |
| **Hard** | `dispatch_pause` (runtime-triggered) | "This agent is unhealthy. Stop sending it work until X." | Sets `dispatch_paused_until` timestamp; dispatcher refuses to route until it expires |

Each tier is more disruptive than the last. `defer_to_human` affects one task. `escalate_to_architect` affects the architect's queue. `dispatch_pause` takes the agent out of rotation entirely. Models tend to pick the wrong tier if you don't make the trade-offs concrete in their prompt and reinforce with examples.

## Implementing `defer_to_human`

The tool implementation is small. The contract is what matters.

```python
async def defer_to_human(ctx, args: DeferArgs) -> str:
    """Tool: declare that this specific task cannot proceed without human input.

    Args:
        reason: 1-2 sentence explanation of what's blocking
    """
    if not args.reason or len(args.reason) < 20:
        return error("defer reason must be >=20 chars; be specific")

    # Record state
    await db.execute("""
        INSERT INTO events (agent, event_type, summary, detail)
        VALUES ($1, 'defer_to_human', $2, $3)
    """, ctx.agent, args.reason, json.dumps({"task_id": ctx.task_id, "reason": args.reason}))

    # Post to coordination channel (tagged to humans, not other agents)
    await mm.post(channel="hub-general",
                  message=f":raising_hand: **@{ctx.agent} defers to human**: {args.reason}\n(item: {ctx.task_id})")

    # Mark task incomplete in a way the runtime recognizes as terminal
    return success({"status": "deferred", "reason": args.reason})
```

The agent prompt says: "If the spec contradicts the source, the credentials are missing, or you have tried twice and cannot make progress, call `defer_to_human` with a specific reason. Do not invent missing information or guess at the user's intent."

The runtime says: "If the cycle ends with `status="ok"` but no `push_branch` / `defer_to_human` / `escalate_to_architect` was called, that's not really `ok`." See the terminal_tools guardrail section below.

## Implementing `escalate_to_architect`

Same shape, different routing. Reserved for problems bigger than the current task.

```python
async def escalate_to_architect(ctx, args: EscalateArgs) -> str:
    """Tool: report a problem outside the scope of this task.

    Args:
        severity: 'warning' | 'critical'
        reason: what's wrong and why this task can't fix it
    """
    # Record + alert
    await db.execute("""
        INSERT INTO events (agent, event_type, summary, detail)
        VALUES ($1, 'escalation', $2, $3)
    """, ctx.agent, args.reason, json.dumps({
        "severity": args.severity, "task_id": ctx.task_id, "reason": args.reason,
    }))

    tag = "@architect" if args.severity == "warning" else "@architect @ops-lead"
    await mm.post(channel="hub-general",
                  message=f":rotating_light: **[{args.severity.upper()} ESCALATION from @{ctx.agent}]** {tag} — {args.reason}")

    return success({"status": "escalated", "severity": args.severity})
```

The prompt makes the line between defer and escalate concrete: "Defer if this task is just blocked. Escalate if you've noticed something that suggests the spec was wrong, the infrastructure is broken, or other agents are about to hit the same wall."

## The terminal_tools guardrail (catching silent give-up)

The single highest-leverage piece of runtime autonomy: at the end of every cycle, check whether the agent called at least one "terminal" tool — a tool that demonstrably moved the world forward or surfaced a problem.

```python
TERMINAL_TOOLS = {"push_branch", "open_pr", "apply_patch",
                  "defer_to_human", "escalate_to_architect"}

def classify_cycle_result(tools_used: list[str], status: str) -> str:
    if status != "ok":
        return status  # error already
    if not any(t in TERMINAL_TOOLS for t in tools_used):
        # Agent emitted text and stopped without calling a terminal tool.
        # This is the silent-give-up failure mode.
        return "incomplete"
    return "ok"
```

The runtime then treats `"incomplete"` differently from `"ok"`:

- Increment a counter (`agent_incomplete_cycles_total{agent=…}`)
- Do NOT mark the backlog item as completed
- Log the conversation tail (last 2 tool calls + last model output) for forensics
- Optionally auto-trigger one retry with a directive prefix ("Your last cycle ended without calling any of the terminal tools. Either ship the work or call defer_to_human / escalate_to_architect with a reason.")

This single check catches every shape of silent-stop the prompt-only approach misses. The agent learns over time (or the prompt is tightened) to either ship or escalate; "stop quietly" stops being a valid outcome.

## Dispatch pause: when the agent itself is the problem

Sometimes the problem is not a task but the agent. The LLM provider is returning auth errors. The agent's token budget is exhausted. The model is in a confidence-decay loop. Continuing to dispatch work just burns money.

The pattern: a `dispatch_paused_until` column on the agent record, written by the runtime when health-tracker thresholds are crossed, and respected by the dispatcher before routing any work.

```python
# Health tracker — fed by the runtime after each cycle
class DispatchHealthTracker:
    def record_llm_error(self, agent, error_class):
        # Consecutive errors → pause; non-error → reset
        if error_class in ("auth_failure", "quota_exceeded"):
            # Critical: pause for a longer window
            self.pause(agent, duration=timedelta(hours=1),
                       reason=f"llm_{error_class}")
        else:
            self._consecutive[agent] += 1
            if self._consecutive[agent] >= 10:
                self.pause(agent, duration=timedelta(minutes=30),
                           reason="llm_errors_threshold")

    def pause(self, agent, duration, reason):
        until = datetime.utcnow() + duration
        db.execute("UPDATE agents SET dispatch_paused_until=$1, dispatch_paused_reason=$2 WHERE name=$3",
                   until, reason, agent)
        log.warning("dispatch_pause", agent=agent, until=until, reason=reason)

# Dispatcher — refuses to route to a paused agent
async def route_task(item, candidate):
    agent = await db.fetchrow("SELECT dispatch_paused_until, dispatch_paused_reason FROM agents WHERE name=$1", candidate)
    if agent.dispatch_paused_until and agent.dispatch_paused_until > datetime.utcnow():
        return refuse(f"{candidate} paused until {agent.dispatch_paused_until} ({agent.dispatch_paused_reason})")
    # ... proceed with dispatch
```

The pause is self-clearing — when `dispatch_paused_until` is in the past, the dispatcher routes again. The agent recovers without operator intervention if the underlying issue (transient provider error, expired token getting refreshed by a sidecar) resolves on its own. Persistent failures push the agent into longer pauses and eventually surface as alerts.

## Alerts on the autonomy state

The state values are observable, which means they make great alert targets. A few that catch real production issues:

```promql
# F1: agent stuck in dispatch-paused state longer than expected
agent_dispatch_paused == 1
for: 30m

# F2: escalation rate spike — something fleet-wide is going wrong
sum by (agent) (rate(agent_escalation_total[5m])) > 0.05
for: 10m

# F3: defer rate per agent — possibly model degradation or spec quality drop
sum by (agent) (rate(agent_defer_to_human_total[1h])) > 1
for: 30m

# F4: incomplete cycles (silent give-up) — terminal_tools guardrail firing
sum by (agent) (rate(agent_incomplete_cycles_total[10m])) > 0.5
for: 15m
```

Each one is a leading indicator. Dispatch pauses sitting too long mean the underlying issue isn't clearing — could be a permanent auth break. Escalation spikes often mean a shared dependency broke. Defer rate growth on one agent suggests the model is hitting specs it cannot handle. Incomplete cycles fired anywhere means terminal_tools is doing its job; investigate why the agent stopped without escalating.

## What this is NOT

The autonomy tier ladder does not replace prompt engineering. Prompts still describe what each tier means, when to use each one, and what the agent's role is. The runtime contracts catch cases where the model (for whatever reason) does not follow the prompt — but they're not a substitute for a clear prompt.

It is also not a substitute for testing. An agent that confidently calls `defer_to_human` on every task is broken in a different way than one that never calls it. Eval harnesses still matter; the runtime contracts catch what slips past the eval, not everything.

And it is not free. Each tier adds a tool, a database column, an event type, a metric, possibly an alert. For a small fleet doing simple work, this is over-engineered. The pattern starts paying off when agents run autonomously for hours at a time, when nobody is watching every cycle, and when the cost of a silent failure shipping (or not shipping) is bigger than the cost of building the contracts.

## Where to start

The single highest-leverage addition is the `terminal_tools` guardrail. It is the cheapest to implement (one runtime check at end of cycle), catches the most damaging failure mode (silent give-up), and feeds into everything else (the metric it emits is the basis of the F4 alert above).

After that, add `defer_to_human` as a tool, accept that prompts now need to teach the agent when to use it, and watch the defer rate. Then `escalate_to_architect` when you notice cases where the issue is bigger than one task. Then dispatch-pause when you start seeing repeating LLM errors burn money. Each step builds on the previous and is independently shippable — you don't have to design the whole tier system upfront.

The common mistake is to build the most sophisticated tier (dispatch-pause with health tracking, multi-tier alerting, auto-recovery) first because it's the most interesting. The actual order of value is the opposite: catch silent give-up first, give the agent a polite way out second, give it a louder voice third, and only build the meta-health system once you have evidence the agent itself is sometimes the problem.

