Cost-Per-Pass, Not Cost-Per-Call: The Right Metric for Autonomous Agent Routing

May 20, 2026

Cost-Modeling, Model-Selection, Fleet-Economics

Cost-Optimization, Model-Selection, Routing, Deepseek, Kimi, Grok, Sonnet, Economics

Cost-Per-Pass, Not Cost-Per-Call

Practitioners typically price LLMs by the per-token rate on the provider’s pricing page. For autonomous agents, that number is misleading. Two layers of indirection sit between the per-token rate and the cost you actually pay to get work done: variable prompt sizes turn per-token cost into per-call cost, and variable pass rates turn per-call cost into per-pass cost. Each layer can invert the ranking.

For autonomous fleets where failed attempts trigger reviewer cycles, retries, and reputational drag, cost-per-pass is the only metric that ranks models correctly. This article shows how to compute it, when it dominates, and where the cheapest-per-token model becomes the most expensive in production.
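The two layers of indirection described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive cost model: the `ModelProfile` type, the model names, and every number below are hypothetical, and the sketch assumes that attempts are independent and are retried until one passes, so the expected number of calls per pass is 1 / pass rate. It also leaves out reviewer time and any other cost of a failed attempt, which would widen the gap further.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model inputs; all values are illustrative, not measured."""
    name: str
    input_usd_per_mtok: float   # list price per million input tokens
    output_usd_per_mtok: float  # list price per million output tokens
    avg_input_tokens: float     # mean prompt size per call
    avg_output_tokens: float    # mean completion size per call
    pass_rate: float            # fraction of attempts that pass, in (0, 1]


def cost_per_call(m: ModelProfile) -> float:
    """Layer 1: token prices weighted by the token volume of a typical call."""
    total = (m.avg_input_tokens * m.input_usd_per_mtok
             + m.avg_output_tokens * m.output_usd_per_mtok)
    return total / 1_000_000


def cost_per_pass(m: ModelProfile) -> float:
    """Layer 2: expected spend per passing result.

    Assumes independent attempts retried until success, so the expected
    number of calls is 1 / pass_rate.
    """
    if not 0.0 < m.pass_rate <= 1.0:
        raise ValueError("pass_rate must be in (0, 1]")
    return cost_per_call(m) / m.pass_rate


# Two made-up profiles: one cheap per token but verbose and unreliable,
# one expensive per token but compact and reliable.
cheap = ModelProfile("cheap_model", 0.5, 1.5, 40_000, 4_000, 0.25)
premium = ModelProfile("premium_model", 3.0, 15.0, 12_000, 2_000, 0.80)
models = [cheap, premium]

for m in models:
    print(f"{m.name}: per-call ${cost_per_call(m):.4f}, "
          f"per-pass ${cost_per_pass(m):.4f}")

cheapest_per_call = min(models, key=cost_per_call)
cheapest_per_pass = min(models, key=cost_per_pass)
```

With these made-up inputs, `cheap_model` wins on per-token price and on per-call cost (about $0.026 versus $0.066), but `premium_model` wins on cost-per-pass (about $0.0825 versus $0.104). The ranking flips only once the pass rate enters the calculation.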