Cost-Per-Pass, Not Cost-Per-Call: The Right Metric for Autonomous Agent Routing

May 20, 2026

Cost-Modeling, Model-Selection, Fleet-Economics

Cost-Optimization, Model-Selection, Routing, Deepseek, Kimi, Grok, Sonnet, Economics

Cost-Per-Pass, Not Cost-Per-Call

Practitioners usually price LLMs by the per-token rate on the provider’s pricing page. For autonomous agents, that number is misleading. Two layers of indirection sit between the per-token rate and the cost you actually pay to get work done. Variable prompt sizes turn per-token cost into per-call cost, and variable pass rates turn per-call cost into per-pass cost. Either layer can invert the cost ranking of the models being compared.
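The two layers can be written down directly. This is a simplified model under stated assumptions, not necessarily the exact one the article builds. It assumes every attempt uses the same average token counts T_in and T_out, that r_in and r_out are the per-token rates, that each attempt passes independently with probability p, and that the agent retries until one attempt passes. Under those assumptions the number of attempts N is geometric:

```latex
\begin{aligned}
C_{\text{call}} &= T_{\text{in}}\, r_{\text{in}} + T_{\text{out}}\, r_{\text{out}} \\
\mathbb{E}[N] &= \sum_{k=1}^{\infty} k\,(1-p)^{k-1}\, p = \frac{1}{p} \\
C_{\text{pass}} &= \mathbb{E}[N]\, C_{\text{call}} = \frac{T_{\text{in}}\, r_{\text{in}} + T_{\text{out}}\, r_{\text{out}}}{p}
\end{aligned}
```

Under this model, a model at half the per-call cost of a rival can still cost more per pass. At a pass rate of 0.4 it costs 0.5 / 0.4 = 1.25 units per pass. The rival at a pass rate of 0.9 costs 1 / 0.9 ≈ 1.11 units, with both figures measured in the rival's per-call cost.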

For autonomous fleets where failed attempts trigger reviewer cycles, retries, and reputational drag, cost-per-pass is the only metric that ranks models correctly. This article shows how to compute it, when it dominates, and where the cheapest-per-token model becomes the most expensive in production.
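A minimal Go sketch of the computation follows. It uses the same geometric-retry assumption as the derivation, with one addition: a hypothetical flat cost charged for every failed attempt, standing in for reviewer cycles and other failure drag. The Model type, the failOverhead parameter, and all prices, token counts, and pass rates are illustrative inventions. They are not figures for any real model.

```go
package main

import (
	"fmt"
	"sort"
)

// Model holds hypothetical, illustrative figures only; none of these are
// measured prices or pass rates for a real model.
type Model struct {
	Name      string
	InRate    float64 // USD per million input tokens
	OutRate   float64 // USD per million output tokens
	InTokens  float64 // average input tokens per call
	OutTokens float64 // average output tokens per call
	PassRate  float64 // probability that one attempt passes, in (0, 1]
}

// CostPerCall turns per-token rates into the cost of a single call.
func (m Model) CostPerCall() float64 {
	return (m.InTokens*m.InRate + m.OutTokens*m.OutRate) / 1e6
}

// CostPerPass assumes independent attempts retried until one passes:
// expected attempts are 1/p and expected failures are (1-p)/p.
// failOverhead is a hypothetical flat cost per failed attempt
// (reviewer time, retries, and so on).
func (m Model) CostPerPass(failOverhead float64) float64 {
	p := m.PassRate
	return m.CostPerCall()/p + failOverhead*(1-p)/p
}

// rank returns model names ordered from cheapest to most expensive
// under the supplied cost function.
func rank(models []Model, cost func(Model) float64) []string {
	sorted := append([]Model(nil), models...)
	sort.Slice(sorted, func(i, j int) bool {
		return cost(sorted[i]) < cost(sorted[j])
	})
	names := make([]string, len(sorted))
	for i, m := range sorted {
		names[i] = m.Name
	}
	return names
}

func main() {
	models := []Model{
		{Name: "cheap", InRate: 0.30, OutRate: 1.20, InTokens: 40000, OutTokens: 4000, PassRate: 0.40},
		{Name: "strong", InRate: 3.00, OutRate: 15.00, InTokens: 20000, OutTokens: 2000, PassRate: 0.90},
	}
	const overhead = 0.50 // hypothetical USD per failed attempt

	fmt.Println("per-token (input rate):", rank(models, func(m Model) float64 { return m.InRate }))
	fmt.Println("per-call:", rank(models, Model.CostPerCall))
	fmt.Println("per-pass:", rank(models, func(m Model) float64 { return m.CostPerPass(overhead) }))
	for _, m := range models {
		fmt.Printf("%-6s call=$%.4f pass=$%.4f\n", m.Name, m.CostPerCall(), m.CostPerPass(overhead))
	}
}
```

With these made-up numbers, the cheap model wins per token and per call (about $0.017 vs $0.09 per call). It loses per pass once failure overhead is charged (about $0.79 vs $0.16). Setting the overhead to zero leaves the cheap model ahead ($0.042 vs $0.10). The failure-cost term is what flips the ranking when a failed attempt costs far more than its tokens.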

DeepSeek V4 Operational Quirks: Pro vs Flash, Reasoning Echo, and the Discount Cliff

May 20, 2026

Agent-Tooling

Intermediate, Advanced

Llm-Adapter-Development, Provider-Integration, Cost-Modeling

Deepseek, Deepseek-V4, Llm-Quirks, Reasoning-Models, Openai-Compatible, Production, Cost-Modeling

Deepseek, Deepseek-V4-Pro, Deepseek-V4-Flash, Go

DeepSeek V4 Operational Quirks

DeepSeek V4 ships two models behind one OpenAI-compatible API: V4-Pro (reasoning) at $1.74/M input and $3.48/M output, and V4-Flash (chat) at $0.28/M input and $1.10/M output. Until 2026-05-31, V4-Pro carries a 75% discount, putting it at $0.435/M input. That is cheap enough to use as a heavy-tier coding model. After that date it returns to list price, a 4× step up.

The two models live on the same endpoint but want very different things. V4-Pro behaves like a reasoning model: it prefers thin prompts, requires the reasoning_content echo, and restricts tool_choice. V4-Flash behaves like a chat model: rich prompts win dramatically, and it imposes none of those restrictions. Confuse the two and your test matrix lights up red.

Long-Term Metrics Storage: Thanos vs Grafana Mimir vs VictoriaMetrics

February 21, 2026

Observability

Intermediate

Long-Term-Storage-Design, Multi-Cluster-Monitoring, Metrics-Architecture, Cost-Modeling

Prometheus, Thanos, Mimir, Victoriametrics, Long-Term-Storage, Multi-Cluster, Object-Storage, Metrics

Prometheus, Thanos, Grafana-Mimir, Victoriametrics, Grafana, S3, Gcs

The Retention Problem

Prometheus stores metrics on local disk with a default retention of 15 days. Most production teams extend this to 30 or 90 days, but local storage has hard limits. A single Prometheus instance cannot scale its disk beyond the node it runs on. It provides no high availability: if the instance goes down, you lose both scraping and query access. Finally, each Prometheus instance sees only its own scrape targets, so there is no unified view across clusters or regions.
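How fast local disk fills can be estimated with the capacity rule of thumb from the Prometheus storage documentation. Needed disk space is roughly retention time in seconds × ingested samples per second × bytes per sample, with samples averaging about 1 to 2 bytes after compression. The ingestion rate and the 1.5 bytes/sample figure below are hypothetical.

```go
package main

import "fmt"

// diskBytes applies the Prometheus storage rule of thumb:
// retention seconds x ingested samples per second x bytes per sample.
func diskBytes(retentionDays, samplesPerSec, bytesPerSample float64) float64 {
	return retentionDays * 86400 * samplesPerSec * bytesPerSample
}

func main() {
	// Hypothetical workload: 100k samples/s at 1.5 bytes/sample
	// (the docs quote roughly 1-2 bytes per sample).
	const rate, bps = 100_000, 1.5
	for _, days := range []float64{15, 30, 90} {
		fmt.Printf("%2.0f days: %.0f GB\n", days, diskBytes(days, rate, bps)/1e9)
	}
}
```

At 100k samples/s this gives roughly 194 GB for the 15-day default, 389 GB at 30 days, and 1,166 GB at 90 days. All of that must fit on the single node running the Prometheus instance.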