The d4-rich Prompt Pattern: Unlocking Non-Reasoning Models on Multi-File Tasks

May 20, 2026

Prompt-Design, Model-Selection, Scaffolding-Pattern-Design

Prompt-Engineering, Deepseek, Grok, Kimi, Matrix-Testing, Non-Reasoning-Models, Tool-Use

The d4-rich Prompt Pattern

Non-reasoning chat models (deepseek-V4-Flash, grok-4.3, kimi with thinking disabled) collapse on multi-file refactor tasks when given thin or baseline prompts: they pass 0-33% of the canaries that reasoning models clear at 67-100%. The cheap fix is a three-part prompt addendum: a completion checklist, a callsites-exhaustively-updated rule, and a verify-before-push instruction. Drop it into the system prompt of a non-reasoning model and the canaries go green. Drop it into a reasoning model and you pay 12× more for 0% quality improvement.
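As a rough illustration of the three-part addendum, the sketch below appends it to a system prompt only when the target model is non-reasoning. The addendum wording, the `D4_RICH_ADDENDUM` constant, and the `build_system_prompt` helper are all hypothetical; the summary above names the three parts but does not give their exact text.

```python
# Hypothetical wording for the three parts named in the text:
# completion checklist, callsites rule, verify-before-push.
D4_RICH_ADDENDUM = """\
Before you finish:
1. Completion checklist: list every file the task requires you to change and confirm each one is done.
2. Callsites: when you change a function, type, or import, update every callsite in the repository, not only the first one you find.
3. Verify before push: re-read your diff and run the tests or build before you push."""


def build_system_prompt(base_prompt: str, reasoning: bool) -> str:
    """Return the system prompt, adding the addendum for non-reasoning models only.

    Reasoning models get the base prompt unchanged, since the text reports
    that the addendum adds cost there without improving quality.
    """
    if reasoning:
        return base_prompt
    return base_prompt.rstrip() + "\n\n" + D4_RICH_ADDENDUM


if __name__ == "__main__":
    print(build_system_prompt("You are a coding agent.", reasoning=False))
```

Keeping the addendum in one constant makes it easy to toggle per model in a test matrix, which is how a pattern like this would normally be validated.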

Tiered-LLM Tooling: Local Model by Default, Escalate to the Frontier Model

May 27, 2026

Agent-Tooling

Intermediate, Advanced

Llm-Application-Design, Agent-Architecture

Llm, Local-Llm, Ollama, Agents, Cost-Optimization, Tool-Calling, Architecture

Ollama, Claude

Tiered-LLM Tooling: Local by Default, Escalate to Frontier

When you build a chat or ops interface backed by an LLM, paying a frontier model for every interaction is wasteful: most interactions are cheap lookups, summaries, and routing. A tiered design serves the high-frequency majority with a small local model (e.g. an Ollama-served model on a GPU you already have) and escalates to a frontier model (e.g. Claude) only for the hard minority.
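A minimal sketch of the tiered design, assuming the two backends are injected as plain callables so that no particular client library is implied. The `TieredRouter` class, the `looks_hard` heuristic, and its keyword list are hypothetical; a real escalation rule would depend on your workload.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical signals that a request belongs to the "hard minority".
HARD_KEYWORDS = ("refactor", "design", "debug")
MAX_LOCAL_PROMPT_CHARS = 400


def looks_hard(prompt: str) -> bool:
    """Crude escalation heuristic: long prompts or hard-sounding keywords."""
    lowered = prompt.lower()
    return len(prompt) > MAX_LOCAL_PROMPT_CHARS or any(
        keyword in lowered for keyword in HARD_KEYWORDS
    )


@dataclass
class TieredRouter:
    local: Callable[[str], str]      # e.g. a wrapper around a locally served model
    frontier: Callable[[str], str]   # e.g. a wrapper around a frontier model API
    is_hard: Callable[[str], bool] = looks_hard

    def answer(self, prompt: str) -> Tuple[str, str]:
        """Return (tier, reply); local by default, frontier when escalated."""
        if self.is_hard(prompt):
            return "frontier", self.frontier(prompt)
        return "local", self.local(prompt)


if __name__ == "__main__":
    # Stub backends stand in for real model clients.
    router = TieredRouter(
        local=lambda p: f"[local] {p}",
        frontier=lambda p: f"[frontier] {p}",
    )
    print(router.answer("What is the status of job 42?"))
    print(router.answer("Refactor the auth module across all services"))
```

Injecting the backends keeps the routing decision testable on its own and lets either tier be swapped without touching the escalation logic.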