Reasoning-Model Tuning Asymmetry: Why Thin Prompts Beat Rich Prompts (and When They Don't)

May 20, 2026

Prompt-Engineering, Model-Evaluation, Ab-Testing

Prompt-Engineering, Reasoning-Models, Kimi, Deepseek, Grok, Sonnet, Ofat, Tuning

Reasoning-Model Tuning Asymmetry

Practitioners assume “better prompt = better output”. For one model class, that assumption is correct. For the other, the same prompt makes things measurably worse. This article documents the asymmetry, names the dividing line, and gives you a 4-cell test to confirm it on your own canary before you commit to a prompt.

The asymmetry is empirical, not theoretical. It shows up cleanly across four independent OFAT (one-factor-at-a-time) matrices run between 2026-05-18 and 2026-05-20: sonnet POC, grok matrix v1+v2, deepseek matrix v1, kimi matrix v1.
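The excerpt does not spell out the four cells, so the following is only a sketch under one assumption: that the cells are the two model classes crossed with thin and rich prompts. The cell labels and the `rich_minus_thin` helper are hypothetical, and the pass rates must come from your own canary runs.

```python
from typing import Dict, Tuple

# A cell is (model_class, prompt_style). The labels are assumptions for
# illustration, not names taken from the original matrices.
Cell = Tuple[str, str]

MODEL_CLASSES = ("reasoning", "non_reasoning")


def rich_minus_thin(pass_rates: Dict[Cell, float]) -> Dict[str, float]:
    """Return the pass-rate change from thin to rich prompt, per model class.

    A positive value means the rich prompt helped that class on your canary;
    a value near zero or below means it did not.
    """
    return {
        model_class: pass_rates[(model_class, "rich")]
        - pass_rates[(model_class, "thin")]
        for model_class in MODEL_CLASSES
    }
```

Fill the dictionary with one measured pass rate per cell and compare the two deltas; if they point in different directions, the asymmetry holds on your canary as well.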

The d4-rich Prompt Pattern: Unlocking Non-Reasoning Models on Multi-File Tasks

May 20, 2026

Agent-Tooling

Intermediate, Advanced

Prompt-Design, Model-Selection, Scaffolding-Pattern-Design

Prompt-Engineering, Deepseek, Grok, Kimi, Matrix-Testing, Non-Reasoning-Models, Tool-Use

Deepseek, Grok, Kimi, Claude

The d4-rich Prompt Pattern

Non-reasoning chat models (deepseek-V4-Flash, grok-4.3, kimi with thinking disabled) collapse on multi-file refactor tasks when given thin or baseline prompts: they pass 0-33% of canaries that reasoning models clear at 67-100%. The cheap fix is a three-part prompt addendum: a completion checklist, a callsites-exhaustively-updated rule, and a verify-before-push instruction. Drop it into the system prompt of a non-reasoning model and the canaries go green. Drop it into a reasoning model and you pay 12× more for no quality improvement.
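One way to apply the pattern is to append the addendum only when the target model is non-reasoning. This is a minimal sketch: the addendum wording below is hypothetical (the excerpt names the three parts but not their text), and `build_system_prompt` is an illustrative helper, not part of any library.

```python
# Hypothetical wording for the three parts named above; substitute your own.
D4_RICH_ADDENDUM = "\n".join(
    [
        "Completion checklist: list every file you changed and confirm each edit is finished.",
        "Callsites: update every callsite of any function or type you change.",
        "Verify before push: run the build and tests, and fix failures before pushing.",
    ]
)


def build_system_prompt(base_prompt: str, is_reasoning_model: bool) -> str:
    """Append the addendum for non-reasoning models only.

    Reasoning models get the base prompt unchanged, since the richer prompt
    is reported to raise cost without improving quality for them.
    """
    if is_reasoning_model:
        return base_prompt
    return f"{base_prompt.rstrip()}\n\n{D4_RICH_ADDENDUM}"
```

Keeping the switch in one function makes it easy to flip per model while you confirm the effect on your own canaries.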

Tuning Local LLMs for Agentic Coding: Sampling, Reasoning, and Budgets

May 25, 2026

Agent-Tooling

Intermediate, Advanced

Llm-Tuning, Sampling-Configuration, Prompt-Directive-Design, Budget-Configuration

Local-Llm, Tuning, Temperature, Reasoning, Sampling, Prompt-Engineering, Moe, Ollama, Lm-Studio, Tool-Calling

Lm-Studio, Ollama, Llama.cpp

Decision-first: For each new model, sweep temperature (don’t assume 0.3), try reasoning off for builders, test echo_reasoning both ways, and on budget_exceeded check whether turns or tokens ran out before changing either limit. The right config is model-specific; assume nothing.

Scope & freshness: Local + cloud models for agentic coding, 2026-05. Findings are per-model (see the specific models named); treat them as examples of shape, not universal constants — re-sweep for any new model.
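The temperature sweep can be sketched as a small loop over candidate values. Everything here is an assumption for illustration: `run_canary` stands in for whatever function runs one canary task at a given temperature and reports pass or fail, and the candidate temperatures and trial count are yours to choose.

```python
from typing import Callable, Dict, Iterable


def sweep_temperature(
    run_canary: Callable[[float], bool],
    temperatures: Iterable[float],
    trials: int = 3,
) -> Dict[float, float]:
    """Run the canary `trials` times per temperature and return pass rates."""
    results: Dict[float, float] = {}
    for temperature in temperatures:
        passes = sum(1 for _ in range(trials) if run_canary(temperature))
        results[temperature] = passes / trials
    return results


def best_temperature(results: Dict[float, float]) -> float:
    """Pick the highest pass rate; ties resolve to the lowest temperature."""
    return min(results, key=lambda t: (-results[t], t))
```

Because findings are per-model, the sweep is meant to be rerun for every new model rather than cached across models.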

Prompt Engineering for Infrastructure Operations: Templates, Safety, and Structured Reasoning

February 22, 2026

Agent-Tooling

Intermediate

Prompt-Design, Infrastructure-Automation, Safety-Constraints

Prompt-Engineering, Infrastructure, Chain-of-Thought, Few-Shot, Safety, Templates

Python, Bash, Kubernetes, Terraform

Prompt Engineering for Infrastructure Operations

Infrastructure prompts differ from general-purpose prompts in one critical way: the output often drives real actions on real systems. A hallucinated filename in a creative writing task is harmless. A hallucinated resource name in a Kubernetes delete command causes an outage. Every prompt pattern here is designed with that asymmetry in mind – prioritizing correctness and safety over cleverness.

Structured Output for Infrastructure Data

Infrastructure operations produce structured data: IP addresses, resource names, status codes, configuration values. Free-form text responses create parsing fragility. Force structured output from the start.
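As a minimal sketch of forcing structured output, the prompt can demand a JSON object and the caller can reject anything that does not parse and validate. The field names and the `parse_infra_response` helper are hypothetical; only the standard-library `json` and `ipaddress` modules are used.

```python
import ipaddress
import json

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"resource_name": str, "ip_address": str, "status_code": int}


def parse_infra_response(raw: str) -> dict:
    """Parse a model response and raise ValueError unless it matches the schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    # Raises ValueError if the string is not a valid IPv4 or IPv6 address.
    ipaddress.ip_address(data["ip_address"])
    return data
```

Failing loudly on a malformed response is the point: a rejected answer can be retried, whereas a loosely parsed one may reach a real system.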

Prompt Engineering for Local Models: Presets, Focus Areas, and Differences from Cloud Model Prompting

February 22, 2026

Agent-Tooling

Intermediate

Local-Model-Prompting, Preset-Design, Prompt-Debugging

Prompt-Engineering, Local-Llm, Ollama, Presets, Structured-Prompts, Small-Models

Ollama, Qwen, Llama, Python

Prompt Engineering for Local Models

Prompting a 7B local model is not the same as prompting Claude or GPT-4. Cloud models are heavily trained on instruction following, tolerate vague prompts, and self-correct. Small local models need more structure, more constraints, and more explicit formatting instructions. Prompts that work effortlessly on cloud models often produce garbage on local models.

This is not a weakness — it is a design consideration. Local models trade generality for speed and cost. Your prompts must compensate by being more specific.
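One way to add that structure is to assemble every prompt from the same labeled sections. This is a sketch only: the section labels and the `build_structured_prompt` helper are assumptions for illustration, not a format required by any particular local model.

```python
from typing import Sequence


def build_structured_prompt(
    task: str, constraints: Sequence[str], output_format: str
) -> str:
    """Assemble a prompt with explicit task, constraint, and format sections."""
    lines = ["TASK:", task.strip(), "", "CONSTRAINTS:"]
    lines += [f"- {constraint}" for constraint in constraints]
    lines += ["", "OUTPUT FORMAT:", output_format.strip()]
    return "\n".join(lines)
```

Stating the constraints and output format explicitly each time supplies the specificity that a small model would otherwise have to infer.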