Reasoning-Model Tuning Asymmetry: Why Thin Prompts Beat Rich Prompts (and When They Don't)

May 20, 2026

Reasoning-Model Tuning Asymmetry

Practitioners tend to assume that “better prompt = better output”. For one class of models, that assumption holds. For the other, the same enriched prompt makes the output measurably worse. This article documents the asymmetry, names the dividing line, and gives you a 4-cell test to confirm it on your own canary before you commit to a prompt.

The asymmetry is empirical, not theoretical. It shows up cleanly across four independent OFAT (one-factor-at-a-time) matrices run between 2026-05-18 and 2026-05-20: sonnet POC, grok matrix v1+v2, deepseek matrix v1, kimi matrix v1.
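For readers unfamiliar with the term, an OFAT matrix is a baseline run plus one extra run per changed factor, with every other factor held at its baseline value. The sketch below shows only that general construction. The factor names and levels (`prompt_style`, `few_shot`, `temperature`) and the helper `ofat_matrix` are hypothetical, chosen for illustration; they are not the factors used in the matrices listed above.

```python
from typing import Any


def ofat_matrix(
    baseline: dict[str, Any],
    alternatives: dict[str, list[Any]],
) -> list[dict[str, Any]]:
    """Return the baseline run plus one run per alternative level.

    Each non-baseline run differs from the baseline in exactly one factor.
    """
    runs = [dict(baseline)]  # run 0 is always the untouched baseline
    for factor, levels in alternatives.items():
        if factor not in baseline:
            raise KeyError(f"unknown factor: {factor}")
        for level in levels:
            if level == baseline[factor]:
                continue  # skip levels that would just repeat the baseline
            run = dict(baseline)
            run[factor] = level  # change this one factor, keep the rest fixed
            runs.append(run)
    return runs


# Hypothetical factors and levels, for illustration only.
baseline = {"prompt_style": "thin", "few_shot": False, "temperature": 0.0}
alternatives = {
    "prompt_style": ["rich"],
    "few_shot": [True],
    "temperature": [0.7],
}

runs = ofat_matrix(baseline, alternatives)
for i, run in enumerate(runs):
    print(i, run)
```

Because each run changes a single factor, any difference from the baseline score can be attributed to that factor alone. The trade-off is that interactions between factors are not measured.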