Reasoning-Model Tuning Asymmetry: Why Thin Prompts Beat Rich Prompts (and When They Don't)

Reasoning-Model Tuning Asymmetry#

Practitioners assume “better prompt = better output”. For one model class, that assumption is correct. For the other, the same prompt makes things measurably worse. This article documents the asymmetry, names the dividing line, and gives you a 4-cell test to confirm it on your own canary before you commit to a prompt.

The asymmetry is empirical, not theoretical. It shows up cleanly across four independent OFAT (one-factor-at-a-time) matrices run between 2026-05-18 and 2026-05-20: sonnet POC, grok matrix v1+v2, deepseek matrix v1, kimi matrix v1.

The Five-Agent Research Pattern: Surveying a New LLM Provider Before You Tune It

The Five-Agent Research Pattern#

Adopting a new LLM provider for a coding-agent role looks easy from the docs. Read the model card, copy the partner adapter’s defaults, ship. A week later you find out the provider rejects tool_choice=required in thinking mode, the docs lied about reasoning_content echoing, and your retry loop multiplies the per-turn timeout by 3x because the rate-limit response isn’t JSON.

The docs miss what was patched after release. The community catches what the docs miss. Partner adapters encode lived defaults nobody published. Your own adapter has bugs you can’t see from inside it. Reading any one of these in isolation gets you to “I think I understand this provider.” Reading all five in parallel gets you a knob list, an open-contradictions list, and a list of bugs to fix before the matrix runs. The pattern: spawn 5 parallel research sub-agents, one per angle, then synthesize.

An End-to-End Workflow for Evaluating & Tuning Local LLMs for Agents

Decision-first: Follow this order and you’ll have a deployable model + tuned config in days, not weeks: (1) scope the hardware, (2) shortlist by active params, (3) per-model OFAT matrix, (4) run serially with an OOM guard (smoke first), (5) write a finding card per model, (6) decide. The expensive mistakes are skipping the smoke step, sweeping more than one factor at once, and trusting a single run.

Scope & freshness: Process is model/hardware-independent; the worked numbers are from a 2026-05 effort on a GB10 (128 GB) + an Apple-Silicon Mac, evaluating local MoE models vs cloud baselines for agentic coding. Re-validate the findings, not the workflow.

Tuning Local LLMs for Agentic Coding: Sampling, Reasoning, and Budgets

Decision-first: Per new model, sweep temperature (don’t assume 0.3), try reasoning off for builders, test echo_reasoning both ways, and on budget_exceeded check turns-vs-tokens before changing either. The right config is model-specific — assume nothing.

Scope & freshness: Local + cloud models for agentic coding, 2026-05. Findings are per-model (see the specific models named); treat them as examples of shape, not universal constants — re-sweep for any new model.

Linux Performance Tuning: sysctl, ulimits, I/O Schedulers, and Kernel Parameters

sysctl: Kernel Parameter Tuning#

The sysctl interface exposes kernel parameters that control how Linux manages memory, networking, file systems, and processes. Changes take effect immediately but are lost on reboot unless persisted.

Memory Parameters#

# Reduce swap aggressiveness (default is 60, range 0-100)
# Lower values make the kernel prefer reclaiming page cache over swapping
# Set to 10 for database servers -- swapping destroys database performance
sysctl -w vm.swappiness=10

# Overcommit behavior
# 0 = heuristic overcommit (default, kernel estimates if there is enough memory)
# 1 = always overcommit (never refuse malloc -- dangerous but used by Redis)
# 2 = strict overcommit (never allocate more than swap + ratio*physical)
sysctl -w vm.overcommit_memory=0

The vm.swappiness parameter is one of the most impactful settings for database servers. The default of 60 means the kernel will fairly aggressively swap application memory to disk in favor of filesystem cache. For databases that manage their own caching (PostgreSQL shared_buffers, MySQL innodb_buffer_pool), this is counterproductive – the database’s carefully managed cache gets swapped out to make room for OS-level cache the database does not use.