OFAT Matrix LLM Tuning: A Methodology for Picking Sampling Params, Tool Configs, and Prompts Without Guessing

OFAT Matrix LLM Tuning

When a new provider or model lands and you have to decide which temperature, max_tokens, tool_choice, prompt shape, and turn budget to ship to production, the default is to pick by hunch: read the model card, copy a partner adapter’s defaults, ship. A week later you discover that reasoning_effort=high doubled cost with no quality gain, that max_tokens=2048 silently truncated half of your tier-3 runs, and that the “prompt-rich” pattern you copied from grok-4.3 actively hurts kimi.
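OFAT stands for "one factor at a time": you fix a baseline configuration and vary a single knob per run, so any change in cost or quality can be attributed to that knob alone. The sketch below shows one way to enumerate such a matrix over the knobs named above. It is a minimal illustration under stated assumptions, not this methodology's actual tooling: `BASELINE`, `LEVELS`, `ofat_matrix`, the key names `prompt_shape` and `turn_budget`, and every value and level are hypothetical placeholders.

```python
from math import prod
from typing import Any

# Hypothetical baseline; values are placeholders, not recommendations.
BASELINE: dict[str, Any] = {
    "temperature": 0.7,
    "max_tokens": 4096,
    "tool_choice": "auto",
    "prompt_shape": "minimal",
    "turn_budget": 20,
    "reasoning_effort": "medium",
}

# Hypothetical levels to test for each factor (each includes the baseline value).
LEVELS: dict[str, list[Any]] = {
    "temperature": [0.0, 0.7, 1.0],
    "max_tokens": [2048, 4096, 8192],
    "tool_choice": ["auto", "required"],
    "prompt_shape": ["minimal", "rich"],
    "turn_budget": [10, 20, 40],
    "reasoning_effort": ["low", "medium", "high"],
}


def ofat_matrix(baseline: dict[str, Any],
                levels: dict[str, list[Any]]) -> list[tuple[str, dict[str, Any]]]:
    """Return the baseline run plus one run per non-baseline level of each
    factor, with every other factor held at its baseline value."""
    runs = [("baseline", dict(baseline))]
    for factor, values in levels.items():
        for value in values:
            if value == baseline[factor]:
                continue  # this point is already covered by the baseline run
            cfg = dict(baseline)
            cfg[factor] = value
            runs.append((f"{factor}={value}", cfg))
    return runs


if __name__ == "__main__":
    runs = ofat_matrix(BASELINE, LEVELS)
    full = prod(len(v) for v in LEVELS.values())
    print(f"OFAT runs: {len(runs)} vs full factorial: {full}")
    for label, cfg in runs:
        print(label, cfg)
```

With these placeholder levels the matrix contains 11 configurations, compared with 3 × 3 × 2 × 2 × 3 × 3 = 324 for a full factorial sweep. The trade-off is that OFAT cannot detect interactions between knobs (for example, a max_tokens ceiling that only bites when reasoning_effort is high), so any interaction you suspect has to be probed with separate follow-up runs.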