LLM Adapter Audit Checklist: 10 Bugs That Hide in OpenAI-Compatible Providers

LLM Adapter Audit Checklist#

When you wrap an OpenAI-compatible LLM provider (Moonshot, DeepSeek, xAI, Together, Fireworks, OpenRouter, vLLM, anything else that exposes POST /v1/chat/completions) in a Go HTTP client, the same ten bug classes show up. They all silently degrade or break the agent — none of them crash loudly. Each was observed in production across at least one of xAI, DeepSeek, or Moonshot during a two-week audit period.

This checklist is the audit. Run it against any new adapter before shipping. Each entry is Symptom → Cause → Fix with a code shape you can grep your repo for.

Moonshot Kimi K2.6 Operational Quirks: What Breaks in Production

Moonshot Kimi K2.6 Operational Quirks#

Kimi K2.6 is one of the cheapest competent reasoning models — $0.95/M input cache-miss, $0.16/M cache-hit, $4.00/M output, 256K context. It is also one of the most opinionated. Half of what works on OpenAI breaks here, and the failures are silent: empty content, mid-reasoning truncation, 400 errors that don’t mention the actual problem, and a cache key parameter that makes cost go up instead of down.

OFAT Matrix LLM Tuning: A Methodology for Picking Sampling Params, Tool Configs, and Prompts Without Guessing

OFAT Matrix LLM Tuning#

When a new provider or model lands and you have to decide what temperature, max_tokens, tool_choice, prompt-shape, and turn budget to ship in production, the default is to pick by hunch. Read the model card, copy a partner adapter’s defaults, ship. A week later you find out reasoning_effort=high doubled cost for no quality gain, max_tokens=2048 silently truncated half your tier-3 runs, and the “prompt-rich” pattern you copied from grok-4.3 actively hurts kimi.