---
title: "LLM Adapter Audit Checklist: 10 Bugs That Hide in OpenAI-Compatible Providers"
description: "Concrete audit checklist for Go HTTP adapters to OpenAI-compatible LLM APIs. Ten failure patterns observed across xAI, DeepSeek, and Moonshot adapters in production, each with symptom, file:line citation pattern, and fix recipe."
url: https://agent-zone.ai/knowledge/agent-tooling/llm-adapter-audit-checklist/
section: knowledge
date: 2026-05-20
categories: ["agent-tooling"]
tags: ["llm-adapter","openai-compatible","moonshot","deepseek","xai","audit","go","production"]
skills: ["llm-adapter-development","provider-integration","production-debugging"]
tools: ["go","moonshot","deepseek","xai","openai"]
levels: ["intermediate","advanced"]
word_count: 2126
formats:
  json: https://agent-zone.ai/knowledge/agent-tooling/llm-adapter-audit-checklist/index.json
  html: https://agent-zone.ai/knowledge/agent-tooling/llm-adapter-audit-checklist/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=LLM+Adapter+Audit+Checklist%3A+10+Bugs+That+Hide+in+OpenAI-Compatible+Providers
---


# LLM Adapter Audit Checklist

When you wrap an OpenAI-compatible LLM provider (Moonshot, DeepSeek, xAI, Together, Fireworks, OpenRouter, vLLM, anything else that exposes `POST /v1/chat/completions`) in a Go HTTP client, the same ten bug classes show up. They all silently degrade or break the agent — none of them crash loudly. Each was observed in production across at least one of xAI, DeepSeek, or Moonshot during a two-week audit period.

This checklist is the audit. Run it against any new adapter before shipping. Each entry is `Symptom → Cause → Fix` with a code shape you can grep your repo for.

## TL;DR — what to grep your adapter for

- `,omitempty` on `Content` in any wireMessage struct
- `Error string` (not `json.RawMessage`) in the wire response
- `client.Timeout` set to `2*time.Minute` or less
- Any retry loop where N attempts × per-attempt timeout > parent ctx
- No `RawMessage` capture of `reasoning_content` on response messages
- Hardcoded `temperature`, `top_p`, or `max_tokens` ignoring the request
- A `rates` map that falls back to Sonnet rates for unknown models
- `tools[].function` struct with no `strict` field
- No JSON validity check on `tool_calls[].function.arguments` before dispatch
- No terminal-tool guardrail (model claims completion without push/escalate/defer)

## 1. `Content` with `omitempty` — silent 422 on tool-call turns

**Symptom**: Multi-turn tool flows work for the first call, then return HTTP 422 ("messages.N.content is required") on the second assistant-tool-call turn.

**Cause**: Your wireMessage struct has `Content string \`json:"content,omitempty"\``. On a tool-call-only assistant turn, `Content` is the empty string. `omitempty` drops the field. Moonshot, xAI, and others (but NOT all OpenAI-compatible servers) reject this shape with 422.

**Fix**:

```go
// wrong:
type chatMessage struct {
    Role      string     `json:"role"`
    Content   string     `json:"content,omitempty"`  // ← drops on empty
    ToolCalls []toolCall `json:"tool_calls,omitempty"`
}

// right:
type chatMessage struct {
    Role      string     `json:"role"`
    Content   string     `json:"content"`            // ← always present
    ToolCalls []toolCall `json:"tool_calls,omitempty"`
}
```

**Verify**: marshal a synthetic assistant tool-call message with empty content and inspect the JSON for the `"content": ""` field. If missing, you have the bug.

## 2. `Error` as string, not `json.RawMessage` — whole response decode fails

**Symptom**: A real provider error returns `parse response: invalid character '{' looking for beginning of string value` and the entire turn's response body is discarded. Caller sees "decode error", not the actual provider message.

**Cause**: Your wireResponse has `Error string \`json:"error,omitempty"\``. Most providers return error as a string field (`"error": "rate limited"`), but xAI and some others return it as a JSON object (`"error": {"message": "...", "code": "..."}`). Unmarshaling an object into a string crashes the whole `json.Unmarshal`.

**Fix**: Use `json.RawMessage` and decode in a helper:

```go
type wireResponse struct {
    Choices []wireChoice    `json:"choices"`
    Usage   wireUsage       `json:"usage"`
    Error   json.RawMessage `json:"error,omitempty"`  // ← was string
}

func decodeError(raw json.RawMessage) string {
    if len(raw) == 0 {
        return ""
    }
    // Try string first
    var s string
    if err := json.Unmarshal(raw, &s); err == nil {
        return s
    }
    // Then object
    var obj struct {
        Message string `json:"message"`
        Code    string `json:"code"`
        Type    string `json:"type"`
    }
    if err := json.Unmarshal(raw, &obj); err == nil {
        return obj.Message
    }
    return string(raw)
}
```

**Verify**: send a request that triggers a 4xx (e.g. bad model name). If your adapter returns "decode response: ..." instead of the provider's error message, you have the bug.

## 3. HTTP client timeout too low — reasoning chains truncate

**Symptom**: Calls to reasoning models (kimi-k2.6, deepseek-reasoner, grok-reasoning) randomly fail with `context deadline exceeded` after ~2 minutes, even though the model is making progress.

**Cause**: `&http.Client{Timeout: 2*time.Minute}`. Reasoning models routinely take 3-5 minutes on multi-step tool work. The HTTP client timeout cuts the connection mid-response, dropping all output.

**Fix**: Set the client timeout to at least 6 minutes (longer than any reasonable per-turn budget):

```go
http: &http.Client{
    Timeout: 6 * time.Minute,  // upper bound on a single LLM call
},
```

The per-turn budget should be enforced via `context.WithTimeout` at the call site, not the HTTP client.

**Verify**: trace a long-running call with `go tool trace` or `httptrace.ClientTrace`. If your connection terminates while the response body is still arriving, raise the timeout.

## 4. Retry-loop timeout multiplication — single turn can exceed budget by Nx

**Symptom**: A turn budget of 5 minutes blows up to 20 minutes in production. Logs show 3 retries on a 503, each with the full per-attempt timeout.

**Cause**: Retry loop with per-attempt timeout, no outer ctx bound:

```go
// wrong:
for attempt := 0; attempt < 3; attempt++ {
    ctx, cancel := context.WithTimeout(parentCtx, 5*time.Minute)
    defer cancel()
    resp, err := http.Do(req.WithContext(ctx))
    if isRetryable(err) {
        time.Sleep(backoff(attempt))
        continue
    }
    ...
}
```

Worst case: 3 × (5min timeout + 8s backoff) = 15min24s, even though the caller's parent context expired at 5 minutes.

**Fix**: Bind a single context at the call edge. Reuse it for every attempt. Make the backoff ctx-aware:

```go
turnCtx, cancel := context.WithTimeout(parentCtx, perTurnTimeout)
defer cancel()

for attempt := 0; attempt < maxAttempts; attempt++ {
    resp, err := http.Do(req.WithContext(turnCtx))
    if err == nil { return resp, nil }
    if !isRetryable(err) { return nil, err }

    select {
    case <-time.After(backoff(attempt)):
    case <-turnCtx.Done():
        return nil, turnCtx.Err()
    }
}
```

**Verify**: simulate a flaky provider (httptest server that returns 503 twice then 200) and measure end-to-end time. If it exceeds the parent ctx timeout, you have the bug.

## 5. Missing `reasoning_content` round-trip — multi-turn breaks at turn 2

**Symptom**: First turn with a reasoning model succeeds. Second turn returns HTTP 400: `"thinking is enabled but reasoning_content is missing in assistant tool call message at index N"`.

**Cause**: Reasoning models (Moonshot kimi, DeepSeek reasoner, others) emit a `reasoning_content` field on assistant messages in addition to `content`. On the next request, the provider requires that field echoed back verbatim in the conversation history. Most OpenAI-shape adapters strip it because the standard OpenAI client doesn't know about it.

**Fix**: capture `reasoning_content` on response and re-emit on every assistant message in subsequent requests:

```go
type wireMessage struct {
    Role             string         `json:"role"`
    Content          string         `json:"content"`
    ReasoningContent string         `json:"-"`  // not auto-serialized
    ToolCalls        []wireToolCall `json:"tool_calls,omitempty"`
}

// Custom MarshalJSON splices reasoning_content back in for assistant role:
func (m wireMessage) MarshalJSON() ([]byte, error) {
    type alias wireMessage
    raw, err := json.Marshal(alias(m))
    if err != nil { return nil, err }
    if m.Role != "assistant" { return raw, nil }
    var obj map[string]json.RawMessage
    json.Unmarshal(raw, &obj)
    rc, _ := json.Marshal(m.ReasoningContent)
    obj["reasoning_content"] = rc
    return json.Marshal(obj)
}
```

**Verify**: 3-turn tool-use trace against a reasoning model. If turn 2 returns 400 about reasoning_content, the round-trip is missing.

Per LiteLLM #26156 and Moonshot's own docs, this is required not optional. Self-asking the model "do I need this?" often returns wrong answers — verify against documentation and the actual 400 response.

## 6. Rate-card miscost — non-Anthropic models billed at Sonnet rates

**Symptom**: A pod's `hub_agent_budget` row shows 5-10× the actual provider invoice. Triggers false budget-exhaustion pauses.

**Cause**: A `rates` map keyed by model name with a fallback to Sonnet:

```go
// wrong:
var rates = map[string]ModelRates{
    "claude-sonnet-4-6": {3.00, 15.00, ...},
    "gemini-2.5-flash":  {0.075, 0.30, ...},
}

func cost(model string, in, out int) float64 {
    r, ok := rates[model]
    if !ok {
        r = rates["claude-sonnet-4-6"]  // ← silent over-bill
    }
    return float64(in)/1e6*r.Input + float64(out)/1e6*r.Output
}
```

Moonshot kimi-k2.6 at real $0.95/M input + $4.00/M output gets billed at $3.00/$15.00 — a ~5× over-bill. The tracker triggers DAILY_USD_CAP pauses on a fictional spend number.

**Fix**: add explicit entries for every model you serve. Real Moonshot rates:

```go
"kimi-k2.6":  {0.95, 4.00, 0, 0, 0.16},  // input, output, c5m, c1h, cache-read
"kimi-k2.5":  {0.60, 2.50, 0, 0, 0.15},
```

DeepSeek V4-Pro: `{1.74, 3.48, 0, 0, 0.0}` (or apply the 75% discount manually until expiration). DeepSeek V4-Flash: `{0.28, 1.10, 0, 0, 0.028}`.

**Verify**: compare 24h of `hub_agent_budget.cost_usd` against the provider's billing dashboard. If hub > provider by >2×, the fallback is firing.

## 7. `max_tokens` default too low — silent mid-reasoning truncation

**Symptom**: A reasoning model's responses are empty or truncated mid-sentence. `finish_reason: "length"` in the response. No error, no warning.

**Cause**: Two places this hits:

1. Adapter default at 2048 (legacy openai-default) or 4096
2. Pod config that explicitly sets `max_tokens: 2048` to "save tokens"

For reasoning models, `max_tokens` includes reasoning tokens. Reasoning routinely consumes 10-30K. At 2048, you get an empty `content` field while `completion_tokens == 2048` — the model spent the entire budget thinking.

**Fix**:

```go
// adapter side: default high
maxTokens := req.MaxTokens
if maxTokens == 0 || maxTokens < 16000 {
    maxTokens = 96000  // or model-appropriate ceiling
}
```

For pod configs: set `models.max_output_tokens: 32000` or higher. Document in the pod YAML: `# reasoning models share max_tokens with reasoning — must be ≥16K`.

**Verify**: send a complex reasoning prompt. If `finish_reason == "length"` AND `content == ""` AND `completion_tokens == max_tokens`, you've hit the silent truncation.

## 8. Missing `strict` field on function defs — JSON-arg loops poison sessions

**Symptom**: After ~10 turns of tool use, the model starts emitting malformed JSON in tool arguments (unterminated strings, trailing commas, unescaped control chars). The adapter retries the malformed call into the history, the next response is worse, the session is dead.

**Cause**: function definitions sent without `strict: true`. The provider performs only JSON-validity checks, not schema enforcement, so malformed-but-parseable args get accepted and propagated. Bad args in conversation history poison further generations.

**Fix**: opt into strict mode:

```go
type wireFunctionDef struct {
    Name        string `json:"name"`
    Description string `json:"description,omitempty"`
    Parameters  any    `json:"parameters"`
    Strict      bool   `json:"strict,omitempty"`  // ← set true for coding agents
}
```

Test the impact per provider. In our matrix work, kimi-k2.6 with `strict: false` dropped tier-3 pass rate from 2/3 to 1/3. DeepSeek showed neutral signal. Default `strict: true` unless data says otherwise.

## 9. No JSON-validity guard on `tool_calls[].function.arguments`

**Symptom**: Adapter panics with `json: cannot unmarshal ...` when dispatching a tool call. Or worse, the dispatcher receives malformed args and the tool errors with a cryptic message that doesn't surface to the model.

**Cause**: Direct dispatch:

```go
// wrong:
for _, tc := range msg.ToolCalls {
    result, err := dispatch(ctx, tc.Function.Name, tc.Function.Arguments)
    // ...
}
```

The model occasionally emits args like `{"path": "/tmp/file` (unterminated) or `{"path": "/tmp", }` (trailing comma). `dispatch` decodes and panics or returns an opaque error.

**Fix**: pre-validate, feed the error back as a tool result so the model can correct itself:

```go
for _, tc := range msg.ToolCalls {
    var probe any
    if err := json.Unmarshal([]byte(tc.Function.Arguments), &probe); err != nil {
        results = append(results, wireMessage{
            Role:       "tool",
            ToolCallID: tc.ID,
            Content:    fmt.Sprintf("error: tool args are not valid JSON: %v", err),
        })
        continue
    }
    result, err := dispatch(ctx, tc.Function.Name, tc.Function.Arguments)
    // ...
}
```

The model sees the validation error in the next turn and retries with valid JSON. This avoids the session-poison failure mode described in kimi-cli #1171.

## 10. No terminal-tool guardrail — model claims completion without success

**Symptom**: Backlog item moves to "completed" but no PR exists. Or the assistant emits `defer_to_human` AFTER successfully calling `push_branch` + `open_pr`. Or the response says "Done!" with no tool call.

**Cause**: The runtime trusts the model's claim of completion without verifying a terminal tool fired. Reasoning models in particular tend to call terminal tools (e.g. `push_branch`) successfully, then call `defer_to_human` as a kind of farewell signal, confusing the orchestrator.

**Fix**: provider-agnostic guard at the runtime call site:

```go
const (
    StatusOK         = "ok"
    StatusIncomplete = "incomplete"
)

// Whitelist of tools that end a turn successfully:
var terminalTools = map[string]bool{
    "push_branch":     true,
    "open_pr":         true,
    "escalate":        true,
    "defer_to_human":  true,
}

func resolveTurnStatus(toolCallsFired []string) string {
    for _, tool := range toolCallsFired {
        if terminalTools[tool] {
            return StatusOK
        }
    }
    return StatusIncomplete  // model claims done without terminal tool
}
```

If the model fires `push_branch` + `open_pr` + `defer_to_human`, count the first terminal hit and ignore the trailing call. Document the rule in the system prompt: "After `push_branch` and `open_pr`, end the response."

## How to use this checklist

For an existing adapter, grep each pattern. Fix the HIGH-severity ones (1, 2, 6) before any production traffic — they break silently. The MEDIUM ones (3, 4, 5, 7) fail under load or with reasoning models. The LOW ones (8, 9, 10) are defense-in-depth.

For a new adapter, copy the wireMessage / wireRequest / error-decoder structs from an existing audited adapter and adapt the URL + auth. Don't write from scratch — every adapter does the same thing, and the bugs are in the things-everyone-does layer.

## Common Mistakes

**Treating every provider as identical to OpenAI**. They're not. Moonshot rejects `temperature != 1.0` in thinking mode. xAI rejects `reasoning_effort` on grok-4.20-reasoning. DeepSeek requires `reasoning_content` echo. The OpenAI client library handles none of this.

**Trusting the model's self-report**. Asking "do you need reasoning_content echoed?" returns wrong answers — kimi-k2.6 self-reports `false`, reality is `true`. Verify against documentation, partner adapter source (Cline, RooCode, Continue.dev), and actual provider error responses.

**Conflating per-attempt and total timeouts**. A retry loop with per-attempt timeout has no upper bound on total time. Always pin the outer ctx.

**Skipping the rate-card audit**. Adapters that work fine functionally can over-bill by 5-10× because of a Sonnet fallback. The cost number in your tracker is wrong until you've added an explicit entry for every model you serve.

