DeepSeek V4 Operational Quirks: Pro vs Flash, Reasoning Echo, and the Discount Cliff

May 20, 2026

Llm-Adapter-Development, Provider-Integration, Cost-Modeling

Deepseek, Deepseek-V4, Llm-Quirks, Reasoning-Models, Openai-Compatible, Production, Cost-Modeling

Deepseek, Deepseek-V4-Pro, Deepseek-V4-Flash, Go

DeepSeek V4 Operational Quirks

DeepSeek V4 ships two models behind one OpenAI-compatible API: V4-Pro (reasoning) at $1.74/M input and $3.48/M output, and V4-Flash (chat) at $0.28/M input and $1.10/M output. Until 2026-05-31 V4-Pro carries a 75% discount, putting it at $0.435/M input, which is cheap enough to use as a heavy-tier coding model. When the discount ends, the price returns to list, a 4× step up.

The two models live on the same endpoint but want very different things. V4-Pro behaves like a reasoning model: it prefers thin prompts, requires the reasoning_content echo, and restricts tool_choice. V4-Flash behaves like a chat model: rich prompts win dramatically, and it rejects nothing. Confuse them and your matrix lights up red.

Moonshot Kimi K2.6 Operational Quirks: What Breaks in Production

May 20, 2026

Agent-Tooling

Intermediate, Advanced

Llm-Adapter-Development, Provider-Integration, Production-Debugging

Moonshot, Kimi, Kimi-K2, Llm-Quirks, Reasoning-Models, Openai-Compatible, Production, Thinking-Mode

Moonshot, Kimi-K2.6, Go

Moonshot Kimi K2.6 Operational Quirks

Kimi K2.6 is one of the cheapest competent reasoning models — $0.95/M input cache-miss, $0.16/M cache-hit, $4.00/M output, 256K context. It is also one of the most opinionated. Half of what works on OpenAI breaks here, and the failures are silent: empty content, mid-reasoning truncation, 400 errors that don’t mention the actual problem, and a cache key parameter that makes cost go up instead of down.

Reasoning-Model Tuning Asymmetry: Why Thin Prompts Beat Rich Prompts (and When They Don't)

May 20, 2026

Agent-Tooling

Intermediate, Advanced

Prompt-Engineering, Model-Evaluation, Ab-Testing

Prompt-Engineering, Reasoning-Models, Kimi, Deepseek, Grok, Sonnet, Ofat, Tuning

Go, Moonshot, Deepseek, Xai

Reasoning-Model Tuning Asymmetry

Practitioners assume “better prompt = better output”. For one model class, that assumption is correct. For the other, the same prompt makes things measurably worse. This article documents the asymmetry, names the dividing line, and gives you a 4-cell test to confirm it on your own canary before you commit to a prompt.

The asymmetry is empirical, not theoretical. It shows up cleanly across four independent OFAT (one-factor-at-a-time) matrices run between 2026-05-18 and 2026-05-20: sonnet POC, grok matrix v1+v2, deepseek matrix v1, kimi matrix v1.

xAI Grok Operational Quirks: Error Shapes, Rate-Limit HTML, and Per-Model Tool Surfaces

May 20, 2026

Agent-Tooling

Intermediate, Advanced

Llm-Adapter-Development, Provider-Integration, Production-Debugging

Xai, Grok, Grok-4, Llm-Quirks, Openai-Compatible, Production, Reasoning-Models

Xai, Grok-4.3, Grok-4.20-Reasoning, Go

xAI Grok Operational Quirks

xAI’s Grok API is OpenAI-compatible on paper. In practice it has more wire-format edge cases than any other provider in production: error responses change shape, rate-limit pages come back as HTML, assistant turns reject missing fields with HTTP 422, and the two flagship models (grok-4.3 and grok-4.20-reasoning) have incompatible parameter sets. Wrap it carelessly and the adapter crashes the conversation mid-turn.

This page is the production-confirmed quirks list, each as Symptom → Cause → Fix → Verify. Numbers come from two OFAT matrix runs (15 cells × N=3 baseline, 3 cells × N=5 validation) on api.x.ai and the heavy-tier POC. Full synthesis: ~/.claude/projects/-Users-mstather/memory/project_xai_adapter_wireerror_bug_2026_05_19.md and project_grok_matrix_v1_2026_05_19.md.