---
title: "The Five-Agent Research Pattern: Surveying a New LLM Provider Before You Tune It"
description: "Spawn 5 parallel research sub-agents across distinct sources before drafting a tuning matrix or adoption decision for a new LLM provider. Catches the bugs docs omit, encodes lived experience from partner adapters, and surfaces open contradictions worth turning into matrix cells."
url: https://agent-zone.ai/knowledge/agent-tooling/five-agent-research-pattern/
section: knowledge
date: 2026-05-20
categories: ["agent-tooling"]
tags: ["llm-research","provider-evaluation","sub-agents","parallel-agents","tuning","adoption"]
skills: ["agent-orchestration","llm-evaluation"]
tools: ["claude-code","web-search","github"]
levels: ["intermediate","advanced"]
word_count: 1356
formats:
  json: https://agent-zone.ai/knowledge/agent-tooling/five-agent-research-pattern/index.json
  html: https://agent-zone.ai/knowledge/agent-tooling/five-agent-research-pattern/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=The+Five-Agent+Research+Pattern%3A+Surveying+a+New+LLM+Provider+Before+You+Tune+It
---


# The Five-Agent Research Pattern

Adopting a new LLM provider for a coding-agent role looks easy from the docs. Read the model card, copy the partner adapter's defaults, ship. A week later you find out the provider rejects `tool_choice=required` in thinking mode, the docs lied about `reasoning_content` echoing, and your retry loop triples the per-turn timeout because the rate-limit response isn't JSON.

The docs miss what was patched after release. The community catches what the docs miss. Partner adapters encode lived defaults nobody published. Your production data shows what your workload actually does, not what it should. Your own adapter has bugs you can't see from inside it. Reading any one of these in isolation gets you to "I think I understand this provider." Reading all five in parallel gets you a knob list, an open-contradictions list, and a list of bugs to fix before the matrix runs. The pattern: spawn 5 parallel research sub-agents, one per angle, then synthesize.

## TL;DR — what the pattern produces

- Five structured reports, each from a distinct source angle
- One synthesis doc consolidating the knob list and open contradictions
- A list of adapter bugs to fix BEFORE matrix runs
- A list of constants the matrix should NOT vary (API-forced settings)
- Open contradictions become matrix cells (see [ofat-matrix-llm-tuning.md](ofat-matrix-llm-tuning.md))
- Cost: ~$0 on Max subscription, $2-5 on API. Worth it for any provider you'll commit to.

## Problem

Practitioners read the docs and skip the rest. Three failure modes:

- **Docs lie by omission.** Moonshot's kimi-k2.6 docs don't mention that `reasoning_content` must be echoed back on every subsequent request. LiteLLM #26156 documents it; the docs don't. Without the partner-adapter source you ship a multi-turn flow that returns 400 errors.
- **You ship API-forced defaults as if they were choices.** Kimi thinking mode rejects any `temperature` other than 1.0 and any `top_p` other than 0.95. The docs phrase these as recommendations; partner-adapter source hard-codes them. The matrix should NOT vary them.
- **You re-derive bugs everyone else fixed.** Cline #10544 documents the rejection of any temperature other than 1. xAI's HTML rate-limit response is in 3+ GitHub issues. The 5-agent sweep finds these in 20 minutes; trial-and-error finds them over 2 weeks of production failures.
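One way to keep API-forced settings out of a tuning matrix is a small guard that runs before any cell is executed. The sketch below is illustrative only: `FORCED`, `cells_varying_forced`, and the sample cells are hypothetical names and data, and the forced values are taken from the Kimi thinking-mode example above, so re-verify them for your provider.

```python
# API-forced values from the Kimi thinking-mode example; treat them as
# assumptions to re-verify against the partner-adapter report.
FORCED = {"temperature": 1.0, "top_p": 0.95}


def cells_varying_forced(cells, forced=FORCED):
    """Return (cell, clashes) pairs for cells that override a forced setting."""
    bad = []
    for cell in cells:
        # Collect only the keys whose value differs from the forced value.
        clashes = {k: v for k, v in cell.items() if k in forced and v != forced[k]}
        if clashes:
            bad.append((cell, clashes))
    return bad


# Hypothetical matrix cells: the second one would be rejected by the API.
cells = [
    {"max_tokens": 16000},
    {"temperature": 0.2, "max_tokens": 16000},
]
```

Running `cells_varying_forced(cells)` before the matrix starts flags the second cell, which would otherwise burn a run on a request the provider refuses.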

## The five angles

Each agent gets one angle, scoped tightly, with explicit deliverables. Run them in parallel.

### 1. Official provider docs

Model card, API reference, parameters table, rate card, region notes. Endpoint URL(s), auth shape, supported request fields. Mode-specific constraints (thinking, reasoning, vision). Pricing including cache rates, batch rates, discount expirations. Anything labeled "beta", "experimental", "deprecated".

Deliverable: a flat table of every documented parameter, with the provider's recommended value, allowed range, and any mode-specific constraint.

### 2. Partner adapter source

- Cline (`src/shared/api.ts`, `src/core/api/providers/`)
- RooCode (`packages/types/src/providers/`)
- Continue.dev (model registry + provider configs)
- LiteLLM (`litellm/llms/<provider>/`)
- Aider (`model-settings.yml` + provider modules)

What they configure for this provider specifically. Their defaults reveal lived experience. If three partner adapters all set `max_tokens >= 16000` for kimi, that signal is stronger than the docs.

Deliverable: a comparison table of how each partner adapter configures the provider. Columns: temperature, max_tokens, tool_choice, strict, custom retry, error-handling quirks.

### 3. Community signals

GitHub issues across the partner adapters (LiteLLM, Cline, RooCode, Aider, Continue.dev). Hacker News, Reddit r/LocalLLaMA, X. Provider's own Discord / forum. Tag each finding `[strong]` (3+ independent reports) or `[weak]` (1-2 reports).

Deliverable: a list of community-reported quirks, each tagged with strength and source URL.
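The strength tagging can be made mechanical. The following is a minimal sketch, assuming each finding arrives as a `(quirk, source_url)` pair and that distinct URLs are an acceptable proxy for independent reports; `tag_findings` is a hypothetical helper, not part of any tool named here.

```python
from collections import defaultdict


def tag_findings(reports, strong_threshold=3):
    """Group (quirk, source_url) pairs and tag each quirk by report count."""
    sources = defaultdict(set)
    for quirk, url in reports:
        sources[quirk].add(url)  # a set drops repeat links to the same report

    tagged = []
    for quirk, urls in sources.items():
        strength = "strong" if len(urls) >= strong_threshold else "weak"
        tagged.append((strength, quirk, sorted(urls)))

    # Strong findings first, then alphabetical by quirk.
    return sorted(tagged, key=lambda t: (t[0] != "strong", t[1]))
```

Distinct URLs can still trace back to one original report, so a `[strong]` tag produced this way is a prompt to read the sources, not a verdict.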

### 4. Production data (if you have any)

Your own DB / metrics for prior runs of this provider or sibling providers. Per-task cost, defer rate, REQUIRED-FIX rate, completion rate. Sibling-provider data counts — kimi failure modes often overlap with grok.

Deliverable: a table of metrics for your existing usage of this provider (or the closest sibling), with workload-shape context.
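As a rough illustration of how that table could be computed, here is a sketch that assumes one record per task run. The `TaskRun` fields and the `summarize` helper are hypothetical; map them to whatever your own DB actually stores.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    provider: str
    cost_usd: float
    completed: bool
    deferred: bool
    required_fix: bool


def summarize(runs, provider):
    """Per-provider metrics; returns None when there is no data."""
    rows = [r for r in runs if r.provider == provider]
    n = len(rows)
    if n == 0:
        return None
    return {
        "tasks": n,
        "cost_per_task_usd": sum(r.cost_usd for r in rows) / n,
        # bool is a subclass of int, so sum() counts the True values
        "completion_rate": sum(r.completed for r in rows) / n,
        "defer_rate": sum(r.deferred for r in rows) / n,
        "required_fix_rate": sum(r.required_fix for r in rows) / n,
    }
```

Calling `summarize(runs, "grok")` for a sibling provider gives a comparable row when you have no history for the new provider itself.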

### 5. Adapter audit

Your existing adapter code for this provider, compared against patterns from prior provider integrations (see [llm-adapter-audit-checklist.md](llm-adapter-audit-checklist.md)). Especially error-decode shape, Content-omitempty, retry-timeout multiplication, rate-card fallback.

Deliverable: a severity-tagged list of adapter bugs (HIGH / MEDIUM / LOW), each with file:line citations and a fix recipe.
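A structured form of that deliverable makes the "fix the HIGHs first" gate easy to enforce. This is a minimal sketch; the `Finding` record and `triage` helper are hypothetical, and any severities or file paths you feed it are your own audit's output, not values from this article.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


@dataclass(frozen=True)
class Finding:
    severity: str   # "HIGH", "MEDIUM", or "LOW"
    location: str   # file:line citation
    summary: str
    fix: str


def triage(findings):
    """Return (findings sorted by severity, HIGH findings that block the matrix)."""
    # sorted() is stable, so findings of equal severity keep their input order.
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
    blockers = [f for f in ordered if f.severity == "HIGH"]
    return ordered, blockers
```

If `blockers` is non-empty, the matrix run waits until those fixes land.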

### Plus a sixth: self-ask

Call the provider directly with a structured "describe your own quirks" prompt. Useful for brainstorming, NOT ground truth — self-ask is sample size 1. The kimi self-ask said `reasoning_content_must_echo_back: false` — wrong; LiteLLM #26156 and three partner adapters say `true`. Verify every self-ask claim against the other 5 angles before acting.
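The verification step can be sketched as a comparison between the self-ask answers and what the other angles reported. `check_self_ask` and its input shapes are assumptions made for illustration; the only data point taken from the text is the `reasoning_content_must_echo_back` mismatch.

```python
def check_self_ask(self_ask, evidence):
    """Classify each self-ask claim against values reported by other angles.

    self_ask: claim name -> value the model reported about itself
    evidence: claim name -> list of (angle, value) pairs from the other reports
    """
    result = {}
    for claim, value in self_ask.items():
        seen = evidence.get(claim, [])
        if not seen:
            result[claim] = "unverified"    # no other angle mentions it
        elif all(v == value for _, v in seen):
            result[claim] = "confirmed"
        else:
            result[claim] = "contradicted"  # at least one angle disagrees
    return result


self_ask = {"reasoning_content_must_echo_back": False}
evidence = {
    "reasoning_content_must_echo_back": [
        ("community", True),        # LiteLLM #26156
        ("partner-adapter", True),
    ],
}
```

Here `check_self_ask(self_ask, evidence)` marks the claim as `"contradicted"`. Only `"confirmed"` claims should feed the synthesis; `"unverified"` ones stay as questions.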

## A sample sub-agent prompt template

Scope each sub-agent tightly. One angle. Explicit deliverables. Source citations required.

```
You are research sub-agent N of 5 for the kimi-k2.6 adoption decision.

YOUR ANGLE: Partner adapter source.

GATHER: How do Cline, RooCode, Continue.dev, LiteLLM, and Aider
configure kimi-k2.6 specifically? Read their source code (not their
docs).

For each adapter, report:
- File path + commit SHA + line numbers for the provider config
- Hard-coded request defaults (temperature, max_tokens, tool_choice,
  strict, top_p, presence_penalty, anything else)
- Mode-specific overrides (thinking vs non-thinking)
- Custom retry / error-handling logic
- Any comments in source explaining a quirk

DELIVERABLE: Markdown table with columns: adapter | file:line |
temperature | max_tokens | tool_choice | strict | custom retry |
notes. One row per adapter. Cite every value's source.

DO NOT: read the provider's docs (that's another agent's angle).
DO NOT: read community forum posts (also another agent).
DO NOT: speculate about why a value is what it is. Just report.

OUTPUT: structured markdown report, max 800 words. Save to
~/projects/dream-team/planning/kimi-research-partner-adapters.md.
```

Run all 5 (plus self-ask) in parallel. On Claude Code, this is a single message with parallel tool calls. Sub-agents typically finish in 5-10 minutes each; the slowest one is the bottleneck.
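If you drive sub-agents from your own harness instead of Claude Code, the same fan-out can be sketched with `asyncio.gather`. Everything here is an assumption for illustration: `run_agent` is a stub standing in for however your harness launches a sub-agent, and `build_prompt` reproduces only the first lines of the template above.

```python
import asyncio

ANGLES = [
    "Official provider docs",
    "Partner adapter source",
    "Community signals",
    "Production data",
    "Adapter audit",
]


def build_prompt(n, angle, provider):
    """Fill in the header of the sub-agent prompt template."""
    return (
        f"You are research sub-agent {n} of {len(ANGLES)} "
        f"for the {provider} adoption decision.\n\n"
        f"YOUR ANGLE: {angle}.\n"
    )


async def run_agent(prompt):
    # Hypothetical stand-in: replace with a real sub-agent launch.
    await asyncio.sleep(0)
    return f"report for: {prompt.splitlines()[2]}"


async def survey(provider):
    prompts = [build_prompt(i, a, provider) for i, a in enumerate(ANGLES, start=1)]
    # gather() runs all five concurrently and returns results in input order.
    reports = await asyncio.gather(*(run_agent(p) for p in prompts))
    return dict(zip(ANGLES, reports))
```

`asyncio.run(survey("kimi-k2.6"))` returns one report per angle. Because the calls run concurrently, total time tracks the slowest agent, not the sum of all five.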

## Why five

Each angle catches a different bug class:

| Angle | Catches |
|---|---|
| Docs | Documented parameters, pricing, endpoints |
| Partner adapter source | API-forced settings, mode constraints, sane defaults |
| Community signals | Active rough edges, recent regressions, version-specific bugs |
| Production data | What your workload actually does vs what it should |
| Adapter audit | Bugs in YOUR code (rate-card, error-decode, retry math) |
| Self-ask (optional) | Brainstorming + model's self-reported quirks |

Three angles miss things; two is just docs + community, the default failure mode. Ten is diminishing returns — angles 6-10 mostly overlap with 1-5.

## The synthesis step

After all 5 (or 6) reports land, write one synthesis doc with this structure:

1. **TL;DR** — production reality, dominant failure mode, adapter bug count, matrix surface size
2. **Production data table** — your existing metrics for this provider
3. **Convergent constants** — settings ALL sources agree on. These become matrix "held constants" — DO NOT vary them
4. **Open contradictions** — settings where sources disagree. These become matrix cells
5. **Adapter audit summary** — bugs found in your own code, severity-tagged. Fix the HIGHs before matrix runs
6. **Matrix design** — cell list derived from #4, held constants from #3 documented as excluded
7. **References** — source URLs for every claim

The synthesis doc is the input to the OFAT matrix. Without it, matrix cells are chosen by hunch and the held-constants list is implicit.
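Sections 3 and 4 of the synthesis can be drafted mechanically once each report is reduced to a settings dictionary. The sketch below assumes that reduction has already happened; `classify_settings` is a hypothetical helper, and the third bucket for single-source settings is an addition of this sketch, since one source cannot converge or contradict on its own.

```python
def classify_settings(reports):
    """Split settings into convergent constants, open contradictions,
    and settings only one angle reported.

    reports: angle name -> {setting name: value}
    """
    by_setting = {}
    for angle, settings in reports.items():
        for name, value in settings.items():
            by_setting.setdefault(name, {})[angle] = value

    constants, contradictions, single_source = {}, {}, {}
    for name, values in by_setting.items():
        distinct = set(values.values())
        if len(values) == 1:
            single_source[name] = values        # needs a second source
        elif len(distinct) == 1:
            constants[name] = next(iter(distinct))  # held constant in the matrix
        else:
            contradictions[name] = values       # candidate matrix cell
    return constants, contradictions, single_source
```

A human still reviews the output: sources can agree because they copied each other, and a contradiction may just reflect different modes (thinking vs non-thinking).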

## Trade-offs

Five parallel agents cost ~$0 on a Max subscription, $2-5 on API. Wall-clock time is ~10-20 minutes; the slowest agent (usually partner-adapter source, which reads multiple repos) is the bottleneck. Worth it for any provider you'll commit to. Skip for one-off experiments you'll throw away.

The cost of NOT doing it: a week of production debugging chasing a multi-turn 400 you could have found in LiteLLM issue #26156, or a tuning matrix that varies API-forced parameters and produces no signal.

## Common Mistakes

**Running them sequentially.** Lose the parallel speedup — 5 ten-minute agents take 50 minutes in series, 10 in parallel. Send all five as a single message with parallel tool calls.

**Asking each agent for "everything."** Each angle should produce one structured deliverable. "Tell me about kimi-k2.6" gets five overlapping Wikipedia essays. "Read Cline source and report a markdown table of hard-coded request defaults" gets a synthesis input.

**Not requiring source citations.** Claims rot. "Cline sets max_tokens=16000" without a file:line+SHA is unverifiable in 2 weeks. Every claim cites a source URL or file:line.

**Trusting self-ask as ground truth.** Self-ask is brainstorming, not a source. Every self-ask claim must be confirmed by at least one of the five research angles before acting. The kimi self-ask was wrong about `reasoning_content`; trusting it would have shipped a 400-error multi-turn flow.

**Not writing the synthesis doc.** Five reports without synthesis is five documents nobody reads again. The synthesis is the artifact; the reports are inputs.