---
title: "Two-Pass Analysis: The Summarize-Then-Correlate Pattern for Scaling Beyond Context Windows"
description: "Using a two-pass architecture to analyze codebases larger than any model's context window — fast small models summarize individual files, then a larger model correlates the summaries to answer cross-cutting questions."
url: https://agent-zone.ai/knowledge/agent-tooling/two-pass-analysis-pattern/
section: knowledge
date: 2026-02-22
categories: ["agent-tooling"]
tags: ["local-llm","two-pass","summarize-correlate","codebase-analysis","context-window","architecture-pattern"]
skills: ["multi-file-analysis","llm-orchestration","context-window-management"]
tools: ["ollama","python","qwen"]
levels: ["intermediate"]
word_count: 375
formats:
  json: https://agent-zone.ai/knowledge/agent-tooling/two-pass-analysis-pattern/index.json
  html: https://agent-zone.ai/knowledge/agent-tooling/two-pass-analysis-pattern/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Two-Pass+Analysis%3A+The+Summarize-Then-Correlate+Pattern+for+Scaling+Beyond+Context+Windows
---


# Two-Pass Analysis: Summarize-Then-Correlate

A 32B model with a 32K context window can process roughly 8-10 source files at once. A real codebase has hundreds. Concatenating everything into one prompt fails — the context overflows, quality degrades, and the model either truncates or hallucinates connections.

The two-pass pattern solves this by splitting analysis into two stages:

1. **Pass 1 (Summarize):** A fast 7B model reads each file independently and produces a focused summary.
2. **Pass 2 (Correlate):** A capable 32B model reads all summaries (which are much shorter than the original files) and answers the cross-cutting question.

This effectively multiplies your context window by the compression ratio of summarization — typically 10-20x. A 32K context that handles 10 files directly can handle 100-200 files through summaries.
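
The arithmetic behind those numbers, with assumed averages (roughly 3K tokens per source file, 200 per summary, and ~2K of prompt overhead; your ratios will vary):

```python
# Back-of-envelope capacity math -- all numbers are assumptions, not measurements.
context_window = 32_000    # tokens available to the correlation model
avg_file_tokens = 3_000    # a typical source file
avg_summary_tokens = 200   # a focused summary
prompt_overhead = 2_000    # question, instructions, headroom

direct_capacity = (context_window - prompt_overhead) // avg_file_tokens
summary_capacity = (context_window - prompt_overhead) // avg_summary_tokens

print(direct_capacity)   # files that fit as raw source: 10
print(summary_capacity)  # files that fit as summaries: 150
```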

## Architecture

```
Source Files (100+ files, 500K+ tokens total)
  │
  ├── file1.py ──→ 7B Model ──→ Summary (~200 tokens)
  ├── file2.py ──→ 7B Model ──→ Summary (~200 tokens)
  ├── file3.go ──→ 7B Model ──→ Summary (~200 tokens)
  │   ... (parallel, 3 workers)
  └── fileN.rs ──→ 7B Model ──→ Summary (~200 tokens)
  │
  │  Total summaries: ~20K tokens (fits in 32K context)
  │
  └──→ 32B Model + All Summaries + Question ──→ Analysis
```

## Implementation

### Pass 1: Parallel Summarization

```python
import ollama
import json
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

SUMMARY_MODEL = "qwen2.5-coder:7b"
MAX_WORKERS = 3  # Ollama serializes inference per model by default; a small pool keeps its queue fed without piling up memory

PRESETS = {
    "architecture": {
        "focus": "dependencies, imports, data flow, coupling between components",
        "question": "How do the components of this codebase fit together?",
    },
    "security": {
        "focus": "input validation, authentication, secrets handling, error exposure",
        "question": "What security gaps exist in this codebase?",
    },
    "consistency": {
        "focus": "error handling patterns, naming conventions, code style",
        "question": "What inconsistencies exist across this codebase?",
    },
    "review": {
        "focus": "bugs, edge cases, unchecked assumptions, error handling",
        "question": "What bugs and issues exist in this codebase?",
    },
    "onboard": {
        "focus": "purpose, entry points, key abstractions, domain concepts",
        "question": "Explain this codebase to a new developer.",
    },
}

def summarize_file(filepath: str, preset: str) -> dict:
    """Summarize a single file using the 7B model."""
    content = Path(filepath).read_text(encoding="utf-8", errors="replace")
    focus = PRESETS[preset]["focus"]

    prompt = f"""Summarize this source file with focus on: {focus}

Be specific. Reference function names, types, and concrete details.
Keep the summary under 300 words.

File: {filepath}

--- BEGIN FILE CONTENT ---
{content}
--- END FILE CONTENT ---"""

    response = ollama.chat(
        model=SUMMARY_MODEL,
        messages=[{"role": "user", "content": prompt}],
        options={"temperature": 0.0, "num_predict": 512},
    )

    return {
        "file": filepath,
        "summary": response["message"]["content"],
        "tokens": response.get("eval_count", 0),
    }


def summarize_all(files: list[str], preset: str) -> list[dict]:
    """Summarize all files in parallel."""
    summaries = []

    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        futures = {executor.submit(summarize_file, f, preset): f for f in files}

        for future in as_completed(futures):
            filepath = futures[future]
            try:
                result = future.result()
                summaries.append(result)
                print(f"  Summarized: {filepath} ({result['tokens']} tokens)")
            except Exception as e:
                print(f"  Failed: {filepath}: {e}")

    return sorted(summaries, key=lambda s: s["file"])
```

### Pass 2: Correlation

```python
CORRELATE_MODEL = "qwen2.5-coder:32b"

def correlate(summaries: list[dict], preset: str) -> str:
    """Correlate all summaries to answer the cross-cutting question."""
    question = PRESETS[preset]["question"]

    summary_text = "\n\n".join(
        f"### {s['file']}\n{s['summary']}" for s in summaries
    )

    prompt = f"""You are analyzing a codebase. Below are summaries of each file.

{summary_text}

Based on these summaries, answer this question:
{question}

Reference specific file names when making observations.
Organize your response by theme, not by file."""

    response = ollama.chat(
        model=CORRELATE_MODEL,
        messages=[{"role": "user", "content": prompt}],
        options={"temperature": 0.1, "num_predict": 4096},
    )

    return response["message"]["content"]
```

### Full Pipeline

```python
def analyze_codebase(directory: str, preset: str = "architecture"):
    """Run the full two-pass analysis."""
    # Discover source files
    extensions = {".py", ".go", ".rs", ".ts", ".js", ".java"}
    files = [
        str(p) for p in Path(directory).rglob("*")
        if p.suffix in extensions and "vendor" not in str(p) and "node_modules" not in str(p)
    ]

    print(f"Found {len(files)} files. Preset: {preset}")

    # Pass 1: Summarize
    print("\n--- Pass 1: Summarizing files ---")
    summaries = summarize_all(files, preset)

    # Pass 2: Correlate
    print("\n--- Pass 2: Correlating summaries ---")
    analysis = correlate(summaries, preset)

    return analysis
```
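
To drive this from the command line, a thin argparse entry point is enough. A minimal sketch (`build_parser` and the flag names are illustrative assumptions, not part of the pipeline above):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI for an analyze.py script; preset names mirror the PRESETS dict."""
    parser = argparse.ArgumentParser(description="Two-pass codebase analysis")
    parser.add_argument("directory", help="root of the codebase to analyze")
    parser.add_argument(
        "--preset",
        default="architecture",
        choices=["architecture", "security", "consistency", "review", "onboard"],
    )
    return parser

def main() -> None:
    args = build_parser().parse_args()
    print(analyze_codebase(args.directory, args.preset))  # from the pipeline above
```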

## Caching Summaries

Summarization is the expensive step (many API calls). Cache summaries and reuse them across different questions:

```python
import hashlib

CACHE_DIR = Path.home() / ".cache" / "codebase-analysis"

def file_hash(filepath: str) -> str:
    """Hash based on path + mtime + size for change detection."""
    stat = Path(filepath).stat()
    key = f"{filepath}:{stat.st_mtime}:{stat.st_size}"
    return hashlib.sha256(key.encode()).hexdigest()[:16]

def load_cached_summaries(files: list[str], preset: str) -> tuple[list[dict], list[str]]:
    """Return cached summaries (hits) and the files that still need summarizing (misses)."""
    cache_file = CACHE_DIR / f"{preset}_summaries.json"
    cached = {}

    if cache_file.exists():
        cached = {s["file"]: s for s in json.loads(cache_file.read_text())}

    hit = []
    miss = []

    for f in files:
        fhash = file_hash(f)
        if f in cached and cached[f].get("hash") == fhash:
            hit.append(cached[f])
        else:
            miss.append(f)

    return hit, miss

def save_summaries(summaries: list[dict], preset: str):
    """Save summaries to cache."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cache_file = CACHE_DIR / f"{preset}_summaries.json"

    # Add file hashes
    for s in summaries:
        s["hash"] = file_hash(s["file"])

    cache_file.write_text(json.dumps(summaries, indent=2))
```

With caching, the first analysis of a 100-file codebase takes 5-10 minutes. Subsequent analyses with different questions (but the same files) reuse the cached summaries and only run the correlation step — a single 32B call that takes 30-60 seconds.

## Presets as Reusable Workflows

Presets let you analyze the same codebase from different angles without rewriting prompts:

```bash
# Architecture overview
python analyze.py ~/projects/my-app --preset architecture

# Security review
python analyze.py ~/projects/my-app --preset security

# Onboarding guide
python analyze.py ~/projects/my-app --preset onboard
```

Each preset changes the summarization focus (what the 7B model looks for in each file) and the correlation question (what the 32B model synthesizes from the summaries).

**Adding a new preset is a one-line change** — define the focus and question. The two-pass infrastructure handles the rest.
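
For example, a hypothetical `performance` preset (not one of the presets above) would be a single new entry:

```python
# Hypothetical "performance" preset -- one new entry in PRESETS, nothing else changes:
PRESETS["performance"] = {
    "focus": "hot loops, allocations, blocking I/O, caching, algorithmic complexity",
    "question": "Where are the likely performance bottlenecks in this codebase?",
}
```

Both passes pick up the new focus and question automatically.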

## When Two-Pass Breaks Down

The pattern has limits:

- **Summarization is lossy.** The 7B model may miss subtle details that matter for the correlation question. If you get suspicious results, spot-check a few summaries against the original files.
- **Cross-file dependencies at the token level.** If two files share a specific variable name or magic constant that only matters in combination, the summarizer may not preserve that detail. Targeted extraction (asking for specific fields) helps.
- **Very large files.** A single file that exceeds the 7B model's context window needs to be chunked before summarization. Split at function or class boundaries.
- **Real-time analysis.** The parallel summarization step takes minutes for large codebases. This is a batch pattern, not an interactive one.

For these cases, consider RAG (semantic search over the codebase) or targeted extraction (pulling specific structured data from each file instead of free-form summaries).
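
For the large-file case, a simple pre-chunker that splits before top-level definition keywords is often enough. A sketch under stated assumptions: the regex recognizes only a few common keywords, and `max_chars` of 12K approximates ~3K tokens at roughly 4 characters per token:

```python
import re

def chunk_source(content: str, max_chars: int = 12_000) -> list[str]:
    """Split a file before top-level def/class/func/fn/impl lines, then pack
    consecutive pieces into chunks under max_chars. A single oversized
    definition is kept whole rather than split mid-function."""
    pieces = re.split(r"\n(?=(?:def |class |func |fn |impl ))", content)
    chunks, current = [], ""
    for piece in pieces:
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = piece
        else:
            # Re-add the newline consumed by the split when joining pieces.
            current = current + "\n" + piece if current else piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is summarized separately, and the per-chunk summaries are merged into one per-file summary before Pass 2.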

## Common Mistakes

1. **Using too many parallel workers.** Ollama runs one inference at a time per model. More than 3 workers creates a queue that does not improve throughput but increases memory pressure. Measure actual parallelism before increasing workers.
2. **Not caching summaries.** Re-summarizing 100 files every time you change the correlation question wastes 90% of the work. Cache summaries and invalidate only when files change.
3. **Summarizing with the same model used for correlation.** The point of two passes is using a fast, cheap model for the N summarization calls and a capable model for the single correlation. Using the 32B model for both makes Pass 1 several times slower (roughly the 32B-to-7B throughput ratio) with no benefit, since single-file summarization rarely needs the larger model's capability.
4. **Asking the summarizer to answer the question.** The summarizer should capture relevant facts, not draw conclusions. Conclusions from a 7B model analyzing a single file are unreliable. Let the 32B model draw conclusions from the full picture.
5. **Not validating summaries on a sample.** Before trusting a 100-file analysis, read 3-5 summaries and compare them to the original files. If the summaries miss important details, adjust the preset focus or switch to a more capable summarization model.
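
The sampling step in mistake 5 takes a few lines to automate. A sketch (the `spot_check` name is illustrative) that prints random summaries next to the head of each source file for manual comparison:

```python
import random
from pathlib import Path

def spot_check(summaries: list[dict], n: int = 3) -> None:
    """Print a random sample of summaries alongside the first 20 lines of
    each source file, for a quick manual quality check."""
    for s in random.sample(summaries, min(n, len(summaries))):
        head = "\n".join(Path(s["file"]).read_text().splitlines()[:20])
        print(f"=== {s['file']} ===")
        print("--- first 20 source lines ---")
        print(head)
        print("--- summary ---")
        print(s["summary"])
        print()
```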

