Structured Output from Small Local Models: JSON Mode, Extraction, Classification, and Token Runaway Fixes

February 22, 2026

Structured-Extraction, Json-Output-Engineering, Classification-Pipeline, Output-Scoring

Local-Llm, Structured-Output, Json-Mode, Extraction, Classification, Function-Calling, Ollama

Ollama, Qwen, Ministral, Python, Go

Structured Output from Small Local Models

Small models (2-7B parameters) produce structured output that is 85-95% as accurate as cloud APIs for well-defined extraction and classification tasks. The key is constraining the output space so the model’s limited reasoning capacity is focused on filling fields rather than deciding what to generate.

This is where local models genuinely compete with, and sometimes match, models 30x their size.

JSON Mode

Ollama’s JSON mode forces the model to produce valid JSON:
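A minimal sketch of such a request, assuming a default local Ollama server at `http://localhost:11434` and the `/api/generate` endpoint with `"format": "json"`. The model tag `qwen2.5:3b` and the helper names `build_request`, `parse_reply`, and `generate_json` are illustrative assumptions, not part of Ollama.

```python
import json
import urllib.request

# Assumed default endpoint for a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "qwen2.5:3b") -> dict:
    """Build a non-streaming generate request with JSON mode enabled."""
    return {
        "model": model,    # assumed model tag; substitute one you have pulled
        "prompt": prompt,
        "format": "json",  # JSON mode: constrain output to valid JSON
        "stream": False,   # return one response object instead of chunks
    }


def parse_reply(body: bytes) -> dict:
    """Extract the model's JSON from the API envelope."""
    envelope = json.loads(body)
    # The generated text arrives as a string in the "response" field,
    # so it needs a second json.loads to become a Python object.
    return json.loads(envelope["response"])


def generate_json(prompt: str, model: str = "qwen2.5:3b", timeout: float = 60.0) -> dict:
    """Send the prompt to the local server and return the parsed JSON."""
    payload = json.dumps(build_request(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_reply(resp.read())
```

JSON mode constrains syntax, not schema, so the prompt should still ask for JSON explicitly and name the fields it expects.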

Structured Output Patterns: Getting Reliable JSON from LLMs

February 22, 2026

Agent-Tooling

Intermediate

Output-Parsing, Schema-Design

Structured-Output, Json, Schema-Validation, Function-Calling

Python, Typescript, Json-Schema

Structured Output Patterns

Agents need structured data from LLMs, not free-form text with JSON somewhere inside it. When an agent asks a model to classify a bug as critical/medium/low and gets back a paragraph explaining the classification, the agent cannot act on it programmatically. Structured output is the bridge between LLM reasoning and deterministic code.

Three Approaches

JSON Mode

The simplest approach. Tell the API to return valid JSON and describe the shape you want in the prompt.
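A minimal sketch of this pattern, using the bug-severity example from above. The `call_model` parameter is a hypothetical stand-in for whichever JSON-mode API is in use: it takes a prompt and returns the raw response text. The prompt wording and the `reason` field are illustrative assumptions.

```python
import json
from typing import Callable

ALLOWED_SEVERITIES = {"critical", "medium", "low"}

# The shape is described in the prompt; doubled braces are str.format escapes.
PROMPT_TEMPLATE = """Classify the bug report below.
Respond with a single JSON object and nothing else, in this shape:
{{"severity": "critical" | "medium" | "low", "reason": "<one sentence>"}}

Bug report:
{report}
"""


def classify_bug(report: str, call_model: Callable[[str], str]) -> dict:
    """Ask for JSON in a described shape, then verify the shape ourselves.

    call_model is a hypothetical callable: prompt in, raw response text out.
    """
    raw = call_model(PROMPT_TEMPLATE.format(report=report))
    data = json.loads(raw)  # raises json.JSONDecodeError on non-JSON output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    severity = data.get("severity")
    if not isinstance(severity, str) or severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unexpected severity: {severity!r}")

    reason = data.get("reason")
    if not isinstance(reason, str):
        raise ValueError("missing or non-string 'reason'")

    # Return only the fields we asked for, dropping any extras.
    return {"severity": severity, "reason": reason}
```

JSON mode only promises parseable JSON; the shape lives in the prompt, so the checks after `json.loads` are what let downstream code rely on `severity` being one of the three labels.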