---
title: "Agent Context Management: Memory, State, and Session Handoff"
description: "How agents maintain context across sessions — memory patterns, context window prioritization, and approaches to persistent state."
url: https://agent-zone.ai/knowledge/agent-tooling/agent-context-management/
section: knowledge
date: 2026-02-21
categories: ["agent-tooling"]
tags: ["context","memory","state","sessions"]
skills: ["context-management","agent-memory-design"]
tools: ["vector-databases","key-value-stores"]
levels: ["intermediate"]
word_count: 920
formats:
  json: https://agent-zone.ai/knowledge/agent-tooling/agent-context-management/index.json
  html: https://agent-zone.ai/knowledge/agent-tooling/agent-context-management/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Agent+Context+Management%3A+Memory%2C+State%2C+and+Session+Handoff
---


# Agent Context Management

Agents are stateless by default. Every new session starts with a blank slate -- no knowledge of previous conversations, past mistakes, or learned preferences. This is the fundamental problem of agent context management: how do you give an agent continuity without overwhelming its context window?

## Types of Agent Memory

Agent memory falls into four categories:

- **Short-term memory**: The current conversation. Lives in the context window, disappears when the session ends.
- **Long-term memory**: Facts persisted across sessions. "The production cluster runs Kubernetes 1.29." Must be explicitly stored and retrieved.
- **Episodic memory**: Records of specific past events. "On Feb 15, we debugged a DNS failure caused by a misconfigured service name." Useful for avoiding repeated mistakes.
- **Semantic memory**: General knowledge distilled from episodes. "Bitnami charts name resources using the release name directly."

Most systems implement only short-term and long-term memory. Episodic and semantic memory require more infrastructure, but they let the agent learn from past incidents rather than only recall stored facts.
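
One way to make the four categories concrete is to tag every stored record with its kind. The sketch below is illustrative only: `MemoryKind`, `MemoryRecord`, and the short-term example text are hypothetical names and data, not part of any particular framework.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class MemoryKind(Enum):
    SHORT_TERM = "short_term"   # current conversation only
    LONG_TERM = "long_term"     # facts persisted across sessions
    EPISODIC = "episodic"       # records of specific past events
    SEMANTIC = "semantic"       # general knowledge distilled from episodes


@dataclass
class MemoryRecord:
    kind: MemoryKind
    text: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @property
    def persists_across_sessions(self) -> bool:
        # Short-term memory lives only in the context window.
        return self.kind is not MemoryKind.SHORT_TERM


records = [
    MemoryRecord(MemoryKind.LONG_TERM, "The production cluster runs Kubernetes 1.29."),
    MemoryRecord(MemoryKind.EPISODIC, "On Feb 15, we debugged a DNS failure caused by a misconfigured service name."),
    MemoryRecord(MemoryKind.SEMANTIC, "Bitnami charts name resources using the release name directly."),
    MemoryRecord(MemoryKind.SHORT_TERM, "User asked to fix the failing health check."),  # made-up example
]

# Only these records need to be written to a store at session end.
to_persist = [r for r in records if r.persists_across_sessions]
```

Tagging records this way lets later steps, such as expiry rules, treat episodic memories differently from durable project facts.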

## File-Based Memory: The MEMORY.md Pattern

The simplest memory system is a markdown file the agent reads at session start and updates as it learns. Claude Code uses a version of this approach, loading markdown memory files (for example `CLAUDE.md`) from `~/.claude/` and from the project directory.

```markdown
# Memory

## Project: API Service
- Framework: FastAPI with SQLAlchemy
- Database: PostgreSQL 15 on RDS
- Deploy: ECS Fargate, Terraform-managed
- Tests: pytest, run with `make test`
- The health check endpoint is /api/v1/health, NOT /health

## Preferences
- Always use absolute imports
- Error responses follow RFC 7807 (Problem Details)
- Never add print statements; use structured logging with structlog
```

This pattern has real advantages. The memory is human-readable, easy to audit, and can live in version control alongside the code. The agent reads it at the start of each session and has immediate context about the project.

The downside is scale. A `MEMORY.md` file works for tens of facts. At hundreds, it becomes noisy. The agent spends context window tokens reading things that are not relevant to the current task.
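
A minimal sketch of the read-and-update loop, assuming the file uses `## ` section headers with bullet lists as in the example above. The helper names `load_memory` and `remember` are hypothetical.

```python
from pathlib import Path


def load_memory(path: Path) -> str:
    """Return the memory file's text, or an empty string if it does not exist yet."""
    return path.read_text(encoding="utf-8") if path.exists() else ""


def remember(path: Path, section: str, fact: str) -> None:
    """Add `fact` as a bullet under `## section`, creating the section if needed."""
    lines = load_memory(path).splitlines() or ["# Memory"]
    bullet = f"- {fact}"
    if bullet in lines:
        return  # already recorded somewhere in the file; skip duplicates
    header = f"## {section}"
    if header not in lines:
        lines += ["", header, bullet]
    else:
        # Find where this section ends (the next "## " header or end of file).
        start = lines.index(header) + 1
        end = start
        while end < len(lines) and not lines[end].startswith("## "):
            end += 1
        # Step back over trailing blank lines so the bullet joins the list.
        insert_at = end
        while insert_at > start and lines[insert_at - 1].strip() == "":
            insert_at -= 1
        lines.insert(insert_at, bullet)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Calling `remember(Path("MEMORY.md"), "Preferences", "Always use absolute imports")` at the end of a session appends the fact once; the next session passes the output of `load_memory` into its starting context.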

## Key-Value Memory

For structured facts, a key-value store scales better than free-form text. Each fact gets a key for direct retrieval:

```json
{
  "project.framework": "FastAPI",
  "project.database.type": "PostgreSQL",
  "project.database.version": "15",
  "project.deploy.platform": "ECS Fargate",
  "user.preference.imports": "absolute",
  "user.preference.error_format": "RFC 7807"
}
```

The agent queries a specific key or key prefix (`project.database.*`) instead of reading everything. The tradeoff: you lose the narrative context that makes `MEMORY.md` easy for humans to maintain.
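
As a rough illustration, a pattern lookup over such a store can be a few lines. This sketch keeps the facts in an in-memory dict and uses the standard library's glob matching; a real deployment would more likely back it with a database or key-value service, and the `query` helper is a hypothetical name.

```python
from fnmatch import fnmatchcase

facts = {
    "project.framework": "FastAPI",
    "project.database.type": "PostgreSQL",
    "project.database.version": "15",
    "project.deploy.platform": "ECS Fargate",
    "user.preference.imports": "absolute",
    "user.preference.error_format": "RFC 7807",
}


def query(store: dict[str, str], pattern: str) -> dict[str, str]:
    """Return entries whose key matches a glob pattern such as 'project.database.*'."""
    return {key: value for key, value in store.items() if fnmatchcase(key, pattern)}


db_facts = query(facts, "project.database.*")
# db_facts == {"project.database.type": "PostgreSQL", "project.database.version": "15"}
```

Only the two database entries enter the context window; the deploy and preference keys stay in the store until a task needs them.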

## Vector/Embedding Memory

When the agent needs relevant past context without knowing the exact key, vector search works. Past interactions are embedded and stored in a vector database. At query time, the agent embeds the current task and retrieves the most similar entries:

```
Current task: "Fix the database connection timeout in production"

Retrieved memories (by similarity):
1. "Feb 15: Production DB connections exhausted — pool_size was 5, increased to 20"
2. "Jan 28: SQLAlchemy pool_pre_ping=True prevents stale connections after RDS maintenance"
3. "Connection string format: postgresql+asyncpg://user:pass@host:5432/dbname"
```

This is retrieval-augmented generation (RAG) applied to agent memory. It scales to thousands of entries but requires an embedding model and a vector store.
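
The retrieval step can be sketched without any external service. In the example below, `embed` is a deliberately crude stand-in (lowercase word counts) for a real embedding model, and `VectorMemory` is a hypothetical in-memory substitute for a vector database. Only the embed, store, and rank-by-cosine-similarity flow carries over to a production setup.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorMemory:
    def __init__(self) -> None:
        self._entries: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # Embed once at write time and store the vector next to the text.
        self._entries.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        # Embed the query, then return the k most similar stored texts.
        q = embed(query)
        ranked = sorted(self._entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


memory = VectorMemory()
memory.add("Feb 15: Production DB connections exhausted; pool_size was 5, increased to 20")
memory.add("Jan 28: SQLAlchemy pool_pre_ping=True prevents stale connections after RDS maintenance")
memory.add("Error responses follow RFC 7807 (Problem Details)")

top = memory.search("stale connections after RDS maintenance", k=2)
```

Because the toy embedding only matches exact words, it would miss the link between "timeout" and "exhausted" that a trained embedding model is expected to capture. That gap is the reason to pay for a real model.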

## Context Window Management

Every agent has a finite context window. When the candidate context exceeds it, include content in this priority order:

1. **System instructions** -- always included, non-negotiable
2. **Current task context** -- the user's request and referenced files
3. **Active working state** -- tool results, intermediate outputs from this session
4. **Retrieved long-term memory** -- most relevant persisted facts
5. **Recent session history** -- what happened earlier in this conversation
6. **Background knowledge** -- general project info that might be useful

Trim from the bottom up. Background knowledge goes first. Old conversation turns get summarized or dropped. Retrieved memories get capped at the top-k most relevant.
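
The trimming rule can be sketched as a budgeted fill in priority order. Everything here is an assumption made for illustration: `ContextItem` is a hypothetical type, `count_tokens` is a whitespace word count rather than a real tokenizer, and the sketch drops content outright instead of summarizing it.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    priority: int  # 1 = system instructions ... 6 = background knowledge
    text: str


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def assemble_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Fill the window in priority order; stop at the first item that does not fit."""
    kept: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = count_tokens(item.text)
        if item.priority > 1 and used + cost > budget:
            break  # this item and everything below it is trimmed
        kept.append(item)  # priority 1 (system instructions) is always kept
        used += cost
    return kept


items = [
    ContextItem(6, "Project uses FastAPI with SQLAlchemy"),
    ContextItem(1, "You are a deployment assistant."),
    ContextItem(4, "pool_size was 5, increased to 20"),
    ContextItem(2, "Fix the database connection timeout in production"),
]
window = assemble_context(items, budget=18)
```

With a budget of 18 word-tokens, the first three priorities fit and the background knowledge item is dropped, which matches trimming from the bottom up.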

## Session Handoff

When one agent session ends and another begins -- or when one agent hands off to a different agent -- context must transfer. Three patterns work:

**Structured summary**: The outgoing agent writes a summary of what it did, what it learned, and what remains. The incoming agent reads this as its starting context.

```markdown
## Session Summary (2026-02-21 14:30)
- Task: Debug production connection timeouts
- Root cause: Connection pool exhausted (pool_size=5, max_overflow=0)
- Fix applied: Updated pool_size to 20, max_overflow to 10 in terraform/modules/rds/variables.tf
- Remaining: Deploy change to production (terraform apply pending approval)
```

**Shared state file**: Both agents read and write to a common state file. Works for multi-agent systems where agents collaborate asynchronously.

**Message passing**: The outgoing agent sends a structured message with task state, decisions made, and open questions. This is the pattern used in multi-agent frameworks with explicit handoff protocols.
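
A structured handoff message can be as small as a dataclass that serializes to JSON. The `Handoff` type and its field names below are hypothetical and not taken from any specific multi-agent framework; the example values reuse the session summary above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Handoff:
    """Hypothetical handoff message; the field names are illustrative."""
    task: str
    decisions: list[str] = field(default_factory=list)
    remaining: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, payload: str) -> "Handoff":
        data = json.loads(payload)
        if "task" not in data:
            raise ValueError("handoff message is missing 'task'")
        return cls(**data)


outgoing = Handoff(
    task="Debug production connection timeouts",
    decisions=["Updated pool_size to 20, max_overflow to 10"],
    remaining=["Deploy change to production (terraform apply pending approval)"],
)

# The receiving agent parses the same payload back into a typed object.
incoming = Handoff.from_json(outgoing.to_json())
```

A fixed schema makes it obvious when the outgoing agent omitted something, which a free-form summary cannot guarantee.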

## Memory Decay

Not everything should be remembered forever. A debugging session from six months ago is rarely relevant. Memory decay prevents noise from accumulating.

- **TTL (Time-to-Live)**: Memories expire after a set period. Episodic memories might expire after 30 days. Project facts persist indefinitely.
- **Relevance scoring**: Track how often a memory is retrieved. Memories that are never accessed decay in priority.
- **Explicit pruning**: Periodically review stored memories and remove outdated ones. "The staging cluster runs Kubernetes 1.27" is wrong if you upgraded to 1.29.
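
TTL and relevance scoring combine naturally in one pruning pass. This is a sketch under simple assumptions: the `Memory` type is hypothetical, relevance is just a retrieval counter, and the example dates and hit counts are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Memory:
    text: str
    created_at: datetime
    ttl: Optional[timedelta] = None  # None means the memory never expires
    hits: int = 0                    # how often retrieval returned this memory

    def expired(self, now: datetime) -> bool:
        return self.ttl is not None and now - self.created_at > self.ttl


def prune(memories: list[Memory], now: datetime) -> list[Memory]:
    """Drop expired memories and order the rest by retrieval count, most used first."""
    live = [m for m in memories if not m.expired(now)]
    return sorted(live, key=lambda m: m.hits, reverse=True)


now = datetime(2026, 2, 21)
memories = [
    Memory("Old debugging session notes", datetime(2025, 8, 21), ttl=timedelta(days=30)),
    Memory("DNS failure caused by a misconfigured service name", datetime(2026, 2, 15),
           ttl=timedelta(days=30), hits=1),
    Memory("The production cluster runs Kubernetes 1.29.", datetime(2025, 1, 1), hits=4),
]
kept = prune(memories, now)
```

The six-month-old episodic note is removed, the recent episode survives until its 30-day TTL lapses, and the project fact with no TTL stays at the top because it is retrieved most often.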

## Common Mistakes

**Storing too much**: Every conversation turn, every tool result, every file read. The memory fills with noise and retrieval degrades. Store conclusions and decisions, not raw data.

**Storing too little**: Only keeping what the user explicitly asks to remember. The agent misses learnable patterns -- recurring errors, preferred approaches, project conventions.

**No organization**: Dumping everything into a flat list. Without categories or keys, retrieval becomes a search through noise. Structure your memory from the start, even if it is just section headers in a markdown file.

**Ignoring privacy**: Storing API keys or personal information in plain-text memory. Rules about what must never be persisted should be enforced in code, not left to convention.
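
One way to enforce this in code is to route every write through a guard that rejects anything resembling a secret. The patterns below are illustrative and far from exhaustive, and `persist` and `SensitiveDataError` are hypothetical names; treat this as a starting point rather than a complete filter.

```python
import re

# Illustrative patterns only; a real deny-list needs more coverage.
SECRET_PATTERNS = [
    # key=value or key: value pairs with a sensitive-looking key
    re.compile(r"\b(api[_-]?key|token|password|secret)\b\s*[:=]\s*\S+", re.IGNORECASE),
    # credentials embedded in a URL, e.g. scheme://user:pass@host
    re.compile(r"://[^/\s:@]+:[^/\s@]+@"),
]


class SensitiveDataError(ValueError):
    """Raised when a fact looks like it contains a secret."""


def persist(store: list[str], fact: str) -> None:
    """Append `fact` to the store unless it matches a secret pattern."""
    for pattern in SECRET_PATTERNS:
        if pattern.search(fact):
            raise SensitiveDataError("refusing to persist a fact that looks like it contains a secret")
    store.append(fact)


store: list[str] = []
persist(store, "Always use absolute imports")
persist(store, "Database endpoint: postgresql://host:5432/dbname")
```

Because the check sits in the only write path, an agent cannot bypass it by accident. A connection string with embedded credentials, such as `postgresql+asyncpg://user:pass@host:5432/dbname`, raises `SensitiveDataError` instead of being stored.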

