Agent Memory and Retrieval: Patterns for Persistent, Searchable Agent Knowledge

Agent Memory and Retrieval

An agent without memory repeats mistakes, forgets context, and relearns the same facts every session. An agent with too much memory wastes context window tokens on irrelevant history and retrieves noise instead of signal. Effective memory sits between these extremes – storing what matters, retrieving what is relevant, and forgetting what is stale.

This reference covers the concrete patterns for building agent memory systems, from simple file-based approaches to production-grade retrieval pipelines.

Resource Requests and Limits: CPU, Memory, QoS, and OOMKilled Debugging

Resource Requests and Limits

Requests and limits control how Kubernetes schedules pods and enforces resource usage. Getting them wrong leads to pods that get evicted, get throttled to a crawl, or starve other workloads on the same node.

Requests vs Limits

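Requests are what the scheduler uses for placement. When you request 500m CPU and 256Mi memory, Kubernetes looks for a node with at least that much allocatable capacity not already claimed by other pods' requests. The request acts as a guarantee – the scheduler will not place additional pods on the node if their combined requests would exceed its allocatable capacity, so that amount stays set aside for your container.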
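The placement check described above can be sketched in a few lines of Python. This is a simplified model, not the scheduler's actual code: the function names (`parse_cpu`, `parse_memory`, `fits`) and the dictionary layout are assumptions made for illustration, only the `m`, `Ki`, `Mi`, and `Gi` suffixes are handled, and the real scheduler also weighs taints, affinity, and other constraints.

```python
# Simplified model of request-based placement (illustrative names only).

def parse_cpu(quantity: str) -> int:
    """Convert a CPU quantity such as "500m" or "2" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


MEMORY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as "256Mi" to bytes."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already plain bytes


def fits(allocatable: dict, existing_requests: list, new_request: dict) -> bool:
    """Return True if the new request fits in what other requests leave free."""
    free_cpu = parse_cpu(allocatable["cpu"]) - sum(
        parse_cpu(r["cpu"]) for r in existing_requests
    )
    free_memory = parse_memory(allocatable["memory"]) - sum(
        parse_memory(r["memory"]) for r in existing_requests
    )
    return (
        parse_cpu(new_request["cpu"]) <= free_cpu
        and parse_memory(new_request["memory"]) <= free_memory
    )


if __name__ == "__main__":
    node = {"cpu": "2", "memory": "4Gi"}  # hypothetical node
    running = [
        {"cpu": "1", "memory": "2Gi"},
        {"cpu": "600m", "memory": "1Gi"},
    ]
    new_pod = {"cpu": "500m", "memory": "256Mi"}
    print(fits(node, running, new_pod))      # only 400m CPU is free: False
    print(fits(node, running[:1], new_pod))  # 1000m CPU is free: True
```

Note that the check uses requests, not actual usage: a node can look full to the scheduler while its containers sit idle.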

The ROI of Agent Infrastructure: Measuring Time Saved, Errors Avoided, and Projects Completed

The ROI of Agent Infrastructure

Most people skip agent infrastructure setup because the first task feels urgent. The second task is also urgent. By the tenth task, they have spent more time re-explaining context, correcting assumptions, and watching the agent re-derive decisions than the infrastructure would have cost to set up.

This article quantifies the return on agent infrastructure investment – not in abstract terms, but in minutes per session, tokens per project, and errors per workflow.

Agent Context Management: Memory, State, and Session Handoff

Agent Context Management

Agents are stateless by default. Every new session starts with a blank slate – no knowledge of previous conversations, past mistakes, or learned preferences. This is the fundamental problem of agent context management: how do you give an agent continuity without overwhelming its context window?

Types of Agent Memory

Agent memory falls into four categories:

  • Short-term memory: The current conversation. Lives in the context window, disappears when the session ends.
  • Long-term memory: Facts persisted across sessions. “The production cluster runs Kubernetes 1.29.” Must be explicitly stored and retrieved.
  • Episodic memory: Records of specific past events. “On Feb 15, we debugged a DNS failure caused by a misconfigured service name.” Useful for avoiding repeated mistakes.
  • Semantic memory: General knowledge distilled from episodes. “Bitnami charts name resources using the release name directly.”

Most systems implement only short-term and long-term memory. Episodic and semantic memory require more infrastructure, but they let an agent draw on past incidents and the lessons distilled from them, which tends to pay off as sessions accumulate.
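One way to make the four categories concrete is to tag each stored item with its kind and drop only the short-term items when a session ends. The sketch below is a minimal in-memory illustration: the names `MemoryKind`, `MemoryItem`, and `AgentMemory` are hypothetical, and a real system would persist the non-short-term items to a file or database rather than keep them in a list.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class MemoryKind(Enum):
    SHORT_TERM = "short_term"  # current conversation only
    LONG_TERM = "long_term"    # facts persisted across sessions
    EPISODIC = "episodic"      # records of specific past events
    SEMANTIC = "semantic"      # general knowledge distilled from episodes


@dataclass
class MemoryItem:
    kind: MemoryKind
    text: str


@dataclass
class AgentMemory:
    items: List[MemoryItem] = field(default_factory=list)

    def remember(self, kind: MemoryKind, text: str) -> None:
        """Store one item under the given category."""
        self.items.append(MemoryItem(kind, text))

    def recall(self, kind: MemoryKind) -> List[str]:
        """Return the text of every stored item of the given category."""
        return [item.text for item in self.items if item.kind is kind]

    def end_session(self) -> None:
        """Discard short-term items; all other categories survive."""
        self.items = [
            item for item in self.items if item.kind is not MemoryKind.SHORT_TERM
        ]


if __name__ == "__main__":
    memory = AgentMemory()
    memory.remember(MemoryKind.SHORT_TERM, "User asked about a failing deploy.")
    memory.remember(MemoryKind.LONG_TERM, "The production cluster runs Kubernetes 1.29.")
    memory.remember(
        MemoryKind.EPISODIC,
        "On Feb 15, we debugged a DNS failure caused by a misconfigured service name.",
    )
    memory.remember(
        MemoryKind.SEMANTIC,
        "Bitnami charts name resources using the release name directly.",
    )
    memory.end_session()
    print(memory.recall(MemoryKind.SHORT_TERM))  # empty after the session ends
    print(len(memory.items))                     # the three persistent items remain
```

Keeping the category on each item makes it possible to apply different retention and retrieval rules per kind later, without changing how items are written.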