Agent Error Handling
Agents call tools that call APIs that talk to services that query databases. Every link in that chain can fail. The difference between a useful agent and a frustrating one is what happens when something breaks.
Classify the Failure First
Before deciding how to handle an error, classify it. The strategy depends entirely on whether the failure is transient or permanent.
Transient failures will likely succeed on retry: network timeouts, rate limits (HTTP 429), server overload (HTTP 503), connection resets, temporary DNS failures. These are the majority of failures in practice.
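One way to make this classification explicit is a single function that every tool wrapper calls before deciding whether to retry. The sketch below is illustrative only. `FailureKind`, `ToolHTTPError`, `TRANSIENT_HTTP_STATUSES` and `classify` are hypothetical names, and it assumes your tools surface HTTP failures as an exception that carries a status code. It covers only the transient cases listed above (timeouts, connection resets, temporary DNS failures, HTTP 429 and 503). It also treats anything it does not recognize as permanent, which is a design choice of this sketch and not something the text prescribes.

```python
import socket
from enum import Enum


class FailureKind(Enum):
    TRANSIENT = "transient"  # a retry may succeed
    PERMANENT = "permanent"  # retrying will not help


# Hypothetical: the HTTP statuses named above as transient.
# 429 = rate limited, 503 = server overloaded.
TRANSIENT_HTTP_STATUSES = {429, 503}


class ToolHTTPError(Exception):
    """Hypothetical error a tool wrapper raises for a non-2xx HTTP response."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def classify(exc: BaseException) -> FailureKind:
    """Sort an exception into transient or permanent (sketch, not exhaustive)."""
    if isinstance(exc, ToolHTTPError):
        if exc.status in TRANSIENT_HTTP_STATUSES:
            return FailureKind.TRANSIENT
        return FailureKind.PERMANENT

    # DNS lookup errors: only EAI_AGAIN ("try again") is temporary.
    # Checked before the generic cases because gaierror subclasses OSError.
    if isinstance(exc, socket.gaierror):
        eai_again = getattr(socket, "EAI_AGAIN", None)
        if eai_again is not None and exc.errno == eai_again:
            return FailureKind.TRANSIENT
        return FailureKind.PERMANENT

    # Network timeouts and connection resets.
    if isinstance(exc, (TimeoutError, ConnectionResetError)):
        return FailureKind.TRANSIENT

    # Assumption: unknown errors are treated as permanent so the agent
    # does not retry something that will never succeed.
    return FailureKind.PERMANENT
```

A caller might wrap a tool invocation in `try`/`except Exception as exc` and retry only when `classify(exc) is FailureKind.TRANSIENT`, surfacing everything else immediately. Defaulting to permanent trades a few missed retries for never looping on a bad request; a system that sees many unlisted transient errors (for example HTTP 502 or 504) might extend the set rather than flip the default.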