On-Call Rotation Design

Sre

On-Call Is a System, Not a Schedule#

On-call done wrong burns out engineers and degrades reliability simultaneously. Exhausted responders make worse decisions, and teams that dread on-call avoid owning production systems. Done right, on-call is sustainable, well-compensated, and generates signal that drives real reliability improvements.

Rotation Schedule Types#

Weekly Rotation#

Each engineer is primary on-call for one full week, Monday to Monday. This is the simplest model and works for teams of 5 or more in a single timezone.

Secrets Rotation Patterns

Why Rotation Matters#

A credential that never changes is a credential waiting to be exploited. Leaked credentials appear in git history, log files, CI build outputs, developer laptops, and third-party SaaS tools. If a database password has been the same for two years, every person who has ever had access to it still has access – former employees, former contractors, compromised CI systems.

Regular rotation limits the blast radius. A credential that rotates every 24 hours is only useful for 24 hours after compromise. Compliance frameworks (PCI-DSS, SOC2, HIPAA) mandate rotation schedules. But compliance aside, rotation is a pragmatic defense: assume credentials will leak and make the leak time-limited.