Debugging and Tuning Alerts: Why Alerts Don't Fire, False Positives, and Threshold Selection

When an Alert Should Fire but Does Not#

Silent alerts are the most dangerous failure mode in monitoring. The system appears healthy because no one is being paged, but the condition you intended to catch is actively occurring. Work through this checklist in order.

Step 1: Verify the Expression Returns Results#

Open the Prometheus UI at /graph and run the alert expression directly. If the expression returns empty, the alert cannot fire regardless of anything else.