Observability Stack Troubleshooting: Diagnosing Prometheus, Alertmanager, Grafana, and Pipeline Failures

“I’m Not Seeing Metrics” – Systematic Diagnosis#

This is the most common observability complaint. Work through these steps in order to isolate where the pipeline breaks.

Step 1: Is the Target Being Scraped?#

Open the Prometheus UI at /targets. Search for the job name or target address. Look at three things: state (UP or DOWN), last scrape timestamp, and error message.

Status: UP    Last Scrape: 3s ago    Duration: 12ms    Error: (none)
Status: DOWN  Last Scrape: 15s ago   Duration: 0ms     Error: connection refused

If the target does not appear at all, Prometheus does not know about it. This means the scrape configuration (or ServiceMonitor) is not matching the target. Jump to the ServiceMonitor checklist at the end of this guide.