PagerDuty

Status Page Setup and Management

February 22, 2026

SRE

Status-Page-Setup, Component-Organization, Incident-Template-Design, Maintenance-Window-Scheduling, Uptime-Reporting

Status-Page, Statuspage-Io, Cachet, Instatus, Uptime, Incident-Communication, Maintenance-Windows, Subscriber-Notifications

Statuspage-Io, Cachet, Instatus, Prometheus, Grafana, Pagerduty, Slack

Purpose of a Status Page

A status page is the single source of truth for service health. It communicates current status, provides historical reliability data, and sets expectations during incidents through regular updates. A well-maintained status page reduces support tickets during incidents, builds customer trust, and gives teams a structured communication channel.

Platform Options

Statuspage.io (Atlassian)

The most widely adopted hosted solution. Integrates with the Atlassian ecosystem.

# Create a component (PAGE_ID and API_KEY must be set in the environment)
curl -X POST "https://api.statuspage.io/v1/pages/${PAGE_ID}/components" \
  -H "Authorization: OAuth ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"component": {"name": "API", "status": "operational", "showcase": true}}'

# Create an incident ("id" is a placeholder for a real component ID)
curl -X POST "https://api.statuspage.io/v1/pages/${PAGE_ID}/incidents" \
  -H "Authorization: OAuth ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"incident": {"name": "Elevated Error Rates", "status": "investigating",
       "impact_override": "minor", "component_ids": ["id"]}}'

Strengths: highly reliable, built-in subscriber notifications, custom domains, API-first.

Weaknesses: expensive ($399+/month for the business plan), limited customization, component limits on lower tiers.
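Because a status page sets expectations through regular updates, an open incident usually needs follow-up posts as well as the initial one. The following is a minimal sketch of such an update, assuming the incident endpoint accepts a PATCH with `status` and `body` fields. `INCIDENT_ID` and the two function names are hypothetical, and the payload builder does no JSON escaping, so it assumes the message contains no double quotes or backslashes.

```shell
#!/bin/sh
# Sketch only: PAGE_ID, API_KEY, and INCIDENT_ID are assumed to be set by the caller.

# Print the JSON body for an incident update.
# Assumption: the message has no double quotes or backslashes (no escaping is done).
incident_update_payload() {
  status="$1"
  message="$2"
  # Reject anything outside the four incident states used in this sketch.
  case "$status" in
    investigating|identified|monitoring|resolved) ;;
    *) echo "unknown status: $status" >&2; return 1 ;;
  esac
  printf '{"incident": {"status": "%s", "body": "%s"}}\n' "$status" "$message"
}

# Send the update to the existing incident; only the supplied fields change.
update_incident() {
  payload=$(incident_update_payload "$1" "$2") || return 1
  curl -sS -X PATCH "https://api.statuspage.io/v1/pages/${PAGE_ID}/incidents/${INCIDENT_ID}" \
    -H "Authorization: OAuth ${API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$payload"
}
```

A call such as `update_incident identified "Root cause found, fix in progress"` would post the next update. Validating the status locally keeps a typo from reaching the API while an incident is in progress.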

Structuring Effective On-Call Runbooks: Format, Escalation, and Diagnostic Decision Trees

February 22, 2026

Observability

Intermediate

Runbook-Authoring, Escalation-Design, Incident-Triage, Diagnostic-Decision-Trees

Runbooks, On-Call, Incident-Response, Escalation, Alerting, Operations, Sre, Pagerduty, Opsgenie

Alertmanager, Pagerduty, Opsgenie, Grafana, Prometheus, Kubectl

Why Runbooks Exist

An on-call engineer paged at 3 AM has limited cognitive capacity. They may not be familiar with the specific service that is failing. They may have joined the team two weeks ago. A runbook bridges the gap between the alert firing and the correct human response. Without runbooks, incident response depends on tribal knowledge: the engineer who built the service and knows its failure modes. All too often, that engineer is on vacation when the incident hits.