---
title: "Status Page Setup and Management"
description: "Setting up and managing status pages with component organization, incident templates, maintenance windows, subscriber notifications, uptime calculation, and monitoring integration."
url: https://agent-zone.ai/knowledge/sre/status-page-management/
section: knowledge
date: 2026-02-22
categories: ["sre"]
tags: ["status-page","statuspage-io","cachet","instatus","uptime","incident-communication","maintenance-windows","subscriber-notifications"]
skills: ["status-page-setup","component-organization","incident-template-design","maintenance-window-scheduling","uptime-reporting"]
tools: ["statuspage-io","cachet","instatus","prometheus","grafana","pagerduty","slack"]
levels: ["beginner","intermediate"]
word_count: 889
formats:
  json: https://agent-zone.ai/knowledge/sre/status-page-management/index.json
  html: https://agent-zone.ai/knowledge/sre/status-page-management/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Status+Page+Setup+and+Management
---


## Purpose of a Status Page

A status page is the single source of truth for service health. It communicates current status, provides historical reliability data, and sets expectations during incidents through regular updates. A well-maintained status page reduces support tickets during incidents, builds customer trust, and gives teams a structured communication channel.

## Platform Options

### Statuspage.io (Atlassian)

The most widely adopted hosted solution. Integrates with the Atlassian ecosystem.

```bash
# Create a component (the JSON body needs an explicit Content-Type header)
curl -X POST "https://api.statuspage.io/v1/pages/${PAGE_ID}/components" \
  -H "Authorization: OAuth ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"component": {"name": "API", "status": "operational", "showcase": true}}'

# Create an incident (replace "id" with the ID of an affected component)
curl -X POST "https://api.statuspage.io/v1/pages/${PAGE_ID}/incidents" \
  -H "Authorization: OAuth ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"incident": {"name": "Elevated Error Rates", "status": "investigating",
       "impact_override": "minor", "component_ids": ["id"]}}'
```

**Strengths:** Highly reliable, subscriber notifications built-in, custom domains, API-first.
**Weaknesses:** Expensive ($399+/month business plan), limited customization, component limits on lower tiers.

### Cachet

Open-source, self-hosted status page in PHP/Laravel.

```bash
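# Assumes a PostgreSQL host named "postgres" is reachable from the container.
# Replace the APP_KEY placeholder with a generated key, and pin an image tag
# instead of "latest" for reproducible deploys.
docker run -d --name cachet -p 8000:8000 \
  -e DB_DRIVER=pgsql -e DB_HOST=postgres \
  -e DB_DATABASE=cachet -e APP_KEY=base64:key \
  cachethq/docker:latest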
```

**Strengths:** Free, self-hosted, full data ownership, fully customizable.
**Weaknesses:** Requires hosting infrastructure, community support only, you own the uptime.

### Instatus

Modern hosted status page with competitive pricing.

**Strengths:** Clean UI, lower pricing than Statuspage.io, good API, custom domains.
**Weaknesses:** Smaller integration ecosystem, fewer enterprise features.

### Custom Solutions

Build in-house when you need deep integration with internal systems. Minimum requirements: static site on independent infrastructure, API for monitoring integration, incident history store, and subscriber notification system.

**Critical rule:** Host on completely separate infrastructure from production. If production is down, the status page must still be reachable.

## Component Organization

Components represent services or features users interact with.

```
Production Services        Data Processing
  ├── Website                ├── Real-time Pipeline
  ├── API                    ├── Batch Processing
  ├── Authentication         └── Data Exports
  ├── Dashboard
  └── Mobile App           Infrastructure
                             ├── CDN
Integrations                 ├── DNS
  ├── Webhook Delivery       └── Object Storage
  └── Email Notifications
```

**Design principles:** Group by user experience, not internal architecture. Users do not know your API is 12 microservices -- they care if it works. Include only what users interact with. Keep the count to 10-20 components.
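
As a rough illustration of the 10-20 guideline, the example layout above can be held as a simple mapping and checked before it is published. The names `COMPONENT_GROUPS`, `component_count`, and `check_component_count` are hypothetical and used only for this sketch.

```python
# Component groups mirroring the example layout above; names are user-facing.
COMPONENT_GROUPS = {
    "Production Services": ["Website", "API", "Authentication", "Dashboard", "Mobile App"],
    "Data Processing": ["Real-time Pipeline", "Batch Processing", "Data Exports"],
    "Integrations": ["Webhook Delivery", "Email Notifications"],
    "Infrastructure": ["CDN", "DNS", "Object Storage"],
}


def component_count(groups):
    """Total number of components across all groups."""
    return sum(len(names) for names in groups.values())


def check_component_count(groups, low=10, high=20):
    """Return True when the total sits inside the recommended range."""
    return low <= component_count(groups) <= high


print(component_count(COMPONENT_GROUPS))        # 13
print(check_component_count(COMPONENT_GROUPS))  # True
```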

### Component Statuses

| Status | When to Use |
|---|---|
| **Operational** | All metrics within SLO |
| **Degraded Performance** | Latency elevated, some requests slow |
| **Partial Outage** | Major feature down for some users |
| **Major Outage** | Service completely unavailable |

Tie statuses to monitoring thresholds:

```yaml
component_status_rules:
  api:
    operational:     { error_rate: "< 0.1%", p99_latency: "< 500ms" }
    degraded:        { error_rate: "0.1% - 1%", p99_latency: "500ms - 2s" }
    partial_outage:  { error_rate: "1% - 10%" }
    major_outage:    { error_rate: "> 10%" }
```
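
The rules above can be turned into a small classifier. This is a minimal sketch, not a definitive implementation: the function name `classify_api_status` is hypothetical, and two behaviors are assumptions because the rules leave them open. Where two ranges share a boundary value (for example exactly 1%), the more severe status wins, and latency alone never escalates past degraded since no latency threshold is defined for outages.

```python
def classify_api_status(error_rate_pct: float, p99_latency_ms: float) -> str:
    """Map API metrics to a component status using the thresholds above.

    Assumptions: a shared boundary value belongs to the more severe status,
    and latency on its own can only produce "degraded".
    """
    if error_rate_pct > 10:
        return "major_outage"
    if error_rate_pct >= 1:
        return "partial_outage"
    if error_rate_pct >= 0.1 or p99_latency_ms >= 500:
        return "degraded"
    return "operational"


print(classify_api_status(0.05, 200))  # operational
print(classify_api_status(0.05, 800))  # degraded
print(classify_api_status(5, 300))     # partial_outage
print(classify_api_status(25, 300))    # major_outage
```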

## Incident Templates

Pre-written templates ensure consistent communication during stressful incidents.

**Investigating:**
```
We are aware of [symptoms] affecting [component]. Our team is
actively investigating. We will provide updates every [cadence].
```

**Identified:**
```
We have identified the cause as [brief explanation]. We are
implementing [mitigation]. Expected resolution: [timeframe].
```

**Monitoring:**
```
A fix has been implemented. We are monitoring to ensure stability.
Error rates have returned to normal. We will mark this resolved
after [monitoring period] of stable operation.
```

**Resolved:**
```
This incident has been resolved as of [timestamp]. [Brief summary
of cause and fix]. Duration: [total]. We apologize for the
disruption and will conduct a post-incident review.
```
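
One way to keep templates from being edited by hand mid-incident is to store them with named placeholders and fail loudly when a value is missing. The sketch below covers two of the four stages. The names `INCIDENT_TEMPLATES` and `render_update` and the field names are illustrative assumptions.

```python
INCIDENT_TEMPLATES = {
    "investigating": (
        "We are aware of {symptoms} affecting {component}. Our team is "
        "actively investigating. We will provide updates every {cadence}."
    ),
    "identified": (
        "We have identified the cause as {cause}. We are "
        "implementing {mitigation}. Expected resolution: {timeframe}."
    ),
}


def render_update(stage: str, **fields: str) -> str:
    """Fill a template; raise ValueError if a placeholder has no value."""
    template = INCIDENT_TEMPLATES[stage]
    try:
        return template.format(**fields)
    except KeyError as missing:
        raise ValueError(f"missing field for '{stage}' update: {missing}") from None


print(render_update(
    "investigating",
    symptoms="elevated error rates",
    component="the API",
    cadence="30 minutes",
))
```

Raising on a missing field means a half-filled update such as "affecting [component]" cannot reach subscribers.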

## Maintenance Windows

Scheduled maintenance communicates planned work that may affect users.

```bash
# Statuspage.io - schedule maintenance
curl -X POST "https://api.statuspage.io/v1/pages/${PAGE_ID}/incidents" \
  -H "Authorization: OAuth ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"incident": {
    "name": "Scheduled Database Maintenance",
    "status": "scheduled",
    "scheduled_for": "2026-02-25T02:00:00Z",
    "scheduled_until": "2026-02-25T04:00:00Z",
    "body": "Brief interruptions possible during database cluster maintenance.",
    "scheduled_auto_in_progress": true,
    "scheduled_auto_completed": true
  }}'
```

**Best practices:** Announce at least 72 hours in advance. Include expected impact in plain language. Specify timezone (use UTC plus local conversion). Auto-transition status if supported. Send a reminder 24 hours before. Update during the window if timing or impact changes.
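
The timing rules above (announce 72 hours ahead, remind 24 hours ahead, show UTC plus a local conversion) can be derived from the window start. This is a minimal sketch: `maintenance_timeline` is a hypothetical helper, and the fixed UTC-5 offset stands in for a real local timezone.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def maintenance_timeline(start_utc: datetime, local_tz: tzinfo) -> dict:
    """Derive the announcement deadline, reminder time, and local start time."""
    if start_utc.tzinfo is None:
        raise ValueError("start_utc must be timezone-aware")
    start_utc = start_utc.astimezone(timezone.utc)
    return {
        "announce_by": start_utc - timedelta(hours=72),
        "remind_at": start_utc - timedelta(hours=24),
        "start_local": start_utc.astimezone(local_tz),
    }


# Fixed offset used for illustration only (UTC-5).
timeline = maintenance_timeline(
    datetime(2026, 2, 25, 2, 0, tzinfo=timezone.utc),
    timezone(timedelta(hours=-5)),
)
print(timeline["announce_by"].isoformat())  # 2026-02-22T02:00:00+00:00
print(timeline["remind_at"].isoformat())    # 2026-02-24T02:00:00+00:00
print(timeline["start_local"].isoformat())  # 2026-02-24T21:00:00-05:00
```

In practice, a named zone such as `zoneinfo.ZoneInfo("America/New_York")` would handle daylight saving transitions; the fixed offset only keeps the sketch free of a timezone database dependency.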

## Subscriber Notifications

| Channel | Best For | When |
|---|---|---|
| **Email** | Detailed updates, maintenance | All incidents |
| **SMS** | Critical outages | SEV-1 only |
| **Webhook** | Internal tool integration | All updates |
| **RSS** | Pull-based consumers | All updates |

**Rules:** Do not spam -- update every 30 minutes during major incidents, not on every status change. Include actionable information. Allow granular subscriptions by component. Test delivery quarterly.
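
The 30-minute cadence can be enforced in code rather than by convention. The sketch below is one possible approach, with a hypothetical `UpdateThrottle` class. The `force` flag is an assumption added here so that a final update (for example, the resolution notice) is never suppressed.

```python
from datetime import datetime, timedelta, timezone


class UpdateThrottle:
    """Allow at most one subscriber notification per interval for each incident."""

    def __init__(self, min_interval: timedelta = timedelta(minutes=30)):
        self.min_interval = min_interval
        self._last_sent = {}  # incident_id -> time of the last notification

    def should_send(self, incident_id: str, now: datetime, force: bool = False) -> bool:
        """Return True and record the send time if a notification may go out."""
        last = self._last_sent.get(incident_id)
        if force or last is None or now - last >= self.min_interval:
            self._last_sent[incident_id] = now
            return True
        return False


throttle = UpdateThrottle()
start = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
print(throttle.should_send("inc-1", start))                          # True
print(throttle.should_send("inc-1", start + timedelta(minutes=10)))  # False
print(throttle.should_send("inc-1", start + timedelta(minutes=30)))  # True
```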

## Uptime Calculation

```
Uptime % = ((Total minutes - Downtime minutes) / Total minutes) * 100
```

For a 30-day month (43,200 minutes): 99.9% allows about 43.2 minutes of downtime, 99.95% about 21.6 minutes, and 99.99% about 4.3 minutes.

**What counts as downtime:** A major outage counts as full downtime. A partial outage counts proportionally (30% of users affected for 10 minutes = 3 minutes of effective downtime). Degraded performance typically does not count unless it breaches an SLO threshold. Scheduled maintenance during an announced window is excluded.
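
The formula and the proportional rule can be checked with a few lines of arithmetic. This is a sketch under the definitions above, with hypothetical function names. Each event is a `(duration_minutes, fraction_of_users_affected)` pair, so a major outage uses a fraction of 1.0.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(slo_pct: float, days: int = 30) -> float:
    """Minutes of downtime allowed by an uptime target over the period."""
    return days * MINUTES_PER_DAY * (100 - slo_pct) / 100


def effective_downtime(events) -> float:
    """Sum downtime, weighting each event by the fraction of users affected."""
    return sum(duration * fraction for duration, fraction in events)


def uptime_pct(downtime_minutes: float, days: int = 30) -> float:
    """Uptime % = ((total minutes - downtime minutes) / total minutes) * 100."""
    total = days * MINUTES_PER_DAY
    return (total - downtime_minutes) / total * 100


print(round(downtime_budget_minutes(99.9), 1))   # 43.2
print(round(downtime_budget_minutes(99.95), 1))  # 21.6

# A 10-minute partial outage hitting 30% of users plus a 5-minute major outage.
downtime = effective_downtime([(10, 0.3), (5, 1.0)])
print(round(downtime, 1))              # 8.0
print(round(uptime_pct(downtime), 3))  # 99.981
```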

## Integration with Monitoring

Automate the connection between monitoring and status page updates.

```
Prometheus -> Alertmanager -> Webhook Receiver -> Status Page API
```

```python
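# Webhook receiver that updates the status page from Alertmanager alerts
from flask import Flask, request

app = Flask(__name__)

# Placeholder mappings: replace with your own actions and component IDs
STATUS_MAP = {"set_degraded": "degraded_performance"}
COMPONENT_MAP = {"api": "your-component-id"}

def update_component(component_id, status):
    """Call the status page API here; left as a stub in this example."""

@app.route("/webhook/alertmanager", methods=["POST"])
def handle_alert():
    payload = request.get_json(silent=True) or {}
    for alert in payload.get("alerts", []):
        annotations = alert.get("annotations", {})
        component_id = COMPONENT_MAP.get(annotations.get("component"))
        action = annotations.get("status_page_action")
        if not component_id or not action:
            continue  # ignore alerts that do not map to a known component
        status = ("operational" if alert.get("status") == "resolved"
                  else STATUS_MAP.get(action))
        if status:
            update_component(component_id, status)
    return "", 200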
```
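
The receiver calls `update_component`, which has to talk to the status page API. The sketch below shows one possible shape for Statuspage.io using only the standard library. It assumes the component update endpoint is `PATCH /v1/pages/{page_id}/components/{component_id}`, following the URL pattern of the create call shown earlier, and that credentials live in `PAGE_ID` and `API_KEY` environment variables. Verify both against the API documentation before relying on it. `build_component_update` is a hypothetical helper that keeps request construction testable without network access.

```python
import json
import os
import urllib.request

API_BASE = "https://api.statuspage.io/v1"


def build_component_update(page_id: str, api_key: str,
                           component_id: str, status: str) -> urllib.request.Request:
    """Build (but do not send) a request that sets a component's status."""
    body = json.dumps({"component": {"status": status}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/pages/{page_id}/components/{component_id}",
        data=body,
        method="PATCH",  # assumed update verb; check the API reference
        headers={
            "Authorization": f"OAuth {api_key}",
            "Content-Type": "application/json",
        },
    )


def update_component(component_id: str, status: str) -> int:
    """Send the update and return the HTTP status code."""
    request = build_component_update(
        os.environ["PAGE_ID"], os.environ["API_KEY"], component_id, status
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Building the request has no side effects, so it is safe to inspect.
example = build_component_update("page123", "secret", "comp456", "degraded_performance")
print(example.get_method(), example.full_url)
```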

Define Prometheus alert annotations that specify the status page action:

```yaml
- alert: ComponentDegraded
  expr: component:availability:ratio_5m < 0.999
  for: 5m
  annotations:
    status_page_action: "set_degraded"
    component: "api"
```

## Agent Operational Notes

- **Never delay updates to gather more information.** Post "investigating" immediately and refine later.
- **Use templates.** Do not write incident updates from scratch during an incident.
- **Match component status to monitoring data.** Do not leave a component "operational" when metrics show degradation.
- **Verify independence.** Regularly confirm the status page loads from outside your infrastructure.
- **Close incidents promptly.** An incident left in "monitoring" for days erodes trust.

