---
title: "PromQL Essentials: Practical Query Patterns"
description: "PromQL instant vectors, range vectors, rate functions, aggregation operators, and real queries for the most common monitoring scenarios."
url: https://agent-zone.ai/knowledge/observability/promql-essentials/
section: knowledge
date: 2026-02-22
categories: ["observability"]
tags: ["promql","prometheus","metrics","monitoring","queries","recording-rules"]
skills: ["promql-query-writing","metric-analysis","recording-rule-design"]
tools: ["prometheus","grafana"]
levels: ["intermediate"]
word_count: 812
formats:
  json: https://agent-zone.ai/knowledge/observability/promql-essentials/index.json
  html: https://agent-zone.ai/knowledge/observability/promql-essentials/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=PromQL+Essentials%3A+Practical+Query+Patterns
---


## Instant Vectors vs Range Vectors

An instant vector returns one sample per time series at a single point in time. A range vector returns multiple samples per time series over a time window.

```promql
# Instant vector: current value of each series
http_requests_total{job="api"}

# Range vector: last 5 minutes of samples for each series
http_requests_total{job="api"}[5m]
```

You cannot graph a range vector directly. Functions like `rate()` and `increase()` consume a range vector and return an instant vector, which Grafana can then plot.
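Range vectors are also consumed directly by the `*_over_time` family of functions, which work on gauges as well as counters. A minimal sketch, assuming node_exporter metrics are available:

```promql
# Average available memory over the last 10 minutes (gauge + range vector)
avg_over_time(node_memory_MemAvailable_bytes[10m])

# Highest value seen in the last hour
max_over_time(node_memory_MemAvailable_bytes[1h])
```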

## rate() vs irate()

Both compute per-second rates from counter metrics, but they behave differently.

`rate()` calculates the average per-second increase over the entire range window. It smooths out spikes and is the right choice for alerting and dashboards where you want stable trends:

```promql
# Average requests per second over last 5 minutes
rate(http_requests_total[5m])
```

`irate()` uses only the last two data points in the range window. It reacts to spikes immediately but is noisy:

```promql
# Instantaneous rate based on last two samples
irate(http_requests_total[5m])
```

The range window in `irate()` is only a lookback to find two samples. The actual rate is computed over the gap between those two scrapes, regardless of the window size.

Rule of thumb: use `rate()` for alerts and recording rules, `irate()` only for high-resolution interactive dashboards.

## histogram_quantile() for Latency Percentiles

Histogram metrics store observations in buckets. To get percentiles, use `histogram_quantile()`:

```promql
# p99 latency across all instances
histogram_quantile(0.99,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
)

# p95 latency broken down by endpoint
histogram_quantile(0.95,
  sum by (le, handler) (rate(http_request_duration_seconds_bucket[5m]))
)

# p50 (median) latency
histogram_quantile(0.50,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
)
```

The `le` label (less-than-or-equal) is required in the `by` clause -- it holds each bucket's upper boundary, which `histogram_quantile()` needs to interpolate the percentile. Always apply `rate()` before `histogram_quantile()` to compute per-second bucket fill rates; without it, the quantile is computed over the counter's entire lifetime since process start rather than the recent window, producing misleading percentiles.
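To make the pitfall concrete, compare the broken and correct forms side by side (metric names follow the earlier examples):

```promql
# WRONG: raw cumulative bucket counts -- the quantile reflects the all-time
# latency distribution and barely reacts to recent changes
histogram_quantile(0.99, sum by (le) (http_request_duration_seconds_bucket))

# RIGHT: per-second bucket fill rates over the last 5 minutes
histogram_quantile(0.99,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
```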

Bucket boundaries are set in your application instrumentation. Default Go client buckets are `[.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10]`. Choose buckets that span your realistic latency range for accurate percentiles.

## Aggregation Operators

Aggregation operators reduce dimensions by collapsing label sets.

```promql
# Sum across all instances, keep only the job label
sum by (job) (rate(http_requests_total[5m]))

# Equivalent using 'without' -- drop the instance label, keep everything else
sum without (instance) (rate(http_requests_total[5m]))

# Average CPU usage per node
avg by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))

# Count the number of pods per namespace
count by (namespace) (kube_pod_info)

# Maximum memory usage per pod
max by (pod) (container_memory_working_set_bytes{container!=""})

# Minimum available disk space across all nodes
min by (instance) (node_filesystem_avail_bytes{mountpoint="/"})

# Top 5 pods by CPU usage
topk(5, sum by (pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m])))
```

## Offset Modifier and Subqueries

The `offset` modifier shifts a query back in time. Useful for comparing current values to historical baselines:

```promql
# Request rate now vs 1 hour ago
rate(http_requests_total[5m]) / rate(http_requests_total[5m] offset 1h)

# Request rate now vs same time yesterday
rate(http_requests_total[5m]) - rate(http_requests_total[5m] offset 1d)
```

Subqueries evaluate a query over a range at a specified resolution:

```promql
# Maximum 5-minute error rate over the last hour, evaluated every minute
max_over_time(
  rate(http_requests_total{status_code=~"5.."}[5m])[1h:1m]
)

# Average of p99 latency over the last 24 hours
avg_over_time(
  histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))[24h:5m]
)
```

## Label Matching

PromQL supports four label matchers: `=` (exact match), `!=` (not equal), `=~` (regex match), `!~` (negative regex match). Regex matchers use RE2 syntax and are fully anchored, so `=~"5.."` must match the entire label value:

```promql
# All 5xx status codes
http_requests_total{status_code=~"5.."}

# All non-GET methods
http_requests_total{method!="GET"}

# Multiple namespaces
kube_pod_info{namespace=~"production|staging"}

# Exclude system containers
container_memory_working_set_bytes{container!="", container!="POD"}
```

## 10 Common Monitoring Queries

```promql
# 1. Error rate as a percentage
sum(rate(http_requests_total{status_code=~"5.."}[5m]))
/ sum(rate(http_requests_total[5m])) * 100

# 2. Request throughput by endpoint
sum by (handler) (rate(http_requests_total[5m]))

# 3. p95 latency per service
histogram_quantile(0.95,
  sum by (le, job) (rate(http_request_duration_seconds_bucket[5m])))

# 4. CPU saturation (load average vs cores)
node_load1 / count without (cpu, mode) (node_cpu_seconds_total{mode="idle"})

# 5. Memory usage percentage per node
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

# 6. Disk IOPS
sum by (instance) (rate(node_disk_reads_completed_total[5m])
  + rate(node_disk_writes_completed_total[5m]))

# 7. Container restart count in last hour
increase(kube_pod_container_status_restarts_total[1h])

# 8. Network throughput per pod in bits per second
sum by (pod) (rate(container_network_receive_bytes_total[5m])) * 8

# 9. PVC usage percentage
kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes * 100

# 10. Deployment replica availability
kube_deployment_status_available_replicas / kube_deployment_spec_replicas
```

## Recording Rules

Expensive queries that run frequently (dashboard panels refreshing every 15s, alerts evaluating every minute) should be converted to recording rules. Prometheus pre-computes the result and stores it as a new time series.

```yaml
groups:
  - name: http_recording_rules
    interval: 30s
    rules:
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))

      - record: job:http_errors:ratio5m
        expr: |
          sum by (job) (rate(http_requests_total{status_code=~"5.."}[5m]))
          / sum by (job) (rate(http_requests_total[5m]))

      - record: job:http_latency:p99_5m
        expr: |
          histogram_quantile(0.99,
            sum by (le, job) (rate(http_request_duration_seconds_bucket[5m])))

  - name: node_recording_rules
    rules:
      - record: instance:node_cpu:utilization5m
        expr: |
          1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
```

Naming convention: `level:metric_name:operations`. The level indicates the aggregation level (`job`, `instance`, `namespace`). The operations suffix describes what was applied (`rate5m`, `ratio5m`, `p99_5m`).

Use recording rules in your alerts and dashboards by referencing the recorded metric name directly: `job:http_errors:ratio5m > 0.05` instead of the full expression. This reduces query load on Prometheus and makes alert rules more readable.
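As one possible sketch, an alerting rule built on the recorded series might look like the following (the threshold, `for` duration, and labels are illustrative, not prescribed):

```yaml
groups:
  - name: http_alerts
    rules:
      - alert: HighErrorRatio
        # References the pre-computed series instead of re-evaluating
        # the full ratio expression on every evaluation cycle
        expr: job:http_errors:ratio5m > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "Error ratio above 5% for {{ $labels.job }}"
```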

