Distributed Tracing in Practice

Trace, Span, and Context#

A trace represents a single request flowing through a distributed system. It is identified by a 128-bit trace ID. A span represents one unit of work within that trace – an HTTP handler, a database query, a message publish. Each span has a name, start time, duration, status, attributes (key-value pairs), and events (timestamped annotations). Spans form a tree: every span except the root has a parent span ID.
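
A simplified sketch of two spans from the same trace, written as YAML for readability rather than the literal OTLP wire format (IDs, names, and attribute values are made up):

# One trace, two spans: an HTTP handler and the database query it triggered
traceId: 4bf92f3577b34da6a3ce929d0e0e4736        # 128-bit trace ID shared by every span
spans:
  - spanId: 00f067aa0ba902b7
    parentSpanId: null                           # no parent – this is the root span
    name: GET /api/orders
    startTimeUnixNano: 1700000000000000000
    endTimeUnixNano: 1700000000120000000         # duration = end - start
    status: OK
    attributes:
      http.request.method: GET
      http.response.status_code: 200
  - spanId: 53995c3f42cd8ad8
    parentSpanId: 00f067aa0ba902b7               # child of the handler span
    name: SELECT orders
    status: OK
    attributes:
      db.system: postgresql
    events:
      - name: acquired_connection                # timestamped annotation
        timeUnixNano: 1700000000005000000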

Grafana Dashboards for Kubernetes Monitoring

Data Source Configuration#

Grafana connects to backend data stores through data sources. For a complete Kubernetes observability stack, you need three: Prometheus for metrics, Loki for logs, and Tempo for traces.

Provision data sources declaratively so they survive Grafana restarts and are version-controlled:

# grafana/provisioning/datasources/observability.yml
apiVersion: 1
datasources:
  - name: Prometheus
    uid: prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-operated:9090
    isDefault: true
    jsonData:
      timeInterval: "15s"
      exemplarTraceIdDestinations:
        - name: traceID
          datasourceUid: tempo

  - name: Loki
    type: loki
    access: proxy
    url: http://loki-gateway:3100
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: '"traceID":"(\w+)"'
          url: "$${__value.raw}"
          datasourceUid: tempo

  - name: Tempo
    uid: tempo
    type: tempo
    access: proxy
    url: http://tempo:3100
    jsonData:
      tracesToMetrics:
        datasourceUid: prometheus
        tags: [{key: "service.name", value: "job"}]
      serviceMap:
        datasourceUid: prometheus
      nodeGraph:
        enabled: true

The cross-linking configuration ties the three sources together: exemplarTraceIdDestinations turns exemplar trace IDs on Prometheus metrics into links to the corresponding Tempo trace, derivedFields extracts trace IDs from Loki log lines and turns them into Tempo links, and tracesToMetrics plus serviceMap let Tempo query Prometheus for related metrics and the service graph. These references resolve by data source uid, which is why the uid fields are set explicitly above.
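
For the Loki derived field to fire, the log line must contain the trace ID in the shape the matcherRegex expects – for example a JSON log line like this (field names and the ID are illustrative):

{"level":"error","msg":"payment failed","traceID":"4bf92f3577b34da6a3ce929d0e0e4736"}

The captured group becomes ${__value.raw} (the extra $ in the provisioning file only escapes it from environment-variable expansion), and Grafana renders it as a link to the tempo data source.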

OpenTelemetry for Kubernetes

What OpenTelemetry Is#

OpenTelemetry (OTel) is a vendor-neutral framework for generating, collecting, and exporting telemetry data: traces, metrics, and logs. It provides APIs, SDKs, and the Collector – a standalone binary that receives, processes, and exports telemetry. OTel replaces the fragmented landscape of Jaeger client libraries, Zipkin instrumentation, Prometheus client libraries, and proprietary agents with a single standard.

The three signal types:

  • Traces: Record the path of a request through distributed services as a tree of spans. Each span has a name, duration, attributes, and parent reference.
  • Metrics: Numeric measurements (counters, gauges, histograms) emitted by applications and infrastructure. OTel metrics can be exported to Prometheus.
  • Logs: Structured log records correlated with trace context. OTel log support bridges existing logging libraries with trace correlation.
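
In Kubernetes, the usual pattern is to point each application's OTel SDK at a Collector through the standard SDK environment variables rather than hard-coding exporters; the endpoint and service name below are placeholders:

# Pod spec excerpt (illustrative)
env:
  - name: OTEL_SERVICE_NAME
    value: "checkout"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector.monitoring:4318"   # placeholder Collector service
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "deployment.environment=prod"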

The OTel Collector Pipeline#

The Collector is the central hub between instrumented applications and storage backends. Every pipeline it runs has three stages:

  • Receivers: Accept telemetry from applications and agents, typically over OTLP gRPC or HTTP.
  • Processors: Transform data in flight – batching, adding Kubernetes metadata, filtering, tail sampling.
  • Exporters: Send the processed telemetry to backends such as Tempo, Prometheus, or Loki.
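
A minimal Collector configuration that wires the three stages together for traces might look like this (the Tempo endpoint and the choice of a single batch processor are assumptions for this stack; adjust to your deployment):

# otel-collector-config.yaml (illustrative)
receivers:
  otlp:                        # accept OTLP from instrumented applications
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}                    # batch spans before export to reduce request volume

exporters:
  otlp:
    endpoint: tempo:4317       # assumed Tempo OTLP gRPC endpoint in this stack
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]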

Setting Up Full Observability from Scratch: Metrics, Logs, Traces, and Alerting

Setting Up Full Observability from Scratch#

This operational sequence deploys a complete observability stack on Kubernetes: metrics (Prometheus + Grafana), logs (Loki + Promtail), traces (Tempo + OpenTelemetry), and alerting (Alertmanager). Each phase is self-contained with verification steps. Complete them in order – later phases depend on earlier infrastructure.

Prerequisite: a running Kubernetes cluster with Helm installed and a monitoring namespace created.

kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f -
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

Phase 1 – Metrics (Prometheus + Grafana)#

Metrics are the foundation: the Grafana instance deployed in this phase is also the front end for the Loki and Tempo data sources added in later phases, so it must be solid before continuing.
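
One common way to complete this phase is the kube-prometheus-stack chart from the repository added above, which bundles the Prometheus Operator, Prometheus, Alertmanager, Grafana, node-exporter, and kube-state-metrics (the release name below is illustrative; chart values are left at their defaults):

helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring

# Verify that the Prometheus, Alertmanager, and Grafana pods reach Running
kubectl -n monitoring get pods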