---
title: "OpenTelemetry for Kubernetes"
description: "Deploying the OpenTelemetry Collector on Kubernetes with auto-instrumentation, context propagation, sampling strategies, and exporter configuration."
url: https://agent-zone.ai/knowledge/observability/opentelemetry-basics/
section: knowledge
date: 2026-02-22
categories: ["observability"]
tags: ["opentelemetry","otel","tracing","metrics","kubernetes","collector"]
skills: ["otel-collector-config","auto-instrumentation","trace-pipeline-design"]
tools: ["opentelemetry","otel-collector","jaeger","tempo","prometheus","helm","kubectl"]
levels: ["intermediate"]
word_count: 846
formats:
  json: https://agent-zone.ai/knowledge/observability/opentelemetry-basics/index.json
  html: https://agent-zone.ai/knowledge/observability/opentelemetry-basics/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=OpenTelemetry+for+Kubernetes
---


## What OpenTelemetry Is

OpenTelemetry (OTel) is a vendor-neutral framework for generating, collecting, and exporting telemetry data: traces, metrics, and logs. It provides APIs, SDKs, and the Collector -- a standalone binary that receives, processes, and exports telemetry. OTel replaces the fragmented landscape of Jaeger client libraries, Zipkin instrumentation, Prometheus client libraries, and proprietary agents with a single standard.

The three signal types:

- **Traces**: Record the path of a request through distributed services as a tree of spans. Each span has a name, duration, attributes, and parent reference.
- **Metrics**: Numeric measurements (counters, gauges, histograms) emitted by applications and infrastructure. OTel metrics can be exported to Prometheus.
- **Logs**: Structured log records correlated with trace context. OTel log support bridges existing logging libraries with trace correlation.
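A span, in other words, is just a structured record. A toy sketch of the fields involved (illustrative only, not the OTel SDK's actual classes):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    """Toy model of a trace span -- illustrative, not the OTel SDK."""
    name: str
    trace_id: str                   # shared by every span in one trace
    span_id: str                    # unique per span
    parent_span_id: Optional[str]   # None marks the root span
    start_ns: int
    end_ns: int
    attributes: dict = field(default_factory=dict)

    @property
    def duration_ms(self) -> float:
        return (self.end_ns - self.start_ns) / 1_000_000

root = Span("GET /checkout", "abc123", "span-1", None, 0, 42_000_000,
            {"http.method": "GET"})
child = Span("SELECT orders", "abc123", "span-2", "span-1",
             5_000_000, 30_000_000)
```

The shared `trace_id` and the `parent_span_id` references are what let a backend reassemble the tree.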

## The OTel Collector Pipeline

The Collector is the central hub. It has three pipeline stages:

**Receivers** ingest data. They listen on network ports or pull from sources:
- `otlp`: Receives OTLP over gRPC (4317) and HTTP (4318). The primary receiver.
- `prometheus`: Scrapes Prometheus metrics endpoints.
- `jaeger`: Accepts Jaeger Thrift or gRPC spans.
- `filelog`: Tails log files (useful for node-level log collection).

**Processors** transform data in flight:
- `batch`: Batches telemetry before export to reduce network overhead.
- `memory_limiter`: Prevents OOM by dropping data when memory is high.
- `attributes`: Adds, removes, or modifies span/metric attributes.
- `filter`: Drops telemetry matching specified conditions.
- `tail_sampling`: Makes sampling decisions based on complete traces.
- `k8sattributes`: Enriches telemetry with Kubernetes metadata (pod name, namespace, node).

**Exporters** send data to backends:
- `otlp`: Forwards to another OTLP-compatible endpoint (Tempo, Jaeger, vendor backends).
- `prometheus`: Exposes a Prometheus scrape endpoint for collected metrics.
- `loki`: Ships logs to Grafana Loki.
- `debug`: Prints telemetry to stdout for development.

## Collector Configuration

A real collector config for Kubernetes:

```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  k8sattributes:
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.deployment.name
        - k8s.node.name
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip

exporters:
  otlp/tempo:
    endpoint: tempo.observability:4317
    tls:
      insecure: true
  prometheusremotewrite:
    # Prometheus must run with --web.enable-remote-write-receiver
    endpoint: http://prometheus.observability:9090/api/v1/write
  loki:
    endpoint: http://loki.logging:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [loki]
```
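A common failure mode is referencing a component in `service.pipelines` that was never defined in its top-level section. That invariant is easy to check by hand; here is a small sketch in Python with the config embedded as a dict (in practice you would parse the YAML file):

```python
# Sketch: verify every component a pipeline references is actually defined.
config = {
    "receivers": {"otlp": {}},
    "processors": {"batch": {}, "memory_limiter": {}, "k8sattributes": {}},
    "exporters": {"otlp/tempo": {}},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["memory_limiter", "k8sattributes", "batch"],
                "exporters": ["otlp/tempo"],
            }
        }
    },
}

def undefined_components(cfg: dict) -> list[str]:
    missing = []
    for name, pipeline in cfg["service"]["pipelines"].items():
        for section in ("receivers", "processors", "exporters"):
            for comp in pipeline.get(section, []):
                if comp not in cfg.get(section, {}):
                    missing.append(f"{name}: {section}/{comp}")
    return missing

print(undefined_components(config))  # [] when everything is defined
```

The Collector performs the same validation itself at startup and refuses to run on a dangling reference, so this is mainly useful in CI before a rollout.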

## Deployment Modes on Kubernetes

**DaemonSet**: One Collector pod per node. Best for collecting node-level telemetry (logs from files, host metrics) and as a local aggregation point. Applications send telemetry to the Collector on their node via `NODE_IP:4317`, with the node IP injected through the Downward API (`status.hostIP`).

**Sidecar**: A Collector container in each application pod. Useful when apps need a dedicated processing pipeline or when the Collector must share a network namespace with the app. Higher resource overhead.

**Deployment** (Gateway): A centralized Collector pool behind a Service. Applications send telemetry to `otel-collector.observability:4317`. The gateway handles tail sampling, enrichment, and routing. Scale replicas based on throughput. This is the most common production pattern.

Deploy with the OTel Operator and Helm:

```bash
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm install otel-collector open-telemetry/opentelemetry-collector \
  --namespace observability --create-namespace \
  --set mode=deployment \
  --set replicaCount=2 \
  --values otel-collector-values.yaml
```

## Auto-Instrumentation

The OTel Operator supports automatic instrumentation for Java, Python, Node.js, and .NET without code changes; Go auto-instrumentation exists as well, but it relies on an eBPF-based agent that needs elevated privileges. Install the operator, then create an `Instrumentation` resource:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: auto-instrumentation
  namespace: default
spec:
  exporter:
    endpoint: http://otel-collector.observability:4317
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "0.1"
  java:
    # pin specific versions in production instead of :latest
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latest
```

Annotate pods to activate injection:

```yaml
metadata:
  annotations:
    instrumentation.opentelemetry.io/inject-java: "true"
    # or inject-python, inject-nodejs, inject-go
```

The operator injects an init container that copies the language agent into a shared volume, plus environment variables that configure the SDK. No application code changes needed.
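The injected environment variables follow the standard OTel SDK conventions. A sketch of what a pod ends up with (values illustrative; the exact set varies by operator version):

```yaml
env:
  - name: OTEL_SERVICE_NAME
    value: payment-api
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://otel-collector.observability:4317
  - name: OTEL_TRACES_SAMPLER
    value: parentbased_traceidratio
  - name: OTEL_TRACES_SAMPLER_ARG
    value: "0.1"
  - name: OTEL_PROPAGATORS
    value: tracecontext,baggage
```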

## Context Propagation

Traces span multiple services because context is propagated in HTTP headers. The two main formats:

- **W3C Trace Context**: `traceparent: 00-<trace-id>-<span-id>-<flags>`. The standard. Use this unless you have legacy Zipkin/Jaeger services.
- **B3 (Zipkin)**: `X-B3-TraceId`, `X-B3-SpanId`, `X-B3-Sampled`. Used by older Zipkin-instrumented services.

Configure propagators in the SDK or Instrumentation resource. If your mesh includes both old and new services, set `propagators: [tracecontext, b3multi]` to inject and extract both formats.

## Sampling Strategies

Sampling controls how many traces are recorded, reducing storage and cost.

**Head-based sampling**: Decided at trace creation. The `parentbased_traceidratio` sampler keeps a percentage of traces (e.g., 10% with argument `"0.1"`). Simple but blind -- it drops traces before knowing if they contain errors.

**Tail-based sampling**: Decided after the full trace is assembled. The Collector's `tail_sampling` processor can keep all error traces, slow traces, or traces matching specific attributes. It requires a gateway Collector that sees every span of a trace; with multiple gateway replicas, route spans by trace ID (e.g. with the load-balancing exporter) so each trace lands on a single instance:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-requests
        type: latency
        latency: {threshold_ms: 2000}
      - name: percentage
        type: probabilistic
        probabilistic: {sampling_percentage: 5}
```

This keeps all error traces, all traces over 2 seconds, and 5% of everything else. Tail sampling is more powerful but requires careful memory management since traces must be held in memory until the decision is made.
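A rough way to size that memory: spans held in flight equal span throughput times `decision_wait`, multiplied by the average in-memory span size. A back-of-envelope sketch (all numbers hypothetical):

```python
# Back-of-envelope memory estimate for tail sampling. Inputs are hypothetical.
spans_per_second = 20_000    # aggregate span throughput at the gateway
decision_wait_s = 10         # matches the processor's decision_wait
avg_span_bytes = 1_500       # rough in-memory size per span

spans_in_flight = spans_per_second * decision_wait_s
memory_mib = spans_in_flight * avg_span_bytes / (1024 * 1024)
print(f"~{memory_mib:.0f} MiB held for pending decisions")
```

Pair the estimate with the `memory_limiter` processor so a traffic spike degrades to dropped data rather than an OOM-killed Collector.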

## Resource Attributes

Resource attributes describe the entity producing telemetry. Set them via environment variables:

```yaml
env:
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "service.name=payment-api,service.version=1.4.2,deployment.environment=production"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector.observability:4317"
```

The `k8sattributes` processor in the Collector adds Kubernetes-specific attributes automatically, so applications only need to set `service.name` and `service.version`.
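The `OTEL_RESOURCE_ATTRIBUTES` value is a comma-separated list of `key=value` pairs. A sketch of how an SDK parses it:

```python
def parse_resource_attributes(raw: str) -> dict[str, str]:
    """Parse the comma-separated key=value format of OTEL_RESOURCE_ATTRIBUTES."""
    attrs = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        if sep:                        # skip malformed entries with no '='
            attrs[key.strip()] = value.strip()
    return attrs

raw = "service.name=payment-api,service.version=1.4.2,deployment.environment=production"
print(parse_resource_attributes(raw)["service.name"])  # payment-api
```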

