---
title: "Running Kafka on Kubernetes with Strimzi"
description: "How to deploy and operate Apache Kafka on Kubernetes using the Strimzi operator, covering broker configuration, storage, listeners, topic management, monitoring, and common failure modes."
url: https://agent-zone.ai/knowledge/kubernetes/kafka-on-kubernetes/
section: knowledge
date: 2026-02-22
categories: ["kubernetes"]
tags: ["kafka","strimzi","messaging","streaming","operator"]
skills: ["kafka-deployment","strimzi-operator","kafka-topic-management","kafka-monitoring"]
tools: ["kubectl","helm","kafka-cli"]
levels: ["intermediate"]
word_count: 755
formats:
  json: https://agent-zone.ai/knowledge/kubernetes/kafka-on-kubernetes/index.json
  html: https://agent-zone.ai/knowledge/kubernetes/kafka-on-kubernetes/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Running+Kafka+on+Kubernetes+with+Strimzi
---


# Running Kafka on Kubernetes with Strimzi

Running Kafka on Kubernetes without an operator is painful. You need StatefulSets, headless Services, init containers for broker ID assignment, and careful handling of storage and networking. Strimzi eliminates most of this by managing the entire Kafka lifecycle through Custom Resource Definitions.

## Installing Strimzi

```bash
# Option 1: Helm
helm repo add strimzi https://strimzi.io/charts
helm install strimzi strimzi/strimzi-kafka-operator \
  --namespace kafka \
  --create-namespace

# Option 2: Direct YAML install
kubectl create namespace kafka
kubectl apply -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka
```

Verify the operator is running:

```bash
kubectl get pods -n kafka -l name=strimzi-cluster-operator
```

The operator watches for Strimzi custom resources such as `Kafka` across the whole cluster, or only in specific namespaces, depending on how it was installed.
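If you install via Helm, the chart's values can scope the operator to specific namespaces. A sketch, assuming the chart's `watchAnyNamespace`/`watchNamespaces` values; `staging` and `production` are placeholder namespace names:

```yaml
# values.yaml for the strimzi/strimzi-kafka-operator chart
# (install with: helm install strimzi strimzi/strimzi-kafka-operator -f values.yaml)
watchAnyNamespace: false   # do not watch the entire cluster
watchNamespaces:           # watch these namespaces in addition to the operator's own
  - staging
  - production
```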

## Deploying a Kafka Cluster

The `Kafka` custom resource defines the entire cluster -- brokers, ZooKeeper (or KRaft), and Entity Operator:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
  namespace: kafka
spec:
  kafka:
    version: 3.7.0
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
      log.retention.hours: 168
      log.segment.bytes: 1073741824
      num.partitions: 6
    storage:
      type: persistent-claim
      size: 50Gi
      class: gp3
      deleteClaim: false
    resources:
      requests:
        memory: 2Gi
        cpu: 500m
      limits:
        memory: 4Gi
        cpu: "2"
    jvmOptions:
      -Xms: 1g
      -Xmx: 2g
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
      class: gp3
      deleteClaim: false
    resources:
      requests:
        memory: 512Mi
        cpu: 250m
  entityOperator:
    topicOperator: {}
    userOperator: {}
```

Apply it and wait:

```bash
kubectl apply -f kafka-cluster.yaml
kubectl wait kafka/my-cluster --for=condition=Ready --timeout=300s -n kafka
```

Strimzi creates the broker pods (`my-cluster-kafka-0/1/2`) and ZooKeeper pods (`my-cluster-zookeeper-0/1/2`), along with the Services, ConfigMaps, and Secrets needed for inter-broker communication. Recent operator versions manage these pods through StrimziPodSets rather than StatefulSets.
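A quick smoke test runs Kafka's own console clients from throwaway pods. A sketch; the image tag is an assumption, so match it to the Kafka version your operator supports:

```shell
# In-cluster bootstrap address for the my-cluster deployment
BOOTSTRAP=my-cluster-kafka-bootstrap:9092

# Start a one-off producer pod; type a few messages, then Ctrl+C
kubectl -n kafka run kafka-producer -ti --rm --restart=Never \
  --image=quay.io/strimzi/kafka:latest-kafka-3.7.0 -- \
  bin/kafka-console-producer.sh --bootstrap-server "$BOOTSTRAP" --topic smoke-test

# In a second terminal, read the messages back
kubectl -n kafka run kafka-consumer -ti --rm --restart=Never \
  --image=quay.io/strimzi/kafka:latest-kafka-3.7.0 -- \
  bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP" \
    --topic smoke-test --from-beginning
```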

With recent Strimzi versions you can use KRaft mode instead of ZooKeeper by annotating the `Kafka` resource with `strimzi.io/kraft: enabled` and `strimzi.io/node-pools: enabled`, moving broker configuration into `KafkaNodePool` resources, and dropping the `zookeeper` section. KRaft removes the ZooKeeper dependency entirely.
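A minimal node pool sketch for a KRaft cluster, assuming an operator version with node pool support; the pool name, replica count, and sizes are illustrative:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: dual-role
  namespace: kafka
  labels:
    strimzi.io/cluster: my-cluster
spec:
  replicas: 3
  roles:
    # Each node acts as both KRaft controller and broker;
    # larger clusters typically split these into separate pools
    - controller
    - broker
  storage:
    type: persistent-claim
    size: 50Gi
    deleteClaim: false
```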

## Storage: Persistent Volumes and JBOD

A single volume (shown above) works for most deployments. For high-throughput workloads, JBOD (Just a Bunch Of Disks) spreads partitions across multiple volumes:

```yaml
storage:
  type: jbod
  volumes:
    - id: 0
      type: persistent-claim
      size: 100Gi
      class: gp3
      deleteClaim: false
    - id: 1
      type: persistent-claim
      size: 100Gi
      class: gp3
      deleteClaim: false
```

Each broker gets two PVCs, and Kafka distributes log segments across the volumes. Spreading partitions over multiple disks parallelizes I/O and can substantially increase throughput, though the actual gain depends on the underlying storage.

Set `deleteClaim: false` in production. When set to `true`, deleting the Kafka resource deletes all PVCs and your data with them.
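You can confirm which PVCs Strimzi created, and check their reclaim behavior, by listing them with the operator's labels. A sketch; the label key follows Strimzi's convention:

```shell
# Cluster name used in the examples above
CLUSTER=my-cluster

# Single-volume PVCs are named data-<cluster>-kafka-<broker>;
# JBOD volumes are named data-<volume-id>-<cluster>-kafka-<broker>
kubectl get pvc -n kafka -l strimzi.io/cluster="$CLUSTER"
```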

## Listener Configuration

Listeners control how clients connect to Kafka. Strimzi supports several listener types for external access:

```yaml
listeners:
  # Internal cluster access
  - name: plain
    port: 9092
    type: internal
    tls: false

  # External via NodePort
  - name: external
    port: 9094
    type: nodeport
    tls: true
    authentication:
      type: tls

  # External via LoadBalancer (one per broker)
  - name: extlb
    port: 9095
    type: loadbalancer
    tls: true

  # External via Ingress (requires nginx ingress controller)
  - name: extingress
    port: 9096
    type: ingress
    tls: true
    configuration:
      bootstrap:
        host: kafka-bootstrap.example.com
      brokers:
        - broker: 0
          host: kafka-0.example.com
        - broker: 1
          host: kafka-1.example.com
        - broker: 2
          host: kafka-2.example.com
```

Internal clients connect to `my-cluster-kafka-bootstrap.kafka.svc:9092`. NodePort avoids cloud load-balancer costs but exposes high-numbered node ports; LoadBalancer gives clean endpoints but provisions one load balancer per broker plus one for bootstrap.
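For TLS listeners, external clients must trust the cluster CA, which Strimzi stores in a Secret named `<cluster>-cluster-ca-cert`. A sketch of extracting it and building a truststore for JVM clients; the truststore password is a placeholder:

```shell
# Secret holding the cluster CA certificate (named after the cluster)
CA_SECRET=my-cluster-cluster-ca-cert

# Extract the PEM-encoded CA certificate
kubectl get secret "$CA_SECRET" -n kafka \
  -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt

# Import it into a JKS truststore for Java-based clients
keytool -import -trustcacerts -alias strimzi-ca \
  -file ca.crt -keystore truststore.jks \
  -storepass changeit -noprompt
```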

## Topic and User Management

The Entity Operator watches for `KafkaTopic` and `KafkaUser` resources:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: orders
  namespace: kafka
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 12
  replicas: 3
  config:
    retention.ms: 604800000    # 7 days
    cleanup.policy: delete
    max.message.bytes: 1048576
    min.insync.replicas: 2
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: order-processor
  namespace: kafka
  labels:
    strimzi.io/cluster: my-cluster
spec:
  authentication:
    type: tls
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: orders
          patternType: literal
        operations: [Read, Write, Describe]
        host: "*"
      - resource:
          type: group
          name: order-processor-group
          patternType: literal
        operations: [Read]
        host: "*"
```

The User Operator creates a Secret `order-processor` containing the client certificate and key. Mount this into your consumer/producer pods.
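For clients running outside the cluster, you can extract the credentials instead of mounting them. A sketch; the key names follow Strimzi's `KafkaUser` Secret layout:

```shell
# The Secret is named after the KafkaUser resource
KUSER=order-processor

# Pull out the signed client certificate and private key
kubectl get secret "$KUSER" -n kafka \
  -o jsonpath='{.data.user\.crt}' | base64 -d > user.crt
kubectl get secret "$KUSER" -n kafka \
  -o jsonpath='{.data.user\.key}' | base64 -d > user.key
```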

## Monitoring with JMX and Prometheus

Enable JMX metrics export in the Kafka resource:

```yaml
spec:
  kafka:
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: kafka-metrics-config.yml
```
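The referenced ConfigMap holds rules for the JMX Prometheus Exporter. A minimal sketch; the single rule here is illustrative, and Strimzi's examples repository ships a much fuller rule set:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-metrics
  namespace: kafka
data:
  kafka-metrics-config.yml: |
    lowercaseOutputName: true
    rules:
      # Export ReplicaManager gauges such as UnderReplicatedPartitions
      - pattern: kafka.server<type=ReplicaManager, name=(.+)><>Value
        name: kafka_server_replicamanager_$1
        type: GAUGE
```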

Key metrics: `UnderReplicatedPartitions` (replication health), `OfflinePartitionsCount` (partitions without a leader), `MessagesInPerSec` (throughput), `RequestHandlerAvgIdlePercent` (broker load), and `kafka_log_Log_Size` (disk usage per partition).

## Common Issues

**Under-replicated partitions.** Causes: a broker is down, disk is slow, or network congestion. Check with `--describe --under-replicated-partitions` via kafka-topics.sh.
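The check can be run from a throwaway pod inside the cluster. A sketch; the image tag is an assumption, so match it to your Kafka version:

```shell
# In-cluster bootstrap address
BOOTSTRAP=my-cluster-kafka-bootstrap:9092

# List only partitions whose ISR is smaller than the replica set
kubectl -n kafka run kafka-admin -ti --rm --restart=Never \
  --image=quay.io/strimzi/kafka:latest-kafka-3.7.0 -- \
  bin/kafka-topics.sh --bootstrap-server "$BOOTSTRAP" \
    --describe --under-replicated-partitions
```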

**Broker not joining the cluster.** Check headless Service DNS resolution and ensure no NetworkPolicy blocks inter-broker traffic on ports 9091 (replication) and 2181 (ZooKeeper).

**Disk full.** Brokers become unresponsive. Monitor PVC usage and set `log.retention.hours` and `log.retention.bytes`. To recover, expand the PVC or reduce retention and wait for cleanup.

**Consumer group rebalancing storms.** Consumer pods that restart frequently trigger rebalances that pause every consumer in the group. Fix the root cause, then increase `session.timeout.ms` and `max.poll.interval.ms` so the group tolerates brief interruptions.
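A sketch of the relevant consumer settings as an overrides file (pass it to the console tools with `--consumer.config`); the values are illustrative starting points, and `group.instance.id` assumes your brokers support static membership:

```shell
cat > consumer.properties <<'EOF'
# Tolerate short pod restarts before the broker evicts the member
session.timeout.ms=45000
# Allow longer processing between poll() calls before a rebalance is triggered
max.poll.interval.ms=600000
# Static membership: a restarted pod rejoins under the same ID without a full rebalance
group.instance.id=order-processor-0
EOF
```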

