Running Redis on Kubernetes#

Redis on Kubernetes ranges from dead simple (single pod for caching) to operationally complex (Redis Cluster with persistence). The right choice depends on whether you need data durability, high availability, or just a fast throwaway cache.

Single-Instance Redis with Persistence#

For development or small workloads, a single Redis Deployment with a PVC is enough:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        command: ["redis-server", "--appendonly", "yes", "--maxmemory", "256mb", "--maxmemory-policy", "allkeys-lru"]
        ports:
        - containerPort: 6379
        volumeMounts:
        - name: redis-data
          mountPath: /data
        resources:
          requests:
            cpu: 100m
            memory: 300Mi
          limits:
            cpu: 500m
            memory: 350Mi
      volumes:
      - name: redis-data
        persistentVolumeClaim:
          claimName: redis-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  selector:
    app: redis
  ports:
  - port: 6379
    targetPort: 6379

Set the Redis memory cap (--maxmemory) well below the container memory limit. If Redis is allowed to use 350Mi and the container limit is also 350Mi, the kernel OOM-kills the process during background saves: Redis forks for RDB snapshots and AOF rewrites, and copy-on-write can push combined memory usage toward double the dataset size under heavy write load. A safe ratio: set maxmemory to 60-75% of the container memory limit. The manifest above uses 256mb against a 350Mi limit, roughly 73%.

Running Windows Workloads on Kubernetes: Node Pools, Scheduling, and Gotchas#

Kubernetes supports Windows worker nodes alongside Linux worker nodes in the same cluster. This enables running Windows-native applications – .NET Framework services, IIS-hosted applications, Windows-specific middleware – on Kubernetes without rewriting them for Linux. However, Windows nodes are not interchangeable with Linux nodes. There are fundamental differences in networking, storage, container runtime behavior, and resource management that you must account for.
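
Scheduling is the first difference you hit: the control plane itself runs only on Linux, and the scheduler does not infer the target OS from the image, so every Windows pod must be explicitly steered to a Windows node. A minimal sketch of pinning a pod to a Windows node pool; the image tag and the taint key are assumptions, since providers taint Windows pools differently (or not at all):

apiVersion: v1
kind: Pod
metadata:
  name: iis-sample
spec:
  nodeSelector:
    kubernetes.io/os: windows        # well-known label set by the kubelet
  tolerations:
  - key: "os"                        # assumed taint; check your provider's Windows pool taint
    operator: "Equal"
    value: "windows"
    effect: "NoSchedule"
  containers:
  - name: iis
    # assumed image; must match the Windows Server version running on the node
    image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2022
    ports:
    - containerPort: 80

Without the nodeSelector, a Windows image can land on a Linux node and crash-loop; without the toleration, it cannot be scheduled onto a tainted Windows pool at all.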

Core Constraints#

Before adding Windows nodes, understand what is and is not supported:

Scenario: Debugging Kubernetes Network Connectivity End-to-End#

The report comes in as it always does: “my application can’t reach another service.” This is one of the most common and most frustrating categories of Kubernetes issues because the networking stack has multiple layers, and the symptom (timeout, connection refused, 502) tells you almost nothing about which layer is broken.

This scenario walks through a systematic diagnostic process, starting from the symptom and narrowing down to the root cause. Follow these steps in order. Each step either identifies the problem or eliminates a layer from the investigation.
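
A throwaway debug pod in the same namespace as the failing client is the fastest way to separate the layers. A sketch, assuming the nicolaka/netshoot image is acceptable in your cluster (any image with DNS and HTTP tools works):

apiVersion: v1
kind: Pod
metadata:
  name: net-debug
spec:
  containers:
  - name: debug
    image: nicolaka/netshoot:latest   # assumed image; swap for an approved tools image
    command: ["sleep", "3600"]        # keep the pod alive so you can exec into it
  restartPolicy: Never

From inside it, resolve the target Service name first, then connect to the Service ClusterIP, then to a pod IP directly; whichever step fails tells you whether the problem is DNS, the Service/endpoints layer, or pod-to-pod networking.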

Scenario: Migrating Workloads Between Kubernetes Clusters#

You are helping when someone says: “we need to move workloads from cluster A to cluster B.” The reasons vary – Kubernetes version upgrade, cloud provider migration, region change, architecture consolidation, or moving from self-managed to a managed service. The complexity ranges from trivial (stateless services with GitOps) to significant (stateful workloads with zero-downtime requirements).

The core risk in any cluster migration is data loss for stateful workloads and downtime during the traffic cutover. Every decision in this plan aims to minimize both.

Scenario: Preparing for and Handling a Traffic Spike#

You are helping when someone says: “we have a big launch next week,” “Black Friday is coming,” or “traffic is suddenly 3x normal and climbing.” These are two distinct problems – proactive preparation for a known event and reactive response to an unexpected surge – but they share the same infrastructure mechanics.

The key principle: Kubernetes autoscaling has latency. HPA takes 15-30 seconds to detect increased load and scale pods. Cluster Autoscaler takes 3-7 minutes to provision new nodes. If your traffic spike is faster than your scaling speed, users hit errors during the gap. Proactive preparation eliminates this gap. Reactive response minimizes it.
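
For a known event, the simplest way to take HPA detection latency out of the critical path is to raise minReplicas ahead of time and let autoscaling handle anything beyond the pre-provisioned floor. A sketch, with the workload name and the numbers as assumptions to adapt:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 10        # raised before the event so the baseline absorbs the initial surge
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0  # react to rising load immediately
      policies:
      - type: Percent
        value: 100                   # allow doubling the replica count every 15 seconds
        periodSeconds: 15

Raising minReplicas only helps if the cluster has node capacity for the extra pods, so pair it with pre-scaled or over-provisioned node pools.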

Scenario: Recovering from a Failed Deployment#

You are helping when someone reports: “we deployed a new version and it is causing errors,” “pods are not starting,” or “the service is down after a deploy.” The goal is to restore service as quickly as possible, then prevent recurrence.

Time matters here. Every minute of diagnosis while the service is degraded is a minute of user impact. The bias should be toward fast rollback first, then root cause analysis second.
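
Fast rollback depends on settings that were in place before the bad deploy. A sketch of Deployment configuration that keeps kubectl rollout undo viable and limits the blast radius of a broken version; the names, image, and probe path are assumptions:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
spec:
  revisionHistoryLimit: 10      # keep old ReplicaSets around so there is something to roll back to
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0         # never remove healthy pods before replacements are Ready
      maxSurge: 25%
  selector:
    matchLabels:
      app: web-api
  template:
    metadata:
      labels:
        app: web-api
    spec:
      containers:
      - name: app
        image: registry.example.com/web-api:1.2.3   # assumed image
        readinessProbe:          # gates traffic; a version that never becomes Ready gets no requests
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 5

With maxUnavailable: 0 and a readiness probe, a broken version stalls the rollout instead of taking the whole service down, and the retained revisions give the rollback something to return to.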

Security Hardening a Kubernetes Cluster: End-to-End Operational Sequence#

This operational sequence takes a default Kubernetes cluster and locks it down. Phases are ordered by impact and dependency: assessment first, then RBAC, pod security, networking, images, auditing, and finally data protection. Each phase includes the commands, policy YAML, and verification steps.

Do not skip the assessment phase. You need to know what you are fixing before you start fixing it.


Phase 1 – Assessment#

Before changing anything, establish a baseline. This phase produces a prioritized list of findings that drives the order of remediation in later phases.

Service Account Security: Tokens, RBAC Binding, and Workload Identity#

Every pod in Kubernetes runs as a service account. By default, that is the default service account in the pod's namespace, with an API token auto-mounted into every container (a non-expiring Secret-based token on older clusters; a time-bounded projected token on current versions, but still mounted whether or not the workload needs it). This default grants more access than most workloads need, since most never call the Kubernetes API at all. Hardening service accounts is one of the highest-impact security improvements you can make in a Kubernetes cluster.
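
As a preview of the end state, the hardened pattern is a dedicated service account per workload with token auto-mounting switched off, re-enabled only for pods that genuinely call the API. A minimal sketch; the namespace, names, and image are assumptions:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: web-api
  namespace: production
automountServiceAccountToken: false    # pods using this SA get no token unless they opt in
---
apiVersion: v1
kind: Pod
metadata:
  name: web-api
  namespace: production
spec:
  serviceAccountName: web-api          # dedicated SA instead of "default"
  automountServiceAccountToken: false  # explicit at the pod level as well
  containers:
  - name: app
    image: registry.example.com/web-api:1.0   # assumed image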

The Default Problem#

When a pod starts without specifying a service account, Kubernetes does three things:

StatefulSets and Persistent Storage: Stable Identity, PVCs, and StorageClasses#

Deployments treat pods as interchangeable. StatefulSets do not – each pod gets a stable hostname, a persistent volume, and an ordered startup sequence. This is what you need for databases, message queues, and any workload where identity matters.

StatefulSet vs Deployment#

Feature                    Deployment                       StatefulSet
Pod names                  Random suffix (web-api-6d4f8)    Ordinal index (postgres-0, postgres-1)
Startup order              All at once                      Sequential (0, then 1, then 2)
Stable network identity    No                               Yes, via headless Service
Persistent storage         Shared or none                   Per-pod via volumeClaimTemplates
Scaling down               Removes random pods              Removes highest ordinal first

Use StatefulSets when your application needs any of: stable hostnames, ordered deployment/scaling, or per-pod persistent storage. Common examples: PostgreSQL, MySQL, Redis Sentinel, Kafka, ZooKeeper, Elasticsearch.
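
A minimal PostgreSQL example showing the three pieces that work together: a headless Service for stable DNS names, the StatefulSet referencing it via serviceName, and volumeClaimTemplates for per-pod storage. The names, image, storage size, and referenced Secret are assumptions:

apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None            # headless: gives each pod a stable DNS name (postgres-0.postgres)
  selector:
    app: postgres
  ports:
  - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres      # must match the headless Service above
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:16
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-credentials       # assumed to exist already
              key: password
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata   # subdirectory avoids clashing with lost+found
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:      # one PVC per pod: data-postgres-0, data-postgres-1, ...
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi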

Upgrading Kubernetes Clusters Safely#

Kubernetes releases a new minor version roughly every four months. Staying current is not optional – clusters more than three versions behind lose security patches, and skipping versions during upgrade is not supported. Every upgrade must step through each minor version sequentially.

Version Skew Policy#

The version skew policy defines which component version combinations are supported:

  • kube-apiserver instances within an HA cluster can differ by at most 1 minor version.
  • kubelet can be up to 3 minor versions older than kube-apiserver (changed from 2 in Kubernetes 1.28+), but never newer.
  • kube-controller-manager and kube-scheduler must not be newer than kube-apiserver and can be up to 1 minor version older.
  • kube-proxy must not be newer than kube-apiserver and, like the kubelet, can be up to 3 minor versions older.
  • kubectl is supported within 1 minor version (older or newer) of kube-apiserver.

The practical consequence: always upgrade the control plane first, then node pools. Never upgrade nodes past the control plane version.
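
Node pool upgrades drain nodes one at a time, evicting every pod on them. A PodDisruptionBudget keeps a floor of replicas running during those evictions; a sketch, with the name, label, and threshold as assumptions to match your workloads:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-api
spec:
  minAvailable: 2            # a drain pauses rather than evict below this count
  selector:
    matchLabels:
      app: web-api

Set one of these for every multi-replica workload before starting node upgrades; without it, a drain can evict all replicas of a service at once if they happen to share a node.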