Kubernetes DNS Deep Dive: CoreDNS, ndots, and Debugging Resolution Failures

Kubernetes DNS Deep Dive: CoreDNS, ndots, and Debugging Resolution Failures#

DNS problems are responsible for a disproportionate number of Kubernetes debugging sessions. The symptoms are always vague – timeouts, connection refused, “could not resolve host” – and the root causes range from CoreDNS being down to a misunderstood setting called ndots.

How Pod DNS Resolution Works#

When a pod makes a DNS query, it goes through the following chain:

  1. The application calls getaddrinfo() or equivalent.
  2. The system resolver reads /etc/resolv.conf inside the pod.
  3. The query goes to the nameserver specified in resolv.conf, which is CoreDNS (reachable via the kube-dns Service in kube-system).
  4. CoreDNS resolves the name – either from its internal zone (for cluster services) or by forwarding to upstream DNS.

Every pod’s /etc/resolv.conf looks something like this:

Kubernetes Events Debugging: Patterns, Filtering, and Alerting

Kubernetes Events Debugging#

Kubernetes events are the cluster’s built-in audit trail for what is happening to resources. When a pod fails to schedule, a container crashes, a node runs out of disk, or a volume fails to mount, the system records an event. Events are the first place to look when something goes wrong, and learning to read them efficiently separates quick diagnosis from hours of guessing.

Event Structure#

Every Kubernetes event has these fields:

Kubernetes FinOps: Decision Framework for Cost Optimization Strategies

Kubernetes FinOps: Decision Framework for Cost Optimization#

FinOps in Kubernetes is the practice of bringing financial accountability to infrastructure spending. The challenge is not a lack of cost-saving techniques – it is knowing which ones to apply first, which combinations work together, and which ones introduce risk that outweighs the savings. This article provides a structured decision framework for selecting and prioritizing Kubernetes cost optimization strategies.

The Five Optimization Levers#

Every Kubernetes cost optimization effort works across five levers. Each has a different risk profile, implementation effort, and savings ceiling.

Kubernetes Namespace Organization: Strategies That Actually Work

Kubernetes Namespace Organization#

Namespaces are Kubernetes’ primary mechanism for dividing a cluster among teams, applications, and environments. Getting the strategy right early saves significant pain later. Getting it wrong means RBAC tangles, resource contention, and deployment confusion.

Strategy 1: Per-Team Namespaces#

Each team gets a namespace (team-platform, team-payments, team-frontend). All applications owned by that team deploy into it.

When it works: Clear team boundaries with shared responsibility for multiple services.

Kubernetes Operators and Crossplane: Extending the Platform

Kubernetes Operators and Crossplane#

The Operator Pattern#

An operator is a CRD (Custom Resource Definition) paired with a controller. The CRD defines a new resource type (like Certificate or KafkaCluster). The controller watches for instances of that CRD and reconciles actual state to match desired state. This is the same reconciliation loop that powers Deployments, extended to anything.

Operators encode operational knowledge into software. Instead of a runbook with 47 steps to create a Kafka cluster, you declare what you want and the operator handles creation, scaling, upgrades, and failure recovery.

Kubernetes Production Readiness Checklist: Everything to Verify Before Going Live

Kubernetes Production Readiness Checklist#

This checklist is designed for agents to audit a Kubernetes cluster before production workloads run on it. Every item includes the verification command and what a passing result looks like. Work through each category sequentially. A failing item in Cluster Health should be fixed before checking Workload Configuration.


Cluster Health#

These are non-negotiable. If any of these fail, stop and fix them before evaluating anything else.

Kubernetes Service Types and DNS-Based Discovery

Kubernetes Service Types and DNS-Based Discovery#

Services are the stable networking abstraction in Kubernetes. Pods come and go, but a Service gives you a consistent DNS name and IP address that routes to the right set of pods. Choosing the wrong Service type or misunderstanding DNS discovery is behind a large percentage of connectivity failures.

Service Types#

ClusterIP (Default)#

ClusterIP creates an internal-only virtual IP. Only pods inside the cluster can reach it. This is what you want for internal communication between microservices.

Kustomize Patterns: Bases, Overlays, and Practical Transformers

Kustomize Patterns#

Kustomize lets you customize Kubernetes manifests without templating. You start with plain YAML (bases) and layer modifications (overlays) on top. It is built into kubectl, so there is no extra tool to install.

Base and Overlay Structure#

The standard layout separates shared manifests from per-environment customizations:

k8s/
  base/
    kustomization.yaml
    deployment.yaml
    service.yaml
    configmap.yaml
  overlays/
    dev/
      kustomization.yaml
      replica-patch.yaml
    staging/
      kustomization.yaml
      ingress.yaml
    production/
      kustomization.yaml
      replica-patch.yaml
      hpa.yaml

The base kustomization.yaml lists the resources:

# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - deployment.yaml
  - service.yaml
  - configmap.yaml

An overlay references the base and adds modifications:

Lightweight Kubernetes at the Edge with K3s

Lightweight Kubernetes at the Edge with K3s#

K3s is a production-grade Kubernetes distribution packaged as a single binary under 100 MB. It was built for environments where resources are constrained and operational simplicity matters: edge locations, IoT gateways, retail stores, factory floors, branch offices, and CI/CD pipelines where you need a real cluster but cannot justify the overhead of a full Kubernetes deployment.

K3s achieves its small footprint by replacing etcd with SQLite (by default), embedding containerd directly, removing in-tree cloud provider and storage plugins, and packaging everything into a single binary. Despite these changes, K3s is a fully conformant Kubernetes distribution – it passes the CNCF conformance tests and runs standard Kubernetes workloads without modification.

Linux Debugging Essentials for Infrastructure

Debugging Workflow#

Start broad, narrow down. Most problems fall into five categories: service not running, resource exhaustion, full disk, network failure, or kernel issue. Work through them in order: service, resources, network, kernel logs.

Services: systemctl and journalctl#

When a service is misbehaving, start with its status:

systemctl status nginx

This shows whether the service is active, its PID, its last few log lines, and how long it has been running. If the service keeps restarting, the uptime will be suspiciously short.