Kubernetes DNS Deep Dive: CoreDNS, ndots, and Debugging Resolution Failures

DNS problems account for a disproportionate share of Kubernetes debugging sessions. The symptoms are usually vague (timeouts, connection refused, “could not resolve host”), and the root causes range from CoreDNS being down to a misunderstood resolver option called ndots.

How Pod DNS Resolution Works

When a pod makes a DNS query, it goes through the following chain:

  1. The application calls getaddrinfo() or equivalent.
  2. The system resolver reads /etc/resolv.conf inside the pod.
  3. The query goes to the nameserver specified in resolv.conf, which is CoreDNS (reachable via the kube-dns Service in kube-system).
  4. CoreDNS resolves the name – either from its internal zone (for cluster services) or by forwarding to upstream DNS.
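Step 1 of the chain above can be exercised from any language that wraps the system resolver. A minimal Python sketch follows; `resolve` is a hypothetical helper name chosen for illustration, not part of any Kubernetes tooling.

```python
import socket


def resolve(host: str, port: int = 80) -> list[str]:
    """Return the sorted, de-duplicated IP addresses getaddrinfo() reports."""
    # getaddrinfo() goes through the system resolver, the same path an
    # application inside a pod takes before /etc/resolv.conf is consulted.
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Inside a pod, calling `resolve("kubernetes.default.svc.cluster.local", 443)` would be one way to check the whole chain; a `socket.gaierror` means resolution failed somewhere between steps 2 and 4.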

Every pod’s /etc/resolv.conf looks something like this:
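As a hedged illustration, the Python sketch below embeds a typical pod resolv.conf and shows how the search list and ndots turn one name into several candidate queries. The nameserver IP `10.96.0.10`, the `default` namespace, and the `cluster.local` domain are assumptions that match common defaults and will differ per cluster. The function names are hypothetical, and the ordering follows glibc-style behavior; other resolvers (musl, for example) handle the search list differently.

```python
# Assumed example contents; real values depend on the cluster and namespace.
TYPICAL_RESOLV_CONF = """\
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
"""


def parse_resolv_conf(text):
    """Extract (nameservers, search domains, ndots) from resolv.conf text."""
    nameservers, search, ndots = [], [], 1  # glibc defaults ndots to 1
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith(("#", ";")):
            continue  # skip blank lines and comments
        if parts[0] == "nameserver" and len(parts) > 1:
            nameservers.append(parts[1])
        elif parts[0] == "search":
            search = parts[1:]
        elif parts[0] == "options":
            for opt in parts[1:]:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return nameservers, search, ndots


def candidate_queries(name, search, ndots):
    """List the names a glibc-style resolver would try, in order."""
    if name.endswith("."):
        return [name]  # trailing dot: fully qualified, search list skipped
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = [f"{name}."]
    # Fewer dots than ndots: search domains are tried before the bare name.
    if name.count(".") >= ndots:
        return absolute + expanded
    return expanded + absolute
```

With these assumed values, `candidate_queries("api.example.com", search, 5)` yields three cluster-suffixed names before `api.example.com.` itself, because the name has only two dots. Appending a trailing dot reduces that to a single query.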

Scenario: Debugging Kubernetes Network Connectivity End-to-End

The report comes in as it always does: “my application can’t reach another service.” This is one of the most common and most frustrating categories of Kubernetes issues because the networking stack has multiple layers, and the symptom (timeout, connection refused, 502) tells you almost nothing about which layer is broken.

This scenario walks through a systematic diagnostic process, starting from the symptom and narrowing down to the root cause. Follow these steps in order. Each step either identifies the problem or eliminates a layer from the investigation.

DNS Deep Dive: Record Types, Resolution, Troubleshooting, and Cloud DNS Management

How DNS Resolution Works

When a client requests api.example.com, the resolution follows a chain of queries. The client asks its configured recursive resolver (often the ISP’s, or a public one like 8.8.8.8). The recursive resolver does the heavy lifting: it asks a root name server which servers handle .com, asks the .com TLD server which servers are authoritative for example.com, and then gets the answer for api.example.com from that authoritative name server. Resolvers along the way cache each response for the record’s TTL, so subsequent requests short-circuit the chain.
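The chain and the effect of TTL caching can be sketched with a toy resolver. Everything here is hypothetical: the delegation tables stand in for real name servers, the address `203.0.113.10` comes from a documentation-only range, and the 300-second TTL is an assumed value. For brevity the sketch caches only the final answer, whereas a real recursive resolver also caches the intermediate delegations.

```python
import time

# Hypothetical delegation data standing in for real name servers.
ROOT = {"com.": "tld-com"}
TLD_COM = {"example.com.": "auth-example"}
AUTH_EXAMPLE = {"api.example.com.": ("203.0.113.10", 300)}  # (address, TTL s)


class RecursiveResolver:
    """Toy recursive resolver: walks root -> TLD -> authoritative, then caches."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._cache = {}  # name -> (address, expiry timestamp)
        self.upstream_queries = 0  # counts queries sent past the cache

    def resolve(self, name):
        cached = self._cache.get(name)
        if cached and cached[1] > self._clock():
            return cached[0]  # still within TTL: no upstream traffic
        address, ttl = self._walk(name)
        self._cache[name] = (address, self._clock() + ttl)
        return address

    def _walk(self, name):
        labels = name.rstrip(".").split(".")
        tld = labels[-1] + "."
        zone = ".".join(labels[-2:]) + "."
        self.upstream_queries += 1  # ask a root server about the TLD
        if tld not in ROOT:
            raise LookupError(f"no delegation for {tld}")
        self.upstream_queries += 1  # ask the TLD server about the zone
        if zone not in TLD_COM:
            raise LookupError(f"no delegation for {zone}")
        self.upstream_queries += 1  # ask the authoritative server
        if name not in AUTH_EXAMPLE:
            raise LookupError(f"no record for {name}")
        return AUTH_EXAMPLE[name]
```

Passing a fake `clock` makes the TTL behavior easy to observe: the first lookup costs three upstream queries, a repeat within the TTL costs none, and a lookup after expiry walks the chain again.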