Running Windows Workloads on Kubernetes: Node Pools, Scheduling, and Gotchas

Running Windows Workloads on Kubernetes#

Kubernetes supports Windows worker nodes alongside Linux worker nodes in the same cluster. This enables running Windows-native applications – .NET Framework services, IIS-hosted applications, Windows-specific middleware – on Kubernetes without rewriting them for Linux. However, Windows nodes are not interchangeable with Linux nodes. There are fundamental differences in networking, storage, container runtime behavior, and resource management that you must account for.

Core Constraints#

Before adding Windows nodes, understand what is and is not supported:

Sandbox to Production: The Complete Workflow for Verified Infrastructure Deliverables

Sandbox to Production#

An agent that produces infrastructure deliverables works in a sandbox. It does not touch production. It does not reach into someone else’s cluster, database, or cloud account. It works in an isolated environment, tests its work, captures the results, and hands the human a verified deliverable they can execute on their own infrastructure.

This is not a limitation – it is a design choice. The output is always a deliverable, never a direct action on someone else’s systems. This boundary is what makes the approach safe enough for production infrastructure work and trustworthy enough for enterprise change management.

Scenario: Debugging Kubernetes Network Connectivity End-to-End

Scenario: Debugging Kubernetes Network Connectivity End-to-End#

The report comes in as it always does: “my application can’t reach another service.” This is one of the most common and most frustrating categories of Kubernetes issues because the networking stack has multiple layers, and the symptom (timeout, connection refused, 502) tells you almost nothing about which layer is broken.

This scenario walks through a systematic diagnostic process, starting from the symptom and narrowing down to the root cause. Follow these steps in order. Each step either identifies the problem or eliminates a layer from the investigation.

Scenario: Preparing for and Handling a Traffic Spike

Scenario: Preparing for and Handling a Traffic Spike#

You are helping when someone says: “we have a big launch next week,” “Black Friday is coming,” or “traffic is suddenly 3x normal and climbing.” These are two distinct problems – proactive preparation for a known event and reactive response to an unexpected surge – but they share the same infrastructure mechanics.

The key principle: Kubernetes autoscaling has latency. HPA takes 15-30 seconds to detect increased load and scale pods. Cluster Autoscaler takes 3-7 minutes to provision new nodes. If your traffic spike is faster than your scaling speed, users hit errors during the gap. Proactive preparation eliminates this gap. Reactive response minimizes it.

Scenario: Recovering from a Failed Deployment

Scenario: Recovering from a Failed Deployment#

You are helping when someone reports: “we deployed a new version and it is causing errors,” “pods are not starting,” or “the service is down after a deploy.” The goal is to restore service as quickly as possible, then prevent recurrence.

Time matters here. Every minute of diagnosis while the service is degraded is a minute of user impact. The bias should be toward fast rollback first, then root cause analysis second.

Secret Management Patterns

The Problem with Environment Variables#

Environment variables are the most common way to pass secrets to applications. Every framework supports them and they require zero dependencies. They are also the least secure option. Any process running as the same user can read them via /proc/<pid>/environ on Linux. Crash dumps include the full environment. Child processes inherit all variables by default.

# Anyone with host access can read another process's environment
cat /proc/$(pgrep myapp)/environ | tr '\0' '\n' | grep DB_PASSWORD

Environment variables are acceptable for local development. For production secrets, use one of the patterns below.

Secure API Design: Authentication, Authorization, Input Validation, and OWASP API Top 10

Secure API Design#

Every API exposed to any network — public or internal — is an attack surface. The difference between a secure API and a vulnerable one is not exotic cryptography. It is consistent application of known patterns: authenticate every request, authorize every action, validate every input, and limit every resource.

Authentication Schemes#

API Keys#

The simplest scheme. The client sends a static key in a header:

GET /api/v1/data HTTP/1.1
Host: api.example.com
X-API-Key: sk_live_abc123def456

API keys are appropriate for:

Securing Docker-Based Validation Templates

Securing Docker-Based Validation Templates#

Validation templates define the environment agents use to test infrastructure changes. If a template runs containers as root, mounts the Docker socket, or skips resource limits, every agent that copies it inherits those risks. This reference covers the security patterns every docker-compose validation template must follow.

1. Non-Root Execution#

Containers run as root by default. A vulnerability in a root-running process gives an attacker full control inside the container and a much larger attack surface for container escapes.

Securing etcd: Encryption at Rest, TLS, and Access Control

Securing etcd#

etcd is the single most critical component in a Kubernetes cluster. It stores everything: pod specs, secrets, configmaps, RBAC rules, service account tokens, and all cluster state. By default, Kubernetes secrets are stored in etcd as base64-encoded plaintext. Anyone with read access to etcd has read access to every secret in the cluster. Securing etcd is not optional.

Why etcd Is the Crown Jewel#

Run this against an unencrypted etcd and you will see why:

Securing Kubernetes Ingress: TLS, Rate Limiting, WAF, and Access Control

Securing Kubernetes Ingress#

The ingress controller is the front door to your cluster. Every request from the internet passes through it, making it both the most exposed component and the best place to enforce security controls. Most teams deploy an ingress controller and stop at basic routing. That leaves the door wide open.

TLS Termination and HTTPS Enforcement#

Every ingress should terminate TLS. Never serve production traffic over plain HTTP. With nginx-ingress, force HTTPS redirects and add HSTS headers: