AKS Troubleshooting
AKS problems fall into a handful of categories: node pool operations that hang or fail, pods that do not schedule, storage that does not provision, broken authentication, and ingress that does not work. Each has Azure-specific causes that generic Kubernetes debugging will not surface.
Node Pool Stuck in Updating or Failed
Node pool operations (scaling, upgrading, changing settings) can get stuck. The AKS API reports the pool as “Updating” indefinitely or transitions to “Failed.”
```bash
# Check node pool provisioning state
az aks nodepool show \
  --resource-group myapp-rg \
  --cluster-name myapp-aks \
  --name workload \
  --query provisioningState
```
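A single query only shows the state at one moment, while a stuck operation shows up as a state that never changes. One way to tell the two apart is to poll with a deadline. The following is a minimal sketch, assuming bash and a logged-in `az` CLI. The function names `classify_state` and `wait_for_nodepool`, the 30-minute default timeout, and the 30-second interval are illustrative choices, not AKS defaults, and any state other than `Succeeded`, `Failed`, or `Canceled` is simply treated as still in progress.

```bash
#!/usr/bin/env bash
# Sketch only: helper names, timeout, and interval are assumptions for illustration.

# Map a provisioningState value to a coarse verdict.
classify_state() {
  case "$1" in
    Succeeded)        echo "ok" ;;
    Failed|Canceled)  echo "failed" ;;
    *)                echo "in-progress" ;;  # e.g. Updating, Upgrading, Scaling
  esac
}

# Usage: wait_for_nodepool RG CLUSTER POOL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Returns 0 on Succeeded, 1 on Failed/Canceled, 2 if the az call fails,
# and 3 if the pool is still in progress when the timeout is reached.
wait_for_nodepool() {
  local rg="$1" cluster="$2" pool="$3"
  local timeout="${4:-1800}" interval="${5:-30}"
  local waited=0 state verdict

  while :; do
    state="$(az aks nodepool show \
      --resource-group "$rg" \
      --cluster-name "$cluster" \
      --name "$pool" \
      --query provisioningState \
      --output tsv)" || return 2

    verdict="$(classify_state "$state")"
    echo "provisioningState=$state ($verdict) after ${waited}s"

    case "$verdict" in
      ok)     return 0 ;;
      failed) return 1 ;;
    esac

    if [ "$waited" -ge "$timeout" ]; then
      echo "still $state after ${timeout}s; treating the operation as stuck" >&2
      return 3
    fi

    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

For example, `wait_for_nodepool myapp-rg myapp-aks workload 900 20` would poll for up to 15 minutes. A return code of 1 or 3 is the cue to look at the activity log for the underlying error.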
```bash
# Check the activity log for errors
az monitor activity-log list \
  --resource-group myapp-rg \
  --query "[?contains(operationName.value, 'Microsoft.ContainerService')].{op:operationName.value, status:status.value, msg:properties.statusMessage}" \
  --output table
```

Common causes and fixes: