Velero Backup and Restore: Disaster Recovery for Kubernetes

Velero Backup and Restore#

Velero backs up Kubernetes resources and persistent volume data to object storage. It handles scheduled backups, on-demand snapshots, and restores to the same or a different cluster. It is the standard tool for Kubernetes disaster recovery.

Velero captures two things: Kubernetes API objects (stored as JSON) and persistent volume data (via cloud volume snapshots or file-level backup with Kopia).
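Those two persistent-volume paths show up as options on a backup request. A rough sketch of the day-to-day commands, with namespace and backup names as placeholders:

# On-demand backup of one namespace, using cloud volume snapshots (the default
# when a compatible volume snapshotter plugin is configured):
velero backup create my-app-backup --include-namespaces my-app

# Same backup, but copying volume data file-by-file via the node agent (Kopia)
# instead of taking snapshots:
velero backup create my-app-fs-backup --include-namespaces my-app \
  --default-volumes-to-fs-backup

# Nightly schedule at 02:00, and a restore from a completed backup:
velero schedule create my-app-nightly --schedule "0 2 * * *" --include-namespaces my-app
velero restore create --from-backup my-app-backup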

Installation#

You need an object storage bucket (S3, GCS, Azure Blob, or MinIO) and write credentials.
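A minimal install against S3-compatible storage looks roughly like this; the bucket name, region, plugin version, and credentials file are placeholders for your environment:

# Assumes an existing bucket and a credential with write access to it.
# --use-node-agent enables file-level (Kopia) backup of volume data.
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.8.0 \
  --bucket my-velero-backups \
  --backup-location-config region=us-east-1 \
  --snapshot-location-config region=us-east-1 \
  --secret-file ./credentials-velero \
  --use-node-agent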

Choosing a GitOps Tool: ArgoCD vs Flux vs Jenkins vs GitHub Actions for Kubernetes Deployments

Choosing a GitOps Tool#

The term “GitOps” is applied to everything from a simple kubectl apply in a GitHub Actions workflow to a fully reconciled, pull-based deployment architecture with drift detection. These are fundamentally different approaches. Choosing between them depends on your team’s operational maturity, cluster count, and tolerance for running controllers in your cluster.

What GitOps Actually Means#

GitOps, as defined by the OpenGitOps principles (a CNCF sandbox project), has four requirements: the desired state is declarative; it is versioned and immutable (in practice, stored in git); software agents pull it automatically; and the agents continuously reconcile the cluster against it, which is what gives you drift detection. The last two are what separate true GitOps from “CI/CD that uses git.”
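One concrete shape of the pull-based, continuously reconciled end of the spectrum is an Argo CD Application with automated sync. A sketch; the repo URL, paths, and names are placeholders:

# Automated sync with pruning and self-heal: Argo CD keeps the cluster
# matching git and reverts manual drift.
kubectl apply -f - <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/app-config.git
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true      # delete resources removed from git
      selfHeal: true   # revert manual changes made in the cluster
EOF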

Cilium Deep Dive: eBPF Networking, L7 Policies, Hubble Observability, and Cluster Mesh

Cilium Deep Dive#

Cilium replaces the traditional Kubernetes networking stack with eBPF programs that run directly in the Linux kernel. Instead of kube-proxy translating Service definitions into iptables rules and a traditional CNI plugin managing pod networking through bridge interfaces and routing tables, Cilium attaches eBPF programs to kernel hooks that process packets at wire speed. The result is a networking layer that is faster at scale, enforces policy at Layer 7, and provides built-in observability without application instrumentation.
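To make the Layer 7 point concrete, a CiliumNetworkPolicy can match HTTP methods and paths rather than just ports. A sketch; the labels, port, and path are placeholders:

# Allow only GET requests to /api/v1/* from frontend pods to api pods.
kubectl apply -f - <<'EOF'
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-to-api
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/v1/.*"
EOF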

Custom Resource Definitions (CRDs): Extending the Kubernetes API

Custom Resource Definitions (CRDs)#

CRDs extend the Kubernetes API with your own resource types. Once you create a CRD, you can kubectl get, kubectl apply, and kubectl delete instances of your custom type just like built-in resources. The custom resources are stored in etcd alongside native Kubernetes objects, benefit from the same RBAC, and participate in the same API machinery.
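A minimal sketch of what that looks like: the CRD below registers a hypothetical Backup type under a made-up example.com group, after which kubectl treats it like any other resource.

kubectl apply -f - <<'EOF'
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com        # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                schedule:
                  type: string
EOF

# The new type now behaves like a built-in resource:
kubectl get backups -A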

When to Use CRDs#

CRDs make sense when you need to represent application-specific concepts inside Kubernetes:

Database Connection Pooling: PgBouncer, ProxySQL, and Application-Level Patterns

Database Connection Pooling: PgBouncer, ProxySQL, and Application-Level Patterns#

Database connections are expensive resources. PostgreSQL forks a new OS process for every connection. MySQL creates a thread. Both allocate memory for session state, query buffers, and sort areas. When your application scales horizontally in Kubernetes – 10 pods, then 20, then 50 – the connection count multiplies, and most databases buckle long before your application pods do.

Connection pooling solves this by maintaining a smaller set of persistent connections to the database and sharing them across many application clients. Understanding pooling options, deployment patterns, and sizing is essential for any production database workload on Kubernetes.
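As a sketch of the proxy-level option, here is a minimal PgBouncer configuration in transaction pooling mode; the host, database names, and pool sizes are placeholders to tune for your workload:

# Many client connections multiplexed onto a small server-side pool.
cat > pgbouncer.ini <<'EOF'
[databases]
mattermost = host=dt-postgresql port=5432 dbname=mattermost

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; return the server connection to the pool after each transaction
pool_mode = transaction
; server connections per database/user pair
default_pool_size = 20
; client connections PgBouncer will accept
max_client_conn = 500
EOF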

Helm Release Naming Gotchas: How Resource Names Actually Work

Helm Release Naming Gotchas#

Helm charts derive Kubernetes resource names from the release name, but every chart does it differently. If you assume a consistent pattern, you will get bitten by DNS resolution failures, broken connection strings, and mysterious “service not found” errors.

Bitnami PostgreSQL: Names Are Not What You Expect#

The Bitnami PostgreSQL chart builds resource names with the standard fullname helper: when the release name already contains the chart name (as dt-postgresql does), it uses the release name directly instead of appending -postgresql. This catches nearly everyone.

# You deploy like this:
helm upgrade --install dt-postgresql bitnami/postgresql \
  --namespace dream-team \
  --set auth.database=mattermost \
  --set auth.username=mmuser

# You expect these resource names:
#   Pod:     dt-postgresql-postgresql-0   <-- WRONG
#   Service: dt-postgresql-postgresql     <-- WRONG

# Actual names:
#   Pod:     dt-postgresql-0
#   Service: dt-postgresql

This means your application connection string should reference dt-postgresql, not dt-postgresql-postgresql. If you chose release name postgresql, your service is just postgresql – which might collide with other things in your namespace.
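If you want names you control rather than names derived from the release, Bitnami charts expose fullnameOverride. A sketch, with a placeholder override value:

# fullnameOverride pins the base name regardless of the release name.
helm upgrade --install dt-postgresql bitnami/postgresql \
  --namespace dream-team \
  --set fullnameOverride=mattermost-db \
  --set auth.database=mattermost \
  --set auth.username=mmuser

# Names then follow the override, e.g. pod mattermost-db-0 and service mattermost-db:
kubectl get pods,svc -n dream-team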

Kubernetes Operator Development: Patterns, Frameworks, and Best Practices

Kubernetes Operator Development#

Operators are custom controllers that manage resources defined by CRDs. They encode operational knowledge – the tasks a human operator would otherwise perform by hand – into software that runs inside the cluster. An operator watches for changes to its custom resources and reconciles the actual state to match the desired state, creating, updating, or deleting child resources as needed.
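Most operators are built with Kubebuilder or the Operator SDK, which scaffold the watch-and-reconcile skeleton for you. A sketch using Kubebuilder; the domain, repo path, and kind are made up:

# Scaffold a new operator project and an API type with its controller.
kubebuilder init --domain example.com --repo github.com/example/backup-operator
kubebuilder create api --group apps --version v1alpha1 --kind Backup \
  --resource --controller

# Generate CRD manifests, install them, and run the controller locally
# against the current kubeconfig context:
make manifests
make install
make run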

Operator Maturity Model#

The Operator Framework defines five maturity levels:

| Level | Capability | Example |
|---|---|---|
| 1 | Basic install | Helm operator deploys the application |
| 2 | Seamless upgrades | Operator handles version migrations |
| 3 | Full lifecycle | Backup, restore, failure recovery |
| 4 | Deep insights | Exposes metrics, fires alerts, generates dashboards |
| 5 | Auto-pilot | Auto-scaling, auto-healing, auto-tuning without human input |

Most custom operators target Level 2-3. Levels 4-5 are typically reached by mature projects like the Prometheus Operator or Rook/Ceph.

Multi-Cluster Kubernetes: Architecture, Networking, and Management Patterns

Multi-Cluster Kubernetes#

A single Kubernetes cluster is a single blast radius. A bad deployment, a control plane failure, a misconfigured admission webhook – any of these can take down everything. Multi-cluster is not about complexity for its own sake. It is about isolation, resilience, and operating workloads that span regions, regulations, or teams.

Why Multi-Cluster#

Blast radius isolation. A cluster-wide failure (etcd corruption, bad admission webhook, API server overload) only affects one cluster. Critical workloads in another cluster are untouched.

OAuth2 and OIDC for Infrastructure

OAuth2 vs OIDC: What Actually Matters#

OAuth2 is an authorization framework. It answers the question “what is this client allowed to do?” by issuing access tokens. It does not tell you who the user is. OIDC (OpenID Connect) is a layer on top of OAuth2 that adds authentication. It answers “who is this user?” by adding an ID token – a signed JWT containing user identity claims like email, name, and group memberships.
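To see the difference in practice, decode the payload segment of an ID token. A sketch only; the issuer, client ID, and claim values below are made up, and base64url padding may need fixing before base64 will accept it:

# Decode the payload (second dot-separated segment) of an ID token.
echo "$ID_TOKEN" | cut -d '.' -f2 | base64 -d 2>/dev/null | jq .

# Typical decoded payload (illustrative values):
# {
#   "iss": "https://sso.example.com/realms/infra",    <- identity provider that signed the token
#   "sub": "4f2a9c...",                                <- stable user identifier
#   "aud": "kubernetes",                               <- client ID the token was issued for
#   "exp": 1735689600,
#   "email": "jane@example.com",
#   "groups": ["platform-team", "cluster-admins"]
# }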

OPA Gatekeeper: Policy as Code for Kubernetes

OPA Gatekeeper: Policy as Code for Kubernetes#

Gatekeeper is a Kubernetes-native policy engine built on Open Policy Agent (OPA). It runs as a validating admission webhook and evaluates policies written in Rego against every matching API request. Instead of deploying raw Rego files to an OPA server, Gatekeeper uses Custom Resource Definitions: you define policies as ConstraintTemplates and instantiate them as Constraints. This makes policy management declarative, auditable, and version-controlled.
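A minimal sketch of the pattern, based on the canonical required-labels example: the ConstraintTemplate carries the Rego and its parameter schema, and the Constraint applies it to namespaces. The resource names and required label are placeholders.

kubectl apply -f - <<'EOF'
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg}] {
          required := input.parameters.labels
          provided := {label | input.review.object.metadata.labels[label]}
          missing := {label | label := required[_]; not provided[label]}
          count(missing) > 0
          msg := sprintf("missing required labels: %v", [missing])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: namespaces-must-have-owner
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]
EOF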