ArgoCD Setup and Basics#

ArgoCD is a declarative GitOps continuous delivery tool for Kubernetes. It watches Git repositories and ensures your cluster state matches what is declared in those repos. When a manifest changes in Git, or the live cluster drifts away from what Git declares, ArgoCD detects the difference and either flags the application as OutOfSync or applies the change automatically, depending on the sync policy.

Installation with Plain Manifests#

The fastest path to a running ArgoCD instance:

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

This installs the full ArgoCD stack: API server, repo server, application controller, Redis, and Dex (for SSO). For a headless install without the API server, UI, and SSO, use core-install.yaml instead. If you want ArgoCD restricted to a single namespace rather than cluster-wide, use namespace-install.yaml, which installs the same components with namespace-scoped RBAC.
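
A quick way to confirm the install came up and to reach the UI (on current ArgoCD versions the initial admin password is stored in the argocd-initial-admin-secret Secret):

kubectl get pods -n argocd                                   # all components should reach Running
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d                 # initial admin password
kubectl port-forward svc/argocd-server -n argocd 8080:443    # UI at https://localhost:8080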

ArgoCD Sync Waves, Resource Hooks, and Sync Options#

ArgoCD sync is not just “apply all manifests at once.” You can control the order resources are created, run pre- and post-deployment tasks, restrict when syncs can happen, and tune how resources are applied. This is where ArgoCD moves from basic GitOps to production-grade deployment orchestration.

Sync Waves#

Sync waves control the order in which resources are applied. Every resource has a wave number (default 0). Resources in lower waves are applied and must be healthy before higher waves begin.
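For example, a schema migration Job can be pushed into a later wave so it only runs after the core resources are applied and healthy. The resource names below are illustrative; the argocd.argoproj.io/sync-wave annotation is what matters:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  annotations:
    argocd.argoproj.io/sync-wave: "-1"   # applied before the default wave 0
data:
  LOG_LEVEL: info
---
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  annotations:
    argocd.argoproj.io/sync-wave: "2"    # waits until waves -1 and 0 are healthy
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: registry.example.com/app-migrations:1.2.3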

ArgoCD with Terraform and Crossplane#

Applications need infrastructure – databases, queues, caches, object storage, DNS records, certificates. In a GitOps workflow managed by ArgoCD, there are two approaches to provisioning that infrastructure: Crossplane (Kubernetes-native) and Terraform (external). Each has different strengths and integration patterns with ArgoCD.

Crossplane: Infrastructure as Kubernetes CRDs#

Crossplane extends Kubernetes with CRDs that represent cloud resources. An RDS instance becomes a YAML manifest. A GCS bucket becomes a YAML manifest. ArgoCD manages these manifests exactly like it manages Deployments and Services.
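As a sketch (assuming the Upbound AWS S3 provider is installed and a default ProviderConfig exists; the exact API group and fields differ between providers and versions), a bucket declared in Git and synced by ArgoCD might look like this:

apiVersion: s3.aws.upbound.io/v1beta1    # assumes Upbound provider-aws-s3; other providers use different groups
kind: Bucket
metadata:
  name: team-artifacts                   # hypothetical bucket name
  annotations:
    argocd.argoproj.io/sync-wave: "-1"   # provision the bucket before the workloads that use it
spec:
  forProvider:
    region: us-east-1
  providerConfigRef:
    name: default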

Building a Kubernetes Deployment Pipeline: From Code Push to Production#

A deployment pipeline connects a code commit to a running container in your cluster. This operational sequence walks through building one end-to-end, with decision points at each phase and verification steps to confirm the pipeline works before moving on.

Phase 1 – Source Control and CI#

Step 1: Repository Structure#

Every deployable service needs three things alongside its application code: a Dockerfile, deployment manifests, and a CI pipeline definition.
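A minimal layout that covers all three might look like this (paths and the CI system are illustrative; many teams keep manifests in a separate GitOps repo instead):

my-service/
├── src/                        # application code
├── Dockerfile                  # how the image is built
├── deploy/
│   ├── deployment.yaml         # Kubernetes manifests (or a Helm chart / Kustomize base)
│   └── service.yaml
└── .github/workflows/ci.yaml   # CI pipeline definition (GitHub Actions in this example)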

Change Management for Infrastructure

Why Change Management Matters#

Most production incidents trace back to a change. Code deployments, configuration updates, infrastructure modifications, database migrations – each introduces risk. Change management reduces that risk through structure, visibility, and accountability. The goal is not to prevent change but to make change safe, visible, and reversible.

Change Request Process#

Every infrastructure change flows through a structured request. The formality scales with risk, but the basic elements remain constant.
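A lightweight template makes those elements explicit. The field names below are illustrative and map onto whatever tracker you already use:

change: <one-line summary of what is changing and why>
risk: <low | medium | high>          # drives how much review and approval is required
window: <planned start and end time>
rollback: <the exact command or procedure that reverts the change>
verification: <how you will confirm the change worked>
approver: <who signs off before execution>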

Debugging ArgoCD#

Most ArgoCD problems fall into predictable categories: sync stuck in a bad state, resources showing OutOfSync when they should not be, health checks reporting wrong status, RBAC blocking operations, or repository connections failing. Here is how to diagnose and fix each one.
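Whatever the category, the first look is usually the same. Assuming the argocd CLI is logged in and the application is named my-app:

argocd app get my-app      # per-resource sync and health status
argocd app diff my-app     # what ArgoCD thinks differs between Git and the cluster
kubectl get applications -n argocd
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-application-controller --tail=100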

Application Stuck in Progressing#

An application stuck in Progressing means ArgoCD is waiting for a resource to become healthy and it never does. The most common causes:

Designing Internal Developer Platforms

What an Internal Developer Platform Actually Is#

An Internal Developer Platform (IDP) is the set of tools, workflows, and self-service capabilities that a platform team builds and maintains so application developers can ship code without filing tickets or waiting on other teams. It is not a single product. It is a curated layer on top of your existing infrastructure that abstracts complexity while preserving the ability to go deeper when needed.

From Empty Cluster to Production-Ready#

This is the definitive operational plan for taking a fresh Kubernetes cluster and making it production-ready. Each phase builds on the previous one, with verification steps between phases and rollback notes where applicable. An agent should be able to follow this sequence end-to-end.

Estimated timeline: 5 days for a single operator. Phases 1-2 are blocking prerequisites. Phases 3-6 can partially overlap.


Phase 1 – Foundation (Day 1)#

Everything else depends on a healthy cluster with proper namespacing and storage. Do not proceed until every verification step passes.
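A typical verification pass for this phase (adjust for your CNI and storage setup):

kubectl get nodes                  # every node should be Ready
kubectl get pods -n kube-system    # CNI, CoreDNS, and kube-proxy pods Running
kubectl get storageclass           # exactly one class marked (default)
kubectl get namespaces             # the workload namespaces you created exist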

Release Management Patterns#

Releasing software is more than merging to main and deploying. A disciplined release process ensures that every version is identifiable, every change is documented, every deployment is reversible, and failures are contained before they reach all users. This operational sequence walks through each phase of a production release workflow.

Phase 1 – Semantic Versioning#

Step 1: Adopt Semantic Versioning#

Semantic versioning (semver) communicates the impact of changes through the version number itself: MAJOR.MINOR.PATCH. Increment MAJOR for breaking changes, MINOR for backward-compatible features, and PATCH for backward-compatible fixes.
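In practice the version is usually pinned with an annotated Git tag, which CI then reuses as the image tag (the version number here is illustrative):

git tag -a v1.5.0 -m "Release 1.5.0"
git push origin v1.5.0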

Scenario: Recovering from a Failed Deployment#

You are helping when someone reports: “we deployed a new version and it is causing errors,” “pods are not starting,” or “the service is down after a deploy.” The goal is to restore service as quickly as possible, then prevent recurrence.

Time matters here. Every minute of diagnosis while the service is degraded is a minute of user impact. The bias should be toward fast rollback first, then root cause analysis second.
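
For a plain Deployment the fastest path back is usually a rollout undo (names below are placeholders). If ArgoCD manages the application with automated sync and self-heal, revert the commit in Git instead, or the controller will simply re-apply the broken version:

kubectl rollout undo deployment/my-service -n production     # back to the previous ReplicaSet
kubectl rollout status deployment/my-service -n production   # confirm the rollback finishes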