Structured Skill Definitions: Describing What Agents Can Do

Structured Skill Definitions#

When an agent has access to dozens of tools, it needs more than names and descriptions to use them well. It needs to know what inputs each tool expects, what outputs it produces, what other tools or infrastructure must be present, and how expensive or risky a call is. A structured skill definition captures all of this in a machine-readable format.

Why Not Just Use Function Signatures?#

Function signatures tell you the types of parameters. They do not tell you that a skill requires kubectl to be installed, takes 10-30 seconds to run, needs cluster-admin permissions, and might delete resources if called with the wrong flags. Agents making autonomous decisions need this information up front, not buried in documentation they may not read.
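
A minimal sketch of what such a definition could capture, written here as YAML, is shown below. The schema and field names are illustrative assumptions rather than any particular framework's format:

# Hypothetical skill definition -- the schema is illustrative, not a standard
name: restart_deployment
description: Roll the pods of a Kubernetes deployment to pick up new configuration
inputs:
  namespace: {type: string, required: true}
  deployment: {type: string, required: true}
outputs:
  status: {type: string, enum: [restarted, not_found, forbidden]}
requires:
  binaries: [kubectl]
  rbac: patch on deployments in the target namespace
cost:
  typical_latency_seconds: "10-30"
risk:
  side_effects: restarts running pods, briefly reducing serving capacity
  reversible: true

An agent reading this can verify prerequisites, budget the call, and weigh the risk before deciding to invoke the skill.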

Synthetic Monitoring: Proactive Uptime Checks, Blackbox Exporter, and External Probing

What Synthetic Monitoring Is#

Synthetic monitoring means actively probing your services on a schedule rather than waiting for users to report problems. Instead of relying on internal health checks or real user traffic to detect issues, you send controlled requests and measure the results. The fundamental question it answers is: “Is my service reachable and responding correctly right now?”

This is distinct from real user monitoring (RUM), which observes actual user interactions. Synthetic probes run 24/7 regardless of traffic volume, so they catch outages at 3 AM when no users are active. They provide consistent, repeatable measurements that are easy to alert on. The tradeoff is that synthetic probes test a narrow, predefined path – they do not capture the full range of user experience.
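
With the Blackbox Exporter, the usual pattern is that Prometheus passes each target to the exporter's /probe endpoint and the exporter performs the actual HTTP check. A sketch of the scrape configuration; the exporter address and target URL are placeholders:

scrape_configs:
  - job_name: blackbox-http
    metrics_path: /probe
    params:
      module: [http_2xx]            # module defined in the exporter's own config
    static_configs:
      - targets:
          - https://example.com/healthz
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115   # where the exporter actually runs

Alerting on the resulting probe_success and probe_duration_seconds metrics gives a direct "is it reachable and fast enough" signal.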

systemd Service Management: Units, Timers, Journal, and Socket Activation

Unit Types#

systemd manages the entire system through “units,” each representing a resource or service. The most common types:

  • service: Daemons and long-running processes (nginx, postgresql, your application).
  • timer: Scheduled execution, replacing cron. More flexible and better integrated with logging.
  • socket: Network sockets that trigger service activation on connection. Enables lazy startup and zero-downtime restarts.
  • target: Groups of units that represent system states (multi-user.target, graphical.target). Analogous to SysV runlevels.
  • mount: Filesystem mount points managed by systemd.
  • path: Watches filesystem paths and activates units when changes occur.

Unit files live in three locations, in order of precedence: /etc/systemd/system/ (local admin overrides), /run/systemd/system/ (runtime, non-persistent), and /usr/lib/systemd/system/ (package-installed defaults). Always put custom units in /etc/systemd/system/.
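
As a sketch, a custom service paired with a timer (a common cron replacement) might look like this; the unit names and script path are hypothetical:

# /etc/systemd/system/nightly-report.service
[Unit]
Description=Generate the nightly report

[Service]
Type=oneshot
ExecStart=/usr/local/bin/generate-report

# /etc/systemd/system/nightly-report.timer
[Unit]
Description=Run nightly-report.service once a day

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target

After systemctl daemon-reload, enable the schedule with systemctl enable --now nightly-report.timer; systemctl list-timers then shows the next scheduled run.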

Taints, Tolerations, and Node Affinity: Controlling Pod Placement

Taints, Tolerations, and Node Affinity#

Pod scheduling in Kubernetes defaults to “run anywhere there is room.” In production, that is rarely what you want. GPU workloads should land on GPU nodes. System components should not compete with application pods. Nodes being drained should stop accepting new work. Taints, tolerations, and node affinity give you control over where pods run and where they do not.

Taints: Repelling Pods from Nodes#

A taint is applied to a node and tells the scheduler “do not place pods here unless they explicitly tolerate this taint.” Taints have three parts: a key, an optional value, and an effect (NoSchedule, PreferNoSchedule, or NoExecute).
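
For example, a node can be tainted from the command line; the node name and key below are placeholders:

# Taint a node so only pods that tolerate gpu=true may schedule there
kubectl taint nodes gpu-node-1 gpu=true:NoSchedule

# Remove the taint later (note the trailing dash)
kubectl taint nodes gpu-node-1 gpu=true:NoSchedule-

A pod opts back in with a matching toleration in its spec:

tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"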

Terraform State Management Patterns

Why Remote State#

Terraform stores the mapping between your configuration and real infrastructure in a state file. By default this is a local terraform.tfstate file. That breaks the moment a second person or a CI pipeline needs to run terraform apply. Remote state solves three problems: team collaboration (everyone reads the same state), CI/CD access (pipelines need state without copying files), and disaster recovery (your laptop dying should not lose your infrastructure mapping).
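
A sketch of a remote backend using S3 with locking; the bucket, key, table, and region are placeholders, and other backends (GCS, azurerm, Terraform Cloud) follow the same pattern:

terraform {
  backend "s3" {
    bucket         = "example-terraform-state"
    key            = "prod/network/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"   # state locking to prevent concurrent applies
    encrypt        = true
  }
}

Running terraform init after adding the block offers to migrate the existing local state into the backend.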

TLS Deep Dive: Certificate Chains, Handshake, Cipher Suites, and Debugging Connection Issues

The TLS Handshake#

Every HTTPS connection starts with a TLS handshake that establishes encryption parameters and verifies the server’s identity. The simplified flow for TLS 1.2:

Client                              Server
  |── ClientHello ──────────────────>|   (supported versions, cipher suites, random)
  |<────────────────── ServerHello ──|   (chosen version, cipher suite, random)
  |<──────────────── Certificate  ──|   (server's certificate chain)
  |<───────────── ServerKeyExchange ─|   (key exchange parameters)
  |<───────────── ServerHelloDone  ──|
  |── ClientKeyExchange ───────────>|   (client's key exchange contribution)
  |── ChangeCipherSpec ────────────>|   (switching to encrypted communication)
  |── Finished ────────────────────>|   (encrypted verification)
  |<──────────── ChangeCipherSpec ──|
  |<──────────────────── Finished ──|
  |<═══════ Encrypted traffic ═════>|

TLS 1.3 simplifies this significantly. The client sends its key share in the ClientHello, allowing the handshake to complete in a single round trip. TLS 1.3 also removed insecure cipher suites and compression, eliminating entire classes of vulnerabilities (BEAST, CRIME, POODLE).
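
When a connection misbehaves, openssl s_client shows the certificate chain, the negotiated protocol, and the cipher suite directly; the hostname below is a placeholder:

# Inspect the chain and negotiated parameters (SNI set explicitly with -servername)
openssl s_client -connect example.com:443 -servername example.com -showcerts </dev/null

# Check which protocol versions the server still accepts
openssl s_client -connect example.com:443 -tls1_2 </dev/null
openssl s_client -connect example.com:443 -tls1_3 </dev/null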

Vertical Pod Autoscaler (VPA): Right-Sizing Resource Requests Automatically

Vertical Pod Autoscaler (VPA)#

Horizontal scaling adds more pod replicas. Vertical scaling gives each pod more (or fewer) resources. VPA automates the vertical side by watching actual CPU and memory usage over time and adjusting resource requests to match reality. Without it, teams guess at resource requests during initial deployment and rarely revisit them, leading to either waste (over-provisioned) or instability (under-provisioned).

What VPA Does#

VPA monitors historical and current resource usage for pods in a target Deployment (or StatefulSet, DaemonSet, etc.) and produces recommendations for CPU and memory requests. Depending on the configured mode, it either reports these recommendations passively or actively applies them by evicting and recreating pods with updated requests.
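
A minimal VPA object targeting a Deployment might look like the following, assuming the VPA components are installed in the cluster; the names are placeholders, and updateMode "Off" keeps it in recommendation-only mode:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # report recommendations only; "Auto" evicts pods to apply them

kubectl describe vpa my-app then shows the lower-bound, target, and upper-bound recommendations the recommender has computed.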

Writing Custom Prometheus Exporters: Exposing Application and Business Metrics

When to Write a Custom Exporter#

The Prometheus ecosystem has exporters for most infrastructure components: node_exporter for Linux hosts, kube-state-metrics for Kubernetes objects, mysqld_exporter for MySQL, and hundreds more. You write a custom exporter when your application or service does not have a Prometheus endpoint, you need business metrics that no generic exporter can provide (revenue, signups, queue depth), or you need to adapt a non-Prometheus system that exposes metrics in a proprietary format.
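
A minimal exporter sketch using the official Python client (prometheus_client); the metric name and fetch_queue_depth() are hypothetical stand-ins for whatever your application actually exposes:

import time

from prometheus_client import Gauge, start_http_server

QUEUE_DEPTH = Gauge("myapp_queue_depth", "Jobs currently waiting in the work queue")

def fetch_queue_depth() -> int:
    # Placeholder: query your application, database, or upstream API here.
    return 0

if __name__ == "__main__":
    start_http_server(9400)  # serves /metrics on port 9400
    while True:
        QUEUE_DEPTH.set(fetch_queue_depth())
        time.sleep(15)

Point a Prometheus scrape job at port 9400 and the gauge appears alongside the client library's built-in process and runtime metrics.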

Zero Trust Networking

The Core Principle#

Zero trust networking operates on a simple premise: no network location is inherently trusted. Being inside the corporate network, inside a VPC, or inside a Kubernetes cluster does not grant access to anything. Every request must be authenticated, authorized, and encrypted regardless of where it originates.

This is a departure from the traditional castle-and-moat model where a VPN places you “inside” the network and everything inside is implicitly trusted. That model fails because attackers who breach the perimeter have unrestricted lateral movement. Zero trust eliminates the concept of inside versus outside.

Minikube Add-ons for Production-Like Environments

Minikube Add-ons for Production-Like Environments#

A bare minikube cluster runs workloads but lacks the infrastructure that production clusters rely on – metrics collection, ingress routing, TLS, monitoring, and load balancer support. Minikube’s add-on system bridges this gap with one-command installs of production components.

Surveying Available Add-ons#

List everything minikube offers:

minikube addons list

This prints dozens of add-ons with their status. Most are disabled by default. The ones worth enabling depend on what you are testing, but a production-like setup typically needs five to seven of them.
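
Enabling them is one command each; for example, a typical baseline:

minikube addons enable metrics-server
minikube addons enable ingress
minikube addons enable dashboard
minikube addons enable registry

Running minikube addons list again shows them flipped to enabled.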