Grafana Loki for Log Aggregation

Loki Architecture#

Loki is a log aggregation system designed by Grafana Labs. Unlike Elasticsearch, Loki does not index log content. It indexes only metadata labels, then stores compressed log chunks in object storage. This makes it cheaper to operate and simpler to scale, at the cost of slower full-text search across massive datasets.

The core components are:

  • Distributor: Receives incoming log streams from agents, validates labels, and forwards to ingesters via consistent hashing.
  • Ingester: Buffers log data in memory, builds compressed chunks, and flushes them to long-term storage (S3, GCS, filesystem).
  • Querier: Executes LogQL queries by fetching chunk references from the index and reading chunk data from storage.
  • Compactor: Runs periodic compaction on the index (especially for boltdb-shipper) and handles retention enforcement by deleting old data.
  • Query Frontend (optional): Splits large queries into smaller ones, caches results, and distributes work across queriers.

Deployment Modes#

Loki supports three deployment modes, each suited to different scales.

Grafana Mimir for Long-Term Prometheus Storage

Grafana Mimir for Long-Term Prometheus Storage#

Prometheus stores metrics on local disk with a practical retention limit of weeks to a few months. Beyond that, you need a long-term storage solution. Grafana Mimir is a horizontally scalable, multi-tenant time series database designed for exactly this purpose. It is API-compatible with Prometheus – Grafana queries Mimir using the same PromQL, and Prometheus pushes data to Mimir via remote_write.

Mimir is the successor to Cortex. Grafana Labs forked Cortex, rewrote significant portions for performance, and released Mimir under the AGPLv3 license. If you see references to Cortex architecture, the concepts map directly to Mimir with improvements.

Grafana Organization: Folders, Permissions, Provisioning, and Dashboard Lifecycle

Folder Structure Strategy#

Grafana folders organize dashboards and control access through permissions. The folder structure you choose determines how teams find dashboards and who can edit them. Three patterns work in practice, each suited to a different organizational shape.

By Team#

When teams own distinct services and rarely need cross-team dashboards:

Platform/
  Node Overview
  Kubernetes Cluster
  Networking
Backend/
  API Gateway
  User Service
  Payment Service
Frontend/
  Web Vitals
  CDN Performance
Data/
  Kafka Pipelines
  ETL Jobs
  Data Quality

Each team gets Editor access to their folder and Viewer access to everything else. This works well when ownership boundaries are clear.

gRPC for Service-to-Service Communication

gRPC for Service-to-Service Communication#

gRPC is a high-performance RPC framework that uses HTTP/2 for transport and Protocol Buffers (protobuf) for serialization. For service-to-service communication within a microservices architecture, gRPC offers significant advantages over REST: strongly typed contracts, efficient binary serialization, streaming support, and code generation in every major language.

Why gRPC for Internal Services#

REST with JSON is the standard for public APIs. For internal service-to-service calls, gRPC is often the better choice.

gRPC Security: TLS, mTLS, Authentication Interceptors, and Token-Based Access Control

gRPC Security#

gRPC uses HTTP/2 as its transport, which means TLS is not just a security feature — it is a practical necessity. Many load balancers, proxies, and clients expect HTTP/2 over TLS (h2) rather than plaintext HTTP/2 (h2c). Securing gRPC means configuring TLS correctly, authenticating clients, authorizing RPCs, and handling the gRPC-specific gotchas that do not exist with REST APIs.

gRPC Over TLS#

Server-Side TLS in Go#

import (
    "crypto/tls"
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials"
)

func main() {
    cert, err := tls.LoadX509KeyPair("server-cert.pem", "server-key.pem")
    if err != nil {
        log.Fatal(err)
    }

    tlsConfig := &tls.Config{
        Certificates: []tls.Certificate{cert},
        MinVersion:   tls.VersionTLS13,
    }

    creds := credentials.NewTLS(tlsConfig)
    server := grpc.NewServer(grpc.Creds(creds))

    pb.RegisterMyServiceServer(server, &myService{})

    lis, _ := net.Listen("tcp", ":50051")
    server.Serve(lis)
}

Client-Side TLS in Go#

import (
    "crypto/x509"
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials"
)

func main() {
    // For public CAs (Let's Encrypt, etc.), use system cert pool
    creds := credentials.NewTLS(&tls.Config{
        MinVersion: tls.VersionTLS13,
    })

    // For internal CAs, load the CA cert explicitly
    caCert, _ := os.ReadFile("ca-cert.pem")
    certPool := x509.NewCertPool()
    certPool.AppendCertsFromPEM(caCert)
    creds = credentials.NewTLS(&tls.Config{
        RootCAs:    certPool,
        MinVersion: tls.VersionTLS13,
    })

    conn, err := grpc.NewClient("api.internal:50051",
        grpc.WithTransportCredentials(creds),
    )
    defer conn.Close()

    client := pb.NewMyServiceClient(conn)
}

TLS in Python#

import grpc

# Server
server_credentials = grpc.ssl_server_credentials(
    [(open("server-key.pem", "rb").read(), open("server-cert.pem", "rb").read())]
)
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
pb_grpc.add_MyServiceServicer_to_server(MyService(), server)
server.add_secure_port("[::]:50051", server_credentials)
server.start()

# Client
ca_cert = open("ca-cert.pem", "rb").read()
channel_credentials = grpc.ssl_channel_credentials(root_certificates=ca_cert)
channel = grpc.secure_channel("api.internal:50051", channel_credentials)
client = pb_grpc.MyServiceStub(channel)

Mutual TLS for gRPC#

mTLS is the strongest authentication model for service-to-service gRPC. Each service has a certificate, and both sides verify each other.

HAProxy Configuration and Operations

Configuration File Structure#

HAProxy uses a single configuration file, typically /etc/haproxy/haproxy.cfg. The configuration is divided into four sections: global (process-level settings), defaults (default values for all proxies), frontend (client-facing listeners), and backend (server pools).

global
    log /dev/log local0
    log /dev/log local1 notice
    maxconn 50000
    user haproxy
    group haproxy
    daemon
    stats socket /run/haproxy/admin.sock mode 660 level admin
    tune.ssl.default-dh-param 2048

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    timeout connect 5s
    timeout client  30s
    timeout server  30s
    timeout http-request 10s
    timeout http-keep-alive 10s
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http

The maxconn in the global section sets the hard limit on total simultaneous connections. The timeout values in defaults apply to all frontends and backends unless overridden. The stats socket directive enables the runtime API, which is essential for operational management.

HashiCorp Vault on Kubernetes: Secrets Management Done Right

HashiCorp Vault on Kubernetes#

Vault centralizes secret management with dynamic credentials, encryption as a service, and fine-grained access control. On Kubernetes, workloads authenticate using service accounts and pull secrets without hardcoding anything.

Installation with Helm#

helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update

Dev Mode (Single Pod, In-Memory)#

Automatically initialized and unsealed, stores everything in memory, loses all data on restart. Root token is root. Never use this in production.

helm upgrade --install vault hashicorp/vault \
  --namespace vault --create-namespace \
  --set server.dev.enabled=true \
  --set injector.enabled=true

Production Mode (HA with Integrated Raft Storage)#

Run Vault in HA mode with Raft consensus – a 3-node StatefulSet with persistent storage.

Helm Chart Development: Templates, Helpers, and Testing

Helm Chart Development#

Writing your own Helm charts turns static YAML into reusable, configurable packages. The learning curve is in Go’s template syntax and Helm’s conventions, but once you internalize the patterns, chart development is fast.

Chart Structure#

Create a new chart scaffold:

helm create my-app

This generates:

my-app/
  Chart.yaml              # chart metadata (name, version, dependencies)
  values.yaml             # default configuration values
  charts/                 # dependency charts (populated by helm dependency update)
  templates/              # Kubernetes manifest templates
    deployment.yaml
    service.yaml
    ingress.yaml
    serviceaccount.yaml
    hpa.yaml
    NOTES.txt             # post-install instructions (printed after helm install)
    _helpers.tpl           # named template definitions
    tests/
      test-connection.yaml # helm test pod

Chart.yaml#

The Chart.yaml defines your chart’s identity and dependencies:

Helm Values and Overrides: Precedence, Inspection, and Environment Patterns

Helm Values and Overrides#

Every Helm chart has a values.yaml file that defines defaults. When you install or upgrade a release, you override those defaults through values files (-f) and inline flags (--set). Getting the precedence wrong leads to silent misconfigurations where you think you set something but the chart used a different value.

Inspecting Chart Defaults#

Before overriding anything, look at what the chart provides. helm show values dumps the full default values.yaml for any chart:

How Agents Communicate: Explaining Plans, Risks, and Trade-offs to Humans

How Agents Communicate#

The most capable agent in the world is useless if the human does not understand what it is doing, why it is doing it, or what it needs. Poor communication is the single largest cause of failed agent-human collaboration — not poor reasoning, not incorrect code, but the human losing confidence because the agent did not explain itself well enough.

This article covers communication patterns that build trust: how to present plans, explain risks, frame trade-offs, report progress, and escalate problems. It is written for agents to follow and for humans to know what good agent communication looks like.