Linux Troubleshooting: A Systematic Approach to Diagnosing System Issues

The USE Method: A Framework for Systematic Diagnosis#

The USE method, developed by Brendan Gregg, provides a structured approach to system performance analysis. For every resource on the system – CPU, memory, disk, network – you check three things:

  • Utilization: How busy is the resource? (e.g., CPU at 90%)
  • Saturation: Is work queuing because the resource is overloaded? (e.g., CPU run queue length)
  • Errors: Are there error events? (e.g., disk I/O errors, network packet drops)

This method prevents the common trap of randomly checking things. Instead, you systematically walk through each resource and check all three dimensions. If you find high utilization, saturation, or errors on a resource, you have found your bottleneck.

Load Balancer Patterns: L4 vs L7, Health Checks, Session Affinity, and Cloud LB Selection

L4 vs L7 Load Balancing#

The distinction between Layer 4 and Layer 7 load balancing determines what the load balancer can see and what routing decisions it can make.

Layer 4 (Transport) load balancers work at the TCP/UDP level. They see source/destination IPs and ports but not the content of the traffic. They forward raw TCP connections to backends. This makes them fast (no protocol parsing), protocol-agnostic (works for HTTP, gRPC, database connections, custom protocols), and transparent (the backend sees the original packets, mostly). Use L4 for database connections, raw TCP services, and when you need maximum throughput with minimum latency.

Minikube with Docker Driver on Apple Silicon

Why the Docker Driver on ARM64#

When running Minikube on Apple Silicon (M1/M2/M3/M4), the driver you choose determines whether your containers run natively or through emulation. The Docker driver runs containers directly on the host architecture — ARM64 — with zero emulation overhead.

This matters because QEMU user-mode emulation, which kicks in when you try to run amd64 images on ARM64, cannot reliably execute Go binaries. The specific failure is a crash in lfstack.push, deep in Go’s runtime memory management. This is not a fixable application bug — it is a fundamental incompatibility between QEMU’s user-mode emulation and Go’s lock-free stack implementation.

SSH Hardening and Management: Key Management, Bastion Hosts, and SSH Certificates

SSH Key Management#

SSH keys replace password authentication with cryptographic key pairs. The choice of algorithm matters:

Ed25519 (recommended): Based on elliptic curve cryptography. Produces small keys (256 bits) that are faster and more secure than RSA. Supported by OpenSSH 6.5+ (2014) – virtually all modern systems.

ssh-keygen -t ed25519 -C "user@hostname"

RSA 4096 (legacy compatibility): Use only when connecting to systems that do not support Ed25519. Always use 4096 bits – the default 3072 is adequate but 4096 provides a safety margin.

systemd Service Management: Units, Timers, Journal, and Socket Activation

Unit Types#

systemd manages the entire system through “units,” each representing a resource or service. The most common types:

  • service: Daemons and long-running processes (nginx, postgresql, your application).
  • timer: Scheduled execution, replacing cron. More flexible and better integrated with logging.
  • socket: Network sockets that trigger service activation on connection. Enables lazy startup and zero-downtime restarts.
  • target: Groups of units that represent system states (multi-user.target, graphical.target). Analogous to SysV runlevels.
  • mount: Filesystem mount points managed by systemd.
  • path: Watches filesystem paths and activates units when changes occur.

Unit files live in three locations, in order of precedence: /etc/systemd/system/ (local admin overrides), /run/systemd/system/ (runtime, non-persistent), and /usr/lib/systemd/system/ (package-installed defaults). Always put custom units in /etc/systemd/system/.

Terraform State Management Patterns

Why Remote State#

Terraform stores the mapping between your configuration and real infrastructure in a state file. By default this is a local terraform.tfstate file. That breaks the moment a second person or a CI pipeline needs to run terraform apply. Remote state solves three problems: team collaboration (everyone reads the same state), CI/CD access (pipelines need state without copying files), and disaster recovery (your laptop dying should not lose your infrastructure mapping).

TLS Deep Dive: Certificate Chains, Handshake, Cipher Suites, and Debugging Connection Issues

The TLS Handshake#

Every HTTPS connection starts with a TLS handshake that establishes encryption parameters and verifies the server’s identity. The simplified flow for TLS 1.2:

Client                              Server
  |── ClientHello ──────────────────>|   (supported versions, cipher suites, random)
  |<────────────────── ServerHello ──|   (chosen version, cipher suite, random)
  |<──────────────── Certificate  ──|   (server's certificate chain)
  |<───────────── ServerKeyExchange ─|   (key exchange parameters)
  |<───────────── ServerHelloDone  ──|
  |── ClientKeyExchange ───────────>|   (client's key exchange contribution)
  |── ChangeCipherSpec ────────────>|   (switching to encrypted communication)
  |── Finished ────────────────────>|   (encrypted verification)
  |<──────────── ChangeCipherSpec ──|
  |<──────────────────── Finished ──|
  |<═══════ Encrypted traffic ═════>|

TLS 1.3 simplifies this significantly. The client sends its key share in the ClientHello, allowing the handshake to complete in a single round trip. TLS 1.3 also removed insecure cipher suites and compression, eliminating entire classes of vulnerabilities (BEAST, CRIME, POODLE).