API Gateway Patterns: Selection, Configuration, and Routing

API Gateway Patterns#

An API gateway sits between clients and your backend services. It handles cross-cutting concerns – authentication, rate limiting, request transformation, routing – so your services do not have to. Choosing the right gateway and configuring it correctly are among the first decisions in any microservices architecture.

Gateway Responsibilities#

Before selecting a gateway, clarify which responsibilities it should own:

  • Routing – directing requests to the correct backend service based on path, headers, or method.
  • Authentication and authorization – validating tokens, API keys, or certificates before requests reach backends.
  • Rate limiting – protecting backends from traffic spikes and enforcing usage quotas.
  • Request/response transformation – modifying headers, rewriting paths, converting between formats.
  • Load balancing – distributing traffic across service instances.
  • Observability – emitting metrics, logs, and traces for every request that passes through.
  • TLS termination – handling HTTPS so backends can speak plain HTTP internally.

No gateway does everything equally well. The right choice depends on which of these responsibilities matter most in your environment.
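
To make the division of labor concrete, here is a minimal TypeScript sketch of a gateway request pipeline covering three of the responsibilities above: authentication, routing, and request transformation. The backend hostnames, header names, and auth check are illustrative placeholders, not a recommendation for any particular gateway product.

// Minimal sketch of a gateway request pipeline.
// Backend hostnames, header names, and the auth check are illustrative only.

type Handler = (req: Request) => Promise<Response>;

// Authentication: reject anything without a bearer token before it reaches a backend.
function authenticate(next: Handler): Handler {
  return async (req) => {
    const auth = req.headers.get("Authorization") ?? "";
    if (!auth.startsWith("Bearer ")) {
      return new Response("Unauthorized", { status: 401 });
    }
    return next(req);
  };
}

// Routing + transformation: pick a backend by path prefix, strip the /api prefix,
// and forward the request with an extra internal header.
const route: Handler = async (req) => {
  const url = new URL(req.url);
  const backend = url.pathname.startsWith("/api/orders")
    ? "http://orders.internal:8080"
    : "http://catalog.internal:8080";

  const headers = new Headers(req.headers);
  headers.set("X-Gateway", "edge-1");

  return fetch(backend + url.pathname.replace(/^\/api/, "") + url.search, {
    method: req.method,
    headers,
    body: ["GET", "HEAD"].includes(req.method) ? undefined : await req.arrayBuffer(),
  });
};

// The composed pipeline: every request is authenticated, then routed.
export const gateway: Handler = authenticate(route);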

Building an API with Cloudflare Workers and D1: From Zero to Production

Building an API with Cloudflare Workers and D1#

This tutorial walks through building a production API on Cloudflare Workers with a D1 database, KV caching, rate limiting, full-text search, and request logging. The patterns come from a real production deployment – not a toy example.

By the end you will have: a TypeScript Worker handling multiple API routes, a D1 database with FTS5 full-text search, KV-based caching and rate limiting, CORS support, request logging with IP hashing for privacy, and a deployment to Cloudflare’s global network.
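
As a preview of the shape the Worker takes, here is a minimal sketch of the module-Worker entry point, assuming a D1 binding named DB and a KV binding named CACHE. The binding names, table schema, and route are illustrative, not the tutorial's final code.

// Sketch of the Worker entry point. Types come from @cloudflare/workers-types;
// the DB/CACHE binding names, table schema, and route are illustrative.
export interface Env {
  DB: D1Database;        // D1 binding declared in wrangler.toml
  CACHE: KVNamespace;    // KV binding used for caching and rate limiting
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const url = new URL(request.url);

    if (request.method === "GET" && url.pathname === "/api/posts") {
      // Serve from KV when possible; fall back to D1 on a cache miss.
      const cached = await env.CACHE.get("posts:recent", "json");
      if (cached) return Response.json(cached);

      const { results } = await env.DB
        .prepare("SELECT id, title, created_at FROM posts ORDER BY created_at DESC LIMIT 20")
        .all();

      // Populate the cache without blocking the response.
      ctx.waitUntil(
        env.CACHE.put("posts:recent", JSON.stringify(results), { expirationTtl: 60 }),
      );
      return Response.json(results);
    }

    return new Response("Not found", { status: 404 });
  },
};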

Nginx Configuration Patterns for Production

Configuration File Structure#

Nginx configuration follows a hierarchical block structure. The main configuration file is typically /etc/nginx/nginx.conf, which includes files from /etc/nginx/conf.d/ or /etc/nginx/sites-enabled/.

# /etc/nginx/nginx.conf
user nginx;
worker_processes auto;           # Match CPU core count
error_log /var/log/nginx/error.log warn;
pid /run/nginx.pid;

events {
    worker_connections 1024;     # Per worker process
    multi_accept on;             # Accept multiple connections at once
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent" '
                    '$request_time $upstream_response_time';

    access_log /var/log/nginx/access.log main;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    client_max_body_size 16m;

    include /etc/nginx/conf.d/*.conf;
}

The worker_processes auto directive sets one worker per CPU core. Each worker can hold up to worker_connections simultaneous connections, so the theoretical ceiling is worker_processes * worker_connections; keep in mind that when nginx proxies a request it holds both the client connection and an upstream connection. For most production deployments, auto with 1024 connections per worker is sufficient.
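
The per-service server blocks usually live in the files pulled in by the include /etc/nginx/conf.d/*.conf line. A minimal reverse-proxy example follows; the upstream addresses, hostname, and certificate paths are placeholders, not values from a real deployment.

# /etc/nginx/conf.d/api.conf
upstream api_backend {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
    keepalive 32;                        # Reuse upstream connections
}

server {
    listen 443 ssl;
    server_name api.example.com;

    ssl_certificate     /etc/nginx/tls/api.example.com.crt;
    ssl_certificate_key /etc/nginx/tls/api.example.com.key;

    location / {
        proxy_pass http://api_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";  # Required for upstream keepalive
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}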


Rate Limiting Implementation Patterns#

Rate limiting controls how many requests a client can make within a time period. It protects services from overload, ensures fair usage across clients, prevents abuse, and provides a mechanism for graceful degradation under load. Every production API needs rate limiting at some layer.

Algorithm Comparison#

Fixed Window#

The simplest algorithm. Divide time into fixed windows (e.g., 1-minute intervals) and count requests per window. When the count exceeds the limit, reject requests until the next window starts.
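
A minimal in-process sketch in TypeScript follows. In production the counters would live in a shared store such as Redis or KV so every instance sees the same counts; the names and numbers here are illustrative.

// Fixed-window counter: one counter per client per aligned time window.
const counters = new Map<string, number>();

function allowRequest(clientId: string, limit: number, windowMs: number): boolean {
  // Align the window to fixed boundaries, e.g. 12:00:00-12:01:00 for a 1-minute window.
  const windowStart = Math.floor(Date.now() / windowMs) * windowMs;
  const key = `${clientId}:${windowStart}`;

  const count = (counters.get(key) ?? 0) + 1;
  counters.set(key, count);

  // Keys for old windows should be evicted periodically; omitted here for brevity.
  return count <= limit;
}

// Example: allow 100 requests per client per minute.
allowRequest("client-42", 100, 60_000);

The known weakness of fixed windows is the boundary: a client that spends its full quota at the end of one window and again at the start of the next briefly achieves roughly double the intended rate, which is the problem sliding-window variants address.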


Scenario: Preparing for and Handling a Traffic Spike#

You are helping when someone says: “we have a big launch next week,” “Black Friday is coming,” or “traffic is suddenly 3x normal and climbing.” These are two distinct problems – proactive preparation for a known event and reactive response to an unexpected surge – but they share the same infrastructure mechanics.

The key principle: Kubernetes autoscaling has latency. HPA takes 15-30 seconds to detect increased load and scale pods. Cluster Autoscaler takes 3-7 minutes to provision new nodes. If your traffic spike is faster than your scaling speed, users hit errors during the gap. Proactive preparation eliminates this gap. Reactive response minimizes it.
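
For a known event, the standard move is to raise the autoscaler's floor before traffic arrives, so the detection and node-provisioning delays are paid in advance rather than during the spike. A sketch of a pre-scaled HPA; the deployment name, namespace, and replica counts are placeholders.

# Illustrative HPA pre-scaled for a launch window. Raise minReplicas ahead of the
# event, then lower it again afterwards. Names and numbers are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-api
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-api
  minReplicas: 20        # Normally 3; pre-scaled so capacity exists before traffic does
  maxReplicas: 60
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70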

Secure API Design: Authentication, Authorization, Input Validation, and OWASP API Top 10

Secure API Design#

Every API exposed to any network — public or internal — is an attack surface. The difference between a secure API and a vulnerable one is not exotic cryptography. It is consistent application of known patterns: authenticate every request, authorize every action, validate every input, and limit every resource.

Authentication Schemes#

API Keys#

The simplest scheme. The client sends a static key in a header:

GET /api/v1/data HTTP/1.1
Host: api.example.com
X-API-Key: sk_live_abc123def456
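
On the server side, validation is a lookup against stored keys. A minimal TypeScript sketch, assuming keys are stored as SHA-256 hashes so a database leak does not expose usable keys; the sk_ prefix check and the storage scheme are assumptions for this sketch.

// Validate an incoming X-API-Key header against hashed keys in storage.
import { createHash } from "node:crypto";

// SHA-256 digests of issued keys, loaded from your datastore in practice.
const storedKeyHashes = new Set<string>();

function isValidApiKey(headerValue: string | null): boolean {
  if (!headerValue || !headerValue.startsWith("sk_")) return false;
  const digest = createHash("sha256").update(headerValue).digest("hex");
  return storedKeyHashes.has(digest);
}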

API keys are appropriate for:

Securing Kubernetes Ingress: TLS, Rate Limiting, WAF, and Access Control

Securing Kubernetes Ingress#

The ingress controller is the front door to your cluster. Every request from the internet passes through it, making it both the most exposed component and the best place to enforce security controls. Most teams deploy an ingress controller and stop at basic routing. That leaves the door wide open.

TLS Termination and HTTPS Enforcement#

Every ingress should terminate TLS. Never serve production traffic over plain HTTP. With nginx-ingress, force HTTPS redirects and add HSTS headers:
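
A sketch of both pieces for the ingress-nginx controller follows; the hostname, TLS secret, backend service, and ConfigMap values are placeholders. Forcing the redirect is a per-Ingress annotation, while HSTS is configured globally through the controller's ConfigMap.

# Illustrative Ingress: terminate TLS and redirect all plain-HTTP requests to HTTPS.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: api-example-com-tls   # Provisioned by cert-manager or manually
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 8080
---
# HSTS headers are controlled in the ingress-nginx ConfigMap; values are illustrative.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  hsts: "true"
  hsts-max-age: "31536000"
  hsts-include-subdomains: "true"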