Load Balancing Algorithms
The Logic of Horizontal Scale
The "Traffic Cop" Problem
The goal of load balancing is to maximize throughput, minimize response time, and avoid overloading any single server while fully utilizing each resource in the pool. A naively configured load balancer will simply route to any available server, but the choice of algorithm has profound effects on how gracefully the system degrades under high load, how efficiently caches are used, and whether stateful application requirements are satisfied.
Advanced Algorithms: Weighted and Adaptive
Beyond the core three, production environments often require more nuanced routing logic:
- Weighted Round Robin: Assigns a "weight" to each backend based on its capacity. A server with 4 CPU cores gets twice the weight of a 2-core server. Simple to configure but static — cannot react to real-time load changes.
- Least Response Time: Routes to the backend with the lowest average response time in addition to fewest connections. Captures application-level performance (not just connection count), but requires continuous measurement by the LB probe.
- Random with Two Choices (Power of Two): Picks two random servers and routes to the one with fewer connections. Achieves near-optimal load distribution in O(1) time per decision — both NGINX and HAProxy offer it as an upstream selection strategy for large-scale deployments.
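The power-of-two-choices idea above can be sketched in a few lines. This is a minimal simulation, not a production balancer: the backend names and the fixed request count are illustrative, and real systems would decrement counts as connections close.

```python
import random

def pick_backend(backends, connections):
    """Power of two choices: sample two distinct backends at random,
    then route to whichever currently holds fewer active connections."""
    a, b = random.sample(backends, 2)
    return a if connections[a] <= connections[b] else b

# Simulate 10,000 request arrivals against four hypothetical backends.
backends = ["s1", "s2", "s3", "s4"]
connections = {s: 0 for s in backends}
for _ in range(10_000):
    connections[pick_backend(backends, connections)] += 1

print(connections)  # counts stay tightly clustered around 2,500 each
```

Compared to picking one server uniformly at random, sampling two and taking the less-loaded one shrinks the expected load imbalance dramatically, which is why the technique punches far above its implementation cost.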
Health Checks: The Pulse of the Pool
An algorithm is useless if it sends traffic to a dead server. Load balancers perform periodic Health Checks to validate backend availability before routing. The depth of the check determines both accuracy and overhead:
- L3/L4 Check: Can I ping the server? Is port 443 open? Detects network-level failures (server rebooted, firewall blocked) but misses application hangs where the port accepts connections but the app deadlocks.
- L7 (Active) Check: Does the `/health` endpoint return a `200 OK` with a valid JSON body? This detects application-level hangs even if the network stack is up — the most accurate but most expensive check to run frequently.
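An active L7 probe along these lines might look as follows. This is a hedged sketch using only the standard library; the `/health` URL, timeout, and "valid JSON" criterion mirror the description above, and a real load balancer would run this on a schedule per backend.

```python
import json
import urllib.error
import urllib.request

def l7_health_check(url, timeout=2.0):
    """Active L7 probe: the backend is UP only if it returns HTTP 200
    AND a parseable JSON body within the timeout. A server whose port
    accepts connections but whose app is deadlocked fails either way."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            json.loads(resp.read())  # body must be valid JSON
            return True
    except (urllib.error.URLError, json.JSONDecodeError, OSError):
        return False

# A connection-refused backend is reported DOWN rather than raising.
print(l7_health_check("http://127.0.0.1:1/health", timeout=0.5))
```

Note that the probe converts every failure mode (refused connection, timeout, non-200 status, malformed body) into a single DOWN verdict, which keeps the routing decision simple.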
Session Persistence (Sticky Sessions)
Many legacy applications store user data in local server RAM (e.g., PHP session files). If a user's first request goes to Server A and their second goes to Server B, their session data is missing and they appear 'logged out.'
We solve this with Session Affinity using two approaches: (1) Cookie-based — the LB injects a sticky cookie (SERVERID=A), ensuring all subsequent requests route to the same backend regardless of their source IP; or (2) Source IP hashing — deterministic but fragile when clients share NAT addresses. Modern architecture prefers Stateless Services where all session data is stored externally in a shared Redis cache, allowing any backend to serve any request with no affinity required.
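The source-IP hashing variant described above can be sketched deterministically. The backend names here are hypothetical; the point is that the same client address always maps to the same index, which is exactly why all clients behind one NAT address collapse onto a single server.

```python
import hashlib

def ip_hash_backend(client_ip, backends):
    """Source-IP affinity: hash the client address to a stable backend
    index. Deterministic (same IP, same backend on every request), but
    fragile when many clients share one NAT'd source address."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]

backends = ["app-1", "app-2", "app-3"]
# Repeated requests from one IP land on the same backend every time.
print(ip_hash_backend("203.0.113.7", backends))
print(ip_hash_backend("203.0.113.7", backends))
```

A cryptographic hash is overkill for routing but makes the distribution uniform; note also that adding or removing a backend reshuffles most assignments, which is why consistent hashing is often preferred when the pool changes frequently.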
Conclusion
Load balancing is the foundation of high availability and horizontal scalability. By choosing the right algorithm for your traffic pattern — Round Robin for stateless burst workloads, Least Connections for long-lived streams, and IP Hash for legacy stateful applications — and pairing it with accurate L7 health checks and slow-start ramp policies, you transform a fragile single point of failure into a resilient, auto-healing cluster that can absorb the failure of any individual node without user impact.