Load Balancing Algorithms
The Logic of Horizontal Scale
The "Traffic Cop" Problem
The goal of load balancing is to maximize throughput, minimize response time, and avoid overloading any single server while fully utilizing each resource in the pool. A naively configured load balancer will simply route to any available server, but the choice of algorithm has profound effects on how gracefully the system degrades under high load, how efficiently caches are used, and whether stateful application requirements are satisfied.
Advanced Algorithms: Weighted and Adaptive
Beyond the core three, production environments often require more nuanced routing logic:
- Weighted Round Robin: Assigns a "weight" to each backend based on its capacity. A server with 4 CPU cores gets twice the weight of a 2-core server. Simple to configure but static — cannot react to real-time load changes.
- Least Response Time: Routes to the backend with the lowest average response time in addition to fewest connections. Captures application-level performance (not just connection count), but requires continuous measurement by the LB probe.
- Random with Two Choices (Power of Two): Picks two random servers and routes to the one with fewer connections. Achieves near-optimal load distribution in O(1) time per decision — both NGINX and HAProxy offer it as an upstream selection strategy for large-scale deployments.
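The power-of-two-choices idea above can be sketched in a few lines. This is a minimal simulation, not a production balancer: the backend names and the fixed request count are illustrative, and real systems would decrement counts as connections close.

```python
import random

def pick_backend(backends, connections):
    """Power of two choices: sample two distinct backends at random,
    then route to whichever currently holds fewer active connections."""
    a, b = random.sample(backends, 2)
    return a if connections[a] <= connections[b] else b

# Simulate 10,000 request arrivals against four hypothetical backends.
backends = ["s1", "s2", "s3", "s4"]
connections = {s: 0 for s in backends}
for _ in range(10_000):
    connections[pick_backend(backends, connections)] += 1

print(connections)  # counts stay tightly clustered around 2,500 each
```

Compared to picking one server uniformly at random, sampling two and taking the less-loaded one shrinks the expected load imbalance dramatically, which is why the technique punches far above its implementation cost.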
Health Checks: The Pulse of the Pool
An algorithm is useless if it sends traffic to a dead server. Load balancers perform periodic Health Checks to validate backend availability before routing. The depth of the check determines both accuracy and overhead:
- L3/L4 Check: Can I ping the server? Is port 443 open? Detects network-level failures (server rebooted, firewall blocked) but misses application hangs where the port accepts connections but the app deadlocks.
- L7 (Active) Check: Does the `/health` endpoint return a `200 OK` with a valid JSON body? This detects application-level hangs even if the network stack is up — the most accurate but most expensive check to run frequently.
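An active L7 probe along these lines might look as follows. This is a hedged sketch using only the standard library; the `/health` URL, timeout, and "valid JSON" criterion mirror the description above, and a real load balancer would run this on a schedule per backend.

```python
import json
import urllib.error
import urllib.request

def l7_health_check(url, timeout=2.0):
    """Active L7 probe: the backend is UP only if it returns HTTP 200
    AND a parseable JSON body within the timeout. A server whose port
    accepts connections but whose app is deadlocked fails either way."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            json.loads(resp.read())  # body must be valid JSON
            return True
    except (urllib.error.URLError, json.JSONDecodeError, OSError):
        return False

# A connection-refused backend is reported DOWN rather than raising.
print(l7_health_check("http://127.0.0.1:1/health", timeout=0.5))
```

Note that the probe converts every failure mode (refused connection, timeout, non-200 status, malformed body) into a single DOWN verdict, which keeps the routing decision simple.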
Session Persistence (Sticky Sessions)
Many legacy applications store user data in local server RAM (e.g., PHP session files). If a user's first request goes to Server A and their second goes to Server B, their session data is missing and they appear 'logged out.'
We solve this with Session Affinity using two approaches: (1) Cookie-based — the LB injects a sticky cookie (SERVERID=A), ensuring all subsequent requests route to the same backend regardless of their source IP; or (2) Source IP hashing — deterministic but fragile when clients share NAT addresses. Modern architecture prefers Stateless Services where all session data is stored externally in a shared Redis cache, allowing any backend to serve any request with no affinity required.
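The source-IP hashing variant described above can be sketched deterministically. The backend names here are hypothetical; the point is that the same client address always maps to the same index, which is exactly why all clients behind one NAT address collapse onto a single server.

```python
import hashlib

def ip_hash_backend(client_ip, backends):
    """Source-IP affinity: hash the client address to a stable backend
    index. Deterministic (same IP, same backend on every request), but
    fragile when many clients share one NAT'd source address."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]

backends = ["app-1", "app-2", "app-3"]
# Repeated requests from one IP land on the same backend every time.
print(ip_hash_backend("203.0.113.7", backends))
print(ip_hash_backend("203.0.113.7", backends))
```

A cryptographic hash is overkill for routing but makes the distribution uniform; note also that adding or removing a backend reshuffles most assignments, which is why consistent hashing is often preferred when the pool changes frequently.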
Conclusion
Load balancing is the foundation of high availability and horizontal scalability. By choosing the right algorithm for your traffic pattern — Round Robin for stateless burst workloads, Least Connections for long-lived streams, and IP Hash for legacy stateful applications — and pairing it with accurate L7 health checks and slow-start ramp policies, you transform a fragile single point of failure into a resilient, auto-healing cluster that can absorb the failure of any individual node without user impact.