1. Layer 4 vs. Layer 7: Speed vs. Context
The first decision in traffic engineering is the layer at which routing decisions are made. **Layer 4 (L4)** operates at the transport layer and sees only addresses and ports, while **Layer 7 (L7)** parses the application payload.
The Performance Trade-off
Layer 4 (Speed)
Routes based on source/destination IP and port. Latency is extremely low (ASIC or DPDK speed) because the balancer never parses the application payload and can forward each packet as soon as its headers are read. Ideal for simple load distribution.
Layer 7 (Logic)
Routes based on URLs, headers, and cookies. This consumes more CPU, because the balancer must terminate the connection and parse the request, but it allows 'Smart Routing' (e.g., sending /api to the Go pool and /images to the S3 bucket).
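As a rough illustration of the difference, the sketch below contrasts the two decision inputs. The pool names, the `ROUTES` table, and both function names are hypothetical, and a real L4 balancer would do this in the data plane rather than in Python.

```python
import zlib

# Hypothetical pools; the prefixes mirror the /api and /images example above.
ROUTES = [
    ("/api", "go-pool"),
    ("/images", "s3-bucket"),
]
DEFAULT_POOL = "default-pool"


def route_l7(path: str) -> str:
    """L7: pick a pool by the longest matching URL path prefix."""
    best_prefix, best_pool = "", DEFAULT_POOL
    for prefix, pool in ROUTES:
        matches = path == prefix or path.startswith(prefix + "/")
        if matches and len(prefix) > len(best_prefix):
            best_prefix, best_pool = prefix, pool
    return best_pool


def route_l4(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
             n_backends: int) -> int:
    """L4: pick a backend index from addresses and ports only.

    No payload is inspected, so the decision can be made on the first packet.
    """
    flow = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(flow) % n_backends
```

The L7 function needs the parsed request path before it can decide, while the L4 function needs nothing beyond the packet headers.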
Load Distribution Engine
Round Robin sends an equal number of requests to each server over time. However, it ignores the actual load (active connections) on each server, which can lead to imbalance when some requests take longer to process than others.
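The imbalance is easy to reproduce in a toy simulation. The sketch below assumes one request arrives per time unit and alternates long and short requests; the `simulate` function and its parameters are illustrative, not taken from any particular load balancer.

```python
import heapq
from itertools import cycle


def simulate(policy: str, durations: list[int], n_servers: int) -> int:
    """Return the peak number of concurrent connections on any one server.

    Request i arrives at time i and holds its connection for durations[i].
    """
    active = [0] * n_servers
    finishing = []  # min-heap of (finish_time, server)
    rr = cycle(range(n_servers))
    peak = 0
    for now, duration in enumerate(durations):
        # Release connections that have completed by now.
        while finishing and finishing[0][0] <= now:
            _, server = heapq.heappop(finishing)
            active[server] -= 1
        if policy == "round_robin":
            server = next(rr)
        else:  # "least_connections"
            server = min(range(n_servers), key=lambda s: active[s])
        active[server] += 1
        heapq.heappush(finishing, (now + duration, server))
        peak = max(peak, active[server])
    return peak


# Alternating slow (10 units) and fast (1 unit) requests across 2 servers:
# Round Robin sends every slow request to the same server.
workload = [10, 1] * 20
rr_peak = simulate("round_robin", workload, 2)
lc_peak = simulate("least_connections", workload, 2)
```

With this workload both policies send 20 requests to each server, yet Round Robin stacks all the slow ones on one machine, so its peak concurrency is higher.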
2. Consistent Hashing: Protecting the Cache
In a standard IP Hash (Source IP % N Servers), adding a single server changes 'N', which re-maps almost every client to a new server. This destroys cache affinity. **Consistent Hashing** (popularized by implementations such as Ketama) solves this.
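To see how severe the re-mapping is, the sketch below counts how many synthetic client IPs change bucket when a modulo hash grows from 10 to 11 servers. The helper names and the choice of MD5 as the hash are assumptions made for the illustration.

```python
import hashlib


def bucket(key: str, n_servers: int) -> int:
    """Plain modulo placement: hash the key, then take it modulo N."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_servers


def remapped_fraction(keys: list[str], old_n: int, new_n: int) -> float:
    """Fraction of keys whose server changes when N goes from old_n to new_n."""
    moved = sum(1 for k in keys if bucket(k, old_n) != bucket(k, new_n))
    return moved / len(keys)


# Synthetic client addresses, for illustration only.
clients = [f"10.0.{i // 256}.{i % 256}" for i in range(10_000)]
fraction = remapped_fraction(clients, 10, 11)
```

For a uniform hash, a key keeps its bucket only when `h % 10 == h % 11`, which happens for about 1 key in 11, so roughly 10/11 of clients move.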
The Ring Equation
Servers and request keys are hashed onto the same ring (for example, a 160-bit hash space). Each key belongs to the first server found clockwise from its position. When a server is removed, only the keys that belonged to that server are reassigned, to its next clockwise neighbor. With N evenly spread servers, this disrupts only about 1/N of connections.
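A minimal ring can be sketched as follows. This is an illustration rather than Ketama itself: it assumes SHA-1 for the 160-bit space and adds virtual nodes (`vnodes`, a hypothetical parameter) so that each server owns many small arcs instead of one large one.

```python
import bisect
import hashlib


class HashRing:
    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(value: str) -> int:
        # SHA-1 yields a 160-bit integer position on the ring.
        return int.from_bytes(hashlib.sha1(value.encode()).digest(), "big")

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            point = self._hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key: str) -> str:
        # First point clockwise from the key, wrapping past the largest point.
        idx = bisect.bisect_right(self._points, self._hash(key))
        return self._owner[self._points[idx % len(self._points)]]


ring = HashRing([f"cache-{i}" for i in range(5)])
keys = [f"user-{i}" for i in range(2000)]
before = {k: ring.lookup(k) for k in keys}
ring.remove("cache-3")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` was previously owned by `cache-3`; keys on the other four servers keep their placement.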
P2c: Power of Two Choices
In massive clusters, comparing the load of 1,000 servers for every request is too slow. P2c picks 2 servers at random and sends the request to the less loaded of the two. This achieves nearly the same balance as 'Least Connections' but with constant-time computation.
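The selection step fits in a few lines. The sketch below compares it with purely random placement; the function names and the use of a per-server counter as the load signal are assumptions for the example.

```python
import random


def pick_p2c(active: list[int], rng: random.Random) -> int:
    """Sample two distinct servers and return the one with fewer connections."""
    a, b = rng.sample(range(len(active)), 2)
    return a if active[a] <= active[b] else b


def pick_random(active: list[int], rng: random.Random) -> int:
    """Baseline: a single uniformly random server."""
    return rng.randrange(len(active))


def max_load(n_servers: int, n_requests: int, chooser, seed: int = 0) -> int:
    """Place n_requests long-lived connections and return the busiest server's count."""
    rng = random.Random(seed)
    active = [0] * n_servers
    for _ in range(n_requests):
        active[chooser(active, rng)] += 1
    return max(active)


p2c_max = max_load(100, 10_000, pick_p2c)
random_max = max_load(100, 10_000, pick_random)
```

Each decision reads only two counters regardless of cluster size, which is where the constant-time property comes from.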
3. GSLB & Anycast: Global Traffic Steering
How does a user in London get a different server than a user in Tokyo? We use **GSLB** (Global Server Load Balancing) and **Anycast BGP**.
The TTL War: DNS Steering
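GSLB is essentially a smart DNS server. It returns the 'nearest' IP based on the source IP of the query, which is usually the user's recursive resolver rather than the user, unless the resolver forwards EDNS Client Subnet information. The challenge is TTL (Time To Live). The TTL must already be low (60s or less) before a data center dies, because lowering it afterwards does nothing for resolvers that have cached the old record; with a long TTL, users stay 'stuck' to a dead site until the cached record expires.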
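A back-of-the-envelope bound makes the TTL trade-off concrete. The sketch below assumes the GSLB marks a site dead after a fixed number of failed health checks; the function and parameter names are hypothetical.

```python
def worst_case_stale_seconds(ttl_s: int, check_interval_s: int,
                             failures_to_trip: int) -> int:
    """Upper bound on how long a client may keep resolving to a dead site.

    Assumes the record was cached just before the GSLB updated its answer,
    and that resolvers honor the advertised TTL.
    """
    detection_s = check_interval_s * failures_to_trip
    return detection_s + ttl_s


short_ttl = worst_case_stale_seconds(ttl_s=60, check_interval_s=10, failures_to_trip=3)
long_ttl = worst_case_stale_seconds(ttl_s=3600, check_interval_s=10, failures_to_trip=3)
```

Under these assumptions a 60s TTL bounds the outage window at 90 seconds, while a one-hour TTL stretches it past an hour. Resolvers that clamp very low TTLs would widen the window further.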
BGP Anycast Paradox:
Anycast advertises the same IP from multiple locations. The network (BGP) naturally sends users to the 'closest' node, where closest means the best BGP path, not necessarily the shortest geographic distance. However, Anycast is blind to application health. If the 'closest' node is on fire, BGP will still send you there until the route is withdrawn.
4. Adaptive Balancing: EWMA & Gray Failures
A server that is 'up' but slow is more dangerous than a server that is 'down.' We use **EWMA (Exponentially Weighted Moving Average)** to detect these 'Gray Failures.'
The Latency Tracker
By giving more weight to the most recent responses (controlled by a smoothing factor α), the load balancer can detect that a server is starting to slow down within a few responses and 'Soft Drain' its traffic before a formal health check fails.
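The tracker itself is a one-line update per response. In the sketch below, `alpha` is the smoothing factor (an assumed name and default), and `soft_drain_weight` is a hypothetical policy that scales a server's traffic share down as its smoothed latency rises above a baseline.

```python
class EwmaLatency:
    """Exponentially weighted moving average of response latency."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha  # weight given to the newest sample, 0 < alpha <= 1
        self.value = None   # no samples observed yet

    def observe(self, latency_ms: float) -> float:
        if self.value is None:
            self.value = latency_ms
        else:
            self.value = self.alpha * latency_ms + (1 - self.alpha) * self.value
        return self.value


def soft_drain_weight(ewma_ms: float, baseline_ms: float) -> float:
    """Traffic weight in (0, 1]: full share at or below baseline, less above it."""
    return min(1.0, baseline_ms / ewma_ms)


tracker = EwmaLatency(alpha=0.5)
samples = [tracker.observe(x) for x in (10, 20, 20)]
```

A larger `alpha` reacts faster to a slowdown but is noisier; a smaller one smooths out spikes at the cost of slower detection.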
Maglev: Google's Consistent Hash Table at Global Scale
Google's Maglev (NSDI 2016) is a software-based load balancer that processes 1 Gbps per core with a connection lookup table that uses consistent hashing. Unlike hardware LBs that rely on TCAM-based flow tables limited to a few hundred thousand entries, Maglev uses a Consistent Hash Table (CHT) that maps the 5-tuple of each connection to one of the backend servers. The CHT is a lookup table of size M (a prime number), where each entry points to one backend. When a backend is added or removed, the CHT is recomputed and the affected entries (approximately M/N for N backends) are updated.
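The table construction described in the Maglev paper can be sketched as follows: each backend derives an offset and a skip from two hashes of its name, which define its preferred order over the M slots, and backends take turns claiming their next free slot. The hash function (`_h`, built on SHA-256), the table size 251, and the backend names are assumptions chosen for a small runnable example; production tables are far larger.

```python
import hashlib


def _h(name: str, salt: str) -> int:
    """Stand-in for the two independent hash functions (an assumption)."""
    digest = hashlib.sha256(f"{salt}:{name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_table(backends: list[str], m: int = 251) -> list[str]:
    """Populate a lookup table of prime size m; backends must be non-empty."""
    offsets = [_h(b, "offset") % m for b in backends]
    skips = [_h(b, "skip") % (m - 1) + 1 for b in backends]
    next_idx = [0] * len(backends)  # position in each backend's permutation
    table = [None] * m
    filled = 0
    while True:
        for i, backend in enumerate(backends):
            # Walk this backend's permutation to its next unclaimed slot.
            # Because m is prime, the walk visits every slot exactly once.
            slot = (offsets[i] + next_idx[i] * skips[i]) % m
            while table[slot] is not None:
                next_idx[i] += 1
                slot = (offsets[i] + next_idx[i] * skips[i]) % m
            table[slot] = backend
            next_idx[i] += 1
            filled += 1
            if filled == m:
                return table


full = build_table(["b0", "b1", "b2", "b3", "b4"])
degraded = build_table(["b0", "b1", "b3", "b4"])  # b2 removed
changed = sum(1 for old, new in zip(full, degraded) if old != new)
```

Taking turns keeps the shares within one slot of each other. Removing a backend changes at least the slots it owned, plus a smaller number of slots that shift between survivors.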
The key performance constraint is the per-packet budget: a core receiving 10 Mpps of 100+ byte packets must classify and forward each packet in under 100 ns. Maglev achieves this by (1) hashing the 5-tuple using a CRC32c hardware instruction (12 ns), (2) looking up the CHT entry via array indexing (3 ns), and (3) forwarding the packet to the selected backend (the Maglev paper describes GRE encapsulation for this step). The total per-packet processing cost is 50-80 ns, well under the 100 ns budget. The CHT must be updated within 10 ms of a backend failure to prevent new connections from being assigned to the dead backend.
Maglev fills the table with its own hashing scheme rather than a clockwise ring walk: each backend derives a permutation of the M table slots from two hashes of its name (an offset and a skip), and the backends take turns claiming their next preferred empty slot until the table is full. This gives near-equal shares and limited disruption. When a backend fails, the entries it owned are redistributed among the survivors and most other entries keep their owner, so connections to surviving backends are largely uninterrupted. A per-machine connection tracking table preserves affinity for established flows while the table is rebuilt.
Direct Server Return: The Asymmetric Path Optimization
Direct Server Return (DSR), also known as Triangular Routing, eliminates the load balancer as a bottleneck for return traffic. In the standard proxy (NAT) model, the client sends a request to the VIP, the load balancer rewrites the destination IP to the backend server's address, the backend processes the request, and the response must flow back through the load balancer (which then rewrites the source IP back to the VIP). This creates a bottleneck: the LB must process both inbound and outbound traffic, doubling its throughput requirement when responses are as large as requests. In DSR, the LB leaves the destination IP (the VIP) intact and changes only the destination MAC or adds a tunnel header, and the backend sends the response directly to the client, bypassing the LB entirely. The path is asymmetric: the request goes client → LB → server, and the response goes server → client.
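To put rough numbers on the two paths described above, the sketch below computes how much traffic the load balancer itself must forward in each model. The function name and the traffic figures are illustrative assumptions.

```python
def lb_load_gbps(request_gbps: float, response_gbps: float, dsr: bool) -> float:
    """Traffic the load balancer must forward, in Gbps.

    Proxy model: requests and responses both cross the LB.
    DSR model: only requests cross the LB; responses go server -> client.
    """
    return request_gbps if dsr else request_gbps + response_gbps


# Symmetric traffic: DSR halves the load on the LB.
symmetric_proxy = lb_load_gbps(50, 50, dsr=False)
symmetric_dsr = lb_load_gbps(50, 50, dsr=True)

# Response-heavy traffic (small requests, large downloads): the saving is larger.
heavy_proxy = lb_load_gbps(10, 90, dsr=False)
heavy_dsr = lb_load_gbps(10, 90, dsr=True)
```

The saving depends on the request-to-response ratio: the more response-heavy the workload, the more DSR removes from the LB.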
When responses and requests are the same size, this halves the LB throughput requirement: a 100 Gbps LB can devote its full capacity to inbound traffic instead of splitting it 50/50 with responses. Because responses are usually larger than requests, the saving is often greater. The implementation requires that the backend server configure the VIP on a loopback interface, so that it accepts packets addressed to the VIP and replies with the VIP as the source IP. The backend must also relax reverse path filtering so that packets arriving via the LB are not dropped. In Linux, this is done by setting reverse path filtering to loose mode and adding the VIP to the loopback interface; in layer 2 DSR the backend must additionally be kept from answering ARP for the VIP.
DSR is a common configuration for L4 LBs in 2026 (AWS NLB, Google's Maglev, Azure ILB) because it reduces hardware cost and removes the LB from the response data path. The trade-off is that the LB never sees response traffic, so it cannot buffer or replay a TCP stream: if a backend fails, in-flight response packets are lost. The client must handle the retry, or a dual-LB configuration can be used where a secondary LB monitors for server failure and injects RST packets on behalf of the failed server.