1. Layer 4 vs. Layer 7: Speed vs. Context
The first decision in traffic engineering is the layer at which routing decisions are made. **Layer 4 (L4)** operates at the transport layer and sees only connection metadata, while **Layer 7 (L7)** understands the application payload.
The Performance Trade-off
Layer 4 (Speed)
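Routes on source/destination IP address and port only. Latency is extremely low (forwarding can run at ASIC or DPDK speed) because the balancer never has to parse the application payload or buffer a full request before choosing a backend. Ideal for simple load distribution.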
Layer 7 (Logic)
Routes on URLs, headers, and cookies. This costs more CPU, because the balancer must terminate the connection and parse each request, but it allows 'Smart Routing' (e.g., sending /api to the Go pool and /images to the S3 bucket).
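As a rough illustration of the difference, the sketch below contrasts the two decision points. The pool names, the default pool, and the use of CRC32 as the tuple hash are assumptions made for this example, not details of any particular load balancer.

```python
import zlib

# Hypothetical pools; the prefixes mirror the /api and /images example above.
L7_ROUTES = [("/api", "go-pool"), ("/images", "s3-bucket")]
DEFAULT_POOL = "default-pool"


def pick_l4(src_ip: str, src_port: int, backends: list[str]) -> str:
    """L4: only the connection tuple is visible, so hash it onto a backend."""
    key = f"{src_ip}:{src_port}".encode()
    return backends[zlib.crc32(key) % len(backends)]


def pick_l7(path: str) -> str:
    """L7: the request has been parsed, so the URL path can drive the choice."""
    for prefix, pool in L7_ROUTES:
        if path == prefix or path.startswith(prefix + "/"):
            return pool
    return DEFAULT_POOL
```

The L4 function never looks at the request itself, which is why it can run before a single byte of payload is parsed; the L7 function cannot run until the path is known.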
Load Distribution Engine
Round Robin sends an equal number of requests to each server over time. It does so blindly, however, without considering the actual load (active connections) on each server, which can lead to imbalance when some requests take longer to process than others.
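The imbalance is easy to reproduce. The following minimal sketch assumes two servers and a made-up stream of alternating slow and fast requests; it also shows a simple least-connections pick for comparison.

```python
from collections import Counter
from itertools import cycle


def round_robin_totals(servers, durations):
    """Assign each request to the next server in turn; return (counts, work)."""
    turn = cycle(servers)
    counts, work = Counter(), Counter()
    for duration in durations:
        server = next(turn)
        counts[server] += 1
        work[server] += duration
    return counts, work


def least_connections(active):
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


# Alternating slow and fast requests (arbitrary units, illustrative only).
counts, work = round_robin_totals(["a", "b"], [10, 1, 10, 1, 10, 1])
print(counts["a"], counts["b"])  # 3 3
print(work["a"], work["b"])      # 30 3
```

Both servers receive three requests, yet server "a" ends up with ten times the work, which is exactly the case where a load-aware pick such as `least_connections` helps.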
2. Consistent Hashing: Protecting the Cache
In a standard IP hash (hash(source IP) mod N, where N is the number of servers), adding a single server changes N, which remaps almost every client to a different server. This destroys cache affinity. **Consistent Hashing** (Ketama is one widely used implementation) solves this.
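To see how severe the remapping is, the sketch below measures the fraction of clients that change bucket when a modulo hash grows from 4 to 5 servers. The client keys and the choice of MD5 as a stable hash are assumptions for the example.

```python
import hashlib


def bucket(key: str, n: int) -> int:
    """Stable hash of the key, reduced modulo the server count."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n


def moved_fraction(keys, old_n, new_n):
    """Fraction of keys whose bucket differs between the two server counts."""
    moved = sum(1 for k in keys if bucket(k, old_n) != bucket(k, new_n))
    return moved / len(keys)


# Synthetic client identifiers (documentation address range).
clients = [f"198.51.100.{i % 256}:{i}" for i in range(10_000)]
print(moved_fraction(clients, 4, 5))
```

For a uniform hash, a key keeps its bucket only when its value gives the same remainder modulo 4 and modulo 5, which holds for 4 of every 20 values, so about 80% of clients are expected to move.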
The Ring Equation
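Servers and request keys are hashed onto the same 160-bit ring (160 bits is the size of a SHA-1 digest; other implementations use smaller rings). Each key belongs to the first server found clockwise from its position. When a server is removed, only the keys that belonged to that specific server are reassigned, each to its next clockwise neighbor. On average, this disrupts only about 1/N of the keys, where N is the number of servers.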
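A minimal ring can be sketched with a sorted list and binary search. This is an illustrative sketch rather than Ketama itself: the 32-bit MD5-derived positions, the `HashRing` class, and the 100 virtual nodes per server are all assumptions chosen to keep the example short.

```python
import bisect
import hashlib


def point(label: str) -> int:
    """Map a label to a 32-bit ring position (MD5 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:4], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self.points = []  # sorted (position, server) pairs
        for server in servers:
            self.add(server)

    def add(self, server):
        # Several virtual nodes per server smooth out the key distribution.
        for i in range(self.vnodes):
            bisect.insort(self.points, (point(f"{server}#{i}"), server))

    def remove(self, server):
        self.points = [p for p in self.points if p[1] != server]

    def lookup(self, key):
        """Return the owner of the first point clockwise from the key."""
        if not self.points:
            raise LookupError("ring is empty")
        i = bisect.bisect_left(self.points, (point(key), ""))
        return self.points[i % len(self.points)][1]  # wrap past the end


ring = HashRing(["s1", "s2", "s3", "s4"])
keys = [f"user-{i}" for i in range(2000)]
before = {k: ring.lookup(k) for k in keys}
ring.remove("s2")
moved = [k for k in keys if ring.lookup(k) != before[k]]
```

After the removal, every key in `moved` was previously owned by "s2"; keys owned by the other three servers keep their assignment, so their caches stay warm.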
P2C: Power of Two Choices
In massive clusters, comparing the load of 1,000 servers on every request is too slow. P2C picks two servers at random and sends the request to the less loaded of the two. This comes close to the balance of 'Least Connections' while doing only constant-time work per request.
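The selection step can be sketched in a few lines. The simulation helper below is a simplification added for illustration: requests never complete, so "load" is just the number of requests assigned so far, and the server and request counts are arbitrary.

```python
import random


def p2c_pick(loads, rng=random):
    """Return the index of the less loaded of two randomly sampled servers."""
    i, j = rng.sample(range(len(loads)), 2)
    return i if loads[i] <= loads[j] else j


def random_pick(loads, rng=random):
    """Baseline: pick one server uniformly at random."""
    return rng.randrange(len(loads))


def max_load(n_servers, n_requests, picker, seed=0):
    """Assign requests that never finish; report the busiest server's count."""
    rng = random.Random(seed)
    loads = [0] * n_servers
    for _ in range(n_requests):
        loads[picker(loads, rng)] += 1
    return max(loads)


print(max_load(100, 10_000, p2c_pick), max_load(100, 10_000, random_pick))
```

Each pick inspects exactly two entries no matter how many servers exist, and in this kind of simulation the busiest server under P2C typically ends up much closer to the average than under a purely random pick.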
3. GSLB & Anycast: Global Traffic Steering
How does a user in London get a different server than a user in Tokyo? We use **GSLB** (Global Server Load Balancing) and **Anycast BGP**.
The TTL War: DNS Steering
GSLB is essentially a smart DNS server. It returns the 'nearest' IP based on the source IP of the query. The challenge is TTL (Time To Live). The TTL must already be short (60s or less) before a data center dies, because resolvers keep a cached answer until its original TTL expires. Lowering the TTL after the failure does nothing for records that are already cached, and users stay 'stuck' to the dead site.
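The steering logic can be sketched as a lookup that skips unhealthy sites. Everything named here is hypothetical: the site table, the region preference order, the addresses (taken from documentation ranges), and the simplified failover bound, which ignores resolvers that do not honor TTLs.

```python
TTL_SECONDS = 60  # short TTL configured ahead of any failure

# Hypothetical sites using documentation address ranges.
SITES = {
    "london": {"ip": "192.0.2.10", "healthy": True},
    "tokyo": {"ip": "198.51.100.10", "healthy": True},
}
# Hypothetical preference order per client region, nearest first.
NEAREST = {"eu": ["london", "tokyo"], "apac": ["tokyo", "london"]}


def resolve(client_region, sites):
    """Return (ip, ttl) for the nearest healthy site."""
    for name in NEAREST[client_region]:
        if sites[name]["healthy"]:
            return sites[name]["ip"], TTL_SECONDS
    raise LookupError("no healthy site available")


def worst_case_failover(detect_seconds, ttl_seconds):
    """Simplified bound: a client cached the old answer just before detection."""
    return detect_seconds + ttl_seconds
```

Under these assumptions, a 10-second health-check detection time with a 60-second TTL leaves some users on the dead site for up to 70 seconds, while a 1-hour TTL stretches that to roughly an hour.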
BGP Anycast Paradox:
Anycast advertises the same IP from multiple locations, and the network (BGP) naturally sends users to the 'closest' node, where closest means the best route rather than the shortest geographic distance. However, Anycast is blind to application health. If the 'closest' node is on fire, BGP will still send you there until the route is withdrawn.
4. Adaptive Balancing: EWMA & Gray Failures
A server that is 'up' but slow is more dangerous than a server that is 'down.' We use **EWMA (Exponentially Weighted Moving Average)** to detect these 'Gray Failures.'
The Latency Tracker
By giving more weight to the most recent responses (controlled by a smoothing factor α), the load balancer can detect that a server is starting to slow down within a handful of responses and 'Soft Drain' its traffic before a formal health check fails.
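The standard EWMA update is new = α · sample + (1 − α) · old, where a larger α reacts faster to recent samples. The sketch below applies it to response latency; the value α = 0.3, the class name, and the `drain_weight` rule are assumptions for illustration, not a description of any specific load balancer.

```python
class EwmaLatency:
    """Tracks latency with new = alpha * sample + (1 - alpha) * old."""

    def __init__(self, alpha=0.3):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = None  # no samples observed yet

    def observe(self, latency_ms):
        if self.value is None:
            self.value = latency_ms  # seed with the first sample
        else:
            self.value = self.alpha * latency_ms + (1 - self.alpha) * self.value
        return self.value


def drain_weight(ewma_ms, baseline_ms):
    """Hypothetical soft-drain rule: shrink the weight as latency exceeds baseline."""
    if ewma_ms <= baseline_ms:
        return 1.0
    return baseline_ms / ewma_ms
```

With a steady 10 ms history, a single 100 ms response moves the average to about 37 ms, and under this drain rule the server's share of traffic would drop to roughly a quarter even though it would still pass a basic up/down health check.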