TCP Congestion Control
The Math Behind the Internet's Speed
1. The Sliding Window: Flow Control
Before managing network traffic, TCP must manage the individual connection. The Receiver Window (rwnd) advertises how much data the receiver's buffer can absorb at once. If the receiver has a small buffer, the sender must slow down, regardless of how fast the network is.
2. The Congestion Phases
A TCP session involves a constant "feeling out" of the network:
- Slow Start: Double the congestion window every RTT (1, 2, 4, 8...).
- Congestion Avoidance: Once a threshold (ssthresh) is hit, grow linearly (+1 packet per RTT).
- Fast Recovery: On packet loss, halve the window and resume linear growth.
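The three phases above can be sketched as a toy simulation. This is an illustrative model, not real kernel code; the RTT count, threshold, and loss timing are hypothetical parameters chosen to show the shape of the curve.

```python
# Toy model of a Reno-style cwnd evolution (illustrative parameters).
# Tracks Slow Start, Congestion Avoidance, and the halving on loss.

def simulate_reno(rtts=20, ssthresh=16, loss_at=(10,)):
    cwnd = 1.0  # congestion window, in packets (MSS units)
    history = []
    for rtt in range(rtts):
        if rtt in loss_at:
            # Fast Recovery: halve the window and resume linear growth
            ssthresh = max(cwnd / 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd *= 2   # Slow Start: exponential growth (1, 2, 4, 8...)
        else:
            cwnd += 1   # Congestion Avoidance: +1 packet per RTT
        history.append(cwnd)
    return history

print(simulate_reno())
```

Plotting the returned list produces the classic "sawtooth": exponential ramp, linear climb, a 50% cut at the loss event, then linear climb again.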
3. The Evolution of Congestion Algorithms
TCP has evolved significantly since its inception. Understanding the history of these algorithms is crucial for modern network tuning.
TCP Tahoe & Reno (The Classics)
TCP Tahoe (1988): Introduced Slow Start, Congestion Avoidance, and Fast Retransmit. If a packet was lost, the window size (CWND) effectively crashed to 1 MSS (Maximum Segment Size), forcing a slow restart.
TCP Reno (1990): Improved upon Tahoe with Fast Recovery. Instead of resetting to 1 MSS on packet loss, it halved the CWND and entered a linear growth phase. This kept the pipe "fuller" during loss events.
TCP CUBIC (The Standard)
CUBIC became the default in Linux kernels because it solves the "RTT unfairness" of Reno. Reno grows its window based on RTT (shorter RTT = faster growth). CUBIC uses a cubic function of time since the last congestion event, making window growth independent of RTT.
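The cubic function mentioned above comes from RFC 8312. A minimal sketch, using the constants the RFC recommends (C = 0.4, beta = 0.7); the real implementation adds TCP-friendliness checks omitted here.

```python
# Sketch of the CUBIC window growth function (RFC 8312).
# t is the time in seconds since the last congestion event.

C = 0.4      # aggressiveness constant
BETA = 0.7   # multiplicative decrease factor (window shrinks to 0.7 * W_max)

def cubic_window(t, w_max):
    """Target cwnd t seconds after a loss that occurred at window w_max."""
    # K: the time at which the window climbs back to w_max
    k = ((w_max * (1 - BETA)) / C) ** (1 / 3)
    return C * (t - k) ** 3 + w_max
```

The curve is concave while recovering toward w_max, then convex while probing beyond it, and it depends only on wall-clock time t, not on RTT. That is exactly why CUBIC avoids Reno's RTT unfairness.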
The Mathis Equation
This formula approximates the maximum throughput of a TCP Reno connection based on the packet loss rate (p):

Throughput ≈ (MSS / RTT) × (C / √p)

where MSS is the maximum segment size and C is a constant (roughly 1.22 for periodic loss).

Implication: As latency (RTT) increases, throughput drops in inverse proportion. As packet loss (p) increases, throughput drops with the inverse square root. A small amount of loss on a long link is catastrophic.
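Plugging numbers into the formula makes the implication concrete. The segment size, RTT, and loss rate below are illustrative values, not measurements.

```python
# Mathis-equation sketch: estimate Reno throughput under loss.
from math import sqrt

def mathis_throughput(mss_bytes, rtt_s, loss_rate, c=1.22):
    """Max throughput in bits/s: (MSS / RTT) * (C / sqrt(p))."""
    return (mss_bytes * 8 / rtt_s) * (c / sqrt(loss_rate))

# 1460-byte segments, 100 ms RTT, 0.1% loss -- a long, slightly lossy link:
bps = mathis_throughput(1460, 0.100, 0.001)
print(f"{bps / 1e6:.1f} Mbit/s")
```

Even with only 0.1% loss, a 100 ms path is capped at a few megabits per second, no matter how fat the physical link is.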
4. Google's BBR: Breaking the Rules
BBR (Bottleneck Bandwidth and RTT) discards the "loss = congestion" assumption entirely.
Traditional TCP fills buffers until they overflow (drop packets). BBR instead models the network pipe to find:
- 1. BtlBw: The bottleneck bandwidth (how fast the slowest link is).
- 2. RTprop: The round-trip propagation time (latency without queueing).
By pacing packets exactly at the BtlBw rate, BBR prevents queues from forming in the first place, solving Bufferbloat.
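The arithmetic behind this is simple: the product of the two estimates is the bandwidth-delay product (BDP), the amount of data the pipe holds with no queue. A minimal sketch (illustrative, not the real BBR state machine):

```python
# Sketch of BBR's core targets from its two estimates.

def bbr_targets(btlbw_bps, rtprop_s, pacing_gain=1.0):
    """Return (pacing_rate_bps, inflight_cap_bytes)."""
    bdp_bytes = (btlbw_bps / 8) * rtprop_s   # bandwidth-delay product
    pacing_rate = btlbw_bps * pacing_gain    # send at the bottleneck rate
    return pacing_rate, bdp_bytes

# 100 Mbit/s bottleneck, 20 ms propagation delay:
rate, cap = bbr_targets(100e6, 0.020)
# cap is about 250,000 bytes: holding inflight at the BDP fills the pipe
# without building a standing queue.
```

(The real algorithm periodically varies pacing_gain to probe for more bandwidth and to drain any queue it created; that cycling is omitted here.)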
5. ECN: Explicit Congestion Notification
Traditional TCP relies on packet drops to signal congestion. This is a binary signal: either everything is fine, or the network is full. Explicit Congestion Notification (ECN), defined in RFC 3168, allows routers to mark the IP header with the Congestion Experienced (CE) codepoint before they are forced to drop data.
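The ECN field is the low two bits of the IP header's Traffic Class/ToS byte (RFC 3168). The codepoints, plus a sketch of the marking decision a router makes (the helper function is illustrative, not a real router's code):

```python
# ECN codepoints: the low two bits of the IP ToS / Traffic Class byte.
NOT_ECT = 0b00   # endpoint does not support ECN
ECT_1   = 0b01   # ECN-Capable Transport (1) -- the codepoint L4S uses
ECT_0   = 0b10   # ECN-Capable Transport (0)
CE      = 0b11   # Congestion Experienced: set by a router instead of dropping

def mark_ce(tos_byte):
    """Under congestion, rewrite ECT to CE; leave the DSCP bits intact."""
    if tos_byte & 0b11 in (ECT_0, ECT_1):
        return tos_byte | CE
    return tos_byte  # Not-ECT packets must be dropped, not marked
```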
From ECN to L4S
While standard ECN helps, it still triggers a massive rate reduction (halving the window). This causes "sawtooth" patterns in throughput. L4S (Low Latency, Low Loss, Scalable Throughput) is the evolution.
- 1. Dual Queue Coupled AQM: Modern routers separate L4S traffic into its own very-low-latency queue, coupled to the classic queue so neither starves.
- 2. Scalable Marking: L4S uses a much higher marking frequency to allow tiny, smooth adjustments.
5.1. The Prague Requirements
For L4S to work, the endpoint must follow the TCP Prague requirements. This is a collection of best practices for low-latency congestion control:
- Reduced RTT Dependence: The algorithm should not favor short-path connections over long ones.
- Accurate ECN: Uses TCP header flags and options (AccECN) to feed back a count of how many packets were CE-marked, not just a binary "yes/no" signal.
- Pacing: Must use packet pacing to avoid "bursty" behavior that overwhelms small router buffers.
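The "scalable" response that these requirements enable follows the DCTCP pattern: instead of halving on any mark, the sender cuts its window in proportion to the fraction of marked packets. A minimal sketch with illustrative constants:

```python
# DCTCP-style scalable response, the pattern TCP Prague builds on.

G = 1 / 16  # EWMA gain for smoothing the marking estimate (illustrative)

def update_alpha(alpha, marked, total):
    """Smooth the observed CE-marking fraction over one window of data."""
    frac = marked / total if total else 0.0
    return (1 - G) * alpha + G * frac

def on_congestion_round(cwnd, alpha):
    """Reduce cwnd by alpha/2 -- a tiny cut when marking is light."""
    return cwnd * (1 - alpha / 2)
```

With frequent, light marking, each adjustment is a few percent rather than 50%, which is what flattens the sawtooth into a smooth line.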
6. Protocol Fairness: The BBR vs. CUBIC War
When different algorithms share the same link, "fairness" becomes an issue.
- Loss-based (CUBIC) grows until the buffer is full.
- Model-based (BBR) stops before the buffer is full.
In an unmanaged queue (DropTail), BBR tends to "bully" CUBIC. Because BBR keeps the queue slightly full to probe for bandwidth, CUBIC sees this as constant high latency and backs off, while BBR continues to consume the link. This has led to the development of BBRv2 and BBRv3, which are more "polite" to loss-based traffic.
7. Beyond TCP: The QUIC Paradigm Shift
QUIC (RFC 9000), the foundation of HTTP/3, moves congestion control out of the kernel and into userspace.
QUIC Innovation Highlights
- No Head-of-Line Blocking: In TCP, if one packet is lost, all streams stop. In QUIC, only the affected stream waits; others continue. This prevents a single drop from ruining a complex webpage load.
- Connection Migration: QUIC uses a Connection ID instead of the IP:Port tuple. You can move from 5G to Wi-Fi without dropping your download session.
Conclusion
Congestion control is the primary reason the internet survived its transition from megabits to terabits. As we move toward 6G and satellite links (Starlink) with highly variable latency, the mathematics of BBR and other advanced controllers will become even more critical to the user experience and overall system MTBF.