In a Nutshell

TCP is designed to be 'polite'—it probes the network for available capacity and backs off when it encounters congestion. However, different algorithms use different triggers. This article compares traditional loss-based algorithms like CUBIC with Google's newer model-based BBR (Bottleneck Bandwidth and Round-trip propagation time).

The Congestion Window (CWND)

Regardless of the algorithm, TCP uses a Congestion Window to limit how many packets can be 'in flight'—sent but not yet acknowledged.

$AllowedData = \min(RWND, CWND)$
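A minimal sketch of this limit in Python (the variable names and figures are illustrative):

```python
def allowed_in_flight(rwnd: int, cwnd: int) -> int:
    """The sender may keep at most min(RWND, CWND) bytes unacknowledged."""
    return min(rwnd, cwnd)

# Receiver advertises 64 KB, but congestion control caps us at 20 KB:
print(allowed_in_flight(65_535, 20_000))  # 20000
```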

TCP Window Scaling

Stop-and-Wait vs. Sliding Window Pipeline

[Interactive demo: client/server sliding-window animation showing 940 Mbps throughput over a 120 ms RTT path]

Pipeline (Sliding Window): The server fills the "pipe" with packets. It doesn't wait for ACKs to send the next packet. As long as the window is open, data flows continuously.

The Receive Window (RWND) is the receiver's buffer advertisement: the amount of data it is currently willing to accept. In the original 1981 TCP specification (RFC 793), this field was limited to 16 bits, capping the window at 65,535 bytes (64 KB). On modern high-speed networks, that cap is a massive bottleneck.

1. Window Scaling Math ($2^n$)

To overcome the 64 KB limit, RFC 1323 introduced Window Scaling. It uses a bit shift count in the TCP options during the 3-way handshake.

$EffectiveWindow = AdvertisedWindow \times 2^{ScaleFactor}$

With a maximum Scale Factor of 14, the TCP window can grow up to 1 GB ($65,535 \times 2^{14} = 1,073,725,440$ bytes). This is essential for filling the Bandwidth-Delay Product (BDP) on long-haul fiber links.
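The scaling math, and the BDP it is meant to fill, can be checked with a few lines of Python (the link figures are illustrative):

```python
def effective_window(advertised: int, scale: int) -> int:
    """Effective window = advertised window << scale factor (RFC 1323)."""
    assert 0 <= scale <= 14, "the scale factor is capped at 14"
    return advertised << scale

def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-Delay Product: bits in flight on the path, in bytes."""
    return bandwidth_bps * rtt_s / 8

# Maximum scaled window: 65,535 * 2**14
print(effective_window(65_535, 14))    # 1073725440
# A 10 Gbit/s path with 120 ms RTT needs ~150 MB in flight to stay full
print(round(bdp_bytes(10e9, 0.120)))
```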

The difference between congestion algorithms lies in how they grow and shrink the Congestion Window (CWND).

2. CUBIC: Growing until Failure

CUBIC is a loss-based algorithm and the current default in Linux. It increases the window according to a cubic function of the time since the last congestion event ($t$).

$W(t) = C \cdot (t - K)^3 + W_{max}$
  • $W_{max}$: The window size before the last reduction.
  • $K$: The time period it takes to reach $W_{max}$ again.
  • $C$: A scaling constant (typically 0.4).
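A sketch of the curve in Python, assuming the RFC 8312 convention that the window is cut to $\beta \cdot W_{max}$ (with $\beta = 0.7$) after a loss, which fixes $K = \sqrt[3]{W_{max}(1-\beta)/C}$:

```python
def cubic_window(t: float, w_max: float, c: float = 0.4, beta: float = 0.7) -> float:
    """W(t) = C*(t - K)^3 + W_max, with K chosen so that W(K) == W_max."""
    k = (w_max * (1 - beta) / c) ** (1 / 3)
    return c * (t - k) ** 3 + w_max

w_max = 100.0  # segments at the last loss event
# Immediately after the loss, the window restarts from beta * W_max:
print(round(cubic_window(0.0, w_max), 1))   # 70.0
# K = (100 * 0.3 / 0.4)^(1/3) seconds later, it is back at W_max:
k = (w_max * 0.3 / 0.4) ** (1 / 3)
print(round(cubic_window(k, w_max), 1))     # 100.0
```

The concave-then-convex shape is the point: growth is fast when far from $W_{max}$, cautious near it, then aggressive again when probing beyond it.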

3. BBR: Bottleneck Bandwidth & RTT

Developed by Google, BBR is model-based. Unlike CUBIC, which reacts to loss, BBR attempts to find the Kleinrock optimal operating point where the delivery rate is maximized and the round-trip time is minimized.

The BDP Control Loop

$TargetValue = BW_{max} \times RTT_{min} \times Gain$

BBR maintains a moving windowed max of delivery rate and a moving windowed min of RTT. It regulates the pacing rate and the congestion window to match the physical capacity of the pipe.

BBR iterates through four states to maintain its model:

  • Startup: Exponentially increases pacing rate until delivery rate plateaus (similar to Slow Start).
  • Drain: Lowers the rate to drain any queues built up during Startup.
  • ProbeBW: Where BBR spends most of its steady-state time. It cycles its pacing gain (e.g., 1.25, 0.75, 1.0) to probe for extra bandwidth and then drain any queue the probe created.
  • ProbeRTT: Every 10 seconds, it reduces the window to just 4 packets to measure the true physical $RTT_{min}$.
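The control loop above can be sketched numerically (the link figures are illustrative):

```python
def bbr_target_inflight(bw_max_bps: float, rtt_min_s: float, gain: float = 1.0) -> float:
    """Target in-flight bytes = windowed-max bandwidth * windowed-min RTT * gain."""
    return bw_max_bps / 8 * rtt_min_s * gain

# 1 Gbit/s bottleneck with a 40 ms floor RTT: BDP = 5 MB
print(round(bbr_target_inflight(1e9, 0.040)))   # 5000000
# ProbeBW raises the gain to probe, then lowers it to drain the probe's queue
for gain in (1.25, 0.75, 1.0):
    print(gain, round(bbr_target_inflight(1e9, 0.040, gain)))
```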

4. High-Speed Tuning: SACK, TS, and Autotuning

Beyond congestion algorithms, several TCP options are essential for modern high-bandwidth connections.

4.1. Selective Acknowledgments (SACK)

Without SACK (RFC 2018), if one packet in a window is lost, the sender might have to retransmit the entire window. SACK allows the receiver to say "I got packets 1, 2, 4, and 5, but I missed 3," enabling precise retransmission.
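The example from that paragraph, sketched in Python with illustrative sequence numbers:

```python
def missing_segments(sent, sacked, cum_ack):
    """Segments above the cumulative ACK that no SACK block covers:
    with SACK, only these holes need to be retransmitted."""
    return [seq for seq in sent if seq > cum_ack and seq not in sacked]

# Receiver holds 1, 2, 4, 5: cumulative ACK covers through 2, SACK covers {4, 5}
print(missing_segments(range(1, 6), {4, 5}, cum_ack=2))  # [3]
```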

4.2. TCP Timestamps (PAWS)

At gigabit speeds, the 32-bit TCP sequence space can wrap around in a matter of seconds. Timestamps (RFC 1323) enable Protection Against Wrapped Sequence numbers (PAWS), ensuring that stale segments left over from a previous sequence wrap are not accepted as new data.
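A sketch of the PAWS comparison using 32-bit modular (serial-number) arithmetic; this illustrates the idea, not the kernel's actual implementation:

```python
def ts_newer(ts_val: int, ts_recent: int) -> bool:
    """32-bit timestamps compared modulo 2**32: a segment whose timestamp
    is older than the most recent accepted one is treated as stale."""
    return ((ts_val - ts_recent) & 0xFFFFFFFF) < 0x80000000

print(ts_newer(1000, 999))        # True: in order
print(ts_newer(5, 0xFFFFFFF0))    # True: the timestamp clock wrapped
print(ts_newer(999, 1000))        # False: stale segment, reject
```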

Linux Autotuning

Modern Linux kernels use TCP Autotuning. They dynamically adjust tcp_rmem and tcp_wmem based on available system memory and connection BDP. Manual tuning is rarely needed unless you are dealing with extreme high-performance compute (HPC) environments.
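The autotuning limits are exposed as (min, default, max) triples under `/proc/sys/net/ipv4/`; a minimal parser sketch (the sample values are illustrative, real defaults vary by kernel version and system memory):

```python
def parse_mem_triple(raw: str):
    """Parse a tcp_rmem/tcp_wmem sysctl triple: (min, default, max) in bytes."""
    lo, default, hi = (int(v) for v in raw.split())
    return (lo, default, hi)

# e.g. the contents of /proc/sys/net/ipv4/tcp_rmem on one example system:
print(parse_mem_triple("4096 131072 6291456"))  # (4096, 131072, 6291456)
```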

ECN Ready

Explicit Congestion Notification (ECN) allows routers to mark packets instead of dropping them. When the sender's congestion control responds to these marks, congestion can be signaled without packet loss, reducing retransmission overhead.

Optimization is not a one-size-fits-all process. Understanding the physical constraints of your path—latency, jitter, and buffer depth—is the first step toward effective transport layer engineering.

Choosing the right congestion algorithm is a critical step in Reliability Engineering for long-distance data centers.
