The Signal Before the Stall

In a high-speed AI fabric, congestion is inevitable. When thousands of GPUs exchange gradients at the same time (All-Reduce), switches see large bursts of traffic. **Explicit Congestion Notification (ECN)** is the protocol mechanism that tells senders to slow down before these bursts fill switch buffers entirely, a condition often called "buffer bloat" that causes large latency spikes.

WRED Detection

Switches use Weighted Random Early Detection (WRED) to detect buffer growth. Instead of dropping an ECN-capable packet, the switch sets the two ECN bits in the IP header to binary '11' (CE, Congestion Experienced).
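As a rough illustration of the marking step, the sketch below treats the ECN field as the two least-significant bits of the IPv4 TOS (or IPv6 Traffic Class) byte, using the codepoints defined in RFC 3168. The helper names `ecn_codepoint` and `mark_congestion` are hypothetical and only model the bit manipulation, not a real switch pipeline.

```python
# ECN codepoints from RFC 3168: the two least-significant bits of the
# IPv4 TOS byte (or IPv6 Traffic Class byte).
NOT_ECT = 0b00  # sender did not declare ECN support
ECT_1 = 0b01    # ECN-capable transport, codepoint 1
ECT_0 = 0b10    # ECN-capable transport, codepoint 0
CE = 0b11       # Congestion Experienced

ECN_MASK = 0b11


def ecn_codepoint(tos: int) -> int:
    """Return the 2-bit ECN field of a TOS / Traffic Class byte."""
    return tos & ECN_MASK


def mark_congestion(tos: int) -> int:
    """Hypothetical helper: set CE on an ECN-capable packet.

    A Not-ECT packet is returned unchanged, because a switch cannot
    signal congestion to it by marking and would have to drop instead.
    The upper six bits (DSCP) are left untouched.
    """
    if ecn_codepoint(tos) == NOT_ECT:
        return tos
    return tos | CE
```

For example, a packet with DSCP 26 and ECT(0) keeps its DSCP value after marking, and only the ECN field changes to CE.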

CNP Notification

When a receiver sees a CE-marked packet, it sends a **Congestion Notification Packet (CNP)** back to the source NIC, requesting an immediate rate reduction.

The DCQCN Algorithm Cycle

For RoCE v2 fabrics, the de facto standard is **DCQCN** (Data Center Quantized Congestion Notification). It operates in a 4-step recurring cycle:

01

Marking Threshold (Kmin/Kmax)

The switch starts marking ECN bits once the queue depth exceeds Kmin. The marking probability rises with queue depth up to Kmax, above which every ECN-capable packet is marked.
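The marking curve can be sketched as a piecewise function of queue depth. This is a minimal model under stated assumptions: the parameter `pmax` (the probability reached at Kmax) and the jump to 1.0 above Kmax follow the RED-style profile described in the DCQCN paper, and the function name `marking_probability` is hypothetical.

```python
def marking_probability(queue_bytes: int, kmin: int, kmax: int, pmax: float) -> float:
    """Probability that an arriving ECN-capable packet is CE-marked.

    Assumed RED-style profile:
      - at or below kmin: never mark
      - between kmin and kmax: rise linearly from 0 to pmax
      - above kmax: always mark
    """
    if not 0 <= kmin < kmax:
        raise ValueError("expected 0 <= kmin < kmax")
    if not 0.0 <= pmax <= 1.0:
        raise ValueError("pmax must be a probability")
    if queue_bytes <= kmin:
        return 0.0
    if queue_bytes > kmax:
        return 1.0
    return pmax * (queue_bytes - kmin) / (kmax - kmin)
```

With illustrative thresholds of 200 KB and 800 KB and `pmax` of 0.1, a 500 KB queue sits halfway up the ramp, so roughly one packet in twenty would be marked.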

02

Receiver Feedback

The Receiver (Notification Point) generates CNPs when it encounters CE-marked packets in the RDMA stream.
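In practice the receiver does not answer every marked packet with a CNP. The sketch below assumes the pacing rule described in the DCQCN paper, at most one CNP per flow per fixed interval (50 µs there). The class name `NotificationPoint`, the flow identifiers, and the microsecond clock are all hypothetical modeling choices.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable

CNP_INTERVAL_US = 50.0  # assumed pacing interval; NICs expose this as a tunable


@dataclass
class NotificationPoint:
    """Receiver-side CNP pacing: at most one CNP per flow per interval."""

    interval_us: float = CNP_INTERVAL_US
    _last_cnp_us: Dict[Hashable, float] = field(default_factory=dict)

    def on_packet(self, flow_id: Hashable, ce_marked: bool, now_us: float) -> bool:
        """Return True when a CNP should be sent back to this flow's source."""
        if not ce_marked:
            return False
        last = self._last_cnp_us.get(flow_id)
        if last is not None and now_us - last < self.interval_us:
            return False  # a CNP for this flow was sent too recently
        self._last_cnp_us[flow_id] = now_us
        return True
```

Pacing per flow keeps the feedback volume bounded during a heavy burst while still notifying every congested sender.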

03

Rate Reduction

The Source (Reaction Point) processes each CNP and immediately cuts its transmission rate so the congested queue can drain before the buffer overflows.
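The rate cut can be sketched with the update rules from the DCQCN paper: on each CNP the sender remembers its current rate as the target, multiplies the current rate by (1 - alpha/2), and nudges the congestion estimate alpha upward. The class name `ReactionPoint`, the Gbps units, and the weight `G = 1/256` are illustrative assumptions, and real NIC firmware may differ.

```python
from dataclasses import dataclass

G = 1 / 256  # assumed weight for the alpha moving average


@dataclass
class ReactionPoint:
    """Sender-side state for one flow (rates in Gbps)."""

    current_rate_gbps: float
    target_rate_gbps: float
    alpha: float = 1.0  # congestion estimate in [0, 1]

    def on_cnp(self) -> None:
        """Multiplicative decrease triggered by a CNP."""
        self.target_rate_gbps = self.current_rate_gbps  # remember where we were
        self.current_rate_gbps *= 1 - self.alpha / 2    # cut by up to half
        self.alpha = (1 - G) * self.alpha + G           # congestion seen: raise alpha

    def on_alpha_timer(self) -> None:
        """Called when an alpha-update period passes with no CNP."""
        self.alpha = (1 - G) * self.alpha               # no congestion: decay alpha
```

Because alpha decays while the path is quiet, a flow that only rarely receives CNPs is cut less sharply than one that is marked continuously.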

04

Rate Recovery

If no CNPs arrive within a set time window, the source ramps back up in increments toward full line rate.
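A simplified sketch of the ramp-up follows. It assumes the two early phases described in the DCQCN paper: fast recovery, which halves the gap to the pre-cut target rate each step, followed by additive increase, which raises the target itself. The real algorithm also uses a byte counter and a hyper-increase phase, both omitted here. The constants and the class name `RateRecovery` are illustrative.

```python
from dataclasses import dataclass

FAST_RECOVERY_STEPS = 5  # assumed number of fast-recovery steps
RATE_AI_GBPS = 0.04      # assumed additive-increase step (40 Mbps)


@dataclass
class RateRecovery:
    """Timer-driven rate recovery for one flow (rates in Gbps)."""

    line_rate_gbps: float
    current_rate_gbps: float
    target_rate_gbps: float
    steps_since_cnp: int = 0

    def on_cnp(self) -> None:
        """A CNP restarts the recovery sequence."""
        self.steps_since_cnp = 0

    def on_quiet_interval(self) -> float:
        """Advance one step after a timer period with no CNP."""
        self.steps_since_cnp += 1
        if self.steps_since_cnp > FAST_RECOVERY_STEPS:
            # Additive increase: probe for more bandwidth, capped at line rate.
            self.target_rate_gbps = min(
                self.line_rate_gbps, self.target_rate_gbps + RATE_AI_GBPS
            )
        # Both phases move the current rate halfway toward the target.
        self.current_rate_gbps = (self.target_rate_gbps + self.current_rate_gbps) / 2
        return self.current_rate_gbps
```

Starting from 50 Gbps with a 100 Gbps target, the first two quiet intervals bring the flow to 75 and then 87.5 Gbps.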

ECN Threshold Tuner

Tuning Kmin and Kmax is a balancing act. Set the thresholds too low (aggressive marking) and you sacrifice throughput. Set them too high (lax marking) and queues grow until PFC engages, which risks pause storms.
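One sanity check that follows from this trade-off is that ECN marking should begin well before PFC would pause the link. The sketch below is a hypothetical validation helper, not a vendor tool. It compares all three thresholds as plain byte counts, which is a simplification: on real switches ECN thresholds usually apply to egress queues while PFC thresholds apply to ingress buffers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EcnThresholds:
    kmin_bytes: int
    kmax_bytes: int
    pfc_xoff_bytes: int  # assumed: buffer level at which PFC pause is sent


def check_thresholds(t: EcnThresholds) -> List[str]:
    """Return a list of human-readable problems; empty means the ordering is sane."""
    problems = []
    if t.kmin_bytes <= 0:
        problems.append("Kmin must be positive")
    if t.kmin_bytes >= t.kmax_bytes:
        problems.append("Kmin must be below Kmax")
    if t.kmax_bytes >= t.pfc_xoff_bytes:
        problems.append("Kmax must be below the PFC XOFF threshold")
    return problems
```

A configuration such as `EcnThresholds(100_000, 400_000, 800_000)` passes, while one whose Kmax sits above the PFC threshold is flagged because PFC could fire before ECN has marked every packet.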

Managing 'Incast' Congestion

The most common cause of buffer buildup in AI clusters is **incast**, where multiple senders transmit to a single receiver simultaneously. ECN handles incast well because:

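  • Fine-Grained Throttling: PFC pauses an entire priority class on a link, stalling every flow in that class. ECN marks individual packets, so only the flows contributing to the congested queue slow down.
  • Loss Avoidance: By signaling early (for example at 50-80% buffer utilization), ECN gives senders time to back off before the buffer reaches 100%. In lossless RoCE fabrics, PFC remains the backstop for bursts that outrun the ECN feedback loop.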
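To see why early signaling matters, a back-of-the-envelope model helps. Assume N senders each burst at full line rate toward one egress port of the same speed: the queue then grows at (N - 1) times the line rate. The function name and the example numbers below are illustrative assumptions, not measurements.

```python
def time_to_threshold_us(senders: int, link_gbps: float, threshold_bytes: int) -> float:
    """Microseconds for an incast burst to fill a queue up to threshold_bytes.

    Assumes every sender transmits at link_gbps and the egress port
    drains at link_gbps, so the excess arrival rate is (senders - 1) * link_gbps.
    """
    if senders < 2:
        raise ValueError("incast needs at least two senders")
    excess_bits_per_s = (senders - 1) * link_gbps * 1e9
    return threshold_bytes * 8 / excess_bits_per_s * 1e6
```

Under these assumptions, an 8-to-1 incast on 100 Gbps links reaches a 100 KB Kmin in a little over one microsecond, so the marking threshold has to leave enough headroom for the CNP round trip.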

Technical Standards & References

REF [rfc-3168]
RFC 3168 (2001)
The Addition of Explicit Congestion Notification (ECN) to IP
Published: IETF
REF [roce-dcqcn-2015]
Mellanox/Microsoft (2015)
Congestion Control for Large-Scale RDMA Deployments (DCQCN)
Published: SIGCOMM