ECN & DCQCN Threshold Modeler
A precision simulator for congestion control in lossless fabrics. Optimize your Kmin, Kmax, and Pmax parameters to maximize AI training goodput.
Fabric Topology
Kmin: Queue-depth threshold where CE bits start being marked in packet headers.
Kmax: Queue-depth threshold where the marking probability reaches Pmax; beyond it, 100% of packets are marked for proactive throttling.
DCQCN Probability Curve
Calculated marking probability for a 400G line-rate congestion event.
BDP (Giant): 488.3 KB
Buffer Usage: 1.4%
XOFF Safety: Headroom 32318 KB
"Optimal ECN ensures throughput remains at 99.9% while preventing PFC pauses from ever firing."
1. The Marking Probability: Modeling Buffer Occupancy
ECN does not work like a toggle; it is a probabilistic engine. The goal is to provide a "Soft Brake" that scales in intensity as the queue grows.
Linear Marking Equation
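A common form of the linear marking curve is sketched below. It assumes the standard RED-style ramp between Kmin and Kmax, with q as the instantaneous queue depth; the symbol names are illustrative, and specific switch ASICs may implement the curve differently.

```latex
P_{mark}(q) =
\begin{cases}
0, & q < K_{min} \\
P_{max}\,\dfrac{q - K_{min}}{K_{max} - K_{min}}, & K_{min} \le q \le K_{max} \\
1, & q > K_{max}
\end{cases}
```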
Setting Kmin too low triggers rate limiting prematurely, hurting throughput. Setting it too high risks filling the switch ASIC's buffer before senders react, triggering **PFC PAUSE**, which stalls every flow sharing the paused priority on that link.
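The trade-off is easier to see with numbers. The following is a minimal sketch, assuming a RED-style piecewise-linear ramp (zero below Kmin, rising linearly to Pmax at Kmax, and 100% above Kmax). The function name and the example thresholds are hypothetical and are not taken from any vendor's defaults.

```python
def marking_probability(queue_kb: float, kmin_kb: float,
                        kmax_kb: float, pmax: float) -> float:
    """Return the ECN marking probability for a given queue depth.

    Assumes a RED-style linear ramp; real ASICs may quantize or
    average the queue depth differently.
    """
    if not 0 <= kmin_kb < kmax_kb:
        raise ValueError("expected 0 <= Kmin < Kmax")
    if not 0.0 <= pmax <= 1.0:
        raise ValueError("Pmax must be within [0, 1]")
    if queue_kb < kmin_kb:
        return 0.0          # below Kmin: no marking
    if queue_kb > kmax_kb:
        return 1.0          # above Kmax: every packet is marked
    # Between Kmin and Kmax: probability rises linearly from 0 to Pmax.
    return pmax * (queue_kb - kmin_kb) / (kmax_kb - kmin_kb)


if __name__ == "__main__":
    # Hypothetical thresholds: Kmin=100 KB, Kmax=400 KB, Pmax=20%.
    for depth_kb in (50, 100, 250, 400, 450):
        p = marking_probability(depth_kb, 100, 400, 0.2)
        print(f"queue={depth_kb:>3} KB  P(mark)={p:.3f}")
```

Raising Kmin shifts the whole ramp to the right, so senders hear about congestion later; lowering it starts the "Soft Brake" earlier at the cost of throughput.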
2. DCQCN: The RDMA Feedback Loop
Standard ECN was designed for TCP. **DCQCN** (Data Center Quantized Congestion Notification) is a congestion-control scheme that builds on ECN marking for hardware-accelerated RDMA (RoCE v2).
The Feedback (CNP)
When the receiver gets a packet whose ECN field is set to Congestion Experienced (CE, binary 11), it generates a 'Congestion Notification Packet' (CNP) and sends it back to the sender. This 64-byte frame is the "Brake Pedal" signal.
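The receiver-side logic can be sketched as follows. This is a simplified illustration, not NIC firmware: the class name is hypothetical, and the 50 µs minimum gap between CNPs for the same flow is an assumed pacing value. The ECN field itself is the low two bits of the IP TOS / Traffic Class byte, with binary 11 meaning CE.

```python
ECN_MASK = 0b11   # ECN occupies the two low-order bits of the TOS byte
ECN_CE = 0b11     # Congestion Experienced codepoint


class NotificationPoint:
    """Decides when a receiver should emit a CNP for a flow (sketch)."""

    def __init__(self, min_cnp_interval_us: float = 50.0):
        # Assumed pacing interval: at most one CNP per flow per interval.
        self.min_cnp_interval_us = min_cnp_interval_us
        self._last_cnp_us = {}

    def on_packet(self, flow_id: str, tos_byte: int, now_us: float) -> bool:
        """Return True if a CNP should be sent back for this packet."""
        if (tos_byte & ECN_MASK) != ECN_CE:
            return False  # packet was not marked by any switch
        last = self._last_cnp_us.get(flow_id)
        if last is not None and now_us - last < self.min_cnp_interval_us:
            return False  # a CNP for this flow was sent too recently
        self._last_cnp_us[flow_id] = now_us
        return True
```

Pacing the CNPs keeps the feedback channel itself from adding load during a congestion event.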
The Rate Limiter (RL)
The sender's NIC receives the CNP and immediately cuts the hardware rate limiter for that specific flow (multiplicative decrease). It then slowly 'probes' for higher bandwidth (Additive Increase) until it sees another CNP.
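A simplified model of this sender-side behavior is sketched below. It follows the general shape of the published DCQCN algorithm (a congestion estimate alpha, a multiplicative cut on each CNP, then fast recovery followed by additive increase), but omits the byte counter and hyper-increase stages. The class name and all parameter values are illustrative assumptions, not NIC defaults.

```python
from dataclasses import dataclass, field


@dataclass
class DcqcnRateLimiter:
    line_rate_gbps: float
    g: float = 1 / 256            # gain for the alpha estimate (assumed)
    rai_gbps: float = 0.4         # additive-increase step (hypothetical)
    fast_recovery_steps: int = 5  # timer ticks before additive increase
    alpha: float = 1.0            # congestion estimate, starts pessimistic
    current_gbps: float = field(init=False, default=0.0)
    target_gbps: float = field(init=False, default=0.0)
    steps_since_cnp: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        # A new flow starts at line rate.
        self.current_gbps = self.line_rate_gbps
        self.target_gbps = self.line_rate_gbps

    def on_cnp(self) -> None:
        """CNP received: remember the old rate, then cut multiplicatively."""
        self.target_gbps = self.current_gbps
        self.current_gbps *= 1 - self.alpha / 2
        self.alpha = (1 - self.g) * self.alpha + self.g
        self.steps_since_cnp = 0

    def on_alpha_timer(self) -> None:
        """No CNP seen during the alpha window: decay the estimate."""
        self.alpha *= 1 - self.g

    def on_increase_timer(self) -> None:
        """Periodic recovery tick while no CNP arrives."""
        self.steps_since_cnp += 1
        if self.steps_since_cnp > self.fast_recovery_steps:
            # Additive increase: probe above the pre-cut rate.
            self.target_gbps = min(self.target_gbps + self.rai_gbps,
                                   self.line_rate_gbps)
        # Move halfway toward the target on every tick.
        self.current_gbps = (self.target_gbps + self.current_gbps) / 2
```

With alpha starting at 1.0, the first CNP halves the rate; each later tick without a CNP closes half the remaining gap to the target, which is what makes recovery fast at first and cautious near the previous rate.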
3. The PFC Collision Course: Kmax vs. XOFF
The ultimate goal of ECN tuning in an AI fabric is to ensure that ECN rate limiting happens **before** PFC flow control is triggered.
Threshold Strategy
1. **Kmax < PFC XOFF**: Once the queue exceeds Kmax, the switch marks 100% of packets, signaling a 'Hard Brake' to the senders.
2. **PFC XOFF**: If the buffer continues to grow, PFC sends a PAUSE frame upstream for that priority. This is an 'Emergency Stop' that can cascade hop by hop across the cluster, causing head-of-line blocking and jitter.
3. **Best Practice**: Set Kmax significantly lower than the PFC XOFF threshold to allow the DCQCN algorithm time to stabilize the flows before the fabric locks up.
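The ordering rules above can be expressed as a small sanity check. This is a hedged sketch: the function name, the `min_gap_kb` parameter, and the example values are hypothetical, and how much gap counts as "significantly lower" depends on the platform.

```python
from typing import List


def check_thresholds(kmin_kb: float, kmax_kb: float, xoff_kb: float,
                     min_gap_kb: float = 0.0) -> List[str]:
    """Return a list of problems with an ECN/PFC threshold set.

    An empty list means the ordering Kmin < Kmax < XOFF holds and the
    gap between Kmax and XOFF is at least min_gap_kb.
    """
    problems = []
    if kmin_kb >= kmax_kb:
        problems.append("Kmin must be below Kmax")
    if kmax_kb >= xoff_kb:
        problems.append("Kmax must be below the PFC XOFF threshold")
    elif xoff_kb - kmax_kb < min_gap_kb:
        problems.append("gap between Kmax and XOFF is below the requested margin")
    return problems


if __name__ == "__main__":
    # Hypothetical values in KB.
    print(check_thresholds(kmin_kb=100, kmax_kb=400, xoff_kb=1000, min_gap_kb=200))
    print(check_thresholds(kmin_kb=100, kmax_kb=1200, xoff_kb=1000))
```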
4. Industrial Tuning: The Optimization Matrix
Tuning ECN is highly dependent on your link speed and target "Braking Distance."
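One way to relate link speed to buffer thresholds is the bandwidth-delay product (BDP), the amount of data in flight during one round trip. The sketch below is illustrative only: the function name is hypothetical and the 10 µs round-trip time is an assumption, since the RTT behind any particular BDP figure depends on the fabric.

```python
def bdp_bytes(link_gbps: float, rtt_us: float) -> float:
    """Bandwidth-delay product in bytes.

    1 Gbps sustained for 1 microsecond carries 1000 bits, i.e. 125 bytes.
    """
    return link_gbps * rtt_us * 125.0


if __name__ == "__main__":
    # Assumed example: a 400G link with a 10 microsecond RTT.
    bdp = bdp_bytes(400, 10)
    print(f"BDP = {bdp:.0f} bytes ({bdp / 1024:.1f} KiB)")
```

Doubling the link speed at the same RTT doubles the data already in flight when a mark is generated, which is why thresholds that work at one speed may need revisiting at another.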
