PFC Threshold & Config Generator
Precision calculator for L2 buffer requirements in lossless AI fabrics. Calculate XOFF/XON values and watchdog timers for Arista, Cisco, and NVIDIA-Mellanox switches.
Warning
Incorrect PFC configuration can cause **Head-of-Line Blocking** or **PFC Storms**. Always verify that the PFC watchdog (PFC-WD) is enabled and its timers are set.
```
! Arista EOS Configuration for RoCE v2
! Priority Flow Control (PFC) & Enhanced Transmission Selection (ETS)
!
interface Ethernet1-32
   priority-flow-control mode on
   priority-flow-control watch dog action errdisable
!
dcbx pfc 3
!
traffic-policy PFC-POLICY
   class RDMA-CLASS
      bandwidth percent 50
      priority 3
   class DEFAULT-CLASS
      bandwidth percent 50
!
qos list pfc 3
```
Flow Control Logic
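PFC enables per-priority pause frames. When the ingress buffer accounting for priority 3 on a port reaches the XOFF threshold, the switch sends a PFC PAUSE frame for that priority back to the upstream device (switch or NIC); when occupancy falls back to the XON threshold, it sends a resume.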
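The XOFF/XON behaviour is a hysteresis loop: PAUSE is asserted at XOFF and only released once occupancy drops to the lower XON mark, so the link does not flap on every packet. The sketch below models one ingress priority group sampled once per step; the class name `PriorityGroup` and the byte thresholds are hypothetical and not taken from any switch SDK.

```python
from dataclasses import dataclass


@dataclass
class PriorityGroup:
    """Toy ingress accounting for one PFC priority (hypothetical model, not a vendor API)."""
    xoff_bytes: int  # occupancy at which PAUSE (XOFF) is asserted
    xon_bytes: int   # occupancy at which PAUSE is released (XON)
    occupancy: int = 0
    paused: bool = False

    def __post_init__(self) -> None:
        if not 0 <= self.xon_bytes < self.xoff_bytes:
            raise ValueError("XON must be below XOFF to provide hysteresis")

    def update(self, arrived: int, departed: int) -> bool:
        """Apply one step of arrivals/departures; return True while PAUSE is asserted."""
        self.occupancy = max(0, self.occupancy + arrived - departed)
        if not self.paused and self.occupancy >= self.xoff_bytes:
            self.paused = True   # send a PFC PAUSE for this priority upstream
        elif self.paused and self.occupancy <= self.xon_bytes:
            self.paused = False  # send a PFC resume (pause time of zero)
        return self.paused


if __name__ == "__main__":
    pg = PriorityGroup(xoff_bytes=20_000, xon_bytes=10_000)
    # A burst (bytes in/out per step) followed by a drain phase.
    for arrived, departed in [(9_000, 1_000)] * 3 + [(0, 4_000)] * 4:
        state = "PAUSE" if pg.update(arrived, departed) else "flowing"
        print(f"occupancy={pg.occupancy:>6} {state}")
```

On the way down the queue stays paused at 20,000, 16,000 and 12,000 bytes and only resumes at 8,000, which is the gap between XOFF and XON doing its job.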
Bandwidth Guarantee
ETS ensures that RDMA traffic (TC 3) receives at least 50% of the link capacity during congestion, preventing starvation from best-effort traffic.
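ETS percentages are minimum guarantees, not caps: the scheduler is work-conserving, so bandwidth an idle class leaves unused goes to classes that still have demand. The sketch below models that with weighted water-filling. The function name and demand figures are hypothetical, the class names are borrowed from the config above, and hardware typically approximates this behaviour with a weighted round-robin scheduler rather than this exact computation.

```python
def ets_allocate(link_gbps: float, weights: dict[str, float],
                 demands: dict[str, float]) -> dict[str, float]:
    """Work-conserving weighted share of one link across ETS classes (water-filling sketch)."""
    alloc = {cls: 0.0 for cls in weights}
    active = {cls for cls in weights if demands.get(cls, 0) > 0}
    remaining = float(link_gbps)
    while active and remaining > 1e-9:
        total_weight = sum(weights[cls] for cls in active)
        # Split what is left in proportion to the weights of classes still asking for more.
        share = {cls: remaining * weights[cls] / total_weight for cls in active}
        satisfied = {cls for cls in active if demands[cls] - alloc[cls] <= share[cls]}
        if not satisfied:
            # Every active class wants more than its share: hand out the shares and stop.
            for cls in active:
                alloc[cls] += share[cls]
            break
        # Fully serve satisfied classes; their unused share is redistributed next round.
        for cls in satisfied:
            remaining -= demands[cls] - alloc[cls]
            alloc[cls] = float(demands[cls])
        active -= satisfied
    return alloc


if __name__ == "__main__":
    weights = {"RDMA-CLASS": 50, "DEFAULT-CLASS": 50}
    # Both classes saturating a 400G link: each gets its 50% guarantee.
    print(ets_allocate(400, weights, {"RDMA-CLASS": 400, "DEFAULT-CLASS": 400}))
    # Best-effort only needs 100G: RDMA borrows the idle bandwidth.
    print(ets_allocate(400, weights, {"RDMA-CLASS": 400, "DEFAULT-CLASS": 100}))
```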
1. The XOFF Threshold: Braking Distance Calculus
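PFC is fundamentally about "braking distance." If a switch waits until its buffer is full before sending a PAUSE frame, the bytes already in transit (the "in-flight" data) overflow the buffer and are dropped. The XOFF threshold must therefore leave enough headroom to absorb everything that arrives after the PAUSE is sent.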
XOFF Threshold Equation
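A common way to write the braking-distance requirement, offered as a sketch rather than any vendor's exact formula: R is the link rate in bit/s, L the cable length, v the propagation speed in fiber (about 2 × 10^8 m/s, roughly 5 ns/m), t_resp the combined PHY/MAC response delay (vendor-specific), MTU the maximum frame size in bytes, H the required headroom in bytes, and B the buffer available to the priority group.

$$
H \;\ge\; \frac{R}{8}\left(\frac{2L}{v} + t_{\mathrm{resp}}\right) + 2\,\mathrm{MTU},
\qquad
X_{\mathrm{OFF}} \;\le\; B - H
$$

The 2L/v term is the cable round trip, and the 2·MTU term covers one maximum-size frame that can delay the PAUSE at the receiver plus one the sender finishes transmitting after the PAUSE arrives. With R = 400 Gbit/s, L = 100 m and t_resp = 0, the cable term alone comes to 50 KB.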
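For a 400 Gbps port on a 100 m link, light in fiber needs roughly 500 ns to cross the cable (about 5 ns/m), so each direction holds about 400 Gb/s × 500 ns ≈ 25 KB in flight. Data already on the wire keeps arriving while the PAUSE frame spends another ~500 ns travelling upstream, and the sender keeps transmitting until it arrives, so the headroom above XOFF must absorb at least a full round trip (≈ 50 KB), plus a maximum-size frame at each end and the PHY/MAC response delay. Failing to account for cable length in hyperscale pods leads to intermittent RDMA sequence errors that are notoriously difficult to debug.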
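The arithmetic can be sketched in a few lines. It reproduces the 25 KB one-way figure for 400 Gbps over 100 m and extends it to a headroom estimate under stated assumptions: 5 ns/m fiber propagation, a 9216-byte jumbo MTU, and a response delay that defaults to zero (so the result is a lower bound). The function names are hypothetical.

```python
FIBER_NS_PER_M = 5.0  # assumption: ~2e8 m/s propagation speed in fiber


def bytes_in_flight(rate_gbps: float, delay_ns: float) -> float:
    """Bytes a port transmits during delay_ns (1 Gbps for 1 ns is exactly 1 bit)."""
    return rate_gbps * delay_ns / 8


def pfc_headroom_bytes(rate_gbps: float, cable_m: float, mtu_bytes: int,
                       resp_ns: float = 0.0) -> float:
    """Lower-bound PFC headroom: cable round trip + response delay + one MTU at each end."""
    one_way_ns = cable_m * FIBER_NS_PER_M
    # Round trip: PAUSE travels upstream while data already on the wire keeps arriving.
    wire_bytes = bytes_in_flight(rate_gbps, 2 * one_way_ns + resp_ns)
    # One max-size frame can delay the PAUSE; the sender finishes one more after it arrives.
    return wire_bytes + 2 * mtu_bytes


if __name__ == "__main__":
    print(bytes_in_flight(400, 100 * FIBER_NS_PER_M))  # one-way, 400G over 100 m
    print(pfc_headroom_bytes(400, 100, mtu_bytes=9216))
```

Doubling the cable length adds another 50 KB of round-trip headroom at 400 Gbps, which is why long spine links need noticeably more reserved buffer than short leaf-to-GPU runs.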
2. Congestion Spreading: The Viral PFC Paradox
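The greatest risk of PFC is Head-of-Line Blocking (HoLB). If one GPU (Receiver A) drains slowly, the switch pauses the entire priority on the link from the sender. If that sender is also sending to a fast receiver (Receiver B) on the same priority, Receiver B is starved as well, even though it has spare capacity.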
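A toy discrete-time model makes the blocking concrete. It assumes one FIFO queue per lossless priority shared by flows A and B, a sender that emits one packet per tick alternating between the two flows, a PAUSE that takes effect instantly, and packet-count XOFF/XON thresholds. The function name and all parameters are hypothetical.

```python
from collections import deque


def simulate_holb(ticks: int, slow_period: int, xoff: int, xon: int) -> dict[str, float]:
    """Toy HoLB model: flows A and B share one no-drop priority queue (FIFO).

    Receiver A accepts one packet every `slow_period` ticks; receiver B accepts one
    per tick. PAUSE is idealized as instantaneous. Returns packets/tick per flow.
    """
    fifo: deque[str] = deque()
    paused = False
    next_flow = "A"
    delivered = {"A": 0, "B": 0}
    for t in range(ticks):
        if not paused:
            fifo.append(next_flow)  # sender alternates A, B, A, B, ...
            next_flow = "B" if next_flow == "A" else "A"
        if fifo:
            head = fifo[0]
            # Only the head can leave: an A packet waiting on its slow receiver blocks B.
            if head == "B" or t % slow_period == 0:
                fifo.popleft()
                delivered[head] += 1
        # PFC hysteresis on queue depth (in packets).
        if len(fifo) >= xoff:
            paused = True
        elif len(fifo) <= xon:
            paused = False
    return {flow: n / ticks for flow, n in delivered.items()}


if __name__ == "__main__":
    print(simulate_holb(10_000, slow_period=4, xoff=8, xon=2))
```

With `slow_period=4`, both flows settle near 0.25 packets per tick: Receiver B could accept a packet every tick, but it only ever gets the packet queued behind each of A's.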
Immediate Blocking
The slow port's queue fills, triggering a Layer 2 PAUSE back to the sending NIC. This is the intended, local protection.
Horizontal Spread
As each paused device's buffers fill in turn, PAUSE frames propagate hop by hop upstream to the spine tier, eventually stopping unrelated training jobs that share those links, even on the other side of the datacenter.
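The worst-case spread can be sketched as a breadth-first walk over "who feeds whom" on the lossless priority. This assumes every paused device's buffers eventually fill (real spread depends on which flows share the paused queues) and treats each device as one node; the topology and device names are hypothetical.

```python
from collections import deque


def pause_spread(feeders: dict[str, list[str]], congested: str) -> list[str]:
    """Worst-case PAUSE tree: every device feeding a paused device is paused next.

    `feeders[x]` lists devices that transmit into x on the lossless priority.
    Returns the congested device followed by every device the PAUSE reaches (BFS order).
    """
    reached = [congested]
    seen = {congested}
    frontier = deque([congested])
    while frontier:
        node = frontier.popleft()
        for upstream in feeders.get(node, []):
            if upstream not in seen:  # also keeps the walk finite on cycles
                seen.add(upstream)
                reached.append(upstream)
                frontier.append(upstream)
    return reached


if __name__ == "__main__":
    # Hypothetical two-tier pod: leaf1's port toward the slow GPU is congested.
    feeders = {
        "leaf1": ["nic-1", "spine1"],
        "spine1": ["leaf2", "leaf3"],
        "leaf2": ["nic-2"],
        "leaf3": ["nic-3"],
    }
    print(pause_spread(feeders, "leaf1"))
```

The `seen` set matters beyond bookkeeping: if the feeder graph contains a cycle, the PAUSE chain loops back on itself, which in a real fabric is the cyclic buffer dependency that produces PFC deadlock.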
3. The PFC Watchdog: Preventing Fabric Collapse
To prevent total collapse, modern switches implement a PFC watchdog. If a queue stays in the PAUSE state without draining for longer than a detection interval (on the order of 100 ms; defaults vary by vendor), it is treated as deadlocked.
Recovery Sequence Logic
Detection Criteria
The ASIC registers the queue as a 'Victim' or 'Aggressor' based on PAUSE and transmit counters sampled over successive polling intervals.
The Hard Kill
The switch drops all packets in the offending queue instead of holding them. The flows in that queue fail, but the thousands of other flows are released from the deadlock.
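The detection and recovery sequence can be sketched as a small polling state machine. It assumes a hypothetical per-queue sample of "is the queue paused" and "frames transmitted since the last poll"; the `detection_ms`/`restoration_ms` names echo the detection and restoration timers PFC watchdog implementations commonly expose, but the class and its defaults are illustrative, not any vendor's.

```python
from dataclasses import dataclass
from enum import Enum


class WdState(Enum):
    OK = "ok"        # queue forwarding normally
    STORM = "storm"  # queue judged stuck; caller should drop its packets


@dataclass
class QueueWatchdog:
    """Per-queue PFC watchdog driven by periodic counter samples (hypothetical model)."""
    poll_ms: int = 10
    detection_ms: int = 100    # paused-without-progress time that declares a storm
    restoration_ms: int = 200  # PAUSE-free time required before normal forwarding resumes
    state: WdState = WdState.OK
    stuck_ms: int = 0
    clear_ms: int = 0

    def poll(self, paused: bool, tx_delta: int) -> WdState:
        """Feed one sample: is the queue paused, and how many frames left it since last poll."""
        if self.state is WdState.OK:
            if paused and tx_delta == 0:
                self.stuck_ms += self.poll_ms  # paused and not draining
            else:
                self.stuck_ms = 0              # progress (or no PAUSE) resets detection
            if self.stuck_ms >= self.detection_ms:
                self.state, self.stuck_ms, self.clear_ms = WdState.STORM, 0, 0
        else:
            # While in STORM, wait for incoming PAUSE frames to stop for restoration_ms.
            self.clear_ms = 0 if paused else self.clear_ms + self.poll_ms
            if self.clear_ms >= self.restoration_ms:
                self.state, self.clear_ms = WdState.OK, 0
        return self.state


if __name__ == "__main__":
    wd = QueueWatchdog()
    for i in range(1, 31):
        paused = i <= 10  # stuck for 100 ms, then the storm subsides
        state = wd.poll(paused=paused, tx_delta=0)
        print(f"t={i * wd.poll_ms:>3} ms paused={paused!s:<5} state={state.value}")
```

Requiring zero transmit progress, not just a PAUSE, avoids killing a queue that is merely congested but still draining.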
4. Industrial Design: Lossless Implementation Blueprint
Implementing PFC requires careful mapping of Class-of-Service (CoS) values to the "no-drop" hardware buffers.
CoS 3 Mapping
The most widely used convention for RoCE (RDMA) traffic. Dedicate a specific no-drop queue to ensure storage-class traffic never contends with best-effort web traffic.
DCQCN Strategy
Combine PFC (last resort) with ECN (proactive throttle) so that senders slow down before the network has to stop them. It is the de facto standard for RoCE-based AI fabrics.
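On the switch side, DCQCN marks packets with a RED-style curve: no marks below `kmin`, a linear ramp up to `pmax` at `kmax`, and every packet marked above `kmax`. The sketch below implements that curve; the function name and the example thresholds are hypothetical, not vendor defaults. The usual design intent is to set these thresholds so marking starts well before a lossless queue approaches XOFF.

```python
def ecn_mark_probability(queue_bytes: float, kmin: float, kmax: float, pmax: float) -> float:
    """RED-style ECN marking probability at the switch (DCQCN congestion point)."""
    if not 0 <= kmin < kmax:
        raise ValueError("require 0 <= kmin < kmax")
    if not 0.0 <= pmax <= 1.0:
        raise ValueError("pmax must be a probability")
    if queue_bytes <= kmin:
        return 0.0  # short queue: never mark
    if queue_bytes > kmax:
        return 1.0  # long queue: mark every packet
    # Linear ramp from 0 at kmin to pmax at kmax.
    return pmax * (queue_bytes - kmin) / (kmax - kmin)


if __name__ == "__main__":
    # Hypothetical thresholds in bytes.
    for depth in (50_000, 150_000, 300_000, 450_000):
        print(depth, ecn_mark_probability(depth, kmin=100_000, kmax=400_000, pmax=0.2))
```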
Buffer Headroom
Configuring separate 'Static' and 'Dynamic' buffer pools to prevent a single port from starving the entire packet buffer of the ASIC.
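The static/dynamic split can be sketched with the classic dynamic-threshold rule: a queue first uses its static reservation, and beyond that it may only grow while its share of the dynamic pool stays below `alpha` times the currently free shared buffer. Function names, `alpha`, and the sizes are hypothetical; ASICs expose this idea under vendor-specific names.

```python
def dynamic_threshold(shared_bytes: int, shared_used: int, alpha: float) -> float:
    """Per-queue limit in the dynamic pool: alpha x currently free shared buffer."""
    return alpha * max(0, shared_bytes - shared_used)


def admit(queue_bytes: int, reserved_bytes: int, shared_bytes: int,
          shared_used: int, alpha: float) -> bool:
    """Admit a packet if the queue is within its static reservation or under its dynamic limit."""
    if queue_bytes < reserved_bytes:
        return True  # still inside this queue's static (guaranteed) allocation
    in_shared = queue_bytes - reserved_bytes
    return in_shared < dynamic_threshold(shared_bytes, shared_used, alpha)


if __name__ == "__main__":
    shared = 32 * 2**20  # hypothetical 32 MiB dynamic pool
    for used_mib in (8, 24, 30):
        limit = dynamic_threshold(shared, used_mib * 2**20, alpha=1.0)
        print(f"{used_mib} MiB used -> per-queue limit {limit / 2**20:.0f} MiB")
```

The limit shrinks as the pool fills, which is what stops one port from taking everything: a single congested queue that is the only pool user stops growing once q ≥ α(B − q), i.e. at αB/(1 + α), which is half the pool when α = 1.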
