Managing Lossless Ethernet
The Ethernet Evolution
Traditional Ethernet was built on the principle of best-effort delivery. If a switch buffer overflowed, the switch dropped packets and relied on TCP retransmissions to fill the gaps. For RDMA-based AI clusters, this "drop-and-retry" cycle introduces millisecond-level tail latencies that cripple performance. **Priority Flow Control (PFC)** and **Enhanced Transmission Selection (ETS)** transform lossy Ethernet into a high-performance lossless fabric.
[Interactive simulator: Lossless Fabric Simulator, real-time buffer management and scheduling. It shows PFC pausing specific traffic classes (RDMA) at Layer 2 when buffers fill up (XOFF), which prevents frame loss without blocking the entire physical link, while ETS manages bandwidth allocation across the link.]
PFC: Priority Flow Control (802.1Qbb)
PFC operates at the link layer and provides flow control independently for each of the eight 802.1p priorities. When a downstream switch’s ingress buffer for a priority reaches a critical threshold (XOFF), the switch sends a PFC PAUSE frame upstream for that specific priority (e.g., Priority 3 for RDMA), and the sender stops transmitting that priority while the others keep flowing.
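To make the per-priority pause concrete, here is a minimal sketch that assembles the bytes of a PFC frame. The field layout (MAC Control EtherType 0x8808, opcode 0x0101, a priority-enable vector, and eight 16-bit pause timers) follows the 802.1Qbb frame format; the function name `build_pfc_frame` and its parameters are illustrative choices, not part of any library.

```python
import struct
from typing import Mapping

PFC_DEST_MAC = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101
MIN_FRAME_NO_FCS = 60  # minimum Ethernet frame size, excluding the 4-byte FCS


def build_pfc_frame(src_mac: bytes, pause_quanta: Mapping[int, int]) -> bytes:
    """Build a PFC frame (without FCS) for the given priorities.

    pause_quanta maps a priority (0-7) to a pause time in quanta, where one
    quantum is 512 bit times. A value of 0 tells the peer to resume (XON).
    """
    if len(src_mac) != 6:
        raise ValueError("src_mac must be exactly 6 bytes")

    enable_vector = 0
    timers = [0] * 8
    for priority, quanta in pause_quanta.items():
        if not 0 <= priority <= 7:
            raise ValueError("priority must be in 0..7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("pause time must fit in 16 bits")
        enable_vector |= 1 << priority  # mark this priority's timer as valid
        timers[priority] = quanta

    header = PFC_DEST_MAC + src_mac + struct.pack(
        "!HHH", MAC_CONTROL_ETHERTYPE, PFC_OPCODE, enable_vector
    )
    body = struct.pack("!8H", *timers)
    # Pad with zeros up to the minimum frame size.
    return (header + body).ljust(MIN_FRAME_NO_FCS, b"\x00")


# Pause priority 3 (the RDMA class in this article) for the maximum time.
frame = build_pfc_frame(bytes.fromhex("020000000001"), {3: 0xFFFF})
```

Only bit 3 of the enable vector is set in this example, so the receiver leaves the other seven priorities untouched.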
XOFF/XON Thresholds
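The XOFF threshold triggers a pause, while the XON threshold signals the sender to resume. At 800G, these thresholds must be tuned precisely: set XOFF too low and buffer space is wasted; set it too high and the frames still in flight when the pause is sent can overflow the buffer and be dropped.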
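The interplay of the two thresholds can be sketched as a small hysteresis state machine, together with a rough estimate of how much data is still in flight after a pause is sent. This is a simplified model under stated assumptions: the class `IngressQueue`, the function `in_flight_bytes`, and the 5 ns/m propagation figure (typical for fiber) are illustrative, and the estimate ignores sender response time and frames already being serialized, so it is a lower bound on the headroom needed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IngressQueue:
    """Per-priority ingress buffer with XOFF/XON hysteresis (hypothetical model)."""
    xoff_bytes: int          # depth at or above which a pause is requested
    xon_bytes: int           # depth at or below which traffic may resume
    depth_bytes: int = 0
    paused: bool = False

    def update(self, delta_bytes: int) -> Optional[str]:
        """Apply a change in buffer depth; return "XOFF", "XON", or None."""
        self.depth_bytes = max(0, self.depth_bytes + delta_bytes)
        if not self.paused and self.depth_bytes >= self.xoff_bytes:
            self.paused = True
            return "XOFF"
        if self.paused and self.depth_bytes <= self.xon_bytes:
            self.paused = False
            return "XON"
        return None  # between thresholds: keep the current state


def in_flight_bytes(link_gbps: float, cable_m: float, ns_per_m: float = 5.0) -> int:
    """Bytes that can still arrive during one propagation round trip.

    Gb/s multiplied by ns gives bits, so divide by 8 for bytes.
    """
    round_trip_ns = 2 * cable_m * ns_per_m
    return int(link_gbps * round_trip_ns / 8)


queue = IngressQueue(xoff_bytes=1000, xon_bytes=400)
events = [queue.update(d) for d in (600, 500, -300, -400)]
headroom = in_flight_bytes(link_gbps=800, cable_m=100)
```

The gap between the two thresholds keeps the queue from toggling between pause and resume on every frame, and the in-flight estimate shows why the space reserved above XOFF grows with both link speed and cable length.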
Pause Storm Risk
If a device continuously sends PAUSE frames without clearing its buffer, it can block the entire traffic path. **PFC Watchdogs** are critical for identifying and disabling misbehaving endpoints.
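The detection half of a watchdog can be sketched as a timer per port and priority: if a queue stays paused longer than a configured interval, it is flagged as stuck. Real implementations differ by vendor in how they detect and mitigate a storm, so the class `PfcWatchdog`, its `detect_ms` parameter, and the polling style below are assumptions made for illustration only.

```python
from typing import Dict, Tuple


class PfcWatchdog:
    """Flags a (port, priority) queue that stays paused for too long."""

    def __init__(self, detect_ms: int) -> None:
        self.detect_ms = detect_ms
        # Timestamp (ms) at which each queue was first seen paused.
        self._paused_since: Dict[Tuple[str, int], int] = {}

    def observe(self, port: str, priority: int, paused: bool, now_ms: int) -> bool:
        """Record one poll result; return True if the queue looks stuck."""
        key = (port, priority)
        if not paused:
            # Any unpaused observation resets the timer for this queue.
            self._paused_since.pop(key, None)
            return False
        start = self._paused_since.setdefault(key, now_ms)
        return now_ms - start >= self.detect_ms


watchdog = PfcWatchdog(detect_ms=200)
results = [
    watchdog.observe("eth1", 3, True, 0),     # pause begins
    watchdog.observe("eth1", 3, True, 150),   # still within the interval
    watchdog.observe("eth1", 3, True, 200),   # interval exceeded: stuck
    watchdog.observe("eth1", 3, False, 210),  # queue recovered
    watchdog.observe("eth1", 3, True, 220),   # new pause, timer restarts
]
```

What happens after a queue is flagged (for example, dropping its traffic or ignoring further PAUSE frames for a recovery period) is a policy decision that this sketch leaves out.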
ETS: Enhanced Transmission Selection (802.1Qaz)
While PFC prevents drops, ETS ensures fair bandwidth distribution by guaranteeing each group a minimum share of the link. It allows network architects to define **Bandwidth Groups** and assign weights, replacing the primitive "Strict Priority" scheduling, which could easily starve management and storage traffic.
| Traffic Class | Weight (Example) | PFC Status |
|---|---|---|
| Priority 3 (RoCE v2) | 80% | ENABLED |
| Priority 4 (Storage) | 15% | ENABLED |
| Priority 0 (Management) | 5% | DISABLED |
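The effect of the example weights can be sketched with a simple fluid model: each class is guaranteed its weighted share of the link, and bandwidth a class does not use is redistributed to the others in proportion to their weights. Switches typically approximate this per packet with a weighted round-robin scheduler; the function `ets_allocate` below and the 800 Gb/s link speed are illustrative assumptions, not a vendor algorithm.

```python
from typing import Dict


def ets_allocate(
    link_gbps: float, weights: Dict[int, float], demand_gbps: Dict[int, float]
) -> Dict[int, float]:
    """Weighted, work-conserving bandwidth split (fluid approximation).

    weights must be positive; classes with no demand receive nothing and
    their share is handed to the classes that still want bandwidth.
    """
    alloc = {tc: 0.0 for tc in weights}
    active = {tc for tc in weights if demand_gbps.get(tc, 0.0) > 0}
    remaining = float(link_gbps)

    while active and remaining > 1e-9:
        total_weight = sum(weights[tc] for tc in active)
        satisfied = set()
        spent = 0.0
        for tc in active:
            share = remaining * weights[tc] / total_weight
            need = demand_gbps[tc] - alloc[tc]
            grant = min(share, need)
            alloc[tc] += grant
            spent += grant
            if need <= share:
                satisfied.add(tc)  # demand fully met, leftover is freed
        remaining -= spent
        if not satisfied:
            break  # every active class took its full share
        active -= satisfied

    return alloc


weights = {3: 80, 4: 15, 0: 5}  # priorities and weights from the table above

# All classes saturated: each gets exactly its weighted share.
congested = ets_allocate(800, weights, {3: 800, 4: 800, 0: 800})

# Storage needs little and management is idle: RDMA absorbs the slack.
light = ets_allocate(800, weights, {3: 800, 4: 20, 0: 0})
```

Under full congestion the split is 640, 120, and 40 Gb/s. In the second case RDMA receives roughly 780 Gb/s, which shows that the weights act as guaranteed minimums and not as hard caps.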
