The Death of Static Hashing
The ECMP Collision Problem.
Standard IP networks use **ECMP (Equal-Cost Multi-Path)** routing. Each flow is hashed on its header fields (source and destination IP, ports, protocol), and every packet of that flow is pinned to the same link. If two heavy AI flows hash to the same link, that link saturates while others sit underused or even idle.
This is the **Elephant Flow Problem**. In a GPU cluster, nearly every flow is an elephant: long-lived and high-bandwidth. Static hashing in a Fat-Tree topology leads to "hot paths," causing queuing delays and packet drops that ultimately throttle the collective training pass.
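The flow pinning described above can be sketched in a few lines. This is a minimal illustration, not how any particular switch ASIC computes its hash: the CRC32 hash, the `ecmp_link` and `link_loads` helpers, the addresses, and the 100 Gb/s flow sizes are all assumptions chosen for the example (4791 is the standard RoCEv2 UDP destination port).

```python
import zlib

def ecmp_link(src_ip, dst_ip, src_port, dst_port, proto, num_links):
    """Pick an uplink by hashing the flow 5-tuple.

    CRC32 is a stand-in here; real switches use vendor-specific hashes.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % num_links

def link_loads(flows, num_links):
    """Sum the offered load (Gb/s) that static hashing places on each link."""
    loads = [0] * num_links
    for five_tuple, gbps in flows:
        loads[ecmp_link(*five_tuple, num_links)] += gbps
    return loads

# Hypothetical workload: eight 100 Gb/s flows sharing eight equal-cost uplinks.
flows = [
    ((f"10.0.0.{i}", f"10.0.1.{i}", 49152 + i, 4791, "UDP"), 100)
    for i in range(8)
]

loads = link_loads(flows, num_links=8)
print("per-link load (Gb/s):", loads)
print("busiest link:", max(loads), "Gb/s; idle links:", loads.count(0))
```

Because the hash depends only on header fields, a flow keeps its link for its whole lifetime. With only a handful of large flows, there is nothing to average out an unlucky assignment, so any link that receives two flows stays oversubscribed until one of them ends.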
Ethernet ECMP
- Static Flow Hashing (Deterministic)
- Risk of Hash Polarization/Hot Spots
- Cannot Rebalance During a Flow
Adaptive Routing
- Dynamic Packet-Level Spraying
- Near-Perfect Load Equilibrium
- Routes Around Failed Switch Links
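The contrast between the two lists can be made concrete with a toy comparison. This is a sketch under simplifying assumptions: `flow_hash_loads` and `spray_loads` are hypothetical helpers, CRC32 stands in for the switch hash, and "pick the least-loaded link" stands in for the congestion signals a real adaptive-routing switch would use. It also ignores the packet reordering that spraying introduces and that the receiver must tolerate.

```python
import zlib

def flow_hash_loads(flow_ids, packets_per_flow, num_links):
    """Static ECMP: every packet of a flow follows the same hashed link."""
    loads = [0] * num_links
    for fid in flow_ids:
        link = zlib.crc32(fid.encode()) % num_links
        loads[link] += packets_per_flow
    return loads

def spray_loads(flow_ids, packets_per_flow, num_links):
    """Packet-level spraying: each packet goes to the least-loaded link."""
    loads = [0] * num_links
    for _fid in flow_ids:
        for _ in range(packets_per_flow):
            link = loads.index(min(loads))  # stand-in for a congestion signal
            loads[link] += 1
    return loads

# Hypothetical workload: 8 flows of 1000 packets each over 8 links.
flow_ids = [f"gpu{i}->gpu{(i + 1) % 8}" for i in range(8)]
static = flow_hash_loads(flow_ids, packets_per_flow=1000, num_links=8)
sprayed = spray_loads(flow_ids, packets_per_flow=1000, num_links=8)

print("static hashing :", static, "max =", max(static))
print("packet spraying:", sprayed, "max =", max(sprayed))
```

In this model the sprayed loads never differ by more than one packet between links, while the static result depends entirely on how the hash happens to distribute the eight flows.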
SHARPv3 In-Network Computing.
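In-Network Computing with SHARP (Scalable Hierarchical Aggregation and Reduction Protocol) takes routing a step further. Instead of just moving data, the **Switch Fabric** itself performs the All-Reduce operation. It collects the gradients from, say, eight GPUs, sums them in the switch's ASIC, and broadcasts the result back. Each GPU sends its gradients once rather than roughly twice, as in a ring All-Reduce, which cuts the data it injects into the network by roughly **half**.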
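A back-of-the-envelope model shows where the "half" comes from. The sketch below is an idealized comparison, not a model of the SHARP protocol itself: it uses the standard per-GPU send volume of a ring All-Reduce, 2(N-1)/N times the payload, and assumes in-network reduction lets each GPU send its payload exactly once. The function names and the 1 GB payload are illustrative.

```python
def ring_allreduce_bytes_sent(n_gpus, payload_bytes):
    """Bytes each GPU sends in a ring All-Reduce (reduce-scatter + all-gather)."""
    return 2 * (n_gpus - 1) / n_gpus * payload_bytes

def in_network_bytes_sent(payload_bytes):
    """Idealized in-network reduction: each GPU sends its gradients once."""
    return payload_bytes

def switch_reduce(gradients):
    """What the switch does conceptually: element-wise sum across GPUs."""
    return [sum(values) for values in zip(*gradients)]

# Eight GPUs, each holding a tiny 4-element gradient for illustration.
gradients = [[float(gpu)] * 4 for gpu in range(8)]
reduced = switch_reduce(gradients)      # every GPU receives this same vector
print("reduced gradient:", reduced)

payload = 1_000_000_000                 # hypothetical 1 GB gradient buffer
ring = ring_allreduce_bytes_sent(8, payload)
innet = in_network_bytes_sent(payload)
print(f"ring: {ring:.3e} B, in-network: {innet:.3e} B, ratio: {innet / ring:.2f}")
```

For eight GPUs the ratio is about 0.57 rather than exactly 0.5; it approaches one half as the number of GPUs grows, since the ring factor 2(N-1)/N tends toward 2.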
