Alignment: The Art of Scaling
Parallel Rails
In an 8-GPU node (such as an HGX/DGX system), each GPU has its own dedicated 400G Network Interface Card (NIC). These GPUs communicate with their peers on other nodes using **collective algorithms** such as Ring or Tree.
**Rail Optimization** is the physical topology strategy of connecting all "GPU 0s" in a cluster to one Leaf switch, all "GPU 1s" to another, and so on. This ensures that the most throughput-intensive part of training—the All-Reduce—is resolved at the lowest possible tier of the network (the Leaf, L1), minimizing "East-West" traffic at the Spine (L2).
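The wiring rule above can be sketched in a few lines. This is an illustrative model, not a real fabric-management API: the names `leaf_for`, `same_leaf`, and the leaf-ID convention are assumptions made for the example.

```python
# Hypothetical sketch of rail-optimized wiring: the leaf switch a GPU
# connects to is determined solely by its local GPU index (its "rail").

GPUS_PER_NODE = 8  # e.g. an HGX/DGX node

def leaf_for(node: int, gpu: int) -> int:
    """In a rail-optimized fabric, every GPU with the same local index
    plugs into the same leaf switch, regardless of which node it is on."""
    return gpu  # leaf ID is simply the rail (GPU index)

def same_leaf(endpoints) -> bool:
    """True if all endpoints of a collective share one leaf switch,
    i.e. the traffic never has to cross the spine."""
    leaves = {leaf_for(node, gpu) for node, gpu in endpoints}
    return len(leaves) == 1

# A ring all-reduce among GPU 3 on nodes 0..15 stays entirely on leaf 3:
rail3 = [(node, 3) for node in range(16)]
print(same_leaf(rail3))  # True: resolved at the leaf tier
```

Under this convention, any collective whose participants all share a GPU index is leaf-local by construction; only collectives that span GPU indices need the spine.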
Vertical Stacks
Each GPU index acts as an independent network "rail"; traffic on one rail never crosses another until it reaches the spine tier.
Leaf Affinity
Minimizing leaf-to-spine crossings reduces congestion at the top of the fabric, freeing up spine bandwidth for global synchronization.
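To make the congestion claim concrete, here is a toy comparison (an assumption-laden sketch, not a real network simulator) counting how many GPU pairs in per-rail all-reduces would require a spine hop under rail-optimized wiring versus naive per-node wiring:

```python
# Illustrative comparison: count GPU pairs whose traffic must cross the
# spine when running one all-reduce per rail, under two wiring schemes.
from itertools import combinations

NODES, GPUS = 4, 8

def leaf_rail(node: int, gpu: int) -> int:
    return gpu   # rail-optimized: one leaf per GPU index

def leaf_naive(node: int, gpu: int) -> int:
    return node  # naive: one leaf per node

def spine_hops(leaf_of) -> int:
    hops = 0
    for gpu in range(GPUS):                      # one collective per rail
        peers = [(n, gpu) for n in range(NODES)]
        for a, b in combinations(peers, 2):
            if leaf_of(*a) != leaf_of(*b):       # different leaves => spine hop
                hops += 1
    return hops

print(spine_hops(leaf_rail))   # 0: all rail traffic stays on one leaf
print(spine_hops(leaf_naive))  # every cross-node pair crosses the spine
```

In this toy model the rail-optimized layout sends zero rail traffic to the spine, while the per-node layout pushes every cross-node pair up the fabric, which is exactly the bandwidth the spine would rather reserve for global synchronization.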
