Parallel Rails.

In an 8-GPU node (such as an NVIDIA HGX or DGX system), each GPU has its own dedicated 400G network interface card (NIC). GPUs communicate with their peers on other nodes using **collective algorithms** such as Ring or Tree all-reduce.
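As a minimal sketch of what a Ring collective looks like, the snippet below computes the fixed send/receive peers for each rank in a ring all-reduce. The function name is my own; real libraries (e.g. NCCL) build this schedule internally.

```python
# Sketch of the communication pattern in a ring all-reduce across N ranks.
# Each rank repeatedly sends a chunk to its right neighbour and receives one
# from its left neighbour; after 2*(N-1) steps every rank holds the full sum.

def ring_allreduce_schedule(n_ranks):
    """Return {rank: (send_to, recv_from)} for a ring of n_ranks peers."""
    return {r: ((r + 1) % n_ranks, (r - 1) % n_ranks) for r in range(n_ranks)}

schedule = ring_allreduce_schedule(8)
# Rank 0 always sends to rank 1 and receives from rank 7.
```

Because each rank only ever talks to two fixed neighbours, the physical placement of those neighbours decides how much of this traffic stays low in the fabric.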

**Rail Optimization** is a physical topology strategy: connect every node's GPU 0 to one leaf switch, every node's GPU 1 to another, and so on. Same-rank traffic—the bulk of the bandwidth in an All-Reduce, the most throughput-intensive phase of training—then resolves at the lowest tier of the network (the leaf layer), minimizing east-west traffic at the spine layer above it.
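The effect of this wiring can be sketched as a switch-hop count. The model below is illustrative (the function names and the naive "all NICs of a node on one leaf" baseline are my own assumptions, not a vendor design): same-leaf traffic crosses one switch, anything else goes leaf-spine-leaf.

```python
# Hypothetical two-tier fabric model: which leaf does a GPU's NIC plug into?

def leaf_for(gpu_id, rail_optimized, node_id):
    if rail_optimized:
        return gpu_id   # rail k -> leaf k on every node
    return node_id      # naive baseline: all of a node's NICs on one leaf

def switch_hops(src_node, src_gpu, dst_node, dst_gpu, rail_optimized):
    """Switches crossed between two GPUs on different nodes."""
    src_leaf = leaf_for(src_gpu, rail_optimized, src_node)
    dst_leaf = leaf_for(dst_gpu, rail_optimized, dst_node)
    return 1 if src_leaf == dst_leaf else 3  # leaf only, or leaf-spine-leaf
```

Under this model, GPU 3 on node 0 talking to GPU 3 on node 1 crosses one switch when rail-optimized, but three under the naive wiring; cross-rail traffic (GPU 3 to GPU 5) still climbs to the spine either way.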

Vertical Stacks

Each GPU index acts as an independent network "rail": traffic between same-index GPUs stays on that rail's own leaf switch, and rails only cross once they reach the spine.

Leaf Affinity

Minimizing leaf-to-spine crossings reduces congestion at the top of the fabric, freeing up spine bandwidth for global synchronization.

Rail Modeler.

Simulate your cable plan and verify whether your cluster is truly rail-optimized. Calculate the potential "hop reduction" for your specific node count.
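A back-of-envelope version of that hop-reduction estimate can be written in a few lines. This is my own simplification, not the modeler described above: for one rail's inter-node ring, count how many ring links must climb to the spine under each wiring scheme.

```python
# Hypothetical hop-reduction estimate for one rail's inter-node ring.

def spine_crossing_links(num_nodes, rail_optimized):
    """Ring links (out of num_nodes) that traverse the spine for one rail."""
    if rail_optimized:
        return 0          # every same-rank pair shares a single leaf switch
    return num_nodes      # per-node wiring: every inter-node link hits the spine

def hop_reduction(num_nodes):
    """Spine crossings avoided per ring step by rail-optimizing the cabling."""
    return (spine_crossing_links(num_nodes, rail_optimized=False)
            - spine_crossing_links(num_nodes, rail_optimized=True))

print(hop_reduction(32))  # prints 32: links kept off the spine, per rail
```

Scaled across all eight rails, every one of those avoided crossings is spine bandwidth freed for the cross-rail traffic that genuinely needs it.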

