Parallel Rails.

In an 8-GPU node (such as an NVIDIA HGX or DGX system), each GPU has its own dedicated 400G network interface card (NIC). GPUs communicate with their peers on other nodes using **collective algorithms** such as Ring or Tree all-reduce.
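As a minimal sketch of what a Ring collective looks like, the snippet below computes the fixed send/receive peers for each rank in a ring all-reduce. The function name is my own; real libraries (e.g. NCCL) build this schedule internally.

```python
# Sketch of the communication pattern in a ring all-reduce across N ranks.
# Each rank repeatedly sends a chunk to its right neighbour and receives one
# from its left neighbour; after 2*(N-1) steps every rank holds the full sum.

def ring_allreduce_schedule(n_ranks):
    """Return {rank: (send_to, recv_from)} for a ring of n_ranks peers."""
    return {r: ((r + 1) % n_ranks, (r - 1) % n_ranks) for r in range(n_ranks)}

schedule = ring_allreduce_schedule(8)
# Rank 0 always sends to rank 1 and receives from rank 7.
```

Because each rank only ever talks to two fixed neighbours, the physical placement of those neighbours decides how much of this traffic stays low in the fabric.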

**Rail Optimization** is a physical topology strategy: connect every node's GPU 0 to one leaf switch, every node's GPU 1 to another, and so on. Same-rank traffic—the bulk of the bandwidth in an All-Reduce, the most throughput-intensive phase of training—then resolves at the lowest tier of the network (the leaf layer), minimizing east-west traffic at the spine layer above it.
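The effect of this wiring can be sketched as a switch-hop count. The model below is illustrative (the function names and the naive "all NICs of a node on one leaf" baseline are my own assumptions, not a vendor design): same-leaf traffic crosses one switch, anything else goes leaf-spine-leaf.

```python
# Hypothetical two-tier fabric model: which leaf does a GPU's NIC plug into?

def leaf_for(gpu_id, rail_optimized, node_id):
    if rail_optimized:
        return gpu_id   # rail k -> leaf k on every node
    return node_id      # naive baseline: all of a node's NICs on one leaf

def switch_hops(src_node, src_gpu, dst_node, dst_gpu, rail_optimized):
    """Switches crossed between two GPUs on different nodes."""
    src_leaf = leaf_for(src_gpu, rail_optimized, src_node)
    dst_leaf = leaf_for(dst_gpu, rail_optimized, dst_node)
    return 1 if src_leaf == dst_leaf else 3  # leaf only, or leaf-spine-leaf
```

Under this model, GPU 3 on node 0 talking to GPU 3 on node 1 crosses one switch when rail-optimized, but three under the naive wiring; cross-rail traffic (GPU 3 to GPU 5) still climbs to the spine either way.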

Vertical Stacks

Each GPU index acts as an independent network "rail": traffic between same-index GPUs stays on that rail's own leaf switch, and rails only cross once they reach the spine.

Leaf Affinity

Minimizing leaf-to-spine crossings reduces congestion at the top of the fabric, freeing up spine bandwidth for global synchronization.

Rail Modeler.

Simulate your cable plan and verify whether your cluster is truly rail-optimized. Calculate the potential "hop reduction" for your specific node count.
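A back-of-envelope version of that hop-reduction estimate can be written in a few lines. This is my own simplification, not the modeler described above: for one rail's inter-node ring, count how many ring links must climb to the spine under each wiring scheme.

```python
# Hypothetical hop-reduction estimate for one rail's inter-node ring.

def spine_crossing_links(num_nodes, rail_optimized):
    """Ring links (out of num_nodes) that traverse the spine for one rail."""
    if rail_optimized:
        return 0          # every same-rank pair shares a single leaf switch
    return num_nodes      # per-node wiring: every inter-node link hits the spine

def hop_reduction(num_nodes):
    """Spine crossings avoided per ring step by rail-optimizing the cabling."""
    return (spine_crossing_links(num_nodes, rail_optimized=False)
            - spine_crossing_links(num_nodes, rail_optimized=True))

print(hop_reduction(32))  # prints 32: links kept off the spine, per rail
```

Scaled across all eight rails, every one of those avoided crossings is spine bandwidth freed for the cross-rail traffic that genuinely needs it.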

