NVLink vs. NVSwitch: The Memory Fabric
Scaling the GPU Node beyond PCIe limits
In the world of AI training, the limiting factor isn't just compute (TFLOPS); it's the speed at which GPUs can share their local memory state. While PCIe remains the standard for host-to-device communication, it is woefully inadequate for the multi-GPU peer-to-peer traffic required by model parallelism. This is where **NVLink** and **NVSwitch** come in—creating a unified memory address space that makes 8 GPUs look like a single giant processor.
NVLink: Point-to-Point
Originally introduced to bypass the PCIe bottleneck, NVLink is a high-speed, wire-level interconnect protocol. In early generations (Pascal P100), the topology was a point-to-point mesh: every GPU had a fixed number of links it could dedicate to specific neighbors, so the bandwidth between any two GPUs depended on how many links were wired directly between them.
- NVLink 4.0: 900 GB/s aggregate bandwidth per H100
- High bandwidth, low latency (sub-microsecond)
- Memory coherency: remote GPU memory is accessible via direct loads/stores
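To see why a fixed point-to-point mesh gets awkward, consider a toy bandwidth model. The link counts and 4-GPU wiring below are illustrative placeholders, not a real NVIDIA topology; the point is only that pairwise bandwidth is set at wiring time.

```python
# Toy model of a point-to-point NVLink-style mesh (illustrative numbers,
# not a real topology). Each GPU has a fixed link budget, and the bandwidth
# between a pair of GPUs depends on how many links are wired between them.

LINK_BW_GBPS = 50   # assumed per-link bidirectional bandwidth
LINKS_PER_GPU = 4   # assumed link budget per GPU in this toy example

# Hard-wired link assignment for 4 GPUs: (gpu_a, gpu_b) -> number of links.
# Note each GPU's links sum to LINKS_PER_GPU.
mesh = {
    (0, 1): 2, (0, 2): 1, (0, 3): 1,
    (1, 2): 1, (1, 3): 1, (2, 3): 2,
}

def pair_bandwidth(a: int, b: int) -> int:
    """Direct bandwidth between two GPUs in GB/s (0 if not directly wired)."""
    links = mesh.get((min(a, b), max(a, b)), 0)
    return links * LINK_BW_GBPS

print(pair_bandwidth(0, 1))  # 100 -- two links wired between GPU 0 and 1
print(pair_bandwidth(0, 2))  # 50  -- only one link between GPU 0 and 2
```

The asymmetry is the weakness NVSwitch was built to remove: in a mesh, making one pair faster necessarily starves another.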
NVSwitch: The Fabric
NVSwitch is the silicon switch that sits *between* GPUs. Instead of hard-wiring GPU 0 to GPU 1, all GPUs plug into NVSwitch. This provides an all-to-all, non-blocking fabric inside the server.
- Enables any-to-any communication at full speed
- Foundation of the HGX and DGX systems
- Powers the external NVLink Switch for pods
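The "non-blocking" property can be sketched as a scheduling rule: as long as each GPU is the endpoint of at most one flow, every flow gets that GPU's full injection bandwidth, because there is no contention inside the switch itself. This is a simplified model, not NVSwitch's actual arbitration logic.

```python
# Sketch of a non-blocking switch fabric: flows between disjoint GPU pairs
# all run at full speed; the only contention is at the GPU endpoints.

AGGREGATE_BW_GBPS = 900  # per-GPU NVLink 4.0 aggregate (H100)

def schedule(flows):
    """flows: list of (src, dst) GPU pairs. Returns per-flow bandwidth in
    GB/s, or raises if a GPU appears in two flows (endpoint contention)."""
    endpoints = [gpu for flow in flows for gpu in flow]
    if len(endpoints) != len(set(endpoints)):
        raise ValueError("endpoint contention: a GPU appears in two flows")
    # Non-blocking fabric: no internal contention, every flow gets full speed.
    return {flow: AGGREGATE_BW_GBPS for flow in flows}

# Four disjoint pairs across 8 GPUs all transfer at 900 GB/s simultaneously.
print(schedule([(0, 1), (2, 3), (4, 5), (6, 7)]))
```

In a point-to-point mesh, by contrast, those four concurrent flows would each be limited by however many links happened to be wired between their endpoints.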
The 900 GB/s Bottleneck
An H100 (Hopper) GPU has 18 NVLink 4.0 links. Each link provides 50 GB/s of bidirectional bandwidth (25 GB/s per direction), for 900 GB/s of total aggregate bandwidth per GPU. Compare this to PCIe Gen5 x16, which offers only ~64 GB/s per direction. Without NVLink, model parallelism (where a single model layer is split across GPUs) would be impractical: the communication time would dwarf the compute time.
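The arithmetic is easy to check directly. The 2 GB tensor size below is a made-up placeholder, and this ignores latency, protocol overhead, and overlap with compute; it is only meant to show the order-of-magnitude gap.

```python
# Back-of-envelope: NVLink vs PCIe transfer time for a single 2 GB tensor
# (made-up size) moving between two GPUs.

LINKS = 18
LINK_BW_GBPS = 50                   # bidirectional bandwidth per NVLink 4.0 link
NVLINK_BW = LINKS * LINK_BW_GBPS    # 18 * 50 = 900 GB/s aggregate per H100
PCIE_GEN5_X16_BW = 64               # ~GB/s per direction

tensor_gb = 2.0
print(f"NVLink: {tensor_gb / NVLINK_BW * 1e3:.1f} ms")        # ~2.2 ms
print(f"PCIe:   {tensor_gb / PCIE_GEN5_X16_BW * 1e3:.1f} ms") # ~31 ms
```

A ~14x gap per transfer, repeated every layer of every step, is the difference between model parallelism being routine and being a non-starter.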
Scale-Up vs. Scale-Out
**Scale-Up** refers to making a single node bigger (NVLink). **Scale-Out** refers to connecting multiple nodes together (InfiniBand/Ethernet).
NVIDIA's recent innovation is the **NVLink Switch System**, which uses external cables to extend the NVLink fabric beyond the 8-GPU node. In a Blackwell NVL72 rack, 72 GPUs are interconnected into a single NVLink domain, letting software treat the rack as one enormous accelerator with a pooled memory space.
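The scale-up jump can be put in numbers. The per-GPU figure comes from the Blackwell generation discussed below; treating "domain size x per-GPU bandwidth" as the aggregate fabric bandwidth is a simplification for illustration.

```python
# Scale-up arithmetic: 8-GPU node vs 72-GPU NVLink domain (illustrative).

PER_GPU_NVLINK_TBPS = 1.8  # Blackwell-generation per-GPU NVLink bandwidth

node_tbps = round(8 * PER_GPU_NVLINK_TBPS, 1)   # classic HGX node
rack_tbps = round(72 * PER_GPU_NVLINK_TBPS, 1)  # NVL72-style rack domain

print(node_tbps)  # 14.4  (TB/s across an 8-GPU domain)
print(rack_tbps)  # 129.6 (TB/s across a 72-GPU domain)
```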
Next Gen: Copper vs. Optics
As per-GPU NVLink bandwidth hits 1.8 TB/s in the Blackwell generation, the physical reach of copper traces is becoming a hard constraint. NVIDIA is moving toward **NVLink over Optics** and high-density copper cable cartridges to bridge the distance between Blackwell compute trays and the NVSwitch spine.
