NVLink vs. NVSwitch: The Memory Fabric
Scaling the GPU Node beyond PCIe limits
In the world of AI training, the limiting factor isn't just compute (TFLOPS); it's the speed at which GPUs can share their local memory state. While PCIe remains the standard for host-to-device communication, it is woefully inadequate for the multi-GPU peer-to-peer traffic required by model parallelism. This is where **NVLink** and **NVSwitch** come in—creating a unified memory address space that makes 8 GPUs look like a single giant processor.
NVLink: Point-to-Point
Originally introduced to bypass the PCIe bottleneck, NVLink is a high-speed, wire-level interconnect protocol. In early generations (Pascal P100), the topology was a point-to-point mesh: every GPU had a fixed number of links it could dedicate to specific neighbors, so the bandwidth between any two GPUs depended on how many links were wired directly between them.
- NVLink 4.0: 900 GB/s aggregate bandwidth per H100
- High bandwidth, low latency (sub-microsecond)
- Memory coherency: remote GPU memory is accessible via direct loads/stores
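To see why a fixed point-to-point mesh gets awkward, consider a toy bandwidth model. The link counts and 4-GPU wiring below are illustrative placeholders, not a real NVIDIA topology; the point is only that pairwise bandwidth is set at wiring time.

```python
# Toy model of a point-to-point NVLink-style mesh (illustrative numbers,
# not a real topology). Each GPU has a fixed link budget, and the bandwidth
# between a pair of GPUs depends on how many links are wired between them.

LINK_BW_GBPS = 50   # assumed per-link bidirectional bandwidth
LINKS_PER_GPU = 4   # assumed link budget per GPU in this toy example

# Hard-wired link assignment for 4 GPUs: (gpu_a, gpu_b) -> number of links.
# Note each GPU's links sum to LINKS_PER_GPU.
mesh = {
    (0, 1): 2, (0, 2): 1, (0, 3): 1,
    (1, 2): 1, (1, 3): 1, (2, 3): 2,
}

def pair_bandwidth(a: int, b: int) -> int:
    """Direct bandwidth between two GPUs in GB/s (0 if not directly wired)."""
    links = mesh.get((min(a, b), max(a, b)), 0)
    return links * LINK_BW_GBPS

print(pair_bandwidth(0, 1))  # 100 -- two links wired between GPU 0 and 1
print(pair_bandwidth(0, 2))  # 50  -- only one link between GPU 0 and 2
```

The asymmetry is the weakness NVSwitch was built to remove: in a mesh, making one pair faster necessarily starves another.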
NVSwitch: The Fabric
NVSwitch is the silicon switch that sits *between* GPUs. Instead of hard-wiring GPU 0 to GPU 1, all GPUs plug into NVSwitch. This provides an all-to-all, non-blocking fabric inside the server.
- Enables any-to-any communication at full speed
- Foundation of the HGX and DGX systems
- Powers the external NVLink Switch for pods
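The "non-blocking" property can be sketched as a scheduling rule: as long as each GPU is the endpoint of at most one flow, every flow gets that GPU's full injection bandwidth, because there is no contention inside the switch itself. This is a simplified model, not NVSwitch's actual arbitration logic.

```python
# Sketch of a non-blocking switch fabric: flows between disjoint GPU pairs
# all run at full speed; the only contention is at the GPU endpoints.

AGGREGATE_BW_GBPS = 900  # per-GPU NVLink 4.0 aggregate (H100)

def schedule(flows):
    """flows: list of (src, dst) GPU pairs. Returns per-flow bandwidth in
    GB/s, or raises if a GPU appears in two flows (endpoint contention)."""
    endpoints = [gpu for flow in flows for gpu in flow]
    if len(endpoints) != len(set(endpoints)):
        raise ValueError("endpoint contention: a GPU appears in two flows")
    # Non-blocking fabric: no internal contention, every flow gets full speed.
    return {flow: AGGREGATE_BW_GBPS for flow in flows}

# Four disjoint pairs across 8 GPUs all transfer at 900 GB/s simultaneously.
print(schedule([(0, 1), (2, 3), (4, 5), (6, 7)]))
```

In a point-to-point mesh, by contrast, those four concurrent flows would each be limited by however many links happened to be wired between their endpoints.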
The 900 GB/s Bottleneck
An H100 (Hopper) GPU has 18 NVLink 4.0 links. Each link provides 50 GB/s of bidirectional bandwidth (25 GB/s per direction), for 900 GB/s of total aggregate bandwidth per GPU. Compare this to PCIe Gen5 x16, which offers only ~64 GB/s per direction. Without NVLink, model parallelism (where a single model layer is split across GPUs) would be impractical: the communication time would dwarf the compute time.
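The arithmetic is easy to check directly. The 2 GB tensor size below is a made-up placeholder, and this ignores latency, protocol overhead, and overlap with compute; it is only meant to show the order-of-magnitude gap.

```python
# Back-of-envelope: NVLink vs PCIe transfer time for a single 2 GB tensor
# (made-up size) moving between two GPUs.

LINKS = 18
LINK_BW_GBPS = 50                   # bidirectional bandwidth per NVLink 4.0 link
NVLINK_BW = LINKS * LINK_BW_GBPS    # 18 * 50 = 900 GB/s aggregate per H100
PCIE_GEN5_X16_BW = 64               # ~GB/s per direction

tensor_gb = 2.0
print(f"NVLink: {tensor_gb / NVLINK_BW * 1e3:.1f} ms")        # ~2.2 ms
print(f"PCIe:   {tensor_gb / PCIE_GEN5_X16_BW * 1e3:.1f} ms") # ~31 ms
```

A ~14x gap per transfer, repeated every layer of every step, is the difference between model parallelism being routine and being a non-starter.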
Scale-Up vs. Scale-Out
**Scale-Up** refers to making a single node bigger (NVLink). **Scale-Out** refers to connecting multiple nodes together (InfiniBand/Ethernet).
NVIDIA's recent innovation is the **NVLink Switch System**, which uses external cables to extend the NVLink fabric beyond the 8-GPU node. In a Blackwell NVL72 rack, 72 GPUs are interconnected into a single NVLink domain, letting software treat the rack as one enormous accelerator with a pooled memory space.
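The scale-up jump can be put in numbers. The per-GPU figure comes from the Blackwell generation discussed below; treating "domain size x per-GPU bandwidth" as the aggregate fabric bandwidth is a simplification for illustration.

```python
# Scale-up arithmetic: 8-GPU node vs 72-GPU NVLink domain (illustrative).

PER_GPU_NVLINK_TBPS = 1.8  # Blackwell-generation per-GPU NVLink bandwidth

node_tbps = round(8 * PER_GPU_NVLINK_TBPS, 1)   # classic HGX node
rack_tbps = round(72 * PER_GPU_NVLINK_TBPS, 1)  # NVL72-style rack domain

print(node_tbps)  # 14.4  (TB/s across an 8-GPU domain)
print(rack_tbps)  # 129.6 (TB/s across a 72-GPU domain)
```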
Next Gen: Copper vs. Optics
As per-GPU NVLink bandwidth hits 1.8 TB/s in the Blackwell generation, the physical reach of copper traces is becoming a hard constraint. NVIDIA is moving toward **NVLink over Optics** and high-density copper cable cartridges to bridge the distance between Blackwell compute trays and the NVSwitch spine.
