Bypassing the OS Kernel

Traditional network storage protocols (NFS, iSCSI) rely on a deep stack of kernel drivers, context switches, and memory copies. For AI data loaders (such as PyTorch's DataLoader), this legacy path often becomes the primary bottleneck, back-pressuring the GPUs and leaving them idle between batches.
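The back-pressure effect can be modeled roughly. In the sketch below (all numbers and names are illustrative, not measurements), GPU utilization is bounded by compute time over compute-plus-stall time, and loader prefetching can hide fetch latency only up to a point:

```python
def gpu_utilization(step_compute_ms: float, fetch_ms: float,
                    prefetch_depth: int = 1) -> float:
    """Rough model: a loader with `prefetch_depth` in-flight fetches can
    hide up to that many compute steps' worth of fetch latency; any
    remainder stalls the GPU."""
    hidden = min(fetch_ms, step_compute_ms * prefetch_depth)
    stall = fetch_ms - hidden
    return step_compute_ms / (step_compute_ms + stall)

# Hypothetical numbers: 50 ms of GPU compute per step, 120 ms per fetch.
print(gpu_utilization(50, fetch_ms=120, prefetch_depth=1))  # storage-bound
print(gpu_utilization(50, fetch_ms=120, prefetch_depth=4))  # fully hidden
```

Lowering per-fetch latency (the NVMe-oF pitch) shrinks the stall term directly, rather than relying on ever-deeper prefetch pipelines to paper over it.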

**NVMe-over-Fabrics (NVMe-oF)** extends the NVMe protocol across the network. Over RDMA transports (RoCEv2 or InfiniBand), the storage target's controller writes data directly into the host's memory, achieving millions of IOPS with sub-10 μs latency overhead.
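Little's law (concurrency = throughput × latency) makes the relationship between latency and IOPS concrete. The queue depth and latency figures below are illustrative, not vendor numbers:

```python
def achievable_iops(outstanding_cmds: int, latency_s: float) -> float:
    """Little's law rearranged: throughput = concurrency / latency.
    With `outstanding_cmds` commands in flight and a given end-to-end
    per-command latency, this is the sustained IOPS ceiling."""
    return outstanding_cmds / latency_s

# One queue with 128 outstanding commands at 10 us end-to-end latency:
print(f"{achievable_iops(128, 10e-6):,.0f} IOPS")  # 12.8M from one queue
```

This is why shaving microseconds off the transport matters: at these queue depths, every microsecond of added latency directly subtracts from the IOPS ceiling.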

Deep, Parallel Queues

NVMe supports up to 65,535 I/O queues, each up to 65,536 commands deep. NVMe-oF preserves this parallelism across the wire.
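The aggregate concurrency those spec limits allow is worth spelling out:

```python
# NVMe spec limits: up to 65,535 I/O queues, each up to 65,536 commands deep.
max_io_queues = 65_535
max_queue_depth = 65_536

# Theoretical maximum commands in flight on a single subsystem:
total_outstanding = max_io_queues * max_queue_depth
print(f"{total_outstanding:,} commands in flight")  # ~4.29 billion
```

No real deployment approaches this ceiling, but it means the protocol itself is never the cap on parallelism; the fabric and the media are.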

GDS Integration

When combined with NVIDIA GPUDirect Storage (GDS), data travels from the remote NVMe drive directly into GPU HBM3, bypassing the bounce buffer in CPU system memory; the CPU issues control commands but never touches the data path.

The Transport Layer Decision

NVMe-oF over RDMA

  • Requires RoCEv2 or InfiniBand
  • Zero-Copy Performance
  • Lowest Latency (~5-8μs)

NVMe-oF over TCP

  • Works on Standard Ethernet
  • Easy to Deploy
  • Uses CPU for Transport Logic
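Either transport is set up with the same `nvme connect` command from the standard `nvme-cli` tool; only the transport flag changes. A minimal sketch wrapped in Python (the target address and subsystem NQN below are placeholders, not real endpoints):

```python
def nvme_connect_cmd(transport: str, traddr: str, nqn: str,
                     trsvcid: str = "4420") -> list[str]:
    """Build an `nvme connect` argv for an NVMe-oF target.

    transport: "rdma" (RoCEv2 / InfiniBand) or "tcp" (standard Ethernet).
    """
    if transport not in ("rdma", "tcp"):
        raise ValueError(f"unsupported transport: {transport}")
    return ["nvme", "connect",
            "-t", transport,   # transport type
            "-a", traddr,      # target IP address
            "-s", trsvcid,     # port; 4420 is the NVMe-oF default
            "-n", nqn]         # NVMe Qualified Name of the subsystem

# Placeholder target; pass the list to subprocess.run() on a configured host.
print(nvme_connect_cmd("rdma", "192.0.2.10", "nqn.2024-01.io.example:subsys1"))
```

Swapping `"rdma"` for `"tcp"` is the entire migration path at the initiator, which is why many shops prototype on TCP and move to RDMA once the lossless fabric is in place.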


