The Encapsulation Tax

Unlike native InfiniBand, which runs over its own link and network layers, **RoCE v2 (RDMA over Converged Ethernet)** encapsulates the InfiniBand transport in UDP/IP so that RDMA traffic can traverse standard L3 routers. This lets RDMA scale across large, multi-vendor Ethernet fabrics. It also introduces a significant "encapsulation tax": fixed per-packet header bytes that reduce the effective goodput of the network.

Ethernet + IP + UDP

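The Ethernet header (14 bytes, or 18 with an 802.1Q VLAN tag), the IPv4 header (20 bytes), and the UDP header (8 bytes) form the outer envelope. In a RoCE v2 fabric these 42 to 46 bytes are mandatory on every packet, regardless of payload size. Each Ethernet frame also ends with a 4-byte frame check sequence (FCS).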
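As a quick check, the outer envelope is just the sum of the three headers. The VLAN tag accounts for the upper end of the range, and the FCS is excluded here:

```latex
H_{\text{outer}} = \underbrace{14}_{\text{Ethernet}} + \underbrace{20}_{\text{IPv4}} + \underbrace{8}_{\text{UDP}} = 42\ \text{bytes},
\qquad
H_{\text{outer}}^{\text{VLAN}} = 18 + 20 + 8 = 46\ \text{bytes}
```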

BTH + ICRC

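The InfiniBand Base Transport Header (BTH, 12 bytes) and the Invariant CRC (ICRC, 4 bytes) are carried inside the UDP payload. The BTH immediately follows the UDP header, and the ICRC follows the RDMA payload. The BTH identifies the operation and the destination queue pair. It also carries the packet sequence number that the reliable transport uses for ordering and retransmission. The ICRC protects, end to end, the fields that do not change in transit.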
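To make the 12-byte BTH concrete, here is a minimal sketch that packs its fields with Python's `struct` module. It assumes the standard InfiniBand field layout:

* OpCode
* SE/M/PadCnt/TVer flags
* 16-bit P_Key
* a reserved byte
* 24-bit destination QP
* AckReq bit
* 24-bit PSN

`pack_bth` is a hypothetical helper, the example values are illustrative, and the ICRC calculation is not shown.

```python
import struct


def pack_bth(opcode: int, dest_qp: int, psn: int,
             p_key: int = 0xFFFF, solicited: bool = False,
             migreq: bool = False, pad_count: int = 0,
             ack_req: bool = False) -> bytes:
    """Hypothetical helper: pack a 12-byte InfiniBand Base Transport Header (big-endian)."""
    if not (0 <= dest_qp < (1 << 24) and 0 <= psn < (1 << 24)):
        raise ValueError("dest_qp and psn are 24-bit fields")
    # Byte 1: SE (1 bit) | MigReq (1 bit) | PadCnt (2 bits) | TVer (4 bits, left at 0)
    flags = (int(solicited) << 7) | (int(migreq) << 6) | ((pad_count & 0x3) << 4)
    # Byte 8: AckReq (1 bit) followed by 7 reserved bits
    ack_byte = int(ack_req) << 7
    return struct.pack(
        ">BBHB3sB3s",                # 1 + 1 + 2 + 1 + 3 + 1 + 3 = 12 bytes, no padding
        opcode,
        flags,
        p_key,                       # partition key; 0xFFFF is the default full-member key
        0,                           # reserved byte
        dest_qp.to_bytes(3, "big"),  # destination queue pair number
        ack_byte,
        psn.to_bytes(3, "big"),      # packet sequence number
    )


if __name__ == "__main__":
    # 0x04 is the RC "SEND Only" opcode; the other values are arbitrary examples.
    bth = pack_bth(0x04, dest_qp=0x12, psn=1, ack_req=True)
    print(len(bth), bth.hex())
```

The fixed 12-byte size is the point: every RoCE v2 packet pays it, on top of the outer envelope, no matter how small the payload is.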

Per-Packet Overhead Breakdown

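| Header Layer | RoCE v2 Size | Purpose |
| --- | --- | --- |
| L2 Ethernet header + FCS | 18 bytes | Standard framing (14 B header + 4 B FCS, untagged) |
| L3 IPv4 header | 20 bytes | Makes RDMA traffic routable across L3 boundaries |
| L4 UDP header | 8 bytes | Destination port 4791 marks RoCE v2; the source port provides ECMP entropy |
| IB BTH + ICRC | 16 bytes | RDMA transport header (12 B) plus end-to-end CRC (4 B) |
| L1 inter-packet gap | 12 bytes | Minimum idle time between frames on the wire |
| Total overhead | 74 bytes | 62 B of headers and FCS plus the 12 B gap (the 8 B preamble/SFD is not counted) |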
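Goodput per packet is the payload divided by payload plus overhead. The sketch below applies that to the 74-byte figure. It models only back-to-back, full-size data packets; ACKs, extended transport headers, and the preamble are ignored. The 400 Gb/s line rate and the function names are illustrative assumptions.

```python
# Per-packet overhead components, in bytes (untagged IPv4 frame).
ETH_HDR_FCS = 14 + 4      # Ethernet header + frame check sequence
IPV4_HDR = 20
UDP_HDR = 8
BTH_ICRC = 12 + 4         # Base Transport Header + Invariant CRC
INTER_PACKET_GAP = 12     # L1 idle time; the 8 B preamble/SFD is deliberately excluded

OVERHEAD = ETH_HDR_FCS + IPV4_HDR + UDP_HDR + BTH_ICRC + INTER_PACKET_GAP  # 74 bytes


def goodput_fraction(payload_bytes: int, overhead_bytes: int = OVERHEAD) -> float:
    """Share of wire bytes that carry RDMA payload, for back-to-back full packets."""
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    return payload_bytes / (payload_bytes + overhead_bytes)


def effective_goodput_gbps(line_rate_gbps: float, payload_bytes: int) -> float:
    """Line rate scaled by the payload share of each packet."""
    return line_rate_gbps * goodput_fraction(payload_bytes)


if __name__ == "__main__":
    for payload in (256, 1024, 4096):
        print(f"{payload:>5} B payload: {goodput_fraction(payload):.1%} of wire bytes, "
              f"~{effective_goodput_gbps(400, payload):.1f} Gb/s on a 400 Gb/s link")
```

Because the overhead is fixed, efficiency depends only on payload size. Small messages pay disproportionately.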


Optimizing Effective Bandwidth

To minimize the impact of the header tax, AI infrastructure teams typically focus on two variables:

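  • Jumbo Frames (MTU 9000): RoCE carries RDMA payload in InfiniBand MTU sizes (256 to 4096 bytes). The usable RoCE MTU is the largest of these that fits in the Ethernet MTU. Raising the Ethernet MTU from 1500 to 9000 bytes therefore lifts the per-packet payload from 1024 to 4096 bytes. The 74-byte overhead then falls from about 6.7% of wire bytes to about 1.8%.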
  • Hardware Offloading: RDMA-capable NICs and DPUs such as BlueField-3 build and strip the RoCE headers in silicon. Offload does not shrink the bytes on the wire. It does keep header processing off the host CPU, and with GPUDirect RDMA only payload bytes are written into GPU memory.
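The jumbo-frame effect comes from the discrete InfiniBand MTU sizes. The following sketch picks the largest IB MTU whose RoCE v2 packet fits in a given Ethernet MTU, then applies the 74-byte overhead. It is a simplified model. It assumes only IPv4, UDP, BTH, and ICRC sit inside the IP packet. Real drivers may reserve extra headroom for extended transport headers. The function names are hypothetical.

```python
IB_MTUS = (256, 512, 1024, 2048, 4096)  # payload MTU sizes RoCE inherits from InfiniBand

# Bytes inside the Ethernet MTU (the IP packet) that are not RDMA payload.
# Simplified assumption: IPv4 + UDP + BTH + ICRC only; extended headers are ignored.
ROCE_IN_IP_OVERHEAD = 20 + 8 + 12 + 4

WIRE_OVERHEAD = 74  # per-packet overhead including Ethernet header/FCS and inter-packet gap


def roce_payload_mtu(ethernet_mtu: int) -> int:
    """Largest InfiniBand MTU whose RoCE v2 packet fits in the Ethernet MTU."""
    budget = ethernet_mtu - ROCE_IN_IP_OVERHEAD
    fitting = [m for m in IB_MTUS if m <= budget]
    if not fitting:
        raise ValueError(f"Ethernet MTU {ethernet_mtu} is too small for RoCE")
    return max(fitting)


def overhead_share(payload_bytes: int) -> float:
    """Fraction of wire bytes spent on headers, trailer, and gap."""
    return WIRE_OVERHEAD / (payload_bytes + WIRE_OVERHEAD)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        payload = roce_payload_mtu(mtu)
        print(f"Ethernet MTU {mtu}: {payload} B RDMA payload, "
              f"{overhead_share(payload):.1%} overhead")
```

Note that any Ethernet MTU comfortably above 4096 plus the in-IP headers yields the same 4096-byte payload. The larger 9000-byte setting mainly benefits non-RDMA traffic sharing the link.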
