In a Nutshell

The expansion of the datacenter from physical clusters into software-defined clouds is powered by **Virtual Extensible LAN (VXLAN)**. By encapsulating Layer 2 Ethernet frames within Layer 3 UDP packets, VXLAN enables the migration of virtual machines and containers across routed boundaries. However, this 50-byte "Shim" introduces critical complexity in the underlying L3 MTU budget. This article provides a clinical engineering model for calculating **VXLAN Goodput Efficiency**, mapping the relationship between inner MSS and outer fragmentation, and auditing the performance impact of **VTEP Hardware Offloading** on modern ASICs.

VXLAN Overhead & Efficiency Modeler

A precision simulator for VXLAN overlay fabrics. Model the exact impact of encapsulation on your MTU and total available bandwidth.

MTU vs. MSS Relationship

For TCP traffic traversing EVPN-VXLAN, the **MSS (Maximum Segment Size)** must be reduced to account for the encapsulation. If the underlay path MTU is 1500 bytes, the outer IPv4, UDP, and VXLAN headers plus the inner Ethernet header consume 50 bytes of it, leaving at most 1450 bytes for the inner IP packet. Subtracting the inner IPv4 and TCP headers (40 bytes, assuming no options), the MSS should be set to **1410 bytes** so that full-size segments are neither fragmented nor dropped with an ICMP "Fragmentation Needed" response.
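The MSS arithmetic above can be sketched in a few lines of Python. This is a minimal illustration that assumes an IPv4 underlay, IPv4 inside the tunnel, and no TCP options; the function name `vxlan_tcp_mss` is hypothetical and not taken from any library.

```python
# Bytes that VXLAN consumes from the underlay IP MTU. The outer Ethernet
# header is not counted because an IP MTU excludes the L2 header.
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14

INNER_IPV4 = 20
INNER_TCP = 20  # assumption: no TCP options


def vxlan_tcp_mss(underlay_ip_mtu: int, inner_vlan_tag: bool = False) -> int:
    """Largest TCP MSS that fits in one unfragmented VXLAN packet."""
    encap = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET
    if inner_vlan_tag:
        encap += 4  # an 802.1Q tag on the inner frame is carried in the tunnel
    inner_ip_mtu = underlay_ip_mtu - encap
    mss = inner_ip_mtu - INNER_IPV4 - INNER_TCP
    if mss <= 0:
        raise ValueError("underlay MTU is too small for VXLAN plus TCP/IPv4")
    return mss


print(vxlan_tcp_mss(1500))  # 1410
print(vxlan_tcp_mss(9000))  # 8910
```

With a 9000-byte underlay the result is far above the 1460-byte MSS of a standard 1500-byte host, which is why hosts need no MSS adjustment once the underlay carries jumbo frames.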

Jumbo Frame Necessity

In modern Leaf-Spine AI fabrics, implementing **Jumbo Frames (9000-9216 bytes)** on the underlay is mandatory. This provides sufficient "headroom" for nested encapsulation, multi-level VLAN tagging, and security headers while still allowing the standard 1500-byte client Ethernet frame to pass without fragmentation, significantly reducing CPU interrupts at the VTEP (Virtual Tunnel Endpoint).

1. The Anatomy of 50 Bytes: VXLAN Framing

A VXLAN packet is a "Layer 2 inside Layer 3" structure. Unlike standard VLANs which insert a tag, VXLAN wraps the entire original frame in a new set of headers.

Overhead Breakdown

1. **Outer L2 (Ethernet)**: 14 bytes
2. **Outer IP (IPv4)**: 20 bytes
3. **Outer UDP**: 8 bytes
4. **VXLAN Header**: 8 bytes

Total Overhead: 50 Bytes. If the inner Ethernet frame contains an 802.1Q VLAN tag, the overhead effectively grows by 4 bytes to 54, as that tag is encapsulated along with the rest of the frame.
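As a rough sketch, the breakdown can be expressed as a function that returns each component so the total can be audited. The function name and flags are hypothetical. The IPv6 and outer-VLAN variants are additions for illustration, using the standard 40-byte IPv6 header and 4-byte 802.1Q tag.

```python
def vxlan_overhead(outer_ipv6: bool = False,
                   outer_vlan: bool = False,
                   inner_vlan: bool = False) -> dict[str, int]:
    """Per-packet VXLAN overhead in bytes, broken down by header."""
    parts = {
        "outer_ethernet": 14,
        "outer_ip": 40 if outer_ipv6 else 20,
        "outer_udp": 8,
        "vxlan_header": 8,
    }
    if outer_vlan:
        parts["outer_vlan_tag"] = 4
    if inner_vlan:
        # Carried inside the tunnel, so it counts against the MTU budget.
        parts["inner_vlan_tag"] = 4
    return parts


baseline = vxlan_overhead()
print(baseline, sum(baseline.values()))            # total is 50
print(sum(vxlan_overhead(inner_vlan=True).values()))  # 54
```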

2. The 1550 Rule: Bridging the MTU Gap

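In a native IPv4 network, the standard IP MTU is 1500 bytes. Wrapping a full-size 1500-byte inner IP packet in VXLAN adds the inner Ethernet header (14 bytes), the VXLAN header (8), the outer UDP header (8), and the outer IPv4 header (20), so the outer IP packet becomes 1550 bytes.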

Fragmentation (MTU 1500)

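If the underlay only supports 1500, every full-size VXLAN packet exceeds the path MTU. RFC 7348 forbids VTEPs from fragmenting VXLAN packets, so the packet is either dropped or split into two fragments by an intermediate router. Fragmentation doubles the PPS (Packets Per Second) for the same bandwidth on full-size traffic, crushing firewall and router CPU performance, and the destination VTEP may silently discard the fragments.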

Clean Tunnel (MTU 1550+)

By increasing the underlay MTU to 1550 or 9000 (Jumbo), we ensure zero fragmentation. The inner packet remains 1500, and standard host stacks require no modification.
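The two cases can be checked with a short sketch that counts how many outer IPv4 packets one inner packet turns into. It models what an intermediate router would do if it were allowed to fragment (DF bit clear); many deployments drop the packet instead. The helper names are hypothetical.

```python
import math

VXLAN_ENCAP = 14 + 8 + 8 + 20  # inner Ethernet + VXLAN + outer UDP + outer IPv4
IPV4_HEADER = 20


def outer_ip_length(inner_ip_len: int) -> int:
    """Length of the outer IPv4 packet that carries one inner IP packet."""
    return inner_ip_len + VXLAN_ENCAP


def underlay_fragments(inner_ip_len: int, underlay_ip_mtu: int) -> int:
    """Number of outer IPv4 packets needed to carry one inner IP packet."""
    outer_len = outer_ip_length(inner_ip_len)
    if outer_len <= underlay_ip_mtu:
        return 1
    # Each fragment repeats the IPv4 header, and every non-final fragment
    # carries a payload that is a multiple of 8 bytes.
    per_fragment = (underlay_ip_mtu - IPV4_HEADER) // 8 * 8
    return math.ceil((outer_len - IPV4_HEADER) / per_fragment)


print(outer_ip_length(1500))           # 1550
print(underlay_fragments(1500, 1500))  # 2
print(underlay_fragments(1500, 1550))  # 1
```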

3. VNI Scalability: Beyond the 4096 Limit

The primary driver for VXLAN wasn't encapsulation, but the exhaustion of VLAN IDs.

24-Bit Identifier

1. **VLAN**: 12 bits = 4,096 segments. (Legacy Enterprise)
2. **VXLAN VNI**: 24 bits = 16,777,216 segments. (Hyperscale Cloud)
3. **Entropy**: The source UDP port is derived from a hash of the inner L2/L3/L4 headers, so underlay switches can spread flows across ECMP (Equal-Cost Multi-Path) links using only the outer headers, without inspecting the inner packet.
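The 24-bit VNI and the hashed source port can both be illustrated with the standard library. The header layout (8 flag bits with the I flag set, 24 reserved bits, the 24-bit VNI, 8 reserved bits) and destination port 4789 follow RFC 7348. The CRC32 hash and the function names are assumptions chosen for illustration, since real VTEPs use their own hash functions.

```python
import struct
import zlib

VXLAN_UDP_DST_PORT = 4789  # IANA-assigned VXLAN port
I_FLAG = 0x08              # marks the VNI field as valid


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", I_FLAG << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from an 8-byte VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header)
    if not (flags_word >> 24) & I_FLAG:
        raise ValueError("I flag not set; VNI is not valid")
    return vni_word >> 8


def entropy_source_port(src_ip: str, dst_ip: str, proto: int,
                        src_port: int, dst_port: int) -> int:
    """Map an inner flow to a source port in the dynamic range 49152-65535."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return 49152 + zlib.crc32(key) % 16384


header = pack_vxlan_header(5000)
print(header.hex(), unpack_vni(header))
print(entropy_source_port("10.0.0.1", "10.0.0.2", 6, 33000, 443))
```

Because the port depends only on the inner flow tuple, every packet of one flow takes the same underlay path, while different flows between the same pair of VTEPs can land on different ECMP links.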

4. VTEP Forensics: Hardware vs. Software Endpoints

A **VTEP** is where the VXLAN magic happens. It can be a software switch (for example, the Linux kernel) or a hardware switch ASIC (as found in Arista and Cisco platforms).

Technical Standards & References

Mahalingam, M. et al. (IETF). RFC 7348: Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks
Arista Networks. Arista: VXLAN Architecture and Troubleshooting
VMware. VMware NSX: VXLAN Implementation in Software-Defined Datacenters
Ivan Pepelnjak. Packet Overhead and Fragmentation in VXLAN Overlays
Mathematical models derived from standard engineering protocols. Not for human safety critical systems without redundant validation.

Related Engineering Resources

VXLAN Encapsulation Penalty: MTU Fragmentation, UDP Checksum Offload, and the 50-Byte Tax

VXLAN adds a 50-byte overhead to every packet: 14 bytes of outer Ethernet, 20 bytes of outer IP, 8 bytes of outer UDP, and 8 bytes of the VXLAN header (flags + VNI + reserved). If the underlay is left at the standard IP MTU of 1500 bytes, the inner IP packet is limited to 1450 bytes. Conversely, a full-size 1500-byte inner packet produces a 1550-byte outer IP packet (a 1564-byte outer Ethernet frame before the FCS), which exceeds the 1500-byte MTU of downstream switches. When the physical path MTU is not increased (for example, to 9216 bytes for jumbo frames, a common setting in modern data centers), the 50-byte overhead forces either a smaller MSS or fragmentation of full-length TCP segments, reducing goodput by up to 3.3% for full-length packets. In practice, the switch MTU is usually configured to 9216 bytes across the fabric, and on a host running a software VTEP the NIC MTU must be explicitly set to at least 1550 bytes (1500 + 50), commonly 9000, so that 1500-byte tenant packets are encapsulated without fragmentation.
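The goodput penalty can be estimated with a small sketch. It assumes full-size TCP/IPv4 segments without options and measures efficiency as TCP payload divided by the bytes of the outer Ethernet frame, ignoring the FCS, preamble, and inter-frame gap. The function name is hypothetical.

```python
ETHERNET_HEADER = 14
VXLAN_ENCAP_IN_MTU = 50  # outer IPv4 + outer UDP + VXLAN + inner Ethernet
TCP_IPV4_HEADERS = 40    # assumption: no IP or TCP options


def goodput_efficiency(underlay_ip_mtu: int, vxlan: bool = True) -> float:
    """TCP payload bytes per byte of outer Ethernet frame, for full-size packets."""
    inner_ip = underlay_ip_mtu - (VXLAN_ENCAP_IN_MTU if vxlan else 0)
    payload = inner_ip - TCP_IPV4_HEADERS
    frame = ETHERNET_HEADER + underlay_ip_mtu
    return payload / frame


for mtu in (1500, 9000):
    native = goodput_efficiency(mtu, vxlan=False)
    overlay = goodput_efficiency(mtu)
    print(f"MTU {mtu}: native {native:.1%}, VXLAN {overlay:.1%}")
```

At a 1500-byte underlay the gap between the two figures is 50/1514, or about 3.3 percentage points; at 9000 bytes it shrinks to well under one point, which is the quantitative case for jumbo frames.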

The UDP checksum behavior in VXLAN is a common source of performance degradation. RFC 7348 specifies that the outer UDP checksum SHOULD be transmitted as zero, which avoids the cost of checksum computation on the encapsulating host, and that a receiver MUST accept a zero-checksum packet for decapsulation. However, some hardware switches and NICs reportedly mishandle received VXLAN packets whose checksum field is zero. On the Intel XL710 40G NIC, for example, zero-checksum VXLAN packets are reported to increment a "checksum error" hardware counter and push the packet onto an exception path with reduced throughput. The workaround is to set the outer UDP checksum to a valid computed value, which by this article's estimate adds approximately 15 ns of latency per packet on the transmit side but avoids a 2-3 μs exception-path penalty on receive.

The VNI (VXLAN Network Identifier) field, at 24 bits, supports roughly 16 million logical networks, far exceeding the 4096-VLAN limit of 802.1Q. However, the EVPN control plane (RFC 7432) typically limits the practical VNI count per VTEP (VXLAN Tunnel Endpoint) to approximately 4,000-8,000 due to BGP route processing overhead. Each L2 VNI maps to its own MAC-VRF instance, and each tenant's L3 VNI maps to an IP-VRF (Virtual Routing and Forwarding) that consumes TCAM entries. For a VTEP with 8,000 VNIs and 1,000 MAC addresses per VNI, the MAC address table would grow to 8 million entries, far beyond the MAC table capacity of typical merchant silicon ASICs (288K entries in this model). This is why large multi-tenant deployments, such as public clouds, rely on explicit segmentation: the hypervisor allocates tenant VMs to a pre-assigned VNI pool with strict enforcement of per-VTEP VNI quotas, preventing MAC table exhaustion at the physical switch layer.
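The table-sizing argument is simple multiplication, sketched below. The 288,000-entry capacity is the figure quoted in the text, treated here as an example parameter rather than a property of any specific ASIC. The function names are hypothetical.

```python
def mac_table_demand(vni_count: int, macs_per_vni: int) -> int:
    """Total MAC entries a VTEP must hold if every VNI is fully populated."""
    return vni_count * macs_per_vni


def max_vnis_within_capacity(macs_per_vni: int, mac_capacity: int) -> int:
    """Largest per-VTEP VNI quota that keeps the MAC table within capacity."""
    if macs_per_vni <= 0:
        raise ValueError("macs_per_vni must be positive")
    return mac_capacity // macs_per_vni


EXAMPLE_MAC_CAPACITY = 288_000  # example value taken from the text

demand = mac_table_demand(8_000, 1_000)
print(demand, demand > EXAMPLE_MAC_CAPACITY)  # 8000000 True
print(max_vnis_within_capacity(1_000, EXAMPLE_MAC_CAPACITY))  # 288
```

A per-VTEP quota derived this way is the kind of limit a hypervisor-side allocator would enforce before placing a tenant VM.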

Geneve vs. VXLAN Comparison: Flexible TLV Options, Control Plane Integration, and Multi-Protocol Encapsulation

Geneve (Generic Network Virtualization Encapsulation, RFC 8926) was designed by the IETF NVO3 working group to address the limitations of VXLAN's fixed-format 8-byte header. Unlike VXLAN's rigid structure (8 flag bits + 24 reserved bits + 24-bit VNI + 8 reserved bits), Geneve appends variable-length Type-Length-Value (TLV) options to its 8-byte base header. The protocol permits up to 252 bytes of options; this article models a 64-byte option budget. The virtual network context (VNI, 24 bits, same as VXLAN) lives in the base header, while options can carry context information such as: the flow metadata (source and destination VM UUIDs for multi-tenant telemetry), the group policy identifier (for micro-segmentation, per Cisco ACI and VMware NSX), the OAM (Operations, Administration, and Maintenance) indicators (for in-band network telemetry, IOAM), and the NSH (Network Service Header) context for service function chaining.

The base Geneve header is 8 bytes (same as VXLAN), so the total encapsulation overhead ranges from 50 bytes (no options) to 114 bytes with the full 64-byte budget. At that budget, the per-packet overhead is 7.6% of a 1,500-byte MTU, compared to VXLAN's 3.3%. The bandwidth efficiency loss must be weighed against the operational benefits of the additional context: for a 100 Gbps link with 10,000 flows of full-size packets, Geneve with 64-byte options consumes roughly 7.6 Gbps of overhead (7.6%), while VXLAN consumes roughly 3.3 Gbps (3.3%). The 4.3 Gbps difference represents approximately 4.3% of link capacity reserved for metadata. Some operators consider that cost acceptable for the debugging and telemetry benefits, while others strictly limit Geneve options to 8-16 bytes (yielding 58-66 bytes of overhead, only modestly above VXLAN's 50).
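The overhead figures can be reproduced with a short sketch. It follows the text in expressing overhead as a fraction of a 1,500-byte MTU, and it validates option length against the 4-byte granularity and 6-bit length field that RFC 8926 defines. The function names are hypothetical.

```python
OUTER_ETHERNET = 14
OUTER_IPV4 = 20
OUTER_UDP = 8
GENEVE_BASE = 8


def geneve_overhead(option_bytes: int = 0) -> int:
    """Total Geneve encapsulation overhead in bytes for a given option length."""
    # Opt Len is a 6-bit count of 4-byte words, so options span 0..252 bytes.
    if option_bytes % 4 != 0 or not 0 <= option_bytes <= 252:
        raise ValueError("option length must be a multiple of 4 in 0..252")
    return OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + GENEVE_BASE + option_bytes


def overhead_share(overhead: int, mtu: int = 1500) -> float:
    """Overhead as a fraction of the MTU, the ratio used in the text."""
    return overhead / mtu


for opts in (0, 64):
    total = geneve_overhead(opts)
    print(f"{opts:>2} option bytes: {total} bytes, {overhead_share(total):.1%}")
```

Multiplying the share by the link rate gives the bandwidth figures in the text, for example 0.076 × 100 Gbps ≈ 7.6 Gbps for the 64-byte case.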

The control plane integration difference between VXLAN and Geneve is more architectural than protocol-level. VXLAN originally specified a flood-and-learn data plane (VTEPs learn MAC-to-VTEP mappings by observing ARP and data-plane broadcasts), which was later retrofitted with the EVPN control plane (BGP MP-REACH/MP-UNREACH NLRI for Type-2 and Type-3 routes). Geneve was designed from the start to integrate with a centralized or distributed control plane (OVSDB, NETCONF/YANG, or BGP EVPN), and the 16-bit Option Class field in each Geneve option allows different vendors and control plane protocols to define their own option TLVs without conflicting. The option classes cited in this model include Class 0x0100 (OVS tunnel metadata), Class 0x8000 (VMware NSX logical switch context), Class 0x8001 (NSX logical router context), and Class 0x0050 (IOAM trace data); actual assignments are recorded in the IANA Geneve Option Class registry. The control plane tells the VTEP which option classes to insert for each tunnel, and the receiving VTEP parses only the option classes it recognizes. Unknown options are silently skipped unless their critical bit is set, in which case RFC 8926 requires the packet to be dropped. This extensibility means that Geneve can carry VMware NSX's distributed firewall state (the security group tag) alongside an IOAM telemetry trace without requiring a protocol extension: the option TLVs are defined independently and combined at the sending VTEP.
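The skip behavior can be sketched with a small TLV parser. The option layout (16-bit class, 8-bit type whose high bit marks a critical option, 3 reserved bits, 5-bit length in 4-byte words) follows RFC 8926. The class values 0xFF01 and 0xFF02 and the function names are hypothetical, chosen only for illustration.

```python
import struct

CRITICAL_BIT = 0x80


def pack_option(opt_class: int, opt_type: int, data: bytes) -> bytes:
    """Encode one Geneve option: 4-byte option header followed by data."""
    if len(data) % 4 != 0 or len(data) > 124:
        raise ValueError("option data must be a multiple of 4, at most 124 bytes")
    return struct.pack("!HBB", opt_class, opt_type, len(data) // 4) + data


def parse_options(buf: bytes, known_classes: set[int]) -> list[tuple[int, int, bytes]]:
    """Return recognized options; skip unknown ones unless marked critical."""
    recognized = []
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < 4:
            raise ValueError("truncated option header")
        opt_class, opt_type, length_byte = struct.unpack_from("!HBB", buf, offset)
        data_len = (length_byte & 0x1F) * 4
        data = buf[offset + 4: offset + 4 + data_len]
        if len(data) != data_len:
            raise ValueError("truncated option data")
        if opt_class in known_classes:
            recognized.append((opt_class, opt_type, data))
        elif opt_type & CRITICAL_BIT:
            raise ValueError("unknown critical option; packet must be dropped")
        offset += 4 + data_len
    return recognized


options = pack_option(0xFF01, 0x01, b"\x00\x00\x00\x2a") + \
          pack_option(0xFF02, 0x02, b"\x00" * 8)
print(parse_options(options, known_classes={0xFF01}))
```

The second option belongs to a class the receiver does not know, so the parser steps over it using the length field alone, which is what lets independently defined options coexist in one packet.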

The multi-protocol encapsulation capability of Geneve supports both Ethernet frames (L2, same as VXLAN) and NSH (Network Service Header) for service function chaining, enabling Geneve to carry traffic between virtual switches and virtual network functions (VNFs) without requiring separate encapsulation protocols. An NSH-encapsulated packet within Geneve has the format: Geneve header (8 bytes) + NSH (modeled here as 16 bytes, including the Service Path Header) + original payload. The total overhead for NSH-in-Geneve is: 14 (outer Eth) + 20 (outer IP) + 8 (outer UDP) + 8 (Geneve) + 16 (NSH) = 66 bytes, compared to 50 bytes for standard VXLAN or 50 bytes for VXLAN-GPE (VXLAN Generic Protocol Extension, an IETF draft rather than part of RFC 8926). The 16 bytes of NSH overhead represent a 32% increase over VXLAN-GPE, but they enable the NFV (Network Functions Virtualization) service chain to identify and process the packet's service path without deep packet inspection: the NSH Service Path Identifier (SPI) selects the ordered list of VNFs the packet must traverse, and the Service Index (SI) tracks the packet's position in that list. Our overhead model computes the per-service-path encapsulation tax: for a packet traversing 5 VNFs (firewall, IDS, load balancer, WAN optimizer, DPI), the NSH path length is 5, and the NSH header must be present at every hop until the packet egresses the service chain (where the NSH is stripped). The overhead across the service chain is: ingress VTEP (66 bytes), each VNF hop (packet processed with NSH intact, no additional overhead), egress VTEP (50 bytes after NSH stripped). Summed over the ingress and egress segments, the model counts 66 + 5 × 0 + 50 = 116 bytes per packet, compared to 50 bytes per hop for VXLAN without service chaining (which requires the VNF to use a separate encapsulation or in-band metadata that adds complexity). The Geneve-NSH overhead is justifiable when the operational savings of automated service chaining exceed the bandwidth cost of 66 bytes per packet.
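The service-chain accounting above can be written out as a sketch. It takes the 16-byte NSH size from the text as a default parameter; under RFC 8300 an NSH with MD Type 1 is 24 bytes and a minimal MD Type 2 header is 8 bytes, so the parameter is left adjustable. The function names are hypothetical.

```python
OUTER_ETHERNET = 14
OUTER_IPV4 = 20
OUTER_UDP = 8
TUNNEL_BASE_HEADER = 8  # Geneve base header, same size as the VXLAN header


def segment_overhead(nsh_bytes: int = 0) -> int:
    """Encapsulation overhead on one tunnel segment, with or without NSH."""
    return OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + TUNNEL_BASE_HEADER + nsh_bytes


def chain_overhead(vnf_count: int, nsh_bytes: int = 16) -> dict[str, int]:
    """Overhead tally for one packet crossing a service chain, per the text's model."""
    ingress = segment_overhead(nsh_bytes)  # NSH is added at the ingress VTEP
    added_per_vnf = 0                      # VNFs forward with the NSH intact
    egress = segment_overhead()            # NSH is stripped before egress
    return {
        "ingress": ingress,
        "vnf_hops": vnf_count * added_per_vnf,
        "egress": egress,
        "total": ingress + vnf_count * added_per_vnf + egress,
    }


print(chain_overhead(5))
# {'ingress': 66, 'vnf_hops': 0, 'egress': 50, 'total': 116}
```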

The hardware offload maturity difference between VXLAN and Geneve is the practical deployment constraint. VXLAN encapsulation/decapsulation is implemented in hardware by virtually every switch ASIC since 2015 (Broadcom Tomahawk, Jericho, Trident 3+; Intel Tofino; Cisco Silicon One Q100/Q200). Geneve hardware offload is available in ASICs from 2020 onward (Broadcom Trident 4, Jericho 2c+; Intel Tofino 2/3; NVIDIA Mellanox Spectrum-3/4), with the caveat that Geneve option parsing is limited to 2-4 option TLVs per packet in most ASICs (the number of hardware option parsers in the ASIC's pipeline). Exceeding the hardware option limit forces the packet to the CPU (software processing at 1-10 Gbps versus hardware line rate at 400 Gbps), negating the performance benefit of Geneve's flexibility. Our overhead model includes a "Geneve option limit check" that compares the number of option TLVs required by the control plane against the ASIC's option parser capacity and alerts the operator if the option count exceeds the hardware limit. The operator must then choose either a reduced option set or a software VTEP that can handle the full option set at reduced throughput. For clusters where 400 Gbps line-rate processing is mandatory (AI training fabrics), VXLAN remains the safer choice because its fixed header has no option parsing requirements, so hardware offload does not depend on the ASIC's parser capacity.
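A minimal sketch of such an option limit check follows. It is an illustration of the comparison described above, not the tool's actual implementation; the class and function names are hypothetical, and the parser capacity must come from the ASIC vendor's documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OptionLimitResult:
    within_hardware_limit: bool
    expected_path: str  # "hardware" or "software"
    excess_tlvs: int


def check_geneve_option_limit(required_tlvs: int,
                              asic_parser_slots: int) -> OptionLimitResult:
    """Compare the TLVs the control plane wants against the ASIC's parser capacity."""
    if required_tlvs < 0 or asic_parser_slots < 0:
        raise ValueError("counts must be non-negative")
    excess = max(0, required_tlvs - asic_parser_slots)
    if excess == 0:
        return OptionLimitResult(True, "hardware", 0)
    # Packets with more options than the pipeline can parse are punted to the CPU.
    return OptionLimitResult(False, "software", excess)


result = check_geneve_option_limit(required_tlvs=6, asic_parser_slots=4)
if not result.within_hardware_limit:
    print(f"ALERT: {result.excess_tlvs} option TLV(s) over the hardware limit; "
          f"expect the {result.expected_path} path")
```

An operator could run this check per tunnel whenever the control plane changes the option set, and reject any change that would silently move traffic off the hardware path.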
