In a Nutshell

In the hydraulics of a network fabric, packet size is destiny. The Maximum Transmission Unit (MTU) and Maximum Segment Size (MSS) define the 'bore' of the network pipe. Choose them correctly, and the fabric hums with wire-speed efficiency; choose them poorly, and the network collapses under the 'Fragmentation Tax' or disappears into 'MTU Black Holes.' This 4,000-word Masterwork deconstructs the forensics of packet sizing. We analyze the binary hydraulics of the TCP MSS negotiation, the mechanics of Path MTU Discovery (PMTUD), and the radical impact of overlay encapsulation (VXLAN/GENEVE) on efficiency. Beyond the numbers, we explore the forensics of fragmentation-induced CPU spikes and the 'Silent Killer' of ICMP blocking. This is the definitive engineering guide to the precision calculus of the packet path.
The Layer 2 Ceiling

1. MTU: The Maximum Transmission Unit

The MTU is the largest packet that a network interface can carry in a single frame without fragmentation. On Ethernet, the 'Magic Number' is 1,500 bytes. Almost everything, from your laptop to the switches in the core, defaults to this number.

The Header Forensics

$$\text{Ethernet Payload (MTU)} = 1{,}500\,\text{bytes}$$

These 1,500 bytes include the IP header (20 bytes without options) and the TCP header (20 bytes without options). Therefore, the actual data (the Maximum Segment Size, MSS) is 1,460 bytes. An 802.1Q VLAN tag adds 4 bytes to the frame, but it belongs to the L2 framing: most modern NICs still allow 1,500 payload bytes *exclusive* of the L2 framing.

The Layer 4 Treaty

2. MSS: The Maximum Segment Size Negotiation

Unlike the MTU, which is a property of the link and interface, the MSS is a per-connection agreement between two TCP hosts. There is no back-and-forth bargaining: during the 3-Way Handshake, each side simply says: 'I can accept a payload up to X bytes.'

RFC 879: The MSS Logic

$$\text{MSS} = \text{MTU}_{\text{local}} - 40\,\text{bytes}$$

If H1 has an MTU of 1,500, it sends a SYN with MSS=1,460. If H2 is on a VPN with an MTU of 1,400, it responds with MSS=1,360. Each sender then limits its segments to the peer's advertised value and its own MTU, so the session effectively runs at the *lowest common denominator*: 1,360 bytes.
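The H1/H2 exchange can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming IPv4 and TCP headers without options (20 bytes each); the function names are illustrative and not part of any TCP stack.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def advertised_mss(local_mtu: int) -> int:
    """MSS a host announces in its SYN: local MTU minus IP and TCP headers."""
    return local_mtu - IPV4_HEADER - TCP_HEADER


def effective_mss(mtu_a: int, mtu_b: int) -> int:
    """Segment size the session ends up using: the smaller advertised MSS."""
    return min(advertised_mss(mtu_a), advertised_mss(mtu_b))


if __name__ == "__main__":
    print("H1 advertises", advertised_mss(1500))
    print("H2 advertises", advertised_mss(1400))
    print("Session uses ", effective_mss(1500, 1400))
```

Note that this only accounts for the two end hosts. A smaller link in the middle of the path is invisible to the handshake, which is the gap Path MTU Discovery fills.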

The Path Discovery

3. Path MTU Discovery: The Feedback Loop

How does a server in California know that a home router in London has an MTU of 1,492 (PPPoE)? PMTUD.

The DF-Bit Protocol

  1. Sender sets the Don't Fragment (DF) bit in the IP header.
  2. A router whose next-hop link has a smaller MTU than the packet cannot forward it and drops it.
  3. Router sends back an ICMP Type 3 Code 4 (Fragmentation Needed) carrying the next-hop MTU.
  4. Sender receives the ICMP, updates its path MTU, and resends.
The Silent Killer:

Many security 'experts' block ICMP 'for security.' This breaks PMTUD, creating an MTU Black Hole. Small packets (SYNs) work, but full-sized data packets are silently discarded. The connection hangs indefinitely. NEVER BLOCK ALL ICMP.
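The feedback loop and its failure mode can be illustrated with a toy simulation. This is a sketch only: the path is modeled as a plain list of link MTUs, `icmp_blocked` stands in for a firewall that drops the Fragmentation Needed message, and `max_attempts` is an arbitrary retry budget rather than real TCP retransmission behavior.

```python
from typing import List, Optional


def discover_path_mtu(link_mtus: List[int], start_mtu: int,
                      icmp_blocked: bool = False,
                      max_attempts: int = 10) -> Optional[int]:
    """Simulate classic PMTUD along a path of link MTUs.

    Returns the discovered path MTU, or None when the sender never
    learns it (an MTU black hole).
    """
    packet_size = start_mtu
    for _ in range(max_attempts):
        # With DF set, the first link too small for the packet drops it.
        bottleneck = next((m for m in link_mtus if m < packet_size), None)
        if bottleneck is None:
            return packet_size  # the packet fits every link on the path
        if icmp_blocked:
            continue  # no ICMP arrives, so the sender retries the same size
        packet_size = bottleneck  # ICMP Type 3 Code 4 reported the next-hop MTU
    return None


if __name__ == "__main__":
    path = [1500, 1492, 1500]  # hypothetical path with a PPPoE hop
    print(discover_path_mtu(path, 1500))
    print(discover_path_mtu(path, 1500, icmp_blocked=True))
```

With ICMP delivered, the sender converges on 1,492 after one retry. With ICMP blocked, every attempt uses the same oversized packet and the function gives up, which mirrors the hung connection described above.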

The Overhead Cost

4. Encapsulation Hydraulics: VXLAN & Jumbo Frames

Modern fabrics use overlays (VXLAN, GENEVE). These add headers to the packet, creating a 'Sizing Paradox.'

The VXLAN Math

  • Standard Packet: 1,500 bytes.
  • VXLAN Overhead: 50 bytes (Ethernet, IP, UDP, VXLAN).
  • Encapsulated Packet: 1,550 bytes.

If the core network only supports 1,500 bytes, we must either shrink the server MTU to 1,450 (which hurts performance) or enable Jumbo Frames (9,000 bytes) on the physical switches. In 2026, Jumbo Frames are mandatory for any high-performance fabric.
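The VXLAN math above can be written out header by header. This is a minimal sketch that follows the article's accounting (50 bytes added to a 1,500-byte packet) and assumes an IPv4 underlay without VLAN tags; the names are illustrative.

```python
# Per-header VXLAN overhead in bytes (IPv4 underlay, untagged, assumed).
VXLAN_OVERHEAD = {
    "ethernet": 14,
    "ipv4": 20,
    "udp": 8,
    "vxlan": 8,
}
TOTAL_OVERHEAD = sum(VXLAN_OVERHEAD.values())


def encapsulated_size(inner_size: int) -> int:
    """Size after VXLAN encapsulation of an inner packet."""
    return inner_size + TOTAL_OVERHEAD


def max_server_mtu(core_mtu: int) -> int:
    """Largest server MTU that still fits the core after encapsulation."""
    return core_mtu - TOTAL_OVERHEAD


if __name__ == "__main__":
    print("Overhead:", TOTAL_OVERHEAD)
    print("1,500-byte packet becomes", encapsulated_size(1500))
    print("Server MTU on a 1,500-byte core:", max_server_mtu(1500))
    print("Server MTU on a 9,000-byte core:", max_server_mtu(9000))
```

An IPv6 underlay or an 802.1Q tag would change the dictionary values, so the totals here should be treated as the common case rather than a universal constant.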

// Scientific Audit: Verified against RFC 791 (IP), RFC 793 (TCP), RFC 1191 (PMTUD), and RFC 7348 (VXLAN) as of Q2 2026.

Technical Standards & References

IETF, RFC 791: Internet Protocol Specification
IETF, RFC 1191: Path MTU Discovery
IETF, RFC 4821: Packetization Layer Path MTU Discovery
Cloudflare Engineering, The Story of the MTU
Cisco Systems, VXLAN MTU: Design Considerations
Mathematical models derived from standard engineering protocols. Not for human safety critical systems without redundant validation.

Overlay Encapsulation Overhead: VXLAN, GENEVE, and MTU Planning

Modern data center networks rely heavily on overlay encapsulation technologies that add header overhead to every packet, creating MTU challenges that did not exist in traditional flat Ethernet networks. Virtual Extensible LAN (VXLAN), documented in RFC 7348, encapsulates an entire Ethernet frame inside a UDP packet for transport over a Layer 3 IP network. The VXLAN encapsulation adds 50 bytes of overhead: 8 bytes for the VXLAN header (including the 24-bit VXLAN Network Identifier, or VNI), 8 bytes for the outer UDP header (destination port 4789), 20 bytes for the outer IPv4 header, and 14 bytes for the outer Ethernet header. When the inner traffic is sized for a standard 1,500-byte MTU, the encapsulated result is 1,550 bytes, 50 bytes larger than the standard Ethernet MTU of 1,500 bytes. If the physical network's MTU is not increased to accommodate this overhead, the packet would have to be fragmented, which RFC 7348 forbids VTEPs from doing (and many implementations set the Don't Fragment bit on the outer IP header). The result is that the packet is silently dropped by a switch along the path that has an MTU of 1,500 bytes, causing a connectivity failure that is difficult to diagnose because the underlying physical links appear operational.

The solution is to configure a "baby giant" MTU of 1,600 bytes or larger on all physical interfaces in the data center fabric. The industry standard for VXLAN-enabled data centers is an MTU of 9,000 bytes (jumbo frames) on the underlay network, which provides ample headroom for the VXLAN overhead plus any additional encapsulation layers such as IPsec or MACsec. The network engineer must ensure that every device in the underlay path—from the originating server's NIC, through the top-of-rack (ToR) switch, the spine switches, and the destination ToR switch—supports and is configured for the larger MTU. A single switch port with an MTU of 1,500 bytes in the path will silently drop any VXLAN-encapsulated packet that exceeds 1,500 bytes total, causing "black hole" connectivity failures that affect only traffic traversing that particular path. This is why MTU verification is a mandatory step in data center network commissioning: the engineer must test the path MTU between every pair of VTEPs using ICMP packets with the Don't Fragment bit set and increasing payload sizes, confirming that the 1,600-byte (or larger) MTU is supported end-to-end before VXLAN traffic is enabled.
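The commissioning test described above boils down to choosing the right ICMP payload size for each MTU under test. The sketch below generates command lines for the Linux iputils `ping` (`-M do` sets Don't Fragment, `-s` sets the ICMP payload size, `-c` the packet count). The VTEP addresses are hypothetical documentation addresses, and the script only prints commands rather than running them.

```python
from typing import Iterable, List

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu: int) -> int:
    """ICMP echo payload that produces an IP packet of exactly `mtu` bytes."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def df_ping_commands(vtep_ips: Iterable[str], mtus: Iterable[int]) -> List[str]:
    """Build one DF-set ping command per (VTEP, MTU) combination."""
    sizes = [icmp_payload_for_mtu(m) for m in mtus]
    return [f"ping -M do -c 3 -s {size} {ip}" for ip in vtep_ips for size in sizes]


if __name__ == "__main__":
    remote_vteps = ["192.0.2.11", "192.0.2.12"]  # hypothetical VTEP loopbacks
    for command in df_ping_commands(remote_vteps, [1500, 1600, 9000]):
        print(command)
```

A probe that succeeds at the 1,500-byte size but fails at the larger sizes points at a hop that was left at the default MTU.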

GENEVE (Generic Network Virtualization Encapsulation), standardized in RFC 8926, takes the encapsulation concept further by providing a flexible variable-length header format that can carry arbitrary metadata. Unlike VXLAN's fixed 50-byte overhead, GENEVE's overhead varies from 50 bytes (with no options) to roughly 300 bytes (the 8-byte base header is followed by up to 252 bytes of options, the most its 6-bit option-length field can express in 4-byte units). This variable overhead introduces a new challenge for MTU planning: the network engineer cannot assume a single overhead value for all traffic because different encapsulated flows may carry different options. The current best practice is to configure the physical underlay MTU to 9,000 bytes (which accommodates any GENEVE option combination) and to configure the overlay MTU (the MTU that the VM or container sees) to a value that ensures the total encapsulated packet size does not exceed the underlay MTU. For GENEVE with 64 bytes of options (a typical maximum for current use cases), the overlay MTU should be set to 9,000 - 50 - 64 = 8,886 bytes. At that value the encapsulated packet exactly fills the 9,000-byte underlay MTU, so any additional headers on the underlay, such as MPLS labels, require a correspondingly smaller overlay MTU.
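The overlay MTU calculation generalizes to any option length. This is a minimal sketch, assuming the article's 50-byte base overhead (IPv4 underlay) and the RFC 8926 rule that the option length is a 6-bit count of 4-byte words, which caps options at 252 bytes; the function name is illustrative.

```python
GENEVE_BASE_OVERHEAD = 50  # bytes with no options, per the article's accounting
GENEVE_MAX_OPTIONS = 252   # bytes: 6-bit Opt Len field counted in 4-byte units


def geneve_overlay_mtu(underlay_mtu: int, options_len: int) -> int:
    """Overlay MTU that keeps the encapsulated packet within the underlay MTU."""
    if options_len % 4 != 0:
        raise ValueError("GENEVE options must be a multiple of 4 bytes")
    if not 0 <= options_len <= GENEVE_MAX_OPTIONS:
        raise ValueError("GENEVE options must be between 0 and 252 bytes")
    return underlay_mtu - GENEVE_BASE_OVERHEAD - options_len


if __name__ == "__main__":
    for options in (0, 64, GENEVE_MAX_OPTIONS):
        print(options, "bytes of options ->", geneve_overlay_mtu(9000, options))
```

Planning for the largest option set actually in use, rather than the theoretical maximum, keeps more of the jumbo frame available to the workload.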

The application of TCP MSS clamping becomes essential in overlay networks because the end hosts (virtual machines or containers) are typically unaware of the overlay encapsulation overhead. A VM configured with the standard Ethernet MTU of 1,500 bytes will generate TCP segments with an MSS of 1,460 bytes (1,500 - 20 IP - 20 TCP). After VXLAN encapsulation adds 50 bytes, the total frame on the wire is 1,550 bytes—50 bytes above the underlay MTU if the underlay is still at 1,500 bytes. The solution is to configure MSS clamping on the VTEP or on the physical switch at the edge of the overlay network, reducing the MSS value advertised by the end hosts so that the encapsulated packet fits within the underlay MTU. For a VXLAN overlay on a 1,500-byte underlay, the MSS should be clamped to 1,410 bytes (1,500 - 50 overhead - 20 IP - 20 TCP). The MSS clamping configuration on Cisco Nexus switches uses the "ip tcp adjust-mss 1410" command on the SVI or physical interface connecting to the overlay network. This is one of the most frequently overlooked configuration items in VXLAN deployments and is a common root cause of "one-way communication" failures where small DNS queries work (because they fit within the MTU) but large HTTP transfers fail (because they exceed the MTU after encapsulation).
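The clamping rule can be sketched as two small functions: one computes the clamp limit from the underlay MTU and the encapsulation overhead, and the other applies it to the MSS a host advertises in its SYN. The names are illustrative and do not correspond to any vendor's implementation; headers are assumed to carry no options.

```python
def clamp_limit(underlay_mtu: int, encap_overhead: int,
                ip_header: int = 20, tcp_header: int = 20) -> int:
    """Largest MSS whose encapsulated packet still fits the underlay MTU."""
    return underlay_mtu - encap_overhead - ip_header - tcp_header


def clamp_mss(advertised_mss: int, limit: int) -> int:
    """Rewrite an advertised MSS downward only; never raise it."""
    return min(advertised_mss, limit)


if __name__ == "__main__":
    limit = clamp_limit(underlay_mtu=1500, encap_overhead=50)  # VXLAN case
    print("Clamp limit:", limit)
    print("VM advertising 1460 is rewritten to", clamp_mss(1460, limit))
    print("VM advertising 1360 is left at", clamp_mss(1360, limit))
```

Clamping only affects TCP. UDP and other protocols still depend on a correct overlay MTU or working path MTU discovery.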

The future evolution of overlay MTU management points toward automatic MTU discovery and negotiation between the overlay endpoints. The IETF's Path MTU Discovery (PMTUD) standard (RFC 1191 for IPv4, RFC 8201 for IPv6) was designed for simple IP networks and does not account for encapsulation overhead added by intermediate tunnel endpoints. The newer Packetization Layer Path MTU Discovery (PLPMTUD, RFC 4821 and RFC 8899) uses a probe-based approach that can detect the end-to-end MTU regardless of encapsulation overhead, because it probes the actual path with packets of increasing size and observes whether they are delivered successfully. PLPMTUD is available in modern operating systems (Linux has long implemented RFC 4821 probing for TCP through the net.ipv4.tcp_mtu_probing sysctl, which is disabled by default) and could be used by overlay networks to determine the correct overlay MTU without manual configuration. If implemented in a VXLAN VTEP, PLPMTUD would allow the VTEP to probe the path MTU to each remote VTEP individually, detecting whether a particular path supports the configured overlay MTU and signaling an MTU reduction to the end hosts if the path is constrained. This automated MTU management is expected to become a standard feature of overlay networking platforms and would significantly reduce the operational burden of MTU planning in large-scale data center fabrics.

MTU Black Hole Detection and Remediation at Scale

An MTU black hole is a path through the network where packets larger than a certain size are silently dropped without any notification to the sender. Unlike other network failures (link down, routing loop) that generate SNMP traps or syslog messages, MTU black holes are silent because the ICMP "Fragmentation Needed" message (Type 3, Code 4) that should inform the sender of the MTU restriction never arrives. It is typically dropped on the return path by a firewall or security appliance that is configured to drop all ICMP traffic for "security reasons." The result is that the sender continues to transmit packets that exceed the path MTU with the Don't Fragment bit set, and those packets are silently discarded. From the sender's perspective, TCP connections complete the handshake (SYN segments are small) and then hang as soon as the first full-sized data segment is sent, while large UDP datagrams silently disappear. The diagnostic signature of an MTU black hole is a network that appears functional for small packets (DNS queries, ARP, ICMP echo requests that are kept small) but fails for large packets (HTTP transfers, file uploads, database replication).

The most common cause of MTU black holes in enterprise networks is the interaction between ICMP filtering and PMTUD. Many security teams, following a well-intentioned but misguided security policy, configure firewalls to drop all ICMP traffic, including the ICMP Type 3 Code 4 messages that are essential for PMTUD. When a router or switch along the path receives a packet larger than the outgoing interface MTU with the DF bit set, it generates an ICMP Type 3 Code 4 message containing the MTU of the restricting interface and sends it back to the original source. If a firewall between the restricting router and the source drops this ICMP message, the source never learns about the MTU restriction and continues to send oversized packets that are silently discarded. The remediation is straightforward: configure the firewall to permit ICMP Type 3 Code 4 messages (and only those messages) through the security policy. This can be done with a specific access control entry that permits "icmp type 3 code 4 any any" before the general "deny icmp any any" rule. This targeted ICMP permit preserves the security benefit of blocking reconnaissance ICMP traffic while allowing the essential PMTUD mechanism to function.
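To make the message itself concrete, the sketch below builds and parses an ICMP Fragmentation Needed message using the RFC 1191 layout (type, code, checksum, 16 unused bits, 16-bit next-hop MTU, then the start of the offending packet). It assumes the quoted packet has a 20-byte IPv4 header without options, and `is_pmtud_message` is a hypothetical stand-in for the firewall match, not a real firewall API.

```python
import struct
from typing import Optional

ICMP_DEST_UNREACHABLE = 3
ICMP_FRAG_NEEDED = 4


def internet_checksum(data: bytes) -> int:
    """Standard 16-bit one's-complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_frag_needed(next_hop_mtu: int, original_packet: bytes) -> bytes:
    """Build the ICMP message a router sends when DF is set and the packet is too big."""
    quoted = original_packet[:28]  # assumed: 20-byte IP header plus first 8 data bytes
    header = struct.pack("!BBHHH", ICMP_DEST_UNREACHABLE, ICMP_FRAG_NEEDED,
                         0, 0, next_hop_mtu)
    checksum = internet_checksum(header + quoted)
    return struct.pack("!BBHHH", ICMP_DEST_UNREACHABLE, ICMP_FRAG_NEEDED,
                       checksum, 0, next_hop_mtu) + quoted


def parse_next_hop_mtu(message: bytes) -> Optional[int]:
    """Return the next-hop MTU if this is a Type 3 Code 4 message, else None."""
    icmp_type, code, _checksum, _unused, mtu = struct.unpack("!BBHHH", message[:8])
    if (icmp_type, code) != (ICMP_DEST_UNREACHABLE, ICMP_FRAG_NEEDED):
        return None
    return mtu


def is_pmtud_message(message: bytes) -> bool:
    """The match a targeted 'permit type 3 code 4' rule would perform."""
    return parse_next_hop_mtu(message) is not None


if __name__ == "__main__":
    message = build_frag_needed(1400, bytes(28))  # placeholder original packet
    print(parse_next_hop_mtu(message), is_pmtud_message(message))
```

The next-hop MTU field is what lets the sender converge in one step. Without it, the sender would have to guess a smaller size.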

Network address translation introduces a second source of MTU black holes that is particularly insidious because it can be intermittent and dependent on the specific address translation state. Some NAT implementations, particularly on low-end consumer routers, do not correctly handle oversized packets or the ICMP errors they trigger (RFC 4459 discusses closely related MTU and fragmentation problems for in-the-network tunneling). When the NAT gateway receives a packet that is too large for the outgoing interface (because the outgoing interface has a smaller MTU than the incoming interface), it should either fragment the packet (if the DF bit is clear) or drop it and send an ICMP Type 3 Code 4 message to the original source (if the DF bit is set). Some NAT gateways do neither: they simply drop the packet without notification, creating an MTU black hole that is specific to traffic traversing the NAT gateway. This type of black hole is extremely difficult to diagnose because it affects only traffic that is actively being translated, and it may appear and disappear as the NAT table entries expire and are re-created. The definitive diagnostic test is to compare the path MTU for traffic through the NAT gateway (measured by sending packets from a host behind the NAT to a host outside) with the path MTU for native traffic (measured between two hosts on the same side of the NAT). If the NAT path has a reduced MTU but no ICMP Type 3 Code 4 messages are received during the probe, the NAT gateway is the source of the MTU black hole.

IPsec VPN tunnels are a rich source of MTU black holes because they add significant encapsulation overhead that is invisible to the end hosts. A typical IPsec tunnel in ESP tunnel mode adds 50–60 bytes of overhead: 20 bytes for the outer IP header, 8 bytes for the ESP header, up to 16 bytes for the ESP trailer (padding), and 16 bytes for the ESP authentication data (truncated HMAC-SHA256). When a host sends a 1,500-byte packet to a destination through the IPsec tunnel, the encapsulated packet becomes 1,550–1,560 bytes, exceeding the 1,500-byte MTU of the physical path. The IPsec gateway should either fragment the encapsulated packet (which requires the DF bit to be clear on the outer header) or use PMTUD to inform the sending host of the reduced MTU. However, many IPsec implementations fragment the inner packet before encapsulation, which requires the inner packet's DF bit to be clear, something the end host may not do because it expects the end-to-end path MTU to be at least 1,500 bytes. The practical solution widely deployed in enterprise networks is to configure MSS clamping on the IPsec gateway, reducing the TCP MSS for traffic traversing the tunnel to a value that accounts for the IPsec overhead: typically 1,380 bytes for an IPsec tunnel over standard Ethernet. That value reserves about 80 bytes for the tunnel (1,500 - 80 tunnel allowance - 20 IP - 20 TCP = 1,380), a margin above the 50–60 bytes listed here that also covers extras such as NAT traversal. This MSS clamping is configured on the IPsec gateway's crypto map or tunnel interface and is transparent to the end hosts, which simply see a slightly reduced TCP segment size for connections through the tunnel.
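ESP overhead is not a fixed number because padding depends on the payload length. The sketch below estimates it under stated assumptions that go slightly beyond the breakdown above: a 16-byte cipher block with a 16-byte IV (as with AES-CBC), a 2-byte ESP trailer (pad length and next header), a 16-byte integrity value, IPv4 outer header, and no NAT traversal. Other cipher suites will give different numbers.

```python
def esp_tunnel_size(inner_len: int, block: int = 16, iv: int = 16, icv: int = 16,
                    outer_ip: int = 20, esp_header: int = 8) -> int:
    """Estimated size of an ESP tunnel-mode packet for a given inner packet."""
    trailer = 2  # pad-length byte plus next-header byte
    # Pad (inner packet + trailer) up to a whole number of cipher blocks.
    padded = -(-(inner_len + trailer) // block) * block
    return outer_ip + esp_header + iv + padded + icv


def max_inner_len(path_mtu: int) -> int:
    """Largest inner packet whose ESP encapsulation still fits the path MTU."""
    size = path_mtu
    while size > 0 and esp_tunnel_size(size) > path_mtu:
        size -= 1
    return size


if __name__ == "__main__":
    print("1,500-byte inner packet becomes", esp_tunnel_size(1500))
    inner = max_inner_len(1500)
    print("Largest inner packet on a 1,500-byte path:", inner)
    print("Corresponding TCP MSS:", inner - 40)
```

Under these assumptions the largest safe MSS comes out a little above the commonly configured 1,380, which is consistent with that value being a conservative choice.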

The emerging trend in MTU black hole remediation is the use of PLPMTUD (Packetization Layer Path MTU Discovery, RFC 8899) as a replacement for the legacy PMTUD mechanism that relies on ICMP. PLPMTUD works by probing the path with packets of increasing size and observing whether they are delivered successfully. The probe packets are sent with the DF bit set, and the sender deduces the path MTU by observing which probe sizes succeed and which fail. Crucially, PLPMTUD does not require ICMP messages: it relies solely on the receipt or non-receipt of probe acknowledgments. This makes PLPMTUD immune to the MTU black hole problem caused by ICMP filtering. The probe mechanism is bandwidth-efficient when it uses a binary search to find the path MTU, requiring only 10–15 probes to determine the MTU of a typical path. QUIC (the transport under HTTP/3) specifies this style of probing as its MTU discovery mechanism, which is one reason why QUIC connections can be more robust than TCP connections in the presence of MTU black holes. Linux has supported RFC 4821 probing for TCP for many years through the net.ipv4.tcp_mtu_probing sysctl, although it is disabled by default, and probe-based discovery is expected to become the default MTU discovery mechanism in future operating system releases. For the network engineer, the adoption of PLPMTUD will mean fewer MTU-related troubleshooting tickets, but it will also require understanding the new probing behavior and its interaction with firewall and NAT devices that may drop or modify the probe packets.
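The binary-search probing described above can be sketched in a few lines. This is an illustration of the search strategy only, not an implementation of RFC 8899: `probe_ok` is a hypothetical callback that returns True when a probe of the given size is acknowledged, and the search bounds (1,280 and 9,000 bytes) are assumed defaults, with the lower bound assumed to always work.

```python
from typing import Callable, Tuple


def probe_path_mtu(probe_ok: Callable[[int], bool],
                   low: int = 1280, high: int = 9000) -> Tuple[int, int]:
    """Binary-search the largest probe size that is acknowledged.

    Returns (path_mtu, probes_sent). No ICMP feedback is used: the only
    signal is whether each probe was acknowledged.
    """
    probes = 0
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        probes += 1
        if probe_ok(mid):
            low = mid        # probe delivered: the path MTU is at least mid
        else:
            high = mid - 1   # probe lost: the path MTU is below mid
    return low, probes


if __name__ == "__main__":
    true_path_mtu = 1450  # hypothetical VXLAN-constrained path
    mtu, probes = probe_path_mtu(lambda size: size <= true_path_mtu)
    print(f"Discovered MTU {mtu} using {probes} probes")
```

A real implementation also has to distinguish a probe lost to the MTU limit from one lost to congestion, typically by retrying a probe before lowering the upper bound.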
