Bandwidth vs. Throughput
The Engineering Reality of Data Transmission
The Capacity Gap: Theoretical Potential vs. Goodput Reality
In consumer marketing, "Bandwidth" is often sold as a synonym for "Speed." In high-performance networking, this is a dangerous oversimplification. Bandwidth is merely the **Spectral Capacity** of the medium—the width of the pipe—while **Throughput** is the actual volume of data that successfully traverses that medium over time.
The gap between the two is defined by the "Header Tax," signal-to-noise dynamics, and the physics of the transport protocols. To understand why a 10Gbps link rarely delivers 10Gbps of application data, we must dissect the transmission into its constituent electromagnetic and protocol-level parts.
The Physical Limit: Shannon-Hartley Deep Dive
Every communication channel is bounded by the **Shannon-Hartley Theorem**, which defines the maximum rate of error-free information that can be transmitted over a channel of bandwidth $B$ with signal power $S$ in the presence of noise power $N$:

$$C = B \log_2\left(1 + \frac{S}{N}\right)$$
This equation tells us that capacity $C$ is a function of both the **Spectral Width** (Hertz) and the **Signal-to-Noise Ratio (SNR)**. If you widen the bandwidth but the noise power rises with it (common in copper cabling, where noise and attenuation grow with frequency), the SNR falls and the capacity gain is far smaller than the added bandwidth suggests. This is the fundamental constraint of **Physical Layer (L1) Mechanics**.
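A minimal sketch of the Shannon-Hartley calculation, assuming the SNR is supplied in decibels. The function name and the 100 MHz example channel are illustrative choices, not values from any particular medium.

```python
import math

def shannon_capacity_bps(bandwidth_hz: float, snr_db: float) -> float:
    """Shannon-Hartley capacity C = B * log2(1 + S/N), with SNR given in dB."""
    snr_linear = 10 ** (snr_db / 10)   # convert dB to a linear power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)

# Illustrative channel: 100 MHz of spectrum at several SNR levels.
for snr_db in (10, 20, 30):
    capacity = shannon_capacity_bps(100e6, snr_db)
    print(f"100 MHz at {snr_db} dB SNR -> {capacity / 1e6:.0f} Mbps")
```

Each additional 10 dB of SNR adds a roughly constant increment of capacity, whereas doubling the bandwidth at constant SNR doubles it.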
The Header Tax: Dissecting Protocol Efficiency
To move application data, it must be encapsulated. Every layer of the OSI model adds its own "tax." On a standard Gigabit Ethernet link, the efficiency is strictly capped by the architecture of the **Ethernet Frame**.
The Anatomy of a 1500B MTU Tax
| Layer | Overhead Component | Bytes |
|---|---|---|
| Layer 1 (L1) | Preamble (7B) + SFD (1B) + Inter-Frame Gap (12B) | 20 Bytes |
| Layer 2 (L2) | MAC Header (14B) + FCS/CRC (4B) | 18 Bytes |
| Layer 3 (L3) | IPv4 Header (No options) | 20 Bytes |
| Layer 4 (L4) | TCP Header (No options) | 20 Bytes |
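Total overhead is **78 bytes** per packet. The 40 bytes of IP and TCP headers sit inside the 1500-byte MTU, while the 38 bytes of L1 and L2 overhead are added on the wire. For a 1500-byte MTU, we can calculate the **Maximum Theoretical Throughput**:

$$\text{Efficiency} = \frac{1500 - 40}{1500 + 38} = \frac{1460}{1538} \approx 94.9\%$$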
This means a 1Gbps link can **never** deliver more than **949Mbps** of application data, even with zero latency and zero interference. If you add VLAN tagging (802.1Q), you lose another 4 bytes per frame. If you use MPLS labels, you lose another 4 bytes per label.
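The header tax arithmetic can be sketched as follows, using the byte counts from the table above. Treating VLAN tags and MPLS labels as extra on-wire bytes outside the MTU is a simplifying assumption, and the function name is illustrative.

```python
L1_OVERHEAD = 7 + 1 + 12    # preamble + SFD + inter-frame gap
L2_OVERHEAD = 14 + 4        # MAC header + FCS
IP_TCP_HEADERS = 20 + 20    # IPv4 + TCP, no options

def tcp_payload_efficiency(mtu: int = 1500, vlan_tags: int = 0, mpls_labels: int = 0) -> float:
    """Fraction of on-wire bytes that carry TCP payload."""
    payload = mtu - IP_TCP_HEADERS
    # Assumption: each VLAN tag or MPLS label adds 4 bytes on the wire.
    wire_bytes = mtu + L1_OVERHEAD + L2_OVERHEAD + 4 * vlan_tags + 4 * mpls_labels
    return payload / wire_bytes

link_bps = 1e9
for label, kwargs in [("plain", {}), ("802.1Q", {"vlan_tags": 1}), ("802.1Q + 2 MPLS", {"vlan_tags": 1, "mpls_labels": 2})]:
    eff = tcp_payload_efficiency(**kwargs)
    print(f"{label:>16}: {eff:.4f} -> {eff * link_bps / 1e6:.1f} Mbps")
```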
The Long Fat Pipe: Bandwidth-Delay Product (BDP)
In a noise-free environment with zero header overhead, your throughput can still collapse due to the **Bandwidth-Delay Product (BDP)**. The BDP defines the "volume of the pipe"—the total amount of data that can be in flight between the sender and receiver at any given time.
The Satellite Link Trough
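Consider a 10Gbps satellite link with a 600ms RTT. The BDP is:

$$\text{BDP} = 10 \times 10^{9}\ \text{bit/s} \times 0.6\ \text{s} = 6 \times 10^{9}\ \text{bits} = 750\ \text{MB}$$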
If the receiver's advertised **TCP Receive Window (RWIN)** is limited to the legacy maximum of 64KB (65,535 bytes), the sender will transmit 64KB and then stop, waiting 600ms for an ACK before sending the next 64KB. In this scenario, the effective throughput is a pathetic 874Kbps (65,535 bytes × 8 / 0.6 s), leaving your 10Gbps link more than 99.99% idle.
To solve this, modern systems use **TCP Window Scaling (RFC 1323, since updated by RFC 7323)**, allowing windows up to 1GB. However, even with scaling, a single packet loss on a "Long Fat Pipe" causes a Reno-style TCP congestion window to be cut in half, leading to a massive recovery time that drains throughput.
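The window arithmetic above can be checked with a short sketch. The helper names are illustrative; the 65,535-byte base window and the maximum shift of 14 come from the TCP window scale option.

```python
def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes."""
    return link_bps * rtt_s / 8

def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """Best-case throughput when only one window can be in flight per RTT."""
    return window_bytes * 8 / rtt_s

def min_window_scale_shift(target_window_bytes: float) -> int:
    """Smallest window-scale shift (0..14) whose maximum window covers the target."""
    shift = 0
    while (65535 << shift) < target_window_bytes and shift < 14:
        shift += 1
    return shift

link_bps, rtt_s = 10e9, 0.6
bdp = bdp_bytes(link_bps, rtt_s)
print(f"BDP: {bdp / 1e6:.0f} MB")
print(f"Unscaled 64KB window: {window_limited_bps(65535, rtt_s) / 1e3:.0f} Kbps")
print(f"Window scale shift needed: {min_window_scale_shift(bdp)}")
```

For this link the required shift lands at the protocol maximum, which is why long fat pipes sit so close to the limits of a single TCP connection.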
Goodput: The Application's Perspective
**Goodput** is the metric that actually matters to the CEO and the end-user. It is the rate at which useful, non-duplicate application data is delivered. It is always less than throughput because it excludes retransmissions and protocol overhead.
The relationship between packet loss ($p$) and TCP throughput ($T$) is non-linear and brutal. According to the **Mathis Equation**, the maximum throughput of a TCP connection is inversely proportional to the square root of the loss rate:

$$T \le \frac{MSS}{RTT} \cdot \frac{C}{\sqrt{p}}$$

Where $MSS$ is the maximum segment size and $C$ is a constant (~1.22). This equation shows that on a high-latency link (large $RTT$), even a tiny loss rate ($p=0.001$) can cap your throughput at a fraction of your bandwidth. This is why "Bandwidth" upgrades are useless for fixing throughput issues caused by L1/L2 instability.
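A minimal sketch of the Mathis bound, assuming a 1460-byte MSS and a 100ms RTT as illustrative inputs. The function name is hypothetical.

```python
import math

def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float, c: float = 1.22) -> float:
    """Upper bound on steady-state TCP throughput: (MSS / RTT) * C / sqrt(p)."""
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)

# Illustrative path: 1460-byte MSS, 100 ms RTT, varying loss rate.
for p in (1e-5, 1e-4, 1e-3, 1e-2):
    bound = mathis_throughput_bps(1460, 0.1, p)
    print(f"loss {p:.0e} -> at most {bound / 1e6:.2f} Mbps")
```

Because the bound depends on the square root of $p$, cutting loss by a factor of 100 only raises the ceiling by a factor of 10, while halving the RTT doubles it.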
The MTU Leverage: Reducing the Interrupt Storm
The **Maximum Transmission Unit (MTU)** is the largest packet or frame size, specified in octets, that can be sent in a single network transaction. Standard Ethernet uses an MTU of 1500 bytes. In high-throughput environments like SANs (Storage Area Networks) or AI Compute Clusters, this is often too small.
The "Jumbo" Advantage
By increasing the MTU to **9000 bytes (Jumbo Frames)**, you reduce the number of packets required to move the same amount of data by roughly 6x. This reduces the "Header Tax" and, more importantly, reduces the number of **CPU Interrupts** that the network interface card (NIC) raises and the host must service.
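The frame-count reduction can be sketched as below, assuming IPv4 and TCP headers without options and the 38 bytes of L1/L2 overhead described earlier. The 1 GB transfer size is an arbitrary example.

```python
WIRE_OVERHEAD = 38     # preamble, SFD, inter-frame gap, MAC header, FCS
IP_TCP_HEADERS = 40    # IPv4 + TCP, no options

def frames_needed(data_bytes: int, mtu: int) -> int:
    """Number of full-MTU frames required to carry data_bytes of TCP payload."""
    payload_per_frame = mtu - IP_TCP_HEADERS
    return -(-data_bytes // payload_per_frame)   # ceiling division

def wire_efficiency(mtu: int) -> float:
    """Fraction of on-wire bytes that carry TCP payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + WIRE_OVERHEAD)

transfer = 10**9  # 1 GB of application data (illustrative)
for mtu in (1500, 9000):
    print(f"MTU {mtu}: {frames_needed(transfer, mtu):,} frames, "
          f"{wire_efficiency(mtu):.2%} efficient")
```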
The "Fragmentation" Trap
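If an MTU mismatch occurs (e.g., a 9000B packet hits a 1500B router interface), an IPv4 router must fragment the packet. This consumes CPU cycles and increases latency. If the "Don't Fragment" (DF) bit is set, the packet is dropped instead, and the router returns an ICMP Destination Unreachable message with the "Fragmentation Needed" code (Type 3, Code 4). Path MTU Discovery depends on that message, so if it is filtered along the way, the connection can stall silently.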
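To make the fragmentation cost concrete, here is a rough sketch of how many IPv4 fragments a packet produces. It assumes a 20-byte IPv4 header without options and uses the rule that every fragment except the last must carry a multiple of 8 payload bytes; the function name is illustrative.

```python
def ipv4_fragments(packet_bytes: int, next_hop_mtu: int, ip_header: int = 20):
    """Return (fragment count, total bytes sent) after IPv4 fragmentation."""
    payload = packet_bytes - ip_header
    # Non-final fragments must carry a multiple of 8 bytes of payload.
    per_fragment = (next_hop_mtu - ip_header) // 8 * 8
    count = -(-payload // per_fragment)          # ceiling division
    total_bytes = payload + count * ip_header    # each fragment repeats the IP header
    return count, total_bytes

count, total = ipv4_fragments(9000, 1500)
print(f"9000B packet over a 1500B hop: {count} fragments, {total} bytes total")
```

Losing any single fragment forces the whole original packet to be retransmitted, which is why fragmentation hurts more than the extra header bytes alone suggest.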
Engineering Encyclopedia
BDP (Bandwidth-Delay Product)
The total volume of data that can be "in flight" on a link, calculated as bandwidth (link rate) multiplied by RTT.
CIR (Committed Information Rate)
The average rate of traffic that a provider guarantees will be delivered across their network.
Goodput
The quantity of useful information delivered per unit of time to a specific application, excluding protocol overhead and retransmissions.
IFG (Inter-Frame Gap)
The idle time between Ethernet frames (standard 96 bit times) required for receiver synchronization and L1 stability.
MSS (Maximum Segment Size)
The largest amount of data that a device can receive in a single TCP segment, usually MTU minus IP and TCP headers.
MTU (Maximum Transmission Unit)
The size of the largest protocol data unit (PDU) that can be communicated in a single network layer transaction.
Preamble
A sequence of bits (usually 56 bits) used to synchronize the clock of the receiver before the actual frame data arrives.
RWIN (Receive Window)
The amount of data a receiver is willing to buffer for a connection; acts as TCP's flow control mechanism.
Shannon Capacity
The theoretical maximum bit rate of a communication channel for a given noise level.
TCP Window Scaling
An option that applies a shift count (up to 14) to the 16-bit window size field, raising the maximum window from 64KB to about 1GB.
Throughput
The amount of data moved successfully from one place to another in a given time period.
Utilization
The percentage of the available bandwidth currently being used by traffic.
The Engineering Standard: RFC 6349 Methodology
Standard "Speed Tests" are virtually useless for infrastructure troubleshooting because they conflate application performance with link capacity. **RFC 6349** provides a rigorous framework for TCP throughput testing:
- Step 1: Path MTU Discovery. Ensure the test is using the actual MTU of the path to avoid fragmentation overhead.
- Step 2: Baseline RTT. Measure the round-trip time under zero load to calculate the ideal BDP.
- Step 3: TCP Window Optimization. Force the host to use a window size $\ge BDP$.
- Step 4: Concurrent Flows. Use enough parallel streams to saturate the link and its forwarding hardware without causing congestion collapse.
Engineering Conclusion
Bandwidth is the road; throughput is the traffic that actually moves. Every physical characteristic of the network—noise, distance, cable quality—reduces your headroom from Shannon's theoretical limit. Every protocol layer adds an additional tax.
A master performance engineer does not "upgrade" until they have measured the **Goodput Efficiency**. If your efficiency is below 90%, you don't have a bandwidth problem; you have a protocol, windowing, or stability problem. Solving those is the difference between a technician and an engineer.
Advanced Queue Disciplines: The Router's Role in Throughput Collapse
Even with optimal physical-layer configuration and proper TCP window scaling, the router's queue discipline (qdisc) can single-handedly decimate throughput. The qdisc is the packet scheduling algorithm that determines which packet gets transmitted next when the output interface is congested. The traditional default Linux qdisc, `pfifo_fast`, is a simple First-In-First-Out (FIFO) queue with three priority bands that lacks any active queue management (AQM). In a FIFO queue, when the buffer fills, newly arriving packets are simply dropped at the tail. This behavior, known as Tail Drop, means TCP's congestion control algorithm discovers loss only after the buffer has already bloated to its maximum capacity. This fundamental mismatch between buffer sizing and TCP's window dynamics is the root cause of the Bufferbloat phenomenon.
The relationship between buffer size and throughput under Tail Drop is governed by the interaction between TCP's additive-increase-multiplicative-decrease (AIMD) algorithm and the buffer depth. The average queue occupancy in a Tail Drop system can be expressed as:
When , the buffer dominates the end-to-end latency, and throughput oscillates between the link rate during window growth and half the link rate after a loss event. The mean throughput under Tail Drop is approximately:
This equation reveals a counterintuitive truth: increasing buffer size without bound does not increase throughput—it merely increases latency. Once the buffer exceeds the BDP, throughput asymptotically approaches the link rate, but at the cost of latency that grows linearly with buffer size. This is the Bufferbloat tradeoff that AQM algorithms aim to resolve.
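The buffer-versus-latency tradeoff can be illustrated with a deliberately idealized model: a single Reno-style flow whose window grows by one packet per round, overflows the tail-drop queue at BDP plus buffer, and then halves. This is a sketch under those assumptions rather than a reproduction of the expressions above, and the function names are illustrative.

```python
def sawtooth_utilization(bdp_pkts: int, buffer_pkts: int) -> float:
    """Mean link utilization over one AIMD cycle of a single idealized flow."""
    w_max = bdp_pkts + buffer_pkts     # window at which the tail-drop queue overflows
    w = max(1, w_max // 2)             # multiplicative decrease after the loss
    delivered = 0                      # packets delivered during the cycle
    elapsed = 0                        # time, in packet transmission times
    while w <= w_max:
        delivered += w
        # A round lasts one base RTT while the pipe is underfilled, and
        # w packet-times once a standing queue stretches the RTT.
        elapsed += max(w, bdp_pkts)
        w += 1                         # additive increase: one packet per round
    return delivered / elapsed

def max_queue_delay_rtts(bdp_pkts: int, buffer_pkts: int) -> float:
    """Worst-case queueing delay, expressed in multiples of the base RTT."""
    return buffer_pkts / bdp_pkts

for buf in (0, 500, 1000, 4000):
    util = sawtooth_utilization(1000, buf)
    delay = max_queue_delay_rtts(1000, buf)
    print(f"buffer={buf:>4} pkts: utilization {util:.3f}, added delay up to {delay:.1f} RTT")
```

In this model utilization reaches 100% once the buffer equals the BDP, and every packet of buffer beyond that point only adds delay.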
Active Queue Management: From RED to CoDel
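Active Queue Management (AQM) algorithms address Tail Drop pathology by dropping packets proactively to signal TCP before the buffer is completely full. The classic Random Early Detection (RED) algorithm computes a drop probability based on the exponentially weighted moving average of the queue depth:

$$\bar{q} \leftarrow (1 - w_q)\,\bar{q} + w_q\,q$$

$$p_{drop} = \begin{cases} 0 & \bar{q} < min_{th} \\ max_p \cdot \dfrac{\bar{q} - min_{th}}{max_{th} - min_{th}} & min_{th} \le \bar{q} < max_{th} \\ 1 & \bar{q} \ge max_{th} \end{cases}$$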
RED smooths the congestion signal across flows, preventing the global TCP synchronization problem where all flows simultaneously detect loss and halve their windows, causing a throughput collapse. However, RED requires careful tuning of $min_{th}$, $max_{th}$, $max_p$, and the queue weight factor $w_q$, making it fragile in heterogeneous environments. The CoDel (Controlled Delay) algorithm, introduced by Nichols and Jacobson in 2012, eliminates parameter tuning by measuring packet sojourn time rather than queue depth. CoDel maintains a target delay of 5ms and drops packets according to a square-root control law when the minimum sojourn time exceeds this target:

$$t_{next\_drop} = t_{now} + \frac{interval}{\sqrt{count}}$$
The square-root control law is mathematically elegant: it responds aggressively to persistent congestion (where count grows and the inter-drop interval shrinks) while remaining transparent to brief microbursts that complete within the 100ms control interval. FQ-CoDel extends CoDel with per-flow queuing, ensuring that a single aggressive flow cannot starve others. Measurements from real-world deployments show that FQ-CoDel reduces the 99th percentile flow completion time for short flows by up to 80% compared to Tail Drop, while reducing bulk throughput by less than 5%—a critical improvement for mixed-traffic environments.
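Both control laws are small enough to sketch directly. The RED function follows the classic (non-"gentle") formulation, and the CoDel helper only computes the spacing between successive drops; neither is a full queue implementation, and the threshold values in the example are arbitrary.

```python
import math

def ewma_queue(avg: float, sample: float, weight: float) -> float:
    """RED's exponentially weighted moving average of queue depth."""
    return (1 - weight) * avg + weight * sample

def red_drop_probability(avg_queue: float, min_th: float, max_th: float, max_p: float) -> float:
    """Classic RED: linear ramp from 0 to max_p between the two thresholds."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)

def codel_drop_gap_ms(count: int, interval_ms: float = 100.0) -> float:
    """CoDel's square-root law: time until the next drop after `count` drops."""
    return interval_ms / math.sqrt(count)

# Illustrative RED thresholds (packets) and CoDel drop spacing.
for avg in (10, 40, 59, 60):
    print(f"RED avg={avg}: p={red_drop_probability(avg, 20, 60, 0.1):.3f}")
for count in (1, 4, 16, 100):
    print(f"CoDel count={count}: next drop in {codel_drop_gap_ms(count):.1f} ms")
```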
Hardware Offload and Qdisc Bypass
An increasingly critical concern in high-speed networking is that NIC hardware offloads (TSO/GRO) can hide individual packets from the software qdisc. When TCP Segmentation Offload (TSO) is active, the kernel passes super-sized segments (up to 64KB) to the NIC, which splits them into MTU-sized packets after the qdisc has made its scheduling decision. The AQM algorithm is effectively blind to individual packets, operating on TSO segments that are up to 44x larger than the actual wire packets. Recent kernel work has introduced "segmentation-aware" qdiscs that peek inside TSO segments, but these remain experimental. The throughput engineer must verify qdisc visibility by checking the statistics reported by `tc -s qdisc show dev <interface>`: if the dropped counter remains at zero under sustained load, the qdisc is likely being bypassed, and your throughput is being shaped solely by the NIC's internal ring buffer operating in Tail Drop mode.
Precision Throughput Measurement: From Wire to Application
Measuring throughput is deceptively simple: send data, measure time, divide. In practice, the methodology chosen determines whether the result reflects link capacity, protocol efficiency, or application performance—and conflating these three is the most common error in network troubleshooting. The International Telecommunication Union's Y.1564 standard defines a multi-layer testing framework that separates these concerns, beginning with Layer 2 throughput (the raw bit-carrying capacity of the medium) and progressing through Layer 3 (IP forwarding rate), Layer 4 (TCP goodput), and Layer 7 (application-level throughput).
Each layer introduces its own measurement artifacts. At Layer 2, the throughput calculation must account for the Inter-Frame Gap (IFG) of 96 bit times (12 byte times at any link speed) and the preamble plus SFD (8 bytes), which are invisible to higher-layer tests. For a frame of $S_{frame}$ bytes, the maximum achievable L2 throughput on Ethernet is:

$$T_{L2} = \text{Link Rate} \times \frac{S_{frame}}{S_{frame} + 8 + 12}$$
This means that even for maximum-sized frames (1518 bytes), the L2 efficiency is only 98.7%, and for minimum-sized frames (64 bytes), it collapses to 76.2%. This is not a performance problem—it is a fundamental constraint of the Ethernet protocol. The iPerf3 tool, the de facto standard for network throughput testing, defaults to Layer 4 (TCP) measurement, which introduces additional overhead from headers, congestion control, and the three-way handshake. Running iPerf3 in UDP mode removes the congestion control variable but introduces packet loss as a measurement artifact, since UDP has no retransmission mechanism.
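The frame-size dependence is easy to tabulate. This sketch assumes 8 bytes of preamble/SFD and 12 bytes of inter-frame gap per frame; the helper names are illustrative.

```python
PREAMBLE_SFD = 8
INTER_FRAME_GAP = 12

def l2_efficiency(frame_bytes: int) -> float:
    """Share of wire time spent on the frame itself (MAC header through FCS)."""
    return frame_bytes / (frame_bytes + PREAMBLE_SFD + INTER_FRAME_GAP)

def max_frames_per_second(link_bps: float, frame_bytes: int) -> float:
    """Line-rate frame rate for a given frame size."""
    return link_bps / ((frame_bytes + PREAMBLE_SFD + INTER_FRAME_GAP) * 8)

for size in (64, 512, 1518):
    print(f"{size:>4}B frames: {l2_efficiency(size):.1%} efficient, "
          f"{max_frames_per_second(1e9, size):,.0f} frames/s at 1Gbps")
```

The frame-rate column is what stresses forwarding hardware: small frames carry less data per frame but demand far more per-packet work.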
RFC 6349: The Standard for TCP Throughput Testing
As outlined earlier, RFC 6349 separates link capacity from application performance through a four-step methodology. Step 1 performs Path MTU Discovery to ensure the test uses the actual path MTU, avoiding fragmentation overhead. Step 2 establishes a baseline RTT under zero load, which is used to compute the ideal BDP. Step 3 configures the TCP buffer (SO_RCVBUF/SO_SNDBUF) to be at least as large as the BDP. Step 4 runs multiple concurrent streams to saturate the link without causing congestion collapse, typically four to eight streams for a 10Gbps link, depending on the NIC's RSS (Receive Side Scaling) configuration.
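The output of an RFC 6349 test produces a "Throughput Efficiency Ratio" and a "Buffer Delay Metric" that quantify how close the connection comes to theoretical performance:

$$\text{Throughput Efficiency Ratio} = \frac{T_{measured}}{T_{ideal}}, \qquad \text{Buffer Delay} = \frac{RTT_{avg} - RTT_{baseline}}{RTT_{baseline}}$$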
A ratio below 0.9 indicates either buffer misconfiguration, excessive packet loss, or suboptimal window scaling. When the measured throughput deviates from the BDP-derived ideal, the root cause is almost never "not enough bandwidth" and almost always a protocol or configuration issue at L3 or L4.
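A hedged sketch of how these figures might be computed from test counters. RFC 6349 itself defines TCP Efficiency in terms of retransmitted bytes and Buffer Delay in terms of RTT inflation, both as percentages; the throughput ratio is the measured-over-ideal comparison described in the text. All function names and sample numbers are illustrative.

```python
def tcp_efficiency_pct(transmitted_bytes: int, retransmitted_bytes: int) -> float:
    """RFC 6349 TCP Efficiency: share of transmitted bytes that were not retransmissions."""
    return (transmitted_bytes - retransmitted_bytes) / transmitted_bytes * 100

def buffer_delay_pct(avg_rtt_ms: float, baseline_rtt_ms: float) -> float:
    """RFC 6349 Buffer Delay: RTT inflation under load relative to the baseline RTT."""
    return (avg_rtt_ms - baseline_rtt_ms) / baseline_rtt_ms * 100

def throughput_ratio(measured_bps: float, ideal_bps: float) -> float:
    """Measured TCP throughput as a fraction of the ideal for the path."""
    return measured_bps / ideal_bps

# Illustrative test result.
print(f"TCP efficiency: {tcp_efficiency_pct(100_000, 2_000):.1f}%")
print(f"Buffer delay:   {buffer_delay_pct(32.0, 25.0):.1f}%")
print(f"Throughput ratio: {throughput_ratio(8.2e9, 9.49e9):.2f}")
```

Read together, a low throughput ratio with high TCP efficiency points toward windowing or buffer limits, while low TCP efficiency points toward loss.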
Practical Tooling: iPerf3, ntttcp, and Application-Aware Monitoring
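iPerf3 is the workhorse of TCP throughput testing, but its default parameters produce misleading results. The default test duration (10 seconds) is insufficient for high-BDP links, where the connection may not reach steady state before the test ends. For a link with 200ms RTT, a Reno-style TCP congestion window grows by one MSS per RTT during the congestion avoidance phase, requiring:

$$t_{ramp} \approx \frac{BDP}{MSS} \times RTT$$

For a 10Gbps link with 200ms RTT and 1460-byte MSS, $BDP/MSS$ is approximately 171,000 segments, so $t_{ramp}$ is roughly 34,000 seconds (over nine hours) if the window has to grow linearly from near zero, and about half that after a single halving. Modern algorithms such as CUBIC ramp faster, but still far more slowly than a 10-second test allows. A 10-second iPerf3 test would capture mostly slow start and the first moments of congestion avoidance, so its result can differ substantially from the sustainable rate. The `-t 300` flag (5-minute test) is the minimum recommended duration for high-latency links.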
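The ramp-up arithmetic can be sketched by counting round trips and converting them to time. This assumes Reno-style growth (doubling per RTT in slow start, one segment per RTT in congestion avoidance) and an initial window of 10 segments; both are modelling assumptions, and real stacks using CUBIC or BBR behave differently.

```python
import math

def target_window_segments(link_bps: float, rtt_s: float, mss_bytes: int) -> int:
    """Segments needed in flight to fill the pipe (BDP / MSS)."""
    return math.ceil(link_bps * rtt_s / 8 / mss_bytes)

def slow_start_rtts(target_segments: int, initial_window: int = 10) -> int:
    """Round trips to reach the target when the window doubles every RTT."""
    if target_segments <= initial_window:
        return 0
    return math.ceil(math.log2(target_segments / initial_window))

def linear_growth_rtts(target_segments: int, start_segments: int) -> int:
    """Round trips to reach the target when the window grows one segment per RTT."""
    return max(0, target_segments - start_segments)

rtt_s = 0.2
target = target_window_segments(10e9, rtt_s, 1460)
print(f"Target window: {target:,} segments")
print(f"Slow start alone: {slow_start_rtts(target) * rtt_s:.1f} s")
print(f"Linear recovery from a halved window: "
      f"{linear_growth_rtts(target, target // 2) * rtt_s / 3600:.1f} h")
```

The gap between the two figures is the practical point: an undisturbed slow start fills the pipe in seconds, but a single loss near full rate can take hours to recover from under linear growth.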
On Windows platforms, Microsoft's ntttcp (NT TCP/IP Test) provides more granular control over buffer sizes, concurrent threads, and CPU affinity. It also reports per-connection CPU utilization, enabling the calculation of throughput-per-CPU-cycle efficiency—a critical metric in virtualized environments where CPU oversubscription constrains throughput independently of link capacity. For application-aware monitoring, NetFlow/IPFIX and sFlow provide sampled packet analysis that reveals throughput by application, source, and destination. The key metric from flow data is not peak throughput but the "95th percentile sustained rate," which determines whether the observed throughput pattern matches the application's expected profile or indicates congestion-induced throttling.
The final and most often overlooked measurement is the "Application Goodput Ratio": the ratio of application-layer data to total bytes transmitted. This can be derived from flow data:

$$\text{Goodput Ratio} = \frac{B_{app}}{B_{app} + B_{hdr} + B_{retx}}$$

Where $B_{app}$ is the application payload, $B_{hdr}$ is protocol header overhead, and $B_{retx}$ accounts for retransmitted bytes. A ratio below 0.85 suggests either excessive overhead (small packets, many connections) or a high retransmission rate, both of which are actionable engineering signals. Throughput is not a single number; it is a stack of nested measurements, and the engineer's art lies in knowing which layer to measure and how to interpret the result.
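As a small illustration, the ratio can be computed from three byte counters. The counter names are hypothetical, since flow exporters label these fields differently, and retransmitted bytes are assumed to be counted with their headers included.

```python
def application_goodput_ratio(app_bytes: int, header_bytes: int, retransmitted_bytes: int) -> float:
    """Application payload as a fraction of all bytes put on the wire."""
    total = app_bytes + header_bytes + retransmitted_bytes
    return app_bytes / total

# Illustrative flow record: byte counts are made up for the example.
ratio = application_goodput_ratio(app_bytes=850, header_bytes=100, retransmitted_bytes=50)
status = "investigate" if ratio < 0.85 else "healthy"
print(f"Goodput ratio: {ratio:.2f} ({status})")
```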