Port Mirroring & SPAN
Visibility without Disruption
The Mirroring Mechanic
In a switched environment, known unicast traffic is delivered only to the port where the destination MAC address has been learned. To monitor this traffic, whether for security auditing or troubleshooting, we must instruct the switch fabric to 'mirror' (copy) packets to a monitoring port.
Source and Destination Dynamics
A SPAN session consists of a Source (ports or VLANs being monitored) and a Destination (the port where the sniffer/IDS is connected).
- Ingress (RX): Monitors traffic entering the source.
- Egress (TX): Monitors traffic leaving the source.
- Both: Full duplex visibility.
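The direction options can be modeled as a simple predicate. The Python sketch below is illustrative only: `SpanSession`, `Direction`, and `should_mirror` are hypothetical names, not a vendor API, and real switches make this decision in the forwarding hardware. It only answers whether a copy is made, not how a given platform handles a frame that matches both the RX and TX rules.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    RX = "rx"      # traffic entering a source port
    TX = "tx"      # traffic leaving a source port
    BOTH = "both"  # full duplex visibility


@dataclass
class SpanSession:
    sources: frozenset   # monitored source port numbers (hypothetical model)
    destination: int     # port where the sniffer/IDS is connected
    direction: Direction = Direction.BOTH

    def should_mirror(self, ingress_port: int, egress_port: int) -> bool:
        """Return True if a frame switched from ingress_port to egress_port is copied."""
        rx_match = (ingress_port in self.sources
                    and self.direction in (Direction.RX, Direction.BOTH))
        tx_match = (egress_port in self.sources
                    and self.direction in (Direction.TX, Direction.BOTH))
        return rx_match or tx_match


# Monitor port 1 ingress only; the sniffer sits on port 24.
session = SpanSession(sources=frozenset({1}), destination=24, direction=Direction.RX)
print(session.should_mirror(ingress_port=1, egress_port=5))  # True: enters port 1
print(session.should_mirror(ingress_port=5, egress_port=1))  # False: leaves port 1
```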
Header Deconstruction: RSPAN vs. ERSPAN
How the traffic is moved across the network depends on the scale and topology of the monitoring session.
- RSPAN (L2): Uses a dedicated Remote VLAN. The original frame contents are preserved, but the frame is carried across trunk links tagged with the RSPAN VLAN ID, so RSPAN is limited to a single Layer 2 domain.
- ERSPAN (L3): High-fidelity encapsulation for routed networks: [IP Header] + [GRE Header (Protocol Type 0x88BE for Type II, 0x22EB for Type III)] + [ERSPAN Header] + [Original Frame]
Hardware Perspective: ASIC vs. CPU
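Most enterprise-grade switches perform mirroring at the ASIC (Application-Specific Integrated Circuit) level rather than in the CPU. This is often described as "non-blocking": the forwarding hardware duplicates each packet at wire speed, although the copies still consume bandwidth and buffer space on the destination port.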
Network TAPs: The Physical Alternative
While SPAN is a logic-based duplication in the switch fabric, a Network TAP (Test Access Point) is a physical hardware device inserted into the cable run.
- Passive Optical TAPs: Use an optical splitter (such as a fused-fiber coupler or thin-film filter) to physically divide the light. For example, a 70/30 split sends 70% of the light to the destination and 30% to the monitoring tool. They require no power and cannot fail logically.
- Active Copper TAPs: Electronically regenerate the electrical signal. Importantly, these TAPs often provide a "failsafe" bypass: if the TAP loses power, relays keep the production link connected.
Optimization: Packet Slicing & Masking
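Capturing every byte of every packet is expensive in terms of storage and disk I/O. For most monitoring needs (such as flow analysis or header auditing), we use Packet Slicing, which truncates each packet after a fixed number of bytes (for example, the first 64-128 bytes, enough to hold the headers). Packet Masking complements slicing by overwriting selected byte ranges, such as sensitive payload fields, with a fixed pattern before the packet reaches the tool.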
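Both operations are simple byte manipulations. This Python sketch assumes raw packet bytes as input; `slice_packet`, `mask_packet`, `stored_bytes`, and the 128-byte default are illustrative choices rather than any product's API, and the frame sizes in the example are made up.

```python
def slice_packet(packet: bytes, snap_len: int = 128) -> bytes:
    """Packet slicing: keep only the first snap_len bytes (typically the headers)."""
    return packet[:snap_len]


def mask_packet(packet: bytes, offset: int, length: int, fill: int = 0x00) -> bytes:
    """Packet masking: overwrite packet[offset:offset+length] with a fixed fill byte.

    The byte range is clipped to the packet, so masking past the end is harmless.
    """
    buf = bytearray(packet)
    start = min(offset, len(buf))
    end = min(offset + length, len(buf))
    buf[start:end] = bytes([fill]) * (end - start)
    return bytes(buf)


def stored_bytes(frame_sizes, snap_len=None):
    """Total bytes written to disk, with or without slicing."""
    if snap_len is None:
        return sum(frame_sizes)
    return sum(min(size, snap_len) for size in frame_sizes)


frames = [1514, 1514, 590, 66]
print(stored_bytes(frames))                # 3684
print(stored_bytes(frames, snap_len=128))  # 450
```

In practice slicing and masking run in switch or broker hardware; the sketch only shows why slicing cuts storage roughly in proportion to the average packet size.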
The Network Packet Broker (NPB)
In large enterprises, simply mirroring a port is not enough. We use a Network Packet Broker (NPB) as a middleware layer between the TAPs/SPAN ports and the monitoring tools.
NPB Functions:
- Aggregation: Combine multiple low-bandwidth SPAN feeds into one high-bandwidth tool port.
- Deduplication: If a packet is seen on multiple mirrored ports, the NPB removes the duplicates before sending them to the IDS.
- Filtering: Forward only specific protocols (e.g., HTTP/TLS) to the relevant analyzer.
- Load Balancing: Distribute traffic across a cluster of Wireshark or Zeek sensors.
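Deduplication and load balancing are the least obvious of these functions. The Python sketch below assumes a time-window dedup keyed on a hash of the packet bytes, and a symmetric flow hash so that both directions of a session land on the same sensor, which stateful tools such as Zeek need. The class and function names, the 50 ms window, and the hash choices are assumptions for illustration, not how any particular broker works.

```python
import hashlib
import zlib
from collections import deque


class Deduplicator:
    """Drop copies of a packet seen again within window_s seconds.

    Simplification: hashes the whole packet. Real brokers usually ignore
    fields that differ between mirror points (TTL, checksums, VLAN tags).
    """

    def __init__(self, window_s: float = 0.05):
        self.window_s = window_s
        self.last_seen = {}    # digest -> timestamp of most recent copy
        self.order = deque()   # (timestamp, digest) in arrival order

    def is_duplicate(self, packet: bytes, ts: float) -> bool:
        # Expire entries that have fallen out of the window.
        while self.order and ts - self.order[0][0] > self.window_s:
            old_ts, old_digest = self.order.popleft()
            if self.last_seen.get(old_digest) == old_ts:
                del self.last_seen[old_digest]
        digest = hashlib.blake2b(packet, digest_size=16).digest()
        duplicate = digest in self.last_seen
        self.last_seen[digest] = ts
        self.order.append((ts, digest))
        return duplicate


def tool_for_flow(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                  proto: int, n_tools: int) -> int:
    """Symmetric flow hash: both directions of a session map to the same sensor."""
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{proto}".encode()
    return zlib.crc32(key) % n_tools
```

Sorting the two endpoints before hashing is what makes the hash symmetric; a plain 5-tuple hash would usually split a TCP session's two directions across different sensors.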
Calculating Oversubscription & Direction
A common mistake is mirroring a full-duplex 1Gbps link to a single 1Gbps destination.
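If the combined RX and TX rate exceeds the destination port's capacity, $R_{RX} + R_{TX} > C_{dest}$, the switch will drop packets at the egress (destination) port: a fully loaded 1 Gbps full-duplex link produces up to 2 Gbps of mirrored traffic. To avoid this, mirror each direction to its own destination port, or, where the switch supports truncating mirrored frames, use Packet Slicing to capture only the first 64-128 bytes (headers) of each packet.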
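The check can be worked through numerically by comparing the monitored link's combined RX and TX load with the destination port's capacity. This Python sketch assumes a fixed average frame size, counts the 20 bytes of preamble and inter-frame gap per frame, assumes the switch can truncate mirrored frames, and treats 64 bytes as the floor for a sliced frame; the function names and traffic figures are illustrative.

```python
WIRE_OVERHEAD_BYTES = 20  # preamble + SFD (8) + minimum inter-frame gap (12)
MIN_FRAME_BYTES = 64      # shortest legal Ethernet frame


def mirror_load_bps(rx_bps, tx_bps, avg_frame_bytes, snap_len=None):
    """Line rate needed on the SPAN destination for one monitored link.

    rx_bps / tx_bps are the monitored link's per-direction rates, counted
    on the wire (including preamble and inter-frame gap).
    """
    bits_per_frame = (avg_frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    frames_per_s = (rx_bps + tx_bps) / bits_per_frame
    copied = avg_frame_bytes if snap_len is None else min(avg_frame_bytes, snap_len)
    copied = max(copied, MIN_FRAME_BYTES)  # sliced frames are still padded to 64 bytes
    return frames_per_s * (copied + WIRE_OVERHEAD_BYTES) * 8


def oversubscribed(rx_bps, tx_bps, dest_bps, avg_frame_bytes, snap_len=None):
    """True when the combined RX+TX mirror load (after any slicing) exceeds dest_bps."""
    return mirror_load_bps(rx_bps, tx_bps, avg_frame_bytes, snap_len) > dest_bps


# 1 Gbps full-duplex link at 70% in each direction, 800-byte average frames
print(oversubscribed(0.7e9, 0.7e9, 1e9, 800))                # True  (~1.40 Gbps)
print(oversubscribed(0.7e9, 0.7e9, 1e9, 800, snap_len=128))  # False (~0.25 Gbps)
```

Slicing helps because it cuts bytes per frame, not frames per second, so its benefit shrinks as the average frame size approaches the slice length.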
Security & VLAN Leakage
Monitoring ports are a significant security risk. Because a SPAN destination port receives copies of traffic from other VLANs, anyone who gains access to it can bypass logical segmentation and observe traffic they would otherwise never see.
Conclusion: Visibility Optimization
Network visibility requires a balance between fidelity and impact. While ERSPAN offers considerable flexibility for routed and virtualized environments, Local SPAN remains the lowest-latency option for real-time forensics. By accounting for the buffer impact on switch hardware and the encapsulation header overhead, engineers can build visibility fabrics that enhance security without compromising the production link.
ERSPAN Deployment at Scale: Encapsulated Remote Mirroring Across the WAN
Encapsulated Remote SPAN (ERSPAN) extends the port mirroring concept beyond the local switch and across Layer 3 networks, enabling a security monitoring center in one geographic location to receive mirrored traffic from switches in remote data centers or branch offices. ERSPAN encapsulates the mirrored Ethernet frame in a GRE (Generic Routing Encapsulation) tunnel with an outer IP header, so the mirrored traffic traverses routed networks as a standard unicast IP packet. The ERSPAN Type II header is 8 bytes and carries the original VLAN ID, class of service, a session ID, and a port index; the Type III header is 12 bytes (plus an optional 8-byte platform-specific subheader) and adds a timestamp and other metadata. The encapsulated packet is sent to the monitoring station's IP address, where decapsulation software extracts the original Ethernet frame and presents it to the network analyzer as if the traffic were locally connected. For ERSPAN Type II, the encapsulation adds 36 bytes inside the outer IP packet (20 IPv4 + 8 GRE with sequence number + 8 ERSPAN), or 50 bytes on the wire once the outer 14-byte Ethernet header is counted. A mirrored frame carrying a 1,500-byte IP packet (1,514 bytes including its Ethernet header) therefore becomes a 1,550-byte outer IP packet, which exceeds a standard 1,500-byte path MTU unless the underlay network is configured for larger frames.
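To make the overhead arithmetic concrete, the Python sketch below packs a GRE header with the sequence-number bit set and an ERSPAN Type II header following the field layout published in the ERSPAN Internet-Draft (draft-foschiano-erspan), then computes the outer packet size. It assumes an IPv4 underlay with no IP options and an untagged outer Ethernet header; the function names are hypothetical.

```python
import struct

GRE_PROTO_ERSPAN_II = 0x88BE
GRE_FLAG_S = 0x1000          # S bit: a 4-byte sequence number follows
OUTER_IPV4_BYTES = 20        # outer IPv4 header without options
OUTER_ETHERNET_BYTES = 14    # outer Ethernet header (no VLAN tag)


def gre_header(sequence: int) -> bytes:
    """8-byte GRE header: flags/version, protocol type, sequence number."""
    return struct.pack("!HHI", GRE_FLAG_S, GRE_PROTO_ERSPAN_II, sequence & 0xFFFFFFFF)


def erspan_ii_header(vlan: int, cos: int, session_id: int, index: int,
                     en: int = 0, truncated: bool = False) -> bytes:
    """8-byte ERSPAN Type II header (Ver = 1)."""
    word0 = ((1 << 28)                     # Ver: 4 bits
             | ((vlan & 0xFFF) << 16)      # original VLAN: 12 bits
             | ((cos & 0x7) << 13)         # class of service: 3 bits
             | ((en & 0x3) << 11)          # encapsulation type: 2 bits
             | (int(truncated) << 10)      # T: frame was truncated
             | (session_id & 0x3FF))       # session ID: 10 bits
    word1 = index & 0xFFFFF                # 12 reserved bits (zero) + 20-bit index
    return struct.pack("!II", word0, word1)


def outer_ip_len(mirrored_frame_bytes: int) -> int:
    """Size of the outer IP packet, which must fit the underlay MTU."""
    return (OUTER_IPV4_BYTES + len(gre_header(0))
            + len(erspan_ii_header(0, 0, 0, 0)) + mirrored_frame_bytes)


frame = 1514  # 1,500-byte IP packet plus 14-byte Ethernet header
print(outer_ip_len(frame))                         # 1550
print(outer_ip_len(frame) <= 1500)                 # False: needs a larger underlay MTU
print(outer_ip_len(frame) + OUTER_ETHERNET_BYTES)  # 1564 bytes on the wire (excluding FCS)
print(erspan_ii_header(vlan=100, cos=0, session_id=1, index=0).hex())  # 1064000100000000
```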
The deployment of ERSPAN at scale introduces significant bandwidth and processing considerations that are often underestimated during the planning phase. A single ERSPAN session can generate 1-10 Gbps of encapsulated mirror traffic, depending on the utilization of the source ports. In a data center built from 48-port 10 Gbps switches, mirroring every port of just one switch can generate up to 480 Gbps of traffic in one direction (960 Gbps if both RX and TX are mirrored), plus encapsulation overhead, all of which must be transported to the monitoring station. The network engineer must ensure that the ERSPAN traffic does not compete with production traffic for bandwidth on the underlay network, which would cause congestion and packet loss for both the production and the mirror traffic. The standard solution is to carry ERSPAN in a dedicated transport VLAN or VRF, separate from the production routing table, and to bind that transport to its own links so the mirror traffic takes a different path through the network: typically a separate set of interfaces on the monitoring switches and a dedicated uplink from each leaf switch to the monitoring aggregation point. The monitoring aggregation point must be provisioned with enough interface capacity to receive the sum of all ERSPAN flows, which may require 400 Gbps or more of total uplink capacity in a large data center deployment.
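The sizing exercise reduces to a short calculation. This Python sketch assumes ERSPAN Type II with 50 bytes of per-frame wire overhead, uniform utilization across ports, and a single average frame size; it ignores preamble and inter-frame gap. The function name and inputs are illustrative.

```python
ERSPAN_II_WIRE_OVERHEAD = 50  # outer Ethernet 14 + IPv4 20 + GRE 8 + ERSPAN II 8


def erspan_transport_gbps(ports: int, port_gbps: float, utilization: float,
                          both_directions: bool, avg_frame_bytes: int) -> float:
    """Approximate transport bandwidth for mirroring `ports` ports of one switch."""
    directions = 2 if both_directions else 1
    mirrored = ports * port_gbps * utilization * directions
    # Every mirrored frame grows by the encapsulation overhead.
    return mirrored * (avg_frame_bytes + ERSPAN_II_WIRE_OVERHEAD) / avg_frame_bytes


# One 48-port 10 Gbps leaf, fully loaded, RX only, 1,000-byte average frames
print(erspan_transport_gbps(48, 10, 1.0, False, 1000))  # 504.0
```

Small average frames inflate the overhead factor: at 200-byte frames, the 50 bytes of encapsulation add 25% to the transport load.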
The processing overhead of ERSPAN encapsulation on the source switch is a critical performance consideration. Each mirrored frame must be wrapped in new GRE and IP headers, which requires header insertion and calculation of the outer IPv4 header checksum. On switches that perform ERSPAN encapsulation in software (older or lower-end models), each ERSPAN session can consume 10-30% of the CPU, limiting the total number of sessions that can be supported. Modern data center switches (such as the Cisco Nexus 9000 series with Cloud Scale ASICs, or the Arista 7280 series) perform ERSPAN encapsulation in hardware at line rate without involving the CPU. The hardware implementation typically reuses the ASIC's tunnel encapsulation pipeline, the same hardware path used for VXLAN, so all active sessions run at wire rate simultaneously, but the number of concurrent sessions is still capped by a platform-specific hardware limit. When evaluating switches for ERSPAN deployment, the network engineer must verify the hardware offload capability and the maximum number of concurrent sessions, as these specifications vary significantly between switch models and software versions.
The monitoring station that receives the ERSPAN traffic must be sized to handle the aggregate bandwidth of all ERSPAN sessions. A typical deployment might have 10 source switches each generating 5 Gbps of ERSPAN traffic, for a total of 50 Gbps that must be received, decapsulated, and analyzed by the monitoring infrastructure. The monitoring station typically uses a dedicated network interface card (NIC) that supports Receive Side Scaling (RSS) and multiple receive queues to distribute the decapsulation load across multiple CPU cores. Because GRE packets carry no outer TCP/UDP ports, many NICs hash them on the outer IP addresses only, so all traffic from one source switch can land on a single queue unless the NIC can hash on the inner headers. Decapsulation is performed by the kernel's GRE tunnel driver (or by a specialized data plane framework such as DPDK for high-performance deployments), which strips the outer IP, GRE, and ERSPAN headers and presents the original Ethernet frame to the packet capture application. A modern server with a 100 Gbps NIC, 32 CPU cores, and DPDK-based packet processing can handle roughly 50-80 Gbps of ERSPAN traffic before the CPU becomes the bottleneck. For deployments exceeding 100 Gbps of aggregate ERSPAN traffic, the monitoring infrastructure must be distributed across multiple servers, each responsible for decapsulating and analyzing traffic from a subset of the ERSPAN sources.
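The decapsulation step itself is mechanical. The pure-Python sketch below shows the header walk that the kernel driver or a DPDK pipeline performs per packet; it assumes an IPv4 underlay, unfragmented outer packets, and ERSPAN Type II only, does not validate checksums, and `decap_erspan_ii` is a hypothetical name. It is far too slow for production rates and is meant only to illustrate the layout.

```python
import struct

GRE_PROTO_ERSPAN_II = 0x88BE


def decap_erspan_ii(ip_packet: bytes):
    """Strip outer IPv4 + GRE + ERSPAN Type II headers.

    Returns (session_id, vlan, inner_frame). Raises ValueError for anything else.
    Assumes the outer packet is not fragmented.
    """
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ihl = (ip_packet[0] & 0x0F) * 4                 # header length incl. options
    total_len = struct.unpack("!H", ip_packet[2:4])[0]
    if ip_packet[9] != 47:
        raise ValueError("not GRE (IP protocol 47)")
    gre = ip_packet[ihl:total_len]                  # ignore any link-layer padding
    flags, proto = struct.unpack("!HH", gre[:4])
    if proto != GRE_PROTO_ERSPAN_II:
        raise ValueError(f"unexpected GRE protocol 0x{proto:04x}")
    offset = 4
    for bit in (0x8000, 0x2000, 0x1000):            # checksum, key, sequence present
        if flags & bit:
            offset += 4
    word0, _word1 = struct.unpack("!II", gre[offset:offset + 8])
    if word0 >> 28 != 1:
        raise ValueError("not ERSPAN Type II")
    vlan = (word0 >> 16) & 0xFFF
    session_id = word0 & 0x3FF
    return session_id, vlan, gre[offset + 8:]


# Build a test packet: IPv4 (proto 47) + GRE with sequence number + ERSPAN II + 60-byte frame
inner = bytes(60)
erspan = struct.pack("!II", (1 << 28) | (100 << 16) | 5, 0)
gre = struct.pack("!HHI", 0x1000, GRE_PROTO_ERSPAN_II, 1)
ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(gre) + len(erspan) + len(inner),
                 0, 0, 64, 47, 0, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
print(decap_erspan_ii(ip + gre + erspan + inner)[:2])  # (5, 100)
```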
The operational management of a large ERSPAN deployment requires comprehensive monitoring of the ERSPAN infrastructure itself. Key metrics include ERSPAN source session status (active, inactive, or in error), ERSPAN bandwidth utilization (as a percentage of the provisioned transport capacity), ERSPAN packet drops (due to encapsulation errors or transport congestion), and the decapsulation rate on the monitoring servers. The ERSPAN monitoring dashboard should provide per-switch, per-session visibility into these metrics, with alerting thresholds that notify the network operations team when any ERSPAN session exceeds 80% of its provisioned bandwidth, drops more than 0.1% of packets, or transitions to an error state. The operational procedures for ERSPAN must include a standardized troubleshooting methodology for the most common failure scenarios:
- Session not starting: check the source switch configuration and the ERSPAN transport connectivity.
- Traffic not reaching the monitoring station: check IP reachability between the source switch and the monitoring station, including any firewall rules or ACLs that block GRE (IP protocol 47).
- Incorrect timestamps: check clock synchronization (NTP or PTP) on all switches and monitoring stations, as ERSPAN Type III timestamps are only as accurate as the system clocks.
Systematic management of ERSPAN at scale is what preserves the security monitoring capability that justifies the investment in the ERSPAN infrastructure, and it requires the same operational discipline that the network engineering team applies to the production network itself.
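The alerting thresholds translate directly into code. This Python sketch assumes per-session counters have already been collected (for example via SNMP or streaming telemetry, which is not shown); the `ErspanSessionStats` fields, the `evaluate_alerts` name, and the choice to compute the drop ratio as dropped divided by sent plus dropped are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ErspanSessionStats:
    switch: str
    session_id: int
    state: str                 # "active", "inactive", or "error"
    mirror_bps: float          # current ERSPAN transmit rate
    provisioned_bps: float     # transport capacity reserved for this session
    packets_sent: int
    packets_dropped: int


def evaluate_alerts(s: ErspanSessionStats, max_utilization: float = 0.80,
                    max_drop_ratio: float = 0.001) -> list:
    """Apply the 80% bandwidth / 0.1% drop / error-state thresholds."""
    name = f"{s.switch}/session {s.session_id}"
    alerts = []
    if s.state == "error":
        alerts.append(f"{name}: session in error state")
    if s.provisioned_bps > 0 and s.mirror_bps / s.provisioned_bps > max_utilization:
        alerts.append(f"{name}: {s.mirror_bps / s.provisioned_bps:.0%} of provisioned bandwidth")
    offered = s.packets_sent + s.packets_dropped  # assumed definition of offered load
    if offered and s.packets_dropped / offered > max_drop_ratio:
        alerts.append(f"{name}: {s.packets_dropped / offered:.2%} of packets dropped")
    return alerts


busy = ErspanSessionStats("leaf-07", 3, "active", 8.5e9, 10e9, 1_000_000, 50)
print(evaluate_alerts(busy))  # ['leaf-07/session 3: 85% of provisioned bandwidth']
```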
Network TAP vs SPAN: Engineering Decision Framework for Traffic Visibility
The choice between deploying a physical network Test Access Point (TAP) and using SPAN/ERSPAN port mirroring is one of the most consequential decisions in security monitoring architecture design. A network TAP is a hardware device placed inline between two network devices that creates a physical copy of all traffic traversing the link without any interaction with the switch's forwarding ASIC. The TAP operates at Layer 1: a fiber TAP divides the optical signal with a passive optical splitter, while a copper TAP (which must be active at gigabit speeds and above) regenerates the electrical signal; in both cases one copy continues to the receiving device and another goes to the monitoring port. A passive TAP has no IP address, no MAC address, and no software that can be compromised, so an attacker cannot disable it or block traffic to the monitoring port without physically cutting or tampering with the fiber or copper link. This Layer 1 isolation is the fundamental security advantage of TAPs over SPAN: a SPAN session can be disrupted by a misconfiguration, a software bug, or a deliberate attack on the switch's control plane, while a passive TAP is immune to software-based attacks.
The engineering trade-off between TAPs and SPAN involves several factors that the network engineer must evaluate for each monitoring deployment. The first factor is cost: a single 10 Gbps fiber TAP costs $500-$2,000, while a SPAN session on an existing switch has zero marginal hardware cost. For a deployment monitoring 100 links across a data center, the TAP cost is $50,000-$200,000, which is significant but may be justified for high-security environments. The second factor is signal degradation: a passive optical TAP introduces 1-3 dB of insertion loss (depending on the split ratio), which reduces the optical power budget available to the production link. For a link that is already operating near the minimum receiver sensitivity (e.g., a 10 km single-mode link with an optical budget of 5 dB), adding a 3 dB TAP can push the link below the receiver sensitivity threshold, causing bit errors or link instability. The third factor is physical deployment complexity: TAPs require rack space, power (for active TAPs that regenerate the signal), and physical cabling, while SPAN requires no additional hardware beyond the switch that is already in place. The network engineer must weigh these factors against the security and reliability requirements of each specific monitoring deployment.
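The insertion-loss figures follow from the split ratio: an ideal splitter leg that receives a fraction f of the light loses $-10\log_{10} f$ dB, and real splitters add some excess loss on top. The Python sketch below applies that to the 10 km, 5 dB budget example; the 0.35 dB/km fiber attenuation and 0.5 dB connector loss are assumed values, and the function names are hypothetical.

```python
import math


def split_loss_db(fraction: float, excess_loss_db: float = 0.0) -> float:
    """Insertion loss of one splitter leg receiving `fraction` of the light."""
    return -10 * math.log10(fraction) + excess_loss_db


def link_margin_db(power_budget_db: float, fiber_km: float, atten_db_per_km: float,
                   connector_loss_db: float, tap_loss_db: float = 0.0) -> float:
    """Remaining margin; a negative value means the receiver is below sensitivity."""
    return power_budget_db - (fiber_km * atten_db_per_km + connector_loss_db + tap_loss_db)


print(round(split_loss_db(0.7), 2))  # 1.55 dB on the network leg of a 70/30 TAP
print(round(split_loss_db(0.5), 2))  # 3.01 dB on either leg of a 50/50 TAP
# 10 km link with a 5 dB budget (attenuation and connector loss are assumed values)
print(round(link_margin_db(5.0, 10, 0.35, 0.5), 2))                      # 1.0
print(round(link_margin_db(5.0, 10, 0.35, 0.5, split_loss_db(0.5)), 2))  # -2.01
```

With the assumed losses, the link has only about 1 dB of margin before the TAP, so even an ideal 70/30 network leg would consume it.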
The reliability implications of TAPs vs SPAN are a critical consideration for mission-critical links. A passive fiber TAP is a physical device with a mean time between failures (MTBF) exceeding 1 million hours (about 114 years), making it one of the most reliable components in the network. If a passive optical TAP fails, the failure mode is a reduction in optical power rather than a complete link failure, because the passive splitter continues to pass the optical signal (at reduced power) even if the monitoring output fails. For an active copper TAP (which regenerates the electrical signal), the failure mode can be either "fail-open" (the TAP passes production traffic even if its electronics fail) or "fail-closed" (the TAP blocks production traffic if its electronics fail). Fail-open behavior is essential for maintaining production link continuity, and it must be explicitly verified during TAP procurement. SPAN, by contrast, has the opposite reliability characteristic: a SPAN session is designed not to affect production traffic on the monitored ports (it is "fail-safe" with respect to the production forwarding path), but the session itself can silently stop delivering data. Mirrored copies may be dropped when the switch is overloaded or the destination port is congested, and the session stops working altogether if its configuration is removed or corrupted. The practical consequence is that SPAN is acceptable for non-critical monitoring (traffic analysis, capacity planning), where the loss of monitoring data is not a security incident, while TAPs are recommended for critical security monitoring (intrusion detection, forensic analysis), where the continuous availability of the monitoring data is essential.
The timing accuracy of TAPs vs SPAN is another factor that may be critical for certain use cases. A passive TAP adds no delay beyond the propagation delay of the optical or electrical path (roughly 5 nanoseconds per meter, so well under 1 microsecond for a short cable), and the timestamps applied to the captured packets reflect the time the traffic passed the TAP, offset only by that small, constant amount. SPAN introduces variable delay because the switch's forwarding ASIC must copy the frame to the SPAN destination port, which may involve buffering the frame in the ASIC's internal memory while the destination port is busy transmitting another frame. The SPAN-induced delay can range from less than 1 microsecond (when the SPAN destination port is idle) to hundreds of microseconds (when it is congested). For applications that require precise timing correlation between multiple monitoring points (such as tracking the exact path of a packet through the network for latency measurement or timing attack detection), this jitter introduces measurement noise that can obscure the signal. Financial trading is a prominent example: regulations such as the EU's MiFID II (RTS 25) require firms using high-frequency algorithmic trading to keep their clocks within 100 microseconds of UTC with timestamp granularity of 1 microsecond or finer, and precision requirements of this kind have driven many high-frequency trading firms to deploy TAPs exclusively for their market data monitoring feeds.
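The range of SPAN-induced delay follows from serialization arithmetic: a mirrored copy waiting in the destination port's queue is delayed by the queued bytes times 8, divided by the line rate. The Python sketch assumes a 10 Gbps destination port and an illustrative 500 KB backlog; neither figure describes a specific switch, and the function name is hypothetical.

```python
def serialization_delay_us(queued_bytes: int, line_rate_bps: float) -> float:
    """Time to clock `queued_bytes` onto the wire, in microseconds."""
    return queued_bytes * 8 / line_rate_bps * 1e6


rate = 10e9  # assumed 10 Gbps SPAN destination port
print(round(serialization_delay_us(1518, rate), 2))     # 1.21: behind one full-size frame
print(round(serialization_delay_us(500_000, rate), 1))  # 400.0: behind a 500 KB backlog
```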
The practical recommendation for most enterprise deployments is a hybrid approach: use TAPs for the critical links that carry sensitive data (internet edge, data center core, financial trading links) and use SPAN/ERSPAN for the less critical links (campus access switches, branch office uplinks, management network links). The TAPs provide guaranteed visibility for the security-critical traffic, while SPAN provides cost-effective visibility for the operational monitoring of the broader network. The monitoring infrastructure must be designed to accept both TAP-originated and SPAN-originated traffic, with the monitoring tools capable of handling the different timing characteristics and reliability profiles of each source. The monitoring aggregation and packet broker layer (such as Gigamon or Ixia's Vision ONE) receives traffic from both TAPs and SPAN sessions, normalizes the timing, filters the traffic based on security and monitoring requirements, and load-balances the filtered traffic to the appropriate analysis tools. This hybrid TAP+SPAN architecture, combined with the monitoring aggregation layer, offers a practical balance of security, reliability, cost, and operational simplicity for most enterprise network visibility deployments, and it is the recommended architecture in the SANS Institute's "Network Visibility and Monitoring" best practice guide.