In a Nutshell

A firewall is more than a gate; it is a laboratory. Every packet must be analyzed, and as the depth of that analysis increases, the throughput decreases. This article examines the performance delta between Stateful Inspection and Deep Packet Inspection (DPI), and the massive overhead introduced by TLS Decryption.

The Inspection Spectrum

Not all firewalling is created equal. The more 'layers' the firewall unwraps, the more CPU cycles it consumes.

Hardware Architecture: ASIC vs. x86

Modern firewalls diverge into two paths: Software-defined (x86) and Hardware-accelerated (ASIC/FPGA).

x86 (CPU Based)

Flexible but sequential. Great for complex logic (Next-Gen Firewall layers), but struggles with raw packet forwarding at high rates. Latency increases roughly linearly with load.

ASIC (Hardware)

Parallel processing. Dedicated ASICs (and FPGAs) handle packet switching and basic ACLs at wire speed, in nanoseconds per packet. They offload the CPU, keeping latency consistent under load.

The "Elephant Flow" Problem

A single large file transfer (e.g., a 100 GB backup) is one TCP stream. Most firewalls use Receive Side Scaling (RSS) to distribute load across cores, but a single stream must stay on one CPU core to preserve packet order; one elephant flow can therefore peg a single core while the others sit idle.
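The core pinning can be illustrated with a toy RSS function (a sketch only; real NICs compute a Toeplitz hash over the 5-tuple in hardware, not SHA-256):

```python
import hashlib

def rss_queue(src_ip, src_port, dst_ip, dst_port, proto, n_queues=8):
    """Toy RSS: hash the 5-tuple so every packet of a flow lands on
    the same queue/core (real NICs use a Toeplitz hash)."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}/{proto}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % n_queues

# Every packet of the elephant flow maps to the same queue:
q = rss_queue("192.168.1.10", 49152, "10.0.0.5", 443, "tcp")
assert all(rss_queue("192.168.1.10", 49152, "10.0.0.5", 443, "tcp") == q
           for _ in range(1000))
```

Because the hash is deterministic, a 100 GB backup hashes to the same queue for its entire lifetime, which is exactly why adding cores does not speed up a single elephant flow.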

Stateful Inspection Engine

[Interactive demo: a traffic generator sends packets from a client (192.168.1.10) through the firewall to a server (10.0.0.5), populating the connection-tracking state table with live Allow/Drop counters.]
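A connection-tracking engine of this kind can be sketched as follows (a toy model; the policy callback and the fast-path/slow-path split are illustrative, not any vendor's implementation):

```python
class StateTable:
    """Minimal connection tracker: the first packet of a flow is
    checked against policy; later packets of a known flow skip the
    rule lookup entirely (the stateful fast path)."""
    def __init__(self, policy):
        self.policy = policy           # callable: 5-tuple -> bool
        self.flows = {}                # established connections
        self.allowed = self.dropped = 0

    def handle(self, five_tuple):
        if five_tuple in self.flows:   # fast path: known flow
            self.allowed += 1
            return "ALLOW"
        if self.policy(five_tuple):    # slow path: rule evaluation
            self.flows[five_tuple] = True
            self.allowed += 1
            return "ALLOW"
        self.dropped += 1
        return "DROP"

policy = lambda t: t[3] == 443         # toy rule: only allow dst port 443
table = StateTable(policy)
flow = ("192.168.1.10", 49152, "10.0.0.5", 443, "tcp")
table.handle(flow)                     # slow path, creates the entry
table.handle(flow)                     # fast path, table hit
```

The fast path is what makes stateful inspection cheap per packet, and the `flows` dict is also the resource that connection-saturation attacks target.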

The TLS Decryption Tax

Today, >90% of web traffic is encrypted (HTTPS). To inspect this traffic for malware, the firewall must perform a Man-in-the-Middle (MITM) decryption:

  1. Intercept the client's handshake.
  2. Decrypt the traffic using a local certificate.
  3. Inspect the payload.
  4. Re-encrypt the traffic for the destination.

This process is computationally expensive. Enabling full TLS Decryption can drop a firewall's rated throughput by 50% to 80%.
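A rough model of the tax (the blending formula and the 70% penalty figure are illustrative assumptions, not vendor benchmarks): effective throughput is the weighted average of plaintext traffic at the rated speed and decrypted traffic at the degraded speed.

```python
def effective_throughput(rated_gbps, decrypted_share, decryption_tax):
    """Blend plaintext-rate and decrypted-rate traffic.
    decryption_tax=0.7 means decrypted flows move at 30% of rated speed."""
    decrypted_rate = rated_gbps * (1 - decryption_tax)
    return (1 - decrypted_share) * rated_gbps + decrypted_share * decrypted_rate

# A 10 Gbps appliance decrypting 90% of its traffic at a 70% tax:
print(round(effective_throughput(10, 0.9, 0.7), 2))  # 3.7
```

In other words, a "10 Gbps" datasheet number can shrink to under 4 Gbps once full decryption is switched on, which is why sizing should always use the decrypted figure.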

SSL Offloading vs. Decrypt-Mirror

To mitigate this "decryption tax," engineers deploy two primary architectures:

SSL Offloading

The firewall terminates the TLS session, inspects the plaintext, and establishes a new session to the server. This provides full security but consumes the most CPU resources per packet.

Decrypt-Mirror

The firewall decrypts once and sends a copy to an external out-of-band sensor (like an IDS or DLP tool). This reduces latency for the primary traffic path while maintaining visibility.
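The hot-path benefit of decrypt-mirror can be sketched like this (a toy model; the queue-fed worker is an illustrative stand-in for a real out-of-band IDS/DLP sensor):

```python
import queue
import threading

mirror_q = queue.Queue()

def sensor_worker():
    """Out-of-band sensor stand-in: consumes mirrored plaintext
    off the hot path. A None item shuts the worker down."""
    while True:
        payload = mirror_q.get()
        if payload is None:
            break
        # ... expensive pattern matching happens here, asynchronously ...
        mirror_q.task_done()

def forward(plaintext, send):
    """Hot path: mirroring is a non-blocking enqueue, then the
    primary traffic path continues immediately."""
    mirror_q.put(plaintext)   # copy to the sensor
    send(plaintext)           # forward without waiting for inspection

threading.Thread(target=sensor_worker, daemon=True).start()
out = []
forward(b"GET / HTTP/1.1", out.append)
mirror_q.join()               # wait for the sensor to drain (demo only)
mirror_q.put(None)            # stop the worker
```

The design trade-off is visible in `forward`: the primary path never blocks on inspection, which is why decrypt-mirror gives visibility at low latency but cannot block a malicious payload inline the way SSL offloading can.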

Connection Saturation: CPS vs. Concurrent

Throughput is not the only limit. Firewalls are also bounded by how fast they can set up new connections (connections per second, CPS) and by how many concurrent flows their State Table can hold.
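A back-of-the-envelope model of state-table saturation (a Little's-law-style estimate; the table size, CPS, and lifetime figures are illustrative):

```python
def seconds_to_saturation(table_size, cps, avg_flow_lifetime_s):
    """Steady-state concurrent flows ~= CPS * average flow lifetime.
    If that exceeds the table, estimate when the table fills up."""
    steady_state = cps * avg_flow_lifetime_s
    if steady_state <= table_size:
        return None           # expiry keeps pace; the table never fills
    return table_size / cps   # approx. seconds until the table is full

# A 2M-entry table, 50k new connections/s, 60 s average flow lifetime:
print(seconds_to_saturation(2_000_000, 50_000, 60))  # 40.0
```

The point of the model: a firewall can be nowhere near its throughput limit and still fall over in under a minute because its State Table fills, which is exactly the resource that SYN-flood style attacks target.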

Cipher suite selection also plays a role. Older algorithms like AES-CBC are slower on modern CPUs compared to AES-GCM (Galois/Counter Mode), which is designed for hardware acceleration and parallelization.

Performance Optimizations

  • Hardware Offload (ASICs/FPGA): Moving encryption and pattern matching into dedicated chips.
  • Single-Pass Architecture: Performing all security checks (AV, IPS, App Control) in a single unified scan rather than serial processing.
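The single-pass idea can be sketched as follows (a toy model; real engines match multi-byte signatures and stream state, not single bytes, and the two engine callbacks are illustrative):

```python
def serial_scan(payload, engines):
    """Serial processing: each engine walks the full payload in turn,
    so the buffer is traversed once per engine."""
    return {name: any(match(b) for b in payload) for name, match in engines}

def single_pass_scan(payload, engines):
    """Single-pass: the payload is walked exactly once, and every
    engine is fed from that single traversal."""
    results = {name: False for name, _ in engines}
    for b in payload:                       # one pass over the buffer
        for name, match in engines:
            if match(b):
                results[name] = True
    return results

engines = [("AV", lambda b: b == 0x90),     # toy byte "signatures"
           ("IPS", lambda b: b == 0xCC)]
payload = bytes([0x41, 0x90, 0x42, 0xCC])
assert serial_scan(payload, engines) == single_pass_scan(payload, engines)
```

Both produce identical verdicts; the win in real appliances comes from touching each buffer (and its cache lines) once instead of once per security service.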

Conclusion

A firewall is a compromise between safety and speed. By understanding where your bottlenecks lie—whether in CPU-bound encryption or packet-header logic—you can design a perimeter that protects without choking the business.

