Firewall Performance Trade-offs
Throughput vs. Security Inspection
The Inspection Spectrum
Not all firewalling is created equal. The more protocol layers a firewall unwraps, from packet headers up to application payloads, the more CPU cycles each packet consumes.
Hardware Architecture: ASIC vs. x86
Modern firewalls diverge into two paths: Software-defined (x86) and Hardware-accelerated (ASIC/FPGA).
x86 (CPU Based)
Flexible but largely sequential. Great for complex logic (Next-Gen Firewall layers), but struggles with raw packet forwarding. Latency climbs as load increases, and sharply so once the cores approach saturation.
ASIC (Hardware)
Parallel processing. Dedicated ASIC or FPGA chips handle packet switching and basic ACLs at wire speed, with per-packet processing times in the nanosecond-to-microsecond range. They offload the CPU, keeping latency consistent under load.
The "Elephant Flow" Problem
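A single large file transfer (e.g., a 100 GB backup) is a single TCP stream. Most firewalls use Receive Side Scaling (RSS) to distribute flows across CPU cores, but every packet of one stream must stay on the same core to preserve ordering. The transfer is therefore capped by the speed of that one core, no matter how many other cores sit idle.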
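The pinning effect can be sketched in a few lines. This is a minimal illustration, not how any particular NIC implements RSS: real hardware typically uses a Toeplitz hash over the packet's address and port fields, and CRC32 stands in for it here. The core count, addresses, ports, and the `core_for_flow` helper are all hypothetical.

```python
import zlib
from collections import Counter

NUM_CORES = 8  # hypothetical core count


def core_for_flow(src_ip, src_port, dst_ip, dst_port, proto="tcp", num_cores=NUM_CORES):
    """Map a flow's 5-tuple to a core index.

    CRC32 is a stand-in for the NIC's real hash function (an assumption
    made for illustration only).
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{proto}".encode()
    return zlib.crc32(key) % num_cores


# Many small flows: each has a different source port, so the hashes differ
# and the flows spread over several cores.
mice = Counter(
    core_for_flow("10.0.0.5", port, "192.0.2.10", 443) for port in range(20000, 21000)
)

# One elephant flow: every packet carries the same 5-tuple, so every packet
# hashes to the same core, however many packets there are.
elephant = Counter(
    core_for_flow("10.0.0.5", 51515, "192.0.2.20", 873) for _packet in range(1000)
)

print("cores used by 1000 small flows:", len(mice))
print("cores used by 1 elephant flow: ", len(elephant))
```

Because the hash input never changes within a stream, adding cores does not help the elephant flow; only a faster core or hardware offload does.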
The TLS Decryption Tax
Today, more than 90% of web traffic is encrypted (HTTPS). To inspect this traffic for malware, the firewall must act as a Man-in-the-Middle (MITM) and decrypt it:
- Intercept the client's handshake.
- Decrypt the traffic, impersonating the server with a certificate issued by a locally trusted CA.
- Inspect the plaintext payload.
- Re-encrypt the traffic for the destination.
This process is computationally expensive. Enabling full TLS Decryption can drop a firewall's rated throughput by 50% to 80%.
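To see what that range means for sizing, here is a small arithmetic sketch. The 10 Gbps rated figure is a hypothetical datasheet number chosen for illustration; the 50% and 80% drops are the bounds quoted above.

```python
def decrypted_throughput_gbps(rated_gbps, drop_fraction):
    """Throughput left after a fractional drop (0.5 means a 50% drop)."""
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must be between 0 and 1")
    return rated_gbps * (1.0 - drop_fraction)


rated = 10.0  # hypothetical datasheet figure, in Gbps
best_case = decrypted_throughput_gbps(rated, 0.50)
worst_case = decrypted_throughput_gbps(rated, 0.80)
print(f"{rated} Gbps rated -> {worst_case:.1f} to {best_case:.1f} Gbps with decryption")
```

Under these assumptions, a box sold as 10 Gbps is left with roughly 2 to 5 Gbps once full decryption is enabled, so sizing against the datasheet's headline number would leave it undersized.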
SSL Offloading vs. Decrypt-Mirror
To manage this "decryption tax," engineers typically choose between two architectures:
SSL Offloading
The firewall terminates the TLS session, inspects the plaintext, and establishes a new session to the server. This provides full inline security, but every packet is decrypted and re-encrypted on the firewall itself, making it the most CPU-intensive option.
Decrypt-Mirror
The firewall decrypts once and sends a copy to an external out-of-band sensor (like an IDS or DLP tool). This reduces latency for the primary traffic path while maintaining visibility.
Connection Saturation: CPS vs. Concurrent
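Throughput is not the only limit. Firewalls are also bounded by the size of their state table (how many concurrent connections they can track) and by how quickly they can create new entries in it (connections per second, CPS).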
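The two limits in the heading, connections per second (CPS) and concurrent connections, are linked by Little's Law: average concurrent connections equal the arrival rate times the average connection lifetime. The sketch below applies it to a state table. All three numbers are hypothetical and should be replaced with measured values.

```python
def steady_state_connections(cps, avg_lifetime_s):
    """Little's Law: average concurrent connections = arrival rate x average lifetime."""
    return cps * avg_lifetime_s


def seconds_to_fill(table_size, cps):
    """Worst case: time to fill an empty state table if no entry ever expires."""
    return table_size / cps


TABLE_SIZE = 2_000_000  # hypothetical state table capacity (entries)
CPS = 50_000            # hypothetical new connections per second
AVG_LIFETIME_S = 30     # hypothetical average connection lifetime, in seconds

needed = steady_state_connections(CPS, AVG_LIFETIME_S)
print("entries needed at steady state:", needed)
print("fits in table:", needed <= TABLE_SIZE)
print("seconds to fill with no expiry:", seconds_to_fill(TABLE_SIZE, CPS))
```

With these example values the table holds, but a flood of connections that never close would fill it in 40 seconds, which is why idle timeouts matter as much as raw table size.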
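Cipher suite selection also affects TLS inspection cost. Older modes such as AES-CBC encrypt serially, because each block depends on the previous ciphertext block. AES-GCM (Galois/Counter Mode) is built on counter mode, so blocks can be processed in parallel, and it benefits from the AES and carry-less multiplication instructions in modern CPUs.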
Performance Optimizations
- Hardware Offload (ASICs/FPGAs): Moving encryption and pattern matching into dedicated chips.
- Single-Pass Architecture: Performing all security checks (AV, IPS, App Control) in a single unified scan of the traffic rather than running each engine over it in turn.
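The single-pass idea can be sketched with ordinary regular expressions. This is only an illustration of the principle: the signatures and engine names are hypothetical, the payload is treated as text for simplicity, and Python's backtracking `re` engine stands in for the multi-pattern automata or dedicated hardware a real product would use.

```python
import re

# Hypothetical signature sets, one per engine.
ENGINES = {
    "av": [r"EICAR-TEST"],
    "ips": [r"\.\./\.\./", r"union\s+select"],
    "app_control": [r"BitTorrent protocol"],
}


def serial_scan(payload):
    """Each engine scans the payload on its own: one pass per engine."""
    hits, passes = set(), 0
    for engine, patterns in ENGINES.items():
        passes += 1
        if re.search("|".join(patterns), payload, re.IGNORECASE):
            hits.add(engine)
    return hits, passes


# One combined pattern; each engine's signatures sit in a named group.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{'|'.join(patterns)})" for name, patterns in ENGINES.items()),
    re.IGNORECASE,
)


def single_pass_scan(payload):
    """One left-to-right scan that reports which engines matched.

    Limitation of this sketch: finditer yields non-overlapping matches,
    so signatures that overlap in the payload could be missed.
    """
    hits = {match.lastgroup for match in COMBINED.finditer(payload)}
    return hits, 1


sample = "GET /item?id=1 UNION SELECT pass FROM users -- EICAR-TEST"
print(serial_scan(sample))
print(single_pass_scan(sample))
```

Both functions flag the same engines for this sample, but the serial version walks the payload once per engine while the combined version walks it once. The saving grows with the number of engines enabled.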
Conclusion
A firewall is a compromise between safety and speed. By understanding where your bottlenecks lie—whether in CPU-bound encryption or packet-header logic—you can design a perimeter that protects without choking the business.
