In a Nutshell

When network demand exceeds capacity, Quality of Service (QoS) algorithms determine which packets live and which packets die.

The Necessity of Quality of Service

In a best-effort network, all traffic is treated equally. However, real-time applications like VoIP and video conferencing are sensitive to jitter and latency, while file transfers are sensitive only to throughput. QoS lets us classify and prioritize these streams.

Fundamental Queuing Algorithms

1. FIFO (First-In, First-Out)

The default. Packets depart in the order they arrived. No prioritization.

2. Priority Queuing (PQ)

Packets are placed into strictly prioritized queues (High, Medium, Normal, Low). The High queue must be completely empty before any packet in the Medium queue is sent.

Risk: Lower-priority queues can suffer from 'Starvation'.
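A minimal Python sketch of strict-priority dequeuing (class and method names are illustrative, not from any vendor API):

```python
from collections import deque

class PriorityQueuing:
    """Strict-priority scheduler: a lower queue is served only when
    every higher-priority queue is completely empty."""
    LEVELS = ("high", "medium", "normal", "low")

    def __init__(self):
        self.queues = {level: deque() for level in self.LEVELS}

    def enqueue(self, packet, level="normal"):
        self.queues[level].append(packet)

    def dequeue(self):
        # Always scan from High downward -- this loop is also
        # the source of starvation for the lower queues.
        for level in self.LEVELS:
            if self.queues[level]:
                return self.queues[level].popleft()
        return None  # all queues empty

pq = PriorityQueuing()
pq.enqueue("bulk-1", "low")
pq.enqueue("voice-1", "high")
print(pq.dequeue())  # → voice-1, despite arriving second
```

If the "high" queue never drains, the loop never reaches "low" — that is the starvation risk in code form.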

3. Weighted Fair Queuing (WFQ)

Automatically divides bandwidth among active flows based on weight. Lower-volume flows (like ICMP or VoIP control) get better prioritization than high-volume TCP streams.
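A toy sketch of the finish-time idea behind WFQ. Real implementations maintain a system-wide virtual clock; here, as a simplification, the clock just advances to each departing packet's finish time, and all names are assumed:

```python
import heapq

class WFQScheduler:
    """Toy Weighted Fair Queuing: each packet is stamped with a virtual
    finish time F = max(virtual_now, flow's last finish) + size / weight;
    the packet with the smallest F is transmitted next."""

    def __init__(self):
        self.heap = []          # (finish_time, seq, packet)
        self.last_finish = {}   # per-flow last finish time
        self.virtual_now = 0.0
        self.seq = 0            # tie-breaker so heapq never compares packets

    def enqueue(self, flow, size, weight, packet):
        start = max(self.virtual_now, self.last_finish.get(flow, 0.0))
        finish = start + size / weight
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, self.seq, packet))
        self.seq += 1

    def dequeue(self):
        if not self.heap:
            return None
        finish, _, packet = heapq.heappop(self.heap)
        self.virtual_now = finish
        return packet

wfq = WFQScheduler()
wfq.enqueue("bulk", size=1500, weight=1, packet="data-1")
wfq.enqueue("voip", size=64, weight=1, packet="voice-1")
print(wfq.dequeue())  # → voice-1: small packets earn earlier finish times
```

This is why low-volume flows come out ahead: their small packets accumulate finish-time "cost" far more slowly than bulk TCP streams.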

4. Low Latency Queuing (LLQ)

A hybrid of PQ and CBWFQ. It adds a "Priority" class to a weighted queuing system, ensuring that real-time voice packets are always serviced first without starving other classes entirely.

Traffic Scheduler Simulation

[Interactive simulation: an ingress at 1000 Mbps feeds an egress at 100 Mbps (congestion) through a selectable scheduling algorithm, a default queue, and a hardware scheduler. Average queue delay per class is measured against targets: VoIP (EF) < 150 ms, Video (AF41) < 300 ms, Data (BE) no target. Observe how packets are delayed.]

Advanced Congestion Management

The Token Bucket Algorithm (Policing/Shaping)

Imagine a bucket that gets filled with 'tokens' at a constant rate (CIR). To send a packet, you must pay a token.

  • Traffic Policing: If the bucket is empty, the packet is typically dropped (hard limit).
  • Traffic Shaping: If the bucket is empty, the packet is queued (delayed) until a token arrives. This smoothes out bursts.
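The policing half of this can be sketched in a few lines of Python. Names are illustrative, and the `now` parameter is injectable purely so the refill logic is deterministic and testable:

```python
import time

class TokenBucket:
    """Token-bucket meter: tokens (bytes of credit) accrue at `rate`
    per second up to a depth of `burst`. A packet conforms only if
    enough tokens remain; a policer drops non-conforming packets,
    while a shaper would instead queue them until tokens accrue."""

    def __init__(self, rate, burst, now=None):
        self.rate = rate      # CIR in bytes/second
        self.burst = burst    # bucket depth in bytes
        self.tokens = burst   # start with a full bucket
        self.last = time.monotonic() if now is None else now

    def allow(self, size, now=None):
        now = time.monotonic() if now is None else now
        # Refill for the elapsed interval, capped at the bucket depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size   # "pay" a token per byte
            return True
        return False              # policing: drop here

# e.g. a 1 Mbps (125,000 B/s) CIR with a 1500 B burst allowance:
policer = TokenBucket(rate=125_000, burst=1500)
```

A shaper would wrap the `False` branch in a queue and retry once enough tokens have accrued, trading loss for delay.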

WRED (Weighted Random Early Detection)

TCP Global Synchronization occurs when a router's buffer is full and it drops all incoming packets. Every TCP session slows down at once, then speeds up at once, creating waves of congestion.

WRED solves this by randomly dropping some packets before the buffer is full. It drops low-priority flows first, signaling specific TCP sessions to slow down without punishing the entire network.
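The drop decision itself is simple. A minimal sketch (thresholds are hypothetical example values, and the exponential averaging of queue depth used by real routers is omitted):

```python
import random

def wred_drop(avg_queue, min_th, max_th, max_p):
    """WRED drop decision for one traffic class: no drops while the
    average queue depth is below min_th, certain (tail) drop at or
    above max_th, and a linearly rising random drop in between."""
    if avg_queue < min_th:
        return False
    if avg_queue >= max_th:
        return True
    drop_probability = max_p * (avg_queue - min_th) / (max_th - min_th)
    return random.random() < drop_probability

# Hypothetical per-class profiles: best-effort traffic gets a lower
# min_th than EF-marked voice, so it starts backing off first.
print(wred_drop(avg_queue=10, min_th=20, max_th=40, max_p=0.1))  # → False
```

The "weighted" part is simply running this function with different thresholds per DSCP class.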

fq_codel (Flow Queueing + Controlled Delay)

The modern standard for fighting bufferbloat. Instead of one giant FIFO queue, fq_codel creates a separate queue for every flow (selected by hashing the flow's 5-tuple). It then prioritizes 'sparse' flows (like VoIP or DNS) over 'fat' flows (like a video stream), ensuring that a large download never blocks a ping.
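The flow-queueing half of fq_codel can be sketched as two service lists — "new" (sparse) flows served before "old" ones. This toy omits the CoDel delay-based dropping entirely, and all names are assumed:

```python
from collections import deque, defaultdict

class FlowQueue:
    """Toy fq_codel-style scheduler (CoDel drop logic omitted):
    each flow hashes to its own queue; flows that just became
    active ('sparse' flows) are served before long-running ones."""

    def __init__(self):
        self.queues = defaultdict(deque)
        self.new_flows = deque()   # sparse flows, served first
        self.old_flows = deque()

    def enqueue(self, five_tuple, packet):
        key = hash(five_tuple)     # real fq_codel uses a salted hash
        if not self.queues[key]:
            self.new_flows.append(key)   # flow just became active
        self.queues[key].append(packet)

    def dequeue(self):
        for active in (self.new_flows, self.old_flows):
            while active:
                key = active[0]
                if self.queues[key]:
                    pkt = self.queues[key].popleft()
                    # after one service, the flow is no longer 'new'
                    if active is self.new_flows:
                        self.new_flows.popleft()
                        self.old_flows.append(key)
                    return pkt
                active.popleft()   # queue drained: retire the flow
        return None
```

A DNS query arriving behind a three-packet download backlog is served after just one download packet, not after the whole backlog — that is the "sparse flow" advantage.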

Deficit Round Robin (DRR)

DRR is an evolution of Weighted Round Robin (WRR) that handles variable-length packets fairly. In standard WRR, a queue sending 1500B packets gets more bandwidth than a queue sending 64B packets, even if they have the same weight. DRR uses a Deficit Counter to track how many bytes a queue is "owed," ensuring byte-level fairness across different packet sizes.
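A minimal DRR sketch (names assumed; real implementations reset a queue's deficit the moment it drains, which this toy approximates at the start of the next round):

```python
from collections import deque

class DRRScheduler:
    """Deficit Round Robin: each backlogged queue earns `quantum`
    bytes of credit per round and may transmit packets until its
    deficit counter cannot cover the next packet's size."""

    def __init__(self):
        self.queues = {}    # name -> (quantum, deque of (size, packet))
        self.deficit = {}

    def add_queue(self, name, quantum):
        self.queues[name] = (quantum, deque())
        self.deficit[name] = 0

    def enqueue(self, name, size, packet):
        self.queues[name][1].append((size, packet))

    def round(self):
        """One full DRR round; returns the packets sent, in order."""
        sent = []
        for name, (quantum, q) in self.queues.items():
            if not q:
                self.deficit[name] = 0   # idle queues keep no credit
                continue
            self.deficit[name] += quantum
            while q and q[0][0] <= self.deficit[name]:
                size, pkt = q.popleft()
                self.deficit[name] -= size
                sent.append(pkt)
        return sent
```

With a 500-byte quantum, a queue holding a 1500B packet must "save up" for three rounds before it may send — while the small-packet queue transmits every round, so the byte totals stay fair.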

The Physics of Shaping: Bucket Models

Traffic management often uses two metaphorical models to regulate flow:

Leaky Bucket

Strict rate limiter.

Water (packets) enters at any speed. It leaks out (departure) at a strictly constant rate. This eliminates bursts entirely but adds fixed delay to every packet.
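A tick-based sketch of the leaky bucket (names and the per-tick framing are assumptions; real shapers work in continuous time):

```python
from collections import deque

class LeakyBucket:
    """Leaky-bucket shaper: arrivals queue up (to at most `capacity`
    buffered bytes), and each clock tick drains at most `leak_rate`
    bytes -- a perfectly smooth output regardless of input bursts."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity    # max buffered bytes
        self.leak_rate = leak_rate  # bytes drained per tick
        self.buffered = 0
        self.queue = deque()

    def arrive(self, size, packet):
        if self.buffered + size > self.capacity:
            return False            # bucket overflow: drop
        self.buffered += size
        self.queue.append((size, packet))
        return True

    def tick(self):
        """Drain up to leak_rate bytes; returns the packets released."""
        budget = self.leak_rate
        released = []
        while self.queue and self.queue[0][0] <= budget:
            size, pkt = self.queue.popleft()
            budget -= size
            self.buffered -= size
            released.append(pkt)
        return released
```

Contrast with the token bucket below: here the *output* rate is clamped, so bursts are flattened into delay rather than passed through.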

Token Bucket

Burst-capable regulator.

Tokens enter the bucket at a constant rate. Packets can be sent instantly as long as tokens are available. This allows for bursty traffic while maintaining a long-term average rate.

Scheduling Complexity: O(1) vs. O(N)

In a high-speed core router processing 400Gbps, the CPU has mere nanoseconds to decide which packet to send next. The complexity of the queuing algorithm matters:

  • O(1) Algorithms: Like Priority Queuing or DRR. The time to pick a packet is constant, regardless of how many flows are active. This is mandatory for hardware-based ASICs.
  • O(log N) Algorithms: Like Weighted Fair Queuing (WFQ). The switch must sort flows by their "finish time" timestamps. As the number of flows grows, the computational cost increases, often requiring specialized (and expensive) sorting hardware or dedicated memory.

Mathematics of Delay

Serialization delay is the time required to place bits onto the physical medium.

D_serialization = Packet Size (bits) / Link Bandwidth (bps)

On slow links, a large 1500B data packet can block a 64B VoIP packet for several milliseconds, causing jitter. QoS mitigates this by letting small, latency-sensitive packets jump the queue (and, on very slow links, by fragmenting and interleaving large frames).
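The formula above as a quick calculation (plain Python, illustrative only):

```python
def serialization_delay_ms(packet_bytes, link_bps):
    """Time to clock one packet onto the wire, in milliseconds."""
    return packet_bytes * 8 / link_bps * 1000

# A 1500 B data frame on a 512 kbps link occupies the line for ~23 ms --
# long enough to add audible jitter to a VoIP stream queued behind it.
print(round(serialization_delay_ms(1500, 512_000), 1))  # → 23.4
print(serialization_delay_ms(64, 512_000))              # → 1.0 (the VoIP packet itself)
```

At 1 Gbps the same 1500 B frame serializes in about 12 microseconds, which is why this blocking effect only matters on slow links.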

Conclusion

Effective QoS design requires identifying 'Mission Critical' traffic and ensuring it has a reserved 'slice' of the bandwidth during periods of heavy congestion.

