MPLS: Label Switched Path Mechanics
The Architecture of the 'Layer 2.5' Transport Fabric
The Label Anatomy
MPLS operates between the Data Link Layer (Layer 2) and the Network Layer (Layer 3), earning its common moniker as Layer 2.5. It inserts a 32-bit shim header between the Layer 2 header and the IP packet, allowing core routers to make forwarding decisions on the label alone, without inspecting the IP destination.
MPLS Shim Header (32 bits)
- Label Value (20 bits): The identifier used for switching. Values 0-15 are reserved (e.g., 0 is IPv4 Explicit Null).
- Traffic Class (TC, 3 bits): Formerly the EXP bits. Used for QoS mapping (DS-TE).
- Bottom of Stack (S, 1 bit): If set (1), this is the last label before the payload.
- TTL (8 bits): Decremented at each hop to prevent packets from looping indefinitely within the Label Switched Path (LSP).
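The field layout above can be made concrete with a small encoder and decoder. This is a minimal sketch that assumes the standard field widths from RFC 3032 (20-bit label, 3-bit TC, 1-bit S, 8-bit TTL); the function names `pack_shim` and `unpack_shim` are illustrative, not part of any library.

```python
import struct


def pack_shim(label: int, tc: int, s: int, ttl: int) -> bytes:
    """Encode one 32-bit MPLS label stack entry in network byte order."""
    if not (0 <= label < 2**20 and 0 <= tc < 8 and s in (0, 1) and 0 <= ttl < 256):
        raise ValueError("field out of range")
    # Layout, most significant bit first: label(20) | tc(3) | s(1) | ttl(8)
    word = (label << 12) | (tc << 9) | (s << 8) | ttl
    return struct.pack("!I", word)


def unpack_shim(data: bytes) -> dict:
    """Decode the first 4 bytes of data into the four shim fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "tc": (word >> 9) & 0x7,
        "s": (word >> 8) & 0x1,
        "ttl": word & 0xFF,
    }


# Label 16 is the first non-reserved value; S=1 marks the bottom of the stack.
entry = pack_shim(label=16, tc=5, s=1, ttl=64)
print(entry.hex())         # 00010b40
print(unpack_shim(entry))
```

Packing the four fields into a single unsigned integer keeps the bit positions explicit, which is usually easier to audit than slicing individual bytes.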
Label Distribution Protocol (LDP)
LDP is the control plane mechanism that synchronizes label mappings across the network. It operates on a hop-by-hop basis, following the path determined by the IGP (OSPF or IS-IS).
LDP Session Establishment
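- Discovery: Routers send UDP Hellos to the all-routers multicast address (224.0.0.2) on port 646.
- TCP Handshake: A TCP session (also port 646) is established between the two Transport Addresses; the router with the higher address takes the active role and opens the connection.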
- Initialization: Parameters (Keepalive, Label Range) are negotiated.
- Label Binding: Routers exchange mappings.
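The TCP step depends on deciding which peer opens the connection. The sketch below assumes the rule from the LDP specification (RFC 5036): the two transport addresses are compared numerically and the higher one plays the active role. The function name `ldp_roles` and the addresses are hypothetical.

```python
import ipaddress


def ldp_roles(addr_a: str, addr_b: str) -> dict:
    """Decide which LDP peer actively opens the TCP session to port 646."""
    a = ipaddress.IPv4Address(addr_a)
    b = ipaddress.IPv4Address(addr_b)
    if a == b:
        raise ValueError("transport addresses must differ")
    # Compare as integers, not strings: 10.0.0.10 is higher than 10.0.0.9.
    active, passive = (addr_a, addr_b) if a > b else (addr_b, addr_a)
    return {"active": active, "passive": passive, "tcp_port": 646}


print(ldp_roles("10.0.0.9", "10.0.0.10"))
```

Comparing `IPv4Address` objects rather than strings avoids the lexical-ordering trap, where "10.0.0.9" would sort above "10.0.0.10".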
Downstream Unsolicited (DU)
In most service provider networks, LDP uses DU mode. This means a router will automatically advertise label bindings to all its neighbors for all prefixes in its routing table, ensuring that LSPs are built proactively before traffic arrives.
[Interactive simulation: label propagation along an LSP, showing the Push, Swap, and Pop (PHP) operations and the label forwarding table at each hop. First step: the customer sends a standard IP packet to the Service Provider edge.]
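The Push, Swap, and Pop operations can also be traced in a few lines of code. This is a minimal sketch over a hypothetical four-router LSP (PE1, P1, P2, PE2); the label values, the prefix, and the table names `INGRESS_FTN` and `LFIB` are invented for illustration.

```python
# Hypothetical forwarding state for one LSP: PE1 -> P1 -> P2 -> PE2.
INGRESS_FTN = {"PE1": {"10.2.0.0/16": (100, "P1")}}  # FEC -> (label to push, next hop)
LFIB = {
    "P1": {100: ("swap", 200, "P2")},
    "P2": {200: ("pop", None, "PE2")},  # penultimate hop popping (PHP)
}


def forward(fec: str, ingress: str, egress: str) -> list:
    """Return a trace of (router, operation, label stack after the operation)."""
    label, router = INGRESS_FTN[ingress][fec]
    stack = [label]
    trace = [(ingress, "push", list(stack))]
    while router != egress:
        op, out_label, next_hop = LFIB[router][stack[-1]]
        if op == "swap":
            stack[-1] = out_label
        elif op == "pop":
            stack.pop()
        trace.append((router, op, list(stack)))
        router = next_hop
    # With PHP the egress receives an unlabeled packet and does a plain IP lookup.
    trace.append((egress, "ip-lookup", list(stack)))
    return trace


for step in forward("10.2.0.0/16", "PE1", "PE2"):
    print(step)
```

Note that only the ingress router consults the IP prefix; every later hop keys its decision on the incoming label alone.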
Architectural Isolation: L3VPNs
The primary value proposition of modern MPLS is the BGP/MPLS L3VPN (RFC 4364). It allows a provider to carry multiple overlapping private IP spaces over a single shared core without traffic leakage.
The 2-Label Stack Physics
To support VPNs, MPLS uses Label Stacking. A VPN packet in the core has at least two labels:
Outer Label (Transport Label)
Directs the packet to the correct egress PE (Provider Edge) router. It is swapped at every hop in the core (P-routers).
Inner Label (Service / VPN Label)
Hidden from the core. It tells the egress PE which customer VRF (Virtual Routing and Forwarding) instance to use for the final IP lookup.
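The division of labor between the two labels can be sketched as follows. The example assumes two customers using the same prefix, a label stack written top-first, and PHP on the last core hop; all label values, VRF names, and next-hop names are hypothetical.

```python
import ipaddress

# Hypothetical egress-PE state: VPN label -> VRF, plus per-VRF routes.
VPN_LABEL_TO_VRF = {3001: "CUST_A", 3002: "CUST_B"}
VRF_ROUTES = {
    "CUST_A": {"10.1.0.0/16": "ce-a1"},
    "CUST_B": {"10.1.0.0/16": "ce-b1"},  # same prefix, different customer
}


def core_transit(stack: list, swap_to=None) -> list:
    """A P router touches only the top label: swap it, or pop it if swap_to is None."""
    top = [swap_to] if swap_to is not None else []
    return top + stack[1:]


def egress_pe_lookup(stack: list, dst_ip: str) -> tuple:
    """Pick the VRF from the inner label, then do a longest-prefix match inside it."""
    if len(stack) != 1:
        raise ValueError("expected only the inner VPN label after PHP")
    vrf = VPN_LABEL_TO_VRF[stack[0]]
    dst = ipaddress.ip_address(dst_ip)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in VRF_ROUTES[vrf].items()
        if dst in ipaddress.ip_network(prefix)
    ]
    _, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return vrf, next_hop


stack = [100, 3001]                # transport label on top, VPN label beneath
stack = core_transit(stack, 200)   # P router swaps the transport label
stack = core_transit(stack)        # penultimate hop pops it
print(egress_pe_lookup(stack, "10.1.5.9"))
```

The same destination address resolves to a different next hop when the inner label is 3002, which is how overlapping address spaces stay isolated.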
Traffic Engineering with RSVP-TE
Standard IP routing is "selfish"; every packet takes the shortest path, leading to congestion on primary links while secondary links sit idle. MPLS-TE allows for global network optimization.
CSPF (Constrained Shortest Path First)
Unlike OSPF which only looks at link costs, CSPF factors in constraints like available bandwidth, link color (affinity), and administrative weight. It calculates a path that satisfies the SLA before signaling it.
Fast Re-Route (FRR)
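In a standard network, convergence after a link failure takes seconds. With MPLS FRR, a Backup LSP is pre-computed and pre-signaled before any failure occurs, so the router adjacent to the failure (the Point of Local Repair) can switch traffic onto it immediately, typically within about 50 ms.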
The Future: Segment Routing (SR-MPLS)
The industry is rapidly moving away from LDP and RSVP-TE toward Segment Routing. SR simplifies the control plane by eliminating the need for LDP entirely, using the IGP itself to distribute labels.
Stateless Core Architecture
In RSVP-TE, every router along a path must maintain state for every tunnel. In SR, the Source Router encodes the entire path into a stack of labels (segments). The core routers hold no per-tunnel state; they simply forward on the top label, which maps to a Prefix SID (Segment Identifier) already known from the IGP, and pop it once that segment is complete.
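A short sketch shows how a source router might turn a list of waypoints into a label stack. It assumes every node uses the same SRGB (Segment Routing Global Block) starting at 16000 and derives each label as SRGB base plus SID index; the node names and indexes are hypothetical, and penultimate hop popping is ignored for simplicity.

```python
SRGB_BASE = 16000  # assumed SRGB start, identical on every node in this sketch
PREFIX_SID_INDEX = {"R3": 3, "R5": 5, "R7": 7}  # hypothetical node -> SID index


def build_segment_list(waypoints: list) -> list:
    """Return the label stack (top first) that steers traffic via the waypoints in order."""
    return [SRGB_BASE + PREFIX_SID_INDEX[w] for w in waypoints]


def process_at(node: str, stack: list) -> list:
    """A node pops the top label only when that label is its own Prefix SID."""
    own_label = SRGB_BASE + PREFIX_SID_INDEX[node] if node in PREFIX_SID_INDEX else None
    if stack and stack[0] == own_label:
        return stack[1:]
    return stack  # transit node: forward toward the SID owner, stack unchanged


stack = build_segment_list(["R3", "R7"])
print(stack)                    # [16003, 16007]
print(process_at("R2", stack))  # transit node, nothing popped
print(process_at("R3", stack))  # first segment complete
```

All path state lives in the packet itself, so adding a tunnel changes nothing on the transit routers.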
Frequently Asked Questions
MPLS Traffic Engineering: Constraint-Based Path Computation and the CSPF Algorithm
MPLS-TE extends MPLS from a simple label-swapping transport to a full traffic engineering platform. Unlike LDP, which maps the IGP shortest path, MPLS-TE uses RSVP-TE (RFC 3209) to signal an Explicit Route Object (ERO), a path that need not follow the IGP metric. The path is computed by the Head-End (ingress LSR) using a Constraint-Based Shortest Path First (CSPF) algorithm, then installed as a Traffic Engineering Label Switched Path (TE-LSP).
The CSPF Path Computation Engine
CSPF is a modified Dijkstra that considers both the standard IGP metric and a set of constraints. The constraints are encoded as TE Attributes flooded via OSPF-TE (RFC 3630) or IS-IS-TE (RFC 5305) extensions. Each link carries:
- Maximum Reservable Bandwidth: The total bandwidth that can be reserved for TE-LSPs on this link (may be higher than the physical bandwidth if oversubscription is allowed).
- Unreserved Bandwidth: The remaining bandwidth at each of the 8 priority levels (0–7). The head-end uses this to determine whether the link has capacity for the requested bandwidth B at the LSP's setup priority (for example, priority 0).
- Link Administrative Group (Color): A 32-bit bitmask assigned to each link. A tunnel's affinity constraints can include or exclude links with specific color bits (e.g., include only links with bit 0 = GOLD, exclude links with bit 3 = HIGH-LATENCY).
- TE Metric: An optional metric distinct from the IGP metric, used for TE-specific path optimization.
CSPF prunes links that violate constraints, then runs Dijkstra on the remaining topology. The complexity is:
$$O(E) + O\big((V + E)\log V\big)$$
where $O(E)$ is the linear scan of all edges (removing links with insufficient bandwidth or excluded colors) and the second term is Dijkstra with a binary heap over $V$ nodes and the remaining edges. In a network with 5,000 TE-enabled links and 500 nodes, CSPF completes in 10–50 ms on a modern Route Processor. The resulting ERO is a strict list of IP addresses (strict hops) that the RSVP-TE PATH message follows hop-by-hop, establishing the LSP.
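The prune-then-Dijkstra procedure can be sketched as follows. This is an illustrative model, not a vendor implementation: links are unidirectional dictionaries, the `include_any` and `exclude_any` bitmasks stand in for affinity constraints, and the four-link topology is hypothetical.

```python
import heapq


def link(a, b, metric, bw, color=0):
    """Build a hypothetical TE link with the same unreserved bandwidth at all 8 priorities."""
    return {"a": a, "b": b, "te_metric": metric, "unreserved": [bw] * 8, "color": color}


def cspf(links, src, dst, bandwidth, priority=0, include_any=0, exclude_any=0):
    """Return (cost, [nodes]) for the cheapest path meeting the constraints, or None."""
    # Step 1: prune links that violate a constraint (one linear scan of all edges).
    adj = {}
    for l in links:
        if l["unreserved"][priority] < bandwidth:
            continue
        if l["color"] & exclude_any:
            continue
        if include_any and not (l["color"] & include_any):
            continue
        adj.setdefault(l["a"], []).append((l["b"], l["te_metric"]))
    # Step 2: ordinary Dijkstra on whatever topology remains.
    heap = [(0, src, [src])]
    done = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nbr, metric in adj.get(node, []):
            if nbr not in done:
                heapq.heappush(heap, (cost + metric, nbr, path + [nbr]))
    return None


LINKS = [
    link("A", "B", 10, 1000),
    link("B", "D", 10, 100, color=0b1000),  # low bandwidth, color bit 3 set
    link("A", "C", 15, 1000),
    link("C", "D", 15, 1000),
]

print(cspf(LINKS, "A", "D", bandwidth=50))    # (20, ['A', 'B', 'D'])
print(cspf(LINKS, "A", "D", bandwidth=500))   # (30, ['A', 'C', 'D'])
```

The second call shows the effect of pruning: the shortest path by metric is rejected because the B to D link cannot carry 500 units, so the longer path wins.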
RSVP-TE Path Setup: PATH and RESV Message Exchange
Once CSPF determines the ERO, the head-end sends an RSVP-TE PATH message hop-by-hop along the computed path. Each intermediate LSR checks the bandwidth requested in the PATH message (the Sender TSpec) against the unreserved bandwidth of the outgoing link at the requested priority. If the link has sufficient unreserved bandwidth, the LSR records the PATH state in its local PSB (Path State Block) and forwards the PATH to the next hop. When the tail-end receives the PATH, it responds with an RSVP-TE RESV message, which travels back along the reverse path, reserving bandwidth and installing the label entry. The total setup time for an LSP crossing $N$ hops is approximately:
$$T_{\text{setup}} \approx 2 \times N \times (T_{\text{prop}} + T_{\text{proc}})$$
where $T_{\text{prop}}$ is the per-hop propagation delay, $T_{\text{proc}}$ is the per-LSR processing time, and the factor of 2 accounts for the PATH message travelling downstream and the RESV message travelling back.
For a 10-hop path with propagation delay $T_{\text{prop}}$ per hop and $T_{\text{proc}} = 2 \, \text{ms}$ per LSR, total setup time is $20 \, T_{\text{prop}} + 40 \, \text{ms}$, fast enough for dynamic failover but too slow for per-flow setup in a video-on-demand network. This is why MPLS-TE is used for aggregate tunnel engineering (e.g., site-to-site VPNs), not per-microflow path selection.
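A rough calculator makes the setup-time estimate easy to vary. It assumes a simple model in which the PATH message crosses every hop once downstream and the RESV message crosses every hop once on the way back; the 1 ms propagation delay used below is a placeholder, not a figure from this article.

```python
def lsp_setup_time_ms(hops: int, t_prop_ms: float, t_proc_ms: float) -> float:
    """Estimate LSP setup time: each hop is crossed twice (PATH down, RESV back)."""
    return 2 * hops * (t_prop_ms + t_proc_ms)


# 10 hops, 2 ms processing per LSR, and an assumed 1 ms propagation delay per hop.
print(lsp_setup_time_ms(hops=10, t_prop_ms=1.0, t_proc_ms=2.0))  # 60.0
```

Even with optimistic delays, the result is tens of milliseconds per LSP, which illustrates why signaling is done per aggregate tunnel rather than per flow.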
Inter-AS MPLS VPN: Option A, B, and C Architectures
When an MPLS L3VPN must extend across two different Autonomous Systems (e.g., a multinational corporation connecting its North American and European divisions, each served by a different provider), the BGP/MPLS VPN architecture must be extended across AS boundaries. Three standard inter-AS options, defined in RFC 4364, provide different trade-offs between scalability, configuration complexity, and routing transparency.
Option A: Back-to-Back VRF (VRF-to-VRF)
Option A is the simplest: each AS's ASBR (Autonomous System Boundary Router) connects to the other via a dedicated physical or logical sub-interface per VPN. Each sub-interface belongs to a VRF on each ASBR, and the two VRFs exchange routes via eBGP or static routing across the inter-AS link. The key characteristics:
- Label handling: MPLS labels are removed at the egress ASBR and a new IP packet is forwarded to the ingress ASBR of the neighboring AS — no end-to-end label stack.
- Scalability: Linear with the number of VPNs. Each VPN requires a separate sub-interface (or VLAN trunk with a dedicated VLAN per VPN). For 1,000 VPNs, you need 1,000 sub-interfaces and 1,000 VRF peering sessions.
- Isolation: Maximum. Each VPN is completely independent. No routing state from other VPNs leaks across the AS boundary.
- Use case: Small-scale inter-AS VPNs with fewer than 50 VPNs. Common in carrier-of-carrier scenarios where two providers exchange a small number of customer VPNs.
Option B: MP-eBGP Exchange of Labeled VPN-IPv4 Routes Between ASBRs
Option B eliminates the per-VPN sub-interface requirement. The ASBRs exchange VPN-IPv4 prefixes, together with their MPLS labels, directly over an MP-eBGP session. Each VPN-IPv4 route carries the VPN (inner) label; the transport label is not part of the BGP advertisement and is resolved separately toward the BGP next hop. When ASBR-1 receives a VPN-IPv4 route from its own AS, it allocates a new label, installs a swap entry, and re-advertises the route to ASBR-2 via eBGP; ASBR-2 repeats the process and re-advertises the route into its own AS via MP-iBGP.
The critical constraint in Option B is the label allocation model: each ASBR must allocate and advertise a new label for every VPN-IPv4 prefix received from the neighboring AS. For 500,000 VPN-IPv4 routes (200,000 VPN customer prefixes), each ASBR needs 500,000 label bindings in its LIB (Label Information Base), a significant memory cost on platforms with limited hardware label space. Option B also requires per-prefix label mode (as opposed to per-VRF label mode), which further increases FIB TCAM consumption on the ASBR.
Option C: Multihop eBGP with Route Reflectors
Option C is the most scalable inter-AS VPN architecture. In this model, the ASBRs do not exchange VPN-IPv4 routes at all. Instead, they exchange only labeled IPv4 routes for the loopbacks of the Route Reflectors (RRs) and PE routers within each AS. The RRs in each AS peer directly with each other via multi-hop eBGP, exchanging VPN-IPv4 routes with labels as if they were within the same AS. The ASBRs simply propagate labeled BGP routes for each other's loopbacks.
The advantage is that the ASBRs only need label bindings for a comparatively small set of loopback prefixes (not 500,000 VPN prefixes). The RRs handle all VPN-IPv4 routing and normally leave the BGP next hop unchanged, so that traffic follows an end-to-end LSP between the PEs. This cuts ASBR label state by orders of magnitude compared to Option B, which is why Option C is favored for large-scale MPLS VPN interconnection. The trade-off is configuration complexity: the multi-hop eBGP session between RRs should be protected with TCP MD5 or TCP-AO authentication because it crosses the AS boundary, loopback addresses are exposed to the neighboring AS, and route-reflection loop prevention (the Cluster-List and Originator-ID attributes) must be carefully tested on both sides.
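The scaling difference between the three options comes down to what each ASBR must hold per VPN, per route, or per loopback. The sketch below is a deliberately rough model for comparison only; the function name `asbr_state` and the count of 20 loopbacks are assumptions, while the 1,000 VPNs and 500,000 routes reuse the figures quoted earlier.

```python
def asbr_state(option: str, vpns: int, vpn_routes: int, loopbacks: int) -> dict:
    """Very rough per-ASBR state at the AS boundary for each inter-AS option."""
    if option == "A":
        # One sub-interface and VRF per VPN; packets cross the boundary unlabeled.
        return {"sub_interfaces": vpns, "inter_as_label_bindings": 0}
    if option == "B":
        # One label binding per VPN-IPv4 route received from the neighboring AS.
        return {"sub_interfaces": 0, "inter_as_label_bindings": vpn_routes}
    if option == "C":
        # Labels only for the loopbacks exchanged between the two ASes.
        return {"sub_interfaces": 0, "inter_as_label_bindings": loopbacks}
    raise ValueError(f"unknown option: {option}")


for opt in "ABC":
    print(opt, asbr_state(opt, vpns=1000, vpn_routes=500_000, loopbacks=20))
```

The model ignores per-VRF routing tables and BGP session overhead, so it only indicates the order of magnitude of state that grows with each design.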
- Option A: Per-VPN sub-interfaces. Max 50 VPNs. Simple but not scalable.
- Option B: ASBR label exchange per prefix. Scalable to 1,000 VPNs. High ASBR memory use.
- Option C: Multi-hop RR peering. Scalable to 10,000+ VPNs. Lowest ASBR impact.