In a Nutshell

Routing at the scale of the global internet requires a rigid, multi-variable decision process. Unlike Interior Gateway Protocols (IGPs) like OSPF or EIGRP, which typically use a simple metric like cost or bandwidth, the Border Gateway Protocol (BGP) uses a complex, hierarchical list of attributes to select the "Best Path." This article breaks down the BGP best-path algorithm and its implications for global traffic engineering.

1. Introduction to BGP Decision Making

In a global network architecture, BGP acts as the "Post Office" of the internet. It doesn't care about the fastest route in terms of milliseconds; it cares about the most reliable, compliant, and cost-effective route as defined by the network administrators. BGP is a path-vector protocol designed for policy enforcement, not just shortest-path calculation.

2. The Selection Algorithm Hierarchy

When a BGP router receives multiple routes to the same prefix, it applies the following tie-breaking rules in order. The first rule that produces a single winner stops the process. This deterministic nature ensures that every router in an AS (ideally) makes the same decision, preventing routing loops.


The Decision List

1. Weight (Cisco Proprietary)

The highest weight is preferred. This value is local to the router and is never transmitted to neighbors. It is the "ultimate override" for a single box.

2. Local Preference

Highest local preference (default 100) is preferred. Unlike Weight, this is shared with all iBGP peers within your AS. It is the primary tool for Outbound Traffic Engineering.

3. Locally Originated

Prefer routes originated by this router using the network or aggregate commands over those learned via BGP.

4. AS-Path Length

The shortest list of Autonomous Systems is preferred. This is the "Shortest Path" metric of BGP.

The Tie Breakers

5. Origin Type

Prefer IGP (originated inside the AS, for example with a network statement) over EGP, and EGP over Incomplete (typically redistributed from another source).

6. Multi-Exit Discriminator (MED)

Lowest MED is preferred. This is used for Inbound Traffic Engineering to tell neighbors which entry point you prefer they use.

7. Neighbor Type

Prefer eBGP paths over iBGP paths. This promotes traffic leaving the AS as quickly as possible (Hot Potato Routing).

8. IGP Metric to Next Hop

Prefer the path with the lowest interior metric (OSPF/IS-IS cost) to reach the BGP next hop. If paths are still tied after this step, the last-resort tie-breakers take over.
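The eight rules above can be read as one ordered comparison. The following is a minimal Python sketch of that idea, not any vendor's implementation: the `Path` class, its field names, and the sample paths are hypothetical, and the MED step is simplified to always compare.

```python
from dataclasses import dataclass

# Origin codes ranked so that a lower number is preferred.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass(frozen=True)
class Path:
    """Hypothetical container for the attributes used in steps 1-8."""
    name: str
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: tuple = ()
    origin: str = "igp"
    med: int = 0
    is_ebgp: bool = True
    igp_metric: int = 0


def preference_key(p: Path) -> tuple:
    """Return a tuple that sorts the most preferred path first.

    Attributes where "higher wins" are negated, so ordinary ascending
    tuple comparison walks the steps in order and stops at the first
    field that differs.
    """
    return (
        -p.weight,                 # 1. highest weight
        -p.local_pref,             # 2. highest local preference
        not p.locally_originated,  # 3. locally originated first
        len(p.as_path),            # 4. shortest AS-Path
        ORIGIN_RANK[p.origin],     # 5. IGP, then EGP, then Incomplete
        p.med,                     # 6. lowest MED (simplified: always compared)
        not p.is_ebgp,             # 7. eBGP over iBGP
        p.igp_metric,              # 8. lowest IGP metric to the next hop
    )


def best_path(paths):
    """Pick the single most preferred path from an iterable of Path objects."""
    return min(paths, key=preference_key)


candidates = [
    Path("via-isp-a", local_pref=100, as_path=(64500, 64510)),
    Path("via-isp-b", local_pref=200, as_path=(64501, 64511, 64512)),
    Path("via-ibgp", local_pref=200, as_path=(64501, 64511, 64512), is_ebgp=False),
]
print(best_path(candidates).name)  # via-isp-b
```

In this example the shorter AS-Path of via-isp-a never gets a vote, because Local Preference is evaluated earlier and already separates the candidates.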

3. BGP at the Industrial Edge: The Power Grid Backbone

In critical infrastructure (UTILITIES/GRID), BGP is often used to manage the connectivity of substations. Unlike typical IT environments, these locations rely on "Deterministic Failover." If a primary fiber link to a high-voltage transformer station is lost, the BGP path selection must reconverge onto a secondary microwave or LTE link without dropping the SCADA (Supervisory Control and Data Acquisition) session.

CMRP professionals focus on the AVAILABILITY component of the OEE (Overall Equipment Effectiveness) metric. In this context, BGP is not just a protocol; it is a reliability engine. If the BGP hold timer is left at a common default of 180 seconds, a silent link failure can leave traffic "black-holed" for up to three minutes before the session is declared down and the path switches, which is unacceptable for power grid stability monitoring.

4. Maintenance Strategy: BGP Change Management

From a Facility Manager's (CFM) perspective, the networking backbone is as vital as the HVAC or Power distribution. When upgrading a core router, the "Standard Operating Procedure" (SOP) involves manipulating BGP attributes to gracefully drain traffic before the hardware is touched.

By increasing the MED or decreasing the Local Preference on a router slated for maintenance, you "push" traffic toward redundant nodes. This is equivalent to "Lock-Out Tag-Out" (LOTO) in electrical maintenance—you are ensuring the path is clear and isolated before performing the work.

5. Mathematical Influence: AS-Path Prepending

Network engineers often use "AS-Path Prepending" to discourage inbound traffic from choosing a specific link. By artificially inflating the AS-Path length, the path becomes less attractive to the selection algorithm.

$$L_{\text{effective}} = L_{\text{original}} + N_{\text{prepending}}$$

Where $N_{\text{prepending}}$ is the number of times the local AS number is repeated in the path attribute. This is the primary mechanism for controlling ingress traffic when you have multiple ISP connections and want to keep one as a standby.
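As a small illustration of the length arithmetic, the sketch below prepends a local AS number to a path and compares the resulting lengths. The `prepend` helper and the AS number 64500 (taken from the documentation range) are assumptions made for the example.

```python
def prepend(as_path, local_as, count):
    """Return the AS-Path with `count` extra copies of local_as added at the front."""
    return [local_as] * count + list(as_path)


LOCAL_AS = 64500          # hypothetical AS number from the documentation range
original = [LOCAL_AS]     # the path as normally advertised to an upstream

primary = prepend(original, LOCAL_AS, 0)   # preferred link: no prepending
standby = prepend(original, LOCAL_AS, 3)   # standby link: three extra copies

# Effective length = original length + number of prepends.
print(len(primary), len(standby))  # 1 4
```

A remote AS comparing these two advertisements on AS-Path length alone would prefer the primary link, provided an earlier step such as its own Local Preference has not already decided the matter.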

6. The "Weight" Attribute: Cisco's Local Dictator

The Weight attribute is unique in the BGP decision process because it is Cisco-proprietary and never leaves the router. It is a 16-bit value (0 to 65,535) assigned to a path. If a router has two paths to the same destination, the one with the higher weight wins—period.

The Logic of Weight

Because Weight is local, it is the primary tool for a single router to override the preferences of the rest of the Autonomous System.

$$\text{Preference}(P) = \max(\text{Weight}_A, \text{Weight}_B)$$

Use Case: A router with two physical links where one link is significantly more reliable but the AS-Path lengths are identical.

7. Local Preference: Commanding the Autonomous System

While Weight is for a single router, Local Preference (LOCAL_PREF) is for the entire Autonomous System (AS). It is a 32-bit attribute carried in all iBGP updates.

The default value is 100. A higher value is preferred. This is the "Outbound Traffic Engineering" tool of choice. If you want your entire network to exit via ISP-A instead of ISP-B, you set a Local Pref of 200 on all routes coming from ISP-A.
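The ISP-A versus ISP-B example can be sketched as an inbound policy followed by a highest-wins comparison. This is only an illustration in Python: the route dictionaries, the `apply_inbound_policy` helper, and the sample prefix are hypothetical and do not correspond to any router's configuration syntax.

```python
DEFAULT_LOCAL_PREF = 100


def apply_inbound_policy(routes, pref_by_neighbor):
    """Return copies of the routes with local_pref set per receiving neighbor."""
    return [
        {**route, "local_pref": pref_by_neighbor.get(route["neighbor"], DEFAULT_LOCAL_PREF)}
        for route in routes
    ]


# The same prefix learned from two upstream providers.
received = [
    {"prefix": "203.0.113.0/24", "neighbor": "ISP-A"},
    {"prefix": "203.0.113.0/24", "neighbor": "ISP-B"},
]

# Raise Local Preference to 200 for everything learned from ISP-A.
tagged = apply_inbound_policy(received, {"ISP-A": 200})

# Highest Local Preference wins, so the AS exits via ISP-A.
exit_route = max(tagged, key=lambda r: r["local_pref"])
print(exit_route["neighbor"])  # ISP-A
```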

8. The Multi-Exit Discriminator (MED): Influencing Ingress

The MED (also known as the BGP metric) is used to tell external neighbors which entry point into your AS is preferred. Unlike Local Pref, a lower MED is preferred.

$$\text{Winner} = \min(\text{MED}_1, \text{MED}_2, \dots, \text{MED}_n)$$

MED is considered a "non-transitive" attribute. It is passed between ASes, but the receiving AS does not pass it on to a third AS. It is a suggestion, not a command—the receiving neighbor can (and often does) ignore your MED in favor of their own Local Preference.
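One detail worth illustrating: by default, many implementations compare MED only between paths received from the same neighboring AS, and compare across all neighbors only when explicitly configured to do so. The sketch below models that behavior in Python under simplifying assumptions; the `med_survivors` function, its `always_compare` flag, and the tuple layout are hypothetical.

```python
from collections import defaultdict


def med_survivors(paths, always_compare=False):
    """Return the names of paths that survive the MED step.

    paths: list of (name, neighbor_as, med) tuples.
    With always_compare=False, MED is compared only within each neighbor AS.
    """
    groups = defaultdict(list)
    for name, neighbor_as, med in paths:
        # A single shared key puts every path in one comparison group.
        key = None if always_compare else neighbor_as
        groups[key].append((name, med))

    survivors = []
    for members in groups.values():
        lowest = min(med for _, med in members)
        survivors.extend(name for name, med in members if med == lowest)
    return survivors


paths = [("a", 64501, 50), ("b", 64501, 10), ("c", 64502, 100)]
print(med_survivors(paths))                       # ['b', 'c']
print(med_survivors(paths, always_compare=True))  # ['b']
```

Path c keeps its place in the first call even though its MED is the highest, because it has no competitor from the same neighbor AS.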

9. BGP Communities: The Stealthy Controller

BGP Communities are tags applied to routes that instruct routers to perform specific actions. Think of them as metadata "post-it notes" attached to a route advertisement (not to the data packets themselves).

  • NO_EXPORT: Do not announce this route to any eBGP peers.
  • NO_ADVERTISE: Do not announce this route to any peer (iBGP or eBGP).
  • GRACEFUL_SHUTDOWN: Lower the priority of a path to prepare for maintenance.

Large ISPs provide community strings to their customers, allowing the customer to control how the ISP handles their routes globally without needing to open a support ticket.
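A rough Python sketch of how a router might act on the three well-known communities listed above. The numeric values are the standard well-known community codes; the helper functions and the choice of 0 as the lowered Local Preference are assumptions for illustration.

```python
# Well-known communities as 32-bit values (high 16 bits : low 16 bits).
NO_EXPORT = 0xFFFFFF01          # 65535:65281
NO_ADVERTISE = 0xFFFFFF02       # 65535:65282
GRACEFUL_SHUTDOWN = 0xFFFF0000  # 65535:0


def fmt(community):
    """Render a 32-bit community in the usual high:low notation."""
    return f"{community >> 16}:{community & 0xFFFF}"


def may_advertise(communities, peer_is_ebgp):
    """Decide whether a route carrying these communities may be sent to a peer."""
    if NO_ADVERTISE in communities:
        return False          # never re-announced to anyone
    if NO_EXPORT in communities and peer_is_ebgp:
        return False          # stays inside the local AS
    return True


def effective_local_pref(communities, local_pref):
    """Lower the preference of routes tagged for maintenance (0 is an assumed value)."""
    return 0 if GRACEFUL_SHUTDOWN in communities else local_pref


print(fmt(NO_EXPORT))                                      # 65535:65281
print(may_advertise({NO_EXPORT}, peer_is_ebgp=True))       # False
print(may_advertise({NO_EXPORT}, peer_is_ebgp=False))      # True
print(effective_local_pref({GRACEFUL_SHUTDOWN}, 100))      # 0
```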

10. The Ultimate Tie-Breakers: When Logic Fails

If all attributes (Weight, Local Pref, AS-Path, Origin, MED, Neighbor Type, IGP Metric) are identical, BGP moves into its "last resort" tie-breaking phase:

1. Oldest Path

Prefer the path that was learned first. This promotes network stability by avoiding unnecessary route flaps.

2. Lowest Router ID

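Prefer the path from the neighbor with the lowest BGP router ID, a 32-bit identifier that is written like an IPv4 address and is often, but not necessarily, one of the router's own addresses. If all else is equal, the numerically lowest ID wins.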
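The two last-resort steps can be sketched as a two-field sort key. This Python example is illustrative only: the dictionary layout and the `learned_at` timestamps are hypothetical. It also shows why the identifier has to be compared numerically rather than as text.

```python
import ipaddress


def last_resort_key(path):
    """Sort key: oldest path first, then numerically lowest router ID."""
    return (
        path["learned_at"],                            # earlier timestamp = older path
        int(ipaddress.IPv4Address(path["router_id"])), # compare as a 32-bit number
    )


paths = [
    {"name": "p1", "learned_at": 1000.0, "router_id": "10.0.0.10"},
    {"name": "p2", "learned_at": 1000.0, "router_id": "10.0.0.9"},
    {"name": "p3", "learned_at": 1500.0, "router_id": "10.0.0.1"},
]

winner = min(paths, key=last_resort_key)
print(winner["name"])  # p2
```

Here p3 has the lowest identifier but was learned later, so it loses at the first step. Between p1 and p2, which are equally old, 10.0.0.9 is numerically lower than 10.0.0.10 even though a plain string comparison would order them the other way.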

11. Security: RPKI and the Path to Trust

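Because BGP is based on trust, it is vulnerable to Route Hijacking. Resource Public Key Infrastructure (RPKI) is a cryptographic framework for certifying which AS is authorized to originate a given prefix; the signed objects are authorizations, not the route announcements themselves.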

A ROA (Route Origin Authorization) defines which AS is allowed to announce which prefix. When a BGP router receives a route, it checks the RPKI database:

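  • Valid: The origin AS matches a signed record that covers the prefix, and the prefix is no longer than the record's maximum length.
  • Invalid: A covering record exists, but a different AS is announcing the prefix or the announcement is more specific than the record allows (potential hijack).
  • Unknown: No ROA exists for this prefix (RFC 6811 calls this state NotFound).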
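The three outcomes can be sketched as a small origin-validation function in the spirit of RFC 6811. This is a simplified Python illustration: the `Roa` tuple, the `validate_origin` function, and the sample prefixes and AS numbers (all from documentation ranges) are assumptions, and a real router would consult a validated cache rather than an in-memory list.

```python
import ipaddress
from typing import NamedTuple


class Roa(NamedTuple):
    """Hypothetical, simplified view of a Route Origin Authorization."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(route_prefix, origin_asn, roas):
    """Classify an announcement as 'valid', 'invalid', or 'unknown'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # Only ROAs of the same address family that cover the route are relevant.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by at least one ROA but matched by none: treat as invalid.
    return "invalid" if covered else "unknown"


roas = [Roa("192.0.2.0/24", 24, 64500)]
print(validate_origin("192.0.2.0/24", 64500, roas))     # valid
print(validate_origin("192.0.2.0/24", 64501, roas))     # invalid
print(validate_origin("192.0.2.0/25", 64500, roas))     # invalid
print(validate_origin("198.51.100.0/24", 64500, roas))  # unknown
```

The third call shows that the right origin AS is not enough: a /25 is more specific than the maximum length of 24 in the covering record.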

12. Technical Encyclopedia: BGP Path Selection

Autonomous System (AS)

A collection of IP networks under a single administrative entity that presents a common routing policy to the internet.

Path Attribute (PA)

Metadata associated with a BGP route that is used to determine the best path (e.g., AS-Path, Next-Hop, MED).

Best Path Algorithm

The step-by-step decision process a BGP router follows to choose one "best" route from many candidates.

BGP Table (Loc-RIB)

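The router's main BGP database. In RFC 4271 terms, the Loc-RIB holds the routes chosen by the local decision process, while the paths learned from each neighbor are kept in the Adj-RIBs-In; many implementations display both together as the "BGP table."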

RIB-In / RIB-Out

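The per-neighbor BGP tables. Adj-RIB-In holds routes as received from a neighbor, before ingress policies/filters are applied; Adj-RIB-Out holds the routes selected for advertisement to a neighbor, after egress policies/filters.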

Next-Hop Self

A configuration that makes a router announce itself as the next-hop when it passes routes learned from eBGP neighbors to its iBGP peers.

Route Map

A complex filter used in BGP to match prefixes and modify their attributes (like setting Local Preference).

AS-Path Prepending

Artificially adding an AS number to the AS-Path multiple times to make a route appear "longer" and less desirable.

Well-Known Mandatory

Attributes that must be recognized by all BGP implementations and must be included in every UPDATE message.

Well-Known Discretionary

Attributes that must be recognized by all BGP implementations but are optional to include in an UPDATE message.

Atomic Aggregate

An attribute indicating that a route has been summarized and some path information might have been lost.

Originator ID

A non-transitive attribute used in Route Reflectors to prevent routing loops within an AS.

Cluster List

A list of Cluster IDs that a route has passed through, used for loop prevention in Route Reflector environments.

Deterministic MED

A BGP setting that ensures the MED attribute is compared across all paths from the same AS, regardless of arrival time.

BGP Scan Time

The interval at which a BGP router re-evaluates its BGP table and runs the best-path algorithm (default is usually 60 seconds).

14. BGP Add-Path: Multi-Path Advertisement and Diverse Path Selection

Standard BGP (RFC 4271) enforces the Best Path Only rule: a router advertises only its single best path for each prefix to its neighbors. This simplifies global routing but hides backup paths. When the best path fails, the router must withdraw the prefix and advertise a new one — requiring a BGP UPDATE message, network propagation, and a new best-path computation on every downstream router. This sequential process adds seconds to convergence.

BGP Add-Path (RFC 7911) relaxes the Best Path Only constraint by allowing a router to advertise multiple paths for the same prefix in a single UPDATE. The path diversity is communicated via a Path Identifier — a 32-bit opaque value attached to each NLRI (Network Layer Reachability Information) entry. The Path Identifier is unique per path but has no semantic meaning; it is simply a discriminator that allows the receiver to distinguish between multiple paths for the same prefix.

The Add-Path capability is negotiated in the BGP OPEN message using the Capabilities optional parameter (parameter type 2), which carries the ADD-PATH capability (capability code 69) along with the AFI/SAFI of each address family and a send/receive indication. After negotiation, the sender can include multiple $\langle \text{Path Identifier, Prefix} \rangle$ pairs in a single UPDATE message. The Path Identifier is prepended to the prefix in the NLRI field:

BGP UPDATE (Add-Path enabled):
+-- Path ID: 0x00000001 | Prefix: 10.1.1.0/24
+-- Path ID: 0x00000002 | Prefix: 10.1.1.0/24
+-- Path ID: 0x00000003 | Prefix: 10.2.0.0/16
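The byte layout behind the listing above can be sketched in Python: a 4-octet Path Identifier, a 1-octet prefix length in bits, and then only as many prefix octets as that length requires. The `encode_addpath_nlri` helper is a hypothetical name, and the sketch covers a single NLRI entry rather than a complete UPDATE message.

```python
import ipaddress
import struct


def encode_addpath_nlri(path_id, prefix):
    """Encode one Add-Path NLRI entry: path ID, prefix length, truncated prefix."""
    net = ipaddress.ip_network(prefix)
    # Only the octets needed to hold prefixlen bits are sent on the wire.
    nbytes = (net.prefixlen + 7) // 8
    header = struct.pack("!IB", path_id, net.prefixlen)  # network byte order
    return header + net.network_address.packed[:nbytes]


print(encode_addpath_nlri(1, "10.1.1.0/24").hex())  # 00000001180a0101
print(encode_addpath_nlri(2, "10.1.1.0/24").hex())  # 00000002180a0101
print(encode_addpath_nlri(3, "10.2.0.0/16").hex())  # 00000003100a02
```

The first two entries differ only in their leading four octets, which is what lets the receiver keep two paths for 10.1.1.0/24 side by side.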

The receiving router stores all advertised paths in its Adj-RIB-In and runs the standard best-path selection algorithm across them. The additional paths serve as pre-computed backup paths: when the current best path disappears, the router immediately promotes the next-best path from its Adj-RIB-In without waiting for a new UPDATE from the neighbor. In lab tests documented by the IETF, Add-Path reduced median BGP convergence time from 4.2 seconds to 310 ms on a full Internet table of 920,000 prefixes, a 93% improvement.

The Diverse Path Selection Algorithm

When Add-Path provides multiple paths, the router needs a secondary policy to decide which additional paths to advertise. RFC 7911 defines the encoding and the capability negotiation but leaves this choice to the implementation. Typical options are: (1) advertise all paths from the Adj-RIB-In, (2) advertise one path per neighbor AS, (3) advertise one path per AS-path length, or (4) a customizable route-policy filter. Option 3 is often used as the default because it provides good path diversity with limited UPDATE churn. The number of additional paths is bounded by the maximum-paths limit, typically configured as 2–8 paths. Paths beyond this limit are still stored in the Adj-RIB-In without being advertised, keeping them available for fast local failover.

15. BGP Flowspec: Traffic Filtering at the Control-Plane Level

BGP Flowspec (RFC 8955/8956) extends BGP to distribute traffic filtering rules — effectively an ACL propagation protocol. Instead of manually configuring an ACL on every edge router in a network (which is error-prone and does not scale), the network operator injects a single Flowspec route into the BGP control plane. The route contains a Flow Specification — a set of Layer 3/Layer 4 matching criteria — and a corresponding Action (e.g., rate-limit, drop, redirect-to-VRF). BGP distributes this rule to all routers in the AS, and each router installs the matching entry in its hardware TCAM.

The Flowspec NLRI Encoding

A Flowspec route's NLRI is encoded as a sequence of Type-Length-Value (TLV) triples. Each type specifies a packet-matching field:

Type 1: Destination Prefix

Matches IPv4 or IPv6 destination address with prefix length.

Type 2: Source Prefix

Matches source address with prefix length.

Type 3: IP Protocol

Matches IP protocol number (TCP=6, UDP=17, ICMP=1). Supports numeric comparison operators (equal, less than, greater than).

Type 9: TCP Flags

Matches TCP SYN, ACK, FIN, RST flags with bitmask matching.

Multiple types can be combined in a single Flowspec rule. For example, to rate-limit all TCP SYN traffic from 10.0.0.0/8 to 192.168.0.0/16, the TLV sequence, listed in ascending type order, would be {Type1=192.168.0.0/16, Type2=10.0.0.0/8, Type3=6, Type9=SYN}, with the Action set to "rate-limit 1 Mbps". The combination is treated as a logical AND: all conditions must match for the action to apply.
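The logical-AND behavior of that example rule can be sketched as a packet-matching function. This Python illustration models only the matching semantics, not the wire encoding; the `rule` dictionary, the `matches` function, and the packet fields are hypothetical names.

```python
import ipaddress

TCP = 6      # IP protocol number for TCP
SYN = 0x02   # SYN bit in the TCP flags field
ACK = 0x10   # ACK bit in the TCP flags field

# The example rule: TCP SYN traffic from 10.0.0.0/8 to 192.168.0.0/16.
rule = {
    "source": ipaddress.ip_network("10.0.0.0/8"),
    "destination": ipaddress.ip_network("192.168.0.0/16"),
    "protocol": TCP,
    "tcp_flags": SYN,
}


def matches(rule, packet):
    """Return True only if every component of the rule matches (logical AND)."""
    return (
        ipaddress.ip_address(packet["src"]) in rule["source"]
        and ipaddress.ip_address(packet["dst"]) in rule["destination"]
        and packet["protocol"] == rule["protocol"]
        and (packet["tcp_flags"] & rule["tcp_flags"]) != 0  # bitmask test
    )


syn = {"src": "10.1.2.3", "dst": "192.168.5.5", "protocol": TCP, "tcp_flags": SYN}
ack = {"src": "10.1.2.3", "dst": "192.168.5.5", "protocol": TCP, "tcp_flags": ACK}
other = {"src": "172.16.0.1", "dst": "192.168.5.5", "protocol": TCP, "tcp_flags": SYN}

print(matches(rule, syn))    # True
print(matches(rule, ack))    # False
print(matches(rule, other))  # False
```

Only the first packet would be subject to the rate-limit action; the second fails the flag test and the third fails the source-prefix test.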

DDoS Mitigation Use Case: The RTBH Extension

The most common Flowspec deployment is DDoS Mitigation using the traffic-rate action (extended community type 0x8006 in RFC 8955), where a rate of zero means "discard" and a non-zero rate means "rate-limit." When a DDoS attack is detected (e.g., by a scrubbing center or an inline detection appliance), the detection system injects a Flowspec rule carrying the victim's IP prefix with the discard action. BGP propagates this rule across the entire AS in seconds, and every edge router installs a TCAM entry that drops all traffic matching the attack flow. This builds on RTBH (Remotely Triggered Black Hole) filtering, a long-established method for mitigating volumetric attacks at the network edge.

The key advantage of Flowspec over traditional RTBH is granularity. A standard RTBH uses a static route pointing to Null0, which drops all traffic to the victim, effectively taking them offline. A Flowspec rule can drop only the attack traffic (e.g., TCP SYN to a specific port) while allowing legitimate traffic to pass. It is the difference between a site-wide outage and a narrowly targeted block. This granularity has made Flowspec the standard for large ISP DDoS mitigation platforms, with NTT Communications, Level 3, and GTT all reporting production Flowspec deployments since 2019.

16. Conclusion: The Protocol of the Infinite

BGP path selection is not just a technical algorithm; it is the manifestation of global networking policy. It allows Autonomous Systems to maintain their independence while participating in a unified global fabric. By mastering the hierarchy of attributes—from the local Weight of a single router to the cryptographic certainty of RPKI—engineers can build networks that are not only fast but resilient, secure, and commercially viable. As we move toward a world of 100G and 800G backbones, the rigid, deterministic nature of the BGP decision funnel remains our best defense against the chaos of the internet.


Technical Standards & References

Rekhter, Y., et al. (2006). BGP Path Attribute Analysis.
Cisco Systems (2024). BGP Best Path Selection Algorithm.
Chen, E., et al. (2006). MED Attribute and Route Selection.
Gao, L., Rexford, J. (2001). AS_PATH Prepending and BGP Path Selection.
Mathematical models derived from standard engineering protocols. Not for human safety critical systems without redundant validation.
