BGP Introduction

Before we dive into the RBFS-specific configuration in the next section, this section will provide some basics of BGP. If you are already familiar with BGP, you can skip this section.

Autonomous Systems and BGP

An autonomous system (AS) is a set of routers and networks under a single technical administration. Originally, it was assumed that all routers within an autonomous system use the same routing protocol among themselves, known as an Interior Gateway Protocol (IGP). In complex networks, however, this assumption no longer holds. Therefore, an autonomous system is now understood as a system whose internal infrastructure is not visible from outside the autonomous system and that shares a common routing policy.

Autonomous systems are identified by AS numbers (ASNs). Public AS numbers are assigned by a Regional Internet Registry (RIR) such as ARIN, RIPE NCC, or APNIC. Historically, BGP AS numbers were unsigned 16-bit integers, with the range 64512-65534 reserved for private use. Support for 32-bit ASNs is specified in RFC 6793 (published in 2012, replacing the earlier RFC 4893), with the range 4200000000-4294967294 reserved for private use. In addition, AS numbers 64496-64511 and 65536-65551 are reserved for documentation.
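
The reserved ranges above can be checked with a small helper. This is a minimal Python sketch; the function names `classify_asn` and `needs_32bit` are hypothetical, and only the private-use and documentation ranges listed in this section are covered (other special values, such as AS 0, are not distinguished).

```python
# Reserved ranges from the text above (Python ranges exclude the end value).
PRIVATE_16 = range(64512, 65535)            # 64512-65534
PRIVATE_32 = range(4200000000, 4294967295)  # 4200000000-4294967294
DOC_16 = range(64496, 64512)                # 64496-64511
DOC_32 = range(65536, 65552)                # 65536-65551


def classify_asn(asn: int) -> str:
    """Return 'private', 'documentation', or 'other' (hypothetical helper)."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError(f"not a valid 32-bit AS number: {asn}")
    if asn in PRIVATE_16 or asn in PRIVATE_32:
        return "private"
    if asn in DOC_16 or asn in DOC_32:
        return "documentation"
    return "other"  # public and other special-purpose values are not distinguished


def needs_32bit(asn: int) -> bool:
    """True if the ASN does not fit into the original 16-bit AS number space."""
    return asn > 0xFFFF
```

For example, `classify_asn(65001)` returns `"private"`, while `needs_32bit(65536)` is `True` because that value no longer fits into 16 bits.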

The Internet is a collection of autonomous systems.

Routing between autonomous systems is called interdomain routing, and the corresponding routing protocols are called Exterior Gateway Protocols (EGPs). The Border Gateway Protocol (BGP) is the only Exterior Gateway Protocol in use today. The current version, BGP-4, is defined in RFC 4271.

Figure 1. Autonomous Systems

Because it connects autonomous systems, BGP has different requirements than an IGP. BGP must separate customer routes from infrastructure routes, allow policies to be implemented and traffic flows to be influenced, and scale to several million routes. In contrast to IGPs, however, there are no strict requirements on the speed of convergence after a network failure.

An important goal of BGP is to provide loop-free paths to destinations, which are not necessarily the optimal paths. The reason is that BGP connects autonomous systems, so the protocol has only limited ability to influence decisions made in another AS. In BGP, policy matters more than efficiency.

BGP Messages and Sessions

A router that implements BGP is called a BGP speaker. Two BGP speakers that run a BGP connection in order to exchange information are called BGP peers or BGP neighbors.

  • BGP peers in different autonomous systems use external BGP (eBGP).

  • BGP peers within the same autonomous system use internal BGP (iBGP).

BGP does not provide any sort of reliable transport itself but relies on TCP to carry information between BGP peers using the well-known TCP port number 179. Because BGP uses a reliable transport, the sender knows that the receiver has actually received the transmitted information. This capability makes periodic updates unnecessary.

BGP supports five different message formats:

  • The OPEN message is used to establish a BGP peering between two routers once a TCP session is in place. The OPEN message may contain additional information for negotiating optional parameters called capabilities (defined in RFC 5492, which obsoletes RFC 3392).

  • The UPDATE message is used to exchange routing information. In BGP, routing information is called Network Layer Reachability Information (NLRI), which has a set of path attributes attached to it. An UPDATE message may also contain a list of unfeasible routes that need to be withdrawn. Note that BGP does not periodically re-advertise routing information.

  • A NOTIFICATION message is sent when an error situation is detected. After a NOTIFICATION is sent, the BGP session is terminated immediately.

  • KEEPALIVE messages are exchanged periodically between peers to determine whether they are still reachable, because TCP does not promptly detect that a peer has become unreachable. The keepalive interval determines the period between two consecutive KEEPALIVE messages, while the hold time determines how long to wait for a KEEPALIVE or UPDATE message before considering the peer dead.

  • ROUTE-REFRESH messages, as defined in RFC 2918, are used to request re-advertisement of NLRIs. This mechanism is useful after a policy change, because the new policy can be applied without resetting the session. Route refresh is an optional capability that is negotiated during the OPEN message exchange.
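
All five messages share a common 19-octet header defined in RFC 4271: a 16-octet marker with all bits set to one, a 2-octet length, and a 1-octet type code (1 = OPEN, 2 = UPDATE, 3 = NOTIFICATION, 4 = KEEPALIVE, 5 = ROUTE-REFRESH). A KEEPALIVE consists of the header only. The Python sketch below encodes and decodes that header and also shows how the timers are typically derived: each speaker uses the smaller of the two hold times proposed in the OPEN messages, and one third of the hold time is the customary keepalive interval. The function names are hypothetical; this is an illustration, not a complete BGP implementation.

```python
import struct

# Message type codes: 1-4 from RFC 4271, 5 (ROUTE-REFRESH) from RFC 2918.
OPEN, UPDATE, NOTIFICATION, KEEPALIVE, ROUTE_REFRESH = 1, 2, 3, 4, 5
MARKER = b"\xff" * 16   # 16-octet marker, all bits set to one
HEADER_LEN = 19         # marker (16) + length (2) + type (1)
MAX_LEN = 4096          # maximum message size without extended messages (RFC 8654)


def encode_message(msg_type: int, body: bytes = b"") -> bytes:
    """Prepend the common BGP header to a message body."""
    length = HEADER_LEN + len(body)
    if length > MAX_LEN:
        raise ValueError("message exceeds the maximum BGP message size")
    # "!HB": network byte order, 2-octet length followed by 1-octet type
    return MARKER + struct.pack("!HB", length, msg_type) + body


def decode_header(data: bytes) -> tuple:
    """Return (length, type) parsed from the first 19 octets of a message."""
    if len(data) < HEADER_LEN or data[:16] != MARKER:
        raise ValueError("not a valid BGP message header")
    return struct.unpack("!HB", data[16:HEADER_LEN])


def negotiated_hold_time(local: int, received: int) -> int:
    """Each speaker uses the smaller of its own and the peer's proposed hold time."""
    hold = min(local, received)
    if 0 < hold < 3:
        raise ValueError("hold time must be 0 or at least 3 seconds")
    return hold


def keepalive_interval(hold_time: int) -> int:
    """One third of the hold time; a hold time of 0 disables KEEPALIVEs."""
    return hold_time // 3
```

For instance, `encode_message(KEEPALIVE)` yields a 19-octet message, and a router configured with a 90-second hold time that receives 180 seconds from its peer settles on 90 seconds with a 30-second keepalive interval.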

Because BGP is carried over a TCP session, it has no way to discover neighbors automatically: BGP peers must be configured manually, including the peer IP address. To establish a BGP session, the remote peer must be reachable. Two routers should have only a single BGP session between them. The router IDs exchanged when the BGP session is established allow the routers to detect when two parallel sessions exist.
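
If both routers open a TCP connection to each other at the same time, one of the two connections must be closed. A common summary of the collision-resolution rule in RFC 4271 (section 6.8) is that the connection initiated by the router with the higher BGP identifier (router ID) is kept, with identifiers compared as unsigned 32-bit integers. The Python sketch below assumes IPv4-formatted router IDs; the function name and return strings are hypothetical.

```python
import ipaddress


def surviving_connection(local_id: str, remote_id: str) -> str:
    """Decide which of two colliding connections to keep.

    Returns 'locally-initiated' or 'remotely-initiated', seen from the
    local router. Router IDs are compared as unsigned 32-bit integers.
    """
    local = int(ipaddress.IPv4Address(local_id))
    remote = int(ipaddress.IPv4Address(remote_id))
    if local == remote:
        raise ValueError("router IDs must differ")
    # The connection opened by the router with the higher ID survives.
    return "locally-initiated" if local > remote else "remotely-initiated"
```

Because both routers apply the same comparison, they agree on the outcome: the router with ID 10.0.0.2 keeps the connection it initiated, and its peer 10.0.0.1 keeps the connection initiated by 10.0.0.2, which is the same connection.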

BGP uses a finite state machine (FSM) to model the various states, transitions between states, and actions taken by a BGP speaker in the process of establishing and maintaining a BGP peering. The six states are useful to know for troubleshooting BGP sessions:

  • The Idle state is the initial state. The router is currently not attempting to establish a connection and refuses incoming connections.

  • The Connect state indicates that the router is waiting for the TCP connection to the peer to complete.

  • The Active state is entered after a TCP connection attempt fails. In the Active state, the BGP speaker listens for an incoming connection and retries the connection when the connect retry timer expires.

  • The OpenSent state is entered after an OPEN message, the first BGP message of a session, has been sent to the peer. The BGP speaker is waiting for the remote router to respond with its own OPEN message.

  • The OpenConfirm state is entered once the router has received the peer's OPEN message, verified all its parameters, and replied with a KEEPALIVE message. In this state, the router waits for a KEEPALIVE message from the peer.

  • The Established state is the final state, where peers exchange UPDATE and KEEPALIVE messages, i.e., this is the only state where BGP peers exchange routing information.

Figure 2. BGP Finite State Machine
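
The six states can be modeled as a transition table. The following Python sketch covers only the successful path to Established and the Connect/Active retry loop; the event names are hypothetical shorthands for the events in RFC 4271. As a simplification, every unlisted event sends the session back to Idle, whereas the real FSM handles some cases differently (for example, a TCP failure in OpenSent moves to Active).

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "OpenSent"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


# Simplified transition table: (state, event) -> next state.
TRANSITIONS = {
    (State.IDLE, "start"): State.CONNECT,
    (State.CONNECT, "tcp_established"): State.OPEN_SENT,
    (State.CONNECT, "tcp_failed"): State.ACTIVE,
    (State.ACTIVE, "tcp_established"): State.OPEN_SENT,
    (State.ACTIVE, "connect_retry_expired"): State.CONNECT,
    (State.OPEN_SENT, "open_received"): State.OPEN_CONFIRM,
    (State.OPEN_CONFIRM, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "update_received"): State.ESTABLISHED,
}


def next_state(state: State, event: str) -> State:
    # Unlisted events (errors, NOTIFICATION, hold timer expiry, ...)
    # drop the session back to Idle in this simplified model.
    return TRANSITIONS.get((state, event), State.IDLE)


def run(events) -> State:
    """Feed a sequence of events into the FSM, starting from Idle."""
    state = State.IDLE
    for event in events:
        state = next_state(state, event)
    return state
```

A session that sees `start`, `tcp_established`, `open_received`, and `keepalive_received` in that order ends in the Established state; a session stuck in Connect or Active usually points to a TCP reachability problem, while one that falls back to Idle after OpenSent often indicates a parameter mismatch in the OPEN message.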

Path Attributes and Route Selection Process

In contrast to other routing protocols that rely on a single metric, BGP supports multiple path attributes that are assigned to Network Layer Reachability Information and can be used for decision making. The path attributes are encoded in a TLV (type, length, value) format, which allows for the addition of new path attributes if needed. The support of certain path attributes can be negotiated between BGP peers using capability negotiation.

The attribute type field also carries flags that tell a router what to do with a path attribute it does not recognize. For a complete list of BGP path attributes, see the IANA BGP Parameters registry.
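
Each path attribute starts with a flags octet and a type code octet, followed by a length (one octet, or two octets if the Extended Length flag is set) and the value. The flag bits are Optional (0x80), Transitive (0x40), Partial (0x20), and Extended Length (0x10). The Python sketch below walks that TLV sequence and shows the RFC 4271 handling of unrecognized attributes: an unrecognized well-known attribute is an error, an unrecognized optional transitive attribute is passed on with the Partial bit set, and an unrecognized optional non-transitive attribute is quietly ignored. The function names and return strings are hypothetical.

```python
import struct

# Attribute flag bits (high-order bits of the flags octet)
OPTIONAL = 0x80
TRANSITIVE = 0x40
PARTIAL = 0x20
EXTENDED_LENGTH = 0x10


def parse_path_attributes(data: bytes) -> list:
    """Split the path-attribute section of an UPDATE into (flags, type_code, value) tuples."""
    attrs = []
    pos = 0
    while pos < len(data):
        if pos + 3 > len(data):
            raise ValueError("truncated attribute header")
        flags, type_code = data[pos], data[pos + 1]
        pos += 2
        if flags & EXTENDED_LENGTH:
            # Extended Length: the length field is two octets
            if pos + 2 > len(data):
                raise ValueError("truncated attribute length")
            (length,) = struct.unpack("!H", data[pos:pos + 2])
            pos += 2
        else:
            length = data[pos]
            pos += 1
        value = data[pos:pos + length]
        if len(value) != length:
            raise ValueError("truncated attribute value")
        attrs.append((flags, type_code, value))
        pos += length
    return attrs


def handle_unrecognized(flags: int) -> str:
    """What a speaker does with an attribute whose type code it does not know."""
    if not flags & OPTIONAL:
        return "error"            # unrecognized well-known attribute: NOTIFICATION
    if flags & TRANSITIVE:
        return "pass-on-partial"  # forward it with the Partial bit set
    return "ignore"               # optional non-transitive: drop, do not propagate
```

For example, the bytes `40 01 01 00` decode to one attribute with flags 0x40 (well-known, transitive), type code 1 (ORIGIN), and a single-octet value of 0 (IGP).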

The most important BGP attributes are the AS_PATH attribute and the NEXT_HOP attribute. The AS_PATH attribute represents the sequence of autonomous systems through which routing information has passed. Whenever a route is advertised to an external BGP peer, the advertising router prepends its own AS number to the AS_PATH. The AS_PATH is also used for loop prevention: a BGP speaker does not accept updates whose AS_PATH already contains its own AS number.

Figure 3. AS_PATH Attribute Update
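
The AS_PATH update and the loop check can be sketched in a few lines of Python. For simplicity, the AS_PATH is modeled as a flat list of AS numbers (real AS_PATHs consist of AS_SEQUENCE and AS_SET segments), and the function names are hypothetical.

```python
def advertise(as_path: list, local_as: int, to_ebgp: bool) -> list:
    """AS_PATH carried in an outgoing UPDATE (modeled as a flat list of ASNs)."""
    if to_ebgp:
        return [local_as] + as_path   # prepend own AS toward external peers
    return list(as_path)              # iBGP leaves the AS_PATH unchanged


def accept(as_path: list, local_as: int) -> bool:
    """Loop prevention: reject updates that already contain the local AS."""
    return local_as not in as_path
```

A route originated in AS 64512 and passed on by AS 64513 arrives with the AS_PATH `[64513, 64512]`; AS 64514 accepts it, while AS 64512 rejects it because its own number is already in the path.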

The NEXT_HOP attribute identifies the IP address that a router should use to forward packets toward the destination announced in a BGP UPDATE. When advertising a route to an eBGP peer, the sending router usually sets NEXT_HOP to its own IP address; when advertising to an iBGP peer, the NEXT_HOP learned from the external peer is passed on unchanged by default.

Because BGP has several path attributes that can serve as a metric, it is important to know the order in which they are evaluated:

  1. NEXT_HOP: If the NEXT_HOP address is not resolvable, the route is skipped.

  2. Route Source: Prefer locally originated routes over routes received from peers. Note that this step is rarely used in the decision process.

  3. Local Preference: The BGP path with the highest local preference is chosen. As the name implies, the local preference determines the preferred exit point from the point of view of the local AS.

    In this list, the local preference is the only path attribute for which the higher value is preferred.
  4. AS_PATH length: The BGP path with shortest AS_PATH length is preferred, i.e., the path which passes the smallest number of autonomous systems.

  5. Origin: Prefer the path with the lowest origin code (IGP < EGP < INCOMPLETE). Note that this step is rarely used in the decision process.

  6. Multiexit Discriminator (MED): The path with the lowest MED value is preferred. If there is no MED, it is assumed to be 0. By default, MED values are only compared between paths received from the same neighboring AS.

  7. Route Type: Paths learned via eBGP are always preferred over paths learned via iBGP.

  8. IGP metric: The path with lowest IGP metric to the BGP nexthop is preferred.

    If multiple paths are equal up to this point and the multipath option is enabled, then multiple nexthops will be installed into the FIB.
  9. CLUSTER_LIST: Prefer the BGP path with the shortest CLUSTER_LIST. If this attribute is absent, its length is taken as 0.

  10. Router ID: Prefer the path from the BGP peer with the lowest router ID.

  11. Further tie-breakers (not listed here).
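
The ordered comparison above maps naturally onto a tuple-valued sort key: each step becomes one tuple element, so Python's lexicographic tuple comparison only consults a step when all earlier steps are tied. This sketch implements steps 1 to 10 under simplifying assumptions: the `Path` fields are hypothetical, a default local preference of 100 is assumed, router IDs are IPv4-formatted, and MED is compared across all paths rather than only between paths from the same neighboring AS.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    next_hop_resolvable: bool = True
    locally_originated: bool = False
    local_pref: int = 100                 # assumed default
    as_path: List[int] = field(default_factory=list)
    origin: str = "igp"
    med: Optional[int] = None             # None = no MED attribute
    ebgp: bool = True
    igp_metric: int = 0                   # IGP cost to the BGP next hop
    cluster_list: List[str] = field(default_factory=list)
    router_id: str = "0.0.0.0"


def preference_key(p: Path) -> tuple:
    """Smaller key = better path; one tuple element per selection step."""
    return (
        not p.locally_originated,                 # 2. locally originated first
        -p.local_pref,                            # 3. highest LOCAL_PREF
        len(p.as_path),                           # 4. shortest AS_PATH
        ORIGIN_RANK[p.origin],                    # 5. IGP < EGP < INCOMPLETE
        p.med if p.med is not None else 0,        # 6. lowest MED, missing = 0
        not p.ebgp,                               # 7. eBGP before iBGP
        p.igp_metric,                             # 8. lowest IGP metric
        len(p.cluster_list),                      # 9. shortest CLUSTER_LIST
        int(ipaddress.IPv4Address(p.router_id)),  # 10. lowest router ID
    )


def best_path(paths: List[Path]) -> Optional[Path]:
    # 1. Paths with an unresolvable NEXT_HOP are skipped entirely.
    candidates = [p for p in paths if p.next_hop_resolvable]
    return min(candidates, key=preference_key) if candidates else None
```

Because local preference comes before AS_PATH length in the key, a path with local preference 200 and a three-AS path wins over a path with the default local preference and a one-AS path.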

BGP Traffic Steering

From the BGP best path selection algorithm it is clear that the local preference is the best way to influence the path that traffic takes when it leaves the local autonomous system. If a prefix is learned via multiple paths, setting the local preference of one incoming update to a higher value than the others determines the exit point for this particular destination prefix.

Figure 4. LOCAL_PREF Path Attribute
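
The effect of an inbound local preference policy can be sketched in Python. The policy function, the dictionary keys, and the values 200 and 100 are hypothetical; the exit decision only looks at local preference and AS_PATH length, in that order.

```python
def apply_inbound_policy(routes, preferred_peer, high=200, default=100):
    """Hypothetical inbound policy: raise LOCAL_PREF for routes from one peer."""
    return [
        dict(r, local_pref=high if r["peer"] == preferred_peer else default)
        for r in routes
    ]


def exit_peer(routes):
    """Pick the exit: highest LOCAL_PREF first, then shortest AS_PATH."""
    best = min(routes, key=lambda r: (-r["local_pref"], len(r["as_path"])))
    return best["peer"]
```

Without a preferred peer, traffic for a prefix learned from ISP-A (two AS hops) and ISP-B (one AS hop) leaves via ISP-B; once the policy raises the local preference for ISP-A, traffic leaves via ISP-A despite the longer AS_PATH.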

Forcing incoming traffic to take a particular path is harder, because BGP has only limited ability to influence decisions made by other autonomous systems. The most appropriate way to manage incoming traffic is to modify the AS_PATH attribute, because the best path algorithm uses the length of the AS_PATH. To ensure that the AS_PATH loop checks are not violated, the easiest approach is to prepend your own AS number to the updates sent along the less preferred path. A remote autonomous system then receives two updates with different AS_PATH lengths. Note that you can prepend your AS number multiple times, depending on how far beyond your direct neighbors you want to influence path selection.

Figure 5. AS_PATH Prepend
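
AS_PATH prepending can be sketched as follows. The function names, link names, and AS numbers are hypothetical, and the remote AS is assumed to be otherwise indifferent between the two links, so only AS_PATH length decides.

```python
def outbound_as_path(as_path, local_as, prepend=0):
    """AS_PATH sent to an eBGP peer: the local AS once plus `prepend` extra copies."""
    return [local_as] * (1 + prepend) + list(as_path)


def remote_choice(paths_by_link):
    """A remote AS, all else being equal, picks the link with the shortest AS_PATH."""
    return min(paths_by_link, key=lambda link: len(paths_by_link[link]))
```

If AS 64500 advertises a locally originated prefix with no prepending on its primary link and two extra prepends on its backup link, the remote AS sees `[64500]` and `[64500, 64500, 64500]` and sends traffic over the primary link. Because only the local AS number is repeated, the loop check in every other AS still passes.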

An alternative method of influencing the inbound traffic path is to set the multiexit discriminator (MED) in UPDATE messages. By default, the MED is only compared between UPDATEs received from the same neighboring AS, so this option only works reliably if there are multiple peerings with the same neighboring autonomous system.

Figure 6. Multiexit Discriminator
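
From the point of view of the receiving AS, the default MED rule can be sketched as a per-neighbor-AS comparison: within each neighboring AS the path with the lowest MED wins, while a lower MED from a different AS does not override it. The function name and dictionary keys are hypothetical; after this step, the remaining candidates are compared using the other attributes.

```python
def best_med_per_neighbor_as(paths):
    """Return {neighbor_as: path}, keeping the lowest-MED path per neighbor AS.

    Paths without a 'med' key are treated as MED 0. Paths from different
    neighbor ASes are never compared on MED (the default behavior).
    """
    best = {}
    for p in paths:
        nbr = p["neighbor_as"]
        if nbr not in best or p.get("med", 0) < best[nbr].get("med", 0):
            best[nbr] = p
    return best
```

With two links from AS 64500 (MED 10 and 50) and one link from AS 64501 (MED 5), the MED step selects link-1 among the AS 64500 paths; the link from AS 64501 is not eliminated by MED, since its value of 5 is never compared with the values from AS 64500.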

The BGP path attributes can be changed using routing policies, which are discussed in Module Policies.