BGP: Border Gateway Protocol (an introduction)
dr. C. P. J. Koymans
Informatics Institute, University of Amsterdam
(version 1.3, 2010/03/10 20:05:02)
Monday, March 8, 2010

Outline
  General ideas behind BGP
    - Background
    - Providers, Customers and Peers
    - External and Internal BGP
    - BGP information bases
  The BGP protocol
    - BGP attributes
    - BGP packets
  Traffic Engineering
    - Outbound Traffic Engineering
    - Inbound Traffic Engineering
  IBGP scaling

BGP version 4
  Border Gateway Protocol version 4 (BGP4)
  - is specified in RFC 4271
  - is an inter-AS routing protocol
  - monopolises the Internet
  - uses path vector routing, which is in between distance vector and link state
  - uses (often non-coordinated) routing policies, which can be problematic for convergence

Autonomous system (AS)
  An Autonomous System or AS is a connected group of networks and routers,
  - representing some assigned set of IP prefixes,
  - having a single, consistent routing policy, both internally and externally

Autonomous system illustration
  [Figure: Autonomous Systems AS192, AS2503 and AS29077. Slide courtesy Iljitsch van Beijnum]

Providers, Customers and Peers: Customers and Providers
  [Figure: IP traffic between a customer and its provider]
  Customer pays provider for access to the Internet

Providers, Customers and Peers: The Peering Relationship
  - Peers provide transit between their respective customers
  - Peers do not provide transit between peers
  - Peers (often) do not exchange $$$
  [Figure legend: traffic allowed / traffic NOT allowed]

Providers, Customers and Peers: Peering Provides Shortcuts
  Peering also allows connectivity between the customers of Tier 1 providers.

Providers, Customers and Peers: Treatment
  The order of preference for a route is
  - Customers have highest preference
  - Peers have the next highest preference
  - Providers have the lowest preference
  Transit relationships are enforced by export filtering
  - Do not advertise provider or peer routes to other providers or peers
  - Do advertise all routes to customers
  - Do advertise customer routes to providers and peers

Providers, Customers and Peers: Import and Export, Import Routes
  [Figure: provider, peer, customer and ISP routes imported from providers, peers and customers]

Providers, Customers and Peers: Import and Export, Export Routes
  [Figure: provider, peer, customer and ISP routes exported to providers, peers and customers; filters block the disallowed combinations]

EBGP and IBGP (1)
  External BGP (EBGP) is used for BGP neighbors between different ASes
  - to exchange prefixes
  - to implement policies
  Internal BGP (IBGP) is used for BGP neighbors within only one AS
  - to distribute Internet prefixes across the backbone in order to create a consistent view among all entry/exit points
  - to originate locally originated prefixes, for instance for customers that do not speak BGP

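
The import preference and export filtering rules above can be written down as a small Python sketch. The names `Rel`, `IMPORT_PREFERENCE` and `may_export` are hypothetical, the preference numbers are illustrative, and locally originated routes are left out; this is one way to express the slide's rules, not any vendor's policy language.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship with a BGP neighbor (hypothetical helper type)."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


# Import side: a higher number means a more preferred route.
# The concrete values are illustrative; only their order comes from the slide.
IMPORT_PREFERENCE = {Rel.CUSTOMER: 3, Rel.PEER: 2, Rel.PROVIDER: 1}


def may_export(learned_from: Rel, send_to: Rel) -> bool:
    """Return True if a route learned from one neighbor type may be sent to another."""
    if send_to is Rel.CUSTOMER:
        # Customers receive all routes.
        return True
    # Peers and providers only receive customer routes.
    return learned_from is Rel.CUSTOMER
```

With this filter an AS never carries traffic between two of its providers or peers, which is exactly the transit it is not paid for.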
EBGP and IBGP (2)
  - Routes imported from one IBGP peer are not distributed to another IBGP peer
  - This prevents possible routing loops
  - Loop detection is based on duplicates in AS paths, which are detected by EBGP between different ASes, but not by IBGP inside the same AS
  - Requires IBGP peers to be configured as a full mesh

Routing Information Bases (RIBs)
  - Adj-RIB-In (one per peer): routes after input filtering
  - Loc-RIB (one globally): routes after best path selection
  - Adj-RIB-Out (one per peer): routes after output filtering

BGP protocol
  - Uses TCP over port 179
  - Exchanges NLRI (Network Layer Reachability Information)
    - Prefixes that can or can no longer be reached through the router
    - Accompanied by BGP attributes

Some important BGP attributes
  In order of path selection importance
  - LOCAL_PREF (Local Preference)
  - AS_PATH
  - ORIGIN (Historical)
  - MULTI_EXIT_DISC (MED; Multi-exit discriminator)
  And further...
  - NEXT_HOP, which must be reachable (directly or via IGP), except in the case of multi-hop BGP

Interaction between BGP and IGP: BGP Next Hop Attribute
  [Figure: AS 6431 (AT&T Research), AS 7018 (AT&T) and AS 12654 (RIPE NCC RIS project); border routers 12.125.133.90 and 12.127.0.121; labels "Next Hop = 12.125.133.90" and "Next Hop = 12.127.0.121"]
  Every time a route announcement crosses an AS boundary, the Next Hop attribute is changed to the IP address of the border router that announced the route.

Interaction between BGP and IGP: Join EGP with IGP For Connectivity
  [Figure: forwarding tables (destination, next hop) joined with an EGP route; labels: 10.10.10.10, 192.0.2.0/30, 192.0.2.1, "Next Hop = 192.0.2.1"]

Route selection
  In order of preference:
  1. Highest Local Preference
  2. Shortest AS Path
  3. (Lowest Origin)
  4. Lowest MED
  5. Prefer EBGP over IBGP
  6. Lowest IGP cost to BGP egress
  7. Lowest Router ID

Route selection: BGP Route Processing
  1. Receive BGP Updates
  2. Apply Import Policies (apply policy = filter routes & tweak attributes)
  3. Best Route Selection, based on attribute values
  4. Best Route Table: install forwarding entries for best routes in the IP Forwarding Table
  5. Apply Export Policies (apply policy = filter routes & tweak attributes)
  6. Transmit BGP Updates
  Policies are open ended programming, constrained only by the vendor configuration language.

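
The seven-step order of preference can be sketched as a single sort key in Python. The `Route` record and its field names are hypothetical, and the sketch simplifies deliberately: it compares MED across all routes (real implementations usually compare it only between routes from the same neighboring AS) and treats the router ID as a plain integer.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Route:
    """Hypothetical record holding only the attributes used for selection."""
    local_pref: int = 100
    as_path: Tuple[int, ...] = ()
    origin: int = 0
    med: int = 0
    learned_via_ebgp: bool = True
    igp_cost: int = 0
    router_id: int = 0


def preference_key(route: Route) -> tuple:
    """Smaller key = more preferred, following the slide's order of steps."""
    return (
        -route.local_pref,                    # 1. highest local preference
        len(route.as_path),                   # 2. shortest AS path
        route.origin,                         # 3. lowest origin
        route.med,                            # 4. lowest MED (simplified)
        0 if route.learned_via_ebgp else 1,   # 5. prefer EBGP over IBGP
        route.igp_cost,                       # 6. lowest IGP cost to egress
        route.router_id,                      # 7. lowest router ID
    )


def best_route(routes: Iterable[Route]) -> Route:
    """Pick the most preferred route among candidates for one prefix."""
    return min(routes, key=preference_key)
```

Because tuples compare element by element, a later step is only consulted when all earlier steps tie, which mirrors the tie-breaking nature of the list.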
BGP attribute types
  - Well-known mandatory: ORIGIN, AS_PATH, NEXT_HOP
  - Well-known discretionary: LOCAL_PREF, ATOMIC_AGGREGATE
  - Optional transitive: COMMUNITIES, AGGREGATOR
  - Optional non-transitive: MULTI_EXIT_DISC

LOCAL_PREF (Local Preference)
  - Advertised within a single AS (via IBGP)
  - Used to implement local policies
  - Can depend on any locally available information, possibly learned outside BGP
  - Default value is 100
  - Highest value wins

AS_PATH
  - Sequence of ASes (or sets of ASes)
  - Used for loop detection and distance metric
  - Shortest path wins
  - Prepend own AS (possibly multiple times) in EBGP updates
  - Leave unchanged in IBGP updates

Examples of AS_PATHs: ASPATH Attribute
  [Figure: a prefix originated by AS 6341 (AT&T Research) is propagated via AS 7018 (AT&T), AS 1239 (Sprint), AS 1755 (Ebone), AS 1129 (Global Access) and AS 3549 (Global Crossing) to AS 12654 (RIPE NCC RIS project). AS Path values shown along the way:
    AS Path = 6341
    AS Path = 7018 6341
    AS Path = 1239 7018 6341
    AS Path = 1755 1239 7018 6341
    AS Path = 1129 1755 1239 7018 6341
    AS Path = 3549 7018 6341]

Examples of AS_PATHs: Shorter Doesn't Always Mean Shorter
  [Figure: AS 3 and AS 4 offer paths towards the same destination]
  Mr. BGP says that path 4 1 is better than path 3 2 1. Duh!
  In fairness: could you do this right and still scale? Exporting internal state would dramatically increase global instability and the amount of routing state.

Examples of AS_PATHs: Interdomain Loop Prevention
  BGP at AS YYY will never accept a route with ASPATH containing YYY.
  [Figure: AS 7018 receives 12.22.0.0/16 with ASPATH = 1 333 7018 877. Don't Accept!]

Examples of AS_PATHs: Traffic Often Follows ASPATH
  [Figure: AS 3 advertises ASPATH = 3 2 1 to AS 4; IP Packet Dest = 135.207.44.66]

Examples of AS_PATHs: But It Might Not
  [Figure: AS 3 advertises ASPATH = 3 2 1 to AS 4; AS 5 announces 135.207.44.0/25 with ASPATH = 5; the /25 is dropped on the way by a filter on all subnets with masks longer than /24; IP Packet Dest = 135.207.44.66]
  From AS 4, it may look like this packet will take path 3 2 1, but it actually takes path 3 2 5.

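
The two uses of AS_PATH on these slides, prepending on EBGP export and rejecting paths that already contain the local AS, fit in a few lines of Python. The function names are hypothetical and AS_SET segments are ignored; this is only a sketch of the idea.

```python
from typing import List, Sequence


def accepts(local_as: int, as_path: Sequence[int]) -> bool:
    """Loop prevention: never accept a route whose AS path contains our own AS."""
    return local_as not in as_path


def export_ebgp(local_as: int, as_path: Sequence[int], prepend_count: int = 1) -> List[int]:
    """Prepend our own AS (possibly multiple times) before sending over EBGP."""
    if prepend_count < 1:
        raise ValueError("the local AS must be prepended at least once")
    return [local_as] * prepend_count + list(as_path)


# The slide's example: AS 7018 must reject ASPATH = 1 333 7018 877.
looped = accepts(7018, [1, 333, 7018, 877])      # False
# AS 7018 passing on a route originated by AS 6341.
forwarded = export_ebgp(7018, [6341])            # [7018, 6341]
```

A `prepend_count` larger than 1 is the padding trick used later for inbound traffic engineering.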
ORIGIN
  The ORIGIN attribute tells where the route (NLRI) originated
  - Interior to the originating AS: ORIGIN = 0
  - Via the EGP protocol (historic): ORIGIN = 1
  - Via some other means: ORIGIN = 2
  A lower ORIGIN wins

MULTI_EXIT_DISC (Multi-Exit Discriminator or MED)
  - The MED (or metric, formerly INTER_AS_METRIC) is meant to be advertised between neighboring ASes (via EBGP)
  - Some implementations carry the MED on by IBGP (hot potato versus cold potato)
  - The MED is non-transitive (it is not transferred into a third AS)
  - A lower MED wins
  - The default MED is 0 (lowest possible value)
  - Some implementations choose the highest possible value

BGP packet header
  [Figure: header layout on a 32-bit ruler (0, 15, 16, 23, 24, 31): Marker, followed by Length (bits 0-15) and Type (bits 16-23)]

BGP header fields
  - Marker: all 1's (compatibility)
  - Length: total length, no padding, including header
  - Type: 1: OPEN, 2: UPDATE, 3: NOTIFICATION, 4: KEEPALIVE
  Remember that BGP packets are in fact part of a TCP stream

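
As an illustration of the header fields, here is a minimal Python sketch that parses a BGP header with the standard `struct` module. The field widths (16-octet marker, 2-octet length, 1-octet type) and the 19 to 4096 length bounds come from RFC 4271 rather than from the slides; `parse_header` is a hypothetical helper name.

```python
import struct

# Network byte order: 16-byte marker, 2-byte length, 1-byte type = 19 bytes.
HEADER = struct.Struct("!16sHB")

MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def parse_header(data: bytes):
    """Return (total_length, type_name) for the BGP message at the start of data."""
    if len(data) < HEADER.size:
        raise ValueError("need at least 19 bytes for a BGP header")
    marker, length, msg_type = HEADER.unpack_from(data)
    if marker != b"\xff" * 16:
        raise ValueError("marker must be all ones")
    if not HEADER.size <= length <= 4096:
        raise ValueError("length out of range")
    return length, MESSAGE_TYPES.get(msg_type, f"unknown({msg_type})")


# A KEEPALIVE is just a header: marker, length 19, type 4.
keepalive = b"\xff" * 16 + struct.pack("!HB", 19, 4)
```

Since BGP runs over a TCP stream, a reader would call something like this repeatedly, using the returned length to find where the next message begins.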
BGP OPEN message
  [Figure: OPEN message layout on a 32-bit ruler (0, 7, 8, 15, 16, 31) showing Version, My Autonomous System, Hold Time, BGP Identifier, Opt Parm Len and Optional Parameters (variable)]

OPEN message fields
  - Version: 4
  - My Autonomous System: sender's AS
  - Hold Time: liveness detection
  - BGP Identifier: sender's identifying IP address
  - Opt Parm Length: length of parameter field
  - Optional Parameters: TLV-encoded options
  One interesting parameter is the Capabilities Optional Parameter, which defines (among others) the Route Refresh Capability.

BGP KEEPALIVE message
  This page intentionally left blank.
  http://www.this-page-intentionally-left-blank.org/

KEEPALIVE message fields
  :)

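
The fixed part of an OPEN message can be decoded in the same style. The sketch below assumes the field widths of RFC 4271 (1-octet version, 2-octet AS, 2-octet hold time, 4-octet identifier, 1-octet parameter length) and works on the message body after the common header; `parse_open` and the sample values are hypothetical.

```python
import ipaddress
import struct

# Version, My Autonomous System, Hold Time, BGP Identifier, Opt Parm Len.
OPEN_FIXED = struct.Struct("!BHH4sB")


def parse_open(body: bytes) -> dict:
    """Decode the fixed OPEN fields; optional parameters are returned undecoded."""
    if len(body) < OPEN_FIXED.size:
        raise ValueError("OPEN body too short")
    version, my_as, hold_time, ident, opt_len = OPEN_FIXED.unpack_from(body)
    params = body[OPEN_FIXED.size:OPEN_FIXED.size + opt_len]
    if len(params) != opt_len:
        raise ValueError("optional parameters truncated")
    return {
        "version": version,
        "my_as": my_as,
        "hold_time": hold_time,
        "bgp_identifier": str(ipaddress.IPv4Address(ident)),
        "optional_parameters": params,  # raw TLV bytes, not interpreted here
    }


# Illustrative OPEN body: version 4, AS 65000, hold time 90 s, no options.
sample = struct.pack("!BHH4sB", 4, 65000, 90, bytes([192, 0, 2, 1]), 0)
```

Decoding the TLV-encoded optional parameters (for instance the Capabilities parameter) is left out on purpose to keep the sketch short.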
BGP NOTIFICATION message
  [Figure: NOTIFICATION message layout on a 32-bit ruler (0, 7, 8, 15, 16, 31): Error code, Error subcode, Data (variable)]

NOTIFICATION message fields
  - Error code: 1: Message Header Error, 2: OPEN Error, 3: UPDATE Error, 4: Hold Timer Expired, ...
  - Error subcode: depends on error code
  - Data: depends on error code and subcode

BGP UPDATE message
  [Figure: UPDATE message layout on a 32-bit ruler (0, 15, 16, 31): Unfeasible Routes Length, Withdrawn Routes (variable length), Total Path Attribute Length, Path Attributes (variable length), Network Layer Reachability Information (variable length)]

UPDATE message fields
  - Unfeasible Routes Length: length of Withdrawn Routes
  - Withdrawn Routes: list of prefixes (1)
  - Total Path Attribute Length: length of Path Attributes
  - Path Attributes: TLV-encoded attributes
  - Network Layer Reachability Information: list of NLRI prefixes (1)
  (1) A prefix is specified by its length and just enough bytes of the network IP address to cover this length

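
The footnote on prefix encoding is easy to show in Python: one length byte, followed by only as many address bytes as the prefix length needs. The sketch assumes IPv4 prefixes, and `encode_prefix` and `decode_prefix` are hypothetical helper names.

```python
import ipaddress


def encode_prefix(prefix: str) -> bytes:
    """Encode an IPv4 prefix as <length byte><minimal number of address bytes>."""
    net = ipaddress.ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8          # round the bit length up to whole bytes
    return bytes([net.prefixlen]) + net.network_address.packed[:nbytes]


def decode_prefix(data: bytes):
    """Decode one prefix from the start of data; return (prefix, bytes_consumed)."""
    prefixlen = data[0]
    nbytes = (prefixlen + 7) // 8
    padded = data[1:1 + nbytes] + bytes(4 - nbytes)   # pad back to 4 address bytes
    return f"{ipaddress.IPv4Address(padded)}/{prefixlen}", 1 + nbytes


short = encode_prefix("12.22.0.0/16")       # 1 length byte + 2 address bytes
longer = encode_prefix("135.207.44.0/25")   # 1 length byte + 4 address bytes
```

The returned byte count lets a caller walk through a Withdrawn Routes or NLRI field that holds several prefixes back to back.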
Tweaking your policies
  Tweak Tweak Tweak
  For inbound traffic
  - Filter outbound routes
  - Tweak attributes on outbound routes in the hope of influencing your neighbor's best route selection
  For outbound traffic
  - Filter inbound routes
  - Tweak attributes on inbound routes to influence best route selection
  [Figure: inbound traffic follows outbound routes; outbound traffic follows inbound routes]

Outbound Traffic Engineering
  This works by manipulating incoming routes
  - Changing local preference
  - Extending inbound AS paths
  - Manipulating the metric (MED), for instance by using inbound communities
  It is relatively simple (and based on your own policy)
  In general, an AS has more control over outbound traffic

Manipulating local preference: So Many Choices
  [Figure: Frank's Internet Barn can reach 13.13.0.0/16 via several neighboring ASes, among them AS 3 and AS 4]
  Which route should Frank pick to 13.13.0.0/16?

Manipulating local preference: LOCAL PREFERENCE
  [Figure: routes to 13.13.0.0/16 via several neighboring ASes, among them AS 3 and AS 4, marked local pref = 80, local pref = 90 and local pref = 100]
  Local preference is used ONLY in IBGP
  Higher local preference values are more preferred

Manipulating local preference: Implementing Backup Links with Local Preference (Outbound Traffic)
  [Figure: AS 65000 connected over a primary link and a backup link]
  - Set Local Pref = 100 for all routes received over the primary link
  - Set Local Pref = 50 for all routes received over the backup link
  Forces outbound traffic to take the primary link, unless the link is down.
  We'll talk about inbound traffic soon

Manipulating local preference: Multihomed Backups (Outbound Traffic)
  [Figure: primary link to one neighboring AS, backup link to AS 3]
  - Set Local Pref = 100 for all routes from the neighbor on the primary link
  - Set Local Pref = 50 for all routes from AS 3
  Forces outbound traffic to take the primary link, unless the link is down.

Inbound Traffic Engineering
  This works by manipulating outgoing routes
  - Extending outbound AS_PATHs is a traditional hack
  - Manipulating the metric (MED) is the traditional way
  - Setting outbound communities is the more modern approach, where agreements with your neighbors are specified
  Inbound is more complex than outbound
  - Inbound depends on the neighbor's policy
  Last resort method: announcing more specific routes (often a bad idea)

Manipulating AS_PATHs: Shedding Inbound Traffic with ASPATH Padding. Yes, this is a Glorious Hack
  [Figure: primary link announces ASPATH = 2; backup link announces ASPATH = 2 2 2]
  Padding will (usually) force inbound traffic to take the primary link

Manipulating AS_PATHs: But Padding Does Not Always Work
  [Figure: primary link announces ASPATH = 2; backup link, towards AS 3, announces ASPATH = 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
  AS 3 will send traffic on the backup link because it prefers customer routes, and local preference is considered before ASPATH length!
  Padding in this way is often used as a form of load balancing

Manipulating AS_PATHs: COMMUNITY Attribute to the Rescue!
  [Figure: primary link announces ASPATH = 2; backup link, towards AS 3, announces ASPATH = 2 with COMMUNITY = 3:70]
  AS 3: normal customer local pref is 100, peer local pref is 90
  Customer import policy at AS 3:
  - If 3:90 in COMMUNITY then set local preference to 90
  - If 3:80 in COMMUNITY then set local preference to 80
  - If 3:70 in COMMUNITY then set local preference to 70

Manipulating MEDs: Hot Potato Routing: Go for the Closest Egress Point
  [Figure: a router with two BGP routes to 192.44.78.0/24, via egress 1 and egress 2; IGP distances 15 and 56]
  This router has two BGP routes to 192.44.78.0/24.
  Hot potato: get traffic off of your network as soon as possible. Go for egress 1!

Manipulating MEDs: Getting Burned by the Hot Potato
  [Figure: high bandwidth provider backbone and low bandwidth customer backbone; Heavy Content Web Farm; locations SFF, NYC and San Diego; link labels 2865, 17, 15, 56; a tiny http request is answered by a huge http reply]
  Many customers want their provider to carry the bits!

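
The customer import policy at AS 3 amounts to a lookup from community to local preference. A minimal Python sketch follows; the names are hypothetical, communities are modelled as `(asn, value)` tuples, and taking the lowest preference when several known communities are present is an assumption of this sketch, not something the slide specifies.

```python
from typing import Iterable, Tuple

Community = Tuple[int, int]   # (ASN, value), written as ASN:value on the slides

DEFAULT_LOCAL_PREF = 100      # normal local preference at AS 3 on the slide

# The three rules from the slide: 3:90, 3:80 and 3:70.
COMMUNITY_TO_LOCAL_PREF = {(3, 90): 90, (3, 80): 80, (3, 70): 70}


def import_local_pref(communities: Iterable[Community]) -> int:
    """Return the local preference AS 3 would assign to an incoming route."""
    matches = [COMMUNITY_TO_LOCAL_PREF[c] for c in communities
               if c in COMMUNITY_TO_LOCAL_PREF]
    # Assumption: if several known communities are attached, honour the lowest.
    return min(matches) if matches else DEFAULT_LOCAL_PREF


backup_pref = import_local_pref([(3, 70)])   # route announced over the backup link
primary_pref = import_local_pref([])         # untagged route over the primary link
```

Tagging the backup announcement with 3:70 pushes it below the peer preference of 90, which padding alone could not achieve because local preference is compared before AS path length.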
Manipulating MEDs: Cold Potato Routing with MEDs (Multi-Exit Discriminator Attribute)
  [Figure: 192.44.78.0/24 (Heavy Content Web Farm) announced with MED = 15 at one exit and MED = 56 at the other; link labels 2865, 17, 15, 56]
  Prefer lower MED values
  This means that MEDs must be considered BEFORE IGP distance!
  Note 1: some providers will not listen to MEDs
  Note 2: MEDs need not be tied to IGP distance

COMMUNITIES
  - An optional transitive attribute
  - A community can be used to communicate preferred treatment of a route
  - Some communities have well-known semantics
    - NO_EXPORT: don't export beyond current AS (or confederation)
    - NO_ADVERTISE: don't export at all

Use of communities: How Can Routes be Colored? BGP Communities!
  - A community value is 32 bits
  - By convention, the first 16 bits are the ASN indicating who is giving it an interpretation; the remaining 16 bits are the community number
  - Used for signalling within and between ASes
  - Very powerful BECAUSE it has no (predefined) meaning
  - Community Attribute = a list of community values (so one route can belong to multiple communities)
  - RFC 1997 (August 1996)
  - Two reserved communities
    - no_export = 0xFFFFFF01: don't export out of AS
    - no_advertise = 0xFFFFFF02: don't pass to BGP neighbors

Use of communities: Communities Example
  Import
  - 1:100 Customer routes
  - 1:200 Peer routes
  - 1:300 Provider routes
  Export
  - To Customers: 1:100, 1:200, 1:300
  - To Peers: 1:100
  - To Providers: 1:100

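
The 32-bit layout of a community value (ASN in the first 16 bits, community number in the last 16) can be sketched in Python with two shifts and a mask. The helper names are hypothetical; the two reserved values are the ones quoted on the slide.

```python
NO_EXPORT = 0xFFFFFF01      # don't export out of the AS
NO_ADVERTISE = 0xFFFFFF02   # don't pass to BGP neighbors


def encode_community(asn: int, number: int) -> int:
    """Pack ASN:number into one 32-bit community value."""
    if not (0 <= asn <= 0xFFFF and 0 <= number <= 0xFFFF):
        raise ValueError("both halves must fit in 16 bits")
    return (asn << 16) | number


def decode_community(value: int):
    """Split a 32-bit community value into (asn, number)."""
    return value >> 16, value & 0xFFFF


def format_community(value: int) -> str:
    """Render a community in the ASN:number notation used on the slides."""
    asn, number = decode_community(value)
    return f"{asn}:{number}"


customer_routes = encode_community(1, 100)   # the 1:100 tag from the example
```

A route's COMMUNITIES attribute is then simply a list of such integers, so one route can carry several tags at once.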
Route Reflectors
  - Specified in RFC 4456
  - A route reflector is a kind of super IBGP peer
  - A route reflector has clients with which it peers via IBGP and for which it reflects (transitively) routes
  - A route reflector is part of a full mesh of other route reflectors and non-clients

Route reflectors illustration: Full Mesh
  [Figure: IBGP full mesh. Slide courtesy Iljitsch van Beijnum]

Route reflectors illustration: Route Reflection
  [Figure: IBGP with route reflection. Slide courtesy Iljitsch van Beijnum]

Confederations
  - Specified in RFC 5065
  - Use multiple private ASes inside your main AS
  - Talk to the outside world with your main AS, hiding the private ASes
  - Talk to the inside world as if using EBGP and IBGP for the different private ASes
  - This needs special AS_PATH segment types

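
To see why a full IBGP mesh does not scale, it helps to count sessions. The Python sketch below uses the standard n(n-1)/2 count for a full mesh, which is plain arithmetic rather than something stated on the slides. The route reflector layout is an assumption chosen for simplicity: every client peers with exactly one reflector and there are no other non-client routers.

```python
def full_mesh_sessions(routers: int) -> int:
    """Number of IBGP sessions when every router peers with every other router."""
    return routers * (routers - 1) // 2


def reflector_sessions(clients: int, reflectors: int) -> int:
    """Sessions with route reflection, assuming one reflector per client
    and a full mesh among the reflectors themselves."""
    return clients + full_mesh_sessions(reflectors)


mesh_100 = full_mesh_sessions(100)            # 4950 sessions
reflected_100 = reflector_sessions(96, 4)     # 96 client sessions + 6 in the mesh
```

Real deployments usually give each client two reflectors for redundancy, which roughly doubles the client sessions but still grows linearly instead of quadratically.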
Confederations illustration
  [Figure: Confederations. Slide courtesy Iljitsch van Beijnum]

Loose Topics
  - Using IBGP as an IGP
  - Multi-hop BGP
  - MBGP (Multiprotocol Extensions for BGP)
  - Creative use of Communities