0/6/08 Interpla between routing and forwarding Routing Algorithms Link State Distance Vector BGP routing routing algorithm local forwarding table header value output link 000 00 0 00 value in arriving packet s header 0 Graph abstraction Graph abstraction: costs v w u z x Graph: G = (N,E) N = set of routers = { u, v, w, x,, z } E = set of links ={ (u,v), (u,x), (v,x), (v,w), (x,w), (x,), (w,), (w,z), (,z) } Remark: Graph abstraction is useful in other network contexts Example: PP, where N is set of peers and E is set of TCP connections u c(x,x ) = cost of link (x,x ) v x w z - e.g., c(w,z) = cost could alwas be, or inversel related to bandwidth, or inversel related to congestion Cost of path (x, x, x,, x p ) = c(x,x ) + c(x,x ) + + c(x p-,x p ) Question: What s the least-cost path between u and z? Routing algorithm: find good paths source to destination router. 4
0/6/08 A Link-State Routing Algorithm Dijsktra s Algorithm Dijkstra s algorithm net topolog, link costs known to all nodes o accomplished via link state broadcast o all nodes have same info computes least cost paths one node ( source ) to all other nodes o gives forwarding table for that node iterative: after k iterations, know least cost path to k dest. s Notation: c(x,): link cost node x to ; = if not direct neighbors D(v): current value of cost of path source to dest. v p(v): predecessor node along path source to v N': set of nodes whose least cost path definitivel known Initialization: N' = {u} for all nodes v 4 if v adjacent to u then D(v) = c(u,v) 6 else D(v) = 7 8 Loop 9 find w not in N' such that D(w) is a minimum 0 add w to N' update D(v) for all v adjacent to w and not in N' : D(v) = min( D(v), D(w) + c(w,v) ) /* new cost to v is either old cost to v or known 4 shortest path cost to w plus cost w to v */ until all nodes in N' 6 Dijkstra s algorithm: example Step 0 4 N' u ux ux uxv uxvw uxvwz D(v),p(v),u,u,u D(w),p(w),u 4,x,, D(x),p(x),u D(),p(),x D(z),p(z) 4, 4, 4, u v x w z 7 8
0/6/08 Distance Vector Algorithm Distance vector algorithm D x () = estimate of least cost x to Distance vector: D x = [D x (): є N ] Node x knows cost to each neighbor v: c(x,v) Node x maintains D x Node x also maintains its neighbors distance vectors For each neighbor v, x maintains D v = [D v (): є N ] Basic idea: Each node periodicall sends its own distance vector estimate to neighbors When node a node x receives new DV estimate neighbor, it updates its own DV using B-F equation: D x () min v {c(x,v) + D v ()} for each node N Under some conditions, the estimate D x () converge the actual least cost d x () 9 0 DV example Distance Vector Algorithm u v x w z Clearl, d v (z) =, d x (z) =, d w (z) = d u (z) = min { c(u,v) + d v (z), c(u,x) + d x (z), c(u,w) + d w (z) } = min { +, +, + } = 4 Node that achieves minimum is next hop in shortest path forwarding table Iterative, asnchronous: each local iteration caused b: local link cost change DV update message neighbor Distributed: each node notifies neighbors when its DV changes o neighbors then notif their neighbors if necessar Each node: wait for (change in local link cost of msg neighbor) recompute estimates if DV to an dest has changed, notif neighbors
0/6/08 D x () = min{c(x,) + D (), c(x,z) + D z ()} D x (z) = min{c(x,) + = min{+0, 7+} = D (z), c(x,z) + D z (z)} node x table = min{+, 7+0} = cost to cost to cost to x 0 7 x 0 x 0 0 0 z z 7 0 z 0 node table cost to cost to cost to x x 0 7 x 0 x z 0 0 0 7 z z 7 0 z 0 node z table cost to cost to cost to x z 7 0 x z 0 7 0 0 x z 0 0 0 time Problem: Count-to-Infinit With distance vector routing, good news travels fast, but bad news travels slowl When a router goes down, it takes can take a reall long time before all the other routers become aware of it 4 Count-to-Infinit Comparison of LS and DV algorithms A B C D E 4 4 4 4 4 4 6 6 7 6 7 6 etc to infinit Initiall After exchange After exchanges After exchanges After 4 exchanges After exchanges Message complexit LS: with n nodes, E links, O(nE) msgs sent DV: exchange between neighbors onl Speed of Convergence LS: O(n ) algorithm requires O(nE) msgs o ma have oscillations DV: convergence time varies o ma have routing loops o count-to-infinit problem Robustness: what happens if router malfunctions? LS: o node can advertise incorrect link cost o each node computes onl its own table DV: o DV node can advertise incorrect path cost o each node s table used b others error propagate thru network 6 4
0/6/08 Interconnected ASes c a b AS a c d b Intra-AS Routing algorithm AS Forwarding table Inter-AS Routing algorithm c a b AS Forwarding table is configured b both intra- and inter-as routing algorithm o Intra-AS sets entries for internal dests o Inter-AS & Intra-As sets entries for external dests 7 Inter-AS tasks Obtain reachabilit information neighboring AS(s) Propagate this info to all routers within the AS All Internet gatewa routers run a protocol called BGPv4 (we will talk about this soon) c a b AS a c d b AS c a b AS 8 Internet inter-as routing: BGP BGP (Border Gatewa Protocol): the de facto standard BGP provides each AS a means to:. Obtain subnet reachabilit information neighboring ASs.. Propagate the reachabilit information to all routers internal to the AS.. Determine good routes to subnets based on reachabilit information and polic. Allows a subnet to advertise its existence to rest of the Internet: I am here 9 BGP basics Pairs of routers (BGP peers) exchange routing info over semipermanent TCP conctns: BGP sessions Note that BGP sessions do not correspond to phsical links. When AS advertises a prefix to AS, AS is promising it will forward an datagrams destined to that prefix towards the prefix. AS can aggregate prefixes in its advertisement c a b AS a AS c d b c a b AS ebgp session ibgp session 0
0/6/08 BGP protocol (Chapter 4.6.) OPEN Message BGP uses TCP as its transport protocol, on port 79. On connection start, BGP peers exchange complete copies of their routing tables, which can be quite large. However, onl changes (deltas) are then exchanged, which makes long running BGP sessions more efficient than shorter ones. Four Basic messages: Open: Establishes BGP session (uses TCP port #79) Notification: Report unusual conditions Update: Inform neighbor of new routes that become active Inform neighbor of old routes that become inactive Keepalive: Inform neighbor that connection is still viable During session establishment, two BGP speakers exchange their AS numbers BGP identifiers (usuall one of the router s IP addresses) A BGP speaker has option to refuse a session Select the value of the hold timer: maximum time to wait to hear something other end before assuming session is down. authentication information (optional) http://www.antc.uoregon.edu/route-views/ NOTIFICATION and KEEPALIVE Messages NOTIFICATION Indicates an error terminates the TCP session gives receiver an indication of wh BGP session terminated Examples: header errors, hold timer expir, bad peer AS, bad BGP identifier, malformed attribute list, missing required attribute, AS routing loop, etc. KEEPALIVE protocol requires some data to be sent periodicall. If no UPDATE to send within the specified time period, then send KEEPALIVE message to assure partner that connection still alive UPDATE Message used to either advertise and/or withdraw previousl announced prefixes path attributes: list of attributes that pertain to ALL the prefixes in the Reachabilit Info field FORMAT: Withdrawn routes length ( octets) Withdrawn routes (variable length) Total path attributes length ( octets) Path Attributes (variable length) (NLRI) Reachabilit Information (variable length) 4 6
0/6/08 BGP update message Advertising a prefix Withdrawn Routes: Length field Btes Withdrawn route list Path attributes: Length field btes Path attributes list NLRI list : a list of entries Length field ( bte), Prefix (variable length) Path attributes appl to all the prefixes in the NLRI list When a router advertises a prefix to one of its BGP neighbors: information is valid until first router explicitl advertises that the information is no longer valid BGP does not require routing information to be refreshed if node A advertises a path for a prefix to node B, then node B can be sure node A is using that path itself to reach the destination. NLRI Network Laer Reachabilit Information 6 BGP attributes PATH ATTRIBUTES BGP protocol announcements carries with it several attributes Attribute describes characteristics of a prefix BGP chooses a single path for a given prefix based on attributes (can choose to ignore!) BGP alwas announces the best path to neighbors Attributes ORIGIN AS_PATH NEXT_HOP 4 MED LOCAL_PREF 6 WEIGHT 7 COMMUNITY 8 AGGREGATOR ORIGIN(TYPE CODE=): Who originated the announcement? Where was a prefix injected into BGP? Manuall configured, directl connected, b other intra-routing protocols IGP, EGP, default incomplete (learnt some other means) AS-PATH (TYPE CODE =) a list of AS s through which the announcement for a prefix has passed each AS prepends its AS # to the AS-PATH attribute when forwarding an announcement useful to detect and prevent loops AS length can be used to select among routes unless a LOCAL PREF attribute overrides AS 8.9..0/6 AS, AS, AS AS AS AS, AS AS 7 8 7
0/6/08 Attribute: Local Preference (tpe code = ) Use of local pref Used to indicate preference among multiple paths for the same prefix anwhere in the internet. The higher the value the more it is preferred Default value is 00 Local to the AS (nontransitive) Often used to select a specific exit point for outbound traffic Override influence of AS path length 98..4./0 AS AS AS AS4 BGP table at AS4: Destination AS Path Local Pref 98..4/0 AS AS 00 98..4/0 AS AS 00.0.0.0/8 OC Local-pref=00 Local-pref=00.0.0.0/8 T 9 0 Attribute: NEXT HOP (code=) Use of next hop When AS boundaries are Crossed, the next hop field is Changed and replaced with the Address of the border router ebgp address of external neighbor ibgp nexthop ebgp E? 8.64.. 8.64.9. A 8.64.. ibgp ibgp D 98.6.4.0/ C ibgp 8.64.. 8.64.. B Destination 98.6.4.0/ 8.64.8. 8.64.8. ebgp Destination 98.6.4.0/ BGP Table at Router A: 98.6.4.0/ Nexthop 8.64.8. BGP Table at Router C: Nexthop Next hop=8.64.8. for 98.6.8/4 AS 8.64.8. 8.64.8. 8.64.8. 8.64.8.4 Destination Next hop? 8.64.8. 98.6.8/4 e-bgp AS 98.6.8/4 Next hop=8.64.8. 8
0/6/08 Attribute: Multi-Exit Discriminator (MED) (code=4) when AS s interconnected via or more links AS path length are same AS announcing prefix, sets MED value enables AS to indicate its preference (lower MED is better) AS receiving prefix uses MED to select link a wa to specif how close a prefix is to the link it is announced on AS Link B Link A MED=00 MED=00 AS AS AS4 Med values the same AS are compared A lower MED value is preferred MED values exchanged between ASs- non-transitive Ref: BGP Tutorial Sam Halabi- Cisco 4 Use of MED Path attributes & BGP routes When advertising a prefix, advert includes BGP attributes. prefix + attributes = route When gatewa router receives route advert, uses import polic to accept/decline. Export polic is defined b a series of rules.0.0.0/8 MED = 0 7.6.0.0/ MED = 00.0.0.0/8 MED = 00 7.6.0.0/ MED = 0 L L Router C 6 9
0/6/08 BGP Decision Process. Choose route with highest LOCAL-PREF. If have more than route, select route with shortest AS-PATH. If have more than route, select according to lowest ORIGIN tpe where IGP (i) < E-BGP (e) < default (?) 4. If have more than route, select route with lowest MED value. Select e-bgp learned over i-bgp learned path 6. Select min cost path to NEXT HOP using IGP metrics (lowest IGP cost to BGP egress) 7. If have multiple internal paths, use lowest BGP Router ID to break the tie. 7 0