Link State Routing
Jean-Yves Le Boudec, École Polytechnique Fédérale de Lausanne, 2015

Contents
1. Link state
2. OSPF and hierarchical routing with areas
3. Dynamic metrics and Braess paradox
1. Link State Routing
Principle of link state routing:
  each router keeps a topology database of the whole network
  link state updates are flooded, or multicast, to all routers in the network
  routers compute their routing tables from the topology database
  the computation often uses Dijkstra's shortest path algorithm
Used in:
  OSPF (Open Shortest Path First) and IS-IS (similar to OSPF), for IP routing at layer 3
  TRILL (Transparent Interconnection of Lots of Links) and SPB (Shortest Path Bridging), at layer 2
(a) Topology Database Synchronization
Neighbouring nodes synchronize before starting any relationship:
  Hello protocol; keep-alive
  initial synchronization of the database
  description of all links (no information yet)
Once synchronized, a node accepts link state advertisements (LSAs):
  each LSA contains a sequence number, stored with the record in the database
  only messages with a new sequence number are accepted
  accepted messages are flooded to all neighbours
  the sequence number prevents anomalies (loops or blackholes)
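The acceptance rule above can be sketched in a few lines of Python. This is a minimal illustration, not the OSPF procedure: the `Router` class, the `connect` helper and the link costs are hypothetical, and real protocols also deal with sequence number wrap-around, LSA ageing and acknowledgements.

```python
class Router:
    """Toy router holding a link state database keyed by originating router."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []   # adjacent Router objects
        self.database = {}     # origin -> (sequence number, advertised links)

    def receive(self, origin, seq, links, sender=None):
        """Accept an LSA only if its sequence number is new, then flood it."""
        stored = self.database.get(origin)
        if stored is not None and stored[0] >= seq:
            return False       # old or duplicate LSA: ignore, do not re-flood
        self.database[origin] = (seq, links)
        for neighbour in self.neighbours:
            if neighbour is not sender:
                neighbour.receive(origin, seq, links, sender=self)
        return True


def connect(x, y):
    """Create a bidirectional adjacency (hypothetical helper)."""
    x.neighbours.append(y)
    y.neighbours.append(x)


a, b, c = Router("A"), Router("B"), Router("C")
connect(a, b)
connect(b, c)
connect(c, a)                          # a triangle, so naive flooding would loop

a.receive("A", 1, {"n1": 10})          # A originates its first LSA (assumed cost)
print(c.database["A"])                 # (1, {'n1': 10})
print(b.receive("A", 1, {"n1": 10}))   # False: sequence number already seen
```

Because a duplicate is dropped rather than re-flooded, the flood terminates even though the three routers form a cycle.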
Example network
Each router knows its directly connected networks.
[Figure: example network with routers A to F and networks n1 to n7]

Initial routing tables
  router  net  type
  A       n1   Ether
  A       n2   P-to-P
  B       n2   P-to-P
  B       n3   Ether
  B       n4   P-to-P
  C       n1   Ether
  C       n4   P-to-P
  C       n5   P-to-P
  D       n5   P-to-P
  D       n6   Ether
  E       n6   Ether
  E       n7   Ether
  F       n1   Ether
  F       n7   Ether
After Flooding
The local metric information is flooded to all routers.
After convergence, all routers have the same information:
  rtr  net  cost
  A    n1
  A    n2   0
  B    n3
  B    n2   0
  B    n4   0
  C    n1
  C    n4   0
  C    n5   0
  D    n6
  D    n5   0
  E    n6
  E    n7
  F    n1
  F    n7
[Figure: the example network, routers A to F and networks n1 to n7]
(b) From Topology Database to Graph
Arrows go from routers to nets, with a given metric (except for point-to-point, stub, and external networks).
Arrows from nets to routers have metric = 0.
[Figure: graph of the example network, with a stub network, a point-to-point link, a broadcast network and two external networks marked]
(c) Path Computation
Performed locally, based on the topology database.
Computes one or several best paths to every destination from this node.
  Best path = shortest path for OSPF
  OSPF uses Dijkstra's shortest path algorithm, the best known algorithm for centralized operation
Paths are computed independently at every node.
  synchronization of the databases guarantees the absence of persistent loops
  every node computes a shortest path tree rooted at itself

Simplified graph
Only arrows with metrics, between routers, are kept.
Every node executes the shortest path computation on the graph: same graph, but different sources.
[Figure: simplified graph of routers A to F]
Dijkstra's Shortest Path Algorithm
The nodes are 0...N and the algorithm computes best paths from node 0.
c(i,j) is the cost of (i,j); pred(i) is the predecessor of node i on the tree M being built; m(j) is the distance from node 0 to node j.

m(0) = 0; M = {0};
for k = 1 to N {
  find (i0, j0) that minimizes m(i) + c(i,j), with i in M, j not in M
  m(j0) = m(i0) + c(i0, j0)
  pred(j0) = i0
  M = M ∪ {j0}
}
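A runnable version of the same computation, as a sketch. It keeps the slide's names `m`, `pred` and `M` but selects the next node with a priority queue instead of scanning all (i, j) pairs, which gives the same tree. The link costs in `cost` are assumptions chosen for illustration; they are not taken from the slides.

```python
import heapq


def dijkstra(cost, source):
    """Return (m, pred): distances from source and predecessors on the tree."""
    m = {source: 0}        # best known distance to each node
    pred = {}              # predecessor of each node on the shortest path tree
    M = set()              # nodes already placed on the tree
    heap = [(0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if i in M:
            continue       # stale queue entry
        M.add(i)
        for j, c in cost[i].items():
            if j not in M and d + c < m.get(j, float("inf")):
                m[j] = d + c
                pred[j] = i
                heapq.heappush(heap, (m[j], j))
    return m, pred


# Assumed symmetric costs between routers A..F (hypothetical values).
links = {("A", "C"): 10, ("A", "F"): 10, ("C", "F"): 10, ("F", "E"): 10,
         ("E", "D"): 10, ("A", "B"): 100, ("C", "B"): 100, ("C", "D"): 100}
cost = {}
for (u, v), c in links.items():
    cost.setdefault(u, {})[v] = c
    cost.setdefault(v, {})[u] = c

m, pred = dijkstra(cost, "A")
print(m["E"], m["D"])   # 20 30
print(pred["D"])        # E
```

With these assumed costs the nodes join the tree in the order C, F, E, D, B.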
Example: Dijkstra at A
init: M = {A}, m(A) = 0
step 1: i0 = A, j0 = C; m(C) = m(A) + c(A,C); M = {A, C}
[Figure: simplified graph with routers A to F]

Next, which node is added to M?
1. F
2. E
3. D
4. B
5. I don't know
Example: Dijkstra at A (continued)
step 2: i0 = A, j0 = F; m(F) = m(A) + c(A,F); M = {A, C, F}
step 3: i0 = F, j0 = E; m(E) = 20; M = {A, C, F, E}
step 4: i0 = E, j0 = D; m(D) = 30; M = {A, C, F, E, D}
step 5: i0 = A, j0 = B; m(B) = m(A) + c(A,B); M = {A, C, F, E, D, B}
[Figure: the shortest path tree rooted at A grows by one node at each step]
Routing table at A
  net  next
  n1   direct
  n2   direct
  n3   B
  n4   C
  n5   C
  n6   F
  n7   F
[Figure: the example network, routers A to F and networks n1 to n7]
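The next hop towards a router is the first node after the source on the tree path, so it can be read from the predecessor relation. A small sketch: the `pred` entries are the (i0, j0) pairs of the example at A, and `next_hop` is a hypothetical helper name.

```python
def next_hop(pred, source, dest):
    """Walk back from dest along pred until the node just after source."""
    node = dest
    while pred[node] != source:
        node = pred[node]
    return node


# Predecessors on the shortest path tree rooted at A (from the example steps).
pred = {"C": "A", "F": "A", "E": "F", "D": "E", "B": "A"}

for dest in ["B", "C", "D", "E", "F"]:
    print(dest, "via", next_hop(pred, "A", dest))
```

Routers D and E are both reached via F, which matches the entries for n6 and n7 in the routing table at A.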
Dijkstra's Algorithm at C
[Figure: shortest path tree computed at C, with m(C) = 0, shown next to the tree computed at A]

Routing table at C
  net  next
  n1   direct
  n2   A
  n3   B
  n4   direct
  n5   direct
  n6   F
  n7   F
[Figure: the example network, routers A to F and networks n1 to n7]
Changes to Topology
Changes to the topology (e.g. link failures) cause routers to send new Link State Advertisements (LSAs).
All routers update their topology database and propagate the change to all their neighbours.
  the LSA sequence number is used to avoid loops in the propagation
  a router that has already received an LSA does not propagate it further
Changes to the topology database trigger a re-computation of shortest paths.
Link State Routing can be used for non-standard operations
Example: assume you want to bridge VLANs across a campus.
One solution: tunnel MAC packets in IP.
Problem: automatic creation of the tunnels.
[Figure: campus network with routers R1 to R7; VLAN1 and VLAN2 are each attached at several routers]

Can you imagine a solution using Link State Routing in R1, R2, ...?
1. Routers R1, R2, ... discover which VLAN is active on any of their ports and put this information in the topology database
2. Routers R1, R2, ... overhear all MAC source addresses and put the information in the topology database
3. Both of these solutions seem bad to me
4. I don't know

Solution
2 does not help, since MAC addresses do not say in which VLAN the machine is.
1 is a feasible solution: routers can create VLAN tunnels (MAC in IP), e.g. using IP multicast.
This is what Cisco's TRILL does (with IS-IS instead of OSPF), using MAC-in-IP tunnels.
IEEE's SPB is similar (with MAC-in-MAC encapsulation).
LS: Summary
All nodes maintain their own topology database
  it represents the whole network
  it is strongly synchronized
All nodes compute their best path tree to all destinations.
Routing tables are built from the tree
  they are used for next hop routing only
LS versus DV
  LS avoids the convergence problems of DV
  LS supports flexible cost definitions; it can be used for routing specific flows
  LS is much more code than DV but gives more flexibility
2. The OSPF Protocol and Hierarchical Routing
OSPF (Open Shortest Path First)
  IETF standard for internal routing
  used in large networks (ISPs), in MPLS and in TRILL (Cisco VLAN interconnection)
OSPF = link state protocol + hierarchical routing

OSPF and hierarchical routing
Why divide large networks?
  the cost of computing routing tables: an update is needed whenever the topology changes
  the size of the database and of the update messages grows with the network size
Use hierarchical routing to limit the scope of updates and the computational overhead:
  divide the network into several areas
  compute routes independently in each area
  inject aggregated information on routes into other areas
We explain hierarchical routing the OSPF way; IS-IS does things a bit differently.
Hierarchical Routing
An OSPF domain is configured in areas:
  one backbone area (area 0)
  plus zero or several non-backbone areas (areas numbered other than 0)
All inter-area traffic goes through area 0: strict hierarchy.
Inside one area: link state routing as seen earlier
  one topology database per area
[Figure: area 0 in the centre, with area 1 (routers A1, A2) and area 2 (routers B1, B2) attached through area border routers]

Principles
1. Routing method used in the higher level: distance vector
  no problem with loops: there is only one backbone area
2. Mapping of higher level nodes to lower level nodes
  area border routers (inter-area routers) belong to both areas
3. Inter-level routing information
  summary link state advertisements (LSAs) from other areas are injected into the local topology databases
Assume networks n1 and n2 become visible at time 0. Which is the topology database at A1?
[Figure: three candidate databases D1, D2 and D3, each with two summary records; the records shown are
  (n1, d=11; n2, d=17) and (n1, d=17; n2, d=11)
  (n1, d=29; n2, d=23) and (n1, d=23; n2, d=17)
  (n1, d=29; n2, d=45) and (n1, d=23; n2, d=17)]
1. D1
2. D2
3. D3
4. I don't know

Solution
[Figure: area 0 with routers X1 to X6, area 1 with A1 and A2, area 2 with B1 (attached to n1) and B2 (attached to n2); links are labelled with cost 6]
  area 2 topology database: n1 and n2, directly attached
  area 0 topology database: summary records (n1, d=11; n2, d=17) and (n1, d=17; n2, d=11)
  area 1 topology database: summary records (n1, d=29; n2, d=23) and (n1, d=23; n2, d=17)
Solution (explanation)
All routers in area 2 propagate the existence of n1 and n2, directly attached to B1 (resp. B2). See the topology database in area 2.
Area border routers X4 and X6 belong to area 2, thus they can compute their distances to n1 and n2.
Area border routers X4 and X6 inject their distances to n1 and n2 into the area 0 topology database (item 3 of the principles). The corresponding summary link state record is propagated to all routers of area 0. See now the topology database in area 0.
All routers in area 0 can now compute their distances to n1 and n2, using their distances to X4 and X6 and the principle of distance vector (item 1 of the principles). Do the computation for X3 and X5.
Area border routers X3 and X5 inject their distances to n1 and n2 into the area 1 topology database (item 3 of the principles). See now the topology database in area 1.
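The distance vector step performed by a backbone router can be sketched as follows. The advertised distances (11, 17) and (17, 11) are the area 0 summary records of the solution; which of X4 and X6 advertises which record, and the backbone distances 12, 6, 18 and 12, are assumptions chosen so that the results match the area 1 records. The function name `summary_distances` is hypothetical.

```python
def summary_distances(dist_to_border, advertised):
    """Distance to each network = min over border routers of
    (distance to the border router + distance it advertises)."""
    result = {}
    for border, d_border in dist_to_border.items():
        for net, d_net in advertised[border].items():
            candidate = d_border + d_net
            if candidate < result.get(net, float("inf")):
                result[net] = candidate
    return result


# Summary records injected into area 0 (assignment to X4 / X6 is assumed).
advertised = {"X4": {"n1": 11, "n2": 17},
              "X6": {"n1": 17, "n2": 11}}

# Assumed backbone distances from two area 1 border routers to X4 and X6.
print(summary_distances({"X4": 12, "X6": 6}, advertised))    # {'n1': 23, 'n2': 17}
print(summary_distances({"X4": 18, "X6": 12}, advertised))   # {'n1': 29, 'n2': 23}
```

Each result is what the border router would then inject into area 1 as its own summary record.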
Comments
Distance vector computation causes none of the RIP problems
  strict hierarchy: no loop between areas
External and summary LSAs for all reachable networks are present in all topology databases of all areas
  most LSAs are external
  this can be avoided by configuring some areas as terminal: use a default entry to the backbone
Area partitions require specific support
  a partition of a non-backbone area is handled by having the area 0 topology database keep a map of all area connected components
  a partition of the backbone cannot be repaired and must be avoided; it can be handled by backup virtual area 0 links through a non-backbone area

Example of issue: partitioned backbone
[Figure: same network as before, with the backbone split so that X4 and X6 are no longer connected inside area 0]
No connectivity between areas via the backbone.
There is a route through area 2.
Virtual link
  X4 and X6 configure a virtual link through area 2
  the virtual link is entered into the database, with metric = sum of links
3. Dynamic Metrics
Does a routing protocol maximize network utility?
1. Yes, because it minimizes the cost to destination
2. Yes if TCP is used, because it ensures fairness
3. No
4. I don't know

Solution
We would need to define the utility, for example as a function of the rates of the flows.
Assume several flows, with all link capacities equal to 1.
Shortest path routing gives the same limited rate to all sources.
Deflection routing could divert some of the flows and give a higher rate to all sources.
Answer 3
Dynamic Metrics
Some proposed to use dynamic metrics to improve over shortest path routing
  high load on a link => high cost => link is less used
This is used by EIGRP.
But there may be some issues: Braess paradox.

Least Delay Routing and Wardrop Equilibrium
Assume all flows pick the route with the shortest delay.
Assume parallel paths exist and flows can make use of them.
Eventually, there will be an equilibrium (called Wardrop equilibrium) such that the delay is equal on all competing routes.
[Figure: two parallel routes, one over links 1 and 3 and one over links 2 and 4; each link is labelled with its delay (ms) as a function of the traffic on this link (Gb/s)]
Which is a Wardrop Equilibrium for this Network?
(traffic in Gb/s on the two routes)
1. 1 and 5
2. 5 and 1
3. 3 and 3
4. None of the above
[Figure: the same network with links 1 to 4 and their delay functions]

Solution
1 and 5: delay on route 13 = 61 ms, delay on route 24 = 105 ms. Not a Wardrop equilibrium.
Same for 5 and 1.
3 and 3: the delays on both routes are equal, delay on route 13 = delay on route 24 = 83 ms. It is a Wardrop equilibrium.
Now introduce link 5
Link 5 has a delay function with short delay and high capacity.
There are now 3 paths: 13, 154 and 24.
Assume we start from the previous equilibrium.
[Figure: the same network with link 5 added between the two routes, each link labelled with its delay function]

Is this a Wardrop Equilibrium?
1. Yes
2. No
3. I don't know

Solution
Assume we start from the previous equilibrium; the delays are:
  route 13: 83 ms
  route 24: 83 ms
  route 154: 32+6+32 = 70 ms
Not a Wardrop equilibrium: some traffic will move to route 154.
What is the Wardrop Equilibrium now?
It is found by solving the delay equations (equal delay on the three routes) together with the total flow constraint.
The solution gives the traffic on each route in Gb/s; the delay is now 92 ms on all routes.

Braess Paradox
With shortest delay routing:
  disable link 5: delay = 83 ms
  enable link 5: delay = 92 ms
Adding capacity made things worse.
This is called Braess paradox.
Shortest delay routing is not optimal.
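The equilibrium checks can be reproduced with a short sketch. The delay functions used here (10x on links 1 and 4, 50 + x on links 2 and 3, 10 + x on link 5, with x in Gb/s) and the total traffic of 6 Gb/s are assumptions, chosen because they reproduce the route totals of 61, 83, 70 and 92 ms quoted in the solutions. The names `route_delays` and `is_wardrop` are hypothetical.

```python
def route_delays(x13, x24, x154=0):
    """Delay in ms on each route, given the traffic in Gb/s on each route."""
    link1 = 10 * (x13 + x154)   # links 1 and 4: assumed delay 10x
    link4 = 10 * (x24 + x154)
    link3 = 50 + x13            # links 2 and 3: assumed delay 50 + x
    link2 = 50 + x24
    link5 = 10 + x154           # link 5: assumed delay 10 + x
    return {"13": link1 + link3,
            "24": link2 + link4,
            "154": link1 + link5 + link4}


def is_wardrop(flows, delays, tol=1e-9):
    """True if every route that carries traffic has the minimum delay."""
    best = min(delays[r] for r in flows)
    return all(delays[r] <= best + tol for r in flows if flows[r] > 0)


# Without link 5 only routes 13 and 24 exist.
d = route_delays(3, 3)
print(is_wardrop({"13": 3, "24": 3}, d))               # True (83 ms on both)

# With link 5, the old split is no longer an equilibrium.
print(d["154"])                                        # 70
print(is_wardrop({"13": 3, "24": 3, "154": 0}, d))     # False

# Under these assumptions, 2 Gb/s on each route equalizes the delays.
print(route_delays(2, 2, 2))   # {'13': 92, '24': 92, '154': 92}
```

The check treats an unused route with a lower delay as a violation, which is why the 3 and 3 split fails once route 154 is available.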
Optimal Routing
One can change the objective of routing: instead of computing shortest paths, one could solve a global optimization problem
  maximizing a utility function, e.g. minimize the total delay subject to flow constraints
  this is a well posed optimization problem
  the optimal solution depends on all flows, but it can be implemented with a distributed algorithm similar to TCP congestion control; see [BertsekasGallager92]
This can also be solved using an offline optimization procedure that computes optimal paths for all traffic flows and downloads the routes into all routers.
  can be done with SDN
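As an illustration of the offline approach, the following sketch minimizes the total delay on the five-link Braess network by brute-force search over a grid of flow splits. The delay functions (10x, 50 + x, 10 + x) and the total traffic of 6 Gb/s are assumptions, not values given in the text, and a real optimizer would use a convex solver rather than a grid.

```python
def total_delay(x13, x24, x154):
    """Sum over links of (traffic on the link) x (delay of the link)."""
    flow = {1: x13 + x154, 2: x24, 3: x13, 4: x24 + x154, 5: x154}
    delay = {1: 10 * flow[1], 2: 50 + flow[2], 3: 50 + flow[3],
             4: 10 * flow[4], 5: 10 + flow[5]}   # assumed delay functions
    return sum(flow[k] * delay[k] for k in flow)


TOTAL_TENTHS = 60   # assumed total traffic: 6 Gb/s, searched in steps of 0.1

best = min(
    (total_delay(a / 10, b / 10, (TOTAL_TENTHS - a - b) / 10),
     a / 10, b / 10, (TOTAL_TENTHS - a - b) / 10)
    for a in range(TOTAL_TENTHS + 1)
    for b in range(TOTAL_TENTHS + 1 - a)
)
print(best)                     # (total delay, x13, x24, x154)
print(total_delay(2, 2, 2))     # 552, the equal split over the three routes
```

Under these assumptions the search leaves link 5 unused, so the globally optimal routing differs from the one that shortest delay routing converges to.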
Conclusion
Link State Routing is an alternative to distance vector
  it is more complex
  it allows more control over the chosen paths
Shortest path routing may not be globally optimal and may need to be complemented with offline optimization methods.