ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE. Jean-Yves Le Boudec. Fall

Bridging

Algorhyme

I think that I shall never see
a graph more lovely than a tree.
A tree whose crucial property
is loop-free connectivity.
A tree that must be sure to span
so packets can reach every LAN.
First, the root must be selected.
By ID, it is elected.
Least-cost paths from root are traced.
In the tree, these paths are placed.
A mesh is made by folks like me,
then bridges find a spanning tree.

Radia Perlman (inventor of STP)

Contents
1. Specification of the Spanning Tree Protocol (STP)
2. A best path problem
3. The Spanning Tree Algorithm
4. STP in practice

Transparent Bridging

- Bridges are intermediate systems that forward MAC frames to destinations based on MAC addresses.
- Transparent bridging (the main method used today) means that, from a protocol viewpoint, end-systems see no difference whether there is a bridge or not; there is, however, a performance difference.
- Transparent bridges learn by sniffing; this requires that there is no loop.

[Figure: a bridge with three ports connecting stations A, B, C and D, with a repeater on one segment. Forwarding table (Dest MAC addr, Port Nb) with entries for A, B, C (port 3) and D (port 3).]

A mesh is made by folks like me, then bridges find a spanning tree.

What does the Spanning Tree Protocol do?

- Prevent loops in the active topology.
- Decide which ports should be blocked or opened; ports that are allowed to forward frames are said to be in the forwarding state, or are called forwarding ports.
- Adapt to changes in the physical topology: it is plug and play.
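The "learn by sniffing" behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from any bridge implementation: the class name `LearningBridge`, the method `receive` and the station names are hypothetical, and frames for unknown destinations are assumed to be flooded on all other ports.

```python
class LearningBridge:
    """Toy transparent bridge: learns source addresses, forwards by destination."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}  # MAC address -> port on which it was last seen

    def receive(self, src, dst, in_port):
        """Handle one frame; return the sorted list of ports it is sent out on."""
        self.table[src] = in_port          # learn by sniffing the source address
        out_port = self.table.get(dst)
        if out_port is None:               # unknown destination: flood
            return sorted(self.ports - {in_port})
        if out_port == in_port:            # destination sits on the arrival segment
            return []
        return [out_port]


bridge = LearningBridge(ports=[1, 2, 3])
print(bridge.receive("A", "C", in_port=1))  # C not yet learned: flooded
print(bridge.receive("C", "A", in_port=3))  # A was learned on port 1
print(bridge.receive("D", "C", in_port=3))  # C is on the arrival segment: filtered
```

With a loop in the topology, a flooded frame would come back on another port and overwrite the learned entry, which is why learning requires a loop-free active topology.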

1. Specification of the Spanning Tree Protocol

We now specify what STP does (in more detail than before), not how it does it.

- There are many ways to build a tree on a graph, e.g. a Minimum Spanning Tree (Kruskal's or Prim's algorithms).
- STP chose to use the set of shortest paths towards some selected vertex.
- Each bridge has a bridge label, based on MAC address + configurable offset.
- The bridge with the smallest label is selected and called the root.
- Each LAN between bridges has a cost; by default, a decreasing function of bit rate:

Port Type | Duplex | Cost
100BASE-TX / 100BASE-FX (VLT) | Full | 5
100BASE-TX / 100BASE-FX (VLT) | Half |
10BASE-T | Full |
10BASE-T | Half | 700

What: STP computes a tree of shortest paths to the bridge with the smallest label.

Specification of STP (cont'd)

STP gives a role to the ports of bridges:

- Root ports: one per bridge; the port towards the root along the shortest path. In case of equal costs, the lowest port id is chosen.
- Designated ports: on every LAN (= collision domain), choose one designated bridge; all ports on the LAN for which the bridge is designated are designated ports.
- Designated bridge: one per LAN; defined as the bridge that has the shortest path to the root (possibly the root itself).
- Ports other than root or designated are blocking.
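The role assignment specified above can be illustrated with a small centralized computation. This is only a sketch under assumptions: the function name `stp_roles`, the three-bridge topology and its costs are hypothetical, each bridge is assumed to have exactly one port per attached LAN, and ties are broken by bridge label and LAN name (the specification above breaks root-port ties by port id).

```python
import math


def stp_roles(lans, lan_cost):
    """Return (root, roles); roles maps (bridge, lan) to a port role.

    lans: dict LAN name -> set of bridge labels attached to that LAN
    lan_cost: dict LAN name -> positive cost
    """
    bridges = sorted({b for members in lans.values() for b in members})
    root = bridges[0]                       # smallest label becomes root
    dist = {b: math.inf for b in bridges}   # cost of the shortest path to root
    dist[root] = 0
    for _ in range(len(bridges) - 1):       # Bellman-Ford relaxation passes
        for lan, members in lans.items():
            for b in members:
                for other in members - {b}:
                    dist[b] = min(dist[b], dist[other] + lan_cost[lan])

    roles = {}
    for lan, members in lans.items():
        # Designated bridge: shortest path to root, ties broken by label.
        designated = min(members, key=lambda b: (dist[b], b))
        for b in members:
            roles[(b, lan)] = "designated" if b == designated else "blocking"

    for b in bridges:
        if b == root:
            continue
        attached = [lan for lan, members in lans.items() if b in members]

        def cost_via(lan):
            # Cost to root when leaving through this LAN (tie-break: LAN name).
            nearest = min((dist[o] for o in lans[lan] - {b}), default=math.inf)
            return (lan_cost[lan] + nearest, lan)

        roles[(b, min(attached, key=cost_via))] = "root"
    return root, roles


# Hypothetical triangle of three bridges joined by three LANs of cost 1.
lans = {"a": {1, 2}, "b": {1, 3}, "c": {2, 3}}
root, roles = stp_roles(lans, {"a": 1, "b": 1, "c": 1})
for port, role in sorted(roles.items()):
    print(port, role)
```

In this example the only blocking port is the port of bridge 3 on LAN c, which is what breaks the loop.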

Quiz: What is the role of each port of B90 (b = blocking, r = root, d = designated)?

1. 1b, 2r, 3r
2. 1b, 2d, 3d
3. 1b, 2r, 3d
4. 1b, 2d, 3r
5. 1d, 2r, 3b
6. 1d, 2b, 3d
7. 1d, 2b, 3b
8. 1r, 2b, 3d
9. None of the above
10. I don't know

Solution

[Figure: network of bridges B4, B8, B84, B9, B90 and B99 with a cost on each link. Legend: root port, designated port, blocking port (marked X).]

Forwarding tables:
B4: X, YZ, 3T
B84: XYZT
B9: XZT, Y
B8: XYZT
B90: XZT, 3Y
B99: XZT, Y

What does STP produce on this network of bridges?

- The loop is broken.
- Paths are not optimal: all frames go through the spanning tree.
- Less efficient than routing.

[Figure: bridged network between end-systems A and B.]

2. A Best Path Problem

- STP uses a variant of the Bellman-Ford algorithm (see dv.ppt), which we call the Bellman-Ford algorithm for Bridges.
- To fully appreciate the algorithm, let us first transform the original problem into a shortest path problem.

A Special Path Algebra

- Consider a directed graph with edge (= link) costs $c(i,j) > 0$, and $c(i,j) = \infty$ when $i$ and $j$ are not connected.
- Every vertex (= node), say $i$, also has a label $l(i)$.
- Definition: a link attribute is $A(i,j) := [l(j), c(i,j)]$ = [label of end, cost].
- Comparison of attributes is lexicographic: $[l, c] \le [l', c']$ iff ($l < l'$) or ($l = l'$ and $c \le c'$). This is a total order on $\mathbb{N} \times [0, \infty]$.
- Concatenation of attributes: $[l, c] \oplus [l', c'] = [\min(l, l'), c + c']$.
- The attribute of a path $i_1 i_2 \dots i_k$ is the concatenation of the attributes of its links, i.e. [minimum label, sum of costs].
- A path $p$ is better than $p'$ if attribute of $p$ $\le$ attribute of $p'$.

Quiz: Which path is best?

1. 50 40
2. 0 0 40
3. 0 0
4. 50 40 0

[Figure: example graph; each node carries a label and each link a cost.]

Solutions

Attributes of the following paths:
- 50 40: [, ]
- 0 0 40: [0, 9]
- 0 0: [0, 7]
- 50 40 0: [0, 4]

The best path is the last.

The STP problem seen as a best path computation problem

Assume the graph is fully connected; all vertex labels are different; all link costs are > 0.

A path $p$ that starts at $i$ is best (among all paths that start at $i$) iff:
- it goes through the vertex $i_0$ that has the smallest label in the graph (the minimum label is reached at only one vertex, by hypothesis);
- it stops at $i_0$;
- it is a shortest path from $i$ to $i_0$.

Thus: the best paths in this graph are the shortest paths to the node with the smallest label.
Thus: STP problem = finding best paths from self to anywhere in this special path algebra.
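The path algebra can be tried out in a few lines. The sketch below is illustrative only: the link costs are hypothetical (they are not the values of the example graph), nodes are identified by their labels, and the helper names `concat` and `path_attribute` are made up for the example. Python tuples already compare lexicographically, so `min` directly picks the best attribute.

```python
from functools import reduce


def concat(a, b):
    """Concatenation of attributes: [min of labels, sum of costs]."""
    return (min(a[0], b[0]), a[1] + b[1])


def path_attribute(path, cost):
    """Attribute of a path given as a list of at least two node labels.

    Each link (i, j) contributes the attribute (label of j, cost of the link).
    """
    links = [(j, cost[(i, j)]) for i, j in zip(path, path[1:])]
    return reduce(concat, links)


# Hypothetical directed link costs, keyed by (from, to).
cost = {(50, 40): 1, (40, 10): 3, (20, 10): 2, (10, 40): 3}

paths = [[50, 40], [20, 10, 40], [20, 10], [50, 40, 10]]
attributes = {tuple(p): path_attribute(p, cost) for p in paths}
best = min(attributes, key=attributes.get)   # lexicographic comparison of tuples
print(attributes)
print("best path:", best)
```

With these assumed costs the path 50 40 has the largest minimum label and is therefore the worst, regardless of its small cost; among the paths that reach label 10, the cheapest one wins.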

3. The STP Algorithm

- The STP algorithm is fully distributed.
- It is simpler to understand the centralized version first.
- It is a Bellman-Ford algorithm!

A Centralized Bellman-Ford Algorithm

What: Given a directed graph with link attributes as above, compute one tree of best paths from any node to any node. Let $A(i,j)$ := attribute of link $(i,j)$ = $[l(j), c(i,j)]$.

How: Define $p^k(i)$ as the attribute of the best path from $i$ to anywhere in at most $k$ hops, and compute

$$p^0(i) = [l(i), 0], \qquad p^k(i) = \min\Big\{ [l(i), 0],\ \min_{j \neq i} \big( A(i,j) \oplus p^{k-1}(j) \big) \Big\} \qquad (1)$$

where $\oplus$ is the concatenation of attributes and min is the lexicographic minimum.

Theorem. If the graph is fully connected, the algorithm stops at the latest at $k$ = number of vertices; at the end, $p^k(i)$ is the attribute of a best path. A best path from $i$ is obtained by letting pred($i$) = the index ($j$ or $i$) that achieves the minimum in (1). If the min is achieved by the term $[l(i), 0]$ then pred($i$) = $i$; this happens only when vertex $i$ has the smallest label.
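A minimal sketch of the centralized algorithm, assuming the recursion has the form $p^0(i) = [l(i), 0]$ and $p^k(i) = \min\{[l(i), 0], \min_j A(i,j) \oplus p^{k-1}(j)\}$, which is inferred from the terms named in the theorem rather than quoted. The graph, its costs and the function names are hypothetical; nodes are identified by their labels.

```python
def concat(a, b):
    """Concatenation of attributes: [min of labels, sum of costs]."""
    return (min(a[0], b[0]), a[1] + b[1])


def centralized_bf(nodes, cost):
    """nodes: node labels; cost: dict (i, j) -> positive cost of directed link."""
    nodes = list(nodes)
    p = {i: (i, 0) for i in nodes}          # p^0(i) = [l(i), 0]
    pred = {i: i for i in nodes}
    for _ in range(len(nodes)):             # at most "number of vertices" steps
        new_p, new_pred = {}, {}
        for i in nodes:
            best, best_pred = (i, 0), i     # the term [l(i), 0]
            for j in nodes:
                if (i, j) in cost:
                    candidate = concat((j, cost[(i, j)]), p[j])
                    if candidate < best:    # lexicographic comparison
                        best, best_pred = candidate, j
            new_p[i], new_pred[i] = best, best_pred
        stable = new_p == p
        p, pred = new_p, new_pred
        if stable:
            break
    return p, pred


def symmetric(edges):
    """Turn undirected edges into a directed cost table."""
    return {**edges, **{(j, i): c for (i, j), c in edges.items()}}


# Hypothetical example graph (not the one in the figure).
cost = symmetric({(10, 20): 2, (10, 40): 3, (40, 50): 1, (20, 60): 5, (50, 60): 1})
p, pred = centralized_bf([10, 20, 40, 50, 60], cost)
print(p)
print(pred)
```

Following `pred` from any node leads to node 10, the node with the smallest label, so the `pred` pointers form the spanning tree.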

The algorithm is the same as the classical Bellman-Ford algorithm with the following modifications:

- Exotic algebra instead of usual algebra: costs are replaced by attributes; addition of costs is replaced by concatenation ($\oplus$) and comparison by the lexicographic order.
- All paths instead of paths to a specific node: add a virtual node 0 such that $A(i,0) = [l(i), 0]$ and $A(0,i) = [\infty, \infty]$. Apply the classical Bellman-Ford to compute the shortest (i.e. best) paths from all nodes $i$ to node 0. Remove the final edge from these paths and obtain the best paths we are looking for.

Indeed, with these modifications, the classical Bellman-Ford becomes

$$p^k(i) = \min_{j} \big( A(i,j) \oplus p^{k-1}(j) \big) \qquad (2)$$

where $j$ ranges over all nodes, including the virtual node 0. One can easily see that (2) is equivalent to recursion (1) of the centralized algorithm, given that we set $p^{k-1}(0)$ to $[\infty, 0]$, and that the impact of the initialization for $p^0(i)$ disappears after one step.

Note: in the algorithm, min is the lexicographic min (derived from the comparison of attributes).

The proof of the algorithm is similar to the classical case. It relies on the fact that $\oplus$ is associative.

A Run of this Centralized Bellman-Ford Algorithm

[Figure: the example graph annotated with the (label, cost) attribute held at each node for k = 0, 1, 2, 3.]

A Run of this Centralized Bellman-Ford Algorithm

[Figure: the example graph with node labels and link costs.]

$p^k(i)$, format (label, cost):

k \ i | 0 | 0 | | 40 | 50
0 | 0,0 | 0,0 | ,0 | 40,0 | 50,0
1 | 0,0 | 0, | 0, | 0, | 0,3
2 | 0,0 | 0, | 0,7 | 0, | 0,3
3 | 0,0 | 0, | 0,4 | 0, | 0,3

i | 0 | 0 | | 40 | 50
pred(i) | 0 | 0 | 50 | 0 | 40

Quiz: If we change the initial conditions, will the centralized algorithm converge to the correct value?

1. Yes
2. No
3. Sometimes yes, sometimes no, depending on the initial conditions
4. I don't know

$p^k(i)$, format (label, cost), with the modified initial conditions:

k \ i | 0 | 0 | | 40 | 50
0 | 0,0 | 0,0 | 09, | 40,0 | 50,0

Impact of Initial Conditions

Unlike the classical Bellman-Ford algorithm, this one may fail if initial conditions are not as expected.

Example: if initial conditions say that some node has a best path with label 09, every node will eventually believe the best label is 09.

$p^k(i)$, format (label, cost):

k \ i | 0 | 0 | | 40 | 50
0 | 0,0 | 0,0 | 09, | 40,0 | 50,0
1 | 0,0 | 09,8 | 0, | 0, | 09,3
2 | 09,9 | 09, | 09,4 | 09,4 | 09,
3 | 09, | 09,7 | 09, | 09,9 | 09,0
4 | 09,8 | 09, | 09, | 09,0 | 09,0
5 | 09, | 09,9 | 09, | 09,0 | 09,0
6 | 09,0 | 09,3 | 09, | 09, | 09,

The Bellman-Ford Algorithm for Bridges is sensitive to initial conditions

Theorem. If the initial conditions in the centralized Bellman-Ford Algorithm for Bridges satisfy

$$\forall i:\ p^0(i) = (m_i, c_i) \ \text{with}\ m_i \ge \min_j l(j)$$

then the algorithm converges to the correct value; else the algorithm diverges:

$$\lim_{k \to \infty} p^k(i) = (m_0, \infty) \ \text{where}\ m_0 = \min_i m_i$$

Proof: first show that the label converges to the minimum of all initial conditions (it can only decrease). Then use the property of Bellman-Ford in the usual algebra (see chapter "distance vector").

Comment: the convergence may be much longer than with the initial conditions in the theorem "All-path variant of Bellman Ford".

Note that there is a condition on the initial label, not on the initial cost.
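The theorem can be checked numerically on a small case. The sketch below is an illustration under assumptions: the graph and costs are hypothetical, the helper names `step` and `iterate` are made up, and the recursion is assumed to be $p^k(i) = \min\{[l(i), 0], \min_j A(i,j) \oplus p^{k-1}(j)\}$. One run starts from a wrong cost with a legitimate label, the other from a stale label 9 that is below every real label.

```python
def step(p, nodes, cost):
    """One synchronous iteration on attributes (label, cost)."""
    new_p = {}
    for i in nodes:
        best = (i, 0)                                  # the term [l(i), 0]
        for j in nodes:
            if (i, j) in cost:
                label, c = p[j]
                best = min(best, (min(j, label), cost[(i, j)] + c))
        new_p[i] = best
    return new_p


def iterate(p0, nodes, cost, steps):
    """Apply `steps` iterations starting from the initial conditions p0."""
    p = dict(p0)
    for _ in range(steps):
        p = step(p, nodes, cost)
    return p


# Hypothetical graph; nodes are identified by their labels.
edges = {(10, 20): 2, (10, 40): 3, (40, 50): 1, (20, 60): 5, (50, 60): 1}
cost = {**edges, **{(j, i): c for (i, j), c in edges.items()}}
nodes = [10, 20, 40, 50, 60]

fresh = {i: (i, 0) for i in nodes}
correct = iterate(fresh, nodes, cost, steps=len(nodes))

ok_start = {**fresh, 40: (10, 1)}    # wrong cost, label not below the minimum
bad_start = {**fresh, 40: (9, 1)}    # stale label 9, below every real label

after_ok = iterate(ok_start, nodes, cost, steps=20)
after_bad_50 = iterate(bad_start, nodes, cost, steps=50)
after_bad_100 = iterate(bad_start, nodes, cost, steps=100)

print("correct      :", correct)
print("wrong cost   :", after_ok)
print("stale label  :", after_bad_50)
```

In the second run every node ends up with label 9 and the costs keep growing with the number of iterations, which is the divergence stated in the theorem.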

Example

Q. Write $p^k(i)$, pred($i$) and draw the spanning tree, with initial conditions as shown. The dotted link does not exist in the current configuration. It existed before, and explains why the affected node starts with these initial conditions. Does the algorithm converge to the correct values?

[Figure: the example graph with a dotted (former) link.]

A. The algorithm converges since the initial labels are not below the smallest one.

$p^k(i)$, format (label, cost):

k \ i | 0 | 0 | | 40 | 50
0 | 0,0 | 0,0 | 0, | 40,0 | 50,0
1 | 0,0 | 0, | 0, | 0, | 0,
2 | 0,0 | 0, | 0,3 | 0, | 0,3
3 | 0,0 | 0, | 0,4 | 0, | 0,3
4 | 0,0 | 0, | 0,4 | 0, | 0,3

Distributed Bellman-Ford

The Bellman-Ford Algorithm for Bridges can be distributed: it is the algorithm used by STP.

Distributed Bellman-Ford Algorithm, Bridges

- Node $i$ maintains an estimate $q(i)$ of its best path attribute.
- Node $i$ also keeps a record of the latest values $q(j)$ for all neighbours $j$.
- Initial conditions are $q(i) = [l(i), 0]$.
- From time to time, $i$ sends its value $q(i)$ to all neighbours.
- When $i$ receives an updated value $q(j)$ from neighbour $j$, node $i$ recomputes $q(i) = \min_j \big( A(i,j) \oplus q(j) \big)$, where, as in the centralized algorithm, the minimum also includes node $i$'s own term $[l(i), 0]$.
- pred($i$) is set to a value of $j$ that achieves the min in this equation.

Quiz: Say what is true.

A. The Distributed Bellman-Ford algorithm for bridges keeps a record of the most recent updates received from all neighbours.
B. The Distributed Bellman-Ford algorithm for routers keeps a record of the most recent updates received from all neighbours.
C. The Distributed Bellman-Ford algorithm for bridges works regardless of initial conditions.
D. The Distributed Bellman-Ford algorithm for routers works regardless of initial conditions.

1. A and C
2. A and D
3. B and C
4. B and D
5. A, B and C
6. A, B and D
7. A, C and D
8. B, C and D
9. All
10. I don't know

Sample Run of the Distributed Bellman-Ford Algorithm for Bridges

[Figure: the example graph and a possible sequence of messages between nodes, with the resulting values of $q(i)$ after each message; the link between 40 and 50 breaks during the run.]

When the link breaks, 50 does as if it had received $q(40) = (\infty, \infty)$; since pred(50) = 40, 50 recomputes and obtains $q(50) = (50, 0)$. Similarly, 40 does a new computation, but this does not change its value.

The Distributed Bellman-Ford Algorithm for Bridges may need to be reset

- Like the centralized algorithm, the distributed algorithm is robust to changes in configuration as long as the node with the smallest label (called the root bridge) is still present and reachable from all bridges.
- If this is not true, the algorithm does not converge to a true value. It needs to be reset by some additional mechanism.
- This is solved in STP by root monitoring:
  - the root refreshes the validity of STP by periodically sending a refresh message every HelloTime (s);
  - the refresh message is propagated along the spanning tree;
  - a bridge that does not receive a refresh message for MaxAge restarts the STP basic procedure from fresh initial conditions (= reset).

Q. Compare to the classical distributed Bellman-Ford algorithm (for routers).
A. It does not need to be reset, since it always converges to the true value.
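The distributed algorithm and its reaction to a link failure can be simulated in a few lines. This is a sketch under assumptions, not the STP implementation: the class `Bridge`, the round-robin scheduler `run` and the graph are hypothetical, a broken link is modelled by simply forgetting the neighbour, and root monitoring (HelloTime, MaxAge) is not modelled, so the root must stay reachable.

```python
class Bridge:
    """One node of the distributed Bellman-Ford algorithm for bridges."""

    def __init__(self, label, link_cost):
        self.label = label
        self.link_cost = dict(link_cost)  # neighbour label -> cost of the link
        self.heard = {}                   # neighbour label -> latest value received
        self.q, self.pred = (label, 0), label

    def recompute(self):
        """Recompute q from the recorded neighbour values; return True if q changed."""
        best, pred = (self.label, 0), self.label
        for j, (label, cost) in self.heard.items():
            candidate = (min(j, label), self.link_cost[j] + cost)
            if candidate < best:
                best, pred = candidate, j
        changed = best != self.q
        self.q, self.pred = best, pred
        return changed

    def on_update(self, j, value):
        self.heard[j] = value
        return self.recompute()

    def on_link_down(self, j):
        self.link_cost.pop(j, None)       # assumption: forget the lost neighbour
        self.heard.pop(j, None)
        return self.recompute()


def run(bridges, max_rounds=100):
    """Let every bridge send its value to its neighbours until nothing changes."""
    for _ in range(max_rounds):
        changed = False
        for i in sorted(bridges):
            for j in list(bridges[i].link_cost):
                changed |= bridges[j].on_update(i, bridges[i].q)
        if not changed:
            return


# Hypothetical graph; nodes are identified by their labels.
edges = {(10, 20): 2, (10, 40): 3, (40, 50): 1, (20, 60): 5, (50, 60): 1}
links = {}
for (i, j), c in edges.items():
    links.setdefault(i, {})[j] = c
    links.setdefault(j, {})[i] = c
bridges = {i: Bridge(i, nbrs) for i, nbrs in links.items()}

run(bridges)
before = {i: b.q for i, b in bridges.items()}

bridges[40].on_link_down(50)              # the link between 40 and 50 breaks
bridges[50].on_link_down(40)
run(bridges)
after = {i: (b.q, b.pred) for i, b in bridges.items()}
print(before)
print(after)
```

Because node 10 (the smallest label) remains reachable, the second `run` repairs the tree without any reset: node 50 now reaches the root through 60.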

4. STP in practice

- Standardized by IEEE 802.1D.
- All bridges run it.
- Bridges send BPDUs (Bridge PDUs) to other bridges.
- It implements the Distributed Bellman-Ford Algorithm for Bridges.
- A bridge keeps the best values received on all ports.
- A bridge periodically sends its values to downstream neighbours, and to all neighbours whenever a loss of root occurs.

Topology changes

- Topology changes occur due to changes in configuration, failures, recoveries.
- If changes occur, the behaviour depends on whether the root bridge is still reachable from all bridges:
  - if so, let distributed Bellman-Ford do the job;
  - else, root monitoring kicks in.

A Topology Change That is Handled by Bellman-Ford

- B99 crashes; focus on B90.
- B90 detects the absence of B99 (absence of hello, or other mechanism); this is equivalent to receiving (in the Bellman-Ford algorithm) state information from B99: best attribute $(\infty, \infty)$.
- B90 compares all values received so far on all ports. Port 1: best = B4, 3; port 2: best = , ; port 3: best = B90, .
- Bellman-Ford finds the new best value: B4, 3 on port 1.

[Figure: the network of bridges B4, B8, B84, B9, B90 and B99 with link costs, after the crash of B99.]

A Topology Change That is not Handled By Bellman-Ford

Q: If B4 dies, what happens?
A: Root monitoring at all bridges detects that B4 does not send a refresh message anymore; all bridges start the STP procedure from fresh initial conditions and converge to a new spanning tree rooted at B8.

Other Bells and Whistles

- Bridges wait for some time after any topology change before declaring a port as «forwarding» (5--45 secs), to avoid loops during transients.
- Optimizations of STP (called «Rapid STP», RSTP) avoid the timers in some frequent cases, by detecting that the change cannot cause a loop. See «Rapid Spanning Tree» on www.cisco.com.

Conclusions

- Bridges use STP to remove loops from the active topology.
- An example of bio-like software: all bridges have the same code, only one becomes root; no central intervention, plug and play.
- The Bellman-Ford algorithm for Bridges repairs any failure except loss of root, which is handled by a separate keep-alive mechanism; loss of root causes a global reset.
- RSTP is an optimization that speeds up the reaction to topology changes; the active topology is the same as with STP.

To know more:
- Radia Perlman, «Interconnections, Bridges and Routers»
- CISCO RSTP White Paper