CS/EE 143 Communication Networks
Chapter 5: Routing
Text: Walrand & Parakh, 2010
Steven Low, CMS, EE, Caltech
Warning: these notes are not self-contained and are probably not understandable unless you were also in the lecture. They are a supplement to, not a replacement for, class attendance.
Lecture outline
- Inter-domain routing: BGP
- Intra-domain routing: shortest-path algorithms
- Coding: FEC, network coding
Putting it all together
[Figure 5.19: Figure for Routing Problem 3. [W&P 2010]; node H initially unconnected]
What is routing? Host A must reach host B across the Internet: choose the red path or the blue path (figure).
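How to route? Two layers of routing:
1. Which AS to go through? BGP
2. How to route inside an AS? OSPF
An autonomous system (AS) is a network under a single administration, e.g., AT&T, Verizon, MIT. (Figure: A and B connected through the Internet, a collection of ASes.)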
Why two layers?
- Different objectives
  - Choosing the AS: special policies
  - Inside an AS: minimize delay, # hops
- Simpler routing
  - Choosing the AS: ignore details inside each AS
  - Inside an AS: only details inside that AS
Inter-domain routing: BGP
Peering relation (A-B, B-C): A, B, C carry each other's traffic free of charge. B advertises only B to A and to C, so A does not learn how to reach C through B. A must have a transit relation with another ISP (not shown here) that carries its traffic to C.
Transit relation (A-B, B-C): a customer-provider relation, e.g., B is the provider for A and for C. A (C) pays B for carrying traffic to/from A (C). B advertises {B, C} to A and {A, B} to C, so that all ISPs know how to reach all destinations.
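The two cases above differ only in which routes B re-advertises. Here is a minimal Python sketch of that export decision. It assumes a simplified Gao-Rexford-style rule that the notes do not state explicitly: B advertises its own prefix and customer-learned routes to every neighbor, and peer-learned routes only to customers. `exports` and the route/relation dictionaries are hypothetical names.

```python
def exports(routes, relation, to):
    """Destinations B advertises to neighbor `to` (simplified rule, assumed).

    routes:   destination -> neighbor it was learned from ("self" for B's own prefix)
    relation: neighbor -> "peer" or "customer", from B's point of view
    """
    out = set()
    for dest, learned_from in routes.items():
        if dest == to or learned_from == to:
            continue  # do not advertise a neighbor's routes back to it
        if learned_from == "self" or relation[learned_from] == "customer":
            out.add(dest)  # own and customer routes are advertised to everyone
        elif relation[to] == "customer":
            out.add(dest)  # routes learned from peers go only to customers
    return out


routes = {"B": "self", "A": "A", "C": "C"}
peering = {"A": "peer", "C": "peer"}
transit = {"A": "customer", "C": "customer"}

print(sorted(exports(routes, peering, "A")))  # ['B']
print(sorted(exports(routes, transit, "A")))  # ['B', 'C']
print(sorted(exports(routes, transit, "C")))  # ['A', 'B']
```

With peering, B advertises only {B}. With transit, it advertises {B, C} to A and {A, B} to C, which matches the two cases listed above.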
Inter-domain routing: BGP
[Figure: a typical configuration]
Inter-domain routing: BGP
BGP is policy-based routing
- Generally not shortest-path
- Other factors usually matter more than performance in choosing an AS path:
  - Peering agreements
  - Pricing (revenue/cost) with the next hop
  - Reliability, security, political reasons
- Can lead to oscillation and bad performance
Inter-domain routing: BGP
Example BGP policy at Berkeley:
1. If possible, avoid AT&T
2. Choose the path with the smallest # hops
3. Break remaining ties alphabetically
Berkeley's decision: use the path Sprint-Verizon-MIT to reach MIT
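Berkeley's three rules can be written as a lexicographic sort key. This is a sketch under assumptions. The candidate paths come from Berkeley's two options in the step tables below. "Alphabetical" is interpreted here as comparing the AS-name lists lexicographically. `berkeley_rank` is a hypothetical name.

```python
def berkeley_rank(path):
    """Sort key for Berkeley's example policy (path lists the ASes after Berkeley)."""
    uses_att = "AT&T" in path      # rule 1: avoid AT&T if possible (False sorts first)
    hops = len(path)               # rule 2: fewer AS hops is better
    return (uses_att, hops, path)  # rule 3: remaining ties broken alphabetically


candidates = [
    ["AT&T", "MIT"],
    ["Sprint", "Verizon", "MIT"],
]
best = min(candidates, key=berkeley_rank)
print("-".join(best))  # Sprint-Verizon-MIT
```

Because tuples compare element by element, rule 2 is only consulted when rule 1 ties, and rule 3 only when both tie.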
Border Gateway Protocol (BGP)
Every AS keeps a list of (Destination, Path) pairs and policies. Policy: avoid AT&T. How to reach MIT from Berkeley?
Step 1:
- Verizon: (MIT, Verizon---MIT)
- AT&T: (MIT, AT&T---MIT)
- Sprint: (MIT, NA)
- Berkeley: (MIT, NA)
Border Gateway Protocol (BGP), step 2:
- Verizon: (MIT, Verizon---MIT), (MIT, Verizon---AT&T---MIT)
- AT&T: (MIT, AT&T---MIT), (MIT, AT&T---Verizon---MIT)
- Sprint: (MIT, Sprint---Verizon---MIT), (MIT, Sprint---AT&T---MIT)
- Berkeley: (MIT, Berkeley---AT&T---MIT)
Border Gateway Protocol (BGP), step 3:
- Verizon: (MIT, Verizon---MIT)
- AT&T: (MIT, AT&T---MIT)
- Sprint: (MIT, Sprint---Verizon---MIT)
- Berkeley: (MIT, Berkeley---AT&T---MIT), (MIT, Berkeley---Sprint---Verizon---MIT)
Applying its policy (avoid AT&T), Berkeley chooses Berkeley---Sprint---Verizon---MIT.
Border Gateway Protocol (BGP)
In BGP, each AS
- announces itself to other ASes, along with which ASes it can reach;
- obtains AS reachability information from neighboring ASes;
- propagates reachability information to all routers internal to the AS;
- determines good routes to ASes based on reachability information and AS policy.
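The MIT example can be traced with a small synchronous path-vector simulation. The following are assumptions and are not from the notes:
- the AS adjacency, which is inferred from the paths that appear in the step tables;
- that every AS applies the same policy (avoid transiting AT&T, then fewest hops, then alphabetical);
- that all ASes update in lockstep rounds. Real BGP speakers exchange updates asynchronously.

`NEIGHBORS`, `rank`, and `bgp_rounds` are hypothetical names.

```python
# Hypothetical AS adjacency, inferred from the paths in the step tables.
NEIGHBORS = {
    "MIT": ["Verizon", "AT&T"],
    "Verizon": ["MIT", "AT&T", "Sprint"],
    "AT&T": ["MIT", "Verizon", "Sprint", "Berkeley"],
    "Sprint": ["Verizon", "AT&T", "Berkeley"],
    "Berkeley": ["AT&T", "Sprint"],
}


def rank(path):
    """Assumed policy: avoid transiting AT&T, then fewest hops, then alphabetical."""
    return ("AT&T" in path[1:], len(path), path)


def bgp_rounds(dest, rounds):
    """Synchronous path-vector rounds; result[x] is x's chosen AS path to dest."""
    best = {x: None for x in NEIGHBORS}
    best[dest] = [dest]
    for _ in range(rounds):
        new = {dest: [dest]}
        for x in NEIGHBORS:
            if x == dest:
                continue
            # Each neighbor advertises its current best path and x prepends itself;
            # paths that already contain x are rejected (BGP loop prevention).
            candidates = [
                [x] + best[n]
                for n in NEIGHBORS[x]
                if best[n] is not None and x not in best[n]
            ]
            new[x] = min(candidates, key=rank) if candidates else None
        best = new
    return best


for r in range(1, 4):
    print(r, bgp_rounds("MIT", r)["Berkeley"])
```

Rounds 1, 2, 3 line up with steps 1, 2, 3. Berkeley has no route after round 1, takes Berkeley-AT&T-MIT after round 2, and switches to Berkeley-Sprint-Verizon-MIT once Sprint's route arrives in round 3.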
BGP: potential oscillation
Example BGP policy to reach D:
1. Prefer a 2-hop path to a 1-hop path
2. Avoid 3-hop paths
Oscillation: every node alternates between choosing a 1-hop path and a 2-hop path.
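The figure is not reproduced here, so this sketch assumes a standard configuration for this example. Three nodes (0, 1, 2) sit in a ring, each with a direct link to D, and each node's 2-hop option goes through its clockwise neighbor (i + 1) % 3. Updates are also assumed synchronous. `next_state` is a hypothetical name.

```python
def next_state(state):
    """One synchronous update of the assumed three-node example.

    state[i] is the hop count of node i's current path to D: 1 (direct)
    or 2 (via its clockwise neighbor (i + 1) % 3).
    """
    n = len(state)
    new = []
    for i in range(n):
        via_neighbor = 1 + state[(i + 1) % n]  # hops if i routes through its neighbor
        # Policy: prefer the 2-hop path; a 3-hop path is avoided, so fall back to 1 hop.
        new.append(2 if via_neighbor == 2 else 1)
    return new


state = [1, 1, 1]
history = [state]
for _ in range(4):
    state = next_state(state)
    history.append(state)
print(history)  # [[1, 1, 1], [2, 2, 2], [1, 1, 1], [2, 2, 2], [1, 1, 1]]
```

Under this policy there is no stable assignment. Node i picks 2 hops exactly when its neighbor picks 1, and that cannot hold consistently around a ring of three. Checking all eight states confirms that none is a fixed point.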
Some questions
Q1: Why not 3-level, or N-level, routing?
Q2: How can a source ensure that its packets follow the inter-domain path it wants?
Q3: In BGP, can one prevent a domain from lying and funneling all traffic through itself in order to eavesdrop?
Shortest-path algorithm
Input: graph $G = (V, E)$, link costs $d_{ij}$ ($d_{ij} = \infty$ if $(i, j) \notin E$)
Execution: the algorithm runs at each node
Output:
- Dijkstra: shortest-path tree rooted at the node
- Bellman-Ford: next hop to all destinations from the node (entries in the forwarding table)
Notation:
- $D_x(i)$: min cost to reach node i from x
- $\mathrm{pred}_x(i)$: parent node of i (for Dijkstra)
- $\mathrm{next}_x(i)$: next hop to i from x (for Bellman-Ford)
Dijkstra algorithm (run at each source node x)
Each node x sends $d_{xi}$ to all other nodes i.
Init: $D_x(i) = d_{xi}$, $D_x(x) = 0$, $R = \{x\}$, $\mathrm{pred}_x(i) = \text{null}$
while $R \neq V$ do {
    $i^* = \arg\min_{i \notin R} D_x(i)$
    $R = R \cup \{i^*\}$
    for all $j \in N(i^*) \setminus R$ {
        if $D_x(j) > D_x(i^*) + d_{i^* j}$ then {
            $D_x(j) \leftarrow D_x(i^*) + d_{i^* j}$
            $\mathrm{pred}_x(j) = i^*$
        }
    }
}
Which step requires global info?
Dijkstra algorithm: worked example
[Figure: six-node graph with nodes A, B, C, D, E, F and link costs; panels P(1) to P(6) show the shortest-path tree growing one node per step; the final distance labels in P(6) are 1, 2, 3, 3, 5.]
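The pseudocode can be sketched with Python's standard `heapq` module, using a heap in place of the linear argmin scan. The graph is an assumption, because the figure's topology is not recoverable from these notes. With source A it yields distances 1, 2, 3, 3, 5, the same values as the final panel. `EDGES`, `adjacency`, and `dijkstra` are hypothetical names.

```python
import heapq
import math

# Hypothetical undirected graph: (u, v) -> link cost d_uv.
EDGES = {
    ("A", "B"): 1, ("B", "C"): 3, ("B", "E"): 1, ("C", "D"): 2,
    ("C", "E"): 1, ("D", "E"): 4, ("E", "F"): 1,
}


def adjacency(edges):
    adj = {}
    for (u, v), cost in edges.items():
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))
    return adj


def dijkstra(edges, source):
    """Return D_x(i), pred_x(i), and the order in which nodes join R."""
    adj = adjacency(edges)
    dist = {node: math.inf for node in adj}
    pred = {node: None for node in adj}
    dist[source] = 0
    in_r = set()           # the set R of finalized nodes
    order = []
    heap = [(0, source)]   # heap replaces the O(|V|) argmin scan
    while heap:
        d, i = heapq.heappop(heap)
        if i in in_r:
            continue       # stale heap entry: i was already finalized
        in_r.add(i)
        order.append(i)
        for j, cost in adj[i]:
            if j not in in_r and d + cost < dist[j]:
                dist[j] = d + cost
                pred[j] = i
                heapq.heappush(heap, (dist[j], j))
    return dist, pred, order


dist, pred, order = dijkstra(EDGES, "A")
print(order)  # ['A', 'B', 'E', 'C', 'F', 'D']
```

Stale entries are skipped rather than deleted, which keeps the heap simple. The pred map encodes the shortest-path tree; for example, D's parent is C and C's parent is E.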
Bellman-Ford algorithm (run at each source node x)
Init: $D_x(i) = d_{xi}$, $D_x(x) = 0$, $\mathrm{next}_x(i) = \text{null}$
Each node x sends its distance vector $(D_x(i), i \in V)$ to all its neighbors whenever $(D_x(i), i \in V)$ changes.
do (execute when a link cost changes or on receipt of a DV from a neighbor) {
    for all destination nodes i {
        $D_x(i) := \min_{j \in N(x)} (d_{xj} + D_j(i))$
        $\mathrm{next}_x(i) := j^* = \arg\min_{j \in N(x)} (d_{xj} + D_j(i))$
    }
} until no change
Bellman-Ford algorithm: example
[Figure: six-node graph, nodes A to F with link costs]
Consider the calculations at all nodes to reach node D. Every node has access to its neighbors' distance estimates to D. Assume synchronous operation.

iteration | D_A(D), next_A(D) | D_B(D), next_B(D) | D_C(D), next_C(D) | D_E(D), next_E(D) | D_F(D), next_F(D)
0 | Inf, - | Inf, - | 2, D | 4, D | Inf, -
1 | Inf, - | 5, C | 2, D | 3, C | 5, E
2 | 6, B | 4, E | 2, D | 3, C | 4, E
3 | 5, B | 4, E | 2, D | 3, C | 4, E
4 | 5, B | 4, E | 2, D | 3, C | 4, E
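The synchronous iterations in the table can be reproduced with a short Python sketch. The link costs are an assumption: they are the edges implied by the next-hop choices in the table (e.g., B first reaching D via C at cost 5 implies $d_{BC} = 3$). The figure may contain additional links. Ties are broken by neighbor order, and `bellman_ford_sync` is a hypothetical name.

```python
import math

# Hypothetical link costs: the edges implied by the next-hop choices in the table.
EDGES = {
    ("A", "B"): 1, ("B", "C"): 3, ("B", "E"): 1, ("C", "D"): 2,
    ("C", "E"): 1, ("D", "E"): 4, ("E", "F"): 1,
}


def bellman_ford_sync(edges, dest, iterations):
    """Synchronous distance-vector updates toward `dest`.

    Returns one (D, next) snapshot per iteration, starting at iteration 0.
    """
    adj = {}
    for (u, v), cost in edges.items():
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))
    # Iteration 0: only direct links to dest are known.
    dist = {x: math.inf for x in adj}
    nxt = {x: None for x in adj}
    dist[dest] = 0
    for v, cost in adj[dest]:
        dist[v], nxt[v] = cost, dest
    history = [(dict(dist), dict(nxt))]
    for _ in range(iterations):
        new_dist, new_nxt = {dest: 0}, {dest: None}
        for x in adj:
            if x == dest:
                continue
            # D_x(dest) = min over neighbors j of d_xj + D_j(dest), using last round's D_j.
            best, hop = math.inf, None
            for j, cost in adj[x]:
                if cost + dist[j] < best:  # strict '<' keeps the first neighbor on ties
                    best, hop = cost + dist[j], j
            new_dist[x], new_nxt[x] = best, hop
        dist, nxt = new_dist, new_nxt
        history.append((dict(dist), dict(nxt)))
    return history


for k, (d, n) in enumerate(bellman_ford_sync(EDGES, "D", 4)):
    print(k, {x: (d[x], n[x]) for x in "ABCEF"})
```

Each row printed matches the corresponding row of the table. Iterations 3 and 4 are identical, so the estimates have converged.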
Compare Dijkstra & BF
- Message exchange
  - Dijkstra: every node sends only its incident link costs, but to all other nodes. This requires $O(|V||E|)$ messages.
  - BF: every node sends only to its neighbors, but it sends its least-cost estimates from itself to all other nodes.
- Speed of convergence
  - Dijkstra: the above implementation takes $O(|V|^2)$; this can be reduced using a heap.
  - BF: can converge slowly and have routing loops during transients; count-to-infinity problem (can be mitigated using poisoned reverse).
- No clear winner
  - Both are used on the Internet
  - RIP: distance-vector protocol
  - OSPF: link-state protocol (meant to be the successor to RIP)
Count-to-infinity problem
Example: the link between B and C fails. A and B do not realize that C is unreachable; a routing loop is created, and their cost estimates to C keep going up.
A solution, poisoned reverse: because A uses B to reach C, A tells B that its cost to reach C is infinity instead of its true cost (2).
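A small simulation contrasts plain distance-vector updates with poisoned reverse after the B-C failure. The following are assumptions:
- the chain A-B-C with unit link costs, which is consistent with A's cost of 2;
- synchronous rounds;
- an "infinity" cap of 16, the value RIP uses.

`after_failure` and `advertised` are hypothetical names. Poisoned reverse only removes loops between two neighbors; longer loops can still count to infinity.

```python
CAP = 16  # stand-in for infinity (RIP uses 16 the same way); assumed for this sketch
LINK = 1  # cost of the surviving A-B link (assumed)


def advertised(cost, nxt, sender, receiver, poisoned_reverse):
    """Cost to C that `sender` reports to `receiver`."""
    if poisoned_reverse and nxt[sender] == receiver:
        return CAP  # sender routes through receiver, so it reports infinity
    return cost[sender]


def after_failure(rounds, poisoned_reverse):
    # State just after the B-C link fails: A still routes via B (cost 2),
    # and B's last advertised route was its direct link to C (cost 1).
    cost = {"A": 2, "B": 1}
    nxt = {"A": "B", "B": "C"}
    history = []
    for _ in range(rounds):
        # Synchronous round: each node's only remaining neighbor is the other one.
        b = min(CAP, LINK + advertised(cost, nxt, "A", "B", poisoned_reverse))
        a = min(CAP, LINK + advertised(cost, nxt, "B", "A", poisoned_reverse))
        cost = {"A": a, "B": b}
        nxt = {"A": "B" if a < CAP else None, "B": "A" if b < CAP else None}
        history.append(cost)
    return history


print(after_failure(4, poisoned_reverse=False))
# [{'A': 2, 'B': 3}, {'A': 4, 'B': 3}, {'A': 4, 'B': 5}, {'A': 6, 'B': 5}]
print(after_failure(2, poisoned_reverse=True))
# [{'A': 2, 'B': 16}, {'A': 16, 'B': 16}]
```

Without poisoned reverse, the estimates climb by 2 every two rounds and only stop at the cap after 15 rounds. With it, B sees infinity at once and both nodes mark C unreachable after two rounds.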
Compare Dijkstra & BF
- Dijkstra algorithm
  - Needs global information (link-state algorithm)
  - Each node broadcasts link-state packets to all other nodes in the network
  - Each node executes Dijkstra's algorithm to calculate shortest paths to all other nodes
  - After k iterations, shortest paths to k destinations are known (and they are the k shortest among the shortest paths to all nodes)
  - Terminates after N-1 iterations (N = # nodes)
Compare Dijkstra & BF
- Bellman-Ford algorithm
  - Only needs local information (distance-vector algorithm)
  - Each node exchanges with its neighbors the vector of distances from itself to all other nodes
  - Each node then updates the next hop and associated distance to all other nodes using the Bellman-Ford (dynamic programming) equation
  - Decentralized, asynchronous, distributed
Other questions
- How can routers trust each other?
- How to deal with non-convergence in DV protocols? How often is oscillation encountered in practice?
- Router R1 can route a packet to host A through R2 or R3; R2 can route through R4 or R5 and has chosen R4. But R1 prefers R5 to R2 to R4. What happens?
Other questions
- What is the timescale for routing updates?
- What are the major impediments to making significant changes to the routing architecture? Would a bio-inspired routing system be feasible?
- Can network coding and FEC be combined?
Putting it all together
Assumptions:
- The network addresses of nodes are given by <AS>.<Network>.0.<node>, e.g., node A has the address AS1.E1.0.A.
- The bridge IDs satisfy B1 < B2 < B3.
- H is not connected to AS2.E5 for part (a).
- The BGP speakers use the least-next-hop-cost policy for routing (i.e., among alternative paths to the destination AS, choose the one that has the least cost on the first hop).
- The network topology shown has been stable long enough for all the routing algorithms to converge and for all the bridges to learn where to forward each packet.
Figure 5.19: Figure for Routing Problem 3. [W&P 2010]
Putting it all together
1. How to route G → A?
2. As soon as H is added, D tries to send a packet to H. What happens?
3. If AS2.R2 goes down, what will be the routing changes?
(Figure 5.19: H initially unconnected; AS2.R2 later goes down)
1. Compute the spanning tree (Figure 5.19)
2. Compute intra-AS routing (Figure 5.19)
3. Compute inter-AS routing. Question 1: how to route G → A? (Figure 5.19)
3. Compute inter-AS routing. Question 1: how to route G → A? Does A → G follow the same path? (Figure 5.19)
4. Address Resolution Protocol (ARP). Question 2: as soon as H is added, D tries to send a packet to H. What happens? (Figure 5.19: H initially unconnected)
4. Address Resolution Protocol (ARP)
- Packets from D can be delivered to subnet AS2.B1 based on H's IP address.
- AS2.B1 does not know H.
- AS2.B1 uses ARP to find H's MAC address.
- Packets are then forwarded to H along the spanning tree computed by STP.
(Figure 5.19: H initially unconnected)
Example: H1 wants to send a packet to H2
The link layer on H1 broadcasts a message (ARP query) on its layer-2 network, asking for the MAC address corresponding to IP2: frame [all, gateway e1, who is IP2?].
(Figure: H1 and H2, each with network and link layers, attached to an Ethernet switch)
Example (continued): the link layer on H2 responds to the ARP query with its MAC address: frame [e1, e2, I am IP2].
Example (continued): once the link layer on H1 knows e2, it can send the original message: frame [e2, gateway e1, [IP1, IP2, X]].
Example (continued): the link layer on H2 receives [e2, e1, [IP1, IP2, X]] and delivers the packet [IP1, IP2, X] to the network layer on H2.
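The four frames above can be mimicked with a toy Python model. The following are assumptions:
- a single broadcast domain;
- MAC and IP addresses as plain strings (e1, e2, IP1, IP2 as in the example);
- one ARP cache per host, with no timeouts and no gateway logic.

`Host`, `EthernetLan`, and `send_ip` are hypothetical names, not a real networking API.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    ip: str
    mac: str
    arp_cache: dict = field(default_factory=dict)  # IP -> MAC
    delivered: list = field(default_factory=list)  # packets handed to the network layer


class EthernetLan:
    """Hypothetical model of one layer-2 network behind a switch."""

    def __init__(self, hosts):
        self.hosts = hosts

    def broadcast_arp(self, sender, target_ip):
        # Frame [all, sender.mac, "who is target_ip?"] reaches every host;
        # only the owner of target_ip replies [sender.mac, owner.mac, "I am target_ip"].
        for h in self.hosts:
            if h is not sender and h.ip == target_ip:
                return h.mac
        return None

    def unicast(self, dst_mac, src_mac, packet):
        # Frame [dst_mac, src_mac, packet]; the receiving link layer passes packet up.
        for h in self.hosts:
            if h.mac == dst_mac:
                h.delivered.append(packet)
                return True
        return False


def send_ip(lan, sender, dst_ip, payload):
    mac = sender.arp_cache.get(dst_ip)
    if mac is None:                   # cache miss: resolve with an ARP query
        mac = lan.broadcast_arp(sender, dst_ip)
        if mac is None:
            return False              # no host on this LAN owns dst_ip
        sender.arp_cache[dst_ip] = mac
    return lan.unicast(mac, sender.mac, (sender.ip, dst_ip, payload))


h1 = Host("H1", "IP1", "e1")
h2 = Host("H2", "IP2", "e2")
lan = EthernetLan([h1, h2])
print(send_ip(lan, h1, "IP2", "X"), h2.delivered)  # True [('IP1', 'IP2', 'X')]
```

After the first exchange, H1's cache holds IP2 → e2, so later packets to IP2 skip the broadcast.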
5. Re-compute routing tables. Question 3: if AS2.R2 goes down, what will be the routing changes? (Figure 5.19: AS2.R2 goes down)
5. Re-compute routing tables
- The failure is detected by AS2.R1 and AS2.R3, which update their (intra-AS) routing tables.
- The failure is detected by the border gateway in AS5, and BGP re-computes routes.
- The path between AS2 and AS5 changes.
(Figure 5.19: AS2.R2 goes down)
FEC: packet erasure code
Goal: recover from packet loss.
Coding:
- Input: n packets $P_1, \ldots, P_n$
- Output: m packets $C_1, \ldots, C_m$, $m > n$
- $C_k := P_{i_1} \oplus P_{i_2} \oplus \cdots \oplus P_{i_{j_k}}$, the bit-by-bit XOR of a random subset $\{P_{i_1}, \ldots, P_{i_{j_k}}\}$ of $\{P_1, \ldots, P_n\}$
- The header of $C_k$ specifies the subset used to generate $C_k$
FEC: packet erasure code
Decoding:
- If $C_j = P_i$ for some i, then $C_k := C_k \oplus P_i$ for all packets $C_k$ that contain $P_i$
- Remove $C_j$ from the collection of received packets
- Repeat until all of $P_1, \ldots, P_n$ have been decoded
- If at some step there is no $C_j \in \{P_1, \ldots, P_n\}$, decoding fails
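The encoding and peeling-decoding steps can be sketched in Python. The following are assumptions:
- packets are equal-length `bytes`;
- each coded packet carries its subset as a Python `set`, standing in for the header;
- the subsets are fixed rather than random, so the run is reproducible.

The subsets used are those of the received packets C1, C3, C5, C6 in the example that follows (0-indexed); the subsets of C2, C4, C7 are not given. `encode` and `peel_decode` are hypothetical names.

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encode(packets, subsets):
    """C_k = XOR of packets[i] for i in subsets[k]; the subset plays the role of the header."""
    coded = []
    for subset in subsets:
        data = bytes(len(packets[0]))  # all-zero start value for the XOR
        for i in subset:
            data = xor_bytes(data, packets[i])
        coded.append((set(subset), data))
    return coded


def peel_decode(received, n):
    """Peeling decoder as described above; returns the n packets, or None if it gets stuck."""
    pending = [(set(s), d) for s, d in received]
    decoded = {}
    while len(decoded) < n:
        single = next((c for c in pending if len(c[0]) == 1), None)
        if single is None:
            return None  # no received packet equals a single P_i: decoding fails
        pending.remove(single)
        (i,) = single[0]
        decoded[i] = single[1]
        new_pending = []
        for subset, data in pending:
            if i in subset:  # XOR the newly decoded P_i out of every packet containing it
                subset = subset - {i}
                data = xor_bytes(data, decoded[i])
            if subset:       # drop packets that no longer carry anything
                new_pending.append((subset, data))
        pending = new_pending
    return [decoded[i] for i in range(n)]


P = [b"\x01\x10", b"\x02\x20", b"\x04\x40", b"\x08\x80"]  # toy packets P_1..P_4
subsets = [[0], [0, 1, 2], [3], [2, 3]]                   # C1, C3, C5, C6
received = encode(P, subsets)
print(peel_decode(received, 4) == P)  # True
```

The decoder peels packets in the same order as the example: P1, then P4, then P3, then P2.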
FEC: example
[Figure: source packets P1 to P4 and coded packets C1 to C7; received packets are marked]
Decoding, step 1: received packets C = {C1, C3, C5, C6}, decoded P = { }, with $C_1 = P_1$, $C_3 = P_1 \oplus P_2 \oplus P_3$, $C_5 = P_4$, $C_6 = P_3 \oplus P_4$.
$C_1 = P_1$, so $C_3 := C_3 \oplus C_1 = P_2 \oplus P_3$; now C = {C3, C5, C6}, P = {P1}.
FEC: example, step 2: C = {C3, C5, C6}, P = {P1}, with $C_3 = P_2 \oplus P_3$, $C_5 = P_4$, $C_6 = P_3 \oplus P_4$.
$C_5 = P_4$, so $C_6 := C_6 \oplus C_5 = P_3$; now C = {C3, C6}, P = {P1, P4}.
FEC: example, step 3: C = {C3, C6}, P = {P1, P4}, with $C_3 = P_2 \oplus P_3$, $C_6 = P_3$.
$C_6 = P_3$, so $C_3 := C_3 \oplus C_6 = P_2$; now C = {C3}, P = {P1, P3, P4}.
FEC: example, step 4: C = {C3}, P = {P1, P3, P4}, with $C_3 = P_2$.
Decoding $P_2$ leaves C = { }, P = {P1, P2, P3, P4}: all four packets are recovered.
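Network coding: example
Link rate = R on every link; source S multicasts to both Y and Z. [Figure: butterfly network with nodes S, T, U, W, X, Y, Z; S sends bits b1 and b2.]
- Without network coding: throughput = 1.5R
- With network coding (W sends $b_1 \oplus b_2$ over the link W-X, and each receiver XORs it with the bit it receives directly): throughput = 2R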
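A minimal sketch of the decoding step in the coded case. It assumes the standard butterfly routing: S sends b1 toward T and b2 toward U, W combines them, and Y and Z each receive one bit directly. The figure residue suggests this routing but does not fully show it. Packets are byte strings and XOR is bitwise; the names are illustrative.

```python
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


b1, b2 = b"\x0f\xa0", b"\x33\x05"  # one packet each per unit time from S

coded = xor(b1, b2)  # W sends b1 XOR b2 over the shared link W-X
# Assumed routing: Y gets b1 directly (via T), Z gets b2 directly (via U);
# both receive `coded` from X.
at_y = {b1, xor(b1, coded)}  # Y recovers b2
at_z = {b2, xor(b2, coded)}  # Z recovers b1
print(at_y == at_z == {b1, b2})  # True
```

The bottleneck link W-X carries one packet per unit time, yet it serves both receivers. This is why the multicast rate reaches 2R instead of 1.5R.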
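Network coding: example
X and Y want to exchange packets A and B through relay Z: (1) X sends A to Z; (2) Y sends B to Z; (3) Z broadcasts $A \oplus B$, from which X recovers B and Y recovers A.
- Without network coding: 4 packet transmissions
- With network coding: 3 packet transmissions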
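The three-transmission exchange can be checked in a few lines of Python. The following are assumptions:
- A and B have equal length;
- Z's single broadcast in step (3) reaches both X and Y.

The payloads and names are illustrative.

```python
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


A, B = b"hello", b"world"  # X holds A, Y holds B (equal length assumed)

transmissions = []
transmissions.append(("X->Z", A))        # (1)
transmissions.append(("Y->Z", B))        # (2)
relay = xor(A, B)
transmissions.append(("Z->X,Y", relay))  # (3) one broadcast reaches both
b_at_x = xor(A, relay)                   # X already knows A
a_at_y = xor(B, relay)                   # Y already knows B
print(len(transmissions), b_at_x, a_at_y)  # 3 b'world' b'hello'
```

Without coding, Z would need separate transmissions to forward A to Y and B to X, for 4 in total.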