CS118 Discussion Week 7 Taqi
Outline Hints for project 2 Lecture review: routing
About Course Project 2 Please implement byte-stream reliable data transfer Cwnd is in unit of bytes, not packets How to realize timeout? Option 1: select( ) with countdown timer Option 2: alarm( ) + signal ( ) Should we implement RTT estimation? No, if you do not plan to realize congestion control algorithms
Routing Intra-domain routing V.S. inter-domain routing Performance V.S. policy Scalability: hierarchical routing Distance-vector routing V.S. link-state routing Fully-distributed algorithm V.S. decentralized algorithm Unicast V.S. multicast
Distance-vector Routing Foundation: Bellman-Ford algorithm Count-to-infinity problem When will this problem happen? Split Horizon: if B learns about D from C, B should not tell C that B can reach D Poison Reverse: if A routes through C to reach D, A tells C that A s distance to D is infinite Enhancement: BGP s path-vector routing All problems solved?
Link-state Routing Foundation: Dijkstra algorithm How to calculate A s FIB? What if link failure happens? Is forwarding loop completely avoided?
Multicast Reverse Path Forwarding Why does it avoid loops? For each node, how to forward A s multicast packets?
MPLS: Multi-Protocol Label Switching
Routing VS. Switching Circuit-switch network: connection should be established before forwarding the data At each hop, the circuit path is marked as a label Data forwarding is based on label: O(1) complexity Vulnerable to link/node failures Packet-switched network: connectionless, packets are forwarded based on IP header Longest prefix matching: O(N) complexity Robust to link/node failures Can we take advantage of both, while preventing any vulnerabilities?
Multi-Protocol Label Switching Idea: switching technique into connectionless network In IP routing table, each entry is associated with a label Neighboring routers exchange labels, and forms an index of next hop s forwarding table When forwarding the packet, lookup the index only Only the first hop performs longest prefix matching
Exchanging labels Between Routers R1 0 Prefix Next 10.1.1 0 10.3.3 0 R2 1 0 Prefix Next 10.1.1 1 10.3.3 0 R3 R4 10.1.1/24 10.3.3/24
Exchanging labels Between Routers R1 0 Prefix Next Label 10.1.1 0 15 10.3.3 0 16 Store remote labels Exchange label between neighbors R2 1 0 Label Prefix Next 15 10.1.1 1 16 10.3.3 0 Allocate label for each entry R3 R4 10.1.1/24 10.3.3/24
Forwarding Packets 10.1.1.1 1 10.1.1/24 R1 0 R2 0 R3 10.3.3/24 Prefix Next Label 10.1.1 0 15 10.3.3 0 16 Label Prefix Next 15 10.1.1 1 16 10.3.3 0 R4 Longest Prefix Matching
Forwarding Packets 10.1.1.1 (15) 1 10.1.1/24 R1 0 R2 0 R3 10.3.3/24 Prefix Next Label 10.1.1 0 15 10.3.3 0 16 Label Prefix Next 15 10.1.1 1 16 10.3.3 0 R4 Label-based Switching
Other Benefits In IP table, the IP addresses are aggregated based on prefixes In MPLS, the IP addresses can be aggregated in more flexible ways How: assign same label for IP addresses in the same category Benefit: better management (e.g., offer different levels of performance using different labels)
Lecture 13: Distance-vector Routing" CSE 123: Computer Networks Alex C. Snoeren HW 3 due now!
Lecture 13 Overview" Distance vector Assume each router knows its own address and cost to reach each of its directly connected neighbors Bellman-Ford algorithm Distributed route computation using only neighbor s info Mitigating loops Split horizon and posion reverse CSE 123 Lecture 13: Distance-vector Routing 2
Bellman-Ford Algorithm" Define distances at each node X d x (y) = cost of least-cost path from X to Y Update distances based on neighbors d x (y) = min {c(x,v) + d v (y)} over all neighbors V u! 3 2 v! w! 1 4 1 2 x! 5 s! 3 4 y! 1 z! t! d u (z) = min{c(u,v) + d v (z), c(u,w) + d w (z)} CSE 123 Lecture 13: Distance-vector Routing 3
Distance Vector Algorithm" Iterative, asynchronous: each local iteration caused by: Local link cost change Distance vector update message from neighbor Each node: wait for (change in local link cost or message from neighbor) Distributed: Each node notifies neighbors only when its DV changes Neighbors then notify their neighbors if necessary recompute estimates if distance to any destination has changed, notify neighbors CSE 123 Lecture 13: Distance-vector Routing 4
Step-by-Step" c(x,v) = cost for direct link from x to v Node x maintains costs of direct links c(x,v) D x (y) = estimate of least cost from x to y Node x maintains distance vector D x = [D x (y): y є N ] Node x maintains its neighbors distance vectors For each neighbor v, x maintains D v = [D v (y): y є N ] Each node v periodically sends D v to its neighbors And neighbors update their own distance vectors D x (y) min v {c(x,v) + D v (y)} for each node y N CSE 123 Lecture 13: Distance-vector Routing 5
Example: Initial State" 7 B 1 C Info at node Distance to Node A B C D E A 0 7 1 A 8 2 B 7 0 1 8 1 E 2 D C 1 0 2 D 2 0 2 E 1 8 2 0 CSE 123 Lecture 13: Distance-vector Routing 6
D sends vector to E! A 7 1 B E 8 Im 2 from C, 0 from D and 2 from E 1 2 C D 2 Info at node Distance to Node A B C D E A 0 7 1 B 7 0 1 8 C 1 0 2 D 2 0 2 E 1 8 4 2 0 D is 2 away, 2+2<, so best path to C is 4 CSE 123 Lecture 13: Distance-vector Routing 7
B sends vector to A! A 1 7 B E 8 Im 7 from A, 0 from B, 1 from C & 8 from E 1 2 C D B is 7 away, 1+7< so best path to C is 8 2 Info at node Distance to Node A B C D E A 0 7 8 1 B 7 0 1 8 C 1 0 2 D 2 0 2 E 1 8 4 2 0 CSE 123 Lecture 13: Distance-vector Routing 8
E sends vector to A! E is 1 away, 4+1<8 so C is 5 away, 1+2< so D is 3 away A 7 1 B E 8 1 2 C D 2 Info at node Distance to Node A B C D E A 0 7 5 3 1 B 7 0 1 8 C 1 0 2 D 2 0 2 E 1 8 4 2 0 Im 1 from A, 8 from B, 4 from C, 2 from D & 0 from E CSE 123 Lecture 13: Distance-vector Routing 9
until Convergence" 7 B 1 C Info at node Distance to Node A B C D E A 0 6 5 3 1 A 8 2 B 6 0 1 3 5 1 E 2 D C 5 1 0 2 4 D 3 3 2 0 2 E 1 5 4 2 0 CSE 123 Lecture 13: Distance-vector Routing 10
Node Bʼs distance vectors" Next hop 7 B 1 C Dest A E C A 7 9 6 A 8 2 C 12 12 1 D 10 10 3 1 E 2 D E 8 8 5 CSE 123 Lecture 13: Distance-vector Routing 11
Handling Link Failure" A 7 1 A marks distance to E as, and tells B E marks distance to A as, and tells B and D B and D recompute routes and tell C, E and E etc until converge B E 8 1 2 C D 2 Info at node Distance to Node A B C D E A 0 7 8 10 12 B 7 0 1 3 5 C 8 1 0 2 4 D 10 3 2 0 2 E 12 5 4 2 0 CSE 123 Lecture 13: Distance-vector Routing 12
Counting to Infinity" Distance to C 3 2 1 A B 2 C 3 4 1 A B Update 3 C 5 4 1 A B Update 4 C Etc CSE 123 Lecture 13: Distance-vector Routing 13
Why so High?" Updates dont contain enough information Cant totally order bad news above good news B accepts As path to C that is implicitly through B! Aside: this also causes delays in convergence even when it doesnt count to infinity CSE 123 Lecture 13: Distance-vector Routing 14
Mitigation Strategies" Hold downs As metric increases, delay propagating information Limitation: Delays convergence Loop avoidance Full path information in route advertisement Explicit queries for loops (e.g. DUAL) Split horizon Never advertise a destination through its next hop» A doesnt advertise C to B Poison reverse: Send negative information when advertising a destination through its next hop» A advertises C to B with a metric of» Limitation: Only works for loops of size 2 CSE 123 Lecture 13: Distance-vector Routing 15
Poison Reverse Example" If Z routes through Y to get to X: Z tells Y its (Zs) distance to X is infinite (so Y wont route to X via Z) 60 X 4 Y 50 1 Z CSE 123 Lecture 13: Distance-vector Routing 16
Split Horizon Limitations" B A tells B & C that D is unreachable 1 1 B computes new route through C A 1 C Tells C that D is unreachable (poison reverse) Tells A it has path of cost 3 (split horizon doesn t apply) 1 A computes new route through B A tells C that D is now reachable D Etc CSE 123 Lecture 13: Distance-vector Routing 17
Routing Information Protocol" DV protocol with hop count as metric Infinity value is 16 hops; limits network size Includes split horizon with poison reverse Routers send vectors every 30 seconds With triggered updates for link failures Time-out in 180 seconds to detect failures RIPv1 specified in RFC1058 www.ietf.org/rfc/rfc1058.txt RIPv2 (adds authentication etc.) in RFC1388 www.ietf.org/rfc/rfc1388.txt CSE 123 Lecture 13: Distance-vector Routing 18
Link-state vs. Distance-vector" Message complexity LS: with n nodes, E links, O(nE) messages sent DV: exchange between neighbors only Speed of Convergence LS: relatively fast DV: convergence time varies May be routing loops Count-to-infinity problem Robustness: what happens if router malfunctions? LS: DV: Node can advertise incorrect link cost Each node computes only its own table Node can advertise incorrect path cost Each nodes table used by others (error propagates) CSE 123 Lecture 13: Distance-vector Routing 19
Routing so far " Shortest-path routing Metric-based, using link weights Routers share a common view of path goodness As such, commonly used inside an organization RIP and OSPF are mostly used as intradomain protocols But the Internet is a network of networks How to stitch the many networks together? When networks may not have common goals and may not want to share information CSE 123 Lecture 13: Distance-vector Routing 20
For next time " Read Ch. 4.3.3-4 in P&D Keep moving on Project 2 CSE 123 Lecture 13: Distance-vector Routing 21
Chapter 13 Routing Protocols (RIP, OSPF, BGP) INTERIOR AND EXTERIOR ROUTING RIP OSPF BGP 1 The McGraw-Hill Companies, Inc., 2000 1 Adapted for use at JMU by Mohamed Aboutabl, 2003
Introduction Packets may pass through several networks on their way to destination Each network carries a price tag, or a metric The metric of a network may be: constant (i.e. each network costs one hop) Service type-dependent (the cost of the network depends on what service the packet needs: e.g. throughput, delay,.. etc.) Policy-dependent: a policy defines what paths should, or should not, be followed. The router uses a routing table to determine the path Static vs. Dynamic routing tables. 2 The McGraw-Hill Companies, Inc., 2000 2 Adapted for use at JMU by Mohamed Aboutabl, 2003
13.1 Interior & Exterior Routing Autonomous system: a group of networks and routers under authority of a single administrator 3 The McGraw-Hill Companies, Inc., 2000 3 Adapted for use at JMU by Mohamed Aboutabl, 2003
Popular routing protocols 4 The McGraw-Hill Companies, Inc., 2000 4 Adapted for use at JMU by Mohamed Aboutabl, 2003
13.2 RIP: Routing Information Protocol Distance Vector Routing Share the most you know about the entire autonomous system Share with all your direct neighbors, and them only Share periodically, e.g. every 30 seconds Destination Hop Count Next Hop Other Info 163.5.0.0 7 172.6.23.4 197.5.13.0 5 176.3.6.17 189.45.0.0 4 200.5.1.6 5 The McGraw-Hill Companies, Inc., 2000 5 Adapted for use at JMU by Mohamed Aboutabl, 2003
RIP Updating Algorithm Receive: a response RIP message 1. Add one to the hop count for each advertised destination 2. Repeat for each advertised destination If ( destination is not in my routing table) Add the destination to my table Else If ( next-hop field is the same) Replace existing entry with the new advertised one Else if (advertised hop-count after incrementing- is smaller) Replace existing entry with the new advertised one 6 The McGraw-Hill Companies, Inc., 2000 6 Adapted for use at JMU by Mohamed Aboutabl, 2003
Example of updating a routing table Receive: a response RIP message 1. Add one to the hop count for each advertised destination 2. Repeat for each advertised destination If ( destination is not in my routing table) Add the destination to my table Else If ( next-hop field is the same) Replace existing entry with the new advertised one Else if (advertised hop-count after incrementing- is smaller) Replace existing entry with the new advertised one 7 The McGraw-Hill Companies, Inc., 2000 7 Adapted for use at JMU by Mohamed Aboutabl, 2003
Initial routing tables in a small autonomous system Configuration File Directly attached networks Hop-count = 1 8 The McGraw-Hill Companies, Inc., 2000 8 Adapted for use at JMU by Mohamed Aboutabl, 2003
Final routing tables for the previous autonomous system RIP messages are exchanged Routing tables are updated 9 The McGraw-Hill Companies, Inc., 2000 9 Adapted for use at JMU by Mohamed Aboutabl, 2003
RIP message format 1: Request 2: Response 1 or 2 Address Family Identifier 2: TCP/IP family 12 Bytes up to 25 AFIs Hops from advertising router to dest. network 10 The McGraw-Hill Companies, Inc., 2000 10 Adapted for use at JMU by Mohamed Aboutabl, 2003
RIP Request Messages Sent by a router when booted, or when an entry times-out May request updates for ALL networks, or specific one(s) RIP Response Messages Solicited responding to a previous request Unsolicited (sent periodically to all neighbors) 11 The McGraw-Hill Companies, Inc., 2000 11 Adapted for use at JMU by Mohamed Aboutabl, 2003
Example 1 What is the periodic response sent by router R1? Assume R1 knows about the whole autonomous system. 12 The McGraw-Hill Companies, Inc., 2000 12 Adapted for use at JMU by Mohamed Aboutabl, 2003
RIP Timers Periodic Timer ( 25 < random < 35): controls advertising of update messages. There ONE such timer Expiration Timers: governs route validity. Reset upon receipt of an update. If it ever expires, destination is considered unreachable. Yet, entry is not removed from table, it continues to be advertised with hop count = 16 ( i.e. infinity) Garbage Collection Timers: Reset to 120sec when a route is invalidated. If it expires, the route entry is completely removed from routing table 13 The McGraw-Hill Companies, Inc., 2000 13 Adapted for use at JMU by Mohamed Aboutabl, 2003
Example 2 A routing table has 20 entries. It does not receive information about five routes for 200 seconds. How many timers are running at this time? Solution The timers are listed below: Periodic timer: 1 Expiration timer: 20-5 = 15 Garbage collection timer: 5 14 The McGraw-Hill Companies, Inc., 2000 14 Adapted for use at JMU by Mohamed Aboutabl, 2003
RIP Problems: 1) Slow convergence Network topology changes propagate slowly (avg. 15 sec per hop) Solution: Limit the diameter of an autonomous system to 15 hops. 15 The McGraw-Hill Companies, Inc., 2000 15 Adapted for use at JMU by Mohamed Aboutabl, 2003
RIP Problems: 2) Instability Net1 is disconnected from Router A Router A updates its hop count to 16 Router A waits for 30 seconds before sending it advertisement Router B advertises Net1 (with hop-count =2) to A before A has a chance to advertise that Net1 is disconnected A is fooled and sets its Hop-count to 2+1=3 16 The McGraw-Hill Companies, Inc., 2000 16 Adapted for use at JMU by Mohamed Aboutabl, 2003
Remedies for RIP Instability Triggered Update: Send an immediate update (with hop count =16) whenever a network becomes unreachable, otherwise send periodic updates. Split Horizons: Never sent same information back to the interface it came from 17 The McGraw-Hill Companies, Inc., 2000 17 Adapted for use at JMU by Mohamed Aboutabl, 2003
Remedies for RIP Instability: Poison reverse A variation of Split Horizon. 18 The McGraw-Hill Companies, Inc., 2000 18 Adapted for use at JMU by Mohamed Aboutabl, 2003
RIP-v2 Format: Same length as in RIP-v1 AS number RIP version 2 supports CIDR. RIP messages are encapsulated in a UDP datagram RIP uses the services of UDP on well-known port 520. or prefix useful if 2 AS share a backbone network 19 The McGraw-Hill Companies, Inc., 2000 19 Adapted for use at JMU by Mohamed Aboutabl, 2003
Authentication Protect against unauthorized advertisement First entry (with family type = FFFF) is used for authontication 20 The McGraw-Hill Companies, Inc., 2000 20 Adapted for use at JMU by Mohamed Aboutabl, 2003
Computer Communications - CSCI 551 CS551 External v.s. Internal BGP Bill Cheng http://merlot.usc.edu/cs551-f12 Copyright William C. Cheng 1
Exterior vs. Interior World vs. me EGP vs. IGP Computer Communications - CSCI 551 Little control vs. complete administrative control BGP (and GGP, Hello, EGP) vs. (RIP, OSPF, IS-IS, IGRP, EIGRP) Copyright William C. Cheng 2
Learning Routes Computer Communications - CSCI 551 AS1 R1 R2 R3 BGP R4 AS2 BGP can be used by R3 and R4 to learn routes. How do R1 and R2 learn routes? (How does R3 pass on the routes that is has learned to R1 and R2?) Option 1: Inject routes in IGP (such as OSPF) only works for small routing tables Option 2: Use I-BGP Copyright William C. Cheng 3
Why BGP as an IGP? Computer Communications - CSCI 551 I-BGP has mechanisms to forward BGP policy directives across an AS Often use I-BGP with some other IGP (such as OSPF) that does internal routing Copyright William C. Cheng 4
I-BGP Computer Communications - CSCI 551 Upstream Provider A AS100 Upstream Provider B AS200 ebgp ebgp AS 1 AS 2 ibgp ibgp ebgp Copyright William C. Cheng 5
E-BGP vs. I-BGP E-BGP connects AS s (external GP) I-BGP is intra-as (internal GP) Differences in operation direct vs. indirect connections different failure modes special attributes for internal use Computer Communications - CSCI 551 Copyright William C. Cheng 6
Internal BGP (I-BGP) Computer Communications - CSCI 551 Same message types, attribute types, and state machine as E-BGP Different rules about re-advertising prefixes: prefix learned from E-BGP can be advertised to I-BGP neighbor and vice-versa, but prefix learned from one I-BGP neighbor cannot be advertised to another I-BGP neighbor reason: no AS-PATH within the same AS and thus danger of looping Copyright William C. Cheng 7
Internal BGP (I-BGP) Computer Communications - CSCI 551 R1 R3 E-BGP R4 R2 I-BGP R3 can tell R1 and R2 prefixes from R4 R3 can tell R4 prefixes from R1 and R2 R3 cannot tell R2 prefixes from R1 Copyright William C. Cheng 8
Internal BGP (I-BGP) Computer Communications - CSCI 551 R1 R2 R3 E-BGP R4 I-BGP R3 can tell R1 and R2 prefixes from R4 R3 can tell R4 prefixes from R1 and R2 R3 cannot tell R2 prefixes from R1 R2 can only find these prefixes through a direct connection to R1 Result: I-BGP routers must be fully connected (via TCP)! contrast with E-BGP sessions that map to physical links Copyright William C. Cheng 9
I-BGP Computer Communications - CSCI 551 Upstream Provider A Upstream Provider B ebgp AS100 I-BGP mesh AS200 ebgp ibgp AS 1 AS 2 ibgp ebgp Copyright William C. Cheng 10
BGP Example Computer Communications - CSCI 551 R1 advertises routes within AS1 to R2 R2 advertises routes within AS2 and AS3 to R1 R2 learns AS3 routes from I-BGP with R4 R4 learns AS3 routes from E-BGP with R6 R4 advertises routes within AS2 and AS1 to R6 AS1 R1 E-BGP AS2 I-BGP R2 R5 R3 R4 E-BGP R6 AS3 Copyright William C. Cheng 11
Two types of link failures: failure on an E-BGP link failure on an I-BGP Link Link Failures Computer Communications - CSCI 551 These failures are treated completely different in BGP Why? Copyright William C. Cheng 12
Failure on an E-BGP Link Computer Communications - CSCI 551 E-BGP session AS1 R1 R2 AS2 Physical link 138.39.1.1/30 138.39.1.2/30 Note that 138.39.1.1 and 138.39.1.2 are on the same network If the link R1-R2 goes down, then the TCP connection breaks and so does the E-BGP connection; BGP routes are removed This is the desired behavior Copyright William C. Cheng 13
Failure on an I-BGP Link Computer Communications - CSCI 551 138.39.1.2/30 R2 Physical link 138.39.1.1/30 R1 I-BGP session R3 If physical link R1-R2 goes down, the 138.39.1.0/30 network becomes unreachable, connection between R1 and R2 is lost R1 and R2 should, in theory, still be able to exchange traffic, i.e., the indirect path through R3 should be used given the above configuration, it would not work! thus, E-BGP and I-BGP must use different conventions with respect to TCP endpoints Note: I-BGP often does not go over a physical link Copyright William C. Cheng 14
Virtual Interfaces (VIFs, a.k.a. Loop-back Interfaces) Computer Communications - CSCI 551 138.39.128.1/30 138.39.128.5/30 R1 vif I-BGP session Physical link vif R2 138.39.1.1/30 138.39.1.2/30 Note that 138.39.128.1 and 138.39.128.5 are on different networks here! A VIF is not associated with a physical link or hardware interface How do routers learn of VIF addresses? use IGP Copyright William C. Cheng 15
Scaling the I-BGP Mesh Computer Communications - CSCI 551 Two methods: BGP confederations scale by adding hierarchy to AS (sub-as) Route reflectors scale by adding hierarchical IBGP route forwarding Copyright William C. Cheng 16
AS Confederation Computer Communications - CSCI 551 Subdivide a single AS into multiple, internal sub-as s to reduce I-BGP mesh size simple hierarchy but only one level Still advertises a single AS to external peers internally use sub-as s Copyright William C. Cheng 17
An AS Confederation Computer Communications - CSCI 551 AS1 AS Confederation Sub-AS s 10 11 14 12 13 R1 R2 does not see sub-as 10-14, but sees AS1 R2 AS2 Copyright William C. Cheng 18
Confederations Computer Communications - CSCI 551 BGP sessions between sub-as s are like regular E-BGP but with some changes: local-pref attribute remains meaningful within confederation (E-BGP ignores it) next-hop attribute traverses sub-as boundaries (assumes single IGP running - everyone has same route to next-hop) AS-PATH now includes AS-CONFED-SET and AS-CONFED-SEQUENCE to avoid loops Copyright William C. Cheng 19
BGP Confederation Computer Communications - CSCI 551 AS300 AS10 AS20 AS30 Copyright William C. Cheng 20
Route Reflectors Computer Communications - CSCI 551 Route Reflector (RR): router whose BGP implementation allows re-advertisement of routes between I-BGP neighbors RR runs modified I-BGP Route Reflector Client (RRC): router that depends on RR to re-advertise its routes to entire AS. It also depends on RR to learn routes from the rest of the network RRC runs normal I-BGP Copyright William C. Cheng 21
RR Example Computer Communications - CSCI 551 128.4.0.0/16 RR-C1 RR3 AS1 RR-C3 RR1 RR2 RR-C2 modified I-BGP RR-C4 I-BGP E-BGP neighbor AS2 E-BGP 138.39.0.0/16 With RR there are 7 I-BGP sessions instead of 21 (=7*6/2) Copyright William C. Cheng 22
Rules for Route Reflectors Computer Communications - CSCI 551 Reflectors advertise routes learned from clients into the I-BGP mesh RR1 advertises 138.39.0.0/16 learned from RRC2 into I-BGP Reflectors do not re-advertise routes between non-clients RR1 will not re-advertise 128.4.0.0/16 learned from RR3 to RR2 Copyright William C. Cheng 23