State of routing research
Olivier Bonaventure, with Pierre François, Bruno Quoitin and Steve Uhlig
Dept. Computing Science and Engineering, Université catholique de Louvain (UCL)
http://www.info.ucl.ac.be/people/obo
January 15th, 2005
Page 1
Agenda
- Intradomain unicast routing
  - Improving IGP convergence
  - Fast-reroute techniques
- Interdomain unicast routing
Page 2
Simulated network GEANT Page 3
Simulation model
- Router model
  - Cisco 12000
  - Based on measurements performed by Clarence Filsfils
- Key parameters
  - Time to produce a new LSP: 2 msec
  - Failure detection: [10,15] msec
  - SPF computation time
    - 2-4 msec for a 22-node network
    - 20-30 msec for a 200-node network
  - FIB update time: 100-110 microseconds per prefix
    - the total FIB update time is usually much larger than the SPF computation time
  - Pacing time: 33 msec (ISIS default)
  - Fast flooding: on or off
- Simulator
  - SSFNET, simulations by Pierre François
Page 4
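The remark that FIB updates usually dominate SPF computation can be checked with simple arithmetic. The sketch below uses the midpoints of the ranges quoted on the slide; the prefix counts are hypothetical values chosen only for illustration, not figures from the simulations.

```python
# Midpoints of the ranges quoted on the slide.
FIB_UPDATE_US_PER_PREFIX = 105.0   # 100-110 microseconds per prefix
SPF_MS_200_NODES = 25.0            # 20-30 msec for a 200-node network


def fib_update_ms(num_prefixes: int) -> float:
    """Total time (msec) to update num_prefixes FIB entries one by one."""
    return num_prefixes * FIB_UPDATE_US_PER_PREFIX / 1000.0


# Hypothetical prefix counts (assumption, not taken from the slides).
for n in (100, 1000, 5000):
    ratio = fib_update_ms(n) / SPF_MS_200_NODES
    print(f"{n} prefixes: FIB update {fib_update_ms(n):.1f} msec, "
          f"{ratio:.1f}x the SPF time of a 200-node network")
```

Under these assumptions, the FIB update exceeds the SPF time as soon as a few hundred prefixes have to be rewritten, which is consistent with the recommendation to keep the number of IGP prefixes small.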
How to evaluate IGP convergence?
- Packet-based approach
  - Often used to perform measurements
- Principle
  - Starting shortly before the failure, send a constant stream of packets from each router to every other router in the network
  - Count the number of packets that
    - arrive in sequence at their destination
    - are sent over failed links
    - loop in the network due to transient loops
    - are dropped inside routers due to an unreachable destination
  - Derive the convergence time for each source/destination pair affected by the failure
- Drawbacks
  - Huge simulation cost, as most packets are useless
  - Each packet only samples the routing tables of the routers that it passes through
Page 5
How to evaluate convergence? (2)
- The Nettester approach
  - After each physical failure, failure detection or FIB update, check the consistency of the routing tables for each router-router pair
- Definition
  - Routing is consistent for a pair S-D at time t if all the paths that packets would follow from S to D, based on the FIBs of the routers at time t, are loop-free and end at D without passing through a failed link.
- Principle
  - Before the failure, routing is consistent
  - Convergence time is the time when routing becomes and remains consistent for all router-router pairs
  - Consistency is checked by using the loopback addresses of the routers as source and destination
- Note that a packet-based definition could find a lower convergence time than the consistency time
Page 6
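The consistency definition above can be sketched as a walk over the FIBs. This is a minimal illustration, not the Nettester implementation: the FIB representation (a dictionary mapping each router to its next hops per destination), the function names and the sample format are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Fib = Dict[str, Dict[str, Set[str]]]   # router -> destination -> next hops
Link = Tuple[str, str]                 # directed link (from, to)


def is_consistent(fibs: Fib, src: str, dst: str, failed: Set[Link]) -> bool:
    """True if every path from src to dst implied by the FIBs is loop-free,
    ends at dst and avoids all failed links."""
    def walk(node: str, on_path: FrozenSet[str]) -> bool:
        if node == dst:
            return True
        next_hops = fibs.get(node, {}).get(dst, set())
        if not next_hops:
            return False                      # no route: packet is dropped
        for nh in next_hops:
            if nh in on_path:                 # forwarding loop
                return False
            if (node, nh) in failed:          # path uses a failed link
                return False
            if not walk(nh, on_path | {nh}):
                return False
        return True
    return walk(src, frozenset({src}))


def convergence_time(samples: List[Tuple[float, bool]]) -> Optional[float]:
    """samples: (time, all pairs consistent?) sorted by time, starting at the
    failure. Returns the time when routing becomes and remains consistent."""
    t_conv = None
    for t, ok in samples:
        if not ok:
            t_conv = None
        elif t_conv is None:
            t_conv = t
    return t_conv
```

The recursive walk explores every equal-cost branch, so it is exponential in the worst case; it is meant to make the definition concrete rather than to scale to large topologies.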
All link failures in GEANT Page 7
Recommendations for link failures
- Initial wait
  - Should be as small as possible to improve convergence in case of link failures
  - In Sprint's network, 70% of the failures are link failures
- FIB size
  - A small FIB size is important to ensure fast convergence
  - Reducing the number of prefixes advertised by the IGP reduces convergence time
- IGP weights
  - Should be set to reroute as locally as possible
Page 8
Router failures
- Used router failures as a way to model SRLG failures
  - Little SRLG information is available for the GBLX and GEANT topologies
  - Detecting SRLG information from traces is difficult
- What happens when a router fails?
  - All its links fail and its neighbors detect the link failures within 10-15 msec
  - All neighbors flood their new LSP
Page 9
Convergence time for router failures
- Modification to Nettester
- Definition
  - Routing is consistent for a pair S-D at time t if all the paths that packets would follow from S to D, based on the FIBs of the routers at time t, are loop-free and end at D without passing through the failed node
- Principle
  - Before the failure, routing is consistent
  - Convergence time is the time when routing becomes and remains consistent for all router-router pairs (excluding the failed router)
  - Consistency is checked by using the loopback addresses of the routers as sources and destinations
Page 10
All router failures in GEANT Static FIB updates, 33 msec pacing Page 11
All router failures in GEANT Static FIB updates, fast flooding Page 12
All router failures in GEANT Incremental FIB updates, fast flooding Page 13
Agenda
- Intradomain unicast routing
  - Improving IGP convergence
  - Fast-reroute techniques
- Interdomain unicast routing
Page 14
First step
- How to provide sub-50 msec recovery in pure IP networks?
  - When a (directed) link fails, the router that detects the failure immediately reroutes the packets to a loop-free alternate router
  - This loop-free alternate router is precomputed
- What is a loop-free alternate router?
  - For the failure of link S->E and destination D, this is a router N whose shortest path to reach D does not contain S->E
Page 15
Loop-free neighbor
[Figure: example topology with routers S, E and W, link weights 1, 10 and 1, showing SPT(W)]
- Loop-free neighbor detection algorithm for protected link S->E
  - For each direct neighbor Ni of S (link S->Ni)
    - Compute SPT(Ni)
    - If (S->E) is not in SPT(Ni), then Ni is a candidate loop-free neighbor for all destinations
    - Otherwise it is not
Page 16
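The detection algorithm can be sketched with a plain Dijkstra computation. This is an illustrative sketch under stated assumptions: the graph representation, the function names and the four-router topology are hypothetical, equal-cost paths are handled by treating a link as part of SPT(Ni) if it lies on any shortest path from Ni, and E itself is skipped because it sits behind the protected link.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, int]]   # router -> neighbor -> IGP weight (directed)


def shortest_distances(graph: Graph, root: str) -> Dict[str, int]:
    """Dijkstra: shortest distance from root to every reachable router."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def link_in_spt(graph: Graph, root: str, link: Tuple[str, str]) -> bool:
    """True if the directed link lies on some shortest path starting at root."""
    u, v = link
    dist = shortest_distances(graph, root)
    return u in dist and v in dist and dist[u] + graph[u][v] == dist[v]


def loop_free_neighbors(graph: Graph, s: str, e: str) -> List[str]:
    """Neighbors Ni of s whose SPT does not contain the protected link s->e."""
    return sorted(n for n in graph[s]
                  if n != e and not link_in_spt(graph, n, (s, e)))


# Hypothetical topology (weights are assumptions for the example).
topology: Graph = {
    "S": {"E": 1, "W": 1, "N": 1},
    "E": {"S": 1, "W": 10, "N": 1},
    "W": {"S": 1, "E": 10},
    "N": {"S": 1, "E": 1},
}
print(loop_free_neighbors(topology, "S", "E"))
```

In this topology W reaches E through S (cost 2 instead of 10), so S->E belongs to SPT(W) and W cannot protect the link, whereas N reaches E directly and is a valid loop-free neighbor.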
The protectable links with loop-free neighbors in GEANT
[Figure: GEANT topology; legend: both directions protectable, one direction protectable, no direction protectable]
Page 17
Loop-free alternate routers
- How to improve the coverage?
  - Use as loop-free alternate a router that does not use the (directed) link to be protected
    - rspt(a->b)
  - Precompute a tunnel towards the loop-free alternate router to protect the link from failure
[Figure: example topology with routers N, S, W, A, B and E, weight-10 links, the direction to protect and an IP tunnel to E]
- N's routing table
  - All: via E
- E's routing table
  - N: NorthWest
  - S: SouthWest
  - W: West
  - B: South
  - A: South via B, West via W
Page 18
Are loop-free alternate routers sufficient?
- Unfortunately not...
  - Traffic is rerouted quickly after the failure, but a new IGP convergence will take place, and this convergence may cause transient loops
  - Transient loops can last hundreds of msec and cause packet losses although traffic was rerouted...
- Three solutions are discussed within IETF
  - Synchronised update of all the FIBs
  - Timer-based ordering of the updates of the FIBs
  - Distributed ordering of the updates of the FIBs
- The solution developed could also handle all non-urgent topology changes
  - link brought up/down for maintenance
  - router reboot
  - change in link weights
Page 19
Agenda
- Intradomain unicast routing
- Interdomain unicast routing
  - Current issues with BGP
  - Centralising interdomain routing
Page 20
BGP routing tables are growing... again
- Main cause is multihoming growth
- A much better multihoming architecture is being developed for IPv6
Source: http://bgp.potaroo.net/as1221/bgp-active.html
Page 21
BGP policies
- Safe policies
  - customer-provider
  - shared-cost peering
  - provided that the customer-provider hierarchy is a graph without cycles
- State of the art
  - ISPs invent and deploy more complex policies
  - ISP1 is
    - shared-cost peer of ISP2 in Europe
    - customer of ISP2 in USA
    - maybe some day provider of ISP2 in Asia
- State of research
  - Few theoretical results guarantee the convergence of the BGP routing policies...
Page 22
Quality of Service inside a single domain
- The (simplified) research network approach
  - buy high bandwidth circuits and high performance routers
  - overprovision (almost) everything to avoid packet losses due to congestion
- The (simplified) commercial approach
  - design the network based on business model and expected revenue
  - overprovision where possible, use QoS techniques elsewhere
  - provide QoS on congested access links
- Current status
  - It is possible to provide stronger QoS than best effort inside large networks
Page 23
Quality of Service across domains
- Main commercial drivers
  - Interdomain VPNs
  - Voice over IP and TV/Video over IP
  - See http://cfp.mit.edu/meetings/oct04/slides/
- Issues
  - How to advertise the quality of an interdomain path?
  - How to select the interdomain path with the best quality?
  - How to verify that interdomain QoS is met?
  - How to establish interdomain MPLS LSPs?
  - How to scale to 20,000+ ASes and 200,000+ prefixes?
Page 24
BGP security
- State of the art
  - BGP misconfigurations are still too common
  - Spammers and other criminals try all kinds of techniques
  - myasn service run by RIPE
    - collects BGP routing tables from various peers
    - warns the prefix maintainer by email/SMS when their prefix is advertised by someone else
- Proposed solutions
  - S-BGP
  - soBGP
  - ...
- Deployability is the key problem
Page 25
BGP convergence
- Measurements in the global Internet
  - Labovitz et al., BGP beacons
  - BGP convergence can take O(10 sec) or more
- Recent measurements
  - 18 seconds to reroute 500k BGP routes on a 40 Gbps interface
    - http://www.lightreading.com/document.asp?site= testin
  - A few seconds to allow BGP to converge when a PE-CE link fails in a BGP/MPLS VPN
    - http://www.cisco.com/global/emea/networkers/post_e
- How to improve convergence?
  - Tune the BGP implementation
    - Ghost flushing, MRAI timer, dampening
  - Change BGP
    - Root Cause Notification proposal
Page 26
Changing BGP
- Can we replace BGP with something else?
  - IRTF routing research group produced a requirements document
  - some work by one CAIDA researcher
- HLP idea presented at SIGCOMM's HotNets
  - Assumes a hierarchical Internet and current policies
  - hybrid between path vector and link state
    - path vectors are exchanged on shared-cost peerings
    - link state packets are exchanged on customer-provider links
  - AS-based routing, not prefix-based routing
- Other types of interdomain routing
  - pure link-state
    - scalability and policies would be a key concern
  - geographical-based routing
    - how to represent policies with such a routing scheme?
Page 27
Agenda
- Intradomain unicast routing
- Interdomain unicast routing
  - Current issues with BGP
  - Centralising interdomain routing
Page 28
Centralising interdomain routing
- Principle
  - To improve the quality and performance of interdomain routing, perform routing decisions centrally inside each transit AS
- RCP: Routing Control Platform
  - proposed by AT&T and MIT
- PCE: Path Computation Element
  - under development within IETF, mainly to aid the establishment of interarea and interdomain MPLS tunnels
- Intelligent BGP Route Reflectors
  - proposed by UCL, place intelligence inside the route reflectors
Page 29
BGP-based traffic engineering
- Traffic engineering in transit ASes
  - Current solutions
    - Collect the traffic matrix and tune IGP weights
    - Collect the traffic matrix and establish MPLS tunnels with reservations
  - Another solution is possible
    - Most destinations are reachable via BGP-learned prefixes, not via IGP-learned prefixes
    - A small number of destination prefixes carry most of the transit traffic
    - BGP routes towards important prefixes are stable
    - Instead of tweaking the network topology, we can tweak the selection of the BGP routes to force packets to follow a path meeting the TE objectives
    - Proposed by Steve Uhlig and Bruno Quoitin
Page 30
Reference environment Page 31
Example scenario Congestion! Page 32
Tunnel-based solution: principles
- TE tool collects
  - IGP information for topology
  - BGP routes
  - Traffic statistics
- Based on the traffic engineering objectives, the TE tool determines the BGP route to be advertised to each border router via iBGP
  - by sending different BGP UPDATEs for each important prefix to each ingress router, the TE tool can influence the flow of the IP traffic
  - ingress routers send transit traffic inside tunnels (MPLS, GRE, ...) to egress routers
Page 33
The traffic engineering objectives
- TE tool optimises three objectives
- Objective 1
  - Reduce the number of BGP route tweakings performed by the TE tool
    - required to avoid causing routing instabilities
- Objective 2
  - Load balance the traffic over the 6 main peering links
    - load balancing is easy to show, but cost-based objectives can also be used
- Objective 3
  - Reduce the increase in transit cost of the traffic
    - transit cost =
Page 34
Tunnel-based solution : example Page 35
Evaluation of tunnel-based solution
- The network
  - GEANT network used as transit AS
  - 2 commercial providers in five locations
  - load-balancing is a problem on some links
Page 36
Evaluation of tunnel-based solution
- The traffic
  - NetFlow data was not yet available, so we generated synthetic traffic
  - Added a sinusoidal variation to the traffic amount to model hourly changes
Page 37
Evaluation of tunnel-based solution
- BGP
  - GEANT uses a full iBGP mesh
  - We received from intel an MRT trace of all iBGP messages received by one BGP router
  - Those MRT messages are fed to CBGP, which recomputes the routes
Page 38
Incremental heuristic
- Principle
  - Every period
    - Predict the amount of traffic that will be sent towards each important prefix during the next period
    - Determine the next hop for each (ingress, prefix) pair with CBGP
    - Starting from the CBGP-computed configuration, use an evolutionary algorithm to determine the best BGP tweakings to be applied on prefixes to load-balance the outgoing links
      - with the constraint that not more than 50 different prefixes can be tweaked per period
    - Apply the set of tweakings that minimizes the increase in transit cost
Page 39
Performance of tunnel-based solution Peering link 0 Page 40
Performance of tunnel-based solution Peering link 2 Page 41
Performance of tunnel-based solution Peering link 1 Page 42
Performance of tunnel-based solution Peering link 5 Page 43
Tabu heuristic
- Principle
  - Every period
    - Predict the amount of traffic that will be sent towards each important prefix during the next period
    - Determine the next hop for each (ingress, prefix) pair with CBGP
    - Starting from the network configuration at the end of the previous period, use an evolutionary algorithm to determine the best BGP tweakings to be applied on prefixes to load-balance the outgoing links, with the constraints
      - not more than N different prefixes can be tweaked per period
      - tweakings performed less than lifetime (tabu list) periods ago cannot be changed
    - Apply the set of tweakings that minimizes the increase in transit cost
      - place the new tweakings in the tabu list
Page 44
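The two constraints of the tabu heuristic can be sketched as a filter applied to candidate tweakings before they are committed. This sketch covers only the tabu bookkeeping, not the evolutionary search or the transit-cost computation; the class and function names, the (ingress, prefix) key format and the assumption that candidates arrive already ranked by the search are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Tweak = Tuple[str, str]   # (ingress router, prefix); hypothetical key format


class TabuList:
    """Remembers in which period each (ingress, prefix) pair was last tweaked."""

    def __init__(self, lifetime: int) -> None:
        self.lifetime = lifetime
        self._last_tweaked: Dict[Tweak, int] = {}

    def is_tabu(self, tweak: Tweak, period: int) -> bool:
        """A pair tweaked less than `lifetime` periods ago cannot be changed."""
        last = self._last_tweaked.get(tweak)
        return last is not None and period - last < self.lifetime

    def record(self, tweaks: Iterable[Tweak], period: int) -> None:
        """Place the tweakings applied in this period in the tabu list."""
        for tweak in tweaks:
            self._last_tweaked[tweak] = period


def admissible(candidates: List[Tweak], tabu: TabuList,
               period: int, max_tweaks: int) -> List[Tweak]:
    """Drop tabu candidates, then keep at most max_tweaks (the N of the slide).
    Candidates are assumed to be ordered from most to least useful."""
    allowed = [c for c in candidates if not tabu.is_tabu(c, period)]
    return allowed[:max_tweaks]
```

A typical period would call `admissible(...)` on the output of the search, apply the returned tweakings, and then call `record(...)` so that they stay untouched for the next `lifetime` periods.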
Performance of tunnel-based solution Tabu : Peering link 0 Page 45
Performance of tunnel-based solution Tabu : Peering link 2 Page 46
Performance of tunnel-based solution Tabu : Peering link 1 Page 47
Performance of tunnel-based solution Tabu : Peering link 5 Page 48
Reducing the impact of peering failures Simulation study Consider a single day of the GEANT analysis Simulate the failure of peering link 3 for one hour with and without the TE box Page 49
Impact of the failure of peering 3 on peering link 1 Page 50
Impact of the failure of peering 3 on peering link 0 Page 51
Impact of the failure of peering 3 on peering link 4 Page 52
Conclusion
- Intradomain routing
  - Mainly considered as an engineering problem
  - Little ongoing research activity
    - traffic engineering is an exception
    - achieving sub-50 millisecond recovery in pure IP networks is challenging
- Interdomain routing
  - BGP is key for Internet performance and VPNs
  - Both engineering and research work
    - We still do not correctly understand the theoretical limits of policy-based interdomain routing
    - QoS and security will be important challenges
    - Achieving fast recovery is still a research challenge
    - Scalability is a serious concern
Page 53