A Technique for Reducing BGP Update Announcements through Path Exploration Damping Geoff Huston, Mattia Rossi, Grenville Armitage mrossi@swin.edu.au Centre for Advanced Internet Architectures (CAIA) Swinburne University of Technology Terminology and BGP recap BGP messages between BGP speakers (peers): OPEN KEEP-ALIVE UPDATE NOTIFICATION ROUTE-REFRESH UPDATE messages: carry announcements or withdrawals or both for the same -path announcements or withdrawals are a list of prefixes update packing -path: list of es to be traversed to reach the originator/owner of the prefix -path length is the main criteria for deciding the best path Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 2
Terminology and BGP recap BGP speaker uses 3+1 Tables: ADJ-RIB-IN (per peer) RIB (routing table) ADJ-RIB-OUT (per peer) decision making takes place in RIB using information from ADJ-RIB-IN ADJ-RIB-OUT used for UPDATEs per peer FIB, forwarding table 2 Layers: Control plane (routing plane) exchange of routing information Data plane (forwarding plane) packet forwarding Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 3 BGP dynamics Simplified example for ebgp: 5 2 3 4 1 Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 4
BGP dynamics 5 Path: 1 2 3 4 1 Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 5 BGP dynamics Path: 2,1 5 Path: 1 2 3 4 Path: 2,1 1 Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 6
BGP dynamics Path: *2,1 3,2,1 5 Path: 5,2,1 Path: 1 2 3 4 Path: 2,1 Path: 3,2,1 1 Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 7 BGP dynamics Path: *2,1 3,2,1 4,3,2,1 5 Path: 5,2,1 Path: 1 2 3 4 Path: 2,1 Path: 3,2,1 1 Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 8
What is Path Exploration? From where wepath: left before: *2,1 3,2,1 4,3,2,1 5 Path: 5,2,1 2 3 4 Path: 2,1 Path: 3,2,1 1 Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 9 What is Path Exploration? Path: *3,2,1 4,3,2,1 5 Path: 5,2,1 W 2 W 3 4 Path: 3,2,1 1 Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 10
What is Path Exploration? Path: *4,3,2,1 5 A Path: 5,3,2,1 W 2 3 W 4 1 Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 11 What is Path Exploration? 5 A Path: 5,4,3,2,1 W 2 3 4 1 Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 12
What is Path Exploration? 5 W 2 3 4 1 Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 13 Path Exploration consequences Unnecessary load! Every update triggers one or more decision processes CPU load Every peer sends updates peers CPU load Path Exploration unnecessary updates unnecessary CPU load Unnecessary network load Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 14
Current methods to avoid unnecessary updates Old knowledge - countermeasures exist Outgoing updates depend on arrival time of incoming updates Three methods available in current BGP systems All three based on timing: MRAI - Minimum Route advertisement Interval Delays announcements Goal: Prevention of interim announcements RFD - Route Flap Damping Delays withdrawals and announcements for unstable peers for 24 hrs Goal: Prevention of updates caused by flapping routes WRATE - Withdrawal rate limiting Delays withdrawals and announcements Goal: Allow peer to stabilize before sending updates Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 15 The Problem Current status is not ideal: RFD is strongly discouraged WRATE likely disrupts data delivery MRAI widely deployed with a timer of 30 seconds + random jitter for ebgp often not deployed on a per-prefix basis random behavior We need a better solution! Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 16
Update categorization Code AA+ AA- AA0 AA* AA AW NA Description Announcement of an already announced prefix with a longer Path (update to longer path) Announcement of an announced prefix with a shorter Path (update to shorter path) Announcement of an announced prefix with a different path of the same length (update to a different Path of same length) Announcement of an announced prefix with the same path but different attributes (update of attributes) Announcement of an announced prefix with no change in path or attributes (possible BGP error or data collection error) Withdrawal of an announced prefix Announcement of a previously unknown or withdrawn prefix Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 17 Update sequences Path Exploration example by update sequence: {NA, AA+, AA+, AW} We define Path Exploration: An update sequence lengthening the -path gradually until stability is reached Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 18
PED Getting rid of unnecessary updates the smart way Path Exploration Damping PED Algorithm: Delay updates announcing a longer (or equal-length) -path (AA+,AA0,AA*,AA) Immediately send announcements of a shorter -path or withdrawals (AA-,AW) Do not delay initial announcements (NA) Perform output queue compression Introduction of the Path Exploration Damping Timer PEDI PED does not change the BGP protocol PED can be deployed incrementally Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 19 PED Implementation Quagga 0.99.13 Implemented in ADJ-RIB-OUT (does not affect decision process) Per-peer and per-prefix basis Added jitter like MRAI does Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 20
PED Reduction in update load Experiments using 24 hours of real BGP updates Two datasets: 1. APNIC (2 peers) 2. Routeviews (5 peers) Replayed using the Quagga-Accelerator 293 11537 3727 4777 4608 2914 48285 6447 Router 2 Routeviews Router 1 65102 Router 1 65102 APNIC 131702 Router 2 Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 21 PED Reduction in update load Used a range of PEDI settings based on intervals of incoming Path Exploration sequences: 1. APNIC: 30s 70s, 5s steps 2. Routeviews: 5s 75s, 5s steps Compared to 0s MRAI (no delay Juniper default) and 30s MRAI (Cisco default) Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 22
Incoming Path Exploration intervals APNIC CDF 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 CDF Interval dis tribution 10k 1k 100 10 1 log(number of Path Exploration events ) 0 30 60 90 120 150 180 210 240 270 300 interval times in s econds Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 23 Incoming Path Exploration intervals Routeviews CDF 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 CDF Interval dis tribution 100k 10k 1k 100 10 1 log(number of Path Exploration events ) 0 30 60 90 120 150 180 210 240 270 300 interval times in s econds Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 24
Results APNIC number of mes s ages 160k 140k 120k 100k 80k 60k 40k 20k 0 BGP update mes s ages Prefix announcements Prefix withdrawals 70 s ec PEDI 65 s ec PEDI 60 s ec PEDI 55 s ec PEDI 50 s ec PEDI 45 s ec PEDI 40 s ec PEDI 35 s ec PEDI 30 s ec PEDI 0 s ec MRAI 30 s ec MRAI Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 25 Results Routeviews number of mes s ages 150k 100k 50k BGP update mes s ages Prefix announcements Prefix withdrawals 0 75 s ec PEDI 70 s ec PEDI 65 s ec PEDI 60 s ec PEDI 55 s ec PEDI 50 s ec PEDI 45 s ec PEDI 40 s ec PEDI 35 s ec PEDI 30 s ec PEDI 25 s ec PEDI 20 s ec PEDI 15 s ec PEDI 10 s ec PEDI 5 s ec PEDI 0 s ec MRAI 30 s ec MRAI Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 26
Results in Numbers Reduction of announcements compared to 30s MRAI: APNIC dataset: 20% for 35s PEDI 29% for 65s PEDI Routeviews dataset: 18% for 35s PEDI 32% for 65s PEDI Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 27 Good results but... What about convergence time!? Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 28
What is convergence in BGP? Convergence is commonly understood as: All es participating in a BGP system have received the latest, up-to-date routing information for a certain prefix A BGP system is stable Approximated by: How long does it take for the upstream peer to have a stable route to a prefix? Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 29 Convergence time approximation APNIC data 800 700 600 Upper and lower quartile Mean Median 500 s econds 400 200 150 100 50 0 70s PEDI 65s PEDI 60s PEDI 55s PEDI 50s PEDI 45s PEDI 40s PEDI 35s PEDI 30s PEDI 0s MRAI 30s MRAI Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 30
Convergence time approximation Routeviews data seconds 800 700 600 500 400 250 200 Upper and lower quartile Mean Median 150 100 50 0 70 sec PDI 65 sec PDI 60 sec PDI 55 sec PDI 50 sec PDI 45 sec PDI 40 sec PDI 35 sec PDI 30 sec PDI 25 sec PDI 20 sec PDI 15 sec PDI 10 sec PDI 5 sec PDI 0 sec MRAI 30 sec MRAI Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 31 Convergence refined optimality vs. reachability Convergence in BGP is actually twofold: Optimality: Every in the BGP system has the best path to the originator/owner of the prefix Control plane convergence The BGP system is stable Reachability: Every in the BGP system has a path to the originator/owner of the prefix Path doesn t need to be valid, data delivery needs to be ensured Data plane convergence BGP system doesn t need to be stable to achieve this state Control plane convergence only needed up to an altbgp speaker Reachability ensures data delivery, optimality the best path Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 32
Convergence refined optimality vs. reachability We have 4 possible events that trigger instability: T long A link failure triggers an announcement of a longer (or equal-length) -path T short A link recovery triggers an announcement of a shorer -path T down A link failure triggers a withdrawal T up A link recovery triggers a new announcement Reachability and optimality are achieved at different times, depending on the event: T long Reachability is achieved before optimality T short Reachability is already ensured, only optimality needs to be achieved T down Reachability can not be achieved, optimality is achieved when every peer has withdrawn the route T up Reachability and optimality are mostly achieved at the same time, optimality can be delayed by timers though Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 33 Convergence with MRAI at 30s and PEDI at 35s Announcement of initial route at 6 : PED: 5 seconds MRAI: 60-120 seconds Stable System: * 1 * 2,1 *3,2,1 *4,3,2,1 6,13,12,11,10,1 *13,12,11,10,1 5,4,3,2,1 2 3 4 5 6 * 1 *10,1 Data to 1 *11,10,1 22,21,20,10,1 1 10 11 12 13 Prefix Origin: 1.0.0.0/8 *12,11,10,1 20 21 22 30 *10,1 *20,10,1 *21,20,10,1 12,11,10,1 *13,12,11,10,1 Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 34
Convergence MRAI at 30s and PEDI at 35s T long between 10 and 11 : Reachability achieved ( 11 ) PED: 0 seconds MRAI: 0-4 or 29-30 seconds * 1 * 2,1 *3,2,1 *4,3,2,1 6,13,12,11,10,1 *13,12,11,10,1 5,4,3,2,1 2 3 4 5 6 * 1 Data to 1 *22,21,20,10,1 *12,22,21,20,10,1 1 10 11 12 13 Prefix Origin: 1.0.0.0/8 *12,11,10,1 *10,1 *20,10,1 20 21 22 30 *21,20,10,1 *13,12,11,10,1 Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 35 Convergence MRAI at 30s and PEDI at 35s T long between 10 and 11 : MRAI goes further within 1-30 seconds * 1 * 2,1 *3,2,1 *4,3,2,1 6,13,12,11,10,1 *5,4,3,2,1 13,12,22,21,20,10,1 2 3 4 5 6 * 1 Data to 1 *22,21,20,10,1 *12,22,21,20,10,1 1 10 11 12 13 Prefix Origin: 1.0.0.0/8 *12,22,21,20,10,1 *10,1 *20,10,1 20 21 22 30 *21,20,10,1 *13,12,11,10,1 Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 36
Convergence MRAI at 30s and PEDI at 35s T long between 10 and 11 : Optimality achieved: PED: 66 seconds (+-jitter) MRAI: 2-58 seconds * 1 * 2,1 *3,2,1 *4,3,2,1 *5,4,3,2,1 13,12,22,21,20,10,1 2 3 4 5 6 * 1 Data to 1 *22,21,20,10,1 *12,22,21,20,10,1 1 10 11 12 13 Prefix Origin: 1.0.0.0/8 *12,22,21,20,10,1 *10,1 *20,10,1 *21,20,10,1 20 21 22 30 *13,12,22,21,20,10,1 Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 37 Convergence MRAI at 30s and PEDI at 35s T short between 10 and 11 : Optimality achieved: PED: 2 seconds MRAI: 31-33 and 55-60 seconds * 1 * 2,1 *3,2,1 *4,3,2,1 6,13,12,11,10,1 *13,12,11,10,1 5,4,3,2,1 2 3 4 5 6 * 1 *10,1 Data to 1 *11,10,1 22,21,20,10,1 1 10 11 12 13 Prefix Origin: 1.0.0.0/8 *12,11,10,1 *10,1 *20,10,1 20 21 22 30 *21,20,10,1 12,11,10,1 *13,12,11,10,1 Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 38
Convergence MRAI at 30s and PEDI at 35s Announcement of initial route at 6 : PED: 5 seconds MRAI: 60-120 seconds T down at 1 Optimality achieved (all routes withdrawn): PED: 0 seconds MRAI: 0 seconds T up at 1 Optimality achieved (same as initial anouncement): PED: 5 seconds MRAI: 32-34, 58-60, 76-90 seconds Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 39 Conclusions PED diminishes update load PED delays optimality in some cases PED is faster than MRAI for T up and T short, slower for T long and same for T down PED is more consistent than MRAI 35 second PEDI is a safe default value Can be deployed incrementally Interacts well with MRAI Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 40
Future work Dynamic PEDI calculation Improvement of heuristics / addition of new heuristics Improvement of implementation Improvement of evaluation tools Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 41 Thank you! Questions? Architectures http://www.caia.swin.edu.au mrossi@swin.edu.au 16 December, 2010 42