Finding a Needle in a Haystack: Pinpointing Significant BGP Routing Changes in an IP Network

Similar documents
Dynamics of Hot-Potato Routing in IP Networks

BGP Route Propagation between Neighboring Domains

Hot Potatoes Heat Up BGP Routing

Interdomain Routing Reading: Sections K&R EE122: Intro to Communication Networks Fall 2007 (WF 4:00-5:30 in Cory 277)

BGP Routing inside an AS

Some Foundational Problems in Interdomain Routing

Interdomain Routing Reading: Sections P&D 4.3.{3,4}

Impact of Hot-Potato Routing Changes in IP Networks

Inter-Autonomous-System Routing: Border Gateway Protocol

Inter-Autonomous-System Routing: Border Gateway Protocol

Inter-Domain Routing: BGP

Preventing the unnecessary propagation of BGP withdraws

Interdomain Routing. EE122 Fall 2011 Scott Shenker

CS4450. Computer Networks: Architecture and Protocols. Lecture 15 BGP. Spring 2018 Rachit Agarwal

Lecture 4: Intradomain Routing. CS 598: Advanced Internetworking Matthew Caesar February 1, 2011

Investigating occurrence of duplicate updates in BGP announcements

Feng Wang, Zhuoqing Morley Mao, Jia Wang, Lixin Gao, Randy Bush. Presenter s Qihong Wu (Dauphin)

The Case for Separating Routing from Routers

Backbone Networks. Networking Case Studies. Backbone Networks. Backbone Topology. Mike Freedman COS 461: Computer Networks.

Advanced Computer Networks

Interdomain Routing. Networked Systems (H) Lecture 11

Inter-Domain Routing: BGP II

CS 43: Computer Networks Internet Routing. Kevin Webb Swarthmore College November 14, 2013

CSCI Topics: Internet Programming Fall 2008

1 Introduction. AT&T Labs - Research. Jay Borkenhagen Dept. HA MT C5-3D

Guidelines for Interdomain Traffic Engineering

Interdomain Routing. EE122 Fall 2012 Scott Shenker

CS 204: BGP. Jiasi Chen Lectures: MWF 12:10-1pm Humanities and Social Sciences

BGP Routing Stability of Popular Destinations

A Measurement Study on the Impact of Routing Events on End-to-End Internet Path Performance

Network-Wide Prediction of BGP Routes

Inter-AS routing and BGP. Network Layer 4-1

Internet Routing Protocols Lecture 01 & 02

Lecture 18: Border Gateway Protocol

CS 43: Computer Networks Internet Routing. Kevin Webb Swarthmore College November 16, 2017

CS 43: Computer Networks. 24: Internet Routing November 19, 2018

Lecture 17: Border Gateway Protocol

Inter-Domain Routing: BGP II

Border Gateway Protocol - BGP

Lecture 3: Packet Forwarding

Interdomain routing CSCI 466: Networks Keith Vertanen Fall 2011

Network Layer (Routing)

Internet inter-as routing: BGP

Inter-AS routing. Computer Networking: A Top Down Approach 6 th edition Jim Kurose, Keith Ross Addison-Wesley

ABSTRACT. by Jian Wu

TO CONTROL the flow of traffic through their networks,

Configuring BGP. Cisco s BGP Implementation

COM-208: Computer Networks - Homework 6

BGP Convergence in Virtual Private Networks

Routing on the Internet. Routing on the Internet. Hierarchical Routing. Computer Networks. Lecture 17: Inter-domain Routing and BGP

Lecture 16: Border Gateway Protocol

Next Lecture: Interdomain Routing : Computer Networking. Outline. Routing Hierarchies BGP

Last time. Transitioning to IPv6. Routing. Tunneling. Gateways. Graph abstraction. Link-state routing. Distance-vector routing. Dijkstra's Algorithm

BGP. Daniel Zappala. CS 460 Computer Networking Brigham Young University

Lecture 16: Interdomain Routing. CSE 123: Computer Networks Stefan Savage

Chapter H through R. loss (PfR), page 28. load-balance, page 23 local (PfR), page 24 logging (PfR), page 26

Configuring BGP community 43 Configuring a BGP route reflector 44 Configuring a BGP confederation 44 Configuring BGP GR 45 Enabling Guard route

CSCD 433/533 Network Programming Fall Lecture 14 Global Address Space Autonomous Systems, BGP Protocol Routing

Chapter IV: Network Layer

Chapter 13 Configuring BGP4

Protecting an EBGP peer when memory usage reaches level 2 threshold 66 Configuring a large-scale BGP network 67 Configuring BGP community 67

Dynamics of Hot-Potato Routing in IP Networks

Configuring BGP on Cisco Routers Volume 1

Outline Computer Networking. Inter and Intra-Domain Routing. Internet s Area Hierarchy Routing hierarchy. Internet structure

BGP routing changes: Merging views of two ISPs

The Internet now-a-days has become indispensable to each and everyone. It is vulnerable to

! Distance vector routing! Link state routing.! Path vector routing! BGP: Border Gateway Protocol! Route aggregation

internet technologies and standards

BGP. Autonomous system (AS) BGP version 4

Practical Verification Techniques for Wide-Area Routing

Hierarchical Routing. Our routing study thus far - idealization all routers identical network flat not true in practice

Operation Manual BGP. Table of Contents

BGP. Autonomous system (AS) BGP version 4

Troubleshooting High CPU Caused by the BGP Scanner or BGP Router Process

ISP Border Definition. Alexander Azimov

BGP. Autonomous system (AS) BGP version 4

A Technique for Reducing BGP Update Announcements through Path Exploration Damping

Connecting to a Service Provider Using External BGP

BGP. Autonomous system (AS) BGP version 4. Definition (AS Autonomous System)

Configuring Internal BGP Features

Computer Networking Introduction

Taming BGP. An incremental approach to improving the dynamic properties of BGP. Geoff Huston. CAIA Seminar 18 August

Introduction to BGP ISP/IXP Workshops

Master Course Computer Networks IN2097

BGP Scaling Techniques

BGP Support for Next-Hop Address Tracking

CS 268: Computer Networking. Next Lecture: Interdomain Routing

Introduction to BGP. ISP Workshops. Last updated 30 October 2013

ibgp Multipath Load Sharing

TELE 301 Network Management

Flexible Forwarding. Tim Sally.

Border Gateway Protocol (an introduction) Karst Koymans. Monday, March 10, 2014

BGP. Autonomous system (AS) BGP version 4. Definition (AS Autonomous System)

Making an AS Act Like a Single Node: Toward Intrinsically Correct Interdomain Routing

Chapter 4: Network Layer

Configuration prerequisites 45 Configuring BGP community 45 Configuring a BGP route reflector 46 Configuring a BGP confederation 46 Configuring BGP

CS519: Computer Networks. Lecture 4, Part 5: Mar 1, 2004 Internet Routing:

Chapter 4: Network Layer. Lecture 12 Internet Routing Protocols. Chapter goals: understand principles behind network layer services:

Ravi Chandra cisco Systems Cisco Systems Confidential

6.829 BGP Recitation. Rob Beverly September 29, Addressing and Assignment

Transcription:

Finding a Needle in a Haystack: Pinpointing Significant BGP Routing Changes in an IP Network Jian Wu (University of Michigan) Z. Morley Mao (University of Michigan) Jennifer Rexford (Princeton University) Jia Wang (AT&T Labs Research) 1

Motivation AS2 AS4 destination AS3 Failure Disruption Congestion A B AS1 A backbone network is vulnerable to routing D changes that occur in other domains. source C Mitigation 2

Goal Identify important routing anomalies Lost reachability Persistent flapping Large traffic shifts Contributions: Build a tool to identify a small number of important routing disruptions from a large volume of raw BGP updates in real time. Use the tool to characterize routing disruptions in an operational network 3

Interdomain Routing: Border Gateway Protocol AS 1 BR 12.34.158.0/24 I can reach 12.34.158.0/24 ebgp data traffic AS 2 ibgp I can reach 12.34.158.0/24 via AS 1 ebgp data traffic AS 3 12.34.158.5 Prefix-based: one route per prefix Path-vector: list of ASes in the path Incremental: every update indicates a change Policy-based: local ranking of routes 4

Capturing Routing Changes A large operational network (8/16/2004 10/10-2004) ebgp Updates ibgp Best routes CPE BGP Monitor Best routes ibgp Updates ebgp ibgp ebgp 5

Challenges Large volume of BGP updates Millions daily, very bursty Too much for an operator to manage Different from root-cause analysis Identify changes and their effects Focus on actionable events rather than diagnosis Diagnose causes in/near the AS 6

System Architecture EBR EBR BGP Updates (106 ) BGP BGP Update Update Grouping Grouping Events (10 5 ) Event Event Classification Classification Typed Events Event Event Correlation Correlation Clusters (10 3 ) Traffic Traffic Impact Impact Prediction Prediction Large Disruptions (10 1 ) EBR Persistent Flapping Prefixes (10 1 ) Frequent Flapping Prefixes (10 1 ) EBR EBR Netflow Data EBR From millions of updates to a few dozen reports 7

Grouping BGP Update into Events Challenge: A single routing change leads to multiple update messages affects routing decisions at multiple routers EBR EBR EBR BGP Updates BGP BGP Update Update Grouping Grouping Persistent Flapping Prefixes Events Approach: Group together all updates for a prefix with inter-arrival < 70 seconds Flag prefixes with changes lasting > 10 minutes. 8

Grouping Thresholds Based on our understanding of BGP and data analysis Event timeout: 70 seconds 2 * MRAI timer + 10 seconds 98% inter-arrival time < 70 seconds Convergence timeout: 10 minutes BGP usually converges within a few minutes 99.9% events < 10 minutes 9

Persistent Flapping Prefixes A surprising finding: 15.2% of updates were caused by persistent-flapping prefixes even though flap damping is enabled. Types of persistent flapping Conservative damping parameters (78.6%) Protocol oscillations due to MED (18.3%) Unstable interfaces or BGP sessions (3.0%) 10

Example: Unstable ebgp Session ISP EA ED Peer EB EC p Customer Flap damping parameters is session-based Damping not implemented for ibgp sessions 11

Event Classification Challenge: Major concerns in network management Changes in reachability Heavy load of routing messages on the routers Change of flow of the traffic through the network Events Event Event Classification Classification Typed Events, e.g., Loss/Gain of Reachability Solution: classify events by severity of their impact 12

Event Category No Disruption p AS 1 AS 2 ED EE No Traffic Shift EA EB No Disruption : ISP no border routers have any traffic shift. (50.3%) EC 13

Event Category Internal Disruption p AS 1 AS 2 ED EE EA EB Internal Disruption : ISP all traffic shifts are internal. (15.6%) EC Internal Traffic Shift 14

Event Category Single External Disruption p AS 1 AS 2 ED EE external Traffic Shift EA EB Single External Disruption : ISP only one of the traffic shifts is external (20.7%) EC 15

Statistics on Event Classification No Disruption Internal Disruption Single External Disruption Multiple External Disruption Loss/Gain of Reachability Events 50.3% 15.6% 20.7% 7.4% 6.0% Updates 48.6% 3.4% 7.9% 18.2% 21.9% First 3 categories have significant day-to-day variations Updates per event depends on the type of events and the number of affected routers 16

Event Correlation Challenge: A single routing change affects multiple destination prefixes Typed Events Event Event Correlation Correlation Clusters Solution: group the same-type, close-occurring events 17

EBGP Session Reset Caused most of single external disruption events Check if the number of prefixes using that session as the best route changes dramatically Number of prefixes session recovery session failure time Validation with Syslog router report (95%) 18

Hot-Potato Changes Hot-Potato Changes P EA 11 9 ISP EB 10 Hot-potato routing = route to closest egress point EC Caused internal disruption events Validation with OSPF measurement (95%) [Teixeira et al SIGMETRICS 04] 19

Traffic Impact Prediction Challenge: Routing changes have different impacts on the network which depends on the popularity of the destinations Clusters Traffic Traffic Impact Impact Prediction Prediction Large Disruptions BR E Netflow Data BR E BR E Solution: weigh each cluster by traffic volume 20

Traffic Impact Prediction Traffic weight Per-prefix measurement from netflow 10% prefixes accounts for 90% of traffic Traffic weight of a cluster the sum of traffic weight of the prefixes A small number of large clusters have large traffic weight Mostly session resets and hot-potato changes 21

Performance Evaluation Memory Static memory: current routes, 600 MB Dynamic memory: clusters, 300 MB Speed 99% of intervals of 1 second of updates can be process within 1 second Occasional execution lag Every interval of 70 seconds of updates can be processed within 70 seconds Measurements were based on 900MHz CPU 22

Conclusion BGP troubleshooting system Fast, online fashion Operators concerns (reachability, flapping, traffic) Significant information reduction millions of update a few dozens of large disruptions Uncovered important network behavior Hot-Potato changes Session resets Persistent-flapping prefixes 23