Introduction. 2002, Cisco Systems, Inc.


BGP Scalability

Introduction
- Talk about different configuration changes you can make to improve convergence
- No Cisco vs. other supplier data
- BGP can be confusing, so don't hesitate to ask questions

Before we begin
- What does this graph show?
- It shows the number of peers we can converge in 10 minutes (y-axis) given a certain number of routes (x-axis) to advertise to those peers
- Example: we can advertise 100k routes to 50 peers with 12.0(12)S, or to 110 peers with 12.0(13)S

Old Improvements: Peer Groups
- Advertising 100,000+ routes to hundreds of peers is a big challenge from a scalability point of view. BGP needs to send a few hundred megabytes of data in order to converge all peers
- Two-part challenge:
  - Generating the hundreds of megabytes of data
  - Advertising this data to BGP peers
- Peer-groups make it easier for BGP to advertise routes to large numbers of peers by addressing these two problems
- Using peer-groups reduces BGP convergence times and makes BGP much more scalable

Peer Groups
- UPDATE generation without peer-groups: the BGP table is walked for every peer, prefixes are filtered through outbound policies, and UPDATEs are generated and sent to this one peer
- UPDATE generation with peer-groups: a peer-group leader is elected for each peer-group. The BGP table is walked for the leader only, prefixes are filtered through outbound policies, and UPDATEs are generated and sent to the peer-group leader and replicated for peer-group members that are synchronized with the leader
- If we generate an update for the peer-group leader and replicate it to all peer-group members, we are achieving 100% replication
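
The difference between the two generation modes can be sketched by counting how many UPDATEs have to be formatted versus merely replicated. This is a minimal Python model, not IOS behavior: the function names, the `in_sync` flags, and the choice to treat an out-of-sync member as needing its own formatted UPDATEs are all assumptions made for illustration.

```python
from typing import List, Tuple


def formatted_without_peer_groups(num_peers: int, updates_per_walk: int) -> int:
    """Without peer-groups, every peer gets its own table walk and formatting pass."""
    return num_peers * updates_per_walk


def peer_group_work(member_in_sync: List[bool], updates_per_walk: int) -> Tuple[int, int]:
    """Return (formatted, replicated) UPDATE counts for one peer-group.

    member_in_sync lists the non-leader members; True means the member is
    synchronized with the leader. Assumption: an out-of-sync member needs
    its own formatted UPDATEs instead of replicated copies.
    """
    formatted = updates_per_walk  # one walk for the leader
    replicated = 0
    for in_sync in member_in_sync:
        if in_sync:
            replicated += updates_per_walk
        else:
            formatted += updates_per_walk
    return formatted, replicated


def replication_percent(member_in_sync: List[bool]) -> float:
    """Share of non-leader members served purely by replication."""
    if not member_in_sync:
        return 100.0
    return 100.0 * sum(member_in_sync) / len(member_in_sync)


if __name__ == "__main__":
    members = [True] * 499  # leader plus 499 synchronized members
    print(formatted_without_peer_groups(500, 15_000))
    print(peer_group_work(members, 15_000))
    print(replication_percent(members))
```

With every member in sync the model formats one set of UPDATEs and replicates the rest, which is the 100% replication case described on the slide.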

Peer Groups
- A peer-group member is synchronized with the leader if all UPDATEs sent to the leader have also been sent to the peer-group member
- The more peer-group members stay in sync, the more UPDATEs BGP can replicate
- Replicating an UPDATE is much easier/faster than formatting an UPDATE. Formatting requires a table walk and policy evaluation; replication does not
- A peer-group member can fall out of sync for several reasons:
  - Slow TCP throughput
  - A rush of TCP Acks fills the input queues, resulting in drops
  - Peer is busy doing other tasks
  - Peer has a slower CPU than the peer-group leader

Old Improvements
- A lot of customers still do not realize that peer-groups help convergence
- Peer-groups give a 35% to 50% increase in scalability

TCP window size / Input queue depth interaction
In a nutshell:
- If a BGP speaker is pushing a full Internet table to a large number of peers, convergence is degraded due to enormous numbers of drops (100k+) on the interface input queue. A typical ISP gets ~½ million drops in 15 minutes on its typical route reflector
- With the default interface input queue depth of 75, it takes us ~19 minutes to advertise 75k real-world routes to 500 clients. The router drops ~225,000 packets (mostly TCP Acks) in this period
- By using brute force and setting the interface input queue depth to 4096, it takes us ~10 minutes to send the same number of routes to the same number of clients. The router drops ~20,000 packets in this period

[Figure: TCP window size / input queue depth interaction]

TCP window size / Input queue depth interaction
- Complicated Solution: it is not desirable to set the interface input queue to 4096 (DoS attacks, memory consumption, etc.). The following paper describes how to accurately tune your TCP window sizes and interface input queues:
- Complicated Solution in a nutshell: make the input queues big enough to hold all of the TCP Acks that would be generated if all of your peers were to Ack their entire window size of data at the exact same time. The result is that BGP converges much faster because we are no longer dropping tons of packets on the interface input queues. We also have the benefit of keeping our input queues at reasonable depths
- Easy Solution: just set your input queues or SPD (ext-)headroom to 1000 to 1500. 1000 is deep enough for the number of routes/peers that we see on a heavily loaded box today
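
The sizing rule in the "Complicated Solution" can be turned into a rough worst-case calculation. The Python sketch below is only an illustration of that rule: the function name, the window and MSS values in the example, and the `segments_per_ack` parameter (1 means one Ack per segment, 2 models delayed Acks) are assumptions, not figures from the talk or from the paper it refers to.

```python
import math


def worst_case_ack_burst(num_peers: int, window_bytes: int, mss_bytes: int,
                         segments_per_ack: int = 1) -> int:
    """Number of Acks that arrive if every peer Acks a full window at once.

    Assumes each peer receives window_bytes of data split into MSS-sized
    segments and sends one Ack per segments_per_ack segments.
    """
    segments = math.ceil(window_bytes / mss_bytes)
    acks_per_peer = math.ceil(segments / segments_per_ack)
    return num_peers * acks_per_peer


if __name__ == "__main__":
    # Hypothetical inputs: 100 peers, a 16 KB window, two MSS values.
    for mss in (536, 1460):
        print(mss, worst_case_ack_burst(100, 16_384, mss))
```

The estimate grows with the number of peers and the window size and shrinks as the MSS grows, so the same input queue depth covers more peers once a larger MSS is in use.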

Larger Input Queues
- A rush of TCP Acks from peers can quickly fill the 75 spots in the process-level input queues
- Increasing queue depths (4096) improves BGP scalability

Larger Input Queues
- Why not change the default input queue size?
  - May happen someday, but people are nervous
  - CSCdu69558 has been filed for this issue
- Even with 4096 spots in the input queue, we can still see drops given enough routes/peers
- Need to determine "how big is too big": how large an input queue can be before we are processing the same data multiple times

MTU Discovery
- Default MSS (Max Segment Size) is 536 bytes
- Inefficient for today's POS/Ethernet networks
- Using ip tcp path-mtu-discovery improves convergence
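
A quick way to see why a 536-byte MSS is inefficient is to count the segments needed to carry the same amount of UPDATE data. The Python arithmetic below is a sketch under stated assumptions: the 1460-byte MSS assumes a 1500-byte Ethernet MTU with 40 bytes of IPv4 and TCP headers and no options, and the 10 MB payload is a hypothetical figure.

```python
import math

HEADER_BYTES = 40  # assumed IPv4 (20) + TCP (20) headers, no options


def segments_needed(payload_bytes: int, mss_bytes: int) -> int:
    """TCP segments required to carry payload_bytes at a given MSS."""
    return math.ceil(payload_bytes / mss_bytes)


def header_overhead_percent(mss_bytes: int) -> float:
    """Header bytes as a percentage of a full-sized packet."""
    return 100.0 * HEADER_BYTES / (mss_bytes + HEADER_BYTES)


if __name__ == "__main__":
    payload = 10_000_000  # hypothetical UPDATE data for one peer
    for mss in (536, 1460):
        print(mss, segments_needed(payload, mss),
              round(header_overhead_percent(mss), 2))
```

Fewer segments also means fewer Acks coming back, which ties this change to the input queue drops discussed earlier.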

MTU Discovery and Larger Input Queues
- Simple config changes can give a 3x improvement
- A large ISP's convergence times dropped from 1 hour to 22 minutes when they made these changes

UPDATE Packing
Quick review on BGP UPDATEs. An UPDATE contains:

+-----------------------------------------------------+
|   Withdrawn Routes Length (2 octets)                |
+-----------------------------------------------------+
|   Withdrawn Routes (variable)                       |
+-----------------------------------------------------+
|   Total Path Attribute Length (2 octets)            |
+-----------------------------------------------------+
|   Path Attributes (variable)                        |
+-----------------------------------------------------+
|   Network Layer Reachability Information (variable) |
+-----------------------------------------------------+

- At the top you list a combination of attributes (MED = 50, Local Pref = 200, etc.)
- Then you list all of the NLRI (prefixes) that share this combination of attributes

Update Packing
- If your BGP table contains 100k routes and 15k attribute combinations, then you can advertise all the routes with 15k updates if you pack the prefixes 100%
- If it takes you 100k updates, then you are achieving 0% update packing
- Convergence times vary greatly depending on the number of attribute combinations used in the table and on how well BGP packs updates
- Ideal table: Routem-generated BGP table of 75k routes. All paths have the same attribute combination
- Real table: 75k route feed from Digex. ~12,000 different attribute combinations
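
The two endpoints given above (15k updates is 100% packing, 100k updates is 0%) suggest a simple packing score. The Python sketch below assumes a linear scale between those endpoints, which the slides do not state, and it ignores the maximum size of a BGP message, so one UPDATE per attribute combination is treated as always possible. The function names and the tuple used for attributes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def pack_by_attributes(routes: Iterable[Tuple[str, Hashable]]) -> Dict[Hashable, List[str]]:
    """Group prefixes by attribute combination: one UPDATE per group."""
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for prefix, attrs in routes:
        groups[attrs].append(prefix)
    return dict(groups)


def packing_percent(num_routes: int, num_attr_combos: int, num_updates: int) -> float:
    """Assumed linear score: 100% at one UPDATE per attribute combination,
    0% at one UPDATE per route."""
    if num_routes == num_attr_combos:
        return 100.0  # nothing can be packed, so any result is fully packed
    return 100.0 * (num_routes - num_updates) / (num_routes - num_attr_combos)


if __name__ == "__main__":
    # Attributes modeled as a (MED, Local Pref) tuple for illustration.
    table = [
        ("10.0.0.0/8", (50, 200)),
        ("10.1.0.0/16", (50, 200)),
        ("192.0.2.0/24", (0, 100)),
    ]
    print(pack_by_attributes(table))
    print(packing_percent(100_000, 15_000, 15_000))
    print(packing_percent(100_000, 15_000, 100_000))
```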

[Figure: Update Packing]

Update Packing
- With the ideal table we are able to pack the maximum number of prefixes into each update because all prefixes share a common set of attributes
- With the real-world table we send updates that are not fully packed because we walk the table based on prefix, but prefixes that are side by side may have different attributes
- We can only walk the table for a finite amount of time before we have to release the CPU, so we may not find all the NLRI for a given attribute combination before sending the updates we have built and suspending
- With 500 RRCs (route reflector clients) the ideal table takes ~4 minutes to converge, whereas a real-world table takes ~19 minutes!!
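
The effect of walking the table in prefix order for a limited time can be imitated with a toy model. In the Python sketch below, the walker looks at `slice_size` table entries, sends one UPDATE per attribute combination seen in that slice, and then suspends. The slice size, the helper names, and the way attributes are assigned to prefixes are all invented for illustration; the real scheduling is time-based and is not described in this much detail in the slides.

```python
from typing import List, Tuple

Table = List[Tuple[str, int]]  # (prefix, attribute-combination id), in prefix order


def updates_from_sliced_walk(table: Table, slice_size: int) -> int:
    """UPDATEs sent when the walker flushes what it has built after each slice."""
    updates = 0
    for start in range(0, len(table), slice_size):
        chunk = table[start:start + slice_size]
        # One UPDATE per distinct attribute combination seen in this slice.
        updates += len({attr for _, attr in chunk})
    return updates


def updates_from_full_grouping(table: Table) -> int:
    """Lower bound in this model: one UPDATE per attribute combination overall."""
    return len({attr for _, attr in table})


if __name__ == "__main__":
    # "Real" table: neighboring prefixes have different attributes.
    real = [(f"10.0.{i}.0/24", i % 4) for i in range(12)]
    # "Ideal" table: every prefix shares one attribute combination.
    ideal = [(f"10.0.{i}.0/24", 0) for i in range(12)]
    for name, table in (("real", real), ("ideal", ideal)):
        print(name, updates_from_sliced_walk(table, 4), updates_from_full_grouping(table))
```

In this model the mixed table needs as many UPDATEs as it has prefixes, while the uniform table needs only one per slice, which mirrors the gap between the real-world and ideal convergence times quoted above.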

UPDATE Packing
CSCdt34187 introduces an update-cache that gives us:
- 100% update packing: attribute distribution no longer makes a significant impact
- 100% peer-group replication: we no longer have to worry about peers staying in sync
In a nutshell, it is amazing!!

UPDATE Packing
- 4x to 6x improvement!!

UPDATE Packing
- 12.0(19)S + MTU discovery + Larger Input Queues = 14x improvement

UPDATE Packing
- Building an update-cache isn't all fun and games
- Requires tons of transient memory to build a cache and queue it to peers
- 12.0(21)S was deferred as a result
- 12.0(21)S1 has a lot of safety nets in place to prevent BGP from using too much memory

READ_ONLY Mode
- READ_ONLY mode: if BGP is in READ_ONLY mode, then BGP is only accepting routing updates and is neither computing a best path nor advertising routes for any prefixes. When the BGP process starts (i.e. after a router reboot), BGP goes into READ_ONLY mode for a maximum of two minutes. RO mode forces a BGP speaker to be still for a few minutes, giving its peers a chance to send their initial set of updates. The more routes/paths BGP has, the more stable the network will be, because we avoid the scenario where BGP sends an update for a prefix and then learns about a better path for that prefix a few seconds later. If that happens, BGP has sent two updates for a single prefix, which is very inefficient. READ_ONLY mode increases the chances of BGP learning about the best path for a prefix before sending out any advertisements for that prefix. BGP transitions from RO mode to RW mode once all of our peers have sent us their initial set of updates or the two-minute RO timer expires
- READ_WRITE mode: this is the normal mode of operation for BGP. While in READ_WRITE mode, BGP installs routes in the routing table and advertises those routes to its peers

READ_ONLY Mode
- RO and RW modes were introduced via CSCdm56595
- RO timer (120 seconds) started when the BGP process started
- Never worked on the GSR because it takes more than 120 seconds for linecards to boot, IGP to converge, etc.

READ_ONLY Mode
- CSCds66429 corrects oversights made by CSCdm56595
- RO timer now starts when the first peer comes up
- Linecard boot times and IGP convergence are accounted for automatically
- Will transition to RW mode when one of the following happens:
  - All peers have sent us a KA (keepalive)
  - All peers that were up within 60 seconds of the first peer have sent us a KA. This way we do not wait 120s for a peer that is misconfigured
  - The 120s timer pops
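
The three exit conditions can be written down as a small decision function. This Python sketch is one reading of the slide, not the IOS implementation: `PeerState`, `should_enter_read_write`, and the constants are hypothetical names, and the choice to evaluate the 60-second rule only once 60 seconds have passed since the first peer came up is an assumption.

```python
from typing import List, NamedTuple, Optional

RO_TIMER_SECONDS = 120.0      # maximum time in READ_ONLY mode
EARLY_WINDOW_SECONDS = 60.0   # peers up within this window of the first peer


class PeerState(NamedTuple):
    up_at: Optional[float]    # time the session came up, None if it never did
    keepalive_seen: bool      # has the peer sent us a KA?


def should_enter_read_write(now: float, peers: List[PeerState]) -> bool:
    """Return True when BGP may leave READ_ONLY mode under the slide's rules."""
    up_times = [p.up_at for p in peers if p.up_at is not None]
    if not up_times:
        return False  # the RO timer starts only when the first peer comes up
    first_up = min(up_times)
    elapsed = now - first_up

    if elapsed >= RO_TIMER_SECONDS:
        return True  # the 120s timer pops
    if all(p.keepalive_seen for p in peers):
        return True  # every configured peer has sent a KA
    if elapsed >= EARLY_WINDOW_SECONDS:
        # Assumption: the 60s rule is checked once the window has closed.
        early = [p for p in peers
                 if p.up_at is not None and p.up_at - first_up <= EARLY_WINDOW_SECONDS]
        return all(p.keepalive_seen for p in early)
    return False


if __name__ == "__main__":
    # Two healthy peers plus one misconfigured peer that never comes up.
    peers = [PeerState(0.0, True), PeerState(5.0, True), PeerState(None, False)]
    print(should_enter_read_write(30.0, peers))
    print(should_enter_read_write(61.0, peers))
```

The example shows the purpose of the second rule: the misconfigured peer blocks the transition during the first 60 seconds but no longer holds BGP in READ_ONLY mode until the full timer expires.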

CCIE 99 Session 1624 scsturge@cisco.com 1999, Cisco Systems, Inc. www.cisco.com 26

Input Queues
Diagram of input queue with default values (defaults can vary by IOS release):
- Input queue = 75
- SPD headroom = 100
- Extended headroom = 10

Queue region                Depth range   Packets accepted
Input queue (hold queue)    0 to 75       IP, BGP, ISIS, OSPF, HDLC
SPD headroom                75 to 175     BGP, ISIS, OSPF, HDLC
Extended headroom           175 to 185    ISIS, OSPF, HDLC
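
The queue diagram can be read as an admission rule that depends on how full the queue already is. The Python sketch below encodes that reading with the default depths from the slide. The function name and the protocol labels are illustrative, and the real SPD logic in IOS classifies packets in more detail than this three-region model.

```python
INPUT_QUEUE = 75          # hold queue depth (slide default)
SPD_HEADROOM = 100        # extra room for routing protocol packets
EXTENDED_HEADROOM = 10    # extra room for IGP and layer 2 keepalive packets

# Packet classes admitted in each region, as listed on the slide.
NORMAL_REGION = {"IP", "BGP", "ISIS", "OSPF", "HDLC"}
SPD_REGION = {"BGP", "ISIS", "OSPF", "HDLC"}
EXTENDED_REGION = {"ISIS", "OSPF", "HDLC"}


def admit(current_depth: int, packet_class: str) -> bool:
    """Decide whether a packet is queued, given the current queue depth."""
    if current_depth < INPUT_QUEUE:
        return packet_class in NORMAL_REGION
    if current_depth < INPUT_QUEUE + SPD_HEADROOM:
        return packet_class in SPD_REGION
    if current_depth < INPUT_QUEUE + SPD_HEADROOM + EXTENDED_HEADROOM:
        return packet_class in EXTENDED_REGION
    return False  # queue and all headroom are full


if __name__ == "__main__":
    for depth in (10, 80, 180, 185):
        print(depth, {c: admit(depth, c) for c in ("IP", "BGP", "OSPF")})
```

Under this reading, BGP packets (including the TCP Acks from peers) start to be dropped once the depth reaches 175 with default values, which is why raising the input queue or the SPD headroom helps convergence.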
