VXLAN EVPN Multihoming with Cisco Nexus 9000 Series Switches



Contents

Introduction
Traditional vPC Multihoming
BGP EVPN Multihoming
BGP EVPN Multihoming
Terminology
EVPN Multihoming Redundancy Group
Ethernet Segment Identifier
LACP Bundling
Layer 2 Gateway Spanning Tree Protocol
Guidelines and Recommendations for L2G-STP
EVPN Multihoming Feature Enablement
EVPN Multihoming Implementation
Sample Configuration
EVPN Multihoming: Local Traffic Flows
Locally Bridged Traffic
Access Failure for Locally Bridged Traffic
Core Failure for Locally Bridged Traffic
Locally Routed Traffic
Access Failure for Locally Routed Traffic
Core Failure for Locally Routed Traffic
EVPN Multihoming: Remote Traffic Flows
Remote Bridged Traffic
Ethernet Autodiscovery Route (Type 1) per Ethernet Segment
MAC-IP Route (Type 2)
Access Failure for Remote Bridged Traffic
Core Failure for Remote Bridged Traffic
Remote Routed Traffic
Access Failure for Remote Routed Traffic
Core Failure for Remote Routed Traffic
EVPN Multihoming Broadcast, Unicast, and Multicast Flows
Designated Forwarder
Split Horizon and Local Bias
Ethernet Segment Route (Type 4)
Designated-Forwarder Election and VLAN Carving
Core and Site Failures for Broadcast, Unicast, and Multicast Traffic
Conclusion
For More Information

Introduction

Cisco Nexus platforms support multihoming based on virtual port channel (vPC) technology, in which a pair of switches acts as a single device for redundancy and both switches function in active mode. For Cisco Nexus 9000 Series Switches in a Virtual Extensible LAN (VXLAN) Border Gateway Protocol (BGP) Ethernet VPN (EVPN) environment, two solutions can be used to support Layer 2 multihoming: traditional vPC (emulated or virtual IP addresses) and BGP EVPN techniques.

Traditional vPC uses consistency checking, a mechanism by which the two switches configured as a vPC pair exchange configuration information and verify compatibility. BGP EVPN lacks this consistency-check mechanism; it instead relies on Link Aggregation Control Protocol (LACP) to detect misconfiguration. This approach eliminates the multichassis EtherChannel trunk (MCT) link traditionally used by vPC and offers more flexibility than traditional vPC, because each VXLAN tunnel endpoint (VTEP) can be part of one or more redundancy groups, and a given group can potentially contain any number of VTEPs.

Traditional vPC Multihoming

A virtual or emulated IP address is used as the VTEP IP address for vPC-connected hosts. Both vPC peers share this emulated address when advertising MAC and IP host routes for hosts that are multihomed to the vPC peers. This solution requires a dedicated MCT link and is not discussed further in this document.

BGP EVPN Multihoming

With a BGP EVPN control plane, each switch can use its own local IP address as the VTEP IP address and still provide active-active redundancy. In certain failure scenarios, BGP EVPN-based multihoming also provides fast convergence that cannot be achieved without a control protocol (that is, with data-plane flood-and-learn).

BGP EVPN Multihoming

This section provides an overview of BGP EVPN multihoming.

Terminology

The following terminology is used in discussing BGP EVPN multihoming:

- The EVPN instance (EVI) is represented by the virtual network identifier (VNI).
- The MAC Virtual Routing and Forwarding (MAC-VRF) instance is a container that houses the virtual forwarding table for MAC addresses. A unique route distinguisher and import and export route targets can be configured per MAC-VRF instance.
- The Ethernet segment (ES) is a set of bundled links.
- The Ethernet segment identifier (ESI) uniquely identifies each Ethernet segment across the network.

EVPN Multihoming Redundancy Group

Figure 1 shows a dual-homed topology in which VTEPs L1 and L2 are distributed anycast VXLAN gateways performing integrated routing and bridging (IRB). Host H2 is connected to an access switch that is dual-homed to both L1 and L2.

Figure 1. EVPN Multihoming

The access switch is connected to L1 and L2 through a bundled pair of physical links. The switch is not aware that the bundle is configured on two different devices on the other side. However, both L1 and L2 must be aware that they are part of the same bundle. Note that there is no MCT link between L1 and L2, and each VTEP can have multiple such bundled links shared with the same set of neighbors.

To make the VTEPs (L1 and L2) aware that they belong to the same bundled link, Cisco NX-OS Software uses the ESI and the system MAC address configured on the port-channel interface.

Ethernet Segment Identifier

EVPN introduces the concept of the Ethernet segment identifier (ESI). Each VTEP is configured with a 10-byte ESI value for the bundled link that it shares with the multihomed neighbor. This ESI value can be configured manually or derived automatically.

LACP Bundling

LACP can be turned on to detect ESI misconfiguration in the multihomed port-channel bundle. LACP sends the ESI-configured MAC address in the hello messages sent to the access switch. LACP is not mandatory with ESI. A given ESI interface (port channel) shares the same ESI across the VTEPs in the group. As shown in Figure 2, the access switch receives the same configured MAC address from both VTEPs (L1 and L2), so it puts the bundled link in the up state.
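The ESI and system MAC described above are configured under the multihomed port channel on both VTEPs. A minimal sketch is shown here, using the ESI and system-MAC values from the sample configuration later in this document; the port-channel number is illustrative:

interface port-channel11
  switchport mode trunk
  ethernet-segment 2011
    system-mac 0000.0000.2011

The same ethernet-segment and system-mac values must be configured on every VTEP that terminates this bundle.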

Figure 2. EVPN Multihoming and LACP

Because the Ethernet segment MAC address can be shared across all the Ethernet segments on the VTEP, LACP protocol data units (PDUs) use the Ethernet segment MAC address as the system MAC address, and the administrative key carries the ESI. The recommended approach is to run LACP between the VTEPs and the access devices, because LACP PDUs provide a mechanism to detect and act on misconfigured ESIs. If the ESIs configured on the same port do not match, LACP will bring down one of the links (the first link to come online will remain active).

By default on most Cisco Nexus platforms, LACP sets a port to the suspended state if it does not receive an LACP PDU from the peer. This behavior comes from the lacp suspend-individual feature, which is enabled by default. The feature helps prevent loops created by an ESI configuration mismatch, so you should leave it enabled on port channels on access switches and servers. However, in some scenarios (such as PowerOn Auto Provisioning [POAP] and NetBoot), it can cause servers to fail to boot, because these servers require LACP to logically bring up the port.

If you are using a static port channel and have mismatched ESIs, the MAC address is learned from both VTEP1 and VTEP2, and both will advertise the same MAC address as belonging to different ESIs. This triggers a MAC address move scenario, and eventually no traffic will be forwarded to that node for MAC addresses that are learned on both VTEP1 and VTEP2.
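To illustrate the recommendation above, here is a hypothetical access-switch side of the bundle in NX-OS syntax (interface numbers are assumptions; the access device may be any switch that supports LACP). The lacp suspend-individual behavior discussed above is the NX-OS default on port channels and is shown only for clarity:

interface port-channel11
  switchport mode trunk
  lacp suspend-individual

interface ethernet1/1-2
  switchport mode trunk
  channel-group 11 mode active

With channel-group mode active, the access switch originates LACP PDUs, allowing the VTEPs to compare the advertised system MAC and key and detect an ESI mismatch.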

Layer 2 Gateway Spanning Tree Protocol

This section provides an overview of Layer 2 Gateway Spanning Tree Protocol (L2G-STP). Figure 3 shows the topology.

Figure 3. EVPN Multihoming and L2G-STP

L2G-STP builds a loop-free tree topology; however, the Spanning Tree Protocol root must always be in the VXLAN fabric. A Spanning Tree Protocol bridge ID consists of a MAC address and a bridge priority. When running in the VXLAN fabric, the system automatically assigns the VTEPs the MAC address c84c.75fa.6000 from a pool of reserved MAC addresses. As a result, each switch uses the same MAC address for the bridge ID, emulating a single logical pseudo root.

L2G-STP is disabled by default on EVPN ESI multihoming VLANs. You must explicitly enable it, on all VTEPs. Table 1 lists the commands for configuring L2G-STP.

Table 1. L2G-STP Configuration Commands

Description: Enable Spanning Tree Protocol mode.
Command: spanning-tree mode <rapid-pvst, mst>

Description: Enable the default domain 0. Blocks loops with pseudo-root emulation only; Bridge Protocol Data Units (BPDUs) are not tunneled.
Command: spanning-tree domain enable

Description: An explicit domain ID is needed to tunnel encoded BPDUs to the core and to process those received from the core.
Command: spanning-tree domain 1

Description: Configure the Spanning Tree Protocol priority.
Command: spanning-tree mst <id> priority 8192
         spanning-tree vlan <id> priority 8192

Description: Disable L2G-STP on a VTEP.
Command: spanning-tree domain disable
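Putting Table 1 together, a minimal sketch of enabling L2G-STP with MST on a VTEP follows; the domain ID and priority values are illustrative and match the example output shown next. Apply the same configuration on all VTEPs in the domain:

spanning-tree mode mst
spanning-tree domain 1
spanning-tree mst 0 priority 8192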

All L2G-STP VLANs should be set to a lower (better) spanning-tree priority than the customer-edge (CE) topology to help ensure that the VTEP is the spanning-tree root for those VLANs. If the access switches are configured with a numerically lower priority, you can set the L2G-STP priority to 0 to retain the L2G-STP root in the VXLAN fabric. A configuration example is shown here:

TOR9-leaf4# show spanning-tree summary
Switch is in mst mode (IEEE Standard)
Root bridge for: MST0000
L2 Gateway STP bridge for: MST0000
L2 Gateway Domain ID: 1
Port Type Default is disable
Edge Port [PortFast] BPDU Guard Default is disabled
Edge Port [PortFast] BPDU Filter Default is disabled
Bridge Assurance is enabled
Loopguard Default is disabled
Pathcost method used is long
PVST Simulation is enabled
STP-Lite is disabled

Name                   Blocking Listening Learning Forwarding STP Active
---------------------- -------- --------- -------- ---------- ----------
MST0000                       0         0        0         12         12
---------------------- -------- --------- -------- ---------- ----------
1 mst                         0         0        0         12         12

TOR9-leaf4# show spanning-tree vlan 1001

MST0000
  Spanning tree enabled protocol mstp
  Root ID    Priority    8192
             Address     c84c.75fa.6001   <-- L2G-STP reserved MAC + domain ID
             This bridge is the root
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    8192   (priority 8192 sys-id-ext 0)
             Address     c84c.75fa.6001
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec

This output shows that the spanning-tree priority is set to 8192 (the default is 32768). Spanning-tree priority is set in multiples of 4096, and the priority for an individual instance is calculated as priority + instance ID. In this case, the priority is 8192 + 0 = 8192.

With L2G-STP, access ports (VTEP ports connected to access switches) have root guard enabled implicitly. If a superior BPDU is received on an edge port of a VTEP, the port is placed in the Layer 2 Gateway inconsistent state until the condition is cleared, as shown here:

2016 Aug 29 19:14:19 TOR9-leaf4 %$ VDC-1 %$ %STP-2-L2GW_BACKBONE_BLOCK: L2 Gateway Backbone port inconsistency blocking port Ethernet1/1 on MST0000.
2016 Aug 29 19:14:19 TOR9-leaf4 %$ VDC-1 %$ %STP-2-L2GW_BACKBONE_BLOCK: L2 Gateway Backbone port inconsistency blocking port port-channel13 on MST0000.

TOR9-leaf4# show spanning-tree

MST0000
  Spanning tree enabled protocol mstp
  Root ID    Priority    8192
             Address     c84c.75fa.6001
             This bridge is the root
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    8192   (priority 8192 sys-id-ext 0)
             Address     c84c.75fa.6001
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec

Interface        Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
Po1              Desg FWD 20000     128.4096 Edge P2p
Po2              Desg FWD 20000     128.4097 Edge P2p
Po3              Desg FWD 20000     128.4098 Edge P2p
Po12             Desg BKN*2000      128.4107 P2p *L2GW_Inc
Po13             Desg BKN*1000      128.4108 P2p *L2GW_Inc
Eth1/1           Desg BKN*2000      128.1    P2p *L2GW_Inc

To disable L2G-STP on a VTEP, issue this command:

spanning-tree domain disable

This command disables L2G-STP on all EVPN ESI multihomed VLANs. The bridge MAC address is restored to the system MAC address, and the VTEP is no longer necessarily the root. In the following case, the access switch has assumed the root role because L2G-STP is disabled:

TOR9-leaf4(config)# spanning-tree domain disable

TOR9-leaf4# show spanning-tree summary
Switch is in mst mode (IEEE Standard)
Root bridge for: none
L2 Gateway STP is disabled
Port Type Default is disable
Edge Port [PortFast] BPDU Guard Default is disabled
Edge Port [PortFast] BPDU Filter Default is disabled
Bridge Assurance is enabled
Loopguard Default is disabled
Pathcost method used is long
PVST Simulation is enabled
STP-Lite is disabled

Name                   Blocking Listening Learning Forwarding STP Active
---------------------- -------- --------- -------- ---------- ----------
MST0000                       4         0        0          8         12
---------------------- -------- --------- -------- ---------- ----------
1 mst                         4         0        0          8         12

TOR9-leaf4# show spanning-tree vlan 1001

MST0000
  Spanning tree enabled protocol mstp
  Root ID    Priority    4096
             Address     00c8.8ba6.5073
             Cost        0
             Port        4108 (port-channel13)

             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    8192   (priority 8192 sys-id-ext 0)
             Address     5897.bd1d.db95
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec

With L2G-STP, access ports on VTEPs cannot be configured as edge ports; they behave like normal spanning-tree ports, receiving BPDUs from the access switches. Access ports on VTEPs therefore lose the advantage of rapid transition to forwarding on an Ethernet segment link flap: they must go through the proposal-and-agreement handshake before assuming the Desg FWD role.

Guidelines and Recommendations for L2G-STP

Note the following points when using L2G-STP:

- With L2G-STP enabled, the VXLAN fabric (all VTEPs) emulates a single pseudo-root switch for the customer access switches.
- With L2G-STP, root guard is inherently enabled by default on all access ports.
- All access ports from VTEPs connecting to customer access switches are in the Desg FWD state by default.
- All ports on customer access switches connecting to VTEPs are in either the root-port FWD or Altn BLK state.
- Root guard is activated if superior spanning-tree information is received from the customer access switches. This process puts the ports in the BKN L2GW_Inc state to secure the root in the VXLAN fabric and prevent a loop.
- An explicit domain ID configuration is needed to enable spanning-tree BPDU tunneling across the fabric.
- As a best practice, configure all VTEPs with the lowest spanning-tree priority of all switches in the spanning-tree domain to which they are attached. By setting all the VTEPs as the root bridge, you make the entire VXLAN fabric appear to be one virtual bridge.
- ESI interfaces should not be enabled in spanning-tree edge mode, so that L2G-STP can run across the VTEP and access layer. You can continue to use ESI or orphan (single-homed) ports in spanning-tree edge mode if they connect directly to hosts or servers that do not run Spanning Tree Protocol and are end hosts (a short sketch follows this list).
- Configure all VTEPs that are connected by a common customer access layer in the same L2G-STP domain: ideally, all VTEPs in the fabric on which the hosts reside and to which the hosts can move.
- The L2G-STP domain scope is global, and all ESIs on a given VTEP can participate in only one domain.
- Mappings between Multiple Spanning Tree (MST) instances and VLANs must be consistent across the VTEPs in a given L2G-STP domain.
- Non-L2G-STP-enabled VTEPs cannot be directly connected to L2G-STP-enabled VTEPs. Doing so results in conflicts and disputes, because the non-L2G-STP VTEP keeps sending BPDUs and can steer the root outside the fabric.
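As a sketch of the edge-mode guideline above (interface numbers are hypothetical): an ESI port channel is left as a normal spanning-tree port so that L2G-STP runs toward the access layer, while an orphan port facing a server that does not run Spanning Tree Protocol may be set to edge mode:

interface port-channel11
  description ESI port channel; participates in L2G-STP
  spanning-tree port type normal

interface ethernet1/10
  description Orphan port to a non-STP end host
  switchport
  spanning-tree port type edge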

EVPN Multihoming Feature Enablement

NX-OS allows either vPC-based EVPN multihoming or ESI-based EVPN multihoming on a VTEP; the two features cannot be enabled together. ESI-based multihoming is enabled by entering evpn esi multihoming in the command-line interface (CLI). The evpn esi multihoming command enables Ethernet segment configuration and the generation of Ethernet autodiscovery and Ethernet segment routes (types 1 and 4) on the VTEPs.

The processing of Ethernet segment routes with valid ESIs and path-list resolution is not tied to the evpn esi multihoming feature. If the VTEP receives MAC and MAC-IP routes with valid ESIs while the feature is not enabled, the Ethernet-segment-based path-resolution logic still applies to these remote routes. This behavior is required for interoperability between vPC-enabled VTEPs and ESI-enabled VTEPs.

EVPN Multihoming Implementation

The EVPN overlay draft specifies adaptations to the BGP Multiprotocol Label Switching (MPLS)-based EVPN solution so that it can be applied as a network virtualization overlay with VXLAN encapsulation. The role of the provider-edge (PE) node described in BGP MPLS EVPN is equivalent to the role of the VTEP, or network virtualization edge (NVE) device, in which VTEPs use control-plane learning and distribution through BGP for remote addresses instead of data-plane learning.

Five route types are currently defined:

- Type 1: Ethernet autodiscovery (EAD) route
- Type 2: MAC and MAC-IP route advertisement
- Type 3: Inclusive multicast route
- Type 4: Ethernet segment route
- Type 5: IP prefix route

BGP EVPN running on NX-OS uses route type 2 to advertise MAC and IP (host) address information, route type 3 to carry VTEP information specifically for ingress replication, and route type 5 to advertise IPv4 and IPv6 prefixes in the network layer reachability information (NLRI) with no MAC addresses in the route key. With the introduction of EVPN multihoming, NX-OS can also use the EAD route, in which the ESI and the Ethernet tag ID are considered part of the prefix in the NLRI.

Because endpoint reachability is learned through the BGP control plane, the network convergence time is a function of the number of MAC and IP routes that must be withdrawn by the VTEP in the event of a failure. To manage such a condition, each VTEP advertises a set of one or more EAD-per-ES routes for each locally attached Ethernet segment. If a failure occurs in the attached segment, the VTEP withdraws the corresponding set of EAD-per-ES routes.

The Ethernet segment route is another route type used by NX-OS with EVPN multihoming. It is used mainly for designated-forwarder election for broadcast, unknown unicast, and multicast (BUM) traffic: if the Ethernet segment is multihomed, the presence of multiple designated forwarders could result in forwarding loops as well as potential packet duplication. The Ethernet segment route (type 4) is therefore used to elect the designated forwarder and to apply split-horizon filtering. All VTEPs and provider-edge nodes configured with an Ethernet segment originate this route.
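Before summarizing the new concepts, here is a minimal enablement sketch tying together the evpn esi multihoming command described above. The feature prerequisites shown (BGP, PIM, the EVPN overlay) are assumptions based on a typical VXLAN BGP EVPN leaf; the exact feature set varies by platform and release:

feature bgp
feature pim
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay
nv overlay evpn
evpn esi multihoming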

To summarize, these are the new implementation concepts for EVPN multihoming:

- An EAD route per Ethernet segment (EAD-per-ES), also referred to as a Type-1 route, is used to converge traffic faster in access-failure scenarios. This route has an Ethernet tag of 0xFFFFFFFF. (Refer to the section "EVPN Multihoming: Local Traffic Flows.")
- An EAD route per EVI (EAD-per-EVI), also referred to as a Type-1 route, is used for aliasing and for load balancing when traffic hashes to only one of the VTEPs. This route cannot have an Ethernet tag value of 0xFFFFFFFF, which differentiates it from the EAD-per-ES route.
- The Ethernet segment route, also referred to as a Type-4 route, is used for designated-forwarder election for BUM traffic. (Refer to the section "EVPN Multihoming: Remote Traffic Flows.")
- Aliasing is used to load-balance traffic to all the VTEPs connected to a given Ethernet segment, using the Type-1 EAD-per-EVI route. Aliasing is performed regardless of the VTEP on which the hosts were actually learned.
- Mass withdrawal is used for fast convergence in access-failure scenarios, using the Type-1 EAD-per-ES route. (Refer to the section "EVPN Multihoming: Local Traffic Flows.")
- Designated-forwarder election is used to prevent forwarding loops and duplicates: only a single VTEP is allowed to decapsulate and forward traffic for a given Ethernet segment. (Refer to the section "EVPN Multihoming: Remote Traffic Flows.")
- Split horizon is used to prevent forwarding loops and duplicates for BUM traffic: only BUM traffic originating from a remote site is allowed to be forwarded to a local site. (Refer to the section "EVPN Multihoming: Remote Traffic Flows.")

Implementation items for a future release include the following:

- The ESI label extended community is used to signal the redundancy mode: single-active or active-active.

Sample Configuration

Table 2 outlines a sample configuration for enabling EVPN multihoming.

Table 2. Configuration to Enable EVPN Multihoming

Description: Enable multihoming globally. This setting enables the EVPN multihoming feature.
Command: evpn esi multihoming

Description: Enable BGP maximum paths. This setting enables equal-cost multipath (ECMP) for host routes; otherwise, host routes will have only one VTEP as the next hop.
Command: maximum-paths ibgp x
         maximum-paths x

Description: Enable core-link tracking. This setting tracks uplink interfaces to the core. If all uplinks are down, local Ethernet-segment ports are shut down or suspended. It is used mainly to avoid black-holing south-to-north traffic when no uplinks are available.
Command: evpn multihoming core-tracking

Description: Configure the Ethernet segment. The ethernet-segment value is the local ESI; it must match on the VTEPs on which the port is multihomed and should be unique per port. The system-mac value is the local system MAC ID; it must match on the VTEPs on which the port is multihomed and can be shared across multiple ports.
Command: interface port-channel x
         ethernet-segment <es-id>
         system-mac <es-system-mac>

Description: Configure ternary content-addressable memory (TCAM). The hardware access-list tcam region vpc-convergence setting configures split-horizon access control lists (ACLs) in hardware. This setting helps prevent BUM traffic duplication on shared Ethernet segment ports.
Command: hardware access-list tcam region vpc-convergence 256
         hardware access-list tcam region arp-ether 256

Figure 4 shows a sample configuration.

Figure 4. EVPN Multihoming Overview and Sample Configuration

Here is the configuration for top-of-rack switch 1 (TOR1):

evpn esi multihoming

router bgp 1001
  address-family l2vpn evpn
    maximum-paths ibgp 2

interface Ethernet2/1
  no switchport
  evpn multihoming core-tracking
  mtu 9216
  ip address 10.1.1.1/30
  ip pim sparse-mode
  no shutdown

interface Ethernet2/2
  no switchport
  evpn multihoming core-tracking
  mtu 9216
  ip address 10.1.1.5/30
  ip pim sparse-mode
  no shutdown

interface port-channel11
  switchport mode trunk
  switchport access vlan 1001
  switchport trunk allowed vlan 901-902,1001-1050
  ethernet-segment 2011
    system-mac 0000.0000.2011
  mtu 9216

Here is the configuration for TOR2:

evpn esi multihoming

router bgp 1001
  address-family l2vpn evpn
    maximum-paths ibgp 2

interface Ethernet2/1
  no switchport
  evpn multihoming core-tracking
  mtu 9216
  ip address 10.1.1.2/30
  ip pim sparse-mode
  no shutdown

interface Ethernet2/2
  no switchport
  evpn multihoming core-tracking
  mtu 9216
  ip address 10.1.1.6/30
  ip pim sparse-mode
  no shutdown

interface port-channel11
  switchport mode trunk
  switchport access vlan 1001
  switchport trunk allowed vlan 901-902,1001-1050
  ethernet-segment 2011
    system-mac 0000.0000.2011
  mtu 9216
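Once both TORs are configured, the Ethernet segment state and the members of the redundancy group can be checked. The commands below exist on Nexus 9000 NX-OS for ESI multihoming, though the exact output fields vary by release:

show nve ethernet-segment
show bgp l2vpn evpn summary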

EVPN Multihoming: Local Traffic Flows

All VTEPs that are part of the same redundancy group (as defined by the ESI) act as a single VTEP device with respect to the host access switch. However, there is no MCT link present to bridge and route traffic for local access.

Locally Bridged Traffic

As shown in Figure 5, host H2 is dual-homed, whereas hosts H1 and H3 are single-homed (such hosts are also known as orphans). Traffic is bridged locally from H1 to H2 through L1. However, if a packet needs to be bridged between orphans H1 and H3, the packet must be bridged through the VXLAN overlay.

Figure 5. EVPN Multihoming and Local Bridging

Access Failure for Locally Bridged Traffic

As illustrated in Figure 6, if the ESI link at L1 fails, bridged traffic from H1 can reach H2 only through the overlay. The locally bridged traffic therefore takes a suboptimal path, similar to the H1-to-H3 orphan flow. Note that when this condition occurs, the MAC address table entry for H2 changes from a local route pointing to a port-channel interface to a remote overlay route pointing to the peer ID of L2. This change is propagated through the system by BGP.

Figure 6. EVPN Multihoming Bridging and Local ESI Link Failure

Core Failure for Locally Bridged Traffic

If VTEP L1 becomes isolated from the core, it must not continue to attract access traffic, because it cannot encapsulate and send that traffic on the overlay. Therefore, the access links must be brought down at L1 if L1 loses core reachability (Figure 7). In this scenario, orphan H1 loses all connectivity to both remote and locally attached hosts, because there is no dedicated MCT link.

Figure 7. EVPN Multihoming Bridging and Core Link Failure

Locally Routed Traffic

Consider a scenario in which H1, H2, and H3 are in different subnets and L1 and L2 are distributed anycast gateways (Figure 8). Any packet routed from H1 to H2 is sent directly from L1 through native routing. However, host H3 is not a locally attached adjacency, unlike in the vPC case, in which the Address Resolution Protocol (ARP) entry would have been synchronized to L1 as a locally attached adjacency. Instead, H3 appears at L1 as a remote host in the IP table, installed in the context of the L3 VNI. Such a packet must be encapsulated with the router MAC address of L2 and routed to L2 through the VXLAN overlay. Traffic routed from H1 to H3 is therefore handled exactly the same way as traffic routed between truly remote hosts in different subnets.

Figure 8. EVPN Multihoming and Locally Routed Traffic
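For context, a minimal sketch of the distributed anycast gateway pieces assumed on L1 and L2 is shown here. The gateway MAC, VLAN, VRF name, and subnet are hypothetical placeholders; only the two fabric forwarding commands are the essential elements:

fabric forwarding anycast-gateway-mac 2020.0000.00aa

interface Vlan1001
  no shutdown
  vrf member Tenant-A
  ip address 192.168.10.1/24
  fabric forwarding mode anycast-gateway

Because every VTEP shares the same anycast gateway MAC and IP address, H1 can keep using its default gateway unchanged regardless of which leaf receives its traffic.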

Access Failure for Locally Routed Traffic

If the ESI link at VTEP L1 fails, routed traffic from H1 can reach H2 only through the overlay. The locally routed traffic therefore takes a suboptimal path, much like the H1-to-H3 orphan flow (Figure 9).

Figure 9. EVPN Multihoming Routing and Local ESI Link Failure

Core Failure for Locally Routed Traffic

If VTEP L1 becomes isolated from the core, it must not continue to attract access traffic, because it cannot encapsulate and send that traffic on the overlay. Therefore, the access links must be brought down at L1 if L1 loses core reachability (Figure 10). In this scenario, orphan H1 loses all connectivity to both remote and locally attached hosts, because there is no dedicated MCT link.

Figure 10. EVPN Multihoming Routing and Core Link Failure

EVPN Multihoming: Remote Traffic Flows

Consider a remote VTEP L3 that is sending bridged and routed traffic to the multihomed complex consisting of VTEPs L1 and L2. Because there is no virtual or emulated IP address representing this multihomed complex, L3 must apply ECMP at the source for both bridged and routed traffic (Figure 11). This section describes how ECMP is used at L3 for both the bridged and routed cases and how the system responds to core and access failures.

Figure 11. EVPN Multihoming and Remote Bridging

Remote Bridged Traffic

In Figure 11, remote host H5 wants to bridge traffic to host H2, which sits behind the EVPN multihomed complex (L1 and L2). L3 builds an ECMP list in accordance with the rules defined in RFC 7432: in the MAC address table at L3, the entry for H2 points to an ECMP path list consisting of IP-L1 and IP-L2. Any bridged traffic going from H5 to H2 is VXLAN encapsulated and load-balanced to L1 and L2.

When the ECMP list is created, the following constructs must be kept in mind:

- Mass withdrawal: Failures that cause path-list correction should be independent of the scale of MAC addresses.
- Aliasing: Path-list insertions may be independent of the scale of MAC addresses (depending on support for optional routes).

The following sections present the main constructs needed to create this MAC address ECMP path list.

Ethernet Autodiscovery Route (Type 1) per Ethernet Segment

EVPN defines a mechanism to efficiently and quickly signal the need to update forwarding tables after a failure in connectivity to an Ethernet segment. To accomplish this, each provider-edge node advertises a set of one or more EAD-per-ES routes for each locally attached Ethernet segment (Table 3).

Table 3. Ethernet Autodiscovery Route (Route Type 1) per Ethernet Segment

NLRI
  Route type: Ethernet autodiscovery (Type 1)
  Route distinguisher: Router-ID: Segment-ID (VNI << 8)
  ESI: <Type: 1B><MAC: 6B><LD: 3B>
  Ethernet tag: MAX-ET
  MPLS label: 0

Attributes
  ESI label extended community: Single Active = False; ESI label = 0
  Next hop: NVE loopback IP address
  Route target: Subset of the route targets of the MAC-VRF instances associated with all the EVIs active on the Ethernet segment

MAC-IP Route (Type 2)

The MAC and IP address route remains the same as that used in the current vPC multihoming and standalone single-homing solutions. However, it now carries a non-zero ESI field, which indicates that the host is multihomed and is therefore a candidate for ECMP path resolution (Table 4).

Table 4. MAC and IP Address Route (Route Type 2)

NLRI
  Route type: MAC and IP address route (Type 2)
  Route distinguisher: Route distinguisher of the MAC-VRF instance associated with the host
  ESI: <Type: 1B><MAC: 6B><LD: 3B>
  Ethernet tag: MAX-ET
  MAC address: MAC address of the host
  IP address: IP address of the host
  Labels: L2 VNI associated with the MAC-VRF instance; L3 VNI associated with the L3-VRF instance

Attributes
  Next hop: NVE loopback IP address
  Route target: Export the route target configured under the MAC-VRF or L3-VRF instance associated with the host

Access Failure for Remote Bridged Traffic

Failure of the ESI links results in mass withdrawal: the EAD-per-ES and Ethernet segment routes are withdrawn, leading the remote device to remove the VTEP from the ECMP list for the given Ethernet segment (Figure 12).

Figure 12. EVPN Multihoming Remote Bridging and ESI Link Failure

Core Failure for Remote Bridged Traffic

If VTEP L1 becomes isolated from the core, it must not continue to attract access traffic, because it cannot encapsulate and send that traffic on the overlay. Therefore, the access links must be brought down at L1 if L1 loses core reachability (Figure 13).

Figure 13. EVPN Multihoming Remote Bridging and Core Link Failure

Remote Routed Traffic

In Figure 14, L3 is a Layer 3 VXLAN gateway, and H5 and H2 belong to different subnets. In this case, any intersubnet traffic from H5 toward H2 is routed at L3, which is a distributed anycast gateway. Both L1 and L2 advertise the MAC and IP address route for host H2, and as a result of receiving these routes, L3 builds a Layer 3 ECMP list consisting of L1 and L2.

Figure 14. EVPN Multihoming Remote Routing
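Before moving to the failure cases, a hedged way to observe the MAC and MAC-IP entries (and their ESI-based path lists) that a remote VTEP such as L3 has learned is the l2route command family; availability and output format vary by NX-OS release:

show l2route evpn mac all
show l2route evpn mac-ip all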

Access Failure for Remote Routed Traffic

If the access link pointing to ES1 goes down on L1, the mass withdrawal is signaled in the form of EAD-per-ES and Ethernet segment route withdrawals. This causes L3 to remove L1 from the MAC address ECMP path list, so the intrasubnet (Layer 2) traffic converges quickly. L1 now treats H2 as a remote route reachable through the VXLAN overlay, because H2 is no longer directly connected through the ESI link. Traffic destined for H2 therefore takes a suboptimal path from L3 to L1 to L2, as shown in Figure 15.

Figure 15. EVPN Multihoming Remote Routing and ESI Link Failure

Intersubnet traffic from H5 to H2 follows this path:

1. H5 sends the packet to its gateway at L3.
2. L3 performs symmetric IRB and routes the packet to L1 through the VXLAN overlay.
3. L1 decapsulates the packet and performs an inner IP address lookup for H2.
4. H2 is a remote route, so L1 routes the packet to L2 through the VXLAN overlay.
5. L2 decapsulates the packet, performs an IP address lookup, and routes the packet directly to the attached switch virtual interface (SVI).

Thus, routing occurs three times: once each at L3, L1, and L2. This suboptimal behavior continues until L1 withdraws the type-2 route for H2 through BGP.

Core Failure for Remote Routed Traffic

Core failure for remote routed traffic elicits the same behavior as core failure for remote bridged traffic (refer to the section "Core Failure for Remote Bridged Traffic"). The underlay routing protocol withdraws reachability to L1's loopback from all remote VTEPs, so L1 is removed from both the MAC ECMP and IP ECMP lists everywhere (Figure 16).

Figure 16. EVPN Multihoming Remote Routing and Core Link Failure

EVPN Multihoming Broadcast, Unicast, and Multicast Flows

NX-OS supports a multicast core in the underlay with ESI. Consider BUM traffic originating from H5: the BUM packets are encapsulated in the multicast group mapped to the VNI. Because both L1 and L2 have joined the shared tree (*, G) for the underlay group based on the L2 VNI mapping, each receives a copy of the BUM traffic (Figure 17).

Figure 17. EVPN Multihoming and BUM Flood Traffic
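As a reminder of the mapping this flow relies on, here is a minimal NVE sketch in which an L2 VNI is tied to an underlay multicast group. The VNI, group, and loopback values are placeholders:

interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback1
  member vni 31001
    mcast-group 239.1.1.1

Every VTEP that carries VNI 31001 joins 239.1.1.1 in the underlay, which is why both L1 and L2 receive a copy of the flooded packet.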

Designated Forwarder

Only one of the VTEPs in the redundancy group should decapsulate and forward BUM traffic over the ESI links. For this purpose, a unique designated forwarder is elected per Ethernet segment. The role of the designated forwarder is to decapsulate BUM traffic originating from remote segments and forward it to the destination local segment for which the device is the designated forwarder.

Note these points about designated-forwarder election:

- Designated-forwarder election is per Ethernet segment and VLAN: a given VLAN can have different designated forwarders for ES1 and ES2. Figure 17 shows L2 as the designated forwarder for both ES1 and ES2 on a given VLAN.
- Designated-forwarder election applies only to BUM traffic, on the receiving side, for decapsulation.
- Every VTEP must decapsulate BUM traffic in order to forward it to single-homed (orphan) links.
- Duplication of the designated-forwarder role leads to duplicate packets or loops in a dual-homed network, so the designated forwarder must be unique per Ethernet segment and VLAN.

Split Horizon and Local Bias

Consider BUM traffic originating from H2, and assume that this traffic hashes to L1. L1 encapsulates the traffic in the overlay multicast group and sends the packet to the core. All VTEPs that have joined this multicast group with the same L2 VNI receive the packet. Additionally, L1 locally replicates the BUM packet on all directly connected orphan and ESI ports: for example, if the BUM packet originated at ES1, L1 locally replicates it to ES2 and the orphan ports. This technique of replicating the packet to all locally attached links is called local bias (Figure 18).

Figure 18. EVPN Multihoming Split Horizon and Local Bias

Remote VTEPs decapsulate the packet and forward it to their ESI and orphan links based on the designated-forwarder state. However, the packet is also received at L2, which belongs to the same redundancy group as the originating VTEP L1. L2 must decapsulate the packet to send it to its orphan ports, but even though L2 is the designated forwarder for ES1, it must not forward the packet to the ES1 link: the packet was received from a peer (L1) that shares ES1, L1 has already replicated it locally through local bias, and duplicate copies must not be delivered on the shared segments. Therefore, L2 (the designated forwarder) applies a split-horizon filter for the L1 IP address on the ES1 and ES2 segments that it shares with L1. This filter is applied in the context of a VLAN.

Ethernet Segment Route (Type 4)

The Ethernet segment route is used to elect the designated forwarder and to apply split-horizon filtering. All VTEPs configured with an Ethernet segment originate this route. The Ethernet segment route is exported and imported when an ESI is configured locally under the port channel (Table 5).

Table 5. Ethernet Segment Route (Route Type 4)

NLRI
  Route type: Ethernet segment (Type 4)
  Route distinguisher: Router-ID: Base + port-channel number
  ESI: <Type: 1B><MAC: 6B><LD: 3B>

Attributes
  Originator IP address: NVE loopback IP address
  Ethernet segment import route target: 6-byte MAC address derived from the ESI

Designated-Forwarder Election and VLAN Carving

Upon configuration of the ESI, both L1 and L2 advertise the Ethernet segment route. The ESI MAC address is common to L1 and L2 and unique in the network; hence, only L1 and L2 import each other's Ethernet segment routes (Figure 19).

Figure 19. EVPN Multihoming and Designated-Forwarder Election
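This document does not spell out the carving rule itself; per RFC 7432 (section 8.5), the VTEPs that import each other's Ethernet segment routes are sorted by originator IP address into an ordinal table of N entries, and the VTEP at ordinal (V mod N) becomes the designated forwarder for VLAN V:

  i = V mod N

A quick worked example under that assumption, with N = 2 and L1 at ordinal 0 (assuming L1 has the lower originator IP address):

  VLAN 1001: 1001 mod 2 = 1 -> L2 is the designated forwarder
  VLAN 1002: 1002 mod 2 = 0 -> L1 is the designated forwarder

This per-VLAN carving spreads the designated-forwarder load across the redundancy group instead of concentrating all BUM forwarding on one VTEP.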

Core and Site Failures for Broadcast, Unicast, and Multicast Traffic

If the access link for ES1 fails at L1, L1 withdraws its Ethernet segment route for ES1. This withdrawal triggers recomputation of the designated forwarder. Because L2 is then the only VTEP left in the ordinal table, it takes over the designated-forwarder role for all VLANs.

Conclusion

BGP EVPN multihoming on Cisco Nexus 9000 Series Switches incurs little operational and cabling cost. It offers provisioning simplicity, flow-based load balancing, multipathing, and fail-safe redundancy.

For More Information

- VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks (RFC 7348): https://tools.ietf.org/html/rfc7348
- BGP MPLS-Based Ethernet VPN (RFC 7432): https://tools.ietf.org/html/rfc7432
- Requirements for Ethernet VPN (EVPN) (RFC 7209): https://tools.ietf.org/html/rfc7209