Help! BRKRST Cisco and/or its affiliates. All rights reserved. Cisco Public 2



Understanding and Troubleshooting Intelligent Path Control in IWAN
Brandon Lynch, Network Engineer, Core Software Group
Richard Furr, Technical Leader, Technical Services

Agenda
- Introduction
- PfRv3 Operational State
- Learning a Traffic-Class
- Traffic-Classes Gone Bad
- Load-Balancing
- Takeaways

Intelligent Path Control: The Beginning
- Optimized Edge Routing (OER) was first introduced to bring in intelligent path control with monitoring via IP SLA.
- Performance Routing v2 (PfRv2) was later introduced to simplify configuration and automate IP SLA responder discovery.
- Policies were locally configured.
- Monitoring was done via NetFlow.
- Scalability was improved to better accommodate large enterprises.

Intelligent Path Control: PfRv3
- Built-in site discovery and centralized policy distribution from the Master Controller
- Intelligent monitoring and reachability measurement via Smart Probes
- Passive monitoring done via Performance Monitor
- Fast failure detection
- Application recognition with NBAR and VRF-aware
- One-touch provisioning with PnP / APIC-EM / IWAN App

Purpose of the Session: PfRv3 Master (Controller)!
- Reinforce understanding of core PfRv3 terminology
- Demonstrate how each component works together to create intelligent path control and traffic optimization
- Present proven troubleshooting methodology illustrated through real-world examples
- Make YOU successful!

Foundational Terms: Enterprise Prefix
- Designates a supernet that typically covers all smaller subnets across the enterprise
- Used to define the enterprise domain for PfR control
- Typically defined on classful boundaries (ex. 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16)
- This list also plays an important role in the decision-making process to load-balance traffic when this feature is enabled.
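As a rough illustration of what "covers all smaller subnets" means, the sketch below checks whether a subnet falls inside an enterprise prefix list. The list reuses the three example supernets from the slide; the function name `in_enterprise_domain` is hypothetical and is not a PfR API.

```python
import ipaddress

# Example supernets taken from the slide; a real enterprise prefix list is site-specific.
ENTERPRISE_PREFIXES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def in_enterprise_domain(prefix: str) -> bool:
    """Return True if the prefix is covered by any enterprise supernet (hypothetical helper)."""
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(supernet) for supernet in ENTERPRISE_PREFIXES)


if __name__ == "__main__":
    print(in_enterprise_domain("10.10.3.0/24"))   # a branch LAN inside 10.0.0.0/8
    print(in_enterprise_domain("8.8.8.0/24"))     # an internet prefix outside the list
```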

Foundational Terms: Site Prefix
- Defines a subnet that exists at an IWAN-enabled site that is eligible for traffic control
- Dynamically learned by default (can be statically configured)
- Published via EIGRP SAF to the Hub MC for distribution to all sites

Foundational Terms: Channel
- Flow designation consisting of a destination site-id, DSCP, and PfR label
- Created upon the formation of traffic-classes (if not previously active) or upon receipt of a probe from a remote site to provide path measurement
- Generated per path for path-preference in a policy
- Indicates if traffic is active from local site to remote site (Initiated and open) or only from remote site to local site (Discovered and open)

Foundational Terms: Smart Probe
- Synthetic RTP frame generated by PfR for performance measurement (packet loss, delay, jitter) and reachability
- Sent with a source IP of the local site-id and a destination IP of the remote site-id
- Source/Destination Port = 18000/19000
- Marked with the DSCP associated with the channel on which the probe is sent
- Intercepted by a BR in the data-plane and measured by Performance Monitor
- Probing rates vary based on IWAN version.

How PfRv3 Probes
- On channels with data traffic present (i.e. the channel/path is in use), the default probing rate is 1 packet every 10 seconds.
- On channels with no data traffic (i.e. backup path channels), PfRv3 synthetically generates RTP frames at 20 packets per second (default). Higher probing rates on backup paths allow PfR to obtain an accurate measurement of the path to determine viability for placing traffic in the event of a primary path failure.
- Zero-SLA is a feature that provides the ability to probe only on the default DSCP. All path measurements against the default DSCP are then applied to all other active DSCPs on that same path.
- Path-of-Last-Resort mutes probing on all channels of this path when the path is in Standby mode. When Active, probes are only sent on the default DSCP at 1 packet every 10 seconds.
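To make the difference between the two default probing rates concrete, here is a small back-of-the-envelope sketch. It assumes only the defaults quoted above (1 probe per 10 seconds with data traffic, 20 probes per second without); actual rates vary by IWAN version, and the constant and function names are illustrative.

```python
# Defaults quoted in the slide; real rates depend on the IWAN version and configuration.
ACTIVE_INTERVAL_S = 10   # channels carrying data: 1 probe every 10 seconds
IDLE_RATE_PPS = 20       # channels without data: 20 probes per second


def expected_probes(duration_s: int, has_data_traffic: bool) -> int:
    """Estimate how many smart probes one channel sends in duration_s seconds."""
    if has_data_traffic:
        return duration_s // ACTIVE_INTERVAL_S
    return duration_s * IDLE_RATE_PPS


if __name__ == "__main__":
    for label, active in (("in-use channel", True), ("backup channel", False)):
        print(label, expected_probes(60, active), "probes per minute")
```

Under these assumptions a backup channel sends 200 times as many probes as an in-use channel over the same interval, which is why the probe counters on idle channels grow so quickly.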

Foundational Terms: Traffic-Class
- Combination of a destination prefix (site-prefix or internet prefix) and DSCP
- Matched to a class in the configured policy or falls into the default policy if load-balance is configured
- Utilizes a channel per path based on class configuration to measure performance and make path-selection decisions
- Shows current path used and recent path changes (if any) as well as associated reasons
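The relationship between traffic-classes and channels can be sketched with two small record types. This is a mental model built only from the definitions above (a channel is keyed by destination site-id, DSCP, and PfR label; a traffic-class by destination prefix and DSCP). The class names, the `channel_for` helper, and the sample prefixes are hypothetical and do not reflect PfR internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """Channel key as defined in the slides: destination site-id, DSCP, PfR label."""
    dst_site_id: str
    dscp: str
    pfr_label: str


@dataclass(frozen=True)
class TrafficClass:
    """Traffic-class key as defined in the slides: destination prefix and DSCP."""
    dst_prefix: str
    dscp: str


def channel_for(tc: TrafficClass, dst_site_id: str, pfr_label: str) -> Channel:
    """Derive the channel a traffic-class would use on one path (illustrative only).

    The caller supplies the site-id that owns the prefix and the label of the path.
    """
    return Channel(dst_site_id=dst_site_id, dscp=tc.dscp, pfr_label=pfr_label)


if __name__ == "__main__":
    # Two made-up prefixes at the same site with the same DSCP map to one channel per path.
    tc_a = TrafficClass("10.3.1.0/24", "ef")
    tc_b = TrafficClass("10.3.2.0/24", "ef")
    print(channel_for(tc_a, "10.10.3.254", "0:0 | 0:1") == channel_for(tc_b, "10.10.3.254", "0:0 | 0:1"))
```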

Additional Breakout Sessions and Labs
- BRKCRS-2000: Intelligent WAN (IWAN) Architecture
- BRKRST-2362: IWAN Implementing Performance Routing (PfRv3)
- BRKCRS-2007: Migrating Your Existing WAN to Cisco's IWAN
- BRKRST-3413: IWAN Serviceability: Deploying, Monitoring, and Operating
- BRKRST-2043: IWAN AVC/QoS Design
- BRKRST-2557: IWAN and NFV Orchestration for Managed Service Providers
- BRKNMS-1040: IWAN and AVC Management using Cisco Prime Infrastructure and APIC-EM
- BRKCRS-2002: IWAN Design and Deployment Workshop
- LABRST-2400: Packet Capturing Tools in Routing Environments

IWAN @ Cisco Live US 2017
Recommended path for IWAN (see additional slides):
1. IWAN design and architecture
2. Migrating to IWAN
3. Operation & Troubleshooting
4. Deep Dive into individual IWAN building blocks
5. Customer Case Study & Panel (PNLCRS-2005)

IWAN @ Cisco Live US 2017 for Enterprise Customers
[Session schedule diagram. Refer to the Session Catalog for more: 37 hits for IWAN.]

IWAN @ Cisco Live US 2017 for Service Providers
[Session schedule diagram. Refer to the Session Catalog for more: 37 hits for IWAN.]

IWAN at Cisco Live Las Vegas 2017
Use different event types to learn more about IWAN:
- Full-day Technical Seminar on Sunday: 2 IWAN seminars (8 hours)
- Presentations: 15 IWAN Breakout Sessions (90 or 120 min.); 2 Customer Success Story presentations (Huntington Bank, Verizon)
- Hands-on labs: 5 IWAN instructor-led labs (4 hours); 3 IWAN self-paced walk-in labs (45 min.)
- Meet the IWAN Business Unit Experts: MTE (Meet the Engineer) technical meetings (max. 1 hour); Whisper Suites management-level meetings

IWAN Breakout Sessions

Understanding IWAN Design, Architecture and Building Blocks:
- BRKCRS-2000 IWAN Architecture
- BRKRST-2043 IWAN AVC/QoS Design
- BRKSEC-4054 Advanced Concepts of DMVPN
- BRKRST-2362 IWAN - Implementing Performance Routing (PfRv3)
- BRKCRS-2002 IWAN Design and Deployment Workshop

IWAN Migration, Operation and Troubleshooting:
- BRKCRS-2007 Migrating Your Existing WAN to Cisco's IWAN
- Understanding and Troubleshooting Intelligent Path Control in IWAN
- BRKRST-3413 IWAN Serviceability: Deploying, Monitoring, and Operating
- BRKNMS-1040 IWAN and AVC Management using Cisco Prime Infrastructure and APIC-EM

IWAN for Service Providers:
- BRKRST-2557 IWAN and NFV Orchestration for Managed Service Providers
- PSOSPG-2003 Cisco SD WAN for Service Providers

IWAN Labs

Instructor-led (4 hours):
- LTRCRS-2005 Building and Migrating to Cisco's Intelligent WAN (IWAN)
- LTRRST-3015 Advanced IWAN PfR w/QoS Hands-on Lab
- LTRRST-3019 Design, Deploy, and Operate an Intelligent WAN (IWAN) Network

Walk-in Self-Paced (45 min.):
- LABSDN-2005 Introduction to IWAN
- LABRST-2013 DMVPN overlay routing for IWAN deployments
- LABSDN-2910 IWAN deployment with APIC-EM

IWAN Case Studies
- Enterprise: CCSRST-2000 IWAN Migration Case Study (Huntington Bank)
- Service Provider: CCSSP-1000 Cisco + Verizon: Virtualized Managed Services (VMS) Success Story


PfRv3 Operational State: Interface Discovery

Interface Discovery: Background
- Branches learn paths based on Smart Probes sent from Hub BRs.
- The Hub BR sets the Discovery Probe flag in the Smart Probe, allowing the branch BR to discover the interface after intercepting the probe in the data path.
- The probes identify path color (name), POP ID (DC ID), path ID, Zero-SLA support, and Path of Last Resort (PLR) support.
- While interface discovery typically happens on the default DSCP, any DSCP can trigger discovery. The Discovery Probe flag is set for each probe regardless of DSCP. This ensures that if there is unreachability on the default DSCP while other DSCPs are forwarding properly, we don't lose the path in PfR.

Interface Discovery: How It Works
- The branch is defined with the Hub MC IP.
- The branch initiates EIGRP SAF Hello messages towards the Hub MC.
- The EIGRP SAF adjacency is formed and the Hub MC discovers the Branch MC as a new discovered-site in PfRv3.
- The Hub MC syncs to the Hub BRs to update the discovered-site database.
- Once the Hub BR learns the new site-id, it consults the local RIB to identify a parent-route to reach the site-id.
- After a parent-route has been programmed, smart probes are generated for the default DSCP towards the new branch's site-id with the next-hop identified from the RIB. The Discovery Probe flag is set to 1, identifying this as an interface discovery smart probe.
- The Branch BR intercepts the smart probe and exports it to the Branch MC, effectively learning the new path.

Single Branch/Hub Topology
[Topology diagram: a hub site and a branch (10.10.3.254) connected by Tunnel10 and Tunnel20 (INET).]

Interface Discovery: Who is the Hub?

  domain iwan
   vrf default
    border
     source-interface Loopback0
     master local
    master branch
     source-interface Loopback0
     hub 10.10.100.254

Try to build an EIGRP SAF adjacency with the hub!

Interface Discovery: Initiate EIGRP SAF

EIGRP SAF Hello received from 10.10.3.254! This site wants to be my neighbor!

  ISR4K_MC_Branch2#show ip route 10.10.100.254
  Routing entry for 10.10.100.254/32
    Known via "eigrp 10", distance 90, metric 5376640
    Tag 101, type internal
    Redistributing via eigrp 10
    Last update from 172.17.10.1 on Tunnel10, 11:35:00 ago
    Routing Descriptor Blocks:
    * 172.17.10.1, from 172.17.10.1, 11:35:00 ago, via Tunnel10
        Route metric is 5376640, traffic share count is 1
        Total delay is 10001 microseconds, minimum bandwidth is 20000 Kbit
        Reliability 255/255, minimum MTU 1400 bytes
        Loading 1/255, Hops 1
        Route tag 101

Interface Discovery: SAF Adjacency Formed

  DC1_HUB_MC#show eigrp service-family ipv4 neighbors
  EIGRP-SFv4 VR(#AUTOCFG#) Service-Family Neighbors for AS(59501)
  H   Address         Interface  Hold   Uptime    SRTT  RTO  Q    Seq
                                 (sec)            (ms)       Cnt  Num
  1   10.10.3.254     Lo0        581    23:35:26  6     100  0    2938
  4   10.10.200.254   Lo0        581    1w0d      1     100  0    5274
  3   10.10.100.253   Lo0        597    1w0d      1     100  0    1
  0   10.10.4.254     Lo0        581    1w0d      1     100  0    8895
  ...

  ISR4K_MC_Branch1#show eigrp service-family ipv4 neighbors
  EIGRP-SFv4 VR(#AUTOCFG#) Service-Family Neighbors for AS(59501)
  H   Address         Interface  Hold   Uptime    SRTT  RTO  Q    Seq
                                 (sec)            (ms)       Cnt  Num
  0   10.10.100.254   Lo0        586    17:17:14  1     100  0    4949

NOTE: EIGRP SAF adjacencies will also form for the following:
(1) LAN-LAN on multi-router branches
(2) Loopback-Loopback between MC and BRs on multi-router branches
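When checking many sites, it can help to confirm programmatically that a given site-id appears as a SAF neighbor. The sketch below parses text shaped like the `show eigrp service-family ipv4 neighbors` output shown above. The column layout is assumed from this slide only, and `saf_neighbors` is a hypothetical helper, not a Cisco tool.

```python
import re

# Sample rows copied from the hub MC output in the slide.
SAMPLE = """\
EIGRP-SFv4 VR(#AUTOCFG#) Service-Family Neighbors for AS(59501)
H   Address         Interface  Hold   Uptime    SRTT  RTO  Q    Seq
                               (sec)            (ms)       Cnt  Num
1   10.10.3.254     Lo0        581    23:35:26  6     100  0    2938
4   10.10.200.254   Lo0        581    1w0d      1     100  0    5274
3   10.10.100.253   Lo0        597    1w0d      1     100  0    1
0   10.10.4.254     Lo0        581    1w0d      1     100  0    8895
"""

# A neighbor row starts with the handle number followed by an IPv4 address.
ROW = re.compile(r"^\s*(\d+)\s+(\d+\.\d+\.\d+\.\d+)\s+(\S+)\s+(\d+)\s+(\S+)")


def saf_neighbors(output: str) -> dict:
    """Map neighbor address -> interface, hold time, and uptime (assumed column order)."""
    neighbors = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            neighbors[match.group(2)] = {
                "interface": match.group(3),
                "hold": int(match.group(4)),
                "uptime": match.group(5),
            }
    return neighbors


if __name__ == "__main__":
    table = saf_neighbors(SAMPLE)
    print("10.10.3.254" in table, table.get("10.10.3.254"))
```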

Interface Discovery: Sync Discovered-Site DB

Sync new site to hub BR databases!

  DC1 BR#show domain iwan border site-capability
  ...
  Site id : 10.10.3.254
  -----------------------------------------------------------
  Capability      Major   Minor
  -----------------------------------------------------------
  Domain          2       0
  -----------------------------------------------------------
  Zero-SLA        1       0
  -----------------------------------------------------------
  Mul-Hop         1       0
  -----------------------------------------------------------

Interface Discovery: Sync Discovered-Site DB

Sync new site to hub BR databases!

  DC1_HUB_MC#show domain iwan master discovered-sites
  *** Domain MC DISCOVERED sites ***
  Number of sites: 5
  *Traffic classes [Performance based][load-balance based]
  ...
  Site ID: 10.10.3.254
    Site Discovered:00:01:19 ago
    DSCP :default[0]-number of traffic classes[0][0]
    DSCP :cs1[8]-number of traffic classes[0][0]
    DSCP :af11[10]-number of traffic classes[0][0]
    DSCP :af12[12]-number of traffic classes[0][0]
    DSCP :af13[14]-number of traffic classes[0][0]
    DSCP :af22[20]-number of traffic classes[0][0]
    DSCP :af23[22]-number of traffic classes[0][0]
    DSCP :ef[46]-number of traffic classes[0][0]
    Site Traffic Classes: 0

Interface Discovery: Sync Discovered-Site DB

Sync new site to hub BR databases!

  DC1_INET_BR#show domain iwan border site-capability
  ...
  Site id : 10.10.3.254
  -----------------------------------------------------------
  Capability      Major   Minor
  -----------------------------------------------------------
  Domain          2       0
  -----------------------------------------------------------
  Zero-SLA        1       0
  -----------------------------------------------------------
  Mul-Hop         1       0
  -----------------------------------------------------------

Interface Discovery: Send Probes to Branch

All Border Routers (both hub and branch) should learn and prefer routes for destination site-ids out of their tunnel interfaces.

  DC1_INET_BR#show ip route 10.10.3.254
  Routing entry for 10.10.2.254/32
    Known via "eigrp 10", distance 90, metric 138250880, type internal
    Redistributing via eigrp 10
    Last update from 172.17.20.3 on Tunnel20, 00:00:27 ago
    Overrides from "PfR"
    Routing Descriptor Blocks:
    * 172.17.20.3, from 172.17.20.3, 00:00:27 ago, via Tunnel20
        Route metric is 138250880, traffic share count is 1
        Total delay is 270001 microseconds, minimum bandwidth is 500000 Kbit
        Reliability 255/255, minimum MTU 1400 bytes
        Loading 1/255, Hops 2

  DC1 BR#show ip route 10.10.3.254
  Routing entry for 10.10.2.254/32
    Known via "eigrp 10", distance 90, metric 5125760, type internal
    Redistributing via eigrp 10
    Last update from 172.17.10.3 on Tunnel10, 1d15h ago
    Overrides from "PfR"
    Routing Descriptor Blocks:
    * 172.17.10.3, from 172.17.10.3, 1d15h ago, via Tunnel10
        Route metric is 5125760, traffic share count is 1
        Total delay is 10001 microseconds, minimum bandwidth is 1000000 Kbit
        Reliability 255/255, minimum MTU 1400 bytes
        Loading 1/255, Hops 1

Interface Discovery: Send Probes to Branch

  DC1 BR#show domain iwan border parent-route
  Border Parent Route Details:
  Prot: EIGRP, Network: 10.10.3.254/32, Gateway: 172.17.10.3, Interface: Tunnel10, Ref count: 1
  Prot: EIGRP, Network: 10.10.2.254/32, Gateway: 172.17.10.4, Interface: Tunnel10, Ref count: 2
  Prot: EIGRP, Network: 10.10.4.254/32, Gateway: 172.17.10.5, Interface: Tunnel10, Ref count: 1

  DC1 BR#show domain iwan border channels dscp default
  ...
  Channel id: 86
    Version : 3
    Site id : 10.10.3.254
    DSCP : default[0]
    Service provider :
    Pfr-Label : 0:0 0:1 [0x1]
    Channel state : Initiated and open
    Channel next hop : 172.17.10.3
    RX Reachability : Reachable
    TX Reachability : Reachable
    Supports Zero-SLA : Yes
    Muted by Zero-SLA : No
    Muted by Path of Last Resort : No
    Number of Probes sent : 257596
    Number of Probes received : 117792
    Number of SMP Profile Bursts sent: 146085
    Number of Active Channel Probes sent: 14738
    Number of Reachability Probes sent: 105846
    Number of Force Unreaches sent: 0
    Last Probe sent : 110 msec Ago
    Last Probe received: 2282 msec ago
    Number of Data Packets sent : 54945
    Number of Data Packets received : 0
    Smart Probe in Burst: No
    Smart Probe enable Burst: Yes

RX Reachability indicates we have received probes on the channel.
TX Reachability shows that we have a next-hop learned from the RIB.
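The two reachability fields are the first thing to read in a border channel. As a minimal sketch, the snippet below splits `key : value` lines like those in the channel output above and checks both directions. The sample text is abbreviated from the slide, and both function names are hypothetical.

```python
# Abbreviated from the border channel output in the slide.
SAMPLE = """\
Channel id: 86
  Site id : 10.10.3.254
  DSCP : default[0]
  Pfr-Label : 0:0 0:1 [0x1]
  Channel state : Initiated and open
  Channel next hop : 172.17.10.3
  RX Reachability : Reachable
  TX Reachability : Reachable
"""


def parse_channel(output: str) -> dict:
    """Split each 'key : value' line at the first colon into a dict entry."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def both_directions_reachable(fields: dict) -> bool:
    """True when probes were received (RX) and a next hop is known from the RIB (TX)."""
    return (
        fields.get("RX Reachability") == "Reachable"
        and fields.get("TX Reachability") == "Reachable"
    )


if __name__ == "__main__":
    channel = parse_channel(SAMPLE)
    print(channel["Site id"], both_directions_reachable(channel))
```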

Interface Discovery: Send Probes to Branch

  Channel id: 283840
    Version : 3
    Site id : 10.10.100.254
    DSCP : default[0]
    Service provider : INET
    Pfr-Label : 0:2 0:0 [0x20000]
    Channel state : Discovered and open
    Channel next hop : 172.17.20.1
    RX Reachability : Reachable
    TX Reachability : Reachable
    Supports Zero-SLA : Yes
    Muted by Zero-SLA : No
    Muted by Path of Last Resort : No
    Number of Probes sent : 1133
    Number of Probes received : 1104
    Number of SMP Profile Bursts sent: 626
    Number of Active Channel Probes sent: 61
    Number of Reachability Probes sent: 449
    Number of Force Unreaches sent: 0
    Last Probe sent : 568 msec Ago
    Last Probe received: 99 msec ago
    Number of Data Packets sent : 0
    Number of Data Packets received : 0
    Smart Probe in Burst: No
    Smart Probe enable Burst: Yes

  ISR4K_MC_Branch2#show domain iwan border parent-route
  Border Parent Route Details:
  Prot: EIGRP, Network: 10.10.100.254/32, Gateway: 172.17.20.1, Interface: Tunnel20, Ref count: 2
  Prot: EIGRP, Network: 10.10.100.254/32, Gateway: 172.17.10.1, Interface: Tunnel10, Ref count: 2

Interface Discovery: Send Probes to Branch

  ISR4K_MC_Branch2#show domain iwan master status | beg Borders:
  Borders:
    IP address: 10.10.3.254
    Version: 2
    Connection status: CONNECTED (Last Updated 3w0d ago )
    Interfaces configured:
      Name: Tunnel20 type: external Service Provider: INET Status: UP Zero-SLA: NO Path of Last Resort: Disabled
          Number of default Channels: 0
          Path-id list: 0:2 1:2
      Name: Tunnel10 type: external Service Provider: Status: UP Zero-SLA: NO Path of Last Resort: Disabled
          Number of default Channels: 0
          Path-id list: 0:1 1:0
    Tunnel if: Tunnel0

Troubleshooting Interface Discovery

  ISR4K_MC_Branch2#show domain iwan master status
  *** Domain MC Status ***
  Master VRF: Global
    Instance Type: Branch
    Instance id: 0
    Operational status: Up
    Configured status: Up
    Loopback IP Address: 10.10.3.254
    Load Balancing:
      Operational Status: Down
  <OUTPUT OMITTED>
  Borders:
    IP address: 10.10.3.254
    Version: 2
    Connection status: CONNECTED (Last Updated 00:09:42 ago )
    Interfaces configured:

No interfaces have been discovered on the branch, indicating that either (a) no smart probes have been received or (b) no smart probes have been processed from the hub BRs on the configured tunnels.
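The symptom above, an empty "Interfaces configured:" section, is easy to detect in captured output. The following sketch lists interface names found after that header. The two samples are abbreviated from the healthy and failing outputs in these slides, and `discovered_interfaces` is a hypothetical helper.

```python
# Abbreviated from the failing branch output in the slide.
FAILING = """\
Borders:
  IP address: 10.10.3.254
  Version: 2
  Connection status: CONNECTED (Last Updated 00:09:42 ago )
  Interfaces configured:
"""

# Abbreviated from the healthy branch output shown earlier in the deck.
HEALTHY = """\
Borders:
  IP address: 10.10.3.254
  Version: 2
  Connection status: CONNECTED (Last Updated 3w0d ago )
  Interfaces configured:
    Name: Tunnel20 type: external Service Provider: INET Status: UP
    Name: Tunnel10 type: external Service Provider: Status: UP
"""


def discovered_interfaces(output: str) -> list:
    """Return interface names listed after the 'Interfaces configured:' header."""
    names = []
    in_section = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Interfaces configured:"):
            in_section = True
            continue
        if in_section and stripped.startswith("Name:"):
            names.append(stripped.split()[1])
    return names


if __name__ == "__main__":
    for label, text in (("failing", FAILING), ("healthy", HEALTHY)):
        found = discovered_interfaces(text)
        print(label, found if found else "no interfaces discovered")
```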

Troubleshooting Interface Discovery

  ISR4K_MC_Branch2#show run | sec domain
  domain iwan
   vrf default
    border
     source-interface Loopback0
     master local
    master branch
     source-interface Loopback0
     hub 10.10.100.254

The correct hub MC IP is configured.

Troubleshooting Interface Discovery

  ISR4K_MC_Branch2#show ip route 10.10.100.254
  Routing entry for 10.10.100.254/32
    Known via "eigrp 10", distance 90, metric 5376640, type internal
    Redistributing via nhrp, eigrp 10
    Last update from 172.17.10.1 on Tunnel10, 00:21:04 ago
    Routing Descriptor Blocks:
    * 172.17.10.1, from 172.17.10.1, 00:21:04 ago, via Tunnel10
        Route metric is 5376640, traffic share count is 1
        Total delay is 10001 microseconds, minimum bandwidth is 20000 Kbit
        Reliability 255/255, minimum MTU 1400 bytes
        Loading 1/255, Hops 1

The tunnel is up and the route to the hub MC IP is learned via this interface.

Troubleshooting Interface Discovery

  ISR4K_MC_Branch2#show eigrp service-family ipv4 neighbors
  EIGRP-SFv4 VR(#AUTOCFG#) Service-Family Neighbors for AS(59501)
  H   Address         Interface  Hold   Uptime    SRTT  RTO  Q    Seq
                                 (sec)            (ms)       Cnt  Num
  0   10.10.100.254   Lo0        545    00:15:43  1     100  0    2636

The SAF adjacency to the hub MC is up and functional.

We need to check the hub site!

Troubleshooting Interface Discovery

DC1 BR#show eigrp service-family ipv4 neighbors
EIGRP-SFv4 VR(#AUTOCFG#) Service-Family Neighbors for AS(59501)
H   Address          Interface   Hold   Uptime     SRTT   RTO   Q     Seq
                                 (sec)             (ms)         Cnt   Num
3   10.10.3.254      Lo0         558    00:16:37   1      100   0     0
1   10.10.100.253    Lo0         529    00:16:20   1      100   0     2
2   10.10.4.254      Lo0         587    4d18h      1      100   0     511
0   10.10.2.254      Lo0         500    1w0d       1      100   0     784

The corresponding branch adjacency is also active on the hub MC and shows no problems.

DC1_HUB_MC#show ip route 10.10.3.254
Routing entry for 10.0.0.0/8
  Known via "eigrp 10", distance 90, metric 2848, type internal
  Redistributing via eigrp 10
  Last update from 10.10.100.6 on GigabitEthernet0/3, 00:00:06 ago
  Routing Descriptor Blocks:
    10.10.100.6, from 10.10.100.6, 00:00:06 ago, via GigabitEthernet0/3
      Route metric is 2848, traffic share count is 1
      Total delay is 11 microseconds, minimum bandwidth is 1000000 Kbit
      Reliability 255/255, minimum MTU 1500 bytes
      Loading 1/255, Hops 1
  * 10.10.100.2, from 10.10.100.2, 00:00:06 ago, via GigabitEthernet0/3
      Route metric is 2848, traffic share count is 1
      Total delay is 11 microseconds, minimum bandwidth is 1000000 Kbit
      Reliability 255/255, minimum MTU 1500 bytes
      Loading 1/255, Hops 1

The hub MC has a route pointing to the LAN interface of each hub BR to reach the spoke's site-id. Let's check the MC channels.

Troubleshooting Interface Discovery

DC1_HUB_MC#show domain iwan master channels dst-site-id 10.10.3.254
Legend: * (Value obtained from Network delay:)

Channel Id: 35 Dst Site-Id: 10.10.3.254 Link Name: INET DSCP: default [0] pfr-label: 0:0 0:2 [0x2] TCs: 0 BackupTCs: 0
  Channel Created: 00:10:21 ago
  Provisional State: Initiated and open
  Operational state: Available but unreachable
  <OUTPUT OMITTED>
  Latest TCA Bucket
    Last Updated : 00:00:13 ago
    Local unreachable TCA received(check for stale TCA 00:00:05 later)
---------
Channel Id: 36 Dst Site-Id: 10.10.3.254 Link Name: DSCP: default [0] pfr-label: 0:0 0:1 [0x1] TCs: 0 BackupTCs: 0
  Channel Created: 00:14:07 ago
  Provisional State: Initiated and open
  Operational state: Available but unreachable
  <OUTPUT OMITTED>
  Latest TCA Bucket
    Last Updated : 00:00:25 ago
    Local unreachable TCA received(check for stale TCA 00:00:05 later)

Channels are Available but unreachable. Why?
Local unreachable TCA received on the MC. What does this mean?
Check the border channels!

Troubleshooting Interface Discovery

DC1 BR#show domain iwan border channels dst-site-id 10.10.3.254
Border Smart Probe Stats:
Channel id: 36
  Version : 3
  Site id : 10.10.3.254
  DSCP : default[0]
  Service provider :
  Pfr-Label : 0:0 0:1 [0x1]
  Channel state : Initiated and open
  Channel next hop : 172.17.10.4
  RX Reachability : Un-Reachable
  TX Reachability : Reachable
  <OUTPUT OMITTED>

DC1_INET_BR#show domain iwan border channels dst-site-id 10.10.3.254
Border Smart Probe Stats:
Channel id: 35
  Version : 3
  Site id : 10.10.3.254
  DSCP : default[0]
  Service provider : INET
  Pfr-Label : 0:0 0:2 [0x2]
  Channel state : Initiated and open
  Channel next hop : 172.17.20.4
  RX Reachability : Un-Reachable
  TX Reachability : Reachable
  <OUTPUT OMITTED>

Probe timers shown at the end of the two outputs (in slide order):
  Last Probe sent : 428 msec Ago
  Last Probe received: 1364055 msec ago
  ...
  Last Probe sent : 485 msec Ago
  Last Probe received: 1754157 msec ago
  ...

The last probe received on the path was over 20 minutes ago. However, the hub BRs continue to send probes towards the branch. Is the branch receiving and processing these? Let's see!
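The probe ages in this output are reported in milliseconds, which hides how stale they are. A minimal Python sketch of the conversion follows; the helper name `msec_to_minutes` is hypothetical and not part of any Cisco tooling.

```python
def msec_to_minutes(msec: int) -> float:
    """Convert a 'Last Probe received' age from milliseconds to minutes."""
    return msec / 1000 / 60


# The two 'Last Probe received' values copied from the border channel outputs.
for age_msec in (1364055, 1754157):
    print(f"{age_msec} msec ago = {msec_to_minutes(age_msec):.1f} minutes ago")
```

Both values work out to more than 20 minutes, while the "Last Probe sent" values are under half a second, which is what points the investigation at the receive direction.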

Troubleshooting Interface Discovery

How can we see ingress probes on the branch router?

EPC (Embedded Packet Capture)
1) Captured on-demand from CLI
2) Quick configuration
3) No visualization into forwarding

Packet Trace (the upgrade)
1) Captured on-demand from CLI
2) Quick configuration
3) Shows path trace for features

Packet-Trace: Details

- Designed to address the challenge of troubleshooting datapath issues in live, high-scale environments.
- Packet-trace provides visibility into the treatment of packets on an IOS-XE platform to troubleshoot, diagnose, or gain a deeper understanding of the actions taken on a packet during packet processing.
- It is integrated with platform condition debugging (debug platform condition), making it a viable option even under the heavy traffic seen in production environments.
- Three specific levels of inspection are provided by packet-trace. Each level adds a deeper look into the packet processing at the expense of some packet processing overhead.
- Packet Trace is supported on the ASR1000, ISR4000, and CSR1000V.

Packet-Trace: Path Data

Path data may be collected per packet for a limited number of packets and is made up of different types of data as follows:
- Common path data (e.g. IP tuple)
- Feature specific data (e.g. NAT)
- Feature Invocation Array (FIA) trace, optionally enabled
- Copy of all or part of the incoming and/or outgoing packet, optionally enabled

Capturing path data potentially has significant impact on packet processing capability, specifically FIA trace and packet copy:
- FIA tracing creates many path data entries, costing instructions and DRAM writes
- Packet copy creates many DRAM reads/writes

Packet-trace will only affect the performance of packets traced (i.e. those matched by the user provided conditions).

Packet Trace Workflow

1) Enable and define buffer criteria
2) Define condition
3) Start/stop, view

Platforms: ASR1000, CSR1000V, ISR4000

Packet Trace Configuration Steps

IOS-XE 15.3(3)S / 3.10.0S release and later:

  debug platform packet-trace enable
  debug platform packet-trace packet 8192 circular fia-trace data-size 2048
  debug platform packet-trace copy packet both L3 size 64
  debug platform condition ipv4 access-list 101 both
  debug platform condition start
  debug platform condition stop

NOTE: debug platform packet-trace enable is not required in devices running 16.X and later.

Review Data

  show platform packet-trace summary
  show platform packet-trace packet all
  show platform packet-trace packet 5

Verify/Clear Configuration

  show platform packet-trace configuration
  clear platform condition all
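When the same capture has to be repeated on several routers, it can help to generate the command sequence rather than retype it. The following is only a sketch: the function name `packet_trace_commands` and its parameters are hypothetical, and the command strings are copied from the steps above rather than taken from any API.

```python
def packet_trace_commands(acl: str, packets: int = 8192, needs_enable: bool = True) -> list[str]:
    """Build the packet-trace setup sequence listed on this slide.

    needs_enable=False models the note that 'debug platform packet-trace enable'
    is not required on devices running 16.X and later.
    """
    commands = []
    if needs_enable:
        commands.append("debug platform packet-trace enable")
    commands += [
        f"debug platform packet-trace packet {packets} circular fia-trace data-size 2048",
        "debug platform packet-trace copy packet both L3 size 64",
        f"debug platform condition ipv4 access-list {acl} both",
        "debug platform condition start",
    ]
    return commands


# Print the sequence for ACL 101 on a pre-16.X device.
for command in packet_trace_commands("101"):
    print(command)
```

The stop, review, and clear commands are left out on purpose, since they are run interactively once the traffic of interest has been seen.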

Troubleshooting Interface Discovery

Match smart probes from the hub to the branch.

ISR4K_MC_Branch2# show ip access-lists SMP
Extended IP access list SMP
    10 permit udp host 10.10.100.254 eq 18000 host 10.10.3.254 eq 19000
ISR4K_MC_Branch2# debug platform packet-trace packet 8192 fia-trace
ISR4K_MC_Branch2# debug platform packet-trace copy packet input l2 size 2048
ISR4K_MC_Branch2# debug platform condition ipv4 access-list SMP both
ISR4K_MC_Branch2# debug platform condition start

ISR4K_MC_Branch2#show platform packet-trace summary
Pkt   Input   Output   State   Reason
0     Tu10    Tu10     DROP    360 (PfRv3ProbeErr)
1     Tu10    Tu10     DROP    360 (PfRv3ProbeErr)
2     Tu10    Tu10     DROP    360 (PfRv3ProbeErr)
...

The summary shows the captured packets are dropped! Let's take a look at one of these...
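To make the match condition explicit, here is a small Python model of what the single SMP access-list entry selects. It is an illustration only: the `UdpFlow` type and `matches_smp_acl` helper are hypothetical names, and the model covers just this one `permit udp host ... eq ... host ... eq ...` entry, not general ACL semantics.

```python
from dataclasses import dataclass

UDP_PROTOCOL = 17  # IP protocol number for UDP


@dataclass(frozen=True)
class UdpFlow:
    src: str
    sport: int
    dst: str
    dport: int


# Mirrors: 10 permit udp host 10.10.100.254 eq 18000 host 10.10.3.254 eq 19000
SMP_ACL_ENTRY = UdpFlow("10.10.100.254", 18000, "10.10.3.254", 19000)


def matches_smp_acl(protocol: int, flow: UdpFlow) -> bool:
    """Return True when a packet would hit the SMP access-list entry."""
    return protocol == UDP_PROTOCOL and flow == SMP_ACL_ENTRY


# A smart probe from the hub MC (10.10.100.254) to the branch site-id (10.10.3.254).
probe = UdpFlow("10.10.100.254", 18000, "10.10.3.254", 19000)
print(matches_smp_acl(UDP_PROTOCOL, probe))
```

Because the entry pins both hosts and both ports, only hub-to-branch smart probes are traced, which keeps the packet-trace overhead confined to the packets under investigation.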

Troubleshooting Interface Discovery

ISR4K_MC_Branch2#show platform packet-trace packet 0
Packet: 0           CBUG ID: 103067
Summary
  Input     : Tunnel10
  Output    : Tunnel10
  State     : DROP 360 (PfRv3ProbeErr)
  Timestamp
    Start   : 3399675002511155 ns (05/05/2017 14:50:40.181249 UTC)
    Stop    : 3399675002724511 ns (05/05/2017 14:50:40.181462 UTC)
Path Trace
  Feature: IPV4
    Input       : Tunnel10
    Output      : <unknown>
    Source      : 10.10.100.254
    Destination : 10.10.3.254
    Protocol    : 17 (UDP)
      SrcPort   : 18000
      DstPort   : 19000
  <OUTPUT OMITTED>
  Feature: FIA_TRACE
    Input       : Tunnel10
    Output      : <unknown>
    Entry       : 0x11087ddc - IPV4_INPUT_CENT_SMP_PROCESS_EXT
    Lapsed time : 598800 ns

Drop code matches previous output. This is a smart probe from the hub to our site-id. Why is it getting dropped? Let's look at the datapath counters!

Troubleshooting Interface Discovery

ISR4K_MC_Branch2#show platform hardware qfp active feature pfrv3 datapath global | sec Smart Probe

The command is run twice and the two outputs are compared:

Smart Probe counter            First run    Second run
channel disc                   1942         1942
channel reach                  342          342
channel unreach                1461         1461
channel force unreach          26           26
channel inital->reach          1170         1170
channel all unreach            131          131
interface disc                 144          144
interface color change         0            0
transit smart probe            0            0
tag change                     0            0
drop no uidb_sb                0            0
drop pkt init                  0            0
drop err site-id               0            0
drop no channel disc           325105       325105
drop no interface disc         0            0
drop no color change           0            0
drop transit smart probe       11908        12024
drop invalid TTL probe         0            0
drop tag conflict              0            0
drop no next-hop               3658         3658
drop invalid src site-id       0            0
drop on recycle                0            0
drop wrong output intf         1            1
drop throttled punt            302          302

Transit probe drop counter is incrementing! Why does the branch think these frames are transit probes?
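Spotting the one counter that moved between two long outputs is easier with a diff. The sketch below assumes each counter has been saved as a `name: value` line; the function names `parse_counters` and `incrementing` are hypothetical, and only three of the counters from the slide are used as sample input.

```python
import re


def parse_counters(text: str) -> dict[str, int]:
    """Parse 'name: value' lines (as in the Smart Probe section) into a dict."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


def incrementing(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return the delta for every counter whose value changed between snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


first_run = """
  drop no channel disc:    325105
  drop transit smart probe:11908
  drop no next-hop:        3658
"""
second_run = first_run.replace("11908", "12024")

print(incrementing(parse_counters(first_run), parse_counters(second_run)))
# → {'drop transit smart probe': 116}
```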

Troubleshooting Interface Discovery

ISR4K_MC_Branch2#show domain iwan master site-prefix
  Change will be published between 5-60 seconds
  Next Publish 01:53:47 later
  Prefix DB Origin: 10.10.3.254
  Last publish Status :
  Total publish errors : 1
  Total learned prefix discards: 0
  Prefix Flag: S-From SAF; L-Learned; T-Top Level; C-Configured; M-shared

Site-id          Site-prefix        Last Updated    DC Bitmap   Flag
--------------------------------------------------------------------------------
10.10.2.254      10.10.2.254/32     00:35:16 ago    0x0         S
10.10.100.254    10.10.3.254/32     00:11:39 ago    0x1         S,C,M
10.10.4.254      10.10.4.254/32     00:35:16 ago    0x0         S
10.10.100.254    10.10.100.0/28     00:11:39 ago    0x1         S,C,M
...

After checking the site-prefix database, the hub MC is advertising that it owns the branch's site-id!
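The conflict in this table can also be found mechanically: a /32 site-prefix that equals one site's site-id but is advertised by a different site. The Python sketch below encodes that single check; `find_site_id_conflicts` is a hypothetical helper, the rows are typed in by hand from the output, and treating only exact host-route matches as conflicts is an assumption made for the illustration, not a statement about how PfRv3 validates prefixes.

```python
import ipaddress


def find_site_id_conflicts(entries, local_site_id):
    """Return (advertising_site, prefix, claimed_site_id) tuples.

    entries: (site_id, site_prefix) rows from the site-prefix database.
    local_site_id: the 'Prefix DB Origin' of the router where the table was read.
    """
    site_ids = sorted({site_id for site_id, _ in entries} | {local_site_id})
    conflicts = []
    for owner, prefix in entries:
        network = ipaddress.ip_network(prefix)
        if network.prefixlen != network.max_prefixlen:
            continue  # only host routes can collide with a site-id
        for site_id in site_ids:
            if site_id != owner and network.network_address == ipaddress.ip_address(site_id):
                conflicts.append((owner, prefix, site_id))
    return conflicts


entries = [
    ("10.10.2.254", "10.10.2.254/32"),
    ("10.10.100.254", "10.10.3.254/32"),
    ("10.10.4.254", "10.10.4.254/32"),
    ("10.10.100.254", "10.10.100.0/28"),
]

print(find_site_id_conflicts(entries, local_site_id="10.10.3.254"))
# → [('10.10.100.254', '10.10.3.254/32', '10.10.3.254')]
```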

Troubleshooting Interface Discovery

After checking the site-prefix database, the hub MC is advertising that it owns the branch's site-id!

DC1_HUB_MC#show run | sec domain
no ip domain lookup
ip domain name IWAN_CL
domain iwan
 vrf default
  border
   source-interface Loopback0
   master local
  master hub
   source-interface Loopback0
   site-prefixes prefix-list SITE-PREFIXES
...

DC1 BR#show ip prefix-list SITE-PREFIXES
ip prefix-list SITE-PREFIXES: 8 entries
<OUTPUT OMITTED>
   seq 45 permit 10.10.3.254/32

A site-prefix update was misconfigured! A server IP that lives at the DC should be 10.10.30.254.

Troubleshooting Interface Discovery

ISR4K_MC_Branch2#show domain iwan master status
<OUTPUT OMITTED>
  Minimum Requirement: Met
  Borders:
    IP address: 10.10.3.254
    Version: 2
    Connection status: CONNECTED (Last Updated 3w2d ago )
    Interfaces configured:
      Name: Tunnel10 type: external Service Provider: Status: UP Zero-SLA: NO Path of Last Resort: Disabled
          Number of default Channels: 1
          Path-id list: 0:1 1:1
      Name: Tunnel20 type: external Service Provider: INET Status: UP Zero-SLA: NO Path of Last Resort: Disabled
          Number of default Channels: 1
          Path-id list: 1:2 0:2
    Tunnel if: Tunnel0

The PfRv3 interfaces are now discovered with the correct parameters. The branch can now begin building TCs and optimizing traffic!

PfRv3 Operational State Minimum Requirement

What is the Minimum Requirement for PfRv3?

Simply defined, it is the information that each PfRv3 site needs in order to create channels and begin controlling traffic.

- The hub MC is the origin of the domain policy.
- A BR requires that a TCP control channel is established with the MC.
- 5 subservice types are used by PfRv3: Globals, Capability, PMI, Policy, Site-prefix.

PfRv3 Subservice Details - What are Globals [5]?

Globals are published by the hub MC and provide important operating parameters to the domain, including:
- PfRv3 TCP port (17749 by default)
- Collector address
- Route control [on or off]
- Minimum mask length
- Syslog address
- Timer values such as the Unreachable timer
- Byte loss / packet loss thresholds
- Load sharing configuration
- POP preference

PfRv3 Subservice Details - What is Capability [4]?

- Published by MCs to inform other sites about our capabilities
- Helps ensure the remote site is compatible with the local site

R4-Site1-MCBR#show domain iwan master site-capability device-capb
-----------------------------------------------------------
Capability      Major   Minor
-----------------------------------------------------------
Domain          2       0
-----------------------------------------------------------
Zero-SLA        1       0
-----------------------------------------------------------
Mul-Hop         1       0
-----------------------------------------------------------

PfRv3 Subservice Details - What is PMI [3]?

- Published by the hub MC and used by borders
- Defines key/non-key fields for data-plane performance monitor instances
- Allows borders to learn traffic classes and dynamic site prefixes
- Used to account for the performance of SMPs and data traffic on a channel

R4-Site1-MCBR#show domain iwan border pmi | inc PMI
****CENT PMI INFORMATION****
PMI[Ingress-per-DSCP]-FLOW MONITOR[MON-Ingress-per-DSCP-0-48-232]
PMI[Ingress-per-DSCP-quick ]-FLOW MONITOR[MON-Ingress-per-DSCP-quick -0-48-233]
PMI[Egress-aggregate]-FLOW MONITOR[MON-Egress-aggregate-0-48-234]
PMI[Egress-prefix-learn]-FLOW MONITOR[MON-Egress-prefix-learn-0-48-235]
R4-Site1-MCBR#

PfRv3 Subservice Details - What is Policy [2]?

- Published by the hub MC and used by Transit or Branch MCs
- User-configured policies
- Defines what traffic will be controlled by PfR
- Used to configure path preference
- Sets the loss, delay and jitter thresholds

R4-Site1-MCBR#show domain iwan master policy
-----------------------------------------------------------
  class ef sequence 10
    path-preference mpls fallback inet
    class type: Dscp Based
      match dscp ef policy custom
        priority 20 packet-loss-rate threshold 1.0 percent
        priority 10 one-way-delay threshold 50 msec
        priority 20 byte-loss-rate threshold 1.0 percent
  class af21 sequence 20
    path-preference inet fallback mpls
    class type: Dscp Based
      match dscp af21 policy custom
        priority 10 packet-loss-rate threshold 2.0 percent
        priority 10 byte-loss-rate threshold 2.0 percent
  class be sequence 30
    path-preference inet fallback mpls
    class type: Dscp Based
      match dscp default policy best-effort
        priority 2 packet-loss-rate threshold 10.0 percent
        priority 1 one-way-delay threshold 500 msec
        priority 2 byte-loss-rate threshold 10.0 percent
-----------------------------------------------------------

PfRv3 Subservice Details - What is Site-Prefix [1]?

Published by MCs to advertise local prefixes and is required for route control.

R4-Site1-MCBR#show domain iwan master site-prefix
  Change will be published between 5-60 seconds
  Next Publish 00:00:38 later
  Prefix DB Origin: 10.10.1.254
  Last publish Status : Peering Success
  Total publish errors : 0
  Total learned prefix discards: 0
  Prefix Flag: S-From SAF; L-Learned; T-Top Level; C-Configured; M-shared

Site-id            Site-prefix         Last Updated    DC Bitmap   Flag
--------------------------------------------------------------------------------
10.10.1.254        10.10.1.253/32      00:00:11 ago    0x0         L
10.10.1.254        10.10.1.254/32      00:00:11 ago    0x0         L
10.10.3.254        10.10.3.254/32      01:34:46 ago    0x0         S
<output omitted>
10.10.100.254      10.10.100.254/32    00:22:50 ago    0x1         S
10.10.200.254      10.10.200.254/32    01:19:36 ago    0x4         S
10.10.100.254      10.10.200.0/24      00:22:50 ago    0x5         S,C,M
10.10.200.254      10.10.200.0/24      00:22:50 ago    0x5         S,C,M
255.255.255.255    *10.33.33.0/24      06:02:09 ago    0x0         S,T
--------------------------------------------------------------------------------

Site-id column: Site ID == MC's PfR loopback.
Site-prefix column: what prefix is located at the site.

How are PfR's Requirements Advertised?

PfRv3 utilizes a service routing DB and EIGRP SAF (Service Advertisement Framework) to supply operational requirements to the IWAN domain.

[Diagram: two routers, each with a PFR Process, Service Routing, and EIGRP SAF; the EIGRP SAF neighbors exchange Service Advertisements.]

PfR, Service Routing DB and EIGRP SAF Topology

The domain role of the router determines what it is subscribed to and publishes.

R1-DC1-MC#show domain iwan master peering
Peering state: Enabled
Origin: Loopback0
Peering type: Listener

Subscribed service:     (What are we interested in from other sites?)
  cent-policy (2) :
  site-prefix (1) :
    Last Notification Info: 00:02:30 ago, Size: 161, Compressed size: 157, Status: Peering Success, Count: 3
  Capability (4) :
    Last Notification Info: 00:00:27 ago, Size: 366, Compressed size: 226, Status: Peering Success, Count: 5
  globals (5) :
  pmi (3) :

Published service:      (What are we advertising?)
  site-prefix (1) :
    Last Publish Info: 00:05:59 ago, Size: 354, Compressed size: 177, Status: Peering Success
  cent-policy (2) :
    Last Publish Info: 00:05:58 ago, Size: 2187, Compressed size: 443, Status: Peering Success
  pmi (3) :
    Last Publish Info: 00:05:55 ago, Size: 2452, Compressed size: 509, Status: Peering Success
  Capability (4) :
    Last Publish Info: 00:03:00 ago, Size: 425, Compressed size: 213, Status: Peering Success
  globals (5) :
    Last Publish Info: 00:06:09 ago, Size: 943, Compressed size: 404, Status: Peering Success

Service Publishing and Subscription By Device Type

Device        Globals      Capability            PMI           Policy        Site-Prefix
Hub MC        Publish      Publish, Subscribe    Publish       Publish       Publish, Subscribe
Transit MC    Subscribe    Publish, Subscribe    ----NA----    Subscribe     Publish, Subscribe
Branch MC     Subscribe    Publish, Subscribe    ----NA----    Subscribe     Publish, Subscribe
Border        Subscribe    Subscribe             Subscribe     ----NA----    Subscribe

How are PfR's Requirements Advertised?

- PfRv3 maintains a publish and subscribe relationship with the service routing database.
- Updates are encoded in XML and compressed by PfRv3 for transport across the domain through EIGRP SAF adjacencies.
- EIGRP SAF adjacencies are established between PfRv3 routers to send and receive the required service updates.

[Diagram: the PFR Process passes Globals, Capability, PMI, Policy, and Site Prefix as XML to Service Routing, which hands them to EIGRP SAF.]

What is in the Service Routing DB?

The Service Routing Database contains the published information from the entire domain.

R1-DC1-MC#show service-routing database
Service-Routing Database
Service ID (Service:Subservice:Instance)          Trust       Domain   Owner   Size
------------------------------------------------  ---------   ------   -----   -----
103:1:00000000-0000-0000-0000-00000A0A01FE        Learned     59501    16      162
103:1:00000000-0000-0000-0000-00000A0A03FE        Learned     59501    16      157
103:1:00000000-0000-0000-0000-00000A0A64FE        Connected   59501    15      197
103:1:00000000-0000-0000-0000-00000A0AC8FE        Learned     59501    16      171
103:2:00000000-0000-0000-0000-00000A0A64FE        Connected   59501    15      463
103:3:00000000-0000-0000-0000-00000A0A64FE        Connected   59501    15      529
103:4:00000000-0000-0000-0000-00000A0A01FE        Learned     59501    16      252
103:4:00000000-0000-0000-0000-00000A0A03FE        Learned     59501    16      249
103:4:00000000-0000-0000-0000-00000A0A64FE        Connected   59501    15      233
103:4:00000000-0000-0000-0000-00000A0AC8FE        Learned     59501    16      241
103:5:00000000-0000-0000-0000-00000A0A64FE        Connected   59501    15      424

Service Routing DB Contents

PfRv3 encodes the service in XML and publishes it to the service routing DB.

Example of a published service from the hub MC (owner handle = 15):

R1-DC1-MC#show service-routing database 103:3:0.0.0.A0A64FE
Service-Routing Database
Service ID (Service:Subservice:Instance)          Trust       Domain   Owner   Size
------------------------------------------------  ---------   ------   -----   -----
103:3:00000000-0000-0000-0000-00000A0A64FE        Connected   59501    15      529
  Owner: CENT Peering hub default
  Sequence: 5
  AFI: *
  VRF: Default, Tableid: 0
  Subscription Handles: 16
  Reachability:
    IPv4: 0.0.0.0:0, protocol 0

Service Routing DB Contents

PfRv3 encodes the service in XML and publishes it to the service routing DB.

Example of a learned service from another PfR site (owner handle = 16):

R1-DC1-MC#show service-routing database 103:4:0.0.0.A0A01FE
Service-Routing Database
Service ID (Service:Subservice:Instance)          Trust       Domain   Owner   Size
------------------------------------------------  ---------   ------   -----   -----
103:4:00000000-0000-0000-0000-00000A0A01FE        Learned     59501    16      175
  Owner: SAF Forwarder AS(59501) SFv4 VRF()
  Sequence: 4
  AFI: IPv4
  VRF: Default, Tableid: 0
  Subscription Handles: 13
  Reachability:
    IPv4: 0.0.0.0:0, protocol

Service Routing DB and EIGRP SAF Interface

Client Handle is used to identify the subscriptions by PfR.

R1-DC1-MC#show eigrp service-family ipv4 subscriptions detail
EIGRP-SFv4 VR(#AUTOCFG#) Subscriptions for AS(59501)/ID

Subscription   Client Handle   Client Name                          Subscription Context   Service:Subservice:Instance                         Notifications
11             15              CENT Peering hub default             0x00007F45FEBC8FE0     103:3:FFFFFFFF.FFFFFFFF.FFFFFFFF.FFFFFFFF           0
12             15              CENT Peering hub default             0x00007F45FEBC8FE0     103:5:FFFFFFFF.FFFFFFFF.FFFFFFFF.FFFFFFFF           0
13             15              CENT Peering hub default             0x00007F45FEBC8FE0     103:4:FFFFFFFF.FFFFFFFF.FFFFFFFF.FFFFFFFF           5
14             15              CENT Peering hub default             0x00007F45FEBC8FE0     103:1:FFFFFFFF.FFFFFFFF.FFFFFFFF.FFFFFFFF           3
15             15              CENT Peering hub default             0x00007F45FEBC8FE0     103:2:FFFFFFFF.FFFFFFFF.FFFFFFFF.FFFFFFFF           0
16             16              SAF Forwarder AS(59501) SFv4 VRF()   0x00007F462ADBF188     65535:65535:FFFFFFFF.FFFFFFFF.FFFFFFFF.FFFFFFFF     7
R1-DC1-MC#

EIGRP SAF Topology Table Contents

R1-DC1-MC#show eigrp service-family ipv4 topology
EIGRP-SFv4 VR(#AUTOCFG#) Topology Table for AS(59501)/ID
Codes: P - Passive, A - Active, U - Update, Q - Query, R - Reply,
       r - reply Status, s - sia Status

P 103:2:0.0.0.A0A64FE, 1 successors, FD is 327745536
        via Connected, Null0
P 103:5:0.0.0.A0A64FE, 1 successors, FD is 327745536
        via Connected, Null0
P 103:4:0.0.0.A0A01FE, 1 successors, FD is 327843840
        via 10.10.1.254 (327843840/327745536), Loopback0
P 103:4:0.0.0.A0A03FE, 1 successors, FD is 327843840
        via 10.10.3.254 (327843840/327745536), Loopback0
P 103:4:0.0.0.A0AC8FE, 1 successors, FD is 327843840
        via 10.10.200.254 (327843840/327745536), Loopback0
P 103:4:0.0.0.A0A64FE, 1 successors, FD is 327745536
        via Connected, Null0
P 103:3:0.0.0.A0A64FE, 1 successors, FD is 327745536
        via Connected, Null0
P 103:1:0.0.0.A0A01FE, 1 successors, FD is 327843840
        via 10.10.1.254 (327843840/327745536), Loopback0
P 103:1:0.0.0.A0A03FE, 1 successors, FD is 327843840
        via 10.10.3.254 (327843840/327745536), Loopback0
P 103:1:0.0.0.A0AC8FE, 1 successors, FD is 327843840
        via 10.10.200.254 (327843840/327745536), Loopback0
P 103:1:0.0.0.A0A64FE, 1 successors, FD is 327745536
        via Connected, Null0

EIGRP SAF Topology Table Contents

Examining a Specific Topology Entry:

R1-DC1-MC#show eigrp service-family ipv4 topology 103:4:0.0.0.A0A01FE
EIGRP-SFv4 VR(#AUTOCFG#) Topology Entry for AS(59501)/ID for <>
  State is Passive, Query origin flag is 1, 1 Successor(s), FD is 327843840
  Length:[251], Sequence No:[4]
  Originating Address: 0.0.0.0 Port: 0, Protocol: 0
  Descriptor Blocks:
  10.10.1.254 (Loopback0), from 10.10.1.254, Send flag is 0x0
      Composite metric is (327843840/327745536), service is Internal
      Vector metric:
        Minimum bandwidth is 8000000 Kbit
        Total delay is 5001250000 picoseconds
        Reliability is 255/255
        Load is 1/255
        Minimum MTU is 1514
        Hop count is 1
        Originating router is 10.10.1.254
  10.10.200.254 (Loopback0), from 10.10.200.254, Send flag is 0x0
      Composite metric is (328007680/327925760), service is Internal
      Vector metric:
        Minimum bandwidth is 8000000 Kbit
        Total delay is 5003750000 picoseconds
        Reliability is 255/255
        Load is 1/255
        Minimum MTU is 1514
        Hop count is 3
        Originating router is 10.10.1.254

Subservices Published and Subscribed

R1-DC1-MC#show domain iwan master peering
Peering state: Enabled
Origin: Loopback0
Peering type: Listener

Subscribed service:
  cent-policy (2) :
  site-prefix (1) :
    Last Notification Info: 00:38:15 ago, Size: 226, Compressed size: 171, Status: Peering Success, Count: 9
  Capability (4) :
    Last Notification Info: 00:13:16 ago, Size: 461, Compressed size: 250, Status: Peering Success, Count: 10
  globals (5) :
  pmi (3) :

Published service:
  site-prefix (1) :
    Last Publish Info: 01:41:31 ago, Size: 354, Compressed size: 177, Status: Peering Success
  cent-policy (2) :
    Last Publish Info: 01:41:35 ago, Size: 2187, Compressed size: 443, Status: Peering Success
  pmi (3) :
    Last Publish Info: 01:41:32 ago, Size: 2452, Compressed size: 509, Status: Peering Success
  Capability (4) :
    Last Publish Info: 01:39:42 ago, Size: 425, Compressed size: 213, Status: Peering Success
  globals (5) :
    Last Publish Info: 01:41:45 ago, Size: 943, Compressed size: 404, Status: Peering Success

Subservices Published and Subscribed

R4-Site1-MCBR#show domain iwan master peering
Peering state: Enabled
Origin: Loopback0(10.10.1.254)
Peering type: Listener, Peer(With 10.10.100.254)

Subscribed service:
  cent-policy (2) :
    Last Notification Info: 04:43:51 ago, Size: 2187, Compressed size: 463, Status: Peering Success, Count: 1
  site-prefix (1) :
    Last Notification Info: 00:01:18 ago, Size: 226, Compressed size: 171, Status: Peering Success, Count: 30
  Capability (4) :
    Last Notification Info: 01:02:45 ago, Size: 425, Compressed size: 233, Status: Peering Success, Count: 27
  globals (5) :
    Last Notification Info: 03:41:55 ago, Size: 943, Compressed size: 424, Status: Peering Success, Count: 3

Published service:
  site-prefix (1) :
    Last Publish Info: 00:41:04 ago, Size: 202, Compressed size: 142, Status: Peering Success
  Capability (4) :
    Last Publish Info: 01:39:50 ago, Size: 492, Compressed size: 231, Status: Peering Success

Subservices Published and Subscribed

R3-DC1-BR-INT#show domain iwan border peering
Peering state: Enabled
Origin: Loopback0(10.10.100.252)
Peering type: Peer(With 10.10.100.254)

Subscribed service:
  pmi (3) :
    Last Notification Info: 05:05:33 ago, Size: 2452, Compressed size: 529, Status: Peering Success, Count: 1
  site-prefix (1) :
    Last Notification Info: 00:02:14 ago, Size: 226, Compressed size: 171, Status: Peering Success, Count: 18
  globals (5) :
    Last Notification Info: 05:05:44 ago, Size: 943, Compressed size: 424, Status: Peering Success, Count: 1
  Capability (4) :
    Last Notification Info: 00:00:09 ago, Size: 427, Compressed size: 241, Status: Peering Success, Count: 18

Published service: N/A

Service Routing Entries Explained

R4-Site1-MCBR#show service-routing database 103:1:0.0.0.A0A64FE
Service-Routing Database
Service ID (Service:Subservice:Instance)          Trust       Domain   Owner   Size
------------------------------------------------  ---------   ------   -----   -----
103:1:00000000-0000-0000-0000-00000A0A64FE        Learned     59501    10      197
  Owner: SAF Forwarder AS(59501) SFv4 VRF()
  Sequence: 8
  AFI: IPv4
  VRF: Default, Tableid: 0
  Subscription Handles: 10, 14
  Reachability:
    IPv4: 0.0.0.0:0, protocol 0

- Entries are composed of Service : Subservice : Instance.
- Domain 59501 is the EIGRP SAF Process-ID.
- Subscription handles show which service routing client is interested.

Service Routing Entries Explained

Service (the first field of 103:1:00000000-0000-0000-0000-00000A0A64FE): for PfRv3 this will always be 103.

Service Routing Entries Explained

Subservice (the second field of 103:1:00000000-0000-0000-0000-00000A0A64FE):
5 = Globals
4 = Capability
3 = PMI
2 = Policy
1 = Site-prefix

Service Routing Entries Explained

Instance (the third field of 103:1:00000000-0000-0000-0000-00000A0A64FE): contains the PFR loopback IP of the originator in hex.

0A0A64FE == 10.10.100.254
0A = 10
0A = 10
64 = 100
FE = 254
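The byte-by-byte conversion above can be checked with a few lines of Python. This is a sketch under the assumption, taken from the slide, that the last 32 bits of the Instance field carry the originator's loopback address; the function name `instance_to_ipv4` is hypothetical.

```python
import ipaddress


def instance_to_ipv4(instance: str) -> str:
    """Decode the last 8 hex digits of a Service ID instance into dotted IPv4."""
    hex_digits = instance.replace("-", "")
    return str(ipaddress.IPv4Address(int(hex_digits[-8:], 16)))


print(instance_to_ipv4("00000000-0000-0000-0000-00000A0A64FE"))  # → 10.10.100.254
```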

EIGRP SAF Topology Entries

103:3 = PMI Advertisement
103:1 = Site Prefix Advertisement
0A0A64FE == 10.10.100.254 (0A = 10, 0A = 10, 64 = 100, FE = 254)

R4-Site1-MCBR#show eigrp service-family ipv4 topology
EIGRP-SFv4 VR(#AUTOCFG#) Topology Table for AS(59501)/ID(10.10.1.254)
Codes: P - Passive, A - Active, U - Update, Q - Query, R - Reply,
       r - reply Status, s - sia Status

P 103:2:0.0.0.A0A64FE, 1 successors, FD is 327843840
        via 10.10.100.254 (327843840/327745536), Loopback0
P 103:4:0.0.0.A0A64FE, 1 successors, FD is 327843840
        via 10.10.100.254 (327843840/327745536), Loopback0
P 103:5:0.0.0.A0A64FE, 1 successors, FD is 327843840
        via 10.10.100.254 (327843840/327745536), Loopback0
P 103:4:0.0.0.A0A03FE, 1 successors, FD is 327925760
        via 10.10.100.254 (327925760/327843840), Loopback0
P 103:4:0.0.0.A0AC8FE, 1 successors, FD is 327925760
        via 10.10.100.254 (327925760/327843840), Loopback0
P 103:4:0.0.0.A0A01FE, 1 successors, FD is 327745536
        via Connected, Null0
P 103:3:0.0.0.A0A64FE, 1 successors, FD is 327843840
        via 10.10.100.254 (327843840/327745536), Loopback0
P 103:1:0.0.0.A0A01FE, 1 successors, FD is 327745536
        via Connected, Null0
P 103:1:0.0.0.A0A03FE, 1 successors, FD is 327925760
        via 10.10.100.254 (327925760/327843840), Loopback0
P 103:1:0.0.0.A0AC8FE, 1 successors, FD is 327925760
        via 10.10.100.254 (327925760/327843840), Loopback0
P 103:1:0.0.0.A0A64FE, 1 successors, FD is 327843840
        via 10.10.100.254 (327843840/327745536), Loopback0

Entry 103:3:0.0.0.A0A64FE: this is the hub MC's PMI advertisement!
Entry 103:1:0.0.0.A0A64FE: this is the hub MC's Site-Prefix advertisement!

EIGRP SAF Topology Entries 103:2 = Policy Advertisement 0A0A64FE == 10.10.100.254 0A = 10 0A = 10 64 = 100 FE = 254 103:3 = PMI Advertisement 0A0A64FE == 10.10.100.254 0A = 10 0A = 10 64 = 100 FE = 254 R4-Site1-MCBR#show eigrp service-family ipv4 topology EIGRP-SFv4 VR(#AUTOCFG#) Topology Table for AS(59501)/ID(10.10.1.254) Codes: P - Passive, A - Active, U - Update, Q - Query, R - This is the s Policy advertisement! Reply, r - reply Status, s - sia Status P 103:2:0.0.0.A0A64FE, 1 successors, FD is 327843840 via 10.10.100.254 (327843840/327745536), Loopback0 P 103:4:0.0.0.A0A64FE, 1 successors, FD is 327843840 via 10.10.100.254 (327843840/327745536), Loopback0 P 103:5:0.0.0.A0A64FE, 1 successors, FD is 327843840 via 10.10.100.254 (327843840/327745536), Loopback0 P 103:4:0.0.0.A0A03FE, 1 successors, FD is 327925760 via 10.10.100.254 (327925760/327843840), Loopback0 P 103:4:0.0.0.A0AC8FE, 1 successors, FD is 327925760 via 10.10.100.254 (327925760/327843840), Loopback0 P 103:4:0.0.0.A0A01FE, 1 successors, FD is 327745536 via Connected, Null0 P 103:3:0.0.0.A0A64FE, 1 successors, FD is 327843840 via 10.10.100.254 (327843840/327745536), Loopback0 P 103:1:0.0.0.A0A01FE, 1 successors, FD is 327745536 via Connected, Null0 P 103:1:0.0.0.A0A03FE, 1 successors, FD is 327925760 This is via the 10.10.100.254 s PMI (327925760/327843840), advertisement! Loopback0 P 103:1:0.0.0.A0AC8FE, 1 successors, FD is 327925760 via 10.10.100.254 (327925760/327843840), Loopback0 P 103:1:0.0.0.A0A64FE, 1 successors, FD is 327843840 via 10.10.100.254 (327843840/327745536), Loopback0 84

EIGRP SAF Topology Entries 103:4 = Capability Advertisement 0A0A64FE == 10.10.100.254 0A = 10 0A = 10 64 = 100 FE = 254 103:5 = Globals Advertisement 0A0A64FE == 10.10.100.254 0A = 10 0A = 10 64 = 100 FE = 254 R4-Site1-MCBR#show eigrp service-family ipv4 topology EIGRP-SFv4 VR(#AUTOCFG#) Topology Table for AS(59501)/ID(10.10.1.254) Codes: P - Passive, A - Active, U - Update, Q - Query, R - Reply, r - reply Status, s - sia Status This is the s Capability advertisement! P 103:2:0.0.0.A0A64FE, 1 successors, FD is 327843840 via 10.10.100.254 (327843840/327745536), Loopback0 P 103:4:0.0.0.A0A64FE, 1 successors, FD is 327843840 via 10.10.100.254 (327843840/327745536), Loopback0 P 103:5:0.0.0.A0A64FE, 1 successors, FD is 327843840 via 10.10.100.254 (327843840/327745536), Loopback0 P 103:4:0.0.0.A0A03FE, 1 successors, FD is 327925760 via 10.10.100.254 (327925760/327843840), Loopback0 P 103:4:0.0.0.A0AC8FE, 1 successors, FD is 327925760 via 10.10.100.254 (327925760/327843840), Loopback0 P 103:4:0.0.0.A0A01FE, 1 successors, FD is 327745536 via Connected, Null0 P 103:3:0.0.0.A0A64FE, 1 successors, FD is 327843840 via 10.10.100.254 (327843840/327745536), Loopback0 P 103:1:0.0.0.A0A01FE, 1 successors, FD is 327745536 via Connected, Null0 P 103:1:0.0.0.A0A03FE, 1 successors, FD is 327925760 via 10.10.100.254 (327925760/327843840), Loopback0 P 103:1:0.0.0.A0AC8FE, 1 successors, FD is 327925760 via 10.10.100.254 (327925760/327843840), Loopback0 P 103:1:0.0.0.A0A64FE, 1 successors, FD is 327843840 via 10.10.100.254 (327843840/327745536), Loopback0 This is the s Globals advertisement! 85
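
When reading a long SAF topology table, it can help to label each entry mechanically. The sketch below splits a service ID into its three parts, maps the subservice number to the advertisement names used on these slides, and decodes the originator loopback. The function name `describe_entry` and the `SUBSERVICES` table are illustrative assumptions built only from the 103:1 to 103:5 values shown here.

```python
import ipaddress

# Subservice numbers as listed on the slides for service 103.
SUBSERVICES = {
    1: "site-prefix",
    2: "policy",
    3: "pmi",
    4: "capability",
    5: "globals",
}

def describe_entry(service_id: str):
    """Return (service, advertisement type, originator IP) for a SAF service ID."""
    service, subservice, instance = service_id.split(":", 2)
    # The last dotted group of the instance holds the loopback in hex.
    tail = instance.split(".")[-1]
    origin = ipaddress.IPv4Address(int(tail, 16) & 0xFFFFFFFF)
    name = SUBSERVICES.get(int(subservice), "unknown")
    return int(service), name, str(origin)

for entry in ("103:2:0.0.0.A0A64FE", "103:1:0.0.0.A0A01FE"):
    print(entry, "->", describe_entry(entry))
```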

A Note on SAF: Only One EIGRP Topology
Example: the BR does not subscribe to Policy (103:2). The policy is, however, present in EIGRP SAF and the service-routing database.
  R3-DC1-BR-INT#show domain iwan border peering
  Peering state: Enabled
  Origin: Loopback0(10.10.100.252)
  Peering type: Peer(With 10.10.100.254)
  Subscribed service:                                       ! No Policy!
    pmi (3) :
      Last Notification Info: 05:05:33 ago, Size: 2452, Compressed size: 529, Status: Peering Success, Count: 1
    site-prefix (1) :
      Last Notification Info: 00:02:14 ago, Size: 226, Compressed size: 171, Status: Peering Success, Count: 18
    globals (5) :
      Last Notification Info: 05:05:44 ago, Size: 943, Compressed size: 424, Status: Peering Success, Count: 1
    Capability (4) :
      Last Notification Info: 00:00:09 ago, Size: 427, Compressed size: 241, Status: Peering Success, Count: 18
  Published service: N/A
  R3-DC1-BR-INT#show eigrp service-family ipv4 topology
  EIGRP-SFv4 VR(#AUTOCFG#) Topology Table for AS(59501)/ID(10.10.100.252)
  Codes: P - Passive, A - Active, U - Update, Q - Query, R - Reply,
         r - reply Status, s - sia Status
  P 103:2:0.0.0.A0A64FE, 1 successors, FD is 327843840     ! 103:2 == Policy!
          via 10.10.100.254 (327843840/327745536), Loopback0
  P 103:5:0.0.0.A0A64FE, 1 successors, FD is 327843840
          via 10.10.100.254 (327843840/327745536), Loopback0
  P 103:4:0.0.0.A0A01FE, 1 successors, FD is 327925760
          via 10.10.100.254 (327925760/327843840), Loopback0
  P 103:4:0.0.0.A0A03FE, 1 successors, FD is 327925760
          via 10.10.100.254 (327925760/327843840), Loopback0
  <output omitted>

Visualizing EIGRP SAF Peering
[Diagram: SAF peering topology. Labels: DCI, Transit (10.10.200.254), Tunnel10, Tunnel20, INET, (10.10.1.254), Branch BR (10.10.1.253), (10.10.3.254)]

Process Startup to Minimum Requirement Met
Hub MC (Operational):
  - Set up local DBs (capability, site-prefix, route-control)
  - Open TCP 17749 socket
  - Open UDP 9995/9996/9997 for the collector and create templates
  - Set up EIGRP SAF peering
  - Encode and publish globals
  - Encode and publish capability
  - Encode and publish PMI
  - Encode and publish site-id and site-prefixes
  - Encode and publish PfRv3 policy
Branch MC (Min. Req.):
  - Set up local DB (peering, site-prefix, route-control)
  - Open TCP 17749 socket
  - Open UDP 9995/9996/9997 for the collector and create templates
  - Set up EIGRP SAF peering
  - Globals update received from MC
  - Policy update received from MC
  - Encode and publish capability
  - Encode and publish site-id and site-prefixes
BR (Min. Req.):
  - Set up local DB (peering, site-prefix, route-control)
  - Set up DP for probes
  - Exporter created for ODE
  - Connect TCP 17749 to MC
  - Set up EIGRP SAF peering
  - BR-to-BR auto-tunnel created
  - Globals update received from MC
  - PMI update received from MC

Identifying a Problem with Minimum Requirement
Why is minimum requirement not met?
  .May 4 04:10:25.362 EDT: %DOMAIN-5-BR_STATUS: Minimum Requirement Not Met. Details:Instance=0: VRF=default: Min Requirement Mask=9

Border Minimum Requirement: Check TCP First
PfRv3 uses a TCP socket (port 17749) between the BR(s) and their local MC for process communication. This connection must be established or minimum requirement will not be met.
  R5-Site1-BR#show domain iwan border status
  Tue Jun 13 16:19:51.114
  --------------------------------------------------------------------
  **** Border Status ****
  Instance Status: UP
  Present status last updated: 00:00:31 ago
  Loopback: Configured Loopback0 UP (10.10.1.253)
  Master: 10.10.1.254
  Master version: 0
  Connection Status with Master: DOWN (retry in 00:00:03)
  MC connection info: Socket Pending: Socket is not connected     ! TCP not connected
  Disconnected for: 00:00:31
  <output omitted>
  Minimum Requirement: Not Met
   Peering Db Absent
   PMI update: Not received
   Globals Update: Not received
   (Will attempt shut/no-shut if min requirement not meet in 1191 secs)

Troubleshooting Minimum Requirement Not Met
  R4-Site1-MCBR#show domain iwan master status
  *** Domain MC Status ***
  Master VRF: Global
  Instance Type: Branch
  Instance id: 0
  Operational status: Up
  Configured status: Up
  Loopback IP Address: 10.10.1.254
  Load Balancing:
   Operational Status: Down
  Route Control: Enabled
  Transit Site Affinity: Enabled
  Load Sharing: Enabled
  Connection Keepalive: 60 seconds
  Mitigation mode Aggressive: Disabled
  Policy threshold variance: 20
  Minimum Mask Length Internet: 24
  Minimum Mask Length Enterprise: 24
  Syslog TCA suppress timer: 180 seconds
  Traffic-Class Ageout Timer: 5 minutes
  Minimum Packet Loss Calculation Threshold: 15 packets
  Minimum Bytes Loss Calculation Threshold: 1 bytes
  Minimum Requirement: Not Met                              ! We have a problem!
   Policy update: Not received
   Globals Update: Not received
   (Will attempt shut/no-shut if min requirement not meet in 1187 secs)

Verify the Peering Status
  R4-Site1-MCBR#show domain iwan master peering
  Peering state: Enabled
  Origin: Loopback0(10.10.1.254)
  Peering type: Listener, Peer(With 10.10.100.254)
  Subscribed service:
    cent-policy (2) :                                       ! Missing Policy!
    site-prefix (1) :
      Last Notification Info: 00:00:54 ago, Size: 161, Compressed size: 157, Status: Peering Success, Count: 1
    Capability (4) :
      Last Notification Info: 00:01:06 ago, Size: 238, Compressed size: 175, Status: Peering Success, Count: 1
    globals (5) :                                           ! Missing Globals!
  Published service:
    site-prefix (1) :
      Last Publish Info: 00:00:54 ago, Size: 161, Compressed size: 137, Status: Peering Success
    Capability (4) :
      Last Publish Info: 00:01:09 ago, Size: 238, Compressed size: 155, Status: Peering Success

Do We Have a SAF Neighbor?
  R4-Site1-MCBR#show eigrp service-family ipv4 neighbors
  EIGRP-SFv4 VR(#AUTOCFG#) Service-Family Neighbors for AS(59501)
  H   Address         Interface  Hold  Uptime    SRTT  RTO   Q    Seq
                                 (sec)           (ms)        Cnt  Num
  4   10.10.100.254   Lo0        590   00:01:53  1     5000  1    0      ! Non-zero queue
  3   10.10.1.253     Lo0        591   00:01:55  1     100   0    4
  2   10.10.1.34      Gi5        592   00:01:55  2     100   0    3
  1   10.10.1.3       Gi4        588   00:01:55  3     100   0    2
  0   10.10.1.19      Gi3        496   00:01:55  4     100   0    1
  R4-Site1-MCBR#show eigrp service-family ipv4 neighbors detail
  EIGRP-SFv4 VR(#AUTOCFG#) Service-Family Neighbors for AS(59501)
  H   Address         Interface  Hold  Uptime    SRTT  RTO   Q    Seq
                                 (sec)           (ms)        Cnt  Num
  4   10.10.100.254   Lo0        586   00:01:56  1     5000  1    0      ! Non-zero queue
     Remote Static neighbor (static multihop)
     Version 23.0/4.0, Retrans: 24, Retries: 24, Waiting for Init, Waiting for Init Ack
     Topology-ids from peer - 0
     Topologies advertised to peer: base
      UPDATE seq 778 ser 0-0 Sent 116524 Init Sequenced     ! No ACK from the hub neighbor!
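
The "non-zero queue" check can be automated against captured CLI text. The following is a rough sketch, assuming the nine-column neighbor row layout shown above (H, Address, Interface, Hold, Uptime, SRTT, RTO, Q Cnt, Seq Num). The function name `stuck_neighbors` is hypothetical, and other software releases may format the table differently.

```python
def stuck_neighbors(output: str):
    """Return (address, queue_count) for SAF neighbors with a non-zero Q Cnt."""
    flagged = []
    for line in output.splitlines():
        fields = line.split()
        # Neighbor rows have 9 columns and start with the numeric handle (H);
        # header and detail lines do not match and are skipped.
        if len(fields) == 9 and fields[0].isdigit() and fields[7].isdigit():
            queue_count = int(fields[7])
            if queue_count > 0:
                flagged.append((fields[1], queue_count))
    return flagged

sample = """\
H Address Interface Hold Uptime SRTT RTO Q Seq
(sec) (ms) Cnt Num
4 10.10.100.254 Lo0 590 00:01:53 1 5000 1 0
3 10.10.1.253 Lo0 591 00:01:55 1 100 0 4
"""
print(stuck_neighbors(sample))
```

A neighbor that stays in this list across several samples is a hint that updates are queued but not acknowledged, which is the condition the detail output confirms.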

Overlay Routing Problem: No Connectivity!
  R4-Site1-MCBR#ping 10.10.100.254 source loopback0
  Type escape sequence to abort.
  Sending 5, 100-byte ICMP Echos to 10.10.100.254, timeout is 2 seconds:
  Packet sent with a source address of 10.10.1.254
  .....
  Success rate is 0 percent (0/5)
  R4-Site1-MCBR#sh ip route 10.10.100.254
  % Subnet not in table                                     ! Missing route!
The EIGRP configuration shows the cause:
  router eigrp IWAN-EIGRP
   !
   address-family ipv4 unicast autonomous-system 100
    !
    af-interface Loopback0
     passive-interface
    exit-af-interface
    !
    af-interface Tunnel10
     hello-interval 20
     hold-time 60
    exit-af-interface
    !
    topology base
     distribute-list prefix no-mc out                       ! Misconfiguration!
    exit-af-topology
    network 10.10.0.0 0.0.255.255
    network 172.17.10.0 0.0.0.255
   exit-address-family

Avoiding These Types of Problems
  - Configure all PfRv3 loopbacks as /32.
  - Don't summarize the hub/transit MC loopbacks.
  - Use a leak-map on the BRs.
Fixed configuration (summary with leak-map):
  R2-DC1-BR-#sh run | sec router eigrp
  router eigrp IWAN-EIGRP
   !
   address-family ipv4 unicast autonomous-system 100
    !
    af-interface Loopback0
     passive-interface
    exit-af-interface
    !
    af-interface Tunnel10
     hello-interval 20
     hold-time 60
     summary-address 10.0.0.0 255.0.0.0 leak-map allow-mc   ! Summary with leak-map!
    exit-af-interface
    !
    topology base
    exit-af-topology
    network 10.10.0.0 0.0.255.255
    network 172.17.10.0 0.0.0.255
   exit-address-family
  !
  ip prefix-list allow-mc seq 5 permit 10.10.100.254/32
  ip prefix-list allow-mc seq 10 permit 10.10.200.254/32
  !
  route-map allow-mc permit 10
   match ip address prefix-list allow-mc
  !
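
To see why the leak-map matters, it helps to check which MC loopbacks fall inside the summary and would therefore be hidden behind it without explicit leak entries. This is only an illustrative sketch using Python's standard `ipaddress` module; the function name `loopbacks_hidden_by_summary` is an assumption, and the addresses are the ones from the configuration above.

```python
import ipaddress

def loopbacks_hidden_by_summary(summary: str, loopbacks):
    """Return the loopback prefixes that are covered by the summary route."""
    summary_net = ipaddress.ip_network(summary)
    return [
        lb for lb in loopbacks
        if ipaddress.ip_network(lb).subnet_of(summary_net)
    ]

# summary-address 10.0.0.0 255.0.0.0 is 10.0.0.0/8
mc_loopbacks = ["10.10.100.254/32", "10.10.200.254/32"]
print(loopbacks_hidden_by_summary("10.0.0.0/8", mc_loopbacks))
```

Every prefix the function returns needs a matching permit in the leak-map prefix list, which is what the two `allow-mc` entries provide.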

Route Verification
  R4-Site1-MCBR#show ip route 10.10.100.254
  Routing entry for 10.10.100.254/32                        ! New route from the hub BR
    Known via "eigrp 100", distance 90, metric 10250880, type internal
    Redistributing via eigrp 100
    Last update from 172.17.10.1 on Tunnel10, 00:00:17 ago
    Overrides from "PfR"
    Routing Descriptor Blocks:
    * 172.17.10.1, from 172.17.10.1, 00:00:17 ago, via Tunnel10
        Route metric is 10250880, traffic share count is 1
        Total delay is 20011 microseconds, minimum bandwidth is 1000000 Kbit
        Reliability 255/255, minimum MTU 1400 bytes
        Loading 1/255, Hops 2

EIGRP SAF Neighbor Verification
  R4-Site1-MCBR#show eigrp service-family ipv4 neighbors
  EIGRP-SFv4 VR(#AUTOCFG#) Service-Family Neighbors for AS(59501)
  H   Address         Interface  Hold  Uptime    SRTT  RTO   Q    Seq
                                 (sec)           (ms)        Cnt  Num
  4   10.10.100.254   Lo0        571   00:05:50  5     100   0    38     ! Neighbor is UP to hub!
  3   10.10.1.253     Lo0        549   00:17:49  1     100   0    4
  2   10.10.1.34      Gi5        545   00:17:49  1     100   0    3
  1   10.10.1.3       Gi4        523   00:17:49  1     100   0    2
  0   10.10.1.19      Gi3        519   00:17:49  1     100   0    1
  R4-Site1-MCBR#show eigrp service-family ipv4 traffic
  EIGRP-SFv4 VR(#AUTOCFG#) Service-Family Traffic Statistics for AS(59501)
    Hellos sent/received: 5403/7036
    Updates sent/received: 1323/651
    Queries sent/received: 25/38
    Replies sent/received: 39/25
    Acks sent/received: 399/958
    SIA-Queries sent/received: 0/0
    SIA-Replies sent/received: 0/0
    Hello Process ID: 470
    PDM Process ID: 368
    Socket Queue: 0/10000/7/0 (current/max/highest/drops)  ! No drops!
    Input Queue: 0/10000/7/0 (current/max/highest/drops)

Policy and Globals are Now Received over SAF
  R4-Site1-MCBR#show domain iwan master status
  *** Domain MC Status ***
  Master VRF: Global
  Instance Type: Branch
  Instance id: 0
  Operational status: Up
  Configured status: Up
  Loopback IP Address: 10.10.1.254
  Load Balancing:
   Operational Status: Down
  External Collector: 192.168.1.1 port: 2055
  Route Control: Enabled
  Transit Site Affinity: Enabled
  Load Sharing: Enabled
  Connection Keepalive: 60 seconds
  Mitigation mode Aggressive: Disabled
  Policy threshold variance: 20
  Minimum Mask Length Internet: 24
  Minimum Mask Length Enterprise: 24
  Syslog TCA suppress timer: 180 seconds
  Traffic-Class Ageout Timer: 5 minutes
  Minimum Packet Loss Calculation Threshold: 15 packets
  Minimum Bytes Loss Calculation Threshold: 1 bytes
  Minimum Requirement: Met                                  ! Min. Req. is met!

Branch BR Minimum Requirement Not Met
In this scenario, the branch MC has met Minimum Requirement but the branch BR has not. This results in the INET path not being used as a PfR exit interface!
  R5-Site1-BR#show domain iwan border status
  Thu May 04 03:55:52.382
  --------------------------------------------------------------------
  **** Border Status ****
  Instance Status: UP
  Present status last updated: 00:05:28 ago
  Loopback: Configured Loopback0 UP (10.10.1.253)
  Master: 10.10.1.254
  Master version: 2
  Connection Status with Master: UP
  MC connection info: CONNECTION SUCCESSFUL
  Connected for: 00:05:28
  Route-Control: Enabled
  Asymmetric Routing: Disabled
  Minimum Mask Length Internet: 24
  Minimum Mask Length Enterprise: 24
  Connection Keepalive: 60 seconds
  Sampling: off
  Channel Unreachable Threshold Timer: 4 seconds
  Minimum Packet Loss Calculation Threshold: 15 packets
  Minimum Byte Loss Calculation Threshold: 1 bytes
  Monitor cache usage: 4000 (20%) Auto allocated
  Minimum Requirement: Not Met                              ! We have a problem!
   PMI update: Not received
   Globals Update: Not received
   (Will attempt shut/no-shut if min requirement not meet in 872 secs)   ! Reset SAF after expiry

What Information is the BR Missing?
We are missing the PMI and Globals updates from the branch MC. There is no notification information for either of these services!
  R5-Site1-BR#show domain iwan border peering
  Peering state: Enabled
  Origin: Loopback0(10.10.1.253)
  Peering type: Peer(With 10.10.1.254)
  Subscribed service:
    pmi (3) :                                               ! Missing PMI
    site-prefix (1) :
      Last Notification Info: 00:19:28 ago, Size: 244, Compressed size: 172, Status: Peering Success, Count: 1
    globals (5) :                                           ! Missing Globals
    Capability (4) :
      Last Notification Info: 00:19:28 ago, Size: 332, Compressed size: 220, Status: Peering Success, Count: 1
  Published service: N/A

Does the BR Have an EIGRP SAF Neighbor?
  R5-Site1-BR#show eigrp service-family ipv4 neighbors
  EIGRP-SFv4 VR(#AUTOCFG#) Service-Family Neighbors for AS(59501)
  H   Address         Interface  Hold  Uptime    SRTT  RTO   Q    Seq
                                 (sec)           (ms)        Cnt  Num
  0   10.10.1.254     Lo0        513   00:27:07  5     100   0    3401   ! Branch MC SAF is up!
  R5-Site1-BR#show eigrp service-family ipv4 topology
  EIGRP-SFv4 VR(#AUTOCFG#) Topology Table for AS(59501)/ID(10.10.1.253)
  Codes: P - Passive, A - Active, U - Update, Q - Query, R - Reply,
         r - reply Status, s - sia Status
  P 103:4:0.0.0.A0A01FE, 1 successors, FD is 327843840
          via 10.10.1.254 (327843840/327745536), Loopback0
  P 103:1:0.0.0.A0A01FE, 1 successors, FD is 327843840
          via 10.10.1.254 (327843840/327745536), Loopback0
The topology table holds Capability and Site-Prefix entries, but only from the branch MC!

Why Won't the Branch MC Send Globals and PMI?
This is a common branch design: the branch BR is not L2-adjacent to the branch MC. EIGRP SAF uses multicast hello and update packets on the branch LAN interface(s).

Why is this a Design Requirement?
The updates must be forwarded by SAF over the LAN interface using multicast. Multicast updates are allowed because of the auto-configured leak-map.
  R4-Site1-MCBR#show derived-config | sec router eigrp
  router eigrp #AUTOCFG# (API-generated auto-configuration, not user configurable)
   !
   service-family ipv4 autonomous-system 59501
    eigrp stub connected leak-map                           ! Auto-configuration enforced
  R5-Site1-BR#show derived-config | sec router eigrp
  router eigrp #AUTOCFG# (API-generated auto-configuration, not user configurable)
   !
   service-family ipv4 autonomous-system 59501
    eigrp stub connected
   !

Solving the Problem at the Branch
The solution is to create a direct connection between the branch MC and the branch BR. This can be a physical link or a GRE tunnel.
  R5-Site1-BR#
  .May 4 04:30:43.153 EDT: %DUAL-5-NBRCHANGE: EIGRP-SFv4 59501: Neighbor 10.10.1.18 (GigabitEthernet3) is up: new adjacency
  R5-Site1-BR#show domain iwan border status
  Thu May 04 04:31:30.107
  --------------------------------------------------------------------
  **** Border Status ****
  Instance Status: UP
  Present status last updated: 00:00:47 ago
  Loopback: Configured Loopback0 UP (10.10.1.253)
  <output omitted>
  Minimum Requirement: Met                                  ! Min. Req. is now met!

Troubleshooting Steps (Checklist)
1. Ensure the hub MC is operational.
2. Is this a new deployment, or has it worked before? Are other sites up with the peer?
3. Check the EIGRP SAF neighbor state: is the peer reachable?
4. Is the route to the SAF neighbor correct?
5. Are the SAF peering loopbacks configured with a /32 mask?
6. Are EIGRP SAF updates being sent and received? A non-zero OutQ calls for EIGRP update troubleshooting.
7. If SAF is up, which services are missing to meet the minimum requirement?
8. If SAF is up, do the service-routing DB size and sequence number match the SAF topology table entry?
9. Use PfR debugging to investigate specific service failures (peering, capabilities, communication, pmi, policy, process).

Minimum Requirement Summary
  - Minimum requirement is met when the router has received its operating parameters from the hub MC.
  - EIGRP SAF is the protocol used by PfR to deliver the configured policy to the domain.
  - EIGRP SAF peering is unicast between PfR loopbacks, with the exception of a dual-router branch, where multicast is also used on LAN interfaces.
  - Ensure the TCP 17749 socket is established between MC and BR. Newer versions call this out in the show domain <name> border status command as "Socket is not connected".
  - PfR restarts peering after 20 minutes of minimum requirement not being met (TCP down or services not received over SAF).

Learning a Traffic-Class

How are Traffic-Classes Learned?
Prerequisites:
  - MC has a PfR policy (what to control)
  - TCP 17749 socket established (MC to BR)
  - SAF is up and Minimum Requirement is met
  - Border has Globals, PMI, Site-Prefixes, and Capability updates, and CEF is enabled
  - PMI applied on tunnel interfaces
  - Branch borders have discovered WAN interfaces
  - A route entry exists that routes traffic out of the BR tunnel to the destination (this allows PMI to see the traffic and learn)
[Diagram: Host 10.10.1.4 at the branch (10.10.1.254) reaching Host 10.10.100.222 over Tunnel10 and Tunnel20 (INET)]

PMI is a Key Component in Learning a TC
There are three monitors applied to the tunnel at the BR:
  - Egress Monitor [1]: dynamic site-prefix learning
  - Egress Monitor [2]: tunnel bandwidth and learning new traffic flows
  - Ingress Monitor [3]: monitors ingress traffic and smart probes
PMI details are learned through EIGRP SAF from the hub MC. PMI cache exports are sent over UDP 9995 to the MC collector.
NOTE: A fourth monitor is added if quick-monitors are defined on the hub MC.

PMI for Learning Dynamic Site-Prefixes
The Egress Monitor [1] is applied in the egress direction for dynamic site-prefix learning.
  R5-Site1-BR#show domain iwan border pmi
  PMI[Egress-prefix-learn]-FLOW MONITOR [MON-Egress-prefix-learn-0-48-353]
   monitor-interval:30                                      ! Export interval
   minimum-mask-length:24
   minimum-mask-length enterprise:24
   key-list:
    ipv4 source prefix
    ipv4 source mask
    routing vrf input
   Non-key-list:
    counter bytes long
    counter packets long
    timestamp absolute monitoring-interval start
    interface input
   DSCP-list:N/A
   Class:CENT-Class-Egress-ANY-0-354
   Exporter-list:
    10.10.1.254                                             ! Exported to branch MC

PMI for Egress Bandwidth and TC Learning
The Egress Monitor [2] is applied in the egress direction for reporting tunnel bandwidth and learning new traffic flows.
  R5-Site1-BR#show domain iwan border pmi
  PMI[Egress-aggregate]-FLOW MONITOR [MON-Egress-aggregate-0-48-352]
   monitor-interval:30                                      ! Export interval
   Trigger Nbar:No
   minimum-mask-length:24
   minimum-mask-length enterprise:24
   key-list:
    timestamp absolute monitoring-interval start
    ipv4 destination prefix
    ipv4 destination mask
    pfr site destination prefix ipv4
    pfr site destination prefix mask ipv4
    ip dscp
    interface output
   Non-key-list:
    counter bytes long
    counter packets long
    ip protocol
    pfr site destination id ipv4
    pfr site source id ipv4
    pfr br ipv4 address
    interface output physical snmp
   DSCP-list:N/A
   Class:CENT-Class-Egress-ANY-0-354
   Exporter-list:
    10.10.1.254                                             ! Exported to branch MC

PMI for SMP and Performance Measurements
The Ingress Monitor [3] monitors ingress traffic and smart probes, and allows the router to calculate delay, jitter, packet loss, and reachability.
  R5-Site1-BR#show domain iwan border pmi
  Ingress policy CENT-Policy-Ingress-0-179:
  Ingress policy activated on:
   Tunnel200
  -------------------------------------------------------------------------
  PMI[Ingress-per-DSCP]-FLOW MONITOR[MON-Ingress-per-DSCP-0-48-354]
   monitor-interval:30
   key-list:
    pfr site source id ipv4
    pfr site destination id ipv4
    ip dscp
    interface input
    policy performance-monitor classification hierarchy
    pfr label identifier
   Non-key-list:
    transport packets lost rate
    transport bytes lost rate
    pfr one-way-delay
    network delay average
    transport rtp jitter inter arrival mean
    counter bytes long
    counter packets long
    timestamp absolute monitoring-interval start
   DSCP-list:                                               ! Policy from configuration
    af21-[class:cent-class-ingress-dscp-af21-0-355]
     packet-loss-rate:react_id[882]-priority[10]-threshold[2.0 percent]
     byte-loss-rate:react_id[883]-priority[10]-threshold[2.0 percent]
  <OUTPUT OMITTED>
   Exporter-list:None                                       ! Not exported (unless a user-configured collector is present)

What is a Quick Monitor?
  - A quick monitor allows Performance Monitor to report traffic statistics in shorter intervals than the default time (30 seconds).
  - It provides quicker detection of, and reaction to, problems that may arise for business-critical traffic such as video and voice.
  - Only one quick monitor value can be applied per domain.
  R5-Site1-BR#show domain iwan border pmi
  PMI[Ingress-per-DSCP-quick ]-FLOW MONITOR[MON-Ingress-per-DSCP-quick -0-48-355]
   monitor-interval:4
   key-list:
    pfr site source id ipv4
    pfr site destination id ipv4
    ip dscp
    interface input
    policy performance-monitor classification hierarchy
    pfr label identifier
   Non-key-list:
    transport packets lost rate
    transport bytes lost rate
    pfr one-way-delay
    network delay average
    transport rtp jitter inter arrival mean
    counter bytes long
    counter packets long
    timestamp absolute monitoring-interval start
   DSCP-list:
    ef-[class:cent-class-ingress-dscp-ef-0-357]
     packet-loss-rate:react_id[888]-priority[20]-threshold[1.0 percent]
     one-way-delay:react_id[889]-priority[10]-threshold[50 msec]
     network-delay-avg:react_id[890]-priority[10]-threshold[100 msec]
     byte-loss-rate:react_id[891]-priority[20]-threshold[1.0 percent]
The "quick" designation indicates that a quick monitor was defined on the hub MC for a shorter monitor interval. All DSCPs listed under this quick monitor will have statistics reported in 4-second intervals.

Learning a Traffic-Class: Step-by-Step
Traffic from the branch is routed over the tunnel toward the hub site. The [Egress-aggregate] and [Egress-prefix-learn] monitors create records in the performance monitor cache.
NOTE: CEF switching must be enabled.

Learning a Traffic-Class: Dynamic Site-Prefix
Traffic from the branch is routed over the tunnel toward the hub site. The [Egress-prefix-learn] monitor creates a record in the performance monitor cache.
  R5-Site1-BR#show performance monitor cache monitor MON-Egress-prefix-learn-0-48-353 detail format record
  Monitor: MON-Egress-prefix-learn-0-48-353
  Data Collection Monitor:
   Cache type: Synchronized (Platform cache)
   Cache size: 700
   Current entries: 1
   High Watermark: 1
   Flows added: 1
   Flows aged: 0
   Synchronized timeout (secs): 30
  IPV4 SOURCE PREFIX: 10.10.1.0                             ! Source prefix discovered!
  IPV4 SOURCE MASK: /28
  IP VRF ID INPUT: 0 (DEFAULT)
  counter bytes long: 1227300
  counter packets long: 12273
  timestamp monitor start: 23:44:30.000
  interface input: Tu0

Learning a Traffic-Class: Recognizing the Flow
Traffic from the branch is routed over the tunnel toward the hub site. The [Egress-aggregate] monitor creates a record in the performance monitor cache.
  R5-Site1-BR#show performance monitor cache monitor MON-Egress-aggregate-0-48-322 detail format record
  Monitor: MON-Egress-aggregate-0-48-322
  Data Collection Monitor:
   Cache type: Synchronized (Platform cache)
   Cache size: 4000
   Current entries: 1
   High Watermark: 2
   Flows added: 264
   Flows aged: 263
   Synchronized timeout (secs): 30
  TIMESTAMP MONITOR START: 09:16:00.000
  IPV4 DESTINATION PREFIX: 10.10.100.0
  IPV4 DESTINATION MASK: /24
  IPV4 DESTINATION SITE PREFIX: 10.10.100.0                 ! Destination site-prefix
  IPV4 DESTINATION SITE PREFIX MASK: /24
  IP DSCP: 0x12                                             ! DSCP
  INTERFACE OUTPUT: Tu200                                   ! Output interface
  counter bytes long: 900800
  counter packets long: 9008
  ip protocol: 1
  pfr destination site id: 10.10.100.254                    ! Destination site-ID
  pfr source site id: 10.10.1.254
  ipv4 pfr br address: 10.10.1.253
  interface output physical snmp index: 11

Learning a Traffic-Class: Cache Export
The cache entries are exported to the branch MC (UDP 9995) when the monitor interval expires.

Learning a Traffic-Class: MC Site-Prefix Addition
Upon receiving the export, the branch MC creates a dynamic site-prefix entry.
  .May 4 07:05:00.273 EDT: MC-V9_data:[0]:SRC[10.10.1.253] EGRESS PFX LEARN:src_pfx: 10.10.1.0 ; src_pfx_len: 28 ; vrf_id: 0 ; cnt_bytes: 1182800; cnt_pkts: 11828 ;
  .May 4 07:05:00.273 EDT: MC-V9_data:[0]:SRC[10.10.1.253] EGRESS PFX LEARN:br_ip:10.10.1.253; if_input=4; if_input_idx=0;
  .May 4 07:05:00.273 EDT: MC-PFX_DB:[0]: No match found in search exact internal
  .May 4 07:05:00.273 EDT: MC-PFX_DB:[0]: Insert site-prefix[10.10.1.0], site-id[10.10.1.254]     ! Site-prefix added!
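
A quick way to confirm that the learned prefix really covers the source host is to rebuild the network from the debug line. The sketch below is an illustration only: the function name `learned_prefix` and the regular expression are assumptions based on the `src_pfx` and `src_pfx_len` fields in the log line above, and other releases may word the message differently.

```python
import ipaddress
import re

def learned_prefix(log_line: str):
    """Extract the learned source prefix from an EGRESS PFX LEARN debug line."""
    match = re.search(r"src_pfx:\s*([\d.]+)\s*;\s*src_pfx_len:\s*(\d+)", log_line)
    if match is None:
        return None
    return ipaddress.ip_network(f"{match.group(1)}/{match.group(2)}")

line = ("MC-V9_data:[0]:SRC[10.10.1.253] EGRESS PFX LEARN:src_pfx: 10.10.1.0 ; "
        "src_pfx_len: 28 ; vrf_id: 0 ; cnt_bytes: 1182800; cnt_pkts: 11828 ;")
prefix = learned_prefix(line)
print(prefix, ipaddress.ip_address("10.10.1.4") in prefix)
```

Here the branch host 10.10.1.4 from the diagram falls inside the learned /28, which is why this prefix becomes a site-prefix for site-id 10.10.1.254.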

Learning a Traffic-Class: MC Site-Prefix Addition
The new dynamic entry is advertised to the rest of the domain via EIGRP SAF.

Learning a Traffic-Class: MC Receives Flow Details
Traffic flow details are received by the branch MC. The export carries the BR address (SRC), the destination site-prefix (DstSitePfx), the DSCP, and the destination site-id (DstSiteId).
  .May 4 07:05:00.274 EDT: MC-V9_data:[0]:SRC[10.10.1.253] EGRESS DATA:SrcPfx:255.255.255.255/255;DstPfx:10.10.100.0/24;DstSitePfx:10.10.100.0/24;appl:4294967295;dscp:0x12
  .May 4 07:05:00.274 EDT: MC-V9_data:[0]:SRC[10.10.1.253] if_output: 11;SrcSiteId: 0.0.0.0 ;DstSiteId: 10.10.100.254 ;cnt_bytes: 1182700; cnt_pkts: 11827 ;Protocol:1 ;

Learning a Traffic-Class: Should a TC Be Created?
Is there a policy match? The branch MC checks the flow against the policy received from the hub MC as well as the site-prefix DB.
  R4-Site1-MCBR#show domain iwan master policy
  --------------------------------------------------
  <output omitted>
  class af21 sequence 20
    path-preference inet fallback mpls
    class type: Dscp Based
      match dscp af21 policy custom
        priority 10 packet-loss-rate threshold 2.0 percent
        priority 10 byte-loss-rate threshold 2.0 percent
  .May 4 07:05:00.274 EDT: CENT:MC:POL:[0]:matched policy sequence 20     ! Policy match!

Learning a Traffic-Class: Should a TC Be Created?
What are the possible outcomes at this stage (load-balance disabled)?
  - Destination is internet TC.
    .May 10 07:19:31.013 EDT: CENT:MC:POL:[0]:No policy match
  - Destination is in the enterprise prefix-list.
    .May 10 07:16:01.017 EDT: CENT:MC:RCDB:[0]:This is a non-site
  - No DSCP match but destination site-prefix exists.
    .May 10 07:07:01.017 EDT: CENT:MC:POL:[0]:No policy match
Load-balance scenarios will be covered later in the presentation!

Learning a Traffic-Class: Traffic-Class is Created
If all conditions match, a TC is created and added to the route-control database (RCDB).
  .May 4 07:05:00.275 EDT: CENT:MC:RCDB:[0]:TC[dst_pfx=10.10.100.0/24, app_id=4294967295, dscp=18] learned from Border 10.10.1.253
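
The DSCP appears as `0x12` in the flow export, as `dscp=18` in the RCDB message, and as `af21` in the policy. These are the same value: for Assured Forwarding class x with drop precedence y, the DSCP is 8x + 2y, so AF21 is 8*2 + 2*1 = 18. A small sketch of that conversion follows; the function name `dscp_name` is hypothetical, and only the standard AF, CS, EF, and default code points are named.

```python
def dscp_name(dscp: int) -> str:
    """Map a numeric DSCP (0-63) to its common keyword where one exists."""
    if dscp == 46:
        return "ef"
    if dscp == 0:
        return "default"
    traffic_class = dscp >> 3          # upper three bits
    drop = (dscp & 0b111) >> 1         # drop precedence for AF code points
    if dscp & 1 == 0 and 1 <= traffic_class <= 4 and 1 <= drop <= 3:
        return f"af{traffic_class}{drop}"
    if dscp & 0b111 == 0:
        return f"cs{traffic_class}"    # class selector code points
    return str(dscp)                   # no well-known keyword

print(dscp_name(0x12))
```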

Learning a Traffic-Class: MC Creates Channels
Channels are added by the branch MC. Borders are asked to create corresponding channels, which will be used for probe generation and accounting.
  .May 4 07:05:00.274 EDT: Channel[316, 10.10.100.254{ipv4},0x6D706C7300000000,0x12,1376256]: Added Channel
  .May 4 07:05:00.275 EDT: Channel[317, 10.10.100.254{ipv4},0x696E657400000000,0x12,2097152]: Added Channel
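
The third field of each channel looks opaque, but the two values in these messages decode as zero-padded ASCII to the path names used in the policy (`path-preference inet fallback mpls`). That reading is an inference from these two hex values, not a documented format, and the helper name `channel_path_name` is hypothetical.

```python
def channel_path_name(label_hex: str) -> str:
    """Decode a channel label such as 0x6D706C7300000000 as NUL-padded ASCII."""
    digits = label_hex[2:] if label_hex.lower().startswith("0x") else label_hex
    raw = bytes.fromhex(digits)
    # Strip the trailing zero bytes that pad the name to eight bytes.
    return raw.rstrip(b"\x00").decode("ascii")

for label in ("0x6D706C7300000000", "0x696E657400000000"):
    print(label, "->", channel_path_name(label))
```

Read this way, channel 316 is the MPLS channel and channel 317 is the INET channel toward site 10.10.100.254 for DSCP 0x12.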

Learning a Traffic-Class: BRs Create Channels
Both BRs create channels to the hub site. A message is sent to the MC confirming channel creation. The channel next hop is taken from the parent-route.
  R5-Site1-BR#show domain iwan border parent-route
  Border Parent Route Details:
  Prot: EIGRP, Network: 10.10.100.254/32, Gateway: 172.17.20.1, Interface: Tunnel20, Ref count: 2
  .May 4 07:05:00.217 EDT: BR_SMP_CHAN[0]:Chan[317, 10.10.100.254{ipv4}, 0x696E657400000000, 0x12, 2097152]: Created with next hop 172.17.20.1

Learning a Traffic-Class: MC Attempts Control
The MC attempts to control the TC, but the channels are not ready yet. At this point the TC exists but remains UNCONTROLLED (UC).

  May 4 07:05:00.277 EDT: MC-PDP:[0]:TC[25, 10.10.100.254{ipv4}, 10.10.100.0{ipv4}, 0x12, 0]: Traffic class not under any channel. going to Uncontrolled state
  May 4 07:05:00.277 EDT: MC-PDP:[0]:TC[25, 10.10.100.254{ipv4}, 10.10.100.0{ipv4}, 0x12, 0]: TC update message sent for BR:10.10.1.254
  May 4 07:05:00.278 EDT: MC-PDP:[0]:TC[25, 10.10.100.254{ipv4}, 10.10.100.0{ipv4}, 0x12, 0]: TC update message sent for BR:10.10.1.253

Learning a Traffic-Class: Back-Off Timer
The MC starts a back-off timer for 30 seconds. During this back-off period, the channels should become reachable.

  May 4 07:05:00.278 EDT: MC-TIMER:[0]: Start tw_timer=7f13ce8c5e38/7f13cf16ec90 delay=30000 context=7f13ce8c5cc0 by CENT-MC-0, cent_rc_backoff_timer_start:3335

Channels reachable:

  May 4 07:05:01.482 EDT: CENT:MC:IPC:[0]:MC Received IPC: 120 CENT_IPC_CHAN_INIT_TO_REACH_STATE
  May 4 07:05:02.280 EDT: CENT:MC:IPC:[0]:MC Received IPC: 120 CENT_IPC_CHAN_INIT_TO_REACH_STATE

Learning a Traffic-Class: PDP Chooses a Channel
After the timer expires, the Policy Decision Point (PDP) runs on the MC to choose an exit. Candidates are added to a list of possible exit channels and evaluated to choose the best one.

Timer expires, PDP starts:

  May 4 07:05:31.290 EDT: CENT:MC:PDP:[0]:Backoff Timer timeout for TC id:25. Goto pickexit
  May 4 07:05:31.290 EDT: MC-PDP:[0]:TC[25, 10.10.100.254{ipv4}, 10.10.100.0{ipv4}, 0x12, 0]: creating candidates for TC id: 25

Learning a Traffic-Class: PDP Chooses a Channel
The PDP looks for a perfect channel from the list of candidates. The best exit is chosen based on path-preference, channel state, and egress BW. The border routers are informed of the decision by the MC, and the TC moves to CONTROLLED (CN).

Exit found:

  May 4 07:05:31.293 EDT: MC-PDP:[0]:TC[25, 10.10.100.254{ipv4}, 10.10.100.0{ipv4}, 0x12, 0]: 696E657400000000(18) 2 Exit is the best exit

The BRs are told:

  May 4 07:05:31.294 EDT: MC-PDP:[0]:TC[25, 10.10.100.254{ipv4}, 10.10.100.0{ipv4}, 0x12, 0]: TC update message sent for BR:10.10.1.254
  May 4 07:05:31.294 EDT: MC-PDP:[0]:TC[25, 10.10.100.254{ipv4}, 10.10.100.0{ipv4}, 0x12, 0]: TC update message sent for BR:10.10.1.253

Learning a Traffic-Class: Controlled State (MC view)

  R4-Site1-MCBR#show domain iwan master traffic-classes
  Dst-Site-Prefix: 10.10.100.0/24 DSCP: af21 [18] Traffic class id:25
    Clock Time: 07:05:39 (EDT) 05/04/2017
    TC Learned: 00:00:39 ago
    Present State: CONTROLLED
    Current Performance Status: in-policy
    Current Service Provider: inet since 00:00:08 (hold until 81 sec)
    Previous Service Provider: Unknown
    BW Used: 420 Kbps
    Present WAN interface: Tunnel20 in Border 10.10.1.253
    Present Channel (primary): 317 inet pfr-label:0:32 0:0 [0x200000]
    Backup Channel: 316 mpls pfr-label:0:21 0:0 [0x150000]
    Destination Site ID bitmap: 5
    Destination Site ID: 10.10.100.254 (Active)
    Alternate Destination site: 10.10.200.254 (Active)
    Class-Sequence in use: 20
    Class Name: af21 using policy User-defined
      priority 10 packet-loss-rate threshold 2.0 percent
      priority 10 byte-loss-rate threshold 2.0 percent
    BW Updated: 00:00:09 ago
    Reason for Latest Route Change: Uncontrolled to Controlled Transition
    Route Change History:
      Date and Time  Previous Exit  Current Exit  Reason
      1: 06:47:09 (EDT) 02/09/17  None(0:0 0:0)/0.0.0.0/None (Ch:0)  inet(0:32 0:0)/10.10.1.253/Tu200 (Ch:317)  Uncontrolled to Controlled Transition
  --------------------------------------------------------------------------------
  Total Traffic Classes: 1 Site: 1 Internet: 0

What to look for in this output:
- Present State: the TC is controlled.
- Present WAN interface: the exit and the BR IP.
- Present Channel and Backup Channel: the channels in use.
- Destination Site ID bitmap and Alternate Destination site: a shared destination site-prefix.
- Route Change History: the last change.

Learning a Traffic-Class: Controlled State (BR view)

  R5-Site1-BR#show domain iwan border traffic-classes
  Src-Site-Prefix: ANY Dst-Site-Prefix: 10.10.100.0/24
  DSCP: af21 [18] Traffic class id: 25
  TC Learned: 00:00:45 ago
  Present State: CONTROLLED
  Destination Site ID: 10.10.100.254
  If_index: 18
  Primary chan id: 317
  Primary chan Presence: LOCAL CHANNEL
  Primary interface: Tunnel20
  Primary Nexthop: 172.17.20.1 (EIGRP)
  Backup chan id: 316
  Backup chan Presence: NEIGHBOR_CHANNEL via border 10.10.1.254
  Backup interface: Tunnel0

The TC is controlled on the local channel (Tunnel20), and the next-hop source is EIGRP. The backup channel is reachable over the auto-tunnel (Tunnel0) that is dynamically created between the BRs.

Troubleshooting Traffic-Class Creation
Traffic is arriving but no TC is built. Where do we start?

  R4-Site1-MCBR#show domain iwan master traffic-classes
  Total Traffic Classes: 0 Site: 0 Internet: 0

No TC! Let's look at the parameters of the traffic.

Starting the Investigation
- Is the destination within the enterprise boundary (defined in the enterprise prefix-list)?
- If so, is there a covering site-prefix in the database?
- What DSCP is the traffic marked with?
- Does a TC already exist? No!

  R4-Site1-MCBR#show domain iwan master status
  *** Domain MC Status ***
  Master VRF: Global
    Instance Type: Branch
    Instance id: 0
    Operational status: Up
    Configured status: Up
    Loopback IP Address: 10.10.1.254
    <output omitted>
    Minimum Requirement: Met
    Borders:
      IP address: 10.10.1.254
      Version: 2
      Connection status: CONNECTED (Last Updated 6d02h ago )
      Interfaces configured:
        Name: Tunnel10 type: external Service Provider: mpls Status: UP Zero-SLA: NO Path of Last Resort: Disabled
      <output omitted>
      IP address: 10.10.1.253
      Version: 2
      Connection status: CONNECTED (Last Updated 6d02h ago )
      Interfaces configured:
        Name: Tunnel20 type: external Service Provider: inet Status: UP Zero-SLA: NO Path of Last Resort: Disabled

PfR is operational, the minimum requirement is met, and both WAN interfaces are discovered.

Does a Performance Monitor Cache Entry Exist?
The minimum requirement is met and both WAN interfaces are discovered. Is PfR recognizing the traffic? Verify that an egress monitor cache entry exists for this flow on the BR.

  R5-Site1-BR#show performance monitor cache monitor MON-Egress-aggregate-0-48-368 detail format record
  Monitor: MON-Egress-aggregate-0-48-368
  Data Collection Monitor:
    Cache type: Synchronized (Platform cache)
    Cache size: 4000
    Current entries: 1
    High Watermark: 2
    Flows added: 7
    Flows aged: 6
    Synchronized timeout (secs): 30

  TIMESTAMP MONITOR START: 09:17:30.000
  IPV4 DESTINATION PREFIX: 10.10.100.0
  IPV4 DESTINATION MASK: /24
  IPV4 DESTINATION SITE PREFIX: 10.10.100.0
  IPV4 DESTINATION SITE PREFIX MASK: /24
  IP DSCP: 0x12
  INTERFACE OUTPUT: Tu20
  counter bytes long: 1032000
  counter packets long: 10320
  ip protocol: 1
  pfr destination site id: 10.10.200.254
  pfr source site id: 10.10.1.254
  ipv4 pfr br address: 0.0.0.0
  interface output physical snmp index: 0

The site-prefix looks correct, the DSCP is correct, and the counters are increasing. The destination site-id, however, is incorrect.

Verifying the Destination Site-ID and Site-Prefix
Why is the destination site-id incorrect in the cache entry? The site-prefix entry is not owned by a PfR site.

  R4-Site1-MCBR#show domain iwan master site-prefix
  Change will be published between 5-60 seconds
  Next Publish 00:31:50 later
  Prefix DB Origin: 10.10.1.254
  Last publish Status : Peering Success
  Total publish errors : 0
  Total learned prefix discards: 0
  Prefix Flag: S-From SAF; L-Learned; T-Top Level; C-Configured; M-shared

  Site-id          Site-prefix        Last Updated   DC Bitmap  Flag
  --------------------------------------------------------------------------------
  10.10.1.254      10.10.1.0/28       08:28:55 ago   0x0        L
  10.10.1.254      10.10.1.16/28      00:00:24 ago   0x0        L
  10.10.1.254      10.10.1.254/32     6d02h ago      0x0        L
  10.10.3.254      10.10.3.254/32     01:44:24 ago   0x0        S
  10.10.100.254    10.10.100.254/32   00:02:49 ago   0x1        S
  255.255.255.255  *10.10.100.0/24    01:03:14 ago   0x0        S,T
  10.10.200.254    10.10.200.254/32   00:01:02 ago   0x4        S
  10.10.100.254    10.10.200.0/24     00:01:02 ago   0x5        S,C,M
  10.10.200.254    10.10.200.0/24     00:01:02 ago   0x5        S,C,M
  255.255.255.255  *10.33.33.0/24     6d02h ago      0x0        S,T
  255.255.255.255  *10.66.66.0/24     6d02h ago      0x0        S,T
  255.255.255.255  *10.133.133.0/24   6d02h ago      0x0        S,T
  255.255.255.255  *10.166.166.0/24   6d02h ago      0x0        S,T
  --------------------------------------------------------------------------------

The site-id for *10.10.100.0/24 is incorrect: it shows 255.255.255.255 instead of a real site.
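To make the check concrete, the sketch below runs a longest-prefix match for the destination host against a few rows copied from the table above and reports which site-id owns the covering prefix. It is a simplified illustration rather than how PfR looks up site-prefixes internally. `owning_sites` is a hypothetical helper, and treating 255.255.255.255 as "no owning site" is an interpretation of the T (top level) flag rows.

```python
import ipaddress

# (site-id, site-prefix) rows copied from the branch MC output above.
SITE_PREFIXES = [
    ("10.10.1.254", "10.10.1.0/28"),
    ("10.10.100.254", "10.10.100.254/32"),
    ("255.255.255.255", "10.10.100.0/24"),
    ("10.10.100.254", "10.10.200.0/24"),
    ("10.10.200.254", "10.10.200.0/24"),
]
NO_OWNER = "255.255.255.255"  # assumption: marks a prefix with no owning site

def owning_sites(dest_ip, table):
    """Return the site-ids attached to the longest prefix covering dest_ip."""
    addr = ipaddress.ip_address(dest_ip)
    matches = [(sid, ipaddress.ip_network(pfx)) for sid, pfx in table
               if addr in ipaddress.ip_network(pfx)]
    if not matches:
        return []
    longest = max(net.prefixlen for _, net in matches)
    return sorted(sid for sid, net in matches if net.prefixlen == longest)

owners = owning_sites("10.10.100.222", SITE_PREFIXES)
print(owners, "-> no owning site" if owners == [NO_OWNER] else "-> owned")
```

For 10.10.100.222 the only covering entry is the top-level one, so no PfR site owns the destination. A host in 10.10.200.0/24 would instead return both data center site-ids, which is the shared (M flag) case.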

Site-Prefix Verification
Go to the site that should be originating the prefix and verify its entry. Is the site-prefix entry the same or different when comparing the two sites?

  R1-DC1-MC#show domain iwan master site-prefix
  Change will be published between 5-60 seconds
  Next Publish 01:20:54 later
  Prefix DB Origin: 10.10.100.254
  Last publish Status : Peering Success
  Total publish errors : 0
  Total learned prefix discards: 0
  Prefix Flag: S-From SAF; L-Learned; T-Top Level; C-Configured; M-shared

  Site-id          Site-prefix        Last Updated   DC Bitmap  Flag
  --------------------------------------------------------------------------------
  10.10.1.254      10.10.1.0/28       00:04:25 ago   0x0        S
  10.10.1.254      10.10.1.16/28      00:04:25 ago   0x0        S
  10.10.1.254      10.10.1.254/32     00:04:25 ago   0x0        S
  10.10.3.254      10.10.3.254/32     00:20:39 ago   0x0        S
  10.10.100.254    10.10.100.254/32   00:00:00 ago   0x1        L
  255.255.255.255  *10.10.100.0/24    02:14:48 ago   0x0        T
  10.10.200.254    10.10.200.254/32   00:37:18 ago   0x4        S
  --------------------------------------------------------------------------------

The same incorrect entry is present here.

Site-Prefix Verification
Since this is a hub site, site-prefixes are known by static configuration. Branches can be static or dynamic.

  R1-DC1-MC#sh run | sec domain
  domain iwan
   vrf default
    master hub
     source-interface Loopback0
     site-prefixes prefix-list site-prefix
     load-balance
     enterprise-prefix prefix-list ent-prefix
     class ef sequence 10
      match dscp ef policy custom
  <output omitted>

The MC has site-prefix and enterprise-prefix lists configured.

  R1-DC1-MC#show run | inc prefix-list
  ip prefix-list ent-prefix seq 5 permit 10.33.33.0/24
  ip prefix-list ent-prefix seq 10 permit 10.133.133.0/24
  ip prefix-list ent-prefix seq 15 permit 10.66.66.0/24
  ip prefix-list ent-prefix seq 20 permit 10.166.166.0/24
  ip prefix-list ent-prefix seq 25 permit 10.10.100.0/24
  ip prefix-list site-prefix seq 10 permit 10.10.200.0/24

10.10.100.0/24 is missing from the site-prefix list but is part of the enterprise-prefix list!

Correcting the Site-Prefix List
After correcting the site-prefix entry, the TC is now learned and controlled.

  R4-Site1-MCBR#show domain iwan master traffic-classes
  Dst-Site-Prefix: 10.10.100.0/24 DSCP: af21 [18] Traffic class id:36
    Clock Time: 09:45:59 (EDT) 05/10/2017
    TC Learned: 00:00:32 ago
    Present State: CONTROLLED
    Current Performance Status: in-policy
    Current Service Provider: inet since 00:00:01 (hold until 88 sec)
    Previous Service Provider: Unknown
    BW Used: 451 Kbps
    Present WAN interface: Tunnel20 in Border 10.10.1.253
    Present Channel (primary): 406 inet pfr-label:0:32 0:0 [0x200000]
    Backup Channel: 405 mpls pfr-label:0:21 0:0 [0x150000]
    Destination Site ID bitmap: 1
    Destination Site ID: 10.10.100.254 (Active)
    Class-Sequence in use: 20
    Class Name: af21 using policy User-defined
      priority 10 packet-loss-rate threshold 2.0 percent
      priority 10 byte-loss-rate threshold 2.0 percent
    BW Updated: 00:00:02 ago
    Reason for Latest Route Change: Uncontrolled to Controlled Transition

The Dst-Site-Prefix now correctly belongs to the hub, and the TC is controlled on the INET path.

Troubleshooting Example: TC Not Created
Traffic is sent by the host but no TC is created. Where do we start?

  R4-Site1-MCBR#show domain iwan master traffic-classes
  Total Traffic Classes: 0 Site: 0 Internet: 0

No TC!

TC Not Created: Start with Collecting the Facts
The source is 10.10.1.4 and the destination is 10.10.100.222.
- What DSCP is the traffic marked with? The source host sends EF.
- Is the destination within the enterprise boundary (defined in the enterprise prefix-list)?
- If so, is there a covering site-prefix in the database?
- Do any TCs already exist?

  R4-Site1-MCBR#show domain iwan master site-prefix 10.10.100.254
  Site-id          Site-prefix        Last Updated   DC Bitmap  Flag
  --------------------------------------------------------------------------------
  10.10.100.254    10.10.100.254/32   00:08:44 ago   0x1        S
  10.10.100.254    10.10.200.0/24     00:08:44 ago   0x1        S,C,M
  10.10.100.254    10.10.100.0/24     00:08:44 ago   0x1        S,C,M

The destination prefix is owned by the hub site.

TC Not Created: Verify What Other TCs Have Been Learned
The source is 10.10.1.4 and the destination is 10.10.100.222. The destination site-prefix is owned by the hub site, so do any TCs already exist?

  R4-Site1-MCBR#show domain iwan master traffic-classes summary
  APP - APPLICATION, TC-ID - TRAFFIC-CLASS-ID, APP-ID - APPLICATION-ID
  Current-EXIT - Service-Provider(PFR-label)/Border/Interface(Channel-ID)
  UC - UNCONTROLLED, PE - PICK-EXIT, CN - CONTROLLED, UK - UNKNOWN

  Dst-Site-Pfx    Dst-Site-Id    State  DSCP        TC-ID  APP-ID  APP  Current-Exit
  10.10.100.0/24  10.10.100.254  CN     af21[18]    43     N/A     N/A  inet(0:32 0:0)/10.10.1.253/Tu20(Ch:418)
  10.10.3.0/28    10.10.3.254    CN     ef[46]      5      N/A     N/A  mpls(0:0 0:0)/10.10.1.254/Tu10(Ch:421)
  10.10.200.0/24  10.10.100.254  CN     af21[18]    45     N/A     N/A  inet(0:32 0:0)/10.10.1.253/Tu20(Ch:418)
  10.10.200.0/24  10.10.100.254  CN     default[0]  44     N/A     N/A  inet(0:32 0:0)/10.10.1.253/Tu20(Ch:410)

  Total Traffic Classes: 4 Site: 4 Internet: 0

An EF traffic-class is present, but to a different site. There is an AF21 traffic-class to the correct site, but no EF.

Review the Clues Discovered So Far
What do we know?
1. We are able to create TCs and control them.
2. We are able to build an EF TC.
3. We are able to build a TC to the correct site, but for different DSCPs.

What is the Next Step?
The MC is capable of learning and controlling TCs marked as EF, so what are we receiving from this source?
A Wireshark capture on the source host reveals that DSCP EF is sent out.
We can use Flexible NetFlow (FNF) or Embedded Packet Capture (EPC) to verify what the BR is receiving from the branch LAN.

A flow monitor is applied ingress on the LAN-facing interface in both configurations:

  !
  flow monitor tac
   record netflow-original
  !
  interface GigabitEthernet4
   description Link to Site1 LAN
   ip flow monitor tac input
   ip address 10.10.1.3 255.255.255.240
   standby use-bia
   standby 1 ip 10.10.1.1
   negotiation auto
  !

  !
  flow monitor tac
   record netflow-original
  !
  interface GigabitEthernet4
   description Link to Site1 LAN
   ip flow monitor tac input
   ip address 10.10.1.2 255.255.255.240
   standby use-bia
   standby 1 ip 10.10.1.1
   negotiation auto
  !

Checking the FNF Cache
In checking the cache, no entry is found on the MC:

  R4-Site1-MCBR#show flow monitor tac cache format record | beg 10.10.1.4

However, we do see an entry on the BR!

  R5-Site1-BR#show flow monitor tac cache format record
  IPV4 SOURCE ADDRESS: 10.10.1.4
  IPV4 DESTINATION ADDRESS: 10.10.100.222
  TRNS SOURCE PORT: 0
  TRNS DESTINATION PORT: 2048
  INTERFACE INPUT: Gi3
  FLOW SAMPLER ID: 0
  IP TOS: 0x68
  IP PROTOCOL: 17
  ip source as: 0
  ip destination as: 0
  ipv4 next hop address: 172.17.20.1
  ipv4 source mask: /28
  ipv4 destination mask: /8
  tcp flags: 0x00
  interface output: Tu20
  counter bytes: 3604500
  counter packets: 36045
  timestamp first: 22:27:14.895
  timestamp last: 22:29:15.611

The destination IP matches, but the traffic is arriving as AF31 (IP TOS 0x68), not EF!
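The step from `IP TOS: 0x68` to "AF31" is a two-bit shift: the DSCP is the upper six bits of the TOS byte. The short sketch below shows the conversion and the standard DSCP names; the helper names `tos_to_dscp` and `dscp_name` are hypothetical.

```python
def tos_to_dscp(tos):
    """DSCP is the upper six bits of the IPv4 TOS byte."""
    return (tos & 0xFF) >> 2

def dscp_name(dscp):
    """Map a DSCP value to its common name (EF, AFxy, CSx, default)."""
    if dscp == 46:
        return "ef"
    if dscp == 0:
        return "default"
    cls, low = divmod(dscp, 8)
    if low == 0:
        return f"cs{cls}"
    if 1 <= cls <= 4 and low in (2, 4, 6):
        return f"af{cls}{low // 2}"   # AFxy = 8x + 2y
    return str(dscp)

dscp = tos_to_dscp(0x68)
print(dscp, dscp_name(dscp))       # value seen in the FNF cache entry
print(hex(46 << 2))                # TOS byte that EF traffic would carry
```

An EF flow would show `IP TOS: 0xB8` in the same cache field, so a 0x68 entry is enough to tell that the marking changed before the packet reached the BR.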

No Class Defined for AF31
We will not create a TC without a policy match.

  R1-DC1-MC#sh run | sec domain
  domain iwan
   vrf default
    master hub
     source-interface Loopback0
     site-prefixes prefix-list site-prefix
  <output removed>
     class ef sequence 10
      match dscp ef policy custom
  <output removed>
     class af21 sequence 20
      match dscp af21 policy custom

  May 14 21:46:00.004 EDT: CENT:MC:RCDB:[0]:RC: from Border 10.10.1.253: AppId FFFFFFFF, DSCP 0x1A src network: 255.255.255.255 src length: 255 dest network: 10.10.100.0 dest len: 24 dscp: 26
  May 14 21:46:00.004 EDT: CENT:MC:POL:[0]:No policy match

The host sends EF, the traffic arrives as AF31 (DSCP 26), and there is no class for AF31: no match!

Traffic-Class is Now Created and Controlled
A misconfigured QoS policy on the LAN resulted in traffic being remarked from EF to AF31. This could also have been a misconfigured policy on the hub MC without a class defined for this DSCP.

  R4-Site1-MCBR#show domain iwan master traffic-classes summary
  APP - APPLICATION, TC-ID - TRAFFIC-CLASS-ID, APP-ID - APPLICATION-ID
  Current-EXIT - Service-Provider(PFR-label)/Border/Interface(Channel-ID)
  UC - UNCONTROLLED, PE - PICK-EXIT, CN - CONTROLLED, UK - UNKNOWN

  Dst-Site-Pfx    Dst-Site-Id    State  DSCP        TC-ID  APP-ID  APP  Current-Exit
  10.10.100.0/24  10.10.100.254  CN     ef[46]      47     N/A     N/A  mpls(0:21 0:0)/10.10.1.254/Tu10(Ch:419)
  10.10.100.0/24  10.10.100.254  CN     af21[18]    43     N/A     N/A  inet(0:32 0:0)/10.10.1.253/Tu20(Ch:418)
  10.10.3.0/28    10.10.3.254    CN     ef[46]      5      N/A     N/A  mpls(0:0 0:0)/10.10.1.254/Tu10(Ch:421)
  10.10.200.0/24  10.10.100.254  CN     af21[18]    45     N/A     N/A  inet(0:32 0:0)/10.10.1.253/Tu20(Ch:418)
  10.10.200.0/24  10.10.100.254  CN     default[0]  44     N/A     N/A  inet(0:32 0:0)/10.10.1.253/Tu20(Ch:410)

  Total Traffic Classes: 5 Site: 5 Internet: 0
  R4-Site1-MCBR#

The EF TC to 10.10.100.0/24 is created and controlled.

All Traffic-Classes are Created but Not Controlled
TCs have been created but none are controlled. Multiple DSCPs and destination site-prefixes are uncontrolled (UC)!

  R4-Site1-MCBR#show domain iwan master traffic-classes summary
  APP - APPLICATION, TC-ID - TRAFFIC-CLASS-ID, APP-ID - APPLICATION-ID
  Current-EXIT - Service-Provider(PFR-label)/Border/Interface(Channel-ID)
  UC - UNCONTROLLED, PE - PICK-EXIT, CN - CONTROLLED, UK - UNKNOWN

  Dst-Site-Pfx    Dst-Site-Id    State  DSCP        TC-ID  APP-ID  APP  Current-Exit
  10.10.100.0/24                 UC     ef[46]      60     N/A     N/A  UK(0:0 0:0)/10.10.1.254/UK(Ch:0)
  10.10.100.0/24                 UC     af21[18]    59     N/A     N/A  UK(0:0 0:0)/10.10.1.253/UK(Ch:0)
  10.10.200.0/24                 UC     ef[46]      62     N/A     N/A  UK(0:0 0:0)/10.10.1.254/UK(Ch:0)
  10.10.200.0/24                 UC     af21[18]    58     N/A     N/A  UK(0:0 0:0)/10.10.1.253/UK(Ch:0)
  10.10.200.0/24                 UC     default[0]  61     N/A     N/A  UK(0:0 0:0)/10.10.1.254/UK(Ch:0)

  Total Traffic Classes: 5 Site: 5 Internet: 0

The Dst-Site-Id column is blank and all TCs are uncontrolled.

Which Site Owns the Destination Prefix?
Do we have channels to the sites that own those prefixes? We first need to figure out which site owns these prefixes.

  R4-Site1-MCBR#show domain iwan master site-prefix
  Change will be published between 5-60 seconds
  Next Publish 01:37:21 later
  Prefix DB Origin: 10.10.1.254
  Last publish Status : Peering Success
  Total publish errors : 0
  Total learned prefix discards: 0
  Prefix Flag: S-From SAF; L-Learned; T-Top Level; C-Configured; M-shared

  Site-id          Site-prefix        Last Updated   DC Bitmap  Flag
  --------------------------------------------------------------------------------
  10.10.1.254      10.10.1.0/28       00:00:19 ago   0x0        L
  10.10.1.254      10.10.1.16/28      00:00:19 ago   0x0        L
  10.10.1.254      10.10.1.254/32     00:33:47 ago   0x0        L
  10.10.100.254    10.10.100.254/32   00:28:04 ago   0x1        S
  10.10.100.254    10.10.100.0/24     00:28:04 ago   0x1        S,C,M
  10.10.200.254    10.10.200.254/32   00:21:06 ago   0x4        S
  10.10.100.254    10.10.200.0/24     00:28:04 ago   0x1        S,C,M
  255.255.255.255  *10.33.33.0/24     00:33:47 ago   0x0        S,T
  255.255.255.255  *10.66.66.0/24     00:33:47 ago   0x0        S,T
  255.255.255.255  *10.133.133.0/24   00:33:47 ago   0x0        S,T
  255.255.255.255  *10.166.166.0/24   00:33:47 ago   0x0        S,T

The hub site (10.10.100.254) owns 10.10.100.0/24.

Verify Channel State to the Hub from the Branch MC
Verify the branch MC has channels created to the site.

  R4-Site1-MCBR#show domain iwan master channels dst-site-id 10.10.100.254
  Legend: * (Value obtained from Network delay:)

  Channel Id: 441 Dst Site-Id: 10.10.100.254 Link Name: mpls DSCP: af21 [18] pfr-label: 0:21 0:0 [0x150000] TCs: 0 BackupTCs: 0
    Channel Created: 00:28:42 ago
    Provisional State: Initiated and open
    Operational state: Available
    Channel to hub: TRUE
    Interface Id: 28
    Supports Zero-SLA: Yes
    Muted by Zero-SLA: No
    Estimated Channel Egress Bandwidth: 6362 Kbps
    Immitigable Events Summary:
      Total Performance Count: 0, Total BW Count: 6
    Site Prefix List
      10.10.100.254/32 (Routable)
      10.10.200.0/24 (Active)
      10.10.100.0/24 (Active)
    ODE Statistics:
      Received: 0
    TCA Statistics:
      Received: 0 ; Processed: 0 ; Unreach_rcvd: 0 ; Local Unreach_rcvd: 0
      TCA lost byte rate: 0
      TCA lost packet rate: 0
      TCA one-way-delay: 0
      TCA network-delay: 0
      TCA jitter mean: 0

  Channel Id: 442 Dst Site-Id: 10.10.100.254 Link Name: inet DSCP: af21 [18] pfr-label: 0:32 0:0 [0x200000] TCs: 0 BackupTCs: 0
    Channel Created: 00:28:42 ago
    Provisional State: Initiated and open
    Operational state: Available
    Channel to hub: TRUE
    Interface Id: 32
    Supports Zero-SLA: Yes
    Muted by Zero-SLA: No
    Estimated Channel Egress Bandwidth: 2413 Kbps
    Immitigable Events Summary:
      Total Performance Count: 0, Total BW Count: 2
    Site Prefix List
      10.10.100.254/32 (Routable)
      10.10.200.0/24 (Active)
      10.10.100.0/24 (Active)
    ODE Statistics:
      Received: 0
    TCA Statistics:
      Received: 0 ; Processed: 0 ; Unreach_rcvd: 0 ; Local Unreach_rcvd: 0
      TCA lost byte rate: 0
      TCA lost packet rate: 0
      TCA one-way-delay: 0
      TCA network-delay: 0
      TCA jitter mean: 0

Both channels are created and Available, but neither carries a TC (TCs: 0).

Verify the Channels on the Branch BRs
Check that channels are properly created on the branch BRs and show the correct programming.

  R5-Site1-BR#show domain iwan border channels
  Border Smart Probe Stats:
  Channel id: 442
    Version : 3
    Site id : 10.10.100.254
    DSCP : af21[18]
    Service provider : inet
    Pfr-Label : 0:32 0:0 [0x200000]
    Channel state : Initiated and open
    Channel next hop : 172.17.20.1
    RX Reachability : Reachable
    TX Reachability : Reachable
    Supports Zero-SLA : Yes
    Muted by Zero-SLA : No
    Muted by Path of Last Resort : No
    Number of Probes sent : 3395
    Number of Probes received : 3222
    Number of SMP Profile Bursts sent: 1862
    Number of Active Channel Probes sent: 186
    Number of Reachability Probes sent: 1393
    Number of Force Unreaches sent: 0
    Last Probe sent : 296 msec Ago
    Last Probe received: 284 msec ago
    Number of Data Packets sent : 130688
    Number of Data Packets received : 1243229
    Smart Probe in Burst: No
    Smart Probe enable Burst: Yes

  R4-Site1-MCBR#show domain iwan border channels
  Border Smart Probe Stats:
  Channel id: 441
    Version : 3
    Site id : 10.10.100.254
    DSCP : af21[18]
    Service provider : mpls
    Pfr-Label : 0:21 0:0 [0x150000]
    Channel state : Initiated and open
    Channel next hop : 172.17.10.1
    RX Reachability : Reachable
    TX Reachability : Reachable
    Supports Zero-SLA : Yes
    Muted by Zero-SLA : No
    Muted by Path of Last Resort : No
    Number of Probes sent : 3385
    Number of Probes received : 3201
    Number of SMP Profile Bursts sent: 1831
    Number of Active Channel Probes sent: 183
    Number of Reachability Probes sent: 1371
    Number of Force Unreaches sent: 0
    Last Probe sent : 376 msec Ago
    Last Probe received: 116 msec ago
    Number of Data Packets sent : 737607
    Number of Data Packets received : 0
    Smart Probe in Burst: No
    Smart Probe enable Burst: Yes

On both BRs the next hop is correct, the channel is reachable in both directions, and probes are working.

NOTE: "Initiated and open" means this channel was initiated by data. "Discovered and open" implies it was created by the receipt of a smart probe.

What is Known and What Next?
What do we know so far?
1. The TC is learned but uncontrolled (UC).
2. The problem affects all TCs.
3. We have channels that are reachable to the destination site.
4. The DSCP matches our configured policy.

The next step in TC control is the PDP decision, so let's verify that using the detail keyword:

  R4-Site1-MCBR#show domain iwan master traffic-classes detail
  Dst-Site-Prefix: 10.10.100.0/24 DSCP: af21 [18] Traffic class id:59
    Clock Time: 00:03:42 (EDT) 05/15/2017
    TC Learned: 00:33:11 ago
    Present State: UN-CONTROLLED at border 10.10.1.253 (An attempt to control will be made 00:00:24 later)
    Destination Site ID bitmap: 1
    Destination Site ID: 0.0.0.0
    Class-Sequence in use: 20
    Class Name: af21 using policy User-defined
      priority 10 packet-loss-rate threshold 2.0 percent
      priority 10 byte-loss-rate threshold 2.0 percent
    BW Updated: 00:00:11 ago
    Method for choosing channel: Random
    Route Change History:
  --------------------------------------------------------------------------------
  Policy Decision Point Matrix:
  Test Status Legend:
    U - Usable, R - Reachable, L - TCA Loss, D - TCA Delay, J - TCA Jitter, B - 95% Bandwidth
  Codes: P - Passed, F - Failed, * - Present Channel, + - Backup Channel

  Exit                                       Path-Pref  NH   U  R  L  D  J  B
  --------------------------------------------------------------------------------------
  inet(0:32 0:0)/10.10.1.253/Tu20 (Ch:442)   Primary    Act  P  P  -  -  -  F
  mpls(0:21 0:0)/10.10.1.254/Tu10 (Ch:441)   Fallback   Act  P  P  -  -  -  F

The policy matches and both channels are usable and reachable, but the TC is UC because both exits fail the 95% bandwidth test: there is not enough BW!

Check the Bandwidth Available on the Tunnels
The root cause was a misconfiguration! Tunnel bandwidth is 100 Kbps by default and wasn't modified.

  R4-Site1-MCBR#show int tu10
  Tunnel10 is up, line protocol is up
    Hardware is Tunnel
    Internet address is 172.17.10.4/24
    MTU 9960 bytes, BW 100 Kbit/sec, DLY 20000 usec,
       reliability 255/255, txload 58/255, rxload 255/255
    Encapsulation TUNNEL, loopback not set

  R4-Site1-MCBR#show int tu20
  Tunnel20 is up, line protocol is up
    Hardware is Tunnel
    Internet address is 172.17.20.5/24
    MTU 9960 bytes, BW 100 Kbit/sec, DLY 20000 usec,
       reliability 255/255, txload 51/255, rxload 255/255
    Encapsulation TUNNEL, loopback not set

Another Way to See the Problem: Exit Utilization

We could have also checked the exits available on the branch MC and verified utilization. Here we see the utilization far exceeds the configured capacity!

R4-Site1-MCBR#show domain iwan master exits

  BR address: 10.10.1.254
    Name: Tunnel10 type: external Path: mpls path-id: 0 PLR TCs: 0
    Egress capacity: 100 Kbps Egress BW: 23 Kbps Ideal:10166 Kbps under: 10143 Kbps Egress Utilization: 23 %

  BR address: 10.10.1.253
    Name: Tunnel20 type: external Path: inet path-id: 0 PLR TCs: 0
    Egress capacity: 100 Kbps Egress BW: 20310 Kbps Ideal:10166 Kbps over: 10144 Kbps Egress Utilization: 20310 %

Callout on the Tunnel20 utilization: Problem!

Another Way to See the Problem: PDP Debugs

We can use PDP debugs to understand the path-selection process and why the TC remains uncontrolled.

R4-Site1-MCBR#debug domain iwan master pdp path-selection
May 14 23:31:02.002 EDT: CENT:MC:PDP:[0]:Backoff Timer timeout for TC id:59. Goto pickexit
May 14 23:31:02.002 EDT: MC-PDP:[0]:TC[59, 10.10.100.254{ipv4}, 10.10.100.0{ipv4}, 0x12, 0]: creating candidates for TC id: 59
<output omitted>
May 14 23:31:02.003 EDT: MC-PDP:[0]:TC[59, 10.10.100.254{ipv4}, 10.10.100.0{ipv4}, 0x12, 0]: check_best_of_worst:moving to Chan 442 will exceed 95% of BW
May 14 23:31:02.003 EDT: MC-PDP:[0]:TC[59, 10.10.100.254{ipv4}, 10.10.100.0{ipv4}, 0x12, 0]: (exit bw:24049)(95% exit cap:95)
May 14 23:31:02.004 EDT: MC-PDP:[0]:TC[59, 10.10.100.254{ipv4}, 10.10.100.0{ipv4}, 0x12, 0]: check_best_of_worst:moving to Chan 441 will exceed 95% of BW
May 14 23:31:02.004 EDT: MC-PDP:[0]:TC[59, 10.10.100.254{ipv4}, 10.10.100.0{ipv4}, 0x12, 0]: (exit bw:5057)(95% exit cap:95)
<output omitted>
May 14 23:31:02.004 EDT: MC-PDP:[0]:TC[59, 10.10.100.254{ipv4}, 10.10.100.0{ipv4}, 0x12, 0]: Perfect channels not available for this TC
May 14 23:31:02.004 EDT: MC-PDP:[0]:TC[59, 10.10.100.254{ipv4}, 10.10.100.0{ipv4}, 0x12, 0]: Best of the worst channels not available too
May 14 23:31:02.004 EDT: MC-PDP:[0]:TC[59, 10.10.100.254{ipv4}, 10.10.100.0{ipv4}, 0x12, 0]: no exits found. goto uncontrolled

Callouts: BW greater than capacity! (Chan 442) BW greater than capacity! (Chan 441) Unable to find an exit.
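The bandwidth test reported in the debugs can be reproduced by hand. The following is a minimal Python sketch, not PfR's actual implementation: the function names are hypothetical, and the comparison rule (an exit fails when its bandwidth exceeds 95% of capacity) is an assumption inferred from the "(exit bw:...)(95% exit cap:95)" debug lines. The numbers are copied from the outputs above.

```python
def cap_95(capacity_kbps: int) -> float:
    """95% of the configured exit capacity, in Kbps."""
    return capacity_kbps * 95 / 100


def exit_has_headroom(exit_bw_kbps: int, capacity_kbps: int) -> bool:
    """Assumed rule: an exit is a candidate only if its BW fits under the 95% cap."""
    return exit_bw_kbps <= cap_95(capacity_kbps)


# Channel id -> exit bw reported in the PDP debug; both tunnels have BW 100 Kbit/sec.
debug_exit_bw = {442: 24049, 441: 5057}
capacity_kbps = 100

usable = [ch for ch, bw in debug_exit_bw.items() if exit_has_headroom(bw, capacity_kbps)]
state = "CONTROLLED" if usable else "UNCONTROLLED"
print(cap_95(capacity_kbps), usable, state)
```

With the default 100 Kbps tunnel bandwidth neither channel has headroom, which is consistent with the "no exits found. goto uncontrolled" message.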

Correcting The Problem

After fixing the tunnel BW, TCs are now controlled!

R4-Site1-MCBR#show domain iwan master traffic-classes summary

APP - APPLICATION, TC-ID - TRAFFIC-CLASS-ID, APP-ID - APPLICATION-ID
Current-EXIT - Service-Provider(PFR-label)/Border/Interface(Channel-ID)
UC - UNCONTROLLED, PE - PICK-EXIT, CN - CONTROLLED, UK - UNKNOWN

Dst-Site-Pfx    Dst-Site-Id    State DSCP        TC-ID APP-ID APP  Current-Exit
10.10.100.0/24  10.10.100.254  CN    ef[46]      60    N/A    N/A  mpls(0:21 0:0)/10.10.1.254/Tu10(Ch:443)
10.10.100.0/24  10.10.100.254  CN    af21[18]    59    N/A    N/A  inet(0:32 0:0)/10.10.1.253/Tu20(Ch:442)
10.10.200.0/24  10.10.100.254  CN    ef[46]      62    N/A    N/A  mpls(0:21 0:0)/10.10.1.254/Tu10(Ch:443)
10.10.200.0/24  10.10.100.254  CN    af21[18]    58    N/A    N/A  inet(0:32 0:0)/10.10.1.253/Tu20(Ch:442)
10.10.200.0/24  10.10.100.254  CN    default[0]  61    N/A    N/A  inet(0:32 0:0)/10.10.1.253/Tu20(Ch:445)
Total Traffic Classes: 5 Site: 5 Internet: 0

TCs are now controlled on both paths!

Learning a TC: Summary

Minimum requirements must be met before PfR can control traffic:
  The destination must be reachable over the WAN interface.
  PMI cache entries are exported to the MC for the flow.
  The policy and site-prefix on the MC must match the TC for it to be controlled.
  Exits must have sufficient BW for the TC.
  Channels must be reachable to the destination site-id.

PDP on the MC will choose the exit based on configured preference, BW and performance of the channel.

Traffic-Classes Gone Bad

Interpreting Traffic-Class Performance

To understand how traffic has been or is currently being optimized, it is necessary to read the traffic-class output. Within a traffic-class, there are several key things to note related to optimization:
1. Is the TC controlled?
2. Is the TC in policy?
3. What is the primary path/channel?
4. What class name/policy does the TC fall under?
5. When was the bandwidth last updated?
6. Have there been any path changes (recent or otherwise)?

show domain <domain> master traffic-classes [summary] provides the details of all active traffic-classes. Filtering options are available as well, which we will explore in later examples.

Viewing Programmed Ingress PMI Details

[Diagram labels: EIGRP SAF, Branch BR, PMI]

Dallas_Branch_MC#show domain iwan master policy

  class VOICE sequence 10
    path-preference fallback INET
    class type: Dscp Based
      match dscp ef policy voice
        priority 2 packet-loss-rate threshold 1.0 percent
        priority 1 one-way-delay threshold 150 msec
        priority 3 jitter threshold 30000 usec
        priority 2 byte-loss-rate threshold 1.0 percent

  class BULK-DATA sequence 20
    path-preference fallback INET
    class type: Dscp Based
      match dscp af11 policy bulk-data
        priority 2 packet-loss-rate threshold 5.0 percent
        priority 1 one-way-delay threshold 300 msec
        priority 2 byte-loss-rate threshold 5.0 percent
      match dscp af12 policy bulk-data
        priority 2 packet-loss-rate threshold 5.0 percent
        priority 1 one-way-delay threshold 300 msec
        priority 2 byte-loss-rate threshold 5.0 percent
        Number of Traffic classes using this policy: 1
<OUTPUT OMITTED>

Viewing Programmed Ingress PMI Details

[Diagram labels: EIGRP SAF, PMI, Branch BR]

Asheville_Branch_MC#show domain iwan master policy

  class VOICE sequence 10
    path-preference fallback INET
    class type: Dscp Based
      match dscp ef policy voice
        priority 2 packet-loss-rate threshold 1.0 percent
        priority 1 one-way-delay threshold 150 msec
        priority 3 jitter threshold 30000 usec
        priority 2 byte-loss-rate threshold 1.0 percent

  class BULK-DATA sequence 20
    path-preference fallback INET
    class type: Dscp Based
      match dscp af11 policy bulk-data
        priority 2 packet-loss-rate threshold 5.0 percent
        priority 1 one-way-delay threshold 300 msec
        priority 2 byte-loss-rate threshold 5.0 percent
      match dscp af12 policy bulk-data
        priority 2 packet-loss-rate threshold 5.0 percent
        priority 1 one-way-delay threshold 300 msec
        priority 2 byte-loss-rate threshold 5.0 percent
        Number of Traffic classes using this policy: 1
<OUTPUT OMITTED>

Asheville_Branch_BR#show domain iwan border pmi

Ingress policy CENT-Policy-Ingress-0-6:
Ingress policy activated on:
  Tunnel20 Tunnel10
-------------------------------------------------------------------------
PMI[Ingress-per-DSCP]-FLOW MONITOR[MON-Ingress-per-DSCP-0-48-6]
  monitor-interval:30
  key-list:
    pfr site source id ipv4
    pfr site destination id ipv4
    ip dscp
    interface input
    policy performance-monitor classification hierarchy
    pfr label identifier
  Non-key-list:
    transport packets lost rate
    transport bytes lost rate
    pfr one-way-delay
    network delay average
    transport rtp jitter inter arrival mean
    counter bytes long
    counter packets long
    timestamp absolute monitoring-interval start
<OUTPUT OMITTED>

Viewing Programmed Ingress PMI Details

Asheville_Branch_BR#show domain iwan border pmi
<OUTPUT OMITTED>
  DSCP-list:
    ef-[class:cent-class-ingress-dscp-ef-0-18]
      packet-loss-rate:react_id[60]-priority[2]-threshold[1.0 percent]
      one-way-delay:react_id[61]-priority[1]-threshold[150 msec]
      network-delay-avg:react_id[62]-priority[1]-threshold[300 msec]
      jitter:react_id[63]-priority[3]-threshold[30000 usec]
      byte-loss-rate:react_id[64]-priority[2]-threshold[1.0 percent]
    af11-[class:cent-class-ingress-dscp-af11-0-19]
      packet-loss-rate:react_id[65]-priority[2]-threshold[5.0 percent]
      one-way-delay:react_id[66]-priority[1]-threshold[300 msec]
      network-delay-avg:react_id[67]-priority[1]-threshold[600 msec]
      byte-loss-rate:react_id[68]-priority[2]-threshold[5.0 percent]
<OUTPUT OMITTED>

The policy parameters for each DSCP (as listed by show domain iwan master policy on the MC) are matched to the PMI that is pushed to the BR and monitored per attached tunnel interface.

Reviewing Traffic-Class Output

ISR4K_MC_Branch2# show domain iwan master traffic-classes

 Dst-Site-Prefix: 10.10.100.0/28 DSCP: af12 [12] Traffic class id:19
   Clock Time: 17:57:07 (UTC) 03/07/2017
   TC Learned: 00:03:06 ago
   Present State: CONTROLLED
   Current Performance Status: in-policy
   Current Service Provider: since 00:02:35
   Previous Service Provider: Unknown
   BW Used: 0 Kbps
   Present WAN interface: Tunnel10 in Border 10.10.3.254
   Present Channel (primary): 58 pfr-label:0:1 0:0 [0x10000]
   Backup Channel: 59 INET pfr-label:0:2 0:0 [0x20000]
   Destination Site ID bitmap: 1
   Destination Site ID: 10.10.100.254 (Active)
   Class-Sequence in use: 20
   Class Name: BULK-DATA using policy bulk-data
   BW Updated: 00:01:36 ago
   Reason for Latest Route Change: Uncontrolled to Controlled Transition
   Route Change History:
        Date and Time             Previous Exit                        Current Exit                          Reason
     1: 17:54:31 (UTC) 03/07/17   None(0:0 0:0)/0.0.0.0/None (Ch:0)    (0:1 0:0)/10.10.3.254/Tu10 (Ch:58)    Uncontrolled to Controlled Transition
 --------------------------------------------------------------------------------
 Total Traffic Classes: 1 Site: 1 Internet: 0

Callouts:
  TC is CONTROLLED by PfR.
  Latest Performance Monitor reading indicated TC is in policy.
  Active channel is 58.
  Current path is Tunnel10.
  AF12 TC, matched by the BULK-DATA class defined in the hub MC policy.
  No path changes have been reported by PfR.

Help! Our Traffic-Class is Out Of Policy!

[Diagram: Hub DC and Branch connected over two paths, one labeled INET, with the following speech bubbles]
  TC is out of policy on this path!
  I need to move this TC to INET!
  Traffic is back in policy!

This notification is known as a TCA.

What is a TCA?

Threshold Crossing Alerts (TCAs) are used by PfR to notify remote, and sometimes local, Master Controllers when a policy has been violated or unreachability has been detected.

These are generated by a Border Router and sent to the remote site-id from which the policy violation or unreachability occurred. The generated TCA is also flooded out via all other BRs at the site on their paths to try to ensure receipt at the remote site MC.

Other local BRs also generate an On-Demand Export (ODE) to provide locally-terminated path measurements to the remote end so that the remote site MC can make an informed decision about where to place the traffic.

TCAs and ODEs are recorded per channel and statistics are visible on the local Master Controller channels (show domain <domain> master channels).

How are TCAs Generated?

The ingress BR that detected the condition generates two copies of the TCA:
1. One copy is sent over the locally terminated path(s) destined to the remote MC on UDP port 9997.
2. The second copy is flooded to other BRs at the site on UDP port 9996 to be sent over their locally terminated paths. All BRs are listening on this port.

This process is to ensure that if the path where the violation was detected is degraded, the remote site has a better chance to receive the TCA to make a path change.

The TCAs are always generated by the BRs with the local MC site-id destined to the remote MC site-id based on the TC parameters.

When a BR generates a TCA, it contains all necessary performance information for that DSCP on that path. However, because the TCA is specific to that path, other BRs must generate an On-Demand Export for their local paths.

On-Demand Export (ODE)

On-Demand Exports (ODEs) are packets generated by a BR that provide performance information for each path terminated on the BR. These are generated upon the receipt of a TCA from another BR in the same site.

ODEs provide a remote MC the information necessary to make an intelligent decision about how traffic should be optimized and the path that should be chosen in accordance with the defined path preference.

The MC maintains two ODE buckets on each active channel that store the last two exports received. If no ODEs have been received, these buckets will be empty.

Single-Router Branch Topology

[Diagram: a single branch router (10.10.3.254) with two tunnels, Tunnel10 and Tunnel20 (INET)]

TCA Generation Visualized

[Diagram: TCA-1 and ODE packets traveling over Tunnel10 and Tunnel20 (INET) toward the branch (10.10.3.254)]

  The unreachable timer expired!
  I received a TCA on UDP port 9996!
  Same TCA received on both paths.

At this point, the branch MC has the information necessary to make a decision about where to place this traffic.

Traffic-Class Timers

Primary Path Evaluation Timer
When a path change occurs, a 3-minute evaluation timer is started. Assuming there are no TCAs on the new active path during this time, the primary path is not considered as a candidate to move traffic back to until this timer expires.

Path Change Hold-Down Timer
When a path change occurs, a 90-second hold-down timer is started on the TC. During these 90 seconds, which constitute the first half of the Primary Path Evaluation Timer, any TCA due to an out-of-policy (OOP) condition on the new active path is withheld until the 90 seconds expire. This avoids flapping of the TC and serves as a dampening measure. If an unreachable TCA is received on the new active path during this 90-second interval, it is processed immediately and the TC is moved either (a) to the primary path or (b) to an alternate secondary path if its path measurement is better.

Secondary Path Change Hold-Down Timer
Constitutes the last 90 seconds of the Primary Path Evaluation Timer. If an OOP TCA is received on the current active path (the backup path for path-preference), the TC moves into a best-of-worst evaluation. Otherwise, once this timer expires, the full 3 minutes are up and the primary path can be considered as a candidate for this traffic again.
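The three timers can be read as one 180-second window split into two 90-second halves. The following Python sketch restates that timeline as a decision function. It is an illustration of the slide text only: the function names and return labels are hypothetical, and the behavior after the window ends ("normal-evaluation") and for unreachable TCAs outside the first 90 seconds is an assumption.

```python
HOLD_DOWN_S = 90        # Path Change Hold-Down Timer
PRIMARY_EVAL_S = 180    # Primary Path Evaluation Timer (3 minutes)


def primary_path_eligible(elapsed_s: float) -> bool:
    """The primary path becomes a candidate again only after the full 3 minutes."""
    return elapsed_s >= PRIMARY_EVAL_S


def tca_action(elapsed_s: float, kind: str) -> str:
    """Map a TCA received elapsed_s seconds after a path change to an action.

    kind is "unreachable" or "oop" (out-of-policy).
    """
    if kind == "unreachable":
        return "process-now"          # unreachability is never dampened
    if kind != "oop":
        raise ValueError(f"unknown TCA kind: {kind}")
    if elapsed_s < HOLD_DOWN_S:
        return "hold"                 # withheld until the 90 seconds expire
    if elapsed_s < PRIMARY_EVAL_S:
        return "best-of-worst"        # secondary hold-down window
    return "normal-evaluation"        # assumption: timers no longer apply


for t in (10, 100, 200):
    print(t, tca_action(t, "oop"), primary_path_eligible(t))
```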

Packet Loss vs. Byte Loss

Packet Loss is measured by Performance Monitor by checking the sequence number in RTP frames. Because Smart Probes are RTP frames, we can use these to measure packet loss in the absence of other RTP traffic.

Byte Loss is computed by Performance Monitor only through TCP traffic. The sequence number is checked relative to the previous frames.

In order to consider a packet as lost, it must fall outside of a three-packet window. Even if a packet is received out-of-order but is within this window, Performance Monitor considers the stream as recovered and doesn't count this against the metrics of the stream.

[Figure: arrival order Packet 1, Packet 3, Packet 4, Packet 5, Packet 2, Packet 6. Caption: Packet Received in Expected Window]
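The window rule can be illustrated with a short Python sketch. This is not Performance Monitor's algorithm: the function name is hypothetical, and the interpretation (a packet is lost if it never arrives, or if more than three higher-numbered packets arrive before it) is an assumption chosen so that the arrival order in the figure counts as recovered. Sequence-number wraparound and duplicates are ignored.

```python
def lost_sequences(arrivals: list[int], window: int = 3) -> list[int]:
    """Return sequence numbers treated as lost under a reorder window.

    A sequence number is lost if it is missing from the arrivals, or if
    more than `window` higher-numbered packets arrived ahead of it.
    Gaps at the very end of the capture cannot be detected this way.
    """
    if not arrivals:
        return []
    position = {seq: index for index, seq in enumerate(arrivals)}
    lost = []
    for seq in range(min(arrivals), max(arrivals) + 1):
        if seq not in position:
            lost.append(seq)
            continue
        overtaken_by = sum(1 for earlier in arrivals[:position[seq]] if earlier > seq)
        if overtaken_by > window:
            lost.append(seq)
    return lost


print(lost_sequences([1, 3, 4, 5, 2, 6]))      # arrival order from the figure
print(lost_sequences([1, 3, 4, 5, 6, 2, 7]))   # packet 2 arrives one packet later
```

In the first call packet 2 is overtaken by exactly three packets and is treated as recovered; in the second it is overtaken by four and is counted as lost.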

Analyzing RTP Traffic in Wireshark

An RTP stream consists of three key fields:
1. Synchronization Source Identifier (SSRC): uniquely identifies the stream
2. Sequence Number: per-stream value used to measure packet loss
3. Timestamp: used for measuring jitter
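For readers who want to see where these fields sit on the wire, here is a small Python sketch that unpacks the 12-byte fixed RTP header defined in RFC 3550. The function name and the sample packet are made up for illustration; header extensions and CSRC entries are not handled.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,           # top two bits of the first byte
        "payload_type": byte1 & 0x7F,    # low seven bits of the second byte
        "sequence": sequence,            # used to measure packet loss
        "timestamp": timestamp,          # used for measuring jitter
        "ssrc": ssrc,                    # uniquely identifies the stream
    }


# Hypothetical header: version 2, payload type 0, seq 4660, ts 160, SSRC 0xDEADBEEF.
sample = struct.pack("!BBHII", 0x80, 0, 4660, 160, 0xDEADBEEF)
print(parse_rtp_header(sample))
```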

Analyzing RTP Traffic in Wireshark

Wireshark provides built-in capability to analyze RTP streams to display jitter and loss holistically on a flow as well as jitter per packet.

To analyze an RTP stream, select Telephony > RTP > Stream Analysis...

Analyzing RTP Traffic in Wireshark

[Screenshot of the RTP Stream Analysis window. Callouts: Average Jitter, Direction of Measurement, Packet Loss, SSRC of Flow]

Investigating Traffic-Class Byte Loss

[Diagram labels: Tunnel10, Tunnel20 (INET), (10.10.3.254)]

ISR4K_MC_Branch2#show domain iwan master traffic-classes

 Dst-Site-Prefix: 10.10.100.0/28 DSCP: af12 [12] Traffic class id:19
   Clock Time: 17:57:07 (UTC) 03/07/2017
   TC Learned: 00:03:06 ago
   Present State: CONTROLLED
   Current Performance Status: in-policy
   Current Service Provider: since 00:02:35
   Previous Service Provider: Unknown
   BW Used: 157 Kbps
   Present WAN interface: Tunnel10 in Border 10.10.3.254
   Present Channel (primary): 58 pfr-label:0:1 0:0 [0x10000]
   Backup Channel: 59 INET pfr-label:0:2 0:0 [0x20000]
   Destination Site ID bitmap: 1
   Destination Site ID: 10.10.100.254 (Active)
   Class-Sequence in use: 20
   Class Name: BULK-DATA using policy bulk-data
   BW Updated: 00:01:36 ago
   Reason for Latest Route Change: Uncontrolled to Controlled Transition
   Route Change History:
        Date and Time             Previous Exit                        Current Exit                          Reason
     1: 17:54:31 (UTC) 03/07/17   None(0:0 0:0)/0.0.0.0/None (Ch:0)    (0:1 0:0)/10.10.3.254/Tu10 (Ch:58)    Uncontrolled to Controlled Transition
 --------------------------------------------------------------------------------
 Total Traffic Classes: 1 Site: 1 Internet: 0

Callouts:
  TC learned 3 minutes ago.
  In policy and on the correct path.
  No path changes have been observed since the TC was learned.

Investigating Traffic-Class Byte Loss

*Mar 7 18:01:30.383: %DOMAIN-5-TCA: TCA Received. Details: Instance id=0: VRF=default: Source Site ID=10.10.3.254: Destination Site ID=10.10.100.254: TCA-ID=203260: TCA-Origin=10.10.100.254(R): Exit=[CHAN-ID=58, BR-IP=10.10.3.254, DSCP=af12[12], Interface=Tunnel10, Path=[label=0:1 0:0 [0x10000]]]: Policy Violated=BULK-DATA: thresholds(actual/config)=p2=byte-loss-rate(13.79/5.0)]

Callouts:
  TCA Received!
  TC affected is destined to the hub.
  TCA was originated at the remote end.
  Occurred on the Tunnel10 path.
  DSCP affected was AF12.
  Byte loss threshold was violated.

How did PfR respond? Let's verify!
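When many TCAs are logged, the fields called out above can be extracted with a few regular expressions. The Python sketch below parses the sample message from this slide; the function name and the returned dictionary keys are hypothetical, and the patterns are written against this one sample rather than a documented syslog grammar.

```python
import re

# Sample message copied from the slide.
TCA_LINE = (
    "*Mar 7 18:01:30.383: %DOMAIN-5-TCA: TCA Received. Details: Instance id=0: "
    "VRF=default: Source Site ID=10.10.3.254: Destination Site ID=10.10.100.254: "
    "TCA-ID=203260: TCA-Origin=10.10.100.254(R): Exit=[CHAN-ID=58, "
    "BR-IP=10.10.3.254, DSCP=af12[12], Interface=Tunnel10, "
    "Path=[label=0:1 0:0 [0x10000]]]: Policy Violated=BULK-DATA: "
    "thresholds(actual/config)=p2=byte-loss-rate(13.79/5.0)]"
)


def parse_tca(line: str) -> dict:
    """Pull the channel, interface, DSCP, policy and violated metric from a TCA log."""
    def grab(pattern: str):
        match = re.search(pattern, line)
        return match.group(1) if match else None

    # metric-name(actual/config), e.g. byte-loss-rate(13.79/5.0)
    metric = re.search(r"([\w-]+)\(([\d.]+)/([\d.]+)\)", line)
    return {
        "channel": grab(r"CHAN-ID=(\d+)"),
        "interface": grab(r"Interface=(\w+)"),
        "dscp": grab(r"DSCP=(\w+)\["),
        "policy": grab(r"Policy Violated=([\w-]+)"),
        "metric": metric.group(1) if metric else None,
        "actual": float(metric.group(2)) if metric else None,
        "configured": float(metric.group(3)) if metric else None,
    }


tca = parse_tca(TCA_LINE)
print(tca)
```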

Investigating Traffic-Class Byte Loss

[Diagram labels: Hub BR, Tunnel10, Tunnel20 (INET), Branch MC/BR (10.10.3.254)]

We know from the TCA the reason was loss. The path was changed as a response to the TCA!

*Mar 7 18:01:33.059: %DOMAIN-5-TC_PATH_CHG: Traffic class Path Changed. Details: Instance=0: VRF=default: Source Site ID=10.10.3.254: Destination Site ID=10.10.100.254: Reason=Loss: TCA-ID=203261: Policy Violated=BULK-DATA: TC=[Site id=10.10.100.254, TC ID=19, Site prefix=10.10.100.0/28, DSCP=af12(12), App ID=0]: Original Exit=[CHAN-ID=58, BR-IP=10.10.3.254, DSCP=af12[12], Interface=Tunnel10, Path=[label=0:1 0:0 [0x10000]]]: New Exit=[CHAN-ID=59, BR-IP=10.10.3.254, DSCP=af12[12], Interface=Tunnel20, Path=INET[label=0:2 0:0 [0x20000]]]

Callouts:
  The site-prefix and DSCP details are provided!
  The original path was Tunnel10.
  The TC has now been placed on the INET path!

Investigating Traffic-Class Byte Loss

ISR4K_MC_Branch2#show domain iwan master traffic-classes

 Dst-Site-Prefix: 10.10.100.0/28 DSCP: af12 [12] Traffic class id:19
   Clock Time: 18:01:39 (UTC) 03/07/2017
   TC Learned: 00:07:38 ago
   Present State: CONTROLLED
   Current Performance Status: in-policy
   Current Service Provider: INET since 00:00:06 (hold until 83 sec)
   Previous Service Provider: pfr-label: 0:1 0:0 [0x10000] for 421 sec
   (A fallback provider. Primary provider will be re-evaluated 00:02:55 later)
   BW Used: 0 Kbps
   Present WAN interface: Tunnel20 in Border 10.10.3.254
   Present Channel (primary): 59 INET pfr-label:0:2 0:0 [0x20000]
   Backup Channel: 58 pfr-label:0:1 0:0 [0x10000]
   Destination Site ID bitmap: 1
   Destination Site ID: 10.10.100.254 (Active)
   Class-Sequence in use: 20
   Class Name: BULK-DATA using policy bulk-data
   BW Updated: 00:00:38 ago
   Reason for Latest Route Change: Loss
   Route Change History:
        Date and Time             Previous Exit                         Current Exit                              Reason
     1: 18:01:33 (UTC) 03/07/17   (0:1 0:0)/10.10.3.254/Tu10 (Ch:58)    INET(0:2 0:0)/10.10.3.254/Tu20 (Ch:59)    Out-of-Policy (Loss Rate Bytes: 13.79%)
     2: 17:54:31 (UTC) 03/07/17   None(0:0 0:0)/0.0.0.0/None (Ch:0)     (0:1 0:0)/10.10.3.254/Tu10 (Ch:58)        Uncontrolled to Controlled Transition
 --------------------------------------------------------------------------------
 Total Traffic Classes: 1 Site: 1 Internet: 0

Callouts:
  TC has now moved to the INET path. Hold-down timer of 90 seconds is active.
  PfR notes that this is a fallback provider. The primary provider re-evaluation will occur 3 minutes after the TC changes paths.
  Path change is noted with the time, previous and current exit, and the reason, which matches the byte-loss we saw in the TCA.

Investigating Traffic-Class Byte Loss

ISR4K_MC_Branch2#show domain iwan master channels dst-site-id 10.10.100.254

 Channel Id: 58 Dst Site-Id: 10.10.100.254 Link Name: DSCP: af12 [12] pfr-label: 0:1 0:0 [0x10000] TCs: 1 BackupTCs: 0
 <OUTPUT OMITTED>
   ODE Statistics:
     Received: 1
     ODE Stats Bucket Number: 1
       Last Updated  : 00:03:14 ago
       Packet Count  : 5452
       Byte Count    : 228545
       One Way Delay : 0 msec*
       Loss Rate Pkts: 0.0 %
       Loss Rate Byte: 13.79 %
       Jitter Mean   : 40 usec
       Unreachable   : FALSE
   TCA Statistics:
     Received: 1 ; Processed: 1 ; Unreach_rcvd: 0 ; Local Unreach_rcvd: 0
     TCA lost byte rate: 1
     TCA lost packet rate: 0
     TCA one-way-delay: 0
     TCA network-delay: 0
     TCA jitter mean: 0
   Latest TCA Bucket
     Last Updated  : 00:03:14 ago
     One Way Delay : NA
     Loss Rate Pkts: NA
     Loss Rate Byte: 13.79 %
     Jitter Mean   : NA
     Unreachability: FALSE

Callouts:
  The reported byte loss in the ODE bucket matches what was reported in the TCA and TC path history.
  One byte loss TCA has been received and processed locally by the branch MC.
  Latest TCA bucket is reported.
  Note that in the TCA bucket, only the byte loss value is recorded. Because the remote site (hub) only saw a violation of the byte loss threshold, it has no need to report on the remaining values and sends only the violated byte loss threshold in the ODE export.

What Information is Missing?

We know the TC that was affected/moved (10.10.100.0/28, DSCP AF12), but we don't know specifically which monitored flow experienced the loss.

Performance Monitor aggregates statistics per traffic-class when exporting to PfR. In larger TCs (with a smaller subnet mask), there may be multiple active flows which are being monitored simultaneously.

We need a way to isolate the flow(s) in question to narrow our troubleshooting. A couple of methods exist, including the use of AVC and EPC. It's also possible to use EEM in concert with these tools to script the collection of data if the loss occurs at random or during off-hours.

Let's see how to use these!

Measuring Performance with AVC

Match the correct DSCP value in the TCP flow and optionally the site-prefix of the remote TC affected:

  ip access-list extended MMA_ACL
   permit tcp any any dscp <DSCP>
  !

Capture the expected vs. actual bytes/packets received:

  flow record type performance-monitor MMA_TCP
   match ipv4 protocol
   match ipv4 source address
   match ipv4 destination address
   match transport source-port
   match transport destination-port
   collect counter packets
   collect ipv4 dscp
   collect counter bytes
   collect transport packets expected counter
   collect transport bytes expected
   collect transport bytes lost
   collect transport packets lost counter
   collect transport packets lost rate
   collect application name
   collect interface input
   collect interface output
  !

Configure the flow monitor in the AVC service-policy:

  flow monitor type performance-monitor MMA_MON
   record MMA_TCP
  class-map match-any MMA_TCP
   match access-group name MMA_ACL
  policy-map type performance-monitor MMA_TCP
   class MMA_TCP
    flow monitor MMA_MON

Apply the policy to the tunnel:

  interface <Tunnel>
   service-policy type performance-monitor input MMA_TCP

If AVC is already configured on the tunnel, the non-key fields for byte/packet count collection can be added and the existing policy leveraged.

Measuring Performance with AVC - Output

Asheville_Branch#show performance monitor cache monitor MMA_MON detail format csv
Monitor: MMA_MON
Data Collection Monitor:
  Cache type: Synchronized (Platform cache)
  Cache size: 10000
  Current entries: 2
  High Watermark: 21
  Flows added: 30
  Flows aged: 28
  Synchronized timeout (secs): 30

IPV4 SRC ADDR,IPV4 DST ADDR,TRNS SRC PORT,TRNS DST PORT,IP PROT,intf input,intf output,bytes,pkts,ip dscp,app name,trns cnt pkts expect,trns cnt pkts lost,trns bytes lost,trns bytes expected
10.10.3.45,10.10.100.12,11826,80,6,Gi0/0/0,Tu10,5170225,3706,0x22,port http,0,0,0,5021985
10.10.3.67,10.10.100.13,15746,80,6,Gi0/0/0,Tu10,2441,6,0x22,port http,0,0,245,32555
10.10.3.108,10.10.100.12,16554,80,6,Tu10,Gi0/0/2,2441,6,0x22,port http,0,0,0,2197
<OUTPUT OMITTED>

To filter out non-loss flows with this policy, | exclude 0,0,0 can be added to the end of the command. This is useful in situations with a lot of unique traffic flows. Other formatting options are available as well.

The performance monitor measured 245 bytes lost in the second flow (10.10.3.67 to 10.10.100.13)! With a TC of 10.10.100.0/24, we've now narrowed this down to a source/destination IP pair to troubleshoot.
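If the CSV output is collected off-box, the same filtering can be done offline. The Python sketch below is one possible approach using only the standard library: it reads the three sample rows from this slide and keeps the flows that report lost packets or bytes. The function name is hypothetical; the column names are taken from the CSV header shown above.

```python
import csv
import io

# Header and rows copied from the 'format csv' output on the slide.
CACHE_CSV = """\
IPV4 SRC ADDR,IPV4 DST ADDR,TRNS SRC PORT,TRNS DST PORT,IP PROT,intf input,intf output,bytes,pkts,ip dscp,app name,trns cnt pkts expect,trns cnt pkts lost,trns bytes lost,trns bytes expected
10.10.3.45,10.10.100.12,11826,80,6,Gi0/0/0,Tu10,5170225,3706,0x22,port http,0,0,0,5021985
10.10.3.67,10.10.100.13,15746,80,6,Gi0/0/0,Tu10,2441,6,0x22,port http,0,0,245,32555
10.10.3.108,10.10.100.12,16554,80,6,Tu10,Gi0/0/2,2441,6,0x22,port http,0,0,0,2197
"""


def flows_with_loss(csv_text: str) -> list[tuple[str, str, int]]:
    """Return (source, destination, bytes lost) for every flow reporting loss."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["IPV4 SRC ADDR"], row["IPV4 DST ADDR"], int(row["trns bytes lost"]))
        for row in rows
        if int(row["trns cnt pkts lost"]) > 0 or int(row["trns bytes lost"]) > 0
    ]


print(flows_with_loss(CACHE_CSV))
```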

Measuring Performance with EPC

Embedded Packet Capture (EPC) is a tool built into IOS/IOS-XE that allows capturing traffic inline on the device in the CEF and process-switched path.

If you already have a source/destination IP pair experiencing problems, this method can be used initially. Additionally, this can be paired with AVC once a flow is identified to validate the amount of loss in the flow and provide direction on a potential source.

IOS-XE:
  monitor capture LOSS interface Tu10 both match ipv4 protocol tcp any any buffer size 2 circular
  monitor capture LOSS start
  monitor capture LOSS stop
  monitor capture LOSS export bootflash:tcp_loss.pcap

IOS:
  monitor capture buffer BUF size 1024 max-size 1500 linear
  monitor capture buffer BUF filter access-list LOSS
  monitor capture point ip cef CAP Tu10 both
  monitor capture point associate CAP BUF
  monitor capture point start all
  monitor capture point stop all
  monitor capture buffer BUF export flash:tcp_loss.pcap

An inline filter or ACL can be specified in IOS-XE.

Embedded Packet Capture Tools: Key Advantages and Benefits

  Exec-level commands to start and stop the capture, define buffer size, buffer type (linear or circular) and packet size to capture
  Facility to export the packet capture in PCAP format suitable for analysis
  Useful when it is not possible to tap into the network using a stand-alone packet-sniffing tool, or when the need arises to remotely debug and troubleshoot issues
  Capture rate can be throttled using further administrative controls. For example, use an Access Control List (ACL), specify a maximum packet capture rate, or specify a sampling interval
  Show commands to display packet contents on the device itself

Embedded Packet Capture (EPC) Configuration Steps

[Diagram labels: Router with capture points on Gi0/0/1 and Gi0/0/2, Capture Buffer, Export Data, TFTP Server]

Steps to Configure:
1. Define capture buffer
2. Define capture point
3. Associate capture buffer and point (depends on the platform and OS version)
4. Capture data
5. Export/display captured data

IOS-XE 15.3(2)S / 3.9(0)S release:
  Router# monitor capture MYCAP buffer circular packets 10000
  Router# monitor capture MYCAP buffer size 10
  Router# monitor capture MYCAP interface Gig0/0/1 in
  Router# monitor capture MYCAP access-list MYACL
  Router# monitor capture MYCAP start
  Router# monitor capture MYCAP stop
  Router# monitor capture MYCAP export bootflash:epc1.pcap

IOS versions on ISR platforms:
  Router# mon cap buffer MYBUF size 256 max-size 256 circular
  Router# mon cap buffer MYBUF filter access-list MYACL
  Router# mon cap point ip cef IPCEFCAP Gig0/0/1 both
  Router# mon cap point associate IPCEFCAP MYBUF
  Router# mon cap point start IPCEFCAP
  Router# mon cap point stop IPCEFCAP
  Router# mon cap buffer MYBUF export tftp://1.1.1.1/epc1.pcap

Embedded Packet Capture (EPC): Analyzing the Traffic on the Device

ASR# show monitor capture CAP parameter
   monitor capture CAP interface Gig0/0/2 both
   monitor capture CAP access-list test
   monitor capture CAP buffer size 10
   monitor capture CAP limit pps 1000

ASR# show mon cap CAP buffer
 buffer size (KB) : 10240
 buffer used (KB) : 128
 packets in buf   : 5
 packets dropped  : 0
 packets per sec  : 1

"packets in buf" indicates the total number of packets in the capture buffer.

ASR# show monitor capture CAP buffer ?
  brief     brief display
  detailed  detailed display
  dump      for dump
  |         Output modifiers
  <cr>

The brief option provides basic information about the traffic, such as source/destination IP address, protocol type and packet length.

ASR# show monitor capture CAP buffer brief
 -------------------------------------------------------------------
  #   size   timestamp     source             destination   protocol
 -------------------------------------------------------------------
   0  114    0.000000   10.254.0.2       ->  100.100.100.1    ICMP
   1  114    0.000992   10.254.0.2       ->  100.100.100.1    ICMP
   2  114    2.000992   10.254.0.2       ->  100.100.100.1    ICMP

Embedded Packet Capture (EPC): Analyzing the Traffic on the Device

The detail option provides the result of both the brief and dump options.

ASR# show monitor capture CAP buffer detail
 -------------------------------------------------------------
  #   size   timestamp     source             destination   protocol
 -------------------------------------------------------------
   0  114    0.000000   10.254.0.2       ->  100.100.100.1    ICMP
  0000: 0014A8FF A4020008 E3FFFC28 08004500   ...(..E.
  0010: 00649314 0000FF01 551F0AFE 00026464   .d...U...dd
  0020: 64010800 DF8F0012 00000000 000029E8   d...).
  0030: 74C0ABCD ABCDABCD ABCDABCD ABCDABCD   t...

Callouts on the hex dump: Destination MAC, Source MAC, Source IP, Destination IP

Save the capture in standard PCAP format:

ASR# monitor capture CAP export bootflash:my_capture.pcap
Exported Successfully

EPC Control-Plane Traffic Configuration Steps

IOS-XE 15.3(2)S / 3.9(0)S release:
  Router# monitor capture CPCAP buffer circular packets 10000
  Router# monitor capture CPCAP buffer size 10
  Router# monitor capture CPCAP control-plane both
  Router# monitor capture CPCAP access-list MYACL
  Router# monitor capture CPCAP start
  Router# monitor capture CPCAP stop
  Router# monitor capture CPCAP export bootflash:epc1.pcap

IOS versions on ISR platforms:
  Router# mon cap buffer CPBUF size 256 max-size 256 circular
  Router# mon cap buffer CPBUF filter access-list MYACL
  Router# mon cap point ip process-switched CPCAP both
  Router# mon cap point associate CPCAP CPBUF
  Router# mon cap point start CPCAP
  Router# mon cap point stop CPCAP
  Router# mon cap buffer CPBUF export tftp://1.1.1.1/epc1.pcap

Note: from-us can be specified to capture locally-originated traffic.

Debugging Byte Loss Using FME

Flow Metric Engine (FME) is a platform-specific feature in IOS-XE that provides more granularity in reporting. High-level TCP forwarding statistics can be seen with show platform hardware qfp active feature fme datapath process counters. This can be used to quickly spot well-known TCP issues, including out-of-order and retransmitted packets.

  ISR4K_Branch1#show platform hardware qfp active feature fme datapath process counters
   RTP Packets                   : 2427784
   UDP Packets                   : 1011113663
   TCP Packets                   : 6571156
   Out-of-order TCP Packets      : 12614
   Retransmitted TCP Packets     : 6269
   RTT is zero                   : 0
   RTP validated                 : 23
   RTP revalidated               : 148
   Out-of-order RTP Packets      : 0
   FT failures                   : 685
   RTP metric retrieval failures : 0

Debugging Byte Loss Using FME

FME can also be used at a platform debugging level to identify a source flow that is exhibiting byte loss and to provide more statistics about the flow.

1. Create an ACL matching traffic to and from the affected subnet for the TC exhibiting byte loss.
2. Configure FME debugs, specifying the ACL as a condition to limit debugging to traffic to/from this subnet and the Tunnel.
3. Capture CPP tracelogs to identify the source/destination IP of the flow that is affected.

Debugging Byte Loss Using FME

  ip access-list extended TCP-LOSS
   permit ip <subnet> any
   permit ip any <subnet>

  debug platform condition interface <Tunnel interface> ipv4 access-list TCP-LOSS both
  debug platform condition feature fme dataplane submode tcp packet level verbose
  debug platform condition start
  debug platform condition stop

The timestamp in the tracelog file name corresponds to when the debug was run.

  ASR_Hub_BR#show bootflash:...
  -#- --length-- ---------date/time--------- path
  11  8192  May 12 2017 09:28:27.0000000000 +00:00 /bootflash/tracelogs/cpp_cp_f0-0.log.30961.20170512125152

Debugging Byte Loss Using FME

Slide callouts: Source IP and Port; Dest. IP and Port; TCP, RTP, Jitter.

  04/01 12:52:28.974 [cpp-dp-fme]: [30961]: (info): QFP:0.0 Thread:239 TS:00001790455407661908 :pkt:[10.13.0.41] 443 => [172.16.42.122] 52816 6 (0): MMA params 0x504f86e0 bitmap 0xf000000000030:0x0

  04/01 12:52:28.974 [cpp-dp-fme]: [30961]: (info): QFP:0.0 Thread:239 TS:00001790455407771650 :tcp:[10.13.0.41] 443 => [172.16.42.122] 52816 6 (0): tcp flags 0x10, has_syn=0,is_first=0, app_resolved=1,seq_num=2331908863, *expected_offset=2331908863, stored offset=2331910223,oop_cnt=3, bytes_loss=1207610, bytes_expected=1360,fme_flags=0x82,

  04/01 12:52:28.974 [cpp-dp-fme]: [30961]: (verbose): QFP:0.0 Thread:239 TS:00001790455407903537 :tcp:[10.13.0.41] 443 => [172.16.42.122] 52816 6 (0): expected offset 2331908863, bytes expected 1360, bytes_loss=0

  04/01 12:52:28.976 [cpp-dp-fme]: [30961]: (info): QFP:0.0 Thread:239 TS:00001790455407990178 :pkt:[10.13.0.41] 443 => [172.16.42.122] 52816 6 (0): IPv4 (ingress) Server Len(1400) Data(1360) MediaBytes(1360) RTT(0)

A non-zero bytes_loss indicates that the sequence number didn't match the expected value for the next-in-flow frame.
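When a tracelog holds many flows, it can help to filter it offline for entries with non-zero bytes_loss. The following Python sketch assumes the line layout shown in the sample above; the regular expressions and the helper name `flows_with_loss` are illustrative and may need adjusting for other software releases.

```python
import re

# Matches ":tcp:[src] sport => [dst] dport" (or ":pkt:") in a cpp-dp-fme line.
FLOW_RE = re.compile(
    r":(?:pkt|tcp):\[(?P<src>[\d.]+)\]\s+(?P<sport>\d+)\s+=>\s+"
    r"\[(?P<dst>[\d.]+)\]\s+(?P<dport>\d+)"
)
LOSS_RE = re.compile(r"bytes_loss=(\d+)")


def flows_with_loss(lines):
    """Return {(src, sport, dst, dport): largest non-zero bytes_loss seen}."""
    result = {}
    for line in lines:
        flow = FLOW_RE.search(line)
        loss = LOSS_RE.search(line)
        if not flow or not loss:
            continue
        lost = int(loss.group(1))
        if lost == 0:
            continue
        key = (flow["src"], int(flow["sport"]), flow["dst"], int(flow["dport"]))
        result[key] = max(result.get(key, 0), lost)
    return result


# Two entries taken from the tracelog sample above.
SAMPLE = [
    "04/01 12:52:28.974 [cpp-dp-fme]: [30961]: (info): QFP:0.0 Thread:239 "
    "TS:00001790455407771650 :tcp:[10.13.0.41] 443 => [172.16.42.122] 52816 6 (0): "
    "tcp flags 0x10, has_syn=0,is_first=0, app_resolved=1,seq_num=2331908863, "
    "*expected_offset=2331908863, stored offset=2331910223,oop_cnt=3, "
    "bytes_loss=1207610, bytes_expected=1360,fme_flags=0x82,",
    "04/01 12:52:28.974 [cpp-dp-fme]: [30961]: (verbose): QFP:0.0 Thread:239 "
    "TS:00001790455407903537 :tcp:[10.13.0.41] 443 => [172.16.42.122] 52816 6 (0): "
    "expected offset 2331908863, bytes expected 1360, bytes_loss=0",
]

if __name__ == "__main__":
    for flow, lost in flows_with_loss(SAMPLE).items():
        print(flow, lost)
```

In practice the lines would be read from the exported tracelog file rather than from an inline list.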

Bringing Everything Together with EEM

Embedded Event Manager (EEM) allows for scripting actions based on an event. In this case, a logged TCA triggers the router to capture data. The script checks for AF12 in the TCA log message, logs a message indicating that it was triggered, then captures the required outputs and appends them to a file.

  event manager applet BYTE_LOSS
   event syslog pattern "DOMAIN-5-TCA"
   action 100 cli command "enable"
   action 101 cli command "show log | in TCA"
   action 102 string first "af12" "$_cli_result"
   action 103 if $_string_result ne "-1"
   action 104  syslog msg "TCA for af12 found"
   action 105  cli command "monitor capture LOSS stop"
   action 106  cli command "show domain iwan master channels dst-site-id 10.10.100.254 | append bootflash:unreach_output"
   action 107  cli command "show domain iwan master traffic-class dst-site-pfx 10.10.100.0/28 | append bootflash:unreach_output"
   action 108  cli command "show monitor capture LOSS buffer | append bootflash:unreach_output"
   action 109  cli command "show log | append bootflash:unreach_output"
   action 110  cli command "monitor capture LOSS export bootflash:byte_loss.pcap"
   action 111  cli command "clear log"
   action 112 end

Self-Deleting EEM Script After TCA

It is sometimes useful to have EEM disable itself in troubleshooting scenarios after it has provided the necessary data. Other techniques can be employed with EEM to optionally capture multiple iterations of data before turning itself off. The example script below checks every 60 seconds for the presence of the PCAP file that is generated when the TCA script runs and, if it's found, deletes the original script as well as itself.

  event manager applet DELETE_UNREACH
   event timer watchdog time 60
   action 100 cli command "enable"
   action 101 cli command "show bootflash: | in MO_CAP"
   action 102 string first "MO_CAP" "$_cli_result"
   action 103 if $_string_result ne "-1"
   action 104  cli command "config t"
   action 105  cli command "no event manager applet UNREACH"
   action 106 end
   action 107 cli command "no event manager applet DELETE_UNREACH"

Investigating Traffic-Class Byte Loss

REMINDER: Byte loss was originally detected on the Tunnel10 path (see route change 1 below). Let's configure AVC on Tunnel10!

  ISR4K_MC_Branch2#show domain iwan master traffic-classes

  Dst-Site-Prefix: 10.10.100.0/28  DSCP: af12 [12]  Traffic class id:19
    Clock Time: 18:01:39 (UTC) 03/07/2017
    TC Learned: 00:07:38 ago
    Present State: CONTROLLED
    Current Performance Status: in-policy
    Current Service Provider: INET since 00:00:06 (hold until 83 sec)
    Previous Service Provider: pfr-label: 0:1 0:0 [0x10000] for 421 sec
      (A fallback provider. Primary provider will be re-evaluated 00:02:55 later)
    BW Used: 0 Kbps
    Present WAN interface: Tunnel20 in Border 10.10.3.254
    Present Channel (primary): 59 INET pfr-label:0:2 0:0 [0x20000]
    Backup Channel: 58 pfr-label:0:1 0:0 [0x10000]
    Destination Site ID bitmap: 1
    Destination Site ID: 10.10.100.254 (Active)
    Class-Sequence in use: 20
    Class Name: BULK-DATA using policy bulk-data
    BW Updated: 00:00:38 ago
    Reason for Latest Route Change: Loss
    Route Change History:
      Date and Time  Previous Exit  Current Exit  Reason
      1: 18:01:33 (UTC) 03/07/17  (0:1 0:0)/10.10.3.254/Tu10 (Ch:58)  INET(0:2 0:0)/10.10.3.254/Tu20 (Ch:59)  Out-of-Policy (Loss Rate Bytes: 13.79%)
      2: 17:54:31 (UTC) 03/07/17  None(0:0 0:0)/0.0.0.0/None (Ch:0)  (0:1 0:0)/10.10.3.254/Tu10 (Ch:58)  Uncontrolled to Controlled Transition
  --------------------------------------------------------------------------------
  Total Traffic Classes: 1 Site: 1 Internet: 0

[Diagram: Tunnel10 and INET Tunnel20 paths (10.10.2.254)]

Investigating Traffic-Class Byte Loss

AVC is implemented on the remote end where the TCA was generated. The same performance monitor referenced earlier is used to track expected and lost bytes.

In this case, we have seen the issue once. We are not using EEM initially, as we will monitor once traffic moves back to the primary path to see if the issue returns and we can capture data live!

  DC1 BR#config t
  <PERF MON CONFIGURATION OUTPUT OMITTED>
  DC1 BR(config)# interface Tunnel10
  DC1 BR(config-if)# service-policy type performance-monitor input MMA_TCP

[Diagram: Tunnel10 and INET Tunnel20 paths (10.10.3.254)]

Investigating Traffic-Class Byte Loss

Before capturing statistics, we must wait for traffic to fail back to the primary path!

  *Mar 7 18:01:33.204: %DOMAIN-5-TC_PATH_CHG: Traffic class Path Changed. Details: Instance=0: VRF=default: Source Site ID=10.10.3.254: Destination Site ID=10.10.100.254: Reason=Backup to Primary path preference transition: TCA-ID=11619: Policy Violated=BULK-DATA: TC=[Site id=10.10.100.254, TC ID=88, Site prefix=10.10.100.0/28, DSCP=af12(12), App ID=0]: Original Exit=[CHAN-ID=1463, BR-IP=10.10.3.254, DSCP=af12[12], Interface=Tunnel20, Path=[label=0:0 0:0 [0x0]]]: New Exit=[CHAN-ID=1462, BR-IP=10.10.3.254, DSCP=af12[12], Interface=Tunnel10, Path=[label=0:1 0:0 [0x10000]]]

  ISR4K_MC_Branch2#show domain iwan master traffic-classes dscp af12

  Dst-Site-Prefix: 10.10.100.0/28  DSCP: af12 [12]  Traffic class id:19
    Current Performance Status: in-policy
    Current Service Provider: since 00:02:42
    Previous Service Provider: INET pfr-label: 0:2 0:0 [0x20000] for 180 sec
    <OUTPUT OMITTED>
    Reason for Latest Route Change: Backup to Primary path preference transition
    Route Change History:
      Date and Time  Previous Exit  Current Exit  Reason
      1: 18:17:10 (UTC) 03/07/17  INET(0:2 0:0)/10.10.3.254/Tu20 (Ch:1463)  (0:1 0:0)/10.10.3.254/Tu10 (Ch:1462)  Backup to Primary path preference transition
      2: 18:01:33 (UTC) 03/07/17  (0:1 0:0)/10.10.3.254/Tu10 (Ch:58)  INET(0:2 0:0)/10.10.3.254/Tu20 (Ch:59)  Out-of-Policy (Loss Rate Bytes: 13.79%)
      3: 17:54:31 (UTC) 03/07/17  None(0:0 0:0)/0.0.0.0/None (Ch:0)  (0:1 0:0)/10.10.3.254/Tu10 (Ch:58)  Uncontrolled to Controlled Transition

The TC has been moved back to the primary path (Tunnel10) after the fallback timer expired. No further degradation was reported while monitoring this path in standby mode. We can now start to check AVC on that path!

Investigating Traffic-Class Byte Loss

Because this is the hub BR, we look for the branch prefixes only. In the ip dscp column, 0x0C = AF12.

  DC1 BR#show performance monitor cache monitor MMA_MON detail format csv | in 10.10.3
  <OUTPUT OMITTED>
  IPV4 SRC ADDR,IPV4 DST ADDR,TRNS SRC PORT,TRNS DST PORT,IP PROT,intf input,intf output,bytes,pkts,ip dscp,app name,trns cnt pkts expect,trns cnt pkts lost,trns bytes lost,trns bytes expected
  <OUTPUT OMITTED>
  10.10.3.10,10.10.100.4,11058,20,6,Tu10,Gi0/0/0,1396,1,0x0C,layer7 unknown,0,0,0,1356
  10.10.3.10,10.10.100.4,11059,20,6,Tu10,Gi0/0/0,288924,7223,0x0C,layer7 statistical-download,0,0,0,0
  10.10.3.10,10.10.100.4,11074,21,6,Tu10,Gi0/0/0,80,2,0x0C,port ftp,0,0,4776,64872
  <OUTPUT OMITTED>

The destination IP is contained in the subnet of the branch TC where we saw the TCA. The lost-bytes rate for the last flow (4776 bytes lost of 64872 expected) equates to 7.36%. We now have the flow in this TC to focus on troubleshooting further!
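The 7.36% figure is simply trns bytes lost divided by trns bytes expected. A short Python sketch of that arithmetic over the CSV rows shown above follows; the helper name `lossy_flows` is hypothetical, and the column names are taken from the CSV header in the output.

```python
import csv
import io

HEADER = (
    "IPV4 SRC ADDR,IPV4 DST ADDR,TRNS SRC PORT,TRNS DST PORT,IP PROT,"
    "intf input,intf output,bytes,pkts,ip dscp,app name,"
    "trns cnt pkts expect,trns cnt pkts lost,trns bytes lost,trns bytes expected"
)
ROWS = """\
10.10.3.10,10.10.100.4,11058,20,6,Tu10,Gi0/0/0,1396,1,0x0C,layer7 unknown,0,0,0,1356
10.10.3.10,10.10.100.4,11059,20,6,Tu10,Gi0/0/0,288924,7223,0x0C,layer7 statistical-download,0,0,0,0
10.10.3.10,10.10.100.4,11074,21,6,Tu10,Gi0/0/0,80,2,0x0C,port ftp,0,0,4776,64872
"""


def lossy_flows(csv_text: str):
    """Return (src, dst, sport, dport, loss %) for flows with lost bytes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    flows = []
    for row in reader:
        expected = int(row["trns bytes expected"])
        lost = int(row["trns bytes lost"])
        if expected == 0 or lost == 0:
            continue  # nothing to compute, or no loss on this flow
        pct = round(100.0 * lost / expected, 2)
        flows.append((row["IPV4 SRC ADDR"], row["IPV4 DST ADDR"],
                      row["TRNS SRC PORT"], row["TRNS DST PORT"], pct))
    return flows


if __name__ == "__main__":
    for flow in lossy_flows(HEADER + "\n" + ROWS):
        print(flow)
```

Only the FTP flow on source port 11074 is reported, with a loss rate of 7.36%.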

Improving Out-of-Policy Detection

Often the default monitor-interval of 30 seconds isn't sufficient for sensitive traffic such as voice and/or video. Business needs require faster detection of out-of-policy conditions on a link so that traffic can be quickly moved to the optimal path. The monitor-interval can be adjusted in the PfR policy to provide quicker reporting from Performance Monitor to PfR.

  domain iwan
   vrf default
    master hub
     monitor-interval <#> dscp <DSCP>

A new quick monitor is added, showing the newly configured interval of 5 seconds.

  ISR4K_MC_Branch2#show domain iwan border pmi
  <OUTPUT OMITTED>
  PMI[Ingress-per-DSCP-quick ]-FLOW MONITOR[MON-Ingress-per-DSCP-quick -0-48-13]
     monitor-interval:5
  <OUTPUT OMITTED>
     DSCP-list: ef-[class:cent-class-ingress-dscp-ef-0-32]

What if the TCA is Unpredictable?

In the previous EEM configuration, the script matched on a TCA message. However, in some situations this may not suffice. When a BR generates an OOP TCA, it doesn't log a message in the local syslog or on the local MC. Only the remote MC, which is the destination of the TCA message, will show the log message.

There are two scenarios where additional instrumentation is needed to work with EEM:
1. Data is needed on the BR originating the TCA.
2. The affected path is terminated by an independent BR on one end or both.

In these situations, a couple of additional techniques can be used:
1. A temporary loopback with a tracked object on a BR to trigger data collection
2. TCA debugs enabled on the generating BR to provide an EEM trigger message

EEM BR Generating the TCA

OOP Detected!

  DC1 BR# debug domain iwan border tca

  May 18 06:18:30.011: TCA-ID:[0]:CHAN[10.10.3.254{ipv4}, 0xC, 0x20, 65536]: First tca for the channel added
  May 18 06:18:30.011: BR-TCA:[0]:CHAN[10.10.3.254{ipv4}, 0xC, 0x20, 65536]: D_Site:10.10.100.254, jitter:0, lostpkts:2307, lostbyts:0, owd:0, ndly:0,...

Slide callouts on the debug output mark the destination site-ID and the DSCP.

EEM BR Generating the TCA

Using the messages generated by the debug, EEM can trigger when these are seen in the syslog and capture AVC/EPC and other data at the moment the event happens. The applet looks for the debug message format, checks for the right DSCP, checks for the right site-id, and then captures the required data to file.

  event manager applet BR_TCA
   event syslog pattern "BR-TCA"
   action 100 cli command "enable"
   action 101 cli command "show log | in BR-TCA"
   action 102 string first "0xC" "$_cli_result"
   action 103 if $_string_result ne "-1"
   action 104  string first "10.10.3.254" "$_cli_result"
   action 105  if $_string_result ne "-1"
   action 106   syslog msg "TCA for 10.10.3.254/AF12 Found!"
   action 107   cli command "monitor capture LOSS stop"
   action 108   cli command "show domain iwan border channels dst-site-id 10.10.3.254 | append bootflash:unreach_output"
   action 109   cli command "show domain iwan border traffic-class dst-site-pfx 10.10.3.0/28 | append bootflash:unreach_output"
   action 110   cli command "show monitor capture LOSS buffer | append bootflash:unreach_output"
   action 111   cli command "show log | append bootflash:unreach_output"
   action 112   cli command "monitor capture LOSS export bootflash:byte_loss.pcap"
   action 113   cli command "clear log"
   action 114  end
   action 115 end

The action labels are numbered uniquely here and each if has its own end; the original slide reused labels 105 and 106 and closed only one of the two if blocks.

EEM MC Receiving TCA

[Diagram: dual-router branch with an MC/BR and an independent INET BR. The MC/BR receives a TCA: "I need to notify the INET BR to capture data... shutdown the loopback!"]

We can use a tracked object so that EEM knows when to capture data on the independent BR.

EEM Independent BR

MC (or MC/BR): define a dummy IP and loopback not used for other features on the router.

  interface Loopback200
   ip address 192.168.254.254 255.255.255.255

  event manager applet TCA_RECEIVED
   event syslog pattern "DOMAIN-5-TCA"
   action 1.0 cli command "enable"
   action 1.1 cli command "config t"
   action 1.2 cli command "int lo200"
   action 1.3 cli command "shut"

Independent BR: generate a ping every 5 seconds to track reachability to the loopback.

  ip sla 200
   icmp-echo 192.168.254.254 source-ip <Source IP routable from MC on LAN>
   frequency 5
  ip sla schedule 200 life forever start-time now
  track 200 ip sla 200 reachability

Used if the loopback isn't already routed:

  ip route 192.168.254.254 255.255.255.255 <LAN next-hop>

The script triggers when the loopback on the MC goes down:

  event manager applet BR_TCA
   event track 200 state down
   action 100 cli command "enable"
   ...

Measuring Reachability

PfR measures reachability by evaluating the presence of either (a) smart probes or (b) data traffic. Just as Performance Monitor reports per-TC metrics on a path, it also has a callback to PfR to measure reachability. PfR evaluates reachability against the unreachable timer, which is configurable on the hub MC (in seconds).

IWAN 2.1.1:
  domain iwan
   vrf default
    master hub
     advanced
      channel-unreachable-timer <#>

IWAN 2.2:
  domain iwan
   master hub
    advanced
     channel-unreachable-timer <#>

The unreachable timer defaults to 1 second in IWAN 2.1.1. It remains configurable in IWAN 2.2 but now defaults to 4 seconds, which is the CVD-recommended value.

Verifying Configured Unreachable Timer

The timer can be viewed on any BR in the IWAN environment.

  ISR4K_MC_Branch2#show domain iwan border status
  Sun May 21 14:33:21.131
  --------------------------------------------------------------------
  **** Border Status ****
  Instance Status: UP
  Present status last updated: 3w3d ago
  Loopback: Configured Loopback0 UP (10.10.3.254)
  Master: 10.10.3.254
  Master version: 2
  Connection Status with Master: UP
  MC connection info: CONNECTION SUCCESSFUL
  Connected for: 2w0d
  External Collector: 10.10.100.60 port: 2055
  Route-Control: Enabled
  Asymmetric Routing: Disabled
  Minimum Mask Length Internet: 24
  Minimum Mask Length Enterprise: 24
  Connection Keepalive: 5 seconds
  Sampling: off
  Channel Unreachable Threshold Timer: 4 seconds
  ...

Troubleshooting Unreachability

TC for AF12 is active on the path...

  May 21 13:03:42.503: %DOMAIN-5-TCA: UNREACHABLE Received. Details: Instance id=0: VRF=default: Source Site ID=10.10.100.254: Destination Site ID=10.10.3.254: TCA-ID=47: TCA-Origin=10.10.100.254(L): Exit=[CHAN-ID=16, BR-IP=10.10.100.254, DSCP=af12[12], Interface=Tunnel10, Path=[label=0:0 0:1 [0x1]]

[Diagram: Tunnel10 and INET Tunnel20 paths (10.10.3.254)]

Troubleshooting Unreachability

TC for AF12 is active on the path... The (L) flag on TCA-Origin marks a locally generated TCA, here by the hub BR.

  May 21 13:03:42.503: %DOMAIN-5-TCA: UNREACHABLE Received. Details: Instance id=0: VRF=default: Source Site ID=10.10.100.254: Destination Site ID=10.10.3.254: TCA-ID=47: TCA-Origin=10.10.100.253(L): Exit=[CHAN-ID=16, BR-IP=10.10.100.254, DSCP=af12[12], Interface=Tunnel10, Path=[label=0:0 0:1 [0x1]]

  May 21 13:03:44.889: %DOMAIN-5-TC_PATH_CHG: Traffic class Path Changed. Details: Instance=0: VRF=default: Source Site ID=10.10.100.254: Destination Site ID=10.10.3.254: Reason=Unreachable: TCA-ID=48: Policy Violated=BULK-DATA: TC=[Site id=10.10.3.254, TC ID=35, Site prefix=10.10.3.0/28, DSCP=af12(12), App ID=0]: Original Exit=[CHAN-ID=16, BR-IP=10.10.100.254, DSCP=af12[12], Interface=Tunnel10, Path=[label=0:0 0:1 [0x1]]]: New Exit=[CHAN-ID=15, BR-IP=10.10.100.253, DSCP=af12[12], Interface=Tunnel20, Path=INET[label=0:0 0:2 [0x2]]]

[Diagram: Tunnel10 and INET Tunnel20 paths (10.10.3.254)]

Troubleshooting Unreachability

Probes have not been received from the branch on this path for the unreachable timer window! The last probe was received over 23 seconds ago. The slide overlays two samples of the same output; where a counter changed, both values are shown as first sample -> second sample.

  DC1 BR#show domain iwan border channels dscp af12
  Border Smart Probe Stats:
  Channel id: 16
    Version : 3
    Site id : 10.10.3.254
    DSCP : af12[12]
    Service provider :
    Pfr-Label : 0:0 0:1 [0x1]
    Channel state : Initiated and open
    Channel next hop : 172.17.10.4
    RX Reachability : Un-Reachable
    TX Reachability : Reachable
    Supports Zero-SLA : Yes
    Muted by Zero-SLA : No
    Muted by Path of Last Resort : No
    Number of Probes sent : 786 -> 794
    Number of Probes received : 856
    Number of SMP Profile Bursts sent: 456 -> 460
    Number of Active Channel Probes sent: 58 -> 59
    Number of Reachability Probes sent: 281 -> 284
    Number of Force Unreaches sent: 0
    Last Probe sent : 585 -> 215 msec Ago
    Last Probe received: 19121 -> 23110 msec ago
    Number of Data Packets sent : 474100
    Number of Data Packets received : 14689
    Smart Probe in Burst: No
    Smart Probe enable Burst: Yes

[Diagram: Tunnel10 and INET Tunnel20 paths to the branch MC/BR (10.10.3.254)]
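The check an engineer applies when reading this output can be sketched in a few lines of Python. This is a simplified model of the rule described earlier (a channel counts as reachable if a smart probe or data traffic was seen within the unreachable timer), not PfR's actual implementation; the function name and the optional `last_data_rx_ms` argument are assumptions, since the channel output reports only a data packet counter, not a data timestamp.

```python
from typing import Optional


def rx_reachable(last_probe_rx_ms: Optional[int],
                 last_data_rx_ms: Optional[int] = None,
                 unreachable_timer_s: int = 4) -> bool:
    """Return True if a probe or data packet arrived within the timer window.

    Simplified model: ages are given in milliseconds; None means the age
    is unknown. The 4-second default is the IWAN 2.2 default timer.
    """
    window_ms = unreachable_timer_s * 1000
    ages = [age for age in (last_probe_rx_ms, last_data_rx_ms) if age is not None]
    return any(age <= window_ms for age in ages)


if __name__ == "__main__":
    # Hub BR channel 16: last probe received 23110 msec ago.
    print(rx_reachable(23110))
    # A channel whose last probe arrived 230 msec ago.
    print(rx_reachable(230))
```

With the 4-second timer, 23110 msec falls well outside the window, which matches the RX Reachability : Un-Reachable state in the output.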

Troubleshooting Unreachability

The (R) flag on TCA-Origin marks a remotely generated TCA.

  *May 21 13:48:50.979: %DOMAIN-5-TCA: UNREACHABLE Received. Details: Instance id=0: VRF=default: Source Site ID=10.10.3.254: Destination Site ID=10.10.100.254: TCA-ID=1449: TCA-Origin=10.10.100.254(R): Exit=[CHAN-ID=1193, BR-IP=10.10.3.254, DSCP=af12[12], Interface=Tunnel10, Path=[label=0:1 0:0 [0x10000]]

[Diagram: Tunnel10 and INET Tunnel20 paths (10.10.3.254)]
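When sifting through many TCA messages, the origin flag can be extracted programmatically. The Python sketch below assumes the message layout shown in the log sample above; the regular expression and the function name `parse_tca` are illustrative, and other releases may format the message differently.

```python
import re

TCA_RE = re.compile(
    r"%DOMAIN-5-TCA: (?P<type>\S+) Received\."
    r".*?Source Site ID=(?P<src_site>[\d.]+):"
    r".*?Destination Site ID=(?P<dst_site>[\d.]+):"
    r".*?TCA-Origin=(?P<origin>[\d.]+)\((?P<flag>[LR])\)"
    r".*?DSCP=(?P<dscp>\w+)\["
    r".*?Interface=(?P<interface>\w+)"
)


def parse_tca(message: str):
    """Return the key TCA fields as a dict, or None if the line is not a TCA."""
    match = TCA_RE.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    # (L) = generated locally, (R) = received from the remote site.
    fields["generated"] = "local" if fields.pop("flag") == "L" else "remote"
    return fields


SAMPLE = (
    "*May 21 13:48:50.979: %DOMAIN-5-TCA: UNREACHABLE Received. Details: "
    "Instance id=0: VRF=default: Source Site ID=10.10.3.254: "
    "Destination Site ID=10.10.100.254: TCA-ID=1449: "
    "TCA-Origin=10.10.100.254(R): Exit=[CHAN-ID=1193, BR-IP=10.10.3.254, "
    "DSCP=af12[12], Interface=Tunnel10, Path=[label=0:1 0:0 [0x10000]]"
)

if __name__ == "__main__":
    print(parse_tca(SAMPLE))
```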

Troubleshooting Unreachability

Branch channel output after the (R)emotely generated TCA was received. The channel has been marked Un-reachable by PfR.

  Channel id: 1193
    Version : 3
    Site id : 10.10.100.254
    DSCP : af12[12]
    Service provider :
    Pfr-Label : 0:1 0:0 [0x10000]
    Channel state : Initiated and open
    Channel next hop : 172.17.10.1
    RX Reachability : Reachable
    TX Reachability : Un-reachable
    Supports Zero-SLA : Yes
    Muted by Zero-SLA : No
    Muted by Path of Last Resort : No
    Number of Probes sent : 468
    Number of Probes received : 371
    Number of SMP Profile Bursts sent: 271
    Number of Active Channel Probes sent: 32
    Number of Reachability Probes sent: 183
    Number of Force Unreaches sent: 0
    Last Probe sent : 170 msec Ago
    Last Probe received: 230 msec ago
    Number of Data Packets sent : 8646
    Number of Data Packets received : 0
    Smart Probe in Burst: No
    Smart Probe enable Burst: Yes

[Diagram: Tunnel10 and INET Tunnel20 paths (10.10.3.254)]

Troubleshooting Unreachability

A second sample of the same channel, taken shortly afterwards. Probes are still sent and received, but data packets are no longer transmitted on this channel! The next hop is correct.

  Channel id: 1193
    Version : 3
    Site id : 10.10.100.254
    DSCP : af12[12]
    Service provider :
    Pfr-Label : 0:1 0:0 [0x10000]
    Channel state : Initiated and open
    Channel next hop : 172.17.10.1
    RX Reachability : Reachable
    TX Reachability : Un-reachable
    Supports Zero-SLA : Yes
    Muted by Zero-SLA : No
    Muted by Path of Last Resort : No
    Number of Probes sent : 484
    Number of Probes received : 388
    Number of SMP Profile Bursts sent: 280
    Number of Active Channel Probes sent: 32
    Number of Reachability Probes sent: 190
    Number of Force Unreaches sent: 0
    Last Probe sent : 96 msec Ago
    Last Probe received: 186 msec ago
    Number of Data Packets sent : 8646
    Number of Data Packets received : 0
    Smart Probe in Burst: Yes
    Smart Probe enable Burst: Yes

[Diagram: Hub MC, Tunnel10 and INET Tunnel20 paths (10.10.3.254)]

Capturing Data for Unreachability

In the previous example, the TCA was generated by the hub. The channel output on the hub BR showed that probes were no longer being received. The branch, however, showed that it was still receiving probes, indicating the loss was unidirectional.

Whether loss is unidirectional or bidirectional, EEM can again be used to trigger data capture when the unreachability condition occurs. Because a TCA should be received and logged by both the local and remote MC, no additional information is needed. Object tracking can still be utilized in instances where the BR is an independent router.

If using EPC to capture traffic in an unreachable condition, ensure that the ACL used for matching traffic includes an entry for data traffic and an entry for smart-probe traffic!

Troubleshooting Unreachability - Example

The first ACL entry matches smart-probes; the second matches data traffic.

  ip access-list extended EPC
   permit udp host 10.0.1.11 eq 18000 host 10.3.0.1 eq 19000 dscp cs3
   permit ip <hub subnet> <hub mask> any dscp cs3

  monitor capture CAP access-list EPC interface Tu10 in buffer size 15 circular
  monitor capture CAP start

The event can instead match a tracked object when/where required. The script deletes itself after running, so it captures a single instance of data.

  event manager applet UNREACH
   event syslog pattern "DOMAIN-5-TCA"
   action 100 cli command "enable"
   action 101 cli command "show log | in UNREACHABLE"
   action 102 string first "cs3" "$_cli_result"
   action 103 if $_string_result ne "-1"
   action 104  syslog msg "UNREACHABLE for cs3 found"
   action 105  cli command "monitor capture CAP stop"
   ...
   action 115  cli command "no event manager applet UNREACH"

Immitigable Event (IME)

These events occur when an OOP condition is measured but PfR has no alternative path to put the traffic on.

Reasons why an IME is observed:
1. Backup path is unavailable (tunnel down, channel unreachable, etc.)
2. Backup path is also out of policy and is measuring worse than the primary path (Best-of-Worst evaluation)
3. Backup path is out of bandwidth (current interface utilization exceeds 95% of configured bandwidth)

  May 3 07:06:33.043 BST: %DOMAIN-2-IME: Immitigable event occured. IME-ID=3859: Details: Instance=0: VRF=default: Source Site ID=10.10.2.254: Destination Site ID=10.10.100.254: Reason=No Alternate Exit: TCA-ID=98475: Policy Violated=VOICE: Current Exit=[CHAN-ID=786, BR-IP=10.10.2.254, DSCP=ef[46], Interface=Tunnel10, Path=[label=0:0 0:1 [0x1]]]: Out Of BW Alt Exits=0: Out Of Policy Alt Exits=1

In this message, "Out Of Policy Alt Exits=1" shows that the backup path is also OOP! Check the local MC channel output for this DSCP/destination site-id to see path measurements.

Best-of-Worst Evaluation

In instances where both the primary and secondary path(s) are out of policy, PfR evaluates all channels based on (a) the priority of the metric that is out of policy and (b) the value of that metric. This is a best-of-worst evaluation.

Steps for Evaluation
1. Evaluate the policy metric violated and the value of the metric.
2. If the violated metrics have different priorities within the policy, the higher-priority metric (lower numerical value) is considered more important for path measurement, and PfR puts the traffic on the path where the lower-priority metric was violated.
3. If the metrics have the same priority, PfR compares the raw values on each path to determine best-of-worst.
4. If the tie can't be broken, no action is taken and traffic remains on the current path.

Priority of Metric (the priorities can be defined in a custom policy):

  class VOICE sequence 10
    path-preference fallback INET
    class type: Dscp Based
      match dscp ef policy voice
        priority 2 packet-loss-rate threshold 1.0 percent
        priority 1 one-way-delay threshold 150 msec
        priority 3 jitter threshold 30000 usec
        priority 2 byte-loss-rate threshold 1.0 percent
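The evaluation steps above can be sketched as a small Python function. This is an illustrative model under stated assumptions, not PfR's actual algorithm: each path is reduced to a single violated metric, a lower raw value is treated as better, and two different metrics that share a priority are treated as an unbreakable tie. The `Violation` class and `best_of_worst` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    path: str        # e.g. "Tunnel10"
    metric: str      # e.g. "one-way-delay"
    priority: int    # policy priority; 1 is the most important
    value: float     # measured value; lower is assumed to be better


def best_of_worst(current: Violation, candidate: Violation) -> str:
    """Return the path that should carry the traffic when both are OOP."""
    # Step 2: different priorities. Prefer the path whose violated metric
    # is less important (larger priority number).
    if candidate.priority != current.priority:
        return candidate.path if candidate.priority > current.priority else current.path
    # Step 3: same priority. Compare raw values (only for the same metric,
    # which is an assumption of this sketch).
    if candidate.metric == current.metric and candidate.value != current.value:
        return candidate.path if candidate.value < current.value else current.path
    # Step 4: tie cannot be broken, so stay on the current path.
    return current.path


if __name__ == "__main__":
    on_tu10 = Violation("Tunnel10", "one-way-delay", 1, 200.0)
    on_tu20 = Violation("Tunnel20", "packet-loss-rate", 2, 3.0)
    print(best_of_worst(on_tu10, on_tu20))
```

With the voice policy priorities shown above, a delay violation (priority 1) on the current path loses to a packet-loss violation (priority 2) on the alternate path, so the sketch selects the alternate path.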

Load-Balancing

Enterprise Traffic Categories in PfR

  Policy     Traffic destined to a known site-prefix for a policy-based DSCP
  Scavenger  Traffic destined to a known site-prefix for a non-policy DSCP
  Internet   Traffic destined to an IP outside of the enterprise prefix range
  Non-IWAN   Traffic destined to an IP in the enterprise prefix range for which there is no known site-prefix

Defining Scavenger Traffic

Scavenger traffic is traffic destined to a prefix covered by a site-prefix entry but that doesn't match a DSCP in the configured policy. Only DSCPs configured in the policy are monitored for traffic performance. When load-balance is enabled on the hub MC, PfR recognizes this traffic as part of the IWAN domain and creates a TC for it that, by default, is load-balanced across all available paths based on link utilization.

A default class is created at the end of the policy, which matches all other DSCPs. Any unmatched DSCP is classified by PfR under this match-all condition.

  ISR4K_MC_Branch2#show domain iwan master policy
  class VOICE sequence 10
    path-preference fallback INET
    class type: Dscp Based
      match dscp ef policy voice
        priority 2 packet-loss-rate threshold 1.0 percent
        priority 1 one-way-delay threshold 150 msec
        priority 3 jitter threshold 30000 usec
        priority 2 byte-loss-rate threshold 1.0 percent
  class BULK-DATA sequence 20
    path-preference fallback INET
    class type: Dscp Based
      match dscp af11 policy bulk-data
        priority 2 packet-loss-rate threshold 5.0 percent
        priority 1 one-way-delay threshold 300 msec
        priority 2 byte-loss-rate threshold 5.0 percent
      match dscp af12 policy bulk-data
        priority 2 packet-loss-rate threshold 5.0 percent
        priority 1 one-way-delay threshold 300 msec
        priority 2 byte-loss-rate threshold 5.0 percent
  <OUTPUT OMITTED>
  class default
      match dscp all
        Number of Traffic classes using this policy: 0

Defining Internet Traffic

Internet traffic is classified as any traffic destined to an IP that falls outside of the enterprise prefix-list range. Because this traffic doesn't belong to a particular site, the site-id will show as Internet when viewing the traffic-class.

The T flag denotes Top Level, which marks the configured enterprise prefixes on the hub MC. These are not owned by any site. The site-id is listed as 255.255.255.255, which indicates that no one owns these prefixes in PfR.

  ISR4K_MC_Branch2#show domain iwan master site-prefix
  Change will be published between 5-60 seconds
  Next Publish 01:37:48 later
  Prefix DB Origin: 10.10.3.254
  Last publish Status : Peering Success
  Total publish errors : 0
  Total learned prefix discards: 0
  Prefix Flag: S-From SAF; L-Learned; T-Top Level; C-Configured; M-shared

  Site-id          Site-prefix       Last Updated   DC Bitmap  Flag
  <OUTPUT OMITTED>
  255.255.255.255  *10.0.0.0/8       00:08:47 ago   0x0        S,T
  255.255.255.255  *172.16.0.0/12    00:08:47 ago   0x0        S,T
  255.255.255.255  *192.168.0.0/16   00:08:47 ago   0x0        S,T

Load-Balance Configuration Standard

Load-balancing is configured on the hub MC and pushed to all spokes...

  DC1_HUB_MC#config t
  DC1_HUB_MC(config)#domain iwan
  DC1_HUB_MC(config-domain)#vrf default
  DC1_HUB_MC(config-domain-vrf)#master hub
  DC1_HUB_MC(config-domain-vrf-mc)#load-balance

Operational status can then be verified on each Branch Master Controller...

  ISR4K_MC_Branch2#show domain iwan master status
  Master VRF: Global
  Instance Type: Branch
  <OUTPUT OMITTED>
  Load Balancing:
    Operational Status: Up
    Max Calculated Utilization Variance: 0%
    Last load balance attempt: never
    Last Reason: Variance less than 20%
    Total unbalanced bandwidth:
      External links: 0 Kbps Internet links: 0 Kbps

Load Balance Configuration Advanced

In some designs, it's preferable to place all load-balanced traffic on a single path. EXAMPLE: bandwidth on one path in many branch locations is limited, and the business wants to ensure that this path is reserved for business-critical traffic (traffic placed on this path by the PfR policy). Path-preference can be defined for load-balanced traffic.

  DC1 BR(config-domain-vrf-mc)#load-balance advanced
  DC1 BR(...-load-balance)#path-preference INET1 fallback INET2

Path-preference now shows as defined for the default class!

  ISR4K_MC_Branch2#show domain iwan master policy
  <OUTPUT OMITTED>
  class default
    path-preference INET1 fallback INET2
      match dscp all
        Number of Traffic classes using this policy: 1

Load Balance How It Works

PfR monitors exit interface bandwidth based on the configured bandwidth statement on the local tunnels. A variance of 20% is sought and maintained by PfR to ensure equal distribution of traffic. This is achieved by moving traffic-classes when variance exceeds 20% either due to (1) a new traffic-class added to a link or (2) the bandwidth of existing traffic-classes increasing.

If moving a traffic-class will not equalize variance but instead will simply shift the higher bandwidth link to a different path, PfR will not move this traffic. This can happen in situations where traffic-class granularity is low.

Reviewing Load-Balanced Traffic-Class

  ISR4K_MC_Branch2#show domain iwan master traffic-classes dscp default

  Dst-Site-Prefix: 11.10.182.64/28  DSCP: default [0]  Traffic class id:375934
    Clock Time: 02:22:32 (UTC) 05/01/2017
    TC Learned: 00:05:20 ago
    Present State: CONTROLLED
    Current Performance Status: not monitored (internet)
    Current Service Provider: INET since 00:05:20
    Previous Service Provider: Unknown
    BW Used: 0 Kbps
    Present WAN interface: Tunnel20 in Border 10.10.2.254
    Present Channel (primary): 743 INET pfr-label:1:2 0:0 [0x1020000]
    Backup Channel: none
    Destination Site ID: Internet
    Class-Sequence in use: default
    Class Name: default
    BW Updated: - ago
    Reason for Latest Route Change: Uncontrolled to Controlled Transition
    Route Change History:
      Date and Time  Previous Exit  Current Exit  Reason
      1: 02:17:12 (UTC) 05/01/17  None(0:0 0:0)/0.0.0.0/None (Ch:0)  INET(1:2 0:0)/10.10.2.254/Tu20 (Ch:743)  Uncontrolled to Controlled Transition

Notes on the output:
- Default DSCP isn't configured as a monitored class in the policy.
- This is an Internet TC that is not monitored by Performance Monitor/PfR.
- PfR placed this TC on the INET path based on load calculation.

Reviewing Exit Interface Utilization

  ISR4K_MC_Branch1#show domain iwan master exits
  BR address: 10.10.2.254
  Name: Tunnel10 type: external Path: path-id: 0 PLR TCs: 0
  Egress capacity: 15000 Kbps Egress BW: 275 Kbps Ideal:252 Kbps over: 23 Kbps Egress Utilization: 0 %
  DSCP: af11[10]-number of Traffic Classes[1]
  DSCP: af12[12]-number of Traffic Classes[1]
  DSCP: af13[14]-number of Traffic Classes[1]
  DSCP: ef[46]-number of Traffic Classes[1]

  BR address: 10.10.2.254
  Name: Tunnel20 type: external Path: INET path-id: 0 PLR TCs: 0
  Egress capacity: 20000 Kbps Egress BW: 229 Kbps Ideal:252 Kbps under: 23 Kbps Egress Utilization: 0 %
  DSCP: default[0]-number of Traffic Classes[12]
  DSCP: cs1[8]-number of Traffic Classes[1]
  DSCP: af22[20]-number of Traffic Classes[1]
  DSCP: af23[22]-number of Traffic Classes[1]

Notes on the output:
Egress capacity: configured tunnel bandwidth.
Egress BW: calculated egress bandwidth.
Ideal: average of egress BW on all paths.
Egress Utilization: calculated egress utilization as a percentage of configured bandwidth.
DSCP lines: active DSCPs and the number of TCs for each on the path.
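The Ideal and over/under figures in the exits output are consistent with a plain average of egress bandwidth across the paths. A minimal Python sketch, assuming exactly that arithmetic (the function name is hypothetical, and PfR's internal calculation may differ in detail):

```python
def ideal_and_deviation(egress_kbps: dict[str, int]) -> dict[str, tuple[float, float]]:
    """Return {exit: (ideal, deviation)} where ideal is the mean egress BW
    and deviation is positive when the exit is over the ideal."""
    ideal = sum(egress_kbps.values()) / len(egress_kbps)
    return {name: (ideal, bw - ideal) for name, bw in egress_kbps.items()}


# Egress BW values taken from the show output above.
result = ideal_and_deviation({"Tunnel10": 275, "Tunnel20": 229})
for name, (ideal, deviation) in result.items():
    side = "over" if deviation > 0 else "under"
    print(f"{name}: ideal {ideal:.0f} Kbps, {side} {abs(deviation):.0f} Kbps")
```

With the values from the slide this gives an ideal of 252 Kbps, with Tunnel10 over by 23 Kbps and Tunnel20 under by 23 Kbps, matching the displayed output.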

Troubleshooting Load-Balancing: Scenario

Load-balance has been configured to send all non-policy traffic over INET with a fallback of routing. Once the site migration is complete, there will be a tertiary link used as a backup for Internet traffic. All enterprise traffic will be marked with a DSCP that is controlled.

A test flow has been created at the branch, but the TC that was created never moves to a controlled state.

  ISR4K_MC_Branch2#show domain iwan master traffic-classes dscp default
  Dst-Site-Prefix: 11.14.13.0/24 DSCP: default [0] Traffic class id:22
  Clock Time: 14:17:17 (UTC) 05/03/2017
  TC Learned: 00:04:16 ago
  Present State: UN-CONTROLLED at border 10.10.3.254 (An attempt to control will be made 00:00:16 later)
  Destination Site ID: Internet
  Class-Sequence in use: default
  Class Name: default
  BW Updated: - ago
  Route Change History:

Notes on the output:
The TC was learned over 4 minutes ago.
The TC matches the default class.
The empty route change history shows the TC has never been controlled.

Verify Active Configuration

[Topology diagram. Labels: Transit MC (10.10.200.254), four BRs, Tunnel10, INET Tunnel20, (10.10.3.254)]

Verify Active Configuration

  ISR4K_MC_Branch2#show domain iwan master policy | sec default
  class default
    path-preference INET fallback routing
    match dscp all
    Number of Traffic classes using this policy: 1

  ISR4K_MC_Branch2#show domain iwan master status | sec Load
  Load Balancing:
    Operational Status: Up
    Max Calculated Utilization Variance: 0%
    Last load balance attempt: never
    Last Reason: Variance less than 20%
    Total unbalanced bandwidth:
      External links: 0 Kbps Internet links: 0 Kbps
  Load Sharing: Enabled

Notes on the output:
Load-balancing has been enabled on the hub MC.
Path-preference is correctly defined.
The uncontrolled TC is matching the policy.

Necessary Components: Site-ID and Channels

The 255.255.255.255 site-id is used to designate Internet TCs. It signifies that no site owns the destination prefix. It should be learned when PfR comes up on the branch, and it must be present in order to build an Internet TC.

  ISR4K_MC_Branch2#$ iwan master discovered-sites sec 255.255.255.255
  Site ID: 255.255.255.255
  Site Discovered:1d00h ago
  DSCP :default[0]-number of traffic classes[1][1]
  DSCP :af11[10]-number of traffic classes[0][0]
  DSCP :af12[12]-number of traffic classes[0][0]
  Site Traffic Classes: 1

  ISR4K_MC_Branch2#show domain iwan master channels | sec Internet
  ISR4K_MC_Branch2#

Notes on the output (branch MC/BR 10.10.3.254):
The TC that was created correctly shows as an active TC for this site-id.
No Internet channels are present! Without a channel, the TC will never be controlled.
Why don't we see a channel?

Check Default Routing

  ISR4K_MC_Branch2#show ip eigrp topology 0.0.0.0/0 | in Tunnel
  172.17.10.2 (Tunnel10), from 172.17.10.2, Send flag is 0x0
  172.17.10.1 (Tunnel10), from 172.17.10.1, Send flag is 0x0

No routes are learned via Tunnel20 in EIGRP! We need to check the INET configurations.

  ISR4K_MC_Branch2#show ip route 0.0.0.0
  Routing entry for 0.0.0.0/0, supernet
    Known via "eigrp 10", distance 170, metric 158976000, candidate default path
    Tag 103, type external
    Redistributing via eigrp 10
    Last update from 172.17.10.2 on Tunnel10, 00:00:07 ago
    Routing Descriptor Blocks:
      172.17.10.2, from 172.17.10.2, 00:00:07 ago, via Tunnel10
        Route metric is 158976000, traffic share count is 1
        Total delay is 310000 microseconds, minimum bandwidth is 20000 Kbit
        Reliability 255/255, minimum MTU 1400 bytes
        Loading 1/255, Hops 2
        Route tag 103
    * 172.17.10.1, from 172.17.10.1, 00:00:07 ago, via Tunnel10
        Route metric is 158976000, traffic share count is 1
        Total delay is 310000 microseconds, minimum bandwidth is 20000 Kbit
        Reliability 255/255, minimum MTU 1400 bytes
        Loading 1/255, Hops 2
        Route tag 101

The preferred routes are via Tunnel10! Recall that path-preference is for INET with a fallback of routing. Without a preferred route out of one of the INET paths, PfR will not be able to build and control this traffic!
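The check made on this slide (does the installed default route leave through a tunnel that belongs to the preferred path?) can be automated against captured output. The Python sketch below is an illustration only: the function name is hypothetical, the mapping of the INET path to Tunnel20 is taken from the exits output earlier in the deck, and the two samples are abridged from the routing outputs shown in this section.

```python
import re


def route_uses_preferred_path(show_ip_route: str, preferred_tunnels: set[str]) -> bool:
    """True if any next hop in the route output egresses a preferred tunnel."""
    egress = set(re.findall(r"\bvia (Tunnel\d+)", show_ip_route))
    return bool(egress & preferred_tunnels)


# Abridged from the output above: both next hops are on Tunnel10.
ROUTE_BEFORE = """\
Routing entry for 0.0.0.0/0, supernet
  Known via "eigrp 10", distance 170, metric 158976000, candidate default path
  Routing Descriptor Blocks:
    172.17.10.2, from 172.17.10.2, 00:00:07 ago, via Tunnel10
  * 172.17.10.1, from 172.17.10.1, 00:00:07 ago, via Tunnel10
"""

# Abridged from a later slide: the default route is learned on Tunnel20.
ROUTE_AFTER = """\
Routing entry for 0.0.0.0/0, supernet
  Known via "eigrp 10", distance 170, metric 215381333, candidate default path
  Routing Descriptor Blocks:
  * 172.17.20.2, from 172.17.20.2, 00:00:01 ago, via Tunnel20
"""

INET_TUNNELS = {"Tunnel20"}  # assumption: INET path maps to Tunnel20
before_ok = route_uses_preferred_path(ROUTE_BEFORE, INET_TUNNELS)
after_ok = route_uses_preferred_path(ROUTE_AFTER, INET_TUNNELS)
print(before_ok, after_ok)
```

The pattern only matches "via TunnelN", so lines such as `Known via "eigrp 10"` are ignored.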

Check Configuration on Hub Border Routers

Legacy configuration!

  DC1_INET_BR#show run | in ip route
  ip route 0.0.0.0 0.0.0.0 Null0

  DC1_INET_BR#show ip route 0.0.0.0
  Routing entry for 0.0.0.0/0, supernet
    Known via "static", distance 1, metric 0 (connected), candidate default path
    Routing Descriptor Blocks:
    * directly connected, via Null0
        Route metric is 0, traffic share count is 1

  DC1_INET_BR#show run | in ip route
  ip route 0.0.0.0 0.0.0.0 Null0

  DC2_INET_BR#show ip route 0.0.0.0
  Routing entry for 0.0.0.0/0, supernet
    Known via "static", distance 1, metric 0 (connected), candidate default path
    Routing Descriptor Blocks:
    * directly connected, via Null0
        Route metric is 0, traffic share count is 1

Check Configuration on Hub Border Routers

Legacy configuration!

  DC1_INET_BR#show run | in ip route
  ip route 0.0.0.0 0.0.0.0 Null0

  DC1_INET_BR#show ip route 0.0.0.0
  Routing entry for 0.0.0.0/0, supernet
    Known via "static", distance 1, metric 0 (connected), candidate default path
    Routing Descriptor Blocks:
    * directly connected, via Null0
        Route metric is 0, traffic share count is 1

  DC1_INET_BR#show run | in ip route
  ip route 0.0.0.0 0.0.0.0 Null0

  DC2_INET_BR#show ip route 0.0.0.0
  Routing entry for 0.0.0.0/0, supernet
    Known via "static", distance 1, metric 0 (connected), candidate default path
    Routing Descriptor Blocks:
    * directly connected, via Null0
        Route metric is 0, traffic share count is 1

  ISR4K_MC_Branch2#show ip eigrp topology 0.0.0.0/0 | in Tunnel
  172.17.20.2 (Tunnel20), from 172.17.20.2, Send flag is 0x0
  172.17.10.1 (Tunnel10), from 172.17.10.1, Send flag is 0x0
  172.17.10.2 (Tunnel10), from 172.17.10.2, Send flag is 0x0
  172.17.20.1 (Tunnel20), from 172.17.20.1, Send flag is 0x0

Default route is now learned via both INET Border Routers!

Check Traffic-Class State

  ISR4K_MC_Branch2#show domain iwan master traffic-classes dscp default
  Dst-Site-Prefix: 11.14.13.0/24 DSCP: default [0] Traffic class id:28
  Clock Time: 14:49:44 (UTC) 05/03/2017
  TC Learned: 00:10:13 ago
  Present State: CONTROLLED
  Current Performance Status: not monitored (internet)
  Current Service Provider: INET since 00:07:35
  Previous Service Provider: Unknown
  BW Used: 105 Kbps
  Present WAN interface: Tunnel20 in Border 10.10.3.254
  Present Channel (primary): 878 INET pfr-label:1:2 0:0 [0x1020000]
  Backup Channel: none
  Destination Site ID: Internet
  Class-Sequence in use: default
  Class Name: default
  BW Updated: 00:00:13 ago
  Reason for Latest Route Change: Uncontrolled to Controlled Transition
  Route Change History:
    Date and Time Previous Exit Current Exit Reason
    1: 14:42:08 (UTC) 05/03/17 None(0:0 0:0)/0.0.0.0/None (Ch:0) INET(1:2 0:0)/10.10.3.254/Tu20 (Ch:878) Uncontrolled to Controlled Transition

Notes on the output:
Although the TC is now controlled, it is taking the wrong path, towards the transit DC!
PfR label 1:2 = POP ID 1 : Path-ID 2.
Traffic is controlled towards the transit site! Why?
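The bracketed hex value next to each pfr-label appears to pack the POP ID and Path-ID into one integer. The Python sketch below decodes it under an inferred layout: the two samples in this deck (1:2 shown as 0x1020000 and 0:2 shown as 0x20000) are consistent with POP ID in bits 31-24 and Path-ID in bits 23-16. The layout of the second pair (always 0:0 here) is purely an assumption, and the function name is hypothetical.

```python
def decode_pfr_label(value: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Split a 32-bit pfr-label into ((pop_id, path_id), (second_pair)).

    Byte layout is inferred from two sample labels only, not from
    Cisco documentation.
    """
    first = ((value >> 24) & 0xFF, (value >> 16) & 0xFF)
    second = ((value >> 8) & 0xFF, value & 0xFF)  # assumed layout
    return first, second


transit = decode_pfr_label(0x1020000)  # label shown as 1:2 0:0
primary = decode_pfr_label(0x20000)    # label shown as 0:2 0:0
print(transit, primary)
```

Read this way, both channels use Path-ID 2 (the INET path) and differ only in POP ID, which is what distinguishes the transit DC from the primary DC in the outputs.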

Check Prefix State on Channels

  ISR4K_MC_Branch2#show domain iwan master channels | sec Internet
  Channel Id: 878 Dst Site-Id: Internet Link Name: INET DSCP: default [0] pfr-label: 1:2 0:0 [0x1020000] TCs: 1 BackupTCs: 0
    Channel Created: 00:02:57 ago
    Provisional State: Initiated and open
    Operational state: Available
    Channel to hub: TRUE
    <OUTPUT OMITTED>
    Site Prefix List
      DEFAULT (Active)
  Channel Id: 880 Dst Site-Id: Internet Link Name: INET DSCP: default [0] pfr-label: 0:2 0:0 [0x20000] TCs: 0 BackupTCs: 0
    Channel Created: 00:02:56 ago
    Provisional State: Initiated and open
    Operational state: Available
    Channel to hub: TRUE
    <OUTPUT OMITTED>
    Site Prefix List
      DEFAULT (Standby)

Notes on the output:
The default route is Active for the channel to the transit DC.
However, the default route is Standby for the primary DC.
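When many channels are present, the Active/Standby state of the DEFAULT prefix can be pulled out of captured output with a short script. This Python sketch is illustrative only: the function name is hypothetical, the sample is abridged from the output on this slide, and the real line layout on a given IOS-XE release may differ.

```python
import re

CHANNEL_RE = re.compile(r"Channel Id:\s*(\d+)")
LABEL_RE = re.compile(r"pfr-label:\s*(\d+:\d+)")
STATE_RE = re.compile(r"DEFAULT \((Active|Standby)\)")


def default_prefix_states(output: str) -> dict[int, tuple[str, str]]:
    """Map channel id -> (pfr-label, state of the DEFAULT site prefix)."""
    states = {}
    # Split just before each "Channel Id:" so each chunk is one channel.
    for chunk in re.split(r"(?=Channel Id:)", output):
        chan = CHANNEL_RE.search(chunk)
        label = LABEL_RE.search(chunk)
        state = STATE_RE.search(chunk)
        if chan and label and state:
            states[int(chan.group(1))] = (label.group(1), state.group(1))
    return states


SAMPLE = """\
Channel Id: 878 Dst Site-Id: Internet Link Name: INET DSCP: default [0] pfr-label: 1:2 0:0 [0x1020000] TCs: 1 BackupTCs: 0
  Site Prefix List
    DEFAULT (Active)
Channel Id: 880 Dst Site-Id: Internet Link Name: INET DSCP: default [0] pfr-label: 0:2 0:0 [0x20000] TCs: 0 BackupTCs: 0
  Site Prefix List
    DEFAULT (Standby)
"""

states = default_prefix_states(SAMPLE)
print(states)
```

For the sample, channel 878 (label 1:2) holds the Active default prefix and channel 880 (label 0:2) holds the Standby one, which is the condition the slide flags.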

Check Prefix State on Channels

  ISR4K_MC_Branch2#show domain iwan master channels | sec Internet
  Channel Id: 878 Dst Site-Id: Internet Link Name: INET DSCP: default [0] pfr-label: 1:2 0:0 [0x1020000] TCs: 1 BackupTCs: 0
    Channel Created: 00:02:57 ago
    Provisional State: Initiated and open
    Operational state: Available
    Channel to hub: TRUE
    <OUTPUT OMITTED>
    Site Prefix List
      DEFAULT (Active)
  Channel Id: 880 Dst Site-Id: Internet Link Name: INET DSCP: default [0] pfr-label: 0:2 0:0 [0x20000] TCs: 0 BackupTCs: 0
    Channel Created: 00:02:56 ago
    Provisional State: Initiated and open
    Operational state: Available
    Channel to hub: TRUE
    <OUTPUT OMITTED>
    Site Prefix List
      DEFAULT (Standby)

The default route is Active for the channel to the transit DC. However, the default route is Standby for the primary DC.

  ISR4K_MC_Branch2#show ip route 0.0.0.0
  Routing entry for 0.0.0.0/0, supernet
    Known via "eigrp 10", distance 170, metric 215381333, candidate default path
    Tag 104, type external
    Redistributing via eigrp 10
    Last update from 172.17.20.2 on Tunnel20, 00:00:01 ago
    Routing Descriptor Blocks:
    * 172.17.20.2, from 172.17.20.2, 00:00:01 ago, via Tunnel20
        Route metric is 215381333, traffic share count is 1
        Total delay is 420000 microseconds, minimum bandwidth is 15000 Kbit
        Reliability 255/255, minimum MTU 1400 bytes
        Loading 1/255, Hops 2
        Route tag 104

The default route with the best metric is learned via the transit DC.

After investigation on the hub side, incorrect routing metrics are found and corrected.

Check for Path Change

  *May 3 14:57:28.276: %DOMAIN-5-TC_PATH_CHG: Traffic class Path Changed. Details: Instance=0: VRF=default: Source Site ID=10.10.3.254: Destination Site ID=255.255.255.255: Reason=Backup to Primary path preference transition: TCA-ID=1273: Policy Violated=None: TC=[Site id=255.255.255.255, TC ID=28, Site prefix=11.14.13.0/24, DSCP=default(0), App ID=0]: Original Exit=[CHAN-ID=878, BR-IP=10.10.3.254, DSCP=default[0], Interface=Tunnel20, Path=[label=0:0 0:0 [0x0]]]: New Exit=[CHAN-ID=880, BR-IP=10.10.3.254
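Path-change messages like the one above are easy to mine from collected syslog. The Python sketch below extracts the fields most useful for troubleshooting. It is an illustration, not a complete parser: the function name and returned keys are hypothetical, and the sample message is the (truncated) one from this slide.

```python
import re
from typing import Optional

LOG = (
    "*May 3 14:57:28.276: %DOMAIN-5-TC_PATH_CHG: Traffic class Path Changed. "
    "Details: Instance=0: VRF=default: Source Site ID=10.10.3.254: "
    "Destination Site ID=255.255.255.255: "
    "Reason=Backup to Primary path preference transition: TCA-ID=1273: "
    "Policy Violated=None: TC=[Site id=255.255.255.255, TC ID=28, "
    "Site prefix=11.14.13.0/24, DSCP=default(0), App ID=0]: "
    "Original Exit=[CHAN-ID=878, BR-IP=10.10.3.254, DSCP=default[0], "
    "Interface=Tunnel20, Path=[label=0:0 0:0 [0x0]]]: "
    "New Exit=[CHAN-ID=880, BR-IP=10.10.3.254"
)


def parse_path_change(line: str) -> Optional[dict]:
    """Return key fields of a TC_PATH_CHG message, or None for other lines."""
    if "%DOMAIN-5-TC_PATH_CHG" not in line:
        return None
    reason = re.search(r"Reason=(.*?): TCA-ID=", line)
    tc_id = re.search(r"TC ID=(\d+)", line)
    prefix = re.search(r"Site prefix=([\d./]+)", line)
    # Channel IDs appear in order: original exit first, then new exit.
    channels = [int(c) for c in re.findall(r"CHAN-ID=(\d+)", line)]
    return {
        "reason": reason.group(1) if reason else None,
        "tc_id": int(tc_id.group(1)) if tc_id else None,
        "prefix": prefix.group(1) if prefix else None,
        "channels": channels,
    }


parsed = parse_path_change(LOG)
print(parsed)
```

The channel list gives the original and new exits (878 then 880 here), which can be matched against the channel IDs in the master channels output.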

Verify Correct Forwarding State of TC

  ISR4K_MC_Branch2#show domain iwan master traffic-classes dscp default
  Dst-Site-Prefix: 11.14.13.0/24 DSCP: default [0] Traffic class id:28
  Clock Time: 14:57:30 (UTC) 05/03/2017
  TC Learned: 00:17:59 ago
  Present State: CONTROLLED
  Current Performance Status: not monitored (internet)
  Current Service Provider: INET since 00:15:21
  Previous Service Provider: INET pfr-label: 1:2 0:0 [0x1020000] for 919 sec
  BW Used: 0 Kbps
  Present WAN interface: Tunnel20 in Border 10.10.3.254
  Present Channel (primary): 880 INET pfr-label:0:2 0:0 [0x20000]
  Backup Channel: none
  Destination Site ID: Internet
  Class-Sequence in use: default
  Class Name: default
  BW Updated: 00:00:29 ago
  Reason for Latest Route Change: Backup to Primary path preference transition
  Route Change History:
    Date and Time Previous Exit Current Exit Reason
    1: 14:57:28 (UTC) 05/03/17 INET(1:2 0:0)/10.10.3.254/Tu20 (Ch:878) INET(0:2 0:0)/10.10.3.254/Tu20 (Ch:880) Backup to Primary path preference transition
    2: 14:42:08 (UTC) 05/03/17 None(0:0 0:0)/0.0.0.0/None (Ch:0) INET(1:2 0:0)/10.10.3.254/Tu20 (Ch:878) Uncontrolled to Controlled Transition

  *May 3 14:57:28.276: %DOMAIN-5-TC_PATH_CHG: Traffic class Path Changed. Details: Instance=0: VRF=default: Source Site ID=10.10.3.254: Destination Site ID=255.255.255.255: Reason=Backup to Primary path preference transition: TCA-ID=1273: Policy Violated=None: TC=[Site id=255.255.255.255, TC ID=28, Site prefix=11.14.13.0/24, DSCP=default(0), App ID=0]: Original Exit=[CHAN-ID=878, BR-IP=10.10.3.254, DSCP=default[0], Interface=Tunnel20, Path=[label=0:0 0:0 [0x0]]]: New Exit=[CHAN-ID=880, BR-IP=10.10.3.254

Notes on the output:
The current label is 0:2, indicating that this is destined to the primary DC.
The path change reason matches what was seen in the log message.

Transit-Site Affinity

This feature allows a branch to prefer a site (hub DC or transit DC) based on the routing metrics advertised by those DCs.

With Transit-Site Affinity enabled, path-preference is followed for all TCs towards the same site, even if the primary path drops and the next-best routing metric is on the same path but to a different site.

This feature is enabled by default and can be enabled or disabled using the following CLI on the hub MC:

  domain iwan
   vrf <name>
    master hub
     advanced
      [no] transit-site-affinity

Transit-Site Affinity: Illustrated

[Topology diagram. Labels: 10.50.10.0/24, Transit MC (10.10.200.254), four BRs, Tunnel10, INET Tunnel20, (10.10.3.254)]

Policy shown on the diagram:

  class INTRANET sequence 30
    path-preference INET fallback
    class type: Dscp Based
      match dscp af22 policy best-effort

Sequence shown across the slide builds:
1. The installed route in the RIB is for the INET tunnel to the transit DC.
2. The path drops, goes out of policy, or the route is removed on the INET tunnel!
3. The new best route remains on the INET path but to the primary DC...
4. ... but Transit-Site Affinity is enabled! Assuming Tunnel20 is available and in-policy and a route exists on this path, PfR will place the traffic on the fallback path and continue to prefer the transit DC!

Improving Load-Balance Granularity

In some instances, large-bandwidth flows that are load-balanced all fall within the same prefix window. When this happens, a single TC may encompass all of this traffic and will significantly utilize only one of the available paths.

The minimum mask length for load-balanced TCs created by PfR can be modified on the hub MC to achieve more (or less) granularity. This works by informing PfR of the mask to use when creating an Internet TC. PfR calculates the subnet internally, based on the destination IP of a flow.

  domain iwan
   vrf <name>
    master hub
     advanced
      minimum-mask-length <#>

  ISR4K_BR_Branch1#show domain iwan master status
  ...
  Minimum Mask Length: 28

A minimum mask length of 28 is the default for IWAN 2.1.1.
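The effect of the minimum mask length on granularity can be seen with a few lines of Python. This sketch assumes PfR derives the Internet TC prefix by simply masking the flow's destination IP to the configured length; the function name and the host addresses are hypothetical (chosen to fall near the 11.10.182.64/28 prefix seen earlier in the deck).

```python
import ipaddress


def internet_tc_prefix(dst_ip: str, min_mask_len: int) -> ipaddress.IPv4Network:
    """Network that a destination IP falls into at the given mask length."""
    return ipaddress.ip_network(f"{dst_ip}/{min_mask_len}", strict=False)


flows = ["11.10.182.70", "11.10.182.90"]  # hypothetical destinations

# With /28, the two flows land in different prefixes (two separate TCs).
tcs_28 = {str(internet_tc_prefix(ip, 28)) for ip in flows}

# With /24, both flows collapse into one prefix (a single TC).
tcs_24 = {str(internet_tc_prefix(ip, 24)) for ip in flows}

print(sorted(tcs_28), sorted(tcs_24))
```

A longer mask yields more, smaller traffic classes that can be spread across paths; a shorter mask yields fewer, larger ones.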

Takeaways

Key Takeaways

Performance Routing, Routing, Routing!
Know Your Channels
Automate Using Existing Tools When Needed
Turn IWAN into I WIN!

Troubleshooting Workflow: Interface Discovery

What is Known? / What to Check

1. Does the branch MC have a SAF adjacency with the hub MC?
   Ensure the correct hub MC IP is defined in the domain configuration. Check show eigrp service-family ipv4 neighbor on the local MC and hub MC to see if a neighbor is formed. Verify Q count is zero.

2. Does the hub MC show the branch MC as a discovered site?
   Check show domain <name> master discovered-sites on the hub MC and look for the branch's site-id.

3. Has each hub BR learned the branch site-id as well?
   Check show domain <name> border site-capability on the hub BR and look for the branch's site-id.

4. Is the hub BR building a channel on the default DSCP to the branch with the right next-hop?
   Check show domain <name> border parent-route and look for a route installed to the branch site-id. Check show domain <name> border channels dst-site-id <branch site-id> to confirm the channel is present and active with the correct next-hop programmed.

5. Does the branch have a channel to the hub on the default DSCP?
   Check show domain <name> border channels dscp default and look for a channel to the hub's site-id with the right next-hop programmed. If the hub BR's channel looks correct, run packet captures on the branch BR to confirm whether smart probes are received.

Troubleshooting Workflow: Minimum Requirement

What is Known? / What to Check

1. Is the MC operational?
   Verify the MC has a complete configuration and that it is operational with show domain <name> master status.

2. Is the problem with a single branch, multiple branches, or all of them?
   Use the show eigrp service-family ipv4 neighbor command on the MC. What is different about the sites that are not working? If none are working, is there a routing or configuration problem in PfR?

3. Does connectivity exist between sites?
   Verify routing entries are correct and all sites are using their tunnel interfaces to reach the hub MC. Are there any access-lists or firewalls that would block EIGRP SAF or TCP 17749?

4. Are SAF updates sent and received successfully?
   Check for a non-zero output queue in show eigrp service-family ipv4 neighbor detail. EPC can be used to capture EIGRP updates and acks, as can EIGRP packet debugging. Verify no lost fragments or MTU problems exist between peers.

5. What is PfR missing? Is it in the service-routing DB? Is it in the EIGRP SAF topology table?
   Verify that services are subscribed and published as expected in show domain <name> (master | border) peering. Verify the service updates are present in show service-routing database and show eigrp service-family ipv4 topology.

Troubleshooting Workflow: Traffic-Class Learning

What is Known? / What to Check

1. The minimum requirement is met and a TCP 17749 socket is established between BR and MC.
   Verify the MC is operational with show domain <name> master status. For the TCP session, verify with show tcp brief.

2. The MC has a PfR policy and a class exists for the DSCP of the flow.
   The policy configuration can be seen with show running-config on the hub MC, and with show domain <name> master policy on the branch MC.

3. PMIs are present on the BR WAN tunnels.
   PMI is learned from the MC and can be viewed with show domain <name> border pmi.

4. Borders have discovered WAN interfaces and there is available bandwidth.
   On the hub MC, check show domain <name> master exits.

5. A route is present to cause the traffic flow to egress the tunnel.
   Verify the routing table on the border routers with show ip route w.x.y.z.

6. Channels are created to the Dst-Site-ID and the channels are reachable.
   On the MC, check show domain <name> master channels, show domain <name> border channels, and show domain <name> border parent-route.

Troubleshooting Workflow: OOP TCA

What is Known? / What to Check

1. Where is the TCA sourced from and on what path?
   Check show log and find the TCA message. Look for the Origin and determine if this IP belongs to the local site (L) or remote site (R). Check the path the traffic was on when the TCA was generated.

2. How often does this happen?
   Identify the DSCP and destination site-prefix from the TCA/path change messages. Check show domain <name> master traffic-class dst-site-pfx <prefix> | sec <dscp> and check the Route Change history.

3. Does this happen frequently enough that AVC/EPC can be configured and manually monitored?
   If the Route Change history indicates frequent transitions during business hours, configure AVC/EPC on the TCA-generating end. Wait for traffic to return to the affected path and actively monitor to determine the flow(s) showing loss, delay, jitter, etc.

4. If not, use automation for data capture when the problem occurs again.
   Configure debug domain <name> border tca and an EEM script to match the TCA condition and capture data when the problem occurs.

5. Is it clear that the OOP condition is on the WAN side?
   Configure debug domain <name> border tca and an EEM script on the TCA-receiving end, with AVC on the LAN side to capture traffic statistics as well.

Troubleshooting Workflow: Unreachable TCA

What is Known? / What to Check

1. What is the value configured for the unreachability timer?
   Check show domain <name> border status.

2. Which site initially detected unreachability?
   Check show log and look for the TCA and the origin: (R)emote or (L)ocal. If it's not clear, check the channel on each side to see which BR has marked Rx Reachability as Un-reachable.

3. What does the border channel show on each end?
   Check show domain <name> border channel dscp <DSCP>. Look for the time the last probe and data packet were sent/received.

4. Is each site sending traffic to the correct next-hop?
   Check show domain <name> border channel dscp <DSCP> and verify the programmed next-hop matches the correct route in the RIB. If it doesn't, check show ip route and show domain <name> border parent-route.

5. Has the traffic pattern been validated?
   Use EPC (with EEM if required) to capture all data and smart-probe traffic leading up to a TCA.

Troubleshooting Workflow: Load-Balance

What is Known? / What to Check

1. Is load-balance configured and active?
   Check show domain <name> master policy on the local MC. Ensure the default class is present with a match dscp all to catch all other DSCP values.

2. Is a default route (or a route for the destination) present in the RIB of the BR?
   Check show ip route on the BR and verify that a route exists covering the destination of the traffic to be load-balanced. If path-preference is configured, ensure that the installed RIB route points out of at least one of the PfR paths defined in the path-preference.

3. If the destination falls within the enterprise boundary and is for a non-monitored DSCP, does a site-prefix exist?
   Check show domain <name> master site-prefix on the local MC to confirm that a site-prefix exists to cover the destination. Ensure the enterprise prefix-list is configured by checking for Top-Level (T-flag) prefixes in the database.

4. Are channels built, and is the destination route prefix listed and active on one of the channels?
   Check show domain <name> master channels and ensure the channel is present with the prefix as (Active) on at least one channel.

5. If traffic is controlled and load-balanced but to the wrong DC, are routing metrics defined properly?
   Check show ip route, show ip eigrp topology (if relevant), or show ip bgp (if relevant). Confirm routing metrics to ensure the active route points to the correct DC on the correct path.

Recommended Optimizations

connection-keepalive-timer
  Configured under advanced mode on the hub MC.
  Improves convergence when an independent BR loses connectivity to the MC (indirect link failure, etc.) by allowing the BR to time out all TCs after the keepalive hold-time expires.
  Recommended value is 5 seconds (hold-time is 3x = 15 seconds).

channel-unreachable-timer
  If utilizing IWAN 2.1.1, recommended value is 4 seconds.
  Please see the TCA Unreachable section for reference on configuration.

Smart Probe Tuning
  IWAN 2.2 introduces Probe Reduction, which allows for defining lower probe rates.
  Recommended in designs with branches that utilize low-bandwidth links (e.g. T1 speeds).
  http://www.cisco.com/c/en/us/td/docs/ios-xml/ios/pfrv3/configuration/xe-16/pfrv3-xe-16-book/pfrprobe-reduction.html
