Managing the Management Switches
Erik Ruiter, SURFsara
Cumulus Meetup, Amsterdam 2017


Outline
1. Old vs. new situation
2. Technologies used (Ansible / Cumulus)
3. Ansible examples
4. Results / What's next?
5. Short demo (if time permits)

Background (high-level design of the SURFsara network)
SURFsara provides many services, for instance:
- Supercomputing (Cartesius)
- HPC cloud
- Hadoop
- Grid computing / storage
All services are operated as clusters and are accessible to personnel through a management network. This network is used for:
- Stepping-stone LAN
- DRAC / IPMI access
- Backup traffic
- Server monitoring
Routing is done by a firewall.

Previous management network
- Left-over switches from past 1G to 10G upgrades
- Unsupported switches: up to 10 years old
- Hand-maintained, vendor-proprietary CLIs
Vendors in use:
- Arista
- Brocade (and also Foundry)
- Cisco (7 different models)
- Dell (8 different models)
- Juniper
- Nortel (unmanaged)
- Supermicro (unmanaged)

Desired management network
Physical requirements:
- Low(er) power switches
- Redundant, swappable power supplies and fans
- Back-to-front airflow
Operational requirements:
- Standard CLI for all switches
- Automated configuration management / ZTP
- Mostly layer 2 requirements
[Diagram: core (fiber) management switches 1 and 2 as an MLAG / VC pair with 20 Gbps links; hubs A and B attached with 2 Gbps LACP (1000BASE-T); core (UTP) management switches 3 and 4 attached with 20 Gbps LACP over 10GBASE-SR multimode fiber; EoR switches, ToR switches and EoR console servers on 1000BASE-T; serial consoles via RJ45 (RS232); management VLAN 800, SURFSARA-NW-MGMT.]

Resulting management network
Tender for 78 switches, February to April 2016, won by Dell:
- Core switches: Dell S4048-ON
- Top of Rack and End of Row switches: Dell S3048-ON
- Console servers for OOB support: Opengear CM7148
Design:
- Core switches configured as an MLAG pair
- EoR switches connected to the cores using LACP (10 Gbps optical)
- ToR switches connected to an EoR switch using a single uplink (1 Gbps UTP)
- Bare-metal ("white label") switches using the ONIE boot loader
- All switches run Cumulus Linux as network OS
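Cumulus Linux implements MLAG through the clagd daemon, configured with ifupdown2 attributes on a peer-link subinterface. A minimal Python sketch, in the spirit of the templating done by the cl-interface role, renders the peer-link part of the configuration for both core switches and shows that the two files mirror each other. Everything here is an assumption for illustration: the attribute names follow the Cumulus Linux 3.x documentation (check them against your release), the peerlink.4094 name is a convention rather than a requirement, and all ports, addresses and the system MAC are hypothetical.

```python
"""Sketch: mirrored MLAG (clagd) peer-link stanzas for a core switch pair.

All values are hypothetical placeholders (port names, link-local peer
addresses, backup IPs, system MAC), not SURFsara's actual configuration.
"""
from dataclasses import dataclass
from typing import List


@dataclass
class MlagPeer:
    hostname: str
    peerlink_ip: str  # this switch's address on peerlink.4094
    mgmt_ip: str      # used by the *other* switch as clagd backup IP


def peerlink_stanza(me: MlagPeer, peer: MlagPeer, members: List[str],
                    sys_mac: str) -> str:
    """Return ifupdown2 stanzas for the peer-link bond and its 4094 subinterface."""
    return "\n".join([
        f"# {me.hostname}: peer link to {peer.hostname}",
        "auto peerlink",
        "iface peerlink",
        f"    bond-slaves {' '.join(members)}",
        "",
        "auto peerlink.4094",
        "iface peerlink.4094",
        f"    address {me.peerlink_ip}/30",
        # Each side points at the other switch, so the two files mirror each other.
        f"    clagd-peer-ip {peer.peerlink_ip}",
        f"    clagd-backup-ip {peer.mgmt_ip}",
        # Both peers must present the same system MAC to their LACP partners.
        f"    clagd-sys-mac {sys_mac}",
    ]) + "\n"


core1 = MlagPeer("core-1", peerlink_ip="169.254.1.1", mgmt_ip="192.0.2.1")
core2 = MlagPeer("core-2", peerlink_ip="169.254.1.2", mgmt_ip="192.0.2.2")
SYS_MAC = "44:38:39:ff:00:01"  # placeholder, shared by both peers
config_core1 = peerlink_stanza(core1, core2, ["swp53", "swp54"], SYS_MAC)
config_core2 = peerlink_stanza(core2, core1, ["swp53", "swp54"], SYS_MAC)
print(config_core1)
```

Each peer points clagd-peer-ip at the other side's peer-link address and both present the same clagd-sys-mac, which is what lets the EoR switches treat the pair as a single LACP partner.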

Bare-metal ("white label") switches
- Decoupling of network operating system and hardware
- Similar to the rise of Linux and Windows on the IBM-compatible PC
- Driven by cheap top-of-rack switches in big datacentres (e.g. Facebook Wedge / Backpack)
- Boot loader: ONIE
- Hardware: Dell, Edge-core (Accton), Quanta, Penguin, Mellanox, ...
- Software: Cumulus Linux, Big Switch, PicOS (Pica8), Pluribus, Dell OS10, Microsoft SONiC, etc.
Freedom of choice!
Network processors: Broadcom Apollo2, Firebolt3, Helix4, Tomahawk, Trident, Trident+, Trident2, Trident2+ and Triumph2; Mellanox Spectrum; Cavium

Cumulus Linux
- Linux on a switch
- No more vendor-proprietary CLI
- Install your own software when desired
- Manageable using many configuration management systems
- Uses the switchd daemon to communicate with the Broadcom ASIC
- SURFsara had already gained experience with Cumulus in some previous projects

Port configuration: Cumulus vs. Cisco
Note: Cumulus Linux 3.2 introduces a CLI.
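On Cumulus Linux, port settings live in /etc/network/interfaces and are applied by ifupdown2. The sketch below, assuming a single VLAN-aware bridge like the one the cl-interface role sets up, renders an access port in VLAN 800 (roughly `switchport mode access` plus `switchport access vlan 800` on Cisco IOS) and a tagged uplink. Port names, descriptions, VLAN 801 and the MTU are hypothetical.

```python
"""Sketch: Cumulus Linux (ifupdown2) stanzas for a VLAN-aware bridge.

Port names, descriptions, VLAN 801 and the MTU are hypothetical placeholders.
"""
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Port:
    name: str
    description: str = ""
    access_vlan: Optional[int] = None  # untagged (access) port when set
    tagged_vlans: List[int] = field(default_factory=list)  # trunk otherwise
    mtu: Optional[int] = None


def render_port(port: Port) -> str:
    lines = [f"auto {port.name}", f"iface {port.name}"]
    if port.description:
        lines.append(f"    alias {port.description}")
    if port.mtu:
        lines.append(f"    mtu {port.mtu}")
    if port.access_vlan is not None:
        lines.append(f"    bridge-access {port.access_vlan}")
    elif port.tagged_vlans:
        lines.append("    bridge-vids " + " ".join(map(str, port.tagged_vlans)))
    return "\n".join(lines)


def render_bridge(ports: List[Port], vids: List[int]) -> str:
    bridge = "\n".join([
        "auto bridge",
        "iface bridge",
        "    bridge-vlan-aware yes",
        "    bridge-ports " + " ".join(p.name for p in ports),
        "    bridge-vids " + " ".join(map(str, vids)),
    ])
    # Blank line between stanzas, as in /etc/network/interfaces.
    return "\n\n".join([render_port(p) for p in ports] + [bridge]) + "\n"


ports = [
    Port("swp1", "server-01-idrac", access_vlan=800),
    Port("swp49", "uplink-to-eor", tagged_vlans=[800, 801], mtu=9216),
]
interfaces = render_bridge(ports, vids=[800, 801])
print(interfaces)
```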

Network management approach
Original plan: a network controller.
We considered several network controllers (among them NCS), but they were not what we hoped for:
- No or limited northbound authentication
- Poor vendor abstraction
- Network controllers can be difficult to learn and manage
- Single point of failure, or a complex redundant installation
We only require a simple northbound interface: no device state, just pull status and push config. No intelligent decisions are required (e.g. traffic engineering).
Still, times change, so we keep tracking these controllers to monitor improvements.

Managing the management switches
Our approach: configuration management with Ansible
- Built in Python: flexible / extensible
- No agent required on the switch or server
- Support for multiple vendors (including Juniper, Cisco and Arista)
- Playbooks and variables in YAML, templates in Jinja2
- Structure: Playbooks -> Plays -> Roles -> Tasks

Zero Touch Provisioning
ZTP provisions a switch without any user interaction:
1. The switch is racked, connected to the network management VLAN and powered on.
2. A DHCP server provides an IP address for the management interface.
3. The ONIE boot loader downloads the required software image (the Cumulus Linux installer) from an HTTP server, using a URL from a DHCP option.
4. A ZTP script removes the default login credentials, creates a NOC user and adds authorized_keys.
5. From here on, Ansible takes over the switch configuration, using a playbook and predefined variables.
Flow: physical racking -> DHCP -> ONIE installation -> configuration using Ansible
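Most of the ZTP plumbing sits in the DHCP step. A minimal Python sketch generates ISC dhcpd host entries, one per switch, pointing ONIE at an installer and Cumulus Linux at a ZTP script. The option numbers are assumptions to verify for your setup: 114 (`default-url`) for the ONIE installer URL and 239 for the Cumulus ZTP script URL. Host names reuse the demo switches; MACs, IPs and URLs are placeholders from documentation ranges.

```python
"""Sketch: ISC dhcpd host entries for zero touch provisioning.

Assumptions to verify: option 114 ("default-url") carries the ONIE installer
URL and option 239 carries the Cumulus ZTP script URL. MACs, IPs and URLs
are hypothetical placeholders.
"""
import ipaddress
import re

MAC_RE = re.compile(r"([0-9a-f]{2}:){5}[0-9a-f]{2}")

# Declares the (assumed) Cumulus ZTP option once, at the top of dhcpd.conf.
HEADER = "option cumulus-provision-url code 239 = text;\n"


def host_entry(name: str, mac: str, ip: str, onie_url: str, ztp_url: str) -> str:
    mac = mac.lower()
    if not MAC_RE.fullmatch(mac):
        raise ValueError(f"invalid MAC address: {mac}")
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    return (
        f"host {name} {{\n"
        f"    hardware ethernet {mac};\n"
        f"    fixed-address {ip};\n"
        f'    option default-url "{onie_url}";\n'
        f'    option cumulus-provision-url "{ztp_url}";\n'
        "}\n"
    )


switches = {
    "sw-lab-h04-1": ("00:00:5E:00:53:01", "192.0.2.11"),
    "sw-lab-h04-2": ("00:00:5E:00:53:02", "192.0.2.12"),
}
dhcpd_conf = HEADER + "".join(
    host_entry(name, mac, ip,
               onie_url="http://192.0.2.10/onie-installer",
               ztp_url="http://192.0.2.10/ztp.sh")
    for name, (mac, ip) in switches.items()
)
print(dhcpd_conf)
```

Cumulus Linux only runs a downloaded ZTP script if it contains the marker string CUMULUS-AUTOPROVISIONING; the script body itself (removing default credentials, creating the NOC user, installing authorized_keys) is site specific and not shown here.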

Implemented roles
- cl-users: creates users and adds SSH keys
- cl-common: sets common settings: DNS resolvers, NTP server, timezone and hostname
- cl-license: sets and activates the Cumulus license
- cl-apt: sets the apt repositories and the location of the apt proxy
- cl-ldap_auth: sets up LDAP authentication for non-local users (not in use at the moment)
- cl_snmp: sets up the SNMP configuration and starts the daemon
- cl-rsyslog: sets up syslogging and starts the daemon
- cl-interface: sets up switch interfaces (VLAN-aware bridge, VLAN tagging, LACP, MLAG, routed interfaces)
Cumulus also provides similar roles in its Ansible Galaxy repository.
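How these roles fit the Playbooks -> Plays -> Roles -> Tasks structure can be shown with a toy Python model. It is not Ansible's API; it only mimics how tags on a role carry over to its tasks, so that a run with --tags snmp touches only the cl_snmp tasks. Task names and tag names are hypothetical.

```python
"""Toy model of Ansible's Playbook -> Play -> Role -> Task hierarchy.

This is not Ansible's API. It only illustrates how roles group tasks
under a play and how --tags narrows a run. Role names follow the talk;
task names and tag assignments are hypothetical.
"""
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Role:
    name: str
    tags: Set[str]
    tasks: List[str]


@dataclass
class Play:
    hosts: str
    roles: List[Role] = field(default_factory=list)


@dataclass
class Playbook:
    plays: List[Play]

    def selected_tasks(self, tags: Optional[Set[str]] = None) -> List[str]:
        """Return 'role : task' names in run order, limited to tags if given."""
        selected = []
        for play in self.plays:
            for role in play.roles:
                # Role tags are inherited by every task in the role.
                if tags is None or role.tags & tags:
                    selected.extend(f"{role.name} : {task}" for task in role.tasks)
        return selected


provisioning = Playbook(plays=[Play(hosts="mgmt_switches", roles=[
    Role("cl-users", {"users"}, ["create NOC user", "install SSH keys"]),
    Role("cl-common", {"common"}, ["set DNS resolvers", "set NTP server",
                                   "set timezone", "set hostname"]),
    Role("cl_snmp", {"snmp"}, ["template snmpd.conf", "restart snmpd"]),
    Role("cl-interface", {"interfaces"}, ["template interfaces", "reload interfaces"]),
])])

print(provisioning.selected_tasks({"snmp"}))
# ['cl_snmp : template snmpd.conf', 'cl_snmp : restart snmpd']
```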

Inventory and variables
Inventory:
- Contains host and group information for Ansible-managed servers and switches
- SURFsara uses a dynamic inventory, backed by its own in-house inventory database (CMT)
- This avoids having to keep a separate administration for Ansible hosts
Group variables file:
- A single file containing all variables that are the same for all switches in an Ansible group: NTP settings, DNS settings, SNMP, syslog settings, etc.
Host variables file:
- One file per switch, containing the variables that are unique to that switch:
  - Management IP address
  - Port settings (VLAN tagging / MTU / port description)
  - MLAG / LACP settings
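A dynamic inventory is simply an executable that Ansible calls with --list (print all groups as JSON) or --host <name> (print one host's variables). A minimal sketch, assuming a hypothetical export from the CMT database in place of a real query; including _meta.hostvars in the --list output lets Ansible skip the per-host calls.

```python
#!/usr/bin/env python3
"""Sketch of an Ansible dynamic inventory script.

CMT_RECORDS stands in for a query against the CMT inventory database; its
fields, the group names and all addresses are hypothetical.
"""
import argparse
import json
from typing import Any, Dict, List, Optional

CMT_RECORDS: List[Dict[str, str]] = [
    {"name": "sw-lab-c04-1", "role": "core", "mgmt_ip": "192.0.2.1"},
    {"name": "sw-lab-c04-2", "role": "core", "mgmt_ip": "192.0.2.2"},
    {"name": "sw-lab-h04-1", "role": "eor", "mgmt_ip": "192.0.2.11"},
]


def build_inventory(records: List[Dict[str, str]]) -> Dict[str, Any]:
    inventory: Dict[str, Any] = {"_meta": {"hostvars": {}}}
    for rec in records:
        group = inventory.setdefault(rec["role"], {"hosts": []})
        group["hosts"].append(rec["name"])
        # ansible_host tells Ansible which address to connect to.
        inventory["_meta"]["hostvars"][rec["name"]] = {"ansible_host": rec["mgmt_ip"]}
    # Parent group, so shared settings can live in one group variables file.
    inventory["mgmt_switches"] = {"children": sorted({r["role"] for r in records})}
    return inventory


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser(description="Ansible dynamic inventory")
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--host")
    args = parser.parse_args(argv)
    inventory = build_inventory(CMT_RECORDS)
    if args.host:
        print(json.dumps(inventory["_meta"]["hostvars"].get(args.host, {})))
    else:
        print(json.dumps(inventory, indent=2))


if __name__ == "__main__":
    main()
```

Made executable, such a script is passed to ansible-playbook with -i in place of a static inventory file.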

Performing changes on the network
Scenario: Icinga has a new IP address, so the IP address of the SNMP querier must be changed on all switches:
1. Adjust the global (group) variables file
2. Commit in Git
3. Pull the new config
4. Execute the Ansible playbook:
   # ansible-playbook mgmt/provisioning.yml --tags snmp
Measured execution time: 2 minutes for 70 switches. (How long would this take for 70 individually managed switches from 7 different vendors?)
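Step 1 is enough because Ansible merges group variables with host variables, and host variables win when both define the same name. A small Python sketch of that merge (a simplification of Ansible's full precedence rules) for 70 hypothetical switches; variable names, the community string and all addresses are made up.

```python
"""Sketch: how one edit in group variables reaches every switch.

Mimics Ansible's rule that host variables override group variables with
the same name. Variable names, the community string and all addresses
are hypothetical.
"""
from typing import Dict

group_vars: Dict[str, str] = {
    "ntp_server": "192.0.2.123",
    "snmp_querier": "192.0.2.50",  # old Icinga address
}

# 70 switches; host_vars hold only what is unique per switch.
host_vars: Dict[str, Dict[str, str]] = {
    f"sw-{n:02d}": {"mgmt_ip": f"198.51.100.{n}"} for n in range(1, 71)
}


def effective_vars(host: str) -> Dict[str, str]:
    """Group values first; host-specific values win on conflicting keys."""
    return {**group_vars, **host_vars[host]}


def snmp_line(host: str) -> str:
    return f"rocommunity example-ro {effective_vars(host)['snmp_querier']}"


before = {h: snmp_line(h) for h in host_vars}
group_vars["snmp_querier"] = "192.0.2.60"  # step 1: Icinga moved
after = {h: snmp_line(h) for h in host_vars}

changed = [h for h in host_vars if before[h] != after[h]]
print(f"{len(changed)} of {len(host_vars)} switches get a new SNMP config")
# 70 of 70 switches get a new SNMP config
```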

Example role: SNMP (in too much detail)
[Slide panels: Tasks (YAML), Templates (Jinja2), Variables (YAML), Result]
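The template step of such a role boils down to: variables in, complete snmpd.conf out. The role itself uses Jinja2; in this sketch string.Template stands in for it so that only the Python standard library is needed. The variable names, community string and addresses are hypothetical; agentAddress, sysLocation, sysContact and rocommunity are standard Net-SNMP snmpd.conf directives.

```python
"""Sketch: render an snmpd.conf fragment from role variables.

string.Template stands in for the role's Jinja2 template. Variable names,
community string and addresses are hypothetical.
"""
from string import Template
from typing import Any, Dict

SNMPD_TEMPLATE = Template(
    "# Managed by Ansible (cl_snmp role); local edits will be overwritten\n"
    "agentAddress udp:$listen_ip:161\n"
    "sysLocation $location\n"
    "sysContact $contact\n"
    "$communities"
)


def render_snmpd(v: Dict[str, Any]) -> str:
    # One read-only community line per allowed poller (e.g. Icinga, Cacti).
    communities = "".join(
        f"rocommunity {v['community']} {src}\n" for src in v["pollers"]
    )
    return SNMPD_TEMPLATE.substitute(
        listen_ip=v["mgmt_ip"],
        location=v["location"],
        contact=v["contact"],
        communities=communities,
    )


role_vars = {
    "mgmt_ip": "192.0.2.11",
    "location": "datacenter-row-04",
    "contact": "noc@example.org",
    "community": "example-ro",
    "pollers": ["192.0.2.50", "192.0.2.51"],
}
snmpd_conf = render_snmpd(role_vars)
print(snmpd_conf)
```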

Results / lessons learnt
Positives:
- Freedom to choose our own OS
- Standardized network (a big improvement over a pile of old switches)
- Configuration management using Ansible is powerful
- Cheaper switches, built for the datacenter environment
Keep in mind:
- Network engineers require some additional skills (Linux / Cumulus / Ansible / Git)
- A different way of working: pushing configs instead of configuring switches
- Configuration management using Ansible is powerful (again!)
- A support contract is important: who is responsible, Dell or Cumulus?

What's next
Short term:
- Reduce the number of steps for making changes (no more manual git commands)
- Expose Ansible through an API (a poor man's Ansible Tower / Semaphore)
- Implement additional roles for routing (Quagga) and ACLs (iptables)
Longer term (wish list):
- Build a GUI (+ authorization) for delegating small network changes to end users, e.g. changing the VLAN tag on a single port
- Explore using Ansible on production routers and switches (mostly Juniper equipment), possibly with NAPALM
- Standardize configuration abstraction for multiple vendors using YAML, or possibly OpenConfig
- Add automated provisioning for Cacti and Icinga
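For exposing Ansible through an API, the core piece is turning a request into a safe, narrowly scoped playbook run. A minimal sketch under stated assumptions: the playbook path comes from the talk, the tag whitelist and host-name pattern are hypothetical, and --limit, --tags and --check are standard ansible-playbook options. Returning an argument list for subprocess.run avoids passing user input through a shell.

```python
"""Sketch: build a constrained ansible-playbook call for a delegated change.

A web handler would validate the request and pass the returned list to
subprocess.run(), with no shell involved. The tag whitelist and host-name
pattern are hypothetical.
"""
import re
from typing import List

PLAYBOOK = "mgmt/provisioning.yml"
ALLOWED_TAGS = {"snmp", "interfaces", "users"}
HOST_RE = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")


def playbook_command(host: str, tags: List[str], check: bool = False) -> List[str]:
    """Return argv for ansible-playbook, limited to one host and known tags."""
    if not HOST_RE.fullmatch(host):
        raise ValueError(f"invalid host name: {host!r}")
    if not tags:
        raise ValueError("at least one tag is required")
    unknown = sorted(set(tags) - ALLOWED_TAGS)
    if unknown:
        raise ValueError(f"tags not allowed: {unknown}")
    argv = ["ansible-playbook", PLAYBOOK, "--limit", host, "--tags", ",".join(tags)]
    if check:
        argv.append("--check")  # dry run: report what would change
    return argv


print(playbook_command("sw-lab-h04-1", ["interfaces"], check=True))
# ['ansible-playbook', 'mgmt/provisioning.yml', '--limit', 'sw-lab-h04-1', '--tags', 'interfaces', '--check']
```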

Erik Ruiter
Erik.Ruiter@surfsara.nl
www.surfsara.nl

Network topology
[Diagram: demo lab with switches sw-lab-c04-1, sw-lab-c04-2, sw-lab-h04-1 and sw-lab-h04-2, connected with 20 Gbps MLAG and LACP links.]