SENSE: SDN for End-to-end Networked Science at the Exascale

Similar documents
Next Generation Integrated Architecture SDN Ecosystem for LHC and Exascale Science. Harvey Newman, Caltech

BigData Express: Toward Predictable, Schedulable, and High-Performance Data Transfer. BigData Express Research Team November 10, 2018

BNL Dimitrios Katramatos Sushant Sharma Dantong Yu

Enabling a SuperFacility with Software Defined Networking

Chin Guok, ESnet. Network Service Interface Signaling and Path Finding

BigData Express: Toward Predictable, Schedulable, and High-performance Data Transfer. Fermilab, May 2018

BigData Express: Toward Predictable, Schedulable, and High-performance Data Transfer. Wenji Wu Internet2 Global Summit May 8, 2018

Programmable Information Highway (with no Traffic Jams)

ESnet s (100G) SDN Testbed

Transport SDN: The What, How and the Future!

Novel Network Services for Supporting Big Data Science Research

Advance Reservation Access Control Using Software-Defined Networking and Tokens

Design patterns for data-driven research acceleration

Traffic Optimization for ExaScale Science Applications

NSI: The common interface towards network services

Optical considerations for nextgeneration

DetNet Requirements on Data Plane and Control Plane

Engagement With Scientific Facilities

ANSE: Advanced Network Services for [LHC] Experiments

Cisco Extensible Network Controller

ALCATEL-LUCENT ENTERPRISE DATA CENTER SWITCHING SOLUTION Automation for the next-generation data center

Routing Underlay and NFV Automation with DNA Center

Extending dynamic Layer-2 services to campuses

CHARTING THE FUTURE OF SOFTWARE DEFINED NETWORKING

The Software Journey: from networks to visualization

Technologies for the future of Network Insight and Automation

SLIDE 1 - COPYRIGHT 2015 ELEPHANT FLOWS IN THE ROOM: SCIENCEDMZ NATIONALLY DISTRIBUTED

Performance Assurance Solution Components

Cisco APIC Enterprise Module Simplifies Network Operations

Cisco CloudCenter Solution with Cisco ACI: Common Use Cases

PLEXXI HCN FOR VMWARE VSAN

CMS Data Transfer Challenges and Experiences with 40G End Hosts

Zhengyang Liu University of Virginia. Oct 29, 2012

SNAG: SDN-managed Network Architecture for GridFTP Transfers

Towards Network Awareness in LHC Computing

Evolution of OSCARS. Chin Guok, Network Engineer ESnet Network Engineering Group. Winter 2012 Internet2 Joint Techs. Baton Rouge, LA.

Cisco Unified Computing System Delivering on Cisco's Unified Computing Vision

Virtualized Network Services SDN solution for service providers

Participatory Networking: An API for Application Control of SDNS SIGCOMM 13

Fundamentals and Deployment of Cisco SD-WAN Duration: 3 Days (24 hours) Prerequisites

China Unicom SDN Practice in WAN. Lv Chengjin/Ma Jichun, China Unicom

SDN/DANCES Project Update Developing Applications with Networking Capabilities via End-to-end SDN (DANCES)

SDN AT THE SPEED OF BUSINESS THE NEW AUTONOMOUS PARADIGM FOR SERVICE PROVIDERS FAST-PATH TO INNOVATIVE, PROFITABLE SERVICES

WLCG Network Throughput WG

KREONET-S: Software-Defined Wide Area Network Design and Deployment on KREONET

Internet2 DCN and Dynamic Circuit GOLEs. Eric Boyd Deputy Technology Officer Internet2 GLIF Catania March 5, 2009

Carrier SDN for Multilayer Control

Cisco Wide Area Bonjour Solution Overview

Title DC Automation: It s a MARVEL!

Time-Sensitive Networking: A Technical Introduction

RapidIO.org Update. Mar RapidIO.org 1

ONUG SDN Federation/Operability

One Platform Kit: The Power to Innovate

Scheduling Computational and Storage Resources on the NRP

Ending the Confusion About Software- Defined Networking: A Taxonomy

Avnu Alliance Introduction

OpenStack and OpenDaylight, the Evolving Relationship in Cloud Networking Charles Eckel, Open Source Developer Evangelist

Virtual Private Networks with Cisco Network Services Orchestrator Enabled by Tail-f - Fast, Simple, and Automated

IQ for DNA. Interactive Query for Dynamic Network Analytics. Haoyu Song. HUAWEI TECHNOLOGIES Co., Ltd.

THE NETWORK. INTUITIVE. Powered by intent, informed by context. Rajinder Singh Product Sales Specialist - ASEAN August 2017

Optimize and Accelerate Your Mission- Critical Applications across the WAN

NERSC Site Update. National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory. Richard Gerber

Virtualization and Softwarization Technologies for End-to-end Networking

NETWORK VIRTUALIZATION IN THE HOME Chris Donley CableLabs

The National Fusion Collaboratory

Data Driven Networks

LHC and LSST Use Cases

ANR-13-INFR-013 ANR DISCO

EU Phosphorus Project Harmony. (on

Borderless Networks. Tom Schepers, Director Systems Engineering

Huawei Agile Controller. Agile Controller 1

Bringing DevOps to Service Provider Networks & Scoping New Operational Platform Requirements for SDN & NFV

DYNES: DYnamic NEtwork System

BROCADE CLOUD-OPTIMIZED NETWORKING: THE BLUEPRINT FOR THE SOFTWARE-DEFINED NETWORK

Data Intensive Science Impact on Networks

HP SDN Document Portfolio Introduction

SDN Peering with XSP. Ezra Kissel Indiana University. Internet2 Joint Techs / TIP2013 January 2013

OpenFlow: What s it Good for?

SEVONE END USER EXPERIENCE

ONAP CCVPN Blueprint Overview. ONAP CCVPN Blueprint Improves Agility and Provides Cross-Domain Connectivity. ONAP CCVPN Blueprint Overview 1

Sweet Little Lies: Fake Topologies for Flexible Routing

Implementation of the FELIX SDN Experimental Facility

From Zero Touch Provisioning to Secure Business Intent

NetDevOps. Building New Culture around Infrastructure as Code and Automation. Tom Davies Sr. Manager,

Software-Defined WAN: Application-centric Virtualization and Visibility

Intelligent WAN: Leveraging the Internet Secure WAN Transport and Internet Access

Linac Coherent Light Source (LCLS) Data Transfer Requirements

Virtualized Network Services SDN solution for enterprises

Thomas Lin, Naif Tarafdar, Byungchul Park, Paul Chow, and Alberto Leon-Garcia

UltraScience Net Update: Network Research Experiments

Micro Focus Network Operations Management Suite Supports SDN and Network Virtualization Engineering and Operations

TIBCO Complex Event Processing Evaluation Guide

Enhancing Infrastructure: Success Stories

Cisco Cloud Architecture with Microsoft Cloud Platform Peter Lackey Technical Solutions Architect PSOSPG-1002

Exam C Foundations of IBM Cloud Reference Architecture V5

Voice of the Customer First American Title SD-WAN Transformation

A Simulation Model for Large Scale Distributed Systems

Self Programming Networks

Database Engineering. Percona Live, Amsterdam, September, 2015

Automated Control and Orchestration within the Juniper Networks Mobile Cloud Architecture. White Paper

Transcription:

SENSE: SDN for End-to-end Networked Science at the Exascale SENSE Research Team INDIS Workshop, SC18 November 11, 2018 Dallas, Texas SENSE Team Sponsor Advanced Scientific Computing Research (ASCR)

ESnet Inder Monga Chin Guok John MacAuley Alex Sim Fermilab Phil Demar ANL Linda Winkler NERSC Damian Hazen SENSE Team Caltech Harvey Newman Justas Balcas Maria Spiropulu Raimondas Sirvinskas Shashwitha Puttaswamy University of Maryland/Mid-Atlantic Crossroads Tom Lehman Xi Yang Alberto Jimenez

Acknowledgments This work was supported by the U.S. DOE Office of Science Advanced Scientific Computing Research (ASCR) network research program

4 Intelligent Predictive and/or Deterministic Network Experience When will I get home? LAX Caltech, 6 pm: 1 hr 1hr 50 min LAX Caltech, 11 pm: 32 min

Large data streams from instrument to supercomputer (LCLS II)

What Research Problem(s) are We Solving? Distributed scientific workflows need end-to-end automation so the focus can be on science, and not infrastructure Manual provisioning and infrastructure debugging takes time No service consistency across domains No service visibility or automated troubleshooting across domains Lack of realtime information from domains impedes development of intelligent services

What Research Problem(s) are We Solving? Science application workflows need to drive network service provisioning Network programming APIs usually not intuitive and require detailed network knowledge, some not pre-known Detailed network information needed, usually not easily available Efficient use of resources Multi-domain orchestration requires service visibility and troubleshooting Data APIs across domains for applications, users, network administrators Performance, service statistics, topology, capability, etc. Exchange of scoped and authorized information

Vision and Objectives A new paradigm for Application to Network Interactions Intent Based Abstract requests and questions in the context of the application objectives. Interactive what is possible? what is recommended? let s negotiate. Real-time resource availability, provisioning options, service status, troubleshooting. End-to-End multi-domain networks, end sites, and the network stack inside the end systems. Full Service Lifecycle Interactions continuous conversation between application and network for the service duration.

SENSE Solution Approach End-to-End model-based distributed resource reasoning and intelligent service orchestration Hierarchical service resource architecture Unified network and end-site resource modeling and computation Model based realtime control Application driven orchestration workflow End-to-end network data collection and analytics integration

SENSE Architecture Components Application Workflow Agent Orchestrator API Network Data and Analytics data from resource owners Resource Manager Network SENSE Orchestrator Resource Manager End Site Resource Manager API Resource Manager Other Resources

SENSE Application Workflow Agent Intent Based APIs with Resource Discovery, Negotiation, Service Lifecycle Monitoring/Troubleshooting SENSE End-to-End Model Realtime system based on Resource Manager developed infrastructure and service models Model Driven SDN Control with Orchestration SENSE Orchestrator Datafication of cyberinfrastructure to enable intelligent services Architecture ESnet UMD MAX SENSE Resource Manager

SENSE Resource Managers Model Based Control and Orchestration Learn system states Read/Sync data Network Markup Language/RDF based Model Schema SENSE Orchestrator ESnet Control Feedback / Awareness Infrastructure Allow the machines to automate, iterate, react, and adjust to find solutions and not bring the humans in until absolutely necessary Automatic Operation MAX Change system states Write / Sync data Multi-Resource Modeling UMD SENSE Resource Manager

Modeling Language and Schema Based on Network Markup Language (NML) standard developed by the Open Grid Forum (OGF) Added extensions to allow other resource types in addition to network elements/topologies to be described and modeled Multi-Resource Markup Language (ML) https://github.com/max-umd/nml-mrml

SENSE Orchestrator Application Types of Interactions Workflow Agent What is possible? What is recommended? Requests with negotiation Service status and troubleshooting SENSE Orchestrator ESnet Resource Discovery Service Service Discovery End-Point Listing Connectivity Service Point-to-Point Multi-Point Resource Computation Service Monitoring and Troubleshooting Service MAX UMD SENSE Resource Manager

SENSE Network Resource Manager () SENSE- API Model Based Interface Infrastructure and Services OSCARS/NSI SDN Controller Others SENSE-O Network- Network Controller Network- Functions/Roles: Responsible for a specific set of Network Resources Generate realtime ML Model Evaluate and respond to SENSE Orchestrator information and service requests (including negotiation) Provision network resources in support of SENSE services Provide status, monitoring, and debug functions

SENSE /End Site Resource Manager SENSE- API Model Based Interface Infrastructure and Services SENSE-O EndSite/ Switch/ Router End-Site/ Resources To external network EndSite/- Functions/Roles: Responsible for a specific set of EndSite and Resources Generate realtime ML Model Evaluate and respond to SENSE Orchestrator information and service requests (including negotiation) Provision EndSite/ resources in support of SENSE services (includes networking stack of end systems) QoS provided via OpenFlow (Open vswitch) flow prioritization and/or TC (FireQoS) Automatic dataflow initiation for path verification

SENSE Enabled s SENSE s can be deployed next to production s No impact to standard operations Just adds a priority flow feature Scheduled and guaranteed resources, network and end system Can be included as part of application workflow planning

Application to SENSE Interactions Application Workflow What is the maximum Agent Request Please provide 20 Gbps a P2P listing service of bandwidth all Service between available is available not Caltech provisioning working, and for a Fermilab. P2P please service Endpoints check If 20 between status Gbps not Caltech available and 10Gbps Fermilab? is ok. Please provide a listing of all available provisioning Endpoints Endpoint Listing What is the maximum bandwidth available for a P2P service between Caltech and Fermilab? 20 Gbps available for a P2P service between Caltech and Fermilab 20 Failure 15 Gbps Gbps available on P2P a network service for a element, Endpoint P2P between service problem Caltech Listing between fixed and Caltech Fermilab and Instantiated Fermilab Request 20 Gbps P2P service between Caltech and Fermilab. If 20 Gbps not available 10Gbps is ok. 15 Gbps P2P service between Caltech and Fermilab Instantiated Service is not working, please check status Failure on a network element, problem fixed

Application to SENSE Interactions Application Workflow Agent Transforms the network into a first class resource for workflow planning and optimization Allows applications to query and negotiate with network The SENSE infrastructure is designed to develop these types of services in DevOps manner, and customize for individual application agents. Cannot do every computation possible, but can do any computation desired. Please provide a listing of all available provisioning Endpoints Endpoint Listing What is the maximum bandwidth available for a P2P service between Caltech and Fermilab? 20 Gbps available for a P2P service between Caltech and Fermilab Request 20 Gbps P2P service between Caltech and Fermilab. If 20 Gbps not available 10Gbps is ok. 15 Gbps P2P service between Caltech and Fermilab Instantiated Service is not working, please check status Failure on a network element, problem fixed

Application Workflow Agent Services Time-Block-Maximum Bandwidth (TBMB): Application asks for a specific time block and would like to know (or provision) the maximum bandwidth available for a specific time period. Bandwidth-Sliding-Window (BSW): Application asks for a specific bandwidth and duration and provides an acceptable time window. For example, a request may be for 40 Gbps for a 10- hour time window, sometime in the next 3 days. Time-Bandwidth-Product (TBP): Application asks for 8 hours of transfer at 10Gbps representing a TBP of 36 TBytes. The user also specifies an acceptable time window, and other options such as prefer the highest bandwidth rate available, or the lowest.

Application Workflow Agent Services Immediate Provision: If SENSE finds a resource path which satisfies the application request, provisioning starts immediately (after routine confirmations from both sides). What is Possible?: In this mode, SENSE simply conducts a Resource Computation and provides the results back to the requestor. No provisioning action is taken without further explicit requests from the user. Negotiation: One or more rounds of Resource Computation requests with subsequent provisioning request by the application user if desired.

SENSE Services- Data Plane Features Data Plane Connectivity Services: Point-to-Point (Layer 2) Multi-Point (Layer 2) Layer 3 QoS/Priority Options Layer 2 (with L3 addressing) Layer 3 Routed Network Connections Quality of Service (guaranteedcapped, guaranteed, besteffort) Negotiation Scheduling, Batch Service Request Strict and Loose hops, Preemption, Lifecycle monitoring and debug

SENSE SC18 Demonstrations

Use Cases Data Transfer Node Priority Flow Deterministic end-to-end data transfers Exascale for Free Electron Lasers (ExaFEL) Streaming the data from the LCLS online cache (NVRAM) to the SLAC data transfer nodes Big Data Express Intelligent selection of WAN paths based on user requirements Future LHC, Superfacility

Interesting Research Questions Tradeoffs between scaling and real-time performance Which information/states should be routinely exchanged between Resource Manager(s) and Orchestrator? Which information should be accessible on demand? Hold times? What is the right level of abstraction for Application Agent to SENSE system interactions? variable levels from highly abstract to very detailed and resource specific? What is the best method for realizing multi-domain, multiresource authentication and authorization? What is the proper granularity for this? User? Project? Domain? Individual network and end system resource elements? Flows?

SENSE Demonstration Contact someone from the SENSE team SENSE Demonstration movies available here: Point to Point QoS https://tinyurl.com/sense-pt-to-pt-sc18 Multi Point service https://tinyurl.com/sense-multipoint-sc18 Intent API Intelligent Services https://tinyurl.com/sense-api-sc18

Thanks!

Extra Slides

SENSE Testbed and SC18 Topology

SENSE GUI

SENSE Features Today Resource Managers OSCARS and NSI controlled networks End Sites with Data Transfer Nodes () Orchestrator Southbound to Resource Managers, Northbound to Application Workflow Agents, Internal computation and workflow functions Data Plane Services Point-to-Point, Multi-Point, QoS, End-to-End Provisions network, end system, starts data flows Application facing Services Time-Block-Maximum Bandwidth (TBMB) Bandwidth-Sliding-Window (BSW) Time-Bandwidth-Product (TBP) Use Cases Priority Service ExaFEL Big Data Express

End to End Provisioning Workflow: -SENSE- provide Resource models to SENSE-Orchestrator -SENSE Orchestrator builds end-to-end Model -Batch Provisioning of P2P) connections -Dataflow between end systems using FDT SENSE End-to-End Model SENSE Orchestrator ESnet UMD MAX SENSE Resource Manager

33 Even though the networks are not typically congested, we see remarkable variability Nine orders of variability for Petrel 11/11/18 Chard K, Dart E, Foster I, Shifflett D, Tuecke S, Williams J. (2018) The Modern Research Data Portal: a design pattern for networked, data-intensive science. PeerJ Computer Science4:e144 https://doi.org/10.7717/peerj-cs.144