Using GPUaaS in Cloud Foundry

Similar documents
Construct a sharable GPU farm for Data Scientist. Layne Peng DellEMC OCTO TRIGr

VMware Hybrid Cloud Solution

AGENDA Introduction Pivotal Cloud Foundry NSX-V integration with Cloud Foundry New Features in Cloud Foundry Networking NSX-T with Cloud Fou

Recent Enhancements to Cloud Foundry Routing. Route Services and TCP Routing

Pontoon An Enterprise grade serverless framework using Kubernetes Kumar Gaurav, Director R&D, VMware Mageshwaran R, Staff Engineer R&D, VMware

Continuous Delivery for Cloud Native Applications

Developing Enterprise Cloud Solutions with Azure

API, DEVOPS & MICROSERVICES

Docker CaaS. Sandor Klein VP EMEA

SOFT CONTAINER TOWARDS 100% RESOURCE UTILIZATION ACCELA ZHAO, LAYNE PENG

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Orchestration: Accelerate Deployments and Reduce Operational Risk. Nathan Pearce, Product Development SA Programmability & Orchestration Team

Building an Operating System for AI

One Platform Kit: The Power to Innovate

#techsummitch

Welcome to Docker Birthday # Docker Birthday events (list available at Docker.Party) RSVPs 600 mentors Big thanks to our global partners:

Data Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Data Center Management and Automation Strategic Briefing

São Paulo. August,

AZURE CONTAINER INSTANCES

Ethernet Fabrics- the logical step to Software Defined Networking (SDN) Frank Koelmel, Brocade

EMC Solutions are Powered by Intel Xeon Processor Technology

REDEFINING THE ENTERPRISE

Cisco CloudCenter Solution with Cisco ACI: Common Use Cases

Pocket: Elastic Ephemeral Storage for Serverless Analytics

Carrier SDN for Multilayer Control

The SD-WAN implementation handbook

Service Insertion with ACI using F5 iworkflow

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

DevOps Tooling from AWS

Shadowfax: Scaling in Heterogeneous Cluster Systems via GPGPU Assemblies

8.3 cloud roadmap. Dr. Andrei Borshchev, CEO Nikolay Churkov, Head of Software Development. The AnyLogic Company Conference 2018 Baltimore

e-shelter innovation lab

Please give me your feedback

The Latest EMC s announcements

利用 Mesos 打造高延展性 Container 環境. Frank, Microsoft MTC

IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM s sole discretion.

EMC Storage Resource Management Suite

FIVE BEST PRACTICES FOR ENSURING A SUCCESSFUL SQL SERVER MIGRATION

Mesosphere and the Enterprise: Run Your Applications on Apache Mesos. Steve Wong Open Source Engineer {code} by Dell

Enterprise Cloud One OS. One Click.

DEPLOY MODERN APPS WITH KUBERNETES AS A SERVICE

AWS Lambda: Event-driven Code in the Cloud

Deep Learning Frameworks. COSC 7336: Advanced Natural Language Processing Fall 2017

DEPLOY MODERN APPS WITH KUBERNETES AS A SERVICE

IQ for DNA. Interactive Query for Dynamic Network Analytics. Haoyu Song. HUAWEI TECHNOLOGIES Co., Ltd.

DXC Technology and VMware: Innovation that Transforms

Connected vehicle cloud Commercial presentation

Connect and Transform Your Digital Business with IBM

PLEXXI HCN FOR VMWARE VSAN

F5 Networks in the Software Defined DataCenter Era. Paolo Pambianco System Engineer CSP

The intelligence of hyper-converged infrastructure. Your Right Mix Solution

The importance of monitoring containers

Data Center Automation und Orchestration

Deploying and Operating Cloud Native.NET apps

IBM WebSphere Application Server for Bluemix

SUSE s vision for agile software development and deployment in the Software Defined Datacenter

DevOps CICD for VNF a NetOps Approach

Intel. Rack Scale Design: A Deeper Perspective on Software Manageability for the Open Compute Project Community. Mohan J. Kumar Intel Fellow

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Michael Wells Microsoft Specialist, Dell EMC. SQL DBaaS on Microsoft Azure Stack

Evolution of Data Center Security Automated Security for Today s Dynamic Data Centers

Leveraging cloud for real business transformation

Continuous delivery of Java applications. Marek Kratky Principal Sales Consultant Oracle Cloud Platform. May, 2016

XtremIO Business Continuity & Disaster Recovery. Aharon Blitzer & Marco Abela XtremIO Product Management

Practical Guide to Platform as a Service.

Creating a Hybrid Gateway for API Traffic. Ed Julson API Platform Product Marketing TIBCO Software

Running RHV integrated with Cisco ACI. JuanLage Principal Engineer - Cisco May 2018

TRex Realistic Traffic Generator

EMC ScaleIO In The Enterprise: The Citi Experience. Tamir Segal Head of ScaleIO Marketing Eliot A. Wilson Senior Engineer, Citi

Fast IT - Policy Driven Infrastructure for the Intercloud World

Cloud Foundry Diego: The New Cloud Runtime. Heterogeneous Container Scheduling, Docker & More

Dell EMC Native Hybrid Cloud

Farewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation

Cisco Unified Data Center Strategy

Deploying and Operating Cloud Native.NET apps

MATLAB. Senior Application Engineer The MathWorks Korea The MathWorks, Inc. 2

Application Centric Microservices Ken Owens, CTO Cisco Intercloud Services. Redhat Summit 2015

Automating the Software-Defined Data Center with vcloud Automation Center

Best Practice Deployment of F5 App Services in Private Clouds. Henry Tam, Senior Product Marketing Manager John Gruber, Sr. PM Solutions Architect

DevNet Workshop-Hands-on with CloudCenter and Jenkins

Kuber-what?! Learn about Kubernetes

Top 4 considerations for choosing a converged infrastructure for private clouds

Dell EMC Unity: Migration from VNX. Deepak Venkat Product Technologist Midrange & Entry Solutions

Exploring Cloud Security, Operational Visibility & Elastic Datacenters. Kiran Mohandas Consulting Engineer

Introduction to Cumulus Linux

Real-life technical decision points in using cloud & container technology:

Evolution of Rack Scale Architecture Storage

IBM MQ Hybrid Cloud Architectures

CWIN CAPGEMINI WEEK OF INNOVATION NETWORKS. Re-platforming and Cloud Journey. Fausto Pasqualetti, Milano, 25 Settembre 2018

TungstenFabric (Contrail) at Scale in Workday. Mick McCarthy, Software Workday David O Brien, Software Workday

Connecting your Microservices and Cloud Services with Oracle Integration CON7348

Adobe Digital Marketing s IT Transformation with OpenStack

The Future of Interconnect Technology

Unify DevOps and SecOps: Security Without Friction

MERCED CLUSTER BASICS Multi-Environment Research Computer for Exploration and Discovery A Centerpiece for Computational Science at UC Merced

TensorFlow: A System for Learning-Scale Machine Learning. Google Brain

Oktober 2018 Dell Tech. Forum München

Boost your Analytics with ML for SQL Nerds

Transcription:

Using GPUaaS in Cloud Foundry

Agenda Introduction GPUaaS Cloud Foundry Integration 2

Technology Research Innovation Group Innovation Advanced Research Proof of Concept User Feedback Agile Roadmap 3

Technology Research Innovation Group Frank Zhao Consultant Technologist Leading GPUaaS Data Path Distributed System, Micro- Services, Real-time Processing and Analytics 4 Granted Patents 50+ Pending Patents Layne Peng Principal Technologist Cloud Computing and SDDC 10+ Patents A big data and cloud computing book author Victor Fong @victorkfong Sensei at Dell EMC Dojo Cloud Foundry Contributor 4

GPU-as-a Service Project Asaka

What is GPU & GPGPU? Graphical Processing Unit Designed for Parallel Processing Latency: Increase Clock Speed Throughput: Increase Concurrent Task Execution General-Purpose GPU Why: Scientific Computing and Graphical processing => Matrix & Vector Prerequisite: key feature supported in GPU, such as float point computing Easy to use: CUDA => Theano, Tensorflow and etc. Popular use-case: Scientific Calculation, ML/DL, Crypto, Bitcoin and etc. 6

Problems today Not widely deployed High cost Low availability when sharing Hard to monitoring in fine-grained Remote GPU as a Service Fine-grained resource control Sharable Fine-grained resource monitor and tracing Offering GPU in cloud makes it more complex: How to abstract GPU resource? Pass-through solution today Performance issues caused by network Resource discovery and provisioning 7

Project Asaka - GPU as a Service Remote GPU Access Consume remote GPU resource transparently Based on queue model and various network fabric GPU Sharing N:1 model: multiple user consume same GPU accelerator Fine-grained control and management GPU Chaining 1:N model: multiple GPU hosts chaining as one to serve a large size to 1 application; Dynamical GPU number fit to application Smart Scheduling GPU pooling, discovery and provision Fine-grained resource monitor and tracing Pluggable intelligent scheduling algorithms: o Heterogeneous resource o Network tradeoffs 8

How to access remote GPU transparently? Transparent remote GPU resource access: C Asaka CUDA APIs CUDA APIs interception CUDA APIs Fine grained GPU resource management and control: vgpu -side GPU resource chaining Intelligent selection of network fabric: TCP or RDMA Asaka GPU Library vgpu Network (TCP, RDMA) Asaka CUDA APIs C CUDA APIs Compatible with existed CUDA applications Asaka GPU Library vgpu Asaka CUDA APIs Client Network Fabric 9 C : CUDA Application

How to manage GPU pool and scheduling? Pooling and managing GPU resources: Consul based Add/remove node, failure detection, resurrect GPU scheduling challenges: Two-level dispatching: 1 st level: Find GPU hosts 2 nd level: CUDA request to devices GPU server chaining Network Strategy Host Dispatcher Q-Length Strategy H-Res Strategy Intelligent Analysis Client Application Asaka GPU Library Distributed Global State Store Global intelligence and smart scheduling: Global status monitor and tracing Frontend Network topology analysis Queue-based priority profile Pluggable scheduling algorithms Tasks Queue Smart Dispatcher 10

How to integrate with Cloud Native Platform? Offering allocation strategies as a service: High-level abstraction of scheduling functions 3. Execute the Job Cluster Manager 1. Job requests Allocation API Global Scheduler XaaS_Controller Non-invasive design, easy to integrate with existed schedulers in Cloud Native Platform Access GPU transparently from applications deployed in Cloud Native Platform 4. Transparently consume GPU as a service Ease of integrating to Cloud Native Platform Sync nodes attributes & status 2. Allocation Strategies : Job : Allocation Strategy 11

Asaka Demo

Cloud Foundry What is it and why should I care?

Let Daily me Struggle solve that for you Need to MOVE at a faster pace Constant Requests New Machines New Databases New Services Manual Installation Processes Things Crashing Developers writing crappy code Software Platform for rapid pace Self Help Marketplace Automated Orchestration High Availability with Scalability Enforces good software pattern Identical Environments for Dev, Test and Production 14

What is Cloud Foundry? Cloud Foundry DevOps CLI Web GUI Cloud Controller Diego Brain Loggregator Marketplace Diego Cell Diego Cell Router Users 15

Cloud Foundry Marketplace Fully Self Serviced Industrial standard services available Different Service Levels Charge back based on Usage Easy API 16

Cloud Foundry with Asaka Run my app with GPU! I don t care how!

But why? New Requests for GPU Manual Management and Orchestration No Network Traffic Control Hard to predict budget Not releasing GPU GPU dependent App cannot run on CF Self-Service Marketplace Automates orchestration of entire stack Centralize Compute Location Pay and use as you go Easy GPU Service Provision GPU Access from Cloud Foundry! 18

Cloud Foundry with Asaka Cloud Foundry DevOps CLI Web GUI Cloud Controller Diego Brain Loggregator Marketplace Diego Cell Diego Cell Router Users Asaka Cluster XaaS Controller Asaka Asaka Asaka 19

Runtime Stack Cloud Foundry automates Only cares about DevOps 20

Runtime Architecture ML Application Python Buildpack AsakaFS Asaka Cluster 21

Cloud Foundry with Asaka Demo!

What s Next?

Upcoming Talks Cloud Foundry 101 Monday May 8 th 12pm Zeno 4602 7 Lessons Learned from Implementing DevOps Tuesday, May 9 th 12pm - Murano 3201A Adventure of Pair Programming Thursday, May 11 th 1pm Zeno 4602 DevOps from a Developer Perspective Thursday, May 11 th 10am Zeno 4702 24

Next Steps Reach out and Pair with us! jack.harwood@dell.com victor.fong@dell.com Intelligent Scheduling Data Path XaaS (FPGAaaS and etc.) 25

Performance 15000 Memcpy H2D MemcpyH2D: ~98% native MemcpyD2H: ~25% native Further improvement on D2H: (Maybe) the platform issue; Optimize the communication flow between host and client 10000 5000 0 10000 8000 6000 4000 2000 0 1MB 2MB 4MB 8MB 16MB 32MB Native TCP RDMA Memcpy D2H 1MB 2MB 4MB 8MB 27 Native TCP RDMA