Containerizing GPU Applications with Docker for Scaling to the Cloud

Subbu Rama
Future of Packaging Applications

Turns Discrete Computing Resources into a Virtual Supercomputer
[Figure: CPU, GPU, and memory resources in a data center pooled into a virtual supercomputer]

What problems are we trying to solve?
- Software is stuck: proper installation can take days
  - operating system requirements
  - library dependencies
  - drivers
  - interoperability between tools
- Hardware is stuck: proper setup and optimization can take days
  - code portability
  - performance portability
  - resource provisioning

Goal
Given:
- Applications from different vendors
- Systems of different capabilities
- Heterogeneous hardware
Compose a workflow that:
- Works: individual components work, thus the workflow works
- Is portable: the workload can be migrated across infrastructure
- Is performant: it can take advantage of GPU hardware
- Is secure: individual components can be easily audited

Current Solutions
Current solutions revolve around a common denominator:
- An operating system that works for all tools in the chain
- Compute nodes that satisfy the most memory-hungry application
- Need GPUs? Must deploy on GPU-only nodes
- Cost sensitive? Must deploy on low-end CPU-only nodes
Common-denominator shortcomings:
- Inefficiencies
- Poor utilization / over-provisioning
- Non-performant

Solution
- Containerize all applications
  - Create GPU/CPU versions
- Assemble containers into workflow templates
  - To represent particular use cases and pipelines
- Use workflow templates to create virtual clusters
  - Optimize performance / budget via virtual clusters

What are Containers?
- Containers are nothing new
  - Part of Linux for the last 10 years
  - LXC, FreeBSD Jails, Solaris Containers, etc.
- What is new are the APIs
  - Docker
  - Rocket
  - etc.
- Specifically:
  - A complete runtime environment: OS, application, libraries, dependencies, binaries, and configuration files
  - Can be quickly deployed on a set of container hosts when needed

Containers vs. VMs (Stack Comparison)

Why Containers?
- Easy deployment
  - Avoid hours of environment / application setup
  - Fast environment spin-up / tear-down
- Flexibility
  - Applications use their preferred versions of the OS, libraries, languages, etc.
  - Move the data to the application, or move the application to the data
- Reproducibility / reliability / scaling
  - Workflow steps start with clean and immutable images
  - Reliability through easy migration and checkpointing

GPU Containers the NVIDIA Way
- Much easier than it used to be
  - One no longer has to fully reinstall the NVIDIA driver within the container
  - No more container vs. host system driver matching conflicts
    - The container works with the host OS driver
    - There is still a driver and toolkit dependency
- https://github.com/nvidia/nvidia-docker
- Requirements
  - Host has NVIDIA drivers
  - Host has Docker installed
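As a rough illustration of what launching a GPU container through the nvidia-docker wrapper can look like, the sketch below only assembles the command line and runs it when the wrapper is actually installed. The helper names (`gpu_run_command`, `run_if_available`) are hypothetical, and the `nvidia/cuda` image with `nvidia-smi` as the command is an assumed smoke test, not something taken from the slides.

```python
import shutil
import subprocess


def gpu_run_command(image, command, launcher="nvidia-docker"):
    """Build the argv for running `command` inside `image` via the GPU-aware wrapper."""
    # --rm removes the container once the command exits (fire and forget).
    return [launcher, "run", "--rm", image] + list(command)


def run_if_available(argv):
    """Run argv only if the launcher is on PATH; return None when it is missing."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, capture_output=True, text=True, check=False)


# Assumed example: list the GPUs visible inside the container.
argv = gpu_run_command("nvidia/cuda", ["nvidia-smi"])
print(" ".join(argv))
```

Building the argv separately from executing it keeps the sketch usable on machines without a GPU or Docker installed.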

GPU Container Getting Started (CAFFE)
- Create a Dockerfile
  - Very small, easy to rebuild/update the container if needed
  - Reproducible builds
  - Specify the operating system
  - Install operating system basics
  - Install application dependencies
  - Install the application
- Once the Dockerfile is done: build the container, test the container, store the container in a repo
  - Quickly spin up a container where and when needed
  - Enables fire-and-forget GPU applications
- What about data?
  - Long answer: we'll get to that in a bit
  - Short answer: put it somewhere else, keep containers small
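The four Dockerfile steps listed above (operating system, OS basics, application dependencies, application) can be sketched as a small generator that renders the file as text. This is an illustration under stated assumptions, not the Dockerfile from the demo: `render_dockerfile` is a hypothetical helper, the base image is assumed to be Debian/Ubuntu-based (hence `apt-get`), and the package list and clone command are placeholders for a Caffe-style build.

```python
def render_dockerfile(base_image, os_packages, dependency_packages, install_commands):
    """Render a minimal Dockerfile following the four steps on the slide."""
    lines = [f"FROM {base_image}"]  # 1. specify the operating system
    if os_packages:  # 2. install operating system basics
        lines.append("RUN apt-get update && apt-get install -y " + " ".join(os_packages))
    if dependency_packages:  # 3. install application dependencies
        lines.append("RUN apt-get install -y " + " ".join(dependency_packages))
    for command in install_commands:  # 4. install the application
        lines.append(f"RUN {command}")
    return "\n".join(lines) + "\n"


# All values below are assumptions chosen for illustration only.
dockerfile_text = render_dockerfile(
    base_image="nvidia/cuda",
    os_packages=["build-essential", "git"],
    dependency_packages=["libprotobuf-dev", "libboost-all-dev"],
    install_commands=["git clone https://github.com/BVLC/caffe.git /opt/caffe"],
)
print(dockerfile_text)
```

The actual build commands for the application are deliberately left out; the point is only that the whole recipe stays a few lines long and can be rebuilt at any time.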

Dockerfile Code Demo

Demo 1: Deploy GPU Container Across Clouds
Interactive shell demo which will show:
- Launch the container on Cloud #1 and execute the application (example: execute the container on AWS on the K520s)
- Take the exact same container, launch it on Cloud #2, and execute the application (example: execute the container on Softlayer on the K80s)
Highlight the following:
- The container runs on different clouds
- The container uses different types of GPUs and drivers, and everything works transparently
- Fire-and-forget GPU applications on the GPU hardware you need, wherever it may be

Container Performance
- People are sceptical about container performance vs. bare metal
- There are special cases where performance can be an issue, but in general performance is on par with bare metal and better than VMs
- Docker performance is within 10% of bare metal
W. Felter, A. Ferreira, R. Rajamony, and J. Rubio. An Updated Performance Comparison of Virtual Machines and Linux Containers. Technology, 28:32, 2014. (IBM)

So what about Data?
- In general, avoid storing data in containers
- Containers ought to be immutable: bring one up, perform a task, return the result, shut it down
- Containers ought to be small
  - The size of a container impacts startup times
  - The size of a container impacts the time it takes to pull it from the repository
- Discuss best practices for data storage/sharing
  - Give an example of data sharing between containers using Data Volume Containers
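To make the data guidance concrete, the sketch below assembles Docker command lines for two common ways of keeping data out of the image: mounting a host directory with `-v`, and sharing a volume between containers through a data volume container with `--volumes-from`. The helper names, the `busybox` image, and all paths and container names are assumptions for illustration; the commands are only built, not executed.

```python
def run_with_host_data(image, host_dir, container_dir, command):
    """Mount a host directory into the container so data stays outside the image."""
    return ["docker", "run", "--rm", "-v", f"{host_dir}:{container_dir}", image] + list(command)


def create_data_volume_container(name, container_dir, image="busybox"):
    """Create (without running) a container whose only job is to own a volume."""
    return ["docker", "create", "-v", container_dir, "--name", name, image, "/bin/true"]


def run_with_volumes_from(image, data_container, command):
    """Run a container that mounts every volume of the data volume container."""
    return ["docker", "run", "--rm", "--volumes-from", data_container, image] + list(command)


# Hypothetical names: a "datasets" volume container shared by a training step.
create_cmd = create_data_volume_container("datasets", "/data")
train_cmd = run_with_volumes_from("my-gpu-app", "datasets", ["train", "--input", "/data"])
print(" ".join(create_cmd))
print(" ".join(train_cmd))
```

Either way the application image stays small and immutable, and several workflow steps can read and write the same volume.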

Application Flow Pipelines & Scheduling
- Sophisticated tool flows rarely consist of a single application
  - Some steps may only run on CPUs
  - Some steps may execute on a CPU or a GPU
- The challenge is how to schedule these flows efficiently to obtain either faster turnaround times or better overall throughput, while maintaining reproducible results

Example Workflow: Semiconductor Circuit Design

Example Workflow: Semiconductor Circuit Design CPU/GPU App

Cluster Scheduling Constraints
- In general, several assumptions can be made about today's clusters
  - # CPU nodes >> # GPU nodes
  - GPU nodes have a fixed number of GPUs in them
- The best machine for an application is usually determined by
  - amount of memory
  - amount and type of CPUs
  - amount and type of GPUs
- How can containers help with scheduling, given these constraints, compared with regular schedulers?
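The constraints above can be illustrated with a toy greedy placement: GPU steps are placed first because GPU nodes are scarce, and every step is tried on the nodes with the fewest free GPUs first so that CPU-only steps stay off GPU nodes when possible. This is a minimal sketch, not how Kubernetes, Mesos, or any scheduler from the talk works; the `Node`, `Step`, and `place` names and the example workload are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    mem_gb: int
    cpus: int
    gpus: int  # fixed number of GPUs in the node; 0 for CPU-only nodes


@dataclass
class Step:
    name: str
    mem_gb: int
    cpus: int
    gpus: int = 0  # GPUs required; 0 means a CPU-only step


def fits(step, node):
    """A step fits when the node still has enough memory, CPUs, and GPUs."""
    return node.mem_gb >= step.mem_gb and node.cpus >= step.cpus and node.gpus >= step.gpus


def place(steps, nodes):
    """Greedy first-fit placement; returns {step name: node name or None}."""
    placement = {}
    # Place GPU-demanding steps first, since GPU nodes are the scarce resource.
    for step in sorted(steps, key=lambda s: s.gpus, reverse=True):
        placement[step.name] = None
        # Try nodes with the fewest free GPUs (then least memory) first.
        for node in sorted(nodes, key=lambda n: (n.gpus, n.mem_gb)):
            if fits(step, node):
                node.mem_gb -= step.mem_gb
                node.cpus -= step.cpus
                node.gpus -= step.gpus
                placement[step.name] = node.name
                break
    return placement


# Hypothetical cluster (more CPU nodes than GPU nodes) and workflow steps.
nodes = [Node("cpu-1", 64, 16, 0), Node("cpu-2", 64, 16, 0), Node("gpu-1", 128, 8, 2)]
steps = [Step("synthesis", 32, 8), Step("simulation", 48, 4, gpus=2), Step("report", 8, 2)]
placement = place(steps, nodes)
print(placement)
```

Because each containerized step declares its own memory, CPU, and GPU needs, a placement like this can be made per step rather than sizing every node for the hungriest tool in the chain.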

Example Scheduler: Regular vs. Container-based
- Kubernetes
- Mesosphere

What if we can break Physical Machine Limitations?
- Most cloud service providers and data centers are limited by physics
  - Example: the largest machines have 2 GPUs (Softlayer) or 4 GPUs (AWS)
  - A rack can only hold a maximum number of GPUs due to power constraints
- What if we could create virtual machines and clusters and present them to applications as a single virtual machine?
- How would this change the clusters and schedulers?
  * Elastic Containers or Elastic Machines via Containers (grow or shrink)

Introduce Bitfusion Boost Containers
- We can: combine Bitfusion Boost and containers -> create magic!
- What things can we build?
  - Create a machine which has 16 or more virtual GPUs!
  - Run an application across these GPUs without having to set up MPI, Spark, or Hadoop!
  - Run GPU applications on non-GPU machines by automatically offloading to GPU machines in the cluster
- All of the above can be done WITHOUT CODE CHANGES for GPU-enabled applications!

Boost Container Building Blocks
- Boost Server Container
  - Boost-provisioned container with Boost Server
  - Runs on any GPU-provisioned host
  - Can act as a client at the same time
- Boost Client Container
  - Boost-provisioned container with Boost Client and end-user application
  - Runs on any type of instance, including CPU-only instances

Boost Container Architecture

Demo 2: Build Virtual GPU Instances in the Cloud
Demo will show the following using containers:
- How in minutes we can create virtual GPU cluster configurations
- How we can provision GPU machines which don't exist in the physical world
- How we can run GPU applications on non-GPU machines
- How we can execute applications across these configurations without changing a single line of code!

Thank You
Visit us at our booth located at:
- To learn more about GPU Containers
- To learn more about Bitfusion Boost
- And to see more physics-defying demos! AWS monster machine and CloudX
@Bitfusionio @subburama subbu@bitfusion.io