Nvidia GPU Support on Mesos: Bridging Mesos Containerizer and Docker Containerizer

Similar documents
Supporting GPUs in Docker Containers on Apache Mesos

Understanding IO patterns of SSDs

IBM Leading High Performance Computing and Deep Learning Technologies

Oracle 一体化创新云技术 助力智慧政府信息化战略. Copyright* *2014*Oracle*and/or*its*affiliates.*All*rights*reserved.** *

PMI,PMI (China) Membership, Certifications. Bob Chen PMI (China) August 31, 2010

云计算入门 Introduction to Cloud Computing GESC1001

新一代 ODA X5-2 低调 奢华 有内涵

IBM 开源技术微讲堂容器技术与微服务系列

Build a Key Value Flash Disk Based Storage System. Flash Memory Summit 2017 Santa Clara, CA 1

Chapter 1 (Part 2) Introduction to Operating System

在数据中心中加速 AI - Xilinx 机器学习套件 (Xilinx ML Suite )

AvalonMiner Raspberry Pi Configuration Guide. AvalonMiner 树莓派配置教程 AvalonMiner Raspberry Pi Configuration Guide

Microsemi - Leading Innovation for China s Hyperscale Data Centers

如何查看 Cache Engine 缓存中有哪些网站 /URL

Apache OpenWhisk + Kubernetes:

ICP Enablon User Manual Factory ICP Enablon 用户手册 工厂 Version th Jul 2012 版本 年 7 月 16 日. Content 内容

Green Computing Cloud Computing LSD Tech Co., Ltd SSD server & SSD Storage Cloud SSD Supercomputer LSD Tech Co., LTD

Silverlight 3 概览 俞晖市场推广经理微软 ( 中国 ) 有限公司

Wireless Presentation Pod

2. Introduction to Digital Media Format

PCU50 的整盘备份. 本文只针对操作系统为 Windows XP 版本的 PCU50 PCU50 启动硬件自检完后, 出现下面文字时, 按向下光标键 光标条停在 SINUMERIK 下方的空白处, 如下图, 按回车键 PCU50 会进入到服务画面, 如下图

IBM 开源技术微讲堂容器技术与微服务系列

Previous on Computer Networks Class 18. ICMP: Internet Control Message Protocol IP Protocol Actually a IP packet

测试基础架构 演进之路. 茹炳晟 (Robin Ru) ebay 中国研发中心

Computer Networks. Wenzhong Li. Nanjing University

Apache Kafka 源码编译 Spark 大数据博客 -

TW5.0 如何使用 SSL 认证. 先使用 openssl 工具 1 生成 CA 私钥和自签名根证书 (1) 生成 CA 私钥 openssl genrsa -out ca-key.pem 1024

Chapter 11 SHANDONG UNIVERSITY 1

Triangle - Delaunay Triangulator

北 京 忆 恒 创 源 科 技 有 限 公 司 16

NyearBluetoothPrint SDK. Development Document--Android

Smart Services Lucy Huo (Senior Consultant, UNITY Business Consulting) April 27, 2016

云计算入门 Introduction to Cloud Computing GESC1001

Presentation Title. By Author The MathWorks, Inc. 1

智能终端与物联网应用 课程建设与实践. 邝坚 嵌入式系统与网络通信研究中心北京邮电大学计算机学院

Machine Vision Market Analysis of 2015 Isabel Yang

EqualLogic Best Practices for SQL Server Deployments

S INSIDE NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORK CONTAINERS

Bi-monthly report. Tianyi Luo

China Next Generation Internet (CNGI) project and its impact. MA Yan Beijing University of Posts and Telecommunications 2009/08/06.

Murrelektronik Connectivity Interface Part I Product range MSDD, cable entry panels MSDD 系列, 电缆穿线板

Deep Learning mit PowerAI - Ein Überblick

学习沉淀成长分享 EIGRP. 红茶三杯 ( 朱 SIR) 微博 : Latest update:

Information is EVERYTHING 微軟企業混和雲解決方案. November 24, Spenser Lin. Cloud Infra Solution Sales, Microsoft Taiwan

BOPS, Not FLOPS! A New Metric, Measuring Tool, and Roofline Performance Model For Datacenter Computing. Chen Zheng ICT,CAS

1. DWR 1.1 DWR 基础 概念 使用使用 DWR 的步骤. 1 什么是 DWR? Direct Web Remote, 直接 Web 远程 是一个 Ajax 的框架

软件和支持订订 1.1 订阅单位定义 订阅服务费用以称为 单位 的计量标准为依据 下表 1.1 定义了用于计量贵方使用的软件订阅的数量的各种单位 在贵方购买行为所适用的订单中以及在附件中包含了各种软件订阅所适用的具体单位

Spark Standalone 模式应用程序开发 Spark 大数据博客 -

Outline. Motivations (1/3) Distributed File Systems. Motivations (3/3) Motivations (2/3)

H3C CAS 虚拟机支持的操作系统列表. Copyright 2016 杭州华三通信技术有限公司版权所有, 保留一切权利 非经本公司书面许可, 任何单位和个人不得擅自摘抄 复制本文档内容的部分或全部, 并不得以任何形式传播 本文档中的信息可能变动, 恕不另行通知

实验三十三 DEIGRP 的配置 一 实验目的 二 应用环境 三 实验设备 四 实验拓扑 五 实验要求 六 实验步骤 1. 掌握 DEIGRP 的配置方法 2. 理解 DEIGRP 协议的工作过程

密级 : 博士学位论文. 论文题目基于 ScratchPad Memory 的嵌入式系统优化研究

梁永健. W K Leung. 华为企业业务 BG 解决方案销售部 CTO Chief Technology Officer, Solution Sales, Huawei

饲服驱动在激光切割行业的应用 岑海亮

Next Gen Opportunities. Lilian Jiang 姜宁 17/03/2016

libde265 HEVC 性能测试报告

计算机组成原理第二讲 第二章 : 运算方法和运算器 数据与文字的表示方法 (1) 整数的表示方法. 授课老师 : 王浩宇

Chap1 Introduction. Outline. An Example System. 1.1 Overview. Computer organization and architecture. Computer organization and architecture

Altera 器件高级特性与应用 内容安排 时钟管理 时钟管理 片内存储器 数字信号处理 高速差分接口 高速串行收发器. 时钟偏斜 (skew): 始终分配到系统中到达各个时钟末端 ( 器件内部触发器的时钟输入端 ) 的时钟相位不一致的现象 抖动 : 时钟边沿的输出位置和理想情况存在一定的误差

绝佳的并行处理 - FPGA 加速的根本基石

Company Overview.

Software Engineering. Zheng Li( 李征 ) Jing Wan( 万静 )

Lecture 23 Introduction to Paging

: Operating System 计算机原理与设计

Logitech G302 Daedalus Prime Setup Guide 设置指南

Command Dictionary CUSTOM

Introduction to Computer Science

Virtual Memory Management for Main-Memory KV Database Using Solid State Disk *

Using DC/OS for Continuous Delivery

IBM Deep Learning Solutions

Intelligent System for AI. 清大資工周志遠 AII Workshop

Ganglia 是 UC Berkeley 发起的一个开源集群监视项目, 主要是用来监控系统性能, 如 :cpu mem 硬盘利用率, I/O 负载 网络流量情况等, 通过曲线很容易见到每个节点的工作状态, 对合理调整 分配系统资源, 提高系统整体性能起到重要作用

OTAD Application Note

The Path to GPU as a Service in Kubernetes Renaud Gaubert Lead Kubernetes Engineer

Lowering the Cost of Test while Improving Time-To-Market with a Platform-based Approach

SNMP Web Manager. User s Manual

IBM 企业业务连续性方案建议书. System x3850m2+ds4700/ds5000

<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <!--- global properties --> <property>

最短路径算法 Dijkstra 一 图的邻接表存储结构及实现 ( 回顾 ) 1. 头文件 graph.h. // Graph.h: interface for the Graph class. #if!defined(afx_graph_h C891E2F0_794B_4ADD_8772_55BA3

TOP100 从代码修改到持续交付给客户 快速自动化软件集成及交付实践

我们应该做什么? 告知性分析 未来会发生什么? 预测性分析 为什么会发生 诊断性分析 过去发生了什么? 描述性分析 高级分析 传统 BI. Source: Gartner

public static InetAddress getbyname(string host) public static InetAddress getlocalhost() public static InetAddress[] getallbyname(string host)

The State and Opportunities of HPC Applications in China. Ruibo Wang National University of Defense Technology

XML allows your content to be created in one workflow, at one cost, to reach all your readers XML 的优势 : 只需一次加工和投入, 到达所有读者的手中

A Benchmark For Stroke Extraction of Chinese Characters

Multiprotocol Label Switching The future of IP Backbone Technology

Intel Atom quad-core x5-z8350 (1.44 GHz) 32/64 GB (default 32 GB) Windows 10 IoT Enterprise, Android 6.0

Air Speaker. Getting started with Logitech UE Air Speaker. 快速入门罗技 UE Air Speaker. Wireless speaker with AirPlay. 无线音箱 (AirPlay 技术 )

Supplementary Materials on Semaphores

Run Oracle on Oracle

提升设备制造应用效能 -- 适应物联网发展的嵌入式 IPC. 麦文浩 Max,Mak Embedded IPC ARK PSM

组播路由 - MSDP 和 PIM 通过走

NVIDIA CUDA 超大规模并行. 邓仰东 清华大学微电子学研究所

操作系统原理与设计. 第 13 章 IO Systems(IO 管理 ) 陈香兰 2009 年 09 月 01 日 中国科学技术大学计算机学院

Chapter 7: Deadlocks. Operating System Concepts 9 th Edition

上海泛腾电子科技有限公司徐鹤军 上海张江高科技园区碧波路 500 号 306 室. Tel :

上汽通用汽车供应商门户网站项目 (SGMSP) User Guide 用户手册 上汽通用汽车有限公司 2014 上汽通用汽车有限公司未经授权, 不得以任何形式使用本文档所包括的任何部分

New Media Data Analytics and Application. Lecture 7: Information Acquisition An Integration Ting Wang

下一代互联网的发展 Next Generation Internet

LAB 3: DC Simulations and Circuit Modeling

Transcription:

Nvidia GPU Support on Mesos: Bridging Mesos Containerizer and Docker Containerizer MesosCon Asia - 2016 Yubo Li Research Stuff Member, IBM Research - China Email: liyubobj@cn.ibm.com 1

Yubo Li( 李玉博 ) Email: liyubobj@cn.ibm.com Slack: @liyubobj QQ: 395238640 Dr. Yubo Li is a Researcher Stuff Member at IBM Research, China. He is the architect of the GPU acceleration and deep-learning as a service (DlaaS) components of SuperVessel, an open-access cloud running OpenStack on OpenPOWER machines. He is currently working on GPU support for several cloud container technologies, including Mesos, Kubernetes, Marathon and OpenStack. 2

Why GPUs? GPUs are the tool of choice for many computation intensive applications Deep Learning Scientific Computing Genetic Analysis 3

Why GPUs? GPU can shorten a deep learning training from tens of days to several days 4

Why GPUs? Mesos users have been asking for GPU support for years First email asking for it can be found in the dev-list archives from 2011 The request rate has increased dramatically in the last 9-12 months 5

Interface Why GPUs? We have internal need to support cognitive solutions on Mesos User Web UI Monitoring Operation UI Cognitive API/UI Others Application Data pre-processing DL Training (Caffe, Theano, etc) DL Inference (Caffe, Theano, etc) Web Service Resource Management/Orchestration Mesos + Frameworks (Marathon, k8sm) Infrastructure Compute Resource (VM / Bare-metal) Container (docker, mesos container) Network Volume Hardware Resources Storage Compute Memory SSD/Flash Disk CPU GPU FPGA 6 Memory

Why GPUs? VM based GPU pass-through Exclusively occupied GPUs Container based GPU injection Flexible apply and release 7

Why GPUs? Mesos has no isolation guarantee for GPUs without native GPU support No built-in coordination to restrict access to GPUs Possible for multiple frameworks / tasks to access GPUs at the same time 8

Why GPUs? Enterprise users want to see GPU support on container cloud Deep learning / artificial intelligence need GPU as accelerator Traditional HPC users turn to micro-service arch. and container cloud 9

Why Docker? Extremely popular image format for containers Build once run everywhere Configure once run anything Source: DockerCon 2016 Keynote by Docker s CEO Ben Golub 10

Why Docker? Nvidia-docker Wrap around docker to allow GPUs to be used/isolated inside docker containers CUDA-ready docker images Exclusive CUDA toolkit Loose dependency Shared GPU/CUDA driver https://github.com/nvidia/nvidia-docker 11

Why Docker? Ready-to-use ML/DL images Get rid of tedious framework installation! 12

Why Docker? Our internal consideration We want to re-use so many existing docker images/dockerfiles Developers are familiar with docker 13

What We Want To Do? Test locally with nvidia-docker Deploy to production with Mesos 14

Talk Overview Challenges and our basic ideas GPU unified scheduling design Future works Demo: running cognitive application with Mesos/Marathon + GPU 15

Bare-metal vs. Container for GPU Bare-metal Application (Caffe/TF/ ) CUDA libraries nvidia base libraries Container1 Application (Caffe/TF/ ) CUDA libraries nvidia base libraries Container2 Application (Caffe/TF/ ) CUDA libraries nvidia base libraries nvidia-kernel-module Linux Kernel Loose couple between host and container is the most challenge! nvidia-kernel-module Linux Kernel 16

Challenges Container Application (Caffe/TF/ ) CUDA libraries nvidia base libraries (v1) Container1 Application (Caffe/TF/ ) CUDA libraries nvidia base libraries Container2 Application (Caffe/TF/ ) CUDA libraries nvidia base libraries nvidia-kernel-module (v2) Linux Kernel nvidia-kernel-module Linux Kernel Not work if nvidia libraries and kenel module versions are not match We also need GPU isolation control 17

How We Solve That? Container Application (Caffe/TF/ ) Container Application (Caffe/TF/ ) CUDA libraries nvidia base libraries (v1) CUDA libraries nvidia base libraries (v2) Volume injection nvidia-kernel-module (v2) Linux Kernel nvidia base libraries (v2) nvidia-kernel-module (v2) Linux Kernel Not work if nvidia libraries and kernel module versions are not match 18

How We Solve That? Mimic functionality of nvidia-docker-plugin Finds all standard nvidia libraries / binaries on the host and consolidates them into a single place as a docker volume (nvidia-volume) /var/lib/docker/volumes nvidia_xxx.xx (version number) bin lib lib64 Inject volume with ro to container if needed 19

How We Solve That? Determine whether nvidia-volume is needed Check docker image label: com.nvidia.volumes.needed = nvidia_driver Inject nvidia-volume to /usr/local/nvidia if the label found This label certificates following things: https://github.com/nvidia/nvidia-docker/blob/master/ubuntu-14.04/cuda/7.5/runtime/dockerfile 20

How We Solve That? GPU isolation Currently we support physical-core level isolation Example Per card Isolation? GPU sharing is not supported No process capping mechanism from nvidia GPU driver GPU sharing is suggested for MPI/OpenMP case only Yes 1 core of Tesla K80 (dual-core) Yes 512 CUDA cores of Tesla K40 No 21

How We Solve That? GPU device control: /dev nvidia0 nvidia1 nvidiactl nvidia-uvm nvidia-uvm-tools (data interface for GPU0) (data interface for GPU1) (control interface) (unified virtual memory) (UVM control) Isolation Mesos containerizer: cgroups Docker containerizer: docker run devices 22

How We Solve That? Dynamic loading of nvml library Nvidia GDK (nvml library) Same Mesos binary works on both for GPU node and non-gpu node Needs on compile Mesos binary with GPU support Yes, with GPU Run Mesos on GPU node Nvidia GPU Driver Yes, without GPU Mesos binary without GPU support Yes, without GPU Yes, without GPU Run Mesos on non-gpu node 23

Apache Mesos and GPUs Multiple containerizer support Mesos (aka unified) containerizer (fully supported) Docker containerizer (code review, partially merged) Why support both? Many people are asking for docker containerizer support to bridge the feature gap People are already familiar with existing docker tools Unified containerizer needs time to mature 24

Apache Mesos and GPUs GPU_RESOURCES framework capability Frameworks must opt-in to receive offers with GPU resources Prevents legacy frameworks from consuming non-gpu resources and starving out GPU jobs Use agent attributes to select specific type of GPU resources Agents advertise the type of GPUs they have installed via attributes Only accept an offer if the attributes match the GPU type you want 25

Usage Usage Nvidia GPU and GPU driver needed Install Nvidia GPU Deployment Toolkit (GDK) Compile Mesos with flag:../configure --with-nvml=/nvml-header-path && make j install Build GPU images following nvidia-docker do: (https://github.com/nvidia/nvidia-docker) Run a docker task with additional such resource gpus=1 Mesos Containerier: --isolation="cgroups/devices,gpu/nvidia" 26

Apache Mesos and GPUs -- Evolution Containerizer API Isolator API Mesos Agent (Unified) Mesos Containerizer Mimics functionality of nvidia-docker-plugin Nvidia GPU Allocator Nvidia Volume Manager CPU Memory GPU Linux devices cgroup Nvidia GPU Isolator 27

Apache Mesos and GPUs -- Evolution Containerizer API Isolator API Mesos Agent (Unified) Mesos Containerizer CPU Memory GPU Nvidia GPU Allocator Nvidia Volume Manager Nvidia GPU Isolator Linux devices cgroup 28

Apache Mesos and GPUs -- Evolution Containerizer API Composing Containerizer Docker Containerizer GPU (Unified) Mesos Containerizer Nvidia GPU Allocator Isolator API CPU Memory GPU Nvidia Volume Manager 29

Apache Mesos and GPUs Mesos Agent mesos-docker-executor Mesos Containerizer Nvidia GPU Isolator Nvidia GPU Allocator Docker Daemon --device Docker Containerizer --volume Nvidia Volume Manager CPU Memory GPU GPU driver volume Docker image label check: com.nvidia.volumes.needed="nvidia_driver" Native docker arguments for GPU management 30

Release and Eco-syetems Release GPU for Mesos Containerizer: fully supported after Mesos 1.0 (supports both image-less and docker-image based containers) GPU for Docker Containerizer: Expected to release on Mesos 1.1 or 1.2 Eco-systems Marathon GPU support for Mesos Containerizer after Marathon v1.3 GPU support for Docker Containerizer ready for release (wait for Mesos support) K8sm On design 31

Mesos on IBM POWER8 Apache Mesos 1.0 and GPU feature perfectly supports IBM POWER8 IBM POWER8 Delivers Superior Cloud Performance with Docker 32

IBM LC 产品为 Power Systems 增添新活力 High Performance Computing S812LC S822LC 大器有为 要多大才比的上它的认知高度 帮助企业加速获得洞察 S822LC for HPC 大快人心 要多快才比的上它的运算速度 开启新一波加速浪潮 S821LC 大智若云 要多智才比的上它的云端契合度 开启数据中心与云的无缝模式 S822LC for Commercial Computing Storage rich single socket system for big data applications Memory Intensive workloads 以存储为中心 高数据吞吐量工作负载的理想之选 采用 2 个 POWER8 插槽, 用于处理大数据工作负载 通过 CAPI 和 GPU 实现大数据加速 引入了 CPU-GPU NVLink, 可将进入 GPU 加速器的带宽提升 2.5 倍 POWER8 与 NVIDIA NVLink 的完美结合 Intel x86 系统内存带宽的 2 倍 内存密集型工作负载 2X memory bandwidth of Intel x86 systems Memory Intensive workloads 33

Special Thanks to Collaborators Kevin Klues Rajat Phull Seetharami Seelam Guangya Liu Qian Zhang Benjamin Mahler Vikrama Ditya Yong Feng 34

Demo Build a GPU-enabled cognitive web service in a minute! 35