ROCm: An open platform for GPU computing exploration

UCX-ROCm: ROCm Integration into UCX
Khaled Hamidouche and Brad Benton, AMD Research
June 2018, ISC

ROCm Software Platform
An Open Source foundation for Hyper Scale and HPC-class GPU computing

Graphics Core Next headless Linux 64-bit driver
- Large memory single allocation
- Peer-to-peer multi-GPU
- Peer-to-peer with RDMA
- Systems management API and tools

HSA drives rich capabilities into the ROCm hardware and software
- User mode queues
- Architected queuing language
- Flat memory addressing
- Atomic memory transactions
- Process concurrency & preemption

Rich compiler foundation for HPC developer
- LLVM native GCN ISA code generation
- Offline compilation support
- Standardized loader and code object format
- GCN ISA assembler and disassembler
- Full documentation to GCN ISA

Open Source tools and libraries
- Rich set of open source math libraries
- Tuned deep learning frameworks
- Optimized parallel programming frameworks
- CodeXL profiler and GDB debugging

ROCm Leverages OpenUCX for Scale-up and Scale-out Distributed Programming Models
- Next-generation open source HPC communication framework
- Built off the foundation of MXM, UCCS, PAMI
- Broad industry support including IBM, ARM, Mellanox, Nvidia, and AMD
- Rich platform for supporting MPI, OpenSHMEM, PGAS

ROCm for Distributed Systems

CPU can directly access GPU memory
- Expose entire GPU frame buffer as addressable memory through PCIe BAR (LargeBar feature)
- Map GPU pages to CPU pages
- Allow CPU to directly load/store from/to GPU memory

HCA can directly access GPU memory: ROCnRDMA feature
- Leverages Mellanox's PeerDirect feature
- Allows IB HCA to directly read/write data from/to GPU memory
- Available and enabled by default in ROCm

UCX over ROCm: Intra-node Support

Zero-copy based design
- uct_rocm_cma_ep_put_zcopy
- uct_rocm_cma_ep_get_zcopy

Zero-copy based implementation
- Similar to the CMA UCT code in UCX
- ROCm provides similar functions to the original CMA for GPU memories
  - hsaKmtProcessVMWrite
  - hsaKmtProcessVMRead

IPC for intra-node communication
- Working on providing ROCm-IPC support in UCX

Test-bed: AMD Fiji GPUs, Intel CPU, Mellanox Connect-IB

[Figure: OMB latency benchmark. Latency (us) versus message size (bytes), 0 to 512 bytes.]
- ROCM-CMA provides efficient support for large messages
- 1.9 us for a 4-byte transfer, intra-node D-D
- 43 us for a 512 KByte transfer, intra-node
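To put the two intra-node latency figures quoted above side by side, the short Python sketch below converts each into an effective bandwidth (payload size divided by latency). This is only back-of-the-envelope arithmetic on the reported numbers, not a measurement; it assumes 1 KByte = 1024 bytes and 1 GB = 1e9 bytes, and the function name is made up for illustration.

```python
# Figures quoted on the slide for intra-node device-to-device transfers.
SMALL_BYTES, SMALL_LATENCY_US = 4, 1.9
LARGE_BYTES, LARGE_LATENCY_US = 512 * 1024, 43.0  # assumes 1 KByte = 1024 bytes


def effective_bandwidth_gb_per_s(num_bytes: int, latency_us: float) -> float:
    """Return payload bytes divided by latency, in GB/s (1 GB = 1e9 bytes)."""
    seconds = latency_us * 1e-6
    return num_bytes / seconds / 1e9


small_bw = effective_bandwidth_gb_per_s(SMALL_BYTES, SMALL_LATENCY_US)
large_bw = effective_bandwidth_gb_per_s(LARGE_BYTES, LARGE_LATENCY_US)
print(f"4 bytes:    {small_bw:.4f} GB/s")
print(f"512 KBytes: {large_bw:.2f} GB/s")
```

Under these assumptions the 4-byte case is dominated by fixed per-message overhead, while the 512 KByte case works out to roughly 12 GB/s, which is consistent with the claim that the zero-copy path is efficient for large messages.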

UCX over ROCm: Inter-node Support
- Takes advantage of LargeBar capability to support eager protocols
  - Eager protocols can run directly from GPU buffers
- Takes advantage of ROCnRDMA to design rendezvous (RNDV) protocols
- Optimization and tuning work in progress
  - Enhanced and optimized GPU-aware protocols
  - Pipeline, etc.

[Figure: Latency (us) versus message size (bytes), 0 to 512 bytes.]
- LargeBar feature provides efficient support for eager protocol
- 2.4 us for a 4-byte transfer, inter-node
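The split between eager and rendezvous protocols can be sketched in Python as a size-based selection. This is an illustration only, not the UCX implementation: the slides do not say how the protocol is chosen, so the size cut-over (a common convention in MPI-style libraries), the 8192-byte threshold, and the names `SendPlan` and `plan_send` are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical cut-over point; real thresholds are tuned per transport and platform.
EAGER_THRESHOLD_BYTES = 8192


@dataclass(frozen=True)
class SendPlan:
    protocol: str   # "eager" or "rndv"
    data_path: str  # mechanism from the slide that the protocol relies on


def plan_send(message_bytes: int, threshold: int = EAGER_THRESHOLD_BYTES) -> SendPlan:
    """Pick a protocol for a GPU-resident send buffer based on message size."""
    if message_bytes < 0:
        raise ValueError("message size must be non-negative")
    if message_bytes <= threshold:
        # Small message: the payload is sent immediately, read directly
        # from the GPU buffer through the LargeBar mapping.
        return SendPlan("eager", "LargeBar")
    # Large message: exchange control messages first, then let the HCA
    # move the payload to or from GPU memory with RDMA.
    return SendPlan("rndv", "ROCnRDMA")


for size in (4, 8192, 512 * 1024):
    print(size, plan_send(size))
```

The point of the sketch is the pairing: eager sends depend on the CPU-visible LargeBar mapping, while rendezvous transfers depend on the HCA reaching GPU memory through ROCnRDMA.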

Disclaimer & Attribution

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION
© 2018 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, FirePro and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. ARM is a registered trademark of ARM Limited in the UK and other countries. PCIe is a registered trademark of PCI-SIG Corporation. OpenCL and the OpenCL logo are trademarks of Apple, Inc. and used by permission of Khronos. OpenVX is a trademark of Khronos Group, Inc. Other names are for informational purposes only and may be trademarks of their respective owners. Use of third party marks / names is for informational purposes only and no endorsement of or by AMD is intended or implied.