Hybrid Computing @ KAUST: Many Cores and OpenACC. Alain Clo - KAUST Research Computing, Saber Feki - KAUST Supercomputing Lab, Florent Lebeau - CAPS

Transcription:

+ Hybrid Computing @ KAUST: Many Cores and OpenACC
Alain Clo - KAUST Research Computing
Saber Feki - KAUST Supercomputing Lab
Florent Lebeau - CAPS

+ Agenda: Hybrid Computing
- Hybrid Computing
- From Multi-Physics to Multi-Computing - The Needs
- Ecosystem: SW, HW
- Trends and Convergence - Market
- Many Cores and OpenACC
- Economics
- KAUST Examples
- Acoustics
- Electromagnetics
- From Academia to Industry: The Opportunity@KAUST
- OpenACC Training on Jan 30th, B9-R2220, 9:30am

+ From Multi-Physics, Multi-Scales to Hybrid Computing
- Multi-physics: fracture simulation, reservoir modelling, aerosol
- Maths and discretization: partial differential equations, volume integral equations
- Programming models: OpenACC (fine grain), OpenMP (coarse grain), MPI (large grain)
- Multi-computing: CPUs, GPGPU, accelerators, FPGA
- Fracture simulations are compute intensive. Accelerators can absorb the peak needs; can OpenACC help to use them?

+ Hybrid Computing Platforms
- Hybrid = Heterogeneous
- Hybrid computing platforms are made of CPUs plus GPUs, accelerators, or FPGAs
- Examples of vendors:
  - NVIDIA: GPUs
  - AMD: accelerators and GPUs
  - Intel: accelerators (Xeon Phi)
  - FPGA: Convey, Maxeler, SRC
    - http://www.conveycomputer.com/
    - http://www.maxeler.com/
    - http://www.srccomp.com/

+ Hybrid Computing - CPU+GPU Many Cores

NVIDIA   CUDA   DP GFlops   Cores   MHz    GB    SMP
S1070    1.3    345         192     1200   4     30
C2075    2.0    515         448     1600   5.3   14
K20X     3.5    1310        2688    730    6.1   14

+ Programming GPU Environment
- CUDA
  - CUDA 5 drivers
  - CUDA SDK, CUDA compilers, debuggers, profilers
  - CUDA Toolkit: samples
  - Libraries: cuFFT, cuBLAS, cuSPARSE
- Applications (catalog of ~300 CUDA/GPU-enabled applications)
  - Molecular dynamics: Amber, Gromacs, LAMMPS, NAMD, VASP
  - Computational chemistry: NWChem
  - Computational structural mechanics: Abaqus, Ansys
  - Geophysics: CGG Veritas, Paradigm Echos, Schlumberger WesternGeco
  - Maths: Matlab, Mathematica, Maple
- http://www.nvidia.com/docs/io/123576/nv-applications-catalog-lowres.pdf

+ Programming Environments Evolution - OpenACC
- GPGPU: CUDA (2006), OpenCL (2008)
- OpenACC in 2011
  - CAPS
  - PGI
  - Cray
- Advantages of OpenACC
  - Preserves legacy code
  - Incremental optimization and porting to the GPU/accelerator
  - Very simple to implement
  - Looks like OpenMP
  - Exploits broad opportunities for optimization (fine and coarse grain)
- http://www.openacc-standard.org/
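To make the "preserves the legacy" and "looks like OpenMP" points concrete, here is a minimal sketch of an existing C loop ported with a single OpenACC directive. The function name `saxpy` and the choice of data clauses are illustrative assumptions, not taken from the slides. A compiler without OpenACC support ignores the pragma and runs the loop serially, which is what allows porting to be incremental.

```c
#include <assert.h>

/* Illustrative example (not from the slides): y[i] = a * x[i] + y[i]. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0);

    /* The only change to the legacy loop is this directive.
     * copyin: x is read on the accelerator; copy: y is read and written back.
     * Without an OpenACC compiler the pragma is ignored and the loop runs on the CPU. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The loop body is untouched, so the same source file can still be built for CPU-only systems, much as an OpenMP-annotated loop can.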

+ Type of Parallelism - Technology Granularity
- Large grain (application programmer's level)
  - Message passing, e.g. domain decomposition
- Coarse grain (targets accelerators / many-cores)
  - Task parallelism: dynamic, load-balancing oriented
  - Data stream parallelism: data-locality oriented
- Fine grain (compiler's target)
  - Instruction-level parallelism
  - SIMD instructions (SSE, ..)
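As an illustration of the large-grain level, the sketch below shows one common way to split a 1D index range among message-passing processes (a block domain decomposition). The names `block_range` and `block_decompose` are hypothetical, and the slides do not say which decomposition the KAUST codes use; this is only one simple scheme.

```c
#include <assert.h>

/* Half-open index range [lo, hi) owned by one part (hypothetical helper type). */
typedef struct {
    int lo;
    int hi;
} block_range;

/* Split n cells into nparts contiguous blocks whose sizes differ by at most one.
 * The first (n % nparts) parts each receive one extra cell. */
block_range block_decompose(int n, int nparts, int part)
{
    assert(n >= 0 && nparts > 0 && part >= 0 && part < nparts);

    int base = n / nparts; /* minimum block size */
    int rem = n % nparts;  /* number of blocks that get one extra cell */

    block_range r;
    r.lo = part * base + (part < rem ? part : rem);
    r.hi = r.lo + base + (part < rem ? 1 : 0);
    return r;
}
```

In an MPI code, `part` would typically be the process rank and `nparts` the communicator size; each process would then work only on its own `[lo, hi)` block and exchange boundary values with its neighbours.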

+ Programming Models: OpenMP - OpenACC
- Memory model
  - OpenMP: coherent; shared variables, private variables
  - CUDA: global memory is not coherent; shared local memory is coherent inside blocks
  - OpenACC: global memory is not coherent; shared local memory is coherent inside blocks
- Parallel constructs
  - OpenMP: SIMD (loops), SPMD (regions), MIMD (tasks)
  - CUDA: kernel with a hierarchy of grid, block/warp, thread
  - OpenACC: kernel or parallel region with a hierarchy of gang, worker, vector
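The gang / worker / vector hierarchy can be sketched on a small matrix-vector product. The function `matvec` and its data clauses are illustrative assumptions, not taken from the slides. The outer loop is distributed over gangs and the inner loop over vector lanes; the worker level is left to the compiler here. On NVIDIA targets, gangs are typically mapped to CUDA thread blocks and vector lanes to threads, but that mapping is compiler dependent.

```c
#include <assert.h>

/* Illustrative example (not from the slides): y = A * x,
 * with A stored row-major as rows x cols. */
void matvec(int rows, int cols, const float *a, const float *x, float *y)
{
    assert(rows >= 0 && cols >= 0);

    /* Outer loop: one row per gang iteration (coarse grain). */
    #pragma acc parallel loop gang copyin(a[0:rows*cols], x[0:cols]) copyout(y[0:rows])
    for (int i = 0; i < rows; ++i) {
        float sum = 0.0f;

        /* Inner loop: vector lanes cooperate on one row (fine grain);
         * the reduction clause combines their partial sums. */
        #pragma acc loop vector reduction(+:sum)
        for (int j = 0; j < cols; ++j) {
            sum += a[i * cols + j] * x[j];
        }
        y[i] = sum;
    }
}
```

Compiled without OpenACC support, the pragmas are ignored and the function runs as an ordinary serial loop nest.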

+ Trends and Convergence

+ Hardware and Programming Convergence
- Many-core adoption
  - Intel: Sandy Bridge, MIC
  - AMD/ATI: Radeon, Fusion
  - NVIDIA: Kepler, Maxwell
- OpenACC
  - Cray
  - PGI
  - NVIDIA
  - CAPS

+ Market is growing
- The global HPC economy is growing again (IDC 2011)
  - 2010 grew by 10%, to reach $9.5 billion
  - Forecast: ~7% growth over the next 5 years
- 30% of all HPC sites use accelerators, mostly GPGPUs (IDC)
- Top500 list, Nov 2012
  - #1 Titan@ORNL: 18 PetaFlops system with 261000 K20 cores
  - 3 of the top 10 are hybrid computers using accelerators, either Intel Xeon Phi or NVIDIA
- Accelerators are being adopted by major mainstream vendors
- Accelerators are part of the exascale race

+ Hybrid Computing @ KAUST: 0.5 Petaflops on GPGPU
- KAUST awarded CUDA Research Center
- GPGPU computing at KAUST > 0.5 Pflops
  - Laptops ~ 100 Tflops
  - Desktops ~ 300 Tflops
  - Few Intel Xeon Phi MICs
  - Extreme Computing: 50 Tflops
  - Noor: 64 Tesla C1060, on 24 Fermi and 64 Kepler TBD > 100 Tflops
- NEW OpenACC compilers
  - CAPS
  - PGI
- GPU applications and libraries
  - Matlab, Maple, Mathematica, Abaqus, Ansys
  - MAGMA, Fast Multipole Method (FMM)
- Competences at KAUST: KSL and Research Computing

+ Thank You

+ Hybrid Computing @ KAUST: 0.65 Petaflops on GPGPU
- KAUST awarded CUDA Research Center
- GPGPU computing at KAUST > 0.65 Pflops
  - Laptops ~ 100 Tflops
  - Desktops ~ 400 Tflops
  - Few Intel Xeon Phi MICs
  - Extreme Computing: 50 Tflops
  - Noor: 32 Tesla C1060, on 24 Fermi and 64 Kepler TBD > 100 Tflops
- NEW OpenACC compilers
  - CAPS
  - PGI
- GPU applications and libraries
  - Matlab, Maple, Mathematica, Abaqus, Ansys
  - MAGMA, Fast Multipole Method (FMM)
- Competences at KAUST: KSL and Research Computing

+ Academia to Industry
- Hybrid computing is a big opportunity for KAUST
- KAUST has the critical mass to create value in research and industry
  - Develop new algorithms
  - Develop new libraries and applications
  - Develop new knowledge, new competences
  - Create business through economic development
- CAPS is a good example of transfer from academia to industry

+ OpenACC Training on Jan 30th in Building 9, R2220, 9:30am
- Introduction to GPU computing
- CUDA architecture and programming model
- OpenACC overview & compilers
- OpenACC programming model
- Managing data with OpenACC
- OpenACC loop constructs
- Asynchronism with OpenACC
- OpenACC runtime API