Hybrid KAUST: Many Cores and OpenACC
Alain Clo (KAUST Research Computing), Saber Feki (KAUST Supercomputing Lab), Florent Lebeau (CAPS)

2. Agenda
- Hybrid Computing
- From Multi-Physics to Multi-Computing: The Needs
- Ecosystem: SW and HW
- Trends and Convergence: Market
- Many Cores and OpenACC
- Economics
- KAUST Examples: Acoustics, Electromagnetics
- From Academia to Industry: The Opportunity @ KAUST
- OpenACC Training on Jan 30th, B9-R2220, 9:30am

3. From Multi-Physics and Multi-Scale to Hybrid Computing
- Multi-physics: fracture simulation, reservoir modelling, aerosols
- Maths and discretization: partial differential equations, volume integral equations
- Programming models (multi-computing): OpenACC (fine grain), OpenMP (coarse grain), MPI (large grain)
- Hardware: CPUs, GPGPUs, accelerators, FPGAs
Fracture simulations are compute intensive. Accelerators can absorb the peak computational needs; can OpenACC help to exploit them?

4. Hybrid Computing Platforms
- Hybrid = heterogeneous
- Hybrid computing platforms combine CPUs with GPUs, accelerators, or FPGAs
- Example vendors:
  - NVIDIA: GPUs
  - AMD: accelerators and GPUs
  - Intel: Xeon Phi accelerators
  - FPGA: Convey, Maxeler, SRC

5. Hybrid Computing: CPU + GPU Many Cores (NVIDIA CUDA)
[Table: SMP, S, C and K20X compared by double-precision GFLOPS, cores, clock (MHz) and memory (GB)]

6. GPU Programming Environment
- CUDA
  - CUDA 5 drivers
  - CUDA SDK: CUDA compilers, debuggers, profilers
  - CUDA Toolkit samples
  - Libraries: cuFFT, cuBLAS, cuSPARSE
- Applications (catalog of ~300 CUDA/GPU-enabled applications)
  - Molecular dynamics: Amber, GROMACS, LAMMPS, NAMD, VASP
  - Computational chemistry: NWChem
  - Computational structural mechanics: Abaqus, Ansys
  - Geophysics: CGG Veritas, Paradigm Echos, Schlumberger WesternGeco
  - Maths: MATLAB, Mathematica, Maple

7. Programming Environments Evolution: OpenACC
- CUDA (2006), OpenCL, GPGPU
- OpenACC in 2011: CAPS, PGI, Cray
- Advantages of OpenACC:
  - Preserves legacy code
  - Incremental optimization and porting to the GPU/accelerator
  - Very simple to implement
  - Looks like OpenMP
  - Exploits a broad range of optimization opportunities (fine and coarse grain)

8. Types of Parallelism: Technology Granularity
- Large grain (application programmer's level): message passing, e.g. domain decomposition
- Coarse grain (target: accelerators / many-cores): task parallelism (dynamic, load-balancing oriented), data stream parallelism (data-locality oriented)
- Fine grain (compiler's target): instruction-level parallelism, SIMD instructions (SSE, ...)

9. Programming Models: OpenMP - OpenACC
- OpenMP
  - Memory model: coherent; shared variables and private variables
  - Parallel constructs: SIMD (loops), SPMD (regions), MIMD (tasks)
- CUDA
  - Memory model: global memory not coherent; shared local memory coherent inside blocks
  - Parallel constructs: kernels with a hierarchy of grid, block/warp, thread
- OpenACC
  - Memory model: global memory not coherent; shared local memory coherent inside blocks
  - Parallel constructs: kernels or parallel regions with a hierarchy of gang, worker, vector

10. Trends and Convergence

11. Hardware and Programming Convergence
- Many-core adoption:
  - Intel: Sandy Bridge, MIC
  - AMD/ATI: Radeon, Fusion
  - NVIDIA: Kepler, Maxwell
- OpenACC: Cray, PGI, NVIDIA, CAPS

12. The Market Is Growing
- The global HPC market is growing again (IDC 2011):
  - in 2010 it grew by 10%, reaching $9.5 billion
  - IDC forecasts ~7% growth over the next 5 years
- 30% of all HPC sites use accelerators, mostly GPGPUs (IDC)
- Top500 list, Nov 2012:
  - #1 Titan @ ORNL: an ~18 PFLOPS system with NVIDIA K20X GPUs
  - 3 of the top 10 are hybrid computers using accelerators, either Intel Xeon Phi or NVIDIA GPUs
- Accelerators are being adopted by major mainstream vendors
- Accelerators are part of the exascale race

13. Hybrid KAUST: 0.5 Petaflops on GPGPU
- KAUST named a CUDA Research Center
- GPGPU computing at KAUST > 0.5 PFLOPS:
  - Laptops: ~100 TFLOPS
  - Desktops: ~300 TFLOPS
  - A few Intel Xeon Phi (MIC) coprocessors
  - Extreme Computing: 50 TFLOPS
  - Noor: 64 Tesla C1060, 24 Fermi and 64 Kepler (TBD), > 100 TFLOPS
- New OpenACC compilers: CAPS, PGI
- GPU applications and libraries:
  - MATLAB, Maple, Mathematica, Abaqus, Ansys
  - MAGMA, Fast Multipole Method (FMM)
- Competences at KAUST: KSL and Research Computing
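
Assuming the listed platform figures are meant to be additive (laptops, desktops, Extreme Computing and the lower bound for Noor, in that order; the Xeon Phi cards are not quantified and are left out), the headline number can be checked:

```latex
\[
100 + 300 + 50 + 100 = 550~\text{TFLOPS} > 500~\text{TFLOPS} = 0.5~\text{PFLOPS}
\]
```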

14. Thank You

15. Hybrid KAUST: 0.65 Petaflops on GPGPU
- KAUST named a CUDA Research Center
- GPGPU computing at KAUST > 0.65 PFLOPS:
  - Laptops: ~100 TFLOPS
  - Desktops: ~400 TFLOPS
  - A few Intel Xeon Phi (MIC) coprocessors
  - Extreme Computing: 50 TFLOPS
  - Noor: 32 Tesla C1060, 24 Fermi and 64 Kepler (TBD), > 100 TFLOPS
- New OpenACC compilers: CAPS, PGI
- GPU applications and libraries:
  - MATLAB, Maple, Mathematica, Abaqus, Ansys
  - MAGMA, Fast Multipole Method (FMM)
- Competences at KAUST: KSL and Research Computing

16. Academia to Industry
- Hybrid computing is a big opportunity for KAUST
- KAUST has the critical mass to create value in research and industry:
  - Develop new algorithms
  - Develop new libraries and applications
  - Develop new knowledge and new competences
  - Create business through economic development
- CAPS is a good example of transfer from academia to industry

17. OpenACC Training on Jan 30th in Building 9, R2220, 9:30am
- Introduction to GPU computing
- CUDA architecture and programming model
- OpenACC overview and compilers
- OpenACC programming model
- Managing data with OpenACC
- OpenACC loop constructs
- Asynchronism with OpenACC
- OpenACC runtime API
