+ Hybrid Computing @ KAUST: Many Cores and OpenACC
Alain Clo - KAUST Research Computing
Saber Feki - KAUST Supercomputing Lab
Florent Lebeau - CAPS
+ Agenda: Hybrid Computing
- Hybrid Computing
- From Multi-Physics to Multi-Computing - The Needs
- Ecosystem SW HW
- Trends and Convergence - Market
- ManyCores and OpenACC
- Economics
- KAUST Examples
  - Acoustics
  - Electromagnetics
- From Academia to Industry: The Opportunity @ KAUST
- OpenACC Training on Jan 30th, B9-R2220, 9:30am
+ From Multi-Physics, Multi-Scale to Hybrid Computing
- MultiPhysics: Fracture Simulation, Reservoir Modelling, Aerosol
- Maths and Discretization: Partial Differential Equations, Volume Integral Equations
- Programming Models: OpenACC (fine grain), OpenMP (coarse grain), MPI (large grain)
- Multi Computing: CPUs, GPGPU, Accelerators, FPGA
Fracture simulations are compute intensive. Accelerators can absorb the peak demand; can OpenACC help to use them?
+ Hybrid Computing Platforms
- Hybrid = Heterogeneous
- Hybrid computing platforms are made of CPUs plus GPUs, accelerators, or FPGAs
- Examples of vendors:
  - NVidia: GPUs
  - AMD: Accelerators and GPUs
  - Intel: Xeon Phi accelerators
  - FPGA: Convey, Maxeler, SRC
    - http://www.conveycomputer.com/
    - http://www.maxeler.com/
    - http://www.srccomp.com/
+ Hybrid Computing - CPU+GPU Many Cores
NVidia   Cuda   DP Gflops   Cores   MHz    GB    SMP
S1070    1.3    345         192     1200   4     30
C2075    2.0    515         448     1600   5.3   14
K20X     3.5    1310        2688    730    6.1   14
+ Programming GPU Environment
- CUDA
  - CUDA 5 drivers
  - CUDA SDK, CUDA compilers, debuggers, profilers
  - CUDA Toolkit: samples
  - Libraries: cufft, cublas, cusparse
- Applications (catalog of ~300 CUDA/GPU-enabled)
  - Molecular Dynamics: Amber, Gromacs, Lammps, Namd, Vasp
  - Computational Chemistry: NWChem
  - Computational Structural Mechanics: Abaqus, Ansys
  - Geophysics: CGG Veritas, Paradigm Echos, Schlumberger WesternGeco
  - Maths: Matlab, Mathematica, Maple
  - http://www.nvidia.com/docs/io/123576/nv-applications-catalog-lowres.pdf
+ Programming Environments Evolution - OpenACC
- CUDA (2006), OpenCL (2008): GPGPU
- OpenACC in 2011
  - CAPS
  - PGI
  - CRAY
- Advantages of OpenACC
  - Preserves the legacy code
  - Incremental optimization and porting to the GPU/accelerator
  - Very simple to implement
  - Looks like OpenMP
  - Exploits broad optimization opportunities (fine and coarse grain)
- http://www.openacc-standard.org/
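To make the "preserves the legacy" and "looks like OpenMP" points concrete, here is a minimal sketch of a serial SAXPY loop ported with a single OpenACC directive. The function name `saxpy` and its signature are illustrative assumptions, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the slides): y[i] = a * x[i] + y[i]. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* The only change to the legacy serial loop is this directive.
       copyin: x is sent to the accelerator; copy: y is sent and fetched back.
       A compiler without OpenACC support ignores the pragma and runs the
       loop serially on the CPU. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Because the directive is a comment to a non-OpenACC compiler, the same source file keeps building and producing the same results on the CPU, which is what allows porting one loop at a time.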
+ Types of Parallelism - Technology Granularity
- Application programmers level
  - Large grain: message passing (e.g. domain decomposition)
- Target of accelerators / many-cores
  - Coarse grain: task parallelism, data stream parallelism
  - Dynamic, load balancing oriented
  - Data locality oriented
- Compilers' target
  - Fine grain: instruction level parallelism, SIMD instructions (SSE, ...)
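The three granularities can be sketched in a few lines of C. This is an illustration under assumptions, not code from the talk: `block_range` and `local_sum` are hypothetical names, the large-grain part shows only the index arithmetic a message-passing domain decomposition would use (no MPI calls), and the coarse-grain part uses an OpenMP directive.

```c
#include <assert.h>

/* Large grain (hypothetical helper): split n cells among `parts` ranks
   as evenly as possible; rank `rank` owns the half-open range [*lo, *hi). */
void block_range(int n, int parts, int rank, int *lo, int *hi)
{
    assert(n >= 0 && parts > 0 && rank >= 0 && rank < parts);
    int base = n / parts;   /* cells every rank gets */
    int rem  = n % parts;   /* the first `rem` ranks get one extra cell */
    *lo = rank * base + (rank < rem ? rank : rem);
    *hi = *lo + base + (rank < rem ? 1 : 0);
}

/* Coarse grain: OpenMP threads share the iterations of the local block.
   Fine grain: the loop body is simple enough that the compiler may
   vectorize it with SIMD instructions on its own. */
double local_sum(const double *u, int lo, int hi)
{
    assert(u != 0 && lo <= hi);
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (int i = lo; i < hi; ++i) {
        s += u[i];
    }
    return s;
}
```

In a real message-passing code each rank would call `block_range` with its own rank number and then combine the per-rank results of `local_sum` with a global reduction.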
+ Programming Models OpenMP - OpenACC
Memory model
- OpenMP: coherent; shared variables, private variables
- CUDA: global memory not coherent; shared local memory coherent inside blocks
- OpenACC: global memory not coherent; shared local memory coherent inside blocks
Parallel constructs
- OpenMP: SIMD (loops), SPMD (regions), MIMD (tasks)
- CUDA: kernel with a hierarchy of grid, block/warp, thread
- OpenACC: kernel or parallel, with a hierarchy of gang, worker, vector
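As an illustration of the OpenACC gang/vector hierarchy, the sketch below maps the rows of a matrix-vector product to gangs and the columns of each row to vector lanes. The function `matvec` and its row-major layout are assumptions made for this example; how gangs and vectors map to CUDA blocks and threads is left to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: y = A * x for a row-major rows x cols matrix. */
void matvec(int rows, int cols, const float *restrict a,
            const float *restrict x, float *restrict y)
{
    assert(rows >= 0 && cols >= 0 && a != NULL && x != NULL && y != NULL);

    /* Outer loop: one row per gang iteration (coarse level of the hierarchy). */
    #pragma acc parallel loop gang copyin(a[0:rows*cols], x[0:cols]) copyout(y[0:rows])
    for (int i = 0; i < rows; ++i) {
        float sum = 0.0f;
        /* Inner loop: vector lanes cooperate on one row and combine their
           partial sums through the reduction clause (fine level). */
        #pragma acc loop vector reduction(+:sum)
        for (int j = 0; j < cols; ++j) {
            sum += a[i * cols + j] * x[j];
        }
        y[i] = sum;
    }
}
```

The worker level is omitted here for brevity; it would sit between the two loops if a third nesting level were available.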
+ Trends and Convergence
+ Hardware and Programming Convergence
- Many-core adoption
  - Intel: Sandy Bridge, MIC
  - AMD/ATI: Radeon, Fusion
  - Nvidia: Kepler, Maxwell
- OpenACC
  - Cray
  - PGI
  - NVidia
  - CAPS
+ Market is growing
- The global HPC economy is growing again (IDC 2011)
  - 2010 grew by 10%, to reach $9.5 billion
  - Forecast of ~7% growth over the next 5 years
- 30% of all HPC sites use accelerators, mostly GPGPUs (IDC)
- Top500 list, Nov 2012
  - #1 Titan@ORNL: 18 PetaFlops system with 261000 K20 cores
  - 3 of the top 10 are hybrid computers using accelerators, either Intel Xeon Phi or Nvidia
- Accelerators are being adopted by major mainstream vendors
- Accelerators are part of the exascale race
+ Hybrid Computing @ KAUST: 0.5 Petaflops on GPGPU
- KAUST awarded CUDA Research Center
- GPGPU computing at KAUST > 0.5 Pflops
  - Laptops ~ 100 Tflops
  - Desktops ~ 300 Tflops
  - A few Intel Xeon Phi MICs
  - Extreme Computing: 50 Tflops
  - Noor: 64 Tesla C1060, on 24 Fermi and 64 Kepler TBD > 100 Tflops
- NEW OpenACC compilers
  - CAPS
  - PGI
- GPU applications and libraries
  - Matlab, Maple, Mathematica, Abaqus, Ansys
  - MAGMA, Fast Multipole Method (FMM)
- Competences at KAUST: KSL and Research Computing
+ Thank You
+ Hybrid Computing @ KAUST: 0.65 Petaflops on GPGPU
- KAUST awarded CUDA Research Center
- GPGPU computing at KAUST > 0.65 Pflops
  - Laptops ~ 100 Tflops
  - Desktops ~ 400 Tflops
  - A few Intel Xeon Phi MICs
  - Extreme Computing: 50 Tflops
  - Noor: 32 Tesla C1060, on 24 Fermi and 64 Kepler TBD > 100 Tflops
- NEW OpenACC compilers
  - CAPS
  - PGI
- GPU applications and libraries
  - Matlab, Maple, Mathematica, Abaqus, Ansys
  - MAGMA, Fast Multipole Method (FMM)
- Competences at KAUST: KSL and Research Computing
+ Academia to Industry
- Hybrid Computing is a big opportunity for KAUST
- KAUST has the critical mass to create value in research and industry
  - Develop new algorithms
  - Develop new libraries and applications
  - Develop new knowledge, new competences
  - Create business through economic development
- CAPS is a good example of transfer from academia to industry
+ OpenACC Training on Jan 30th in Building 9, R2220, 9:30am
- Introduction to GPU computing
- CUDA architecture and programming model
- OpenACC overview & compilers
- OpenACC programming model
- Managing data with OpenACC
- OpenACC loop constructs
- Asynchronism with OpenACC
- OpenACC runtime API
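Three of the training topics (managing data, asynchronism, and the runtime API) can be previewed in one short sketch. It is not taken from the training material: `visible_accelerators` and `scale_then_shift` are hypothetical names, and the `_OPENACC` guard is there so the file also builds with a compiler that has no OpenACC support.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENACC
#include <openacc.h>   /* OpenACC runtime API, only present with OpenACC compilers */
#endif

/* Runtime API: number of non-host devices the OpenACC runtime can see.
   Returns 0 when the file is built without OpenACC. */
int visible_accelerators(void)
{
#ifdef _OPENACC
    return acc_get_num_devices(acc_device_not_host);
#else
    return 0;
#endif
}

/* Hypothetical helper: u[i] = u[i] * scale + shift, done as two kernels. */
void scale_then_shift(int n, float *restrict u, float scale, float shift)
{
    assert(n >= 0 && u != NULL);

    /* Data region: u is copied to the device once, stays there across
       both kernels, and is copied back once at the closing brace. */
    #pragma acc data copy(u[0:n])
    {
        /* Both kernels are queued on async queue 1, so the host does not
           block, and they still execute in order relative to each other. */
        #pragma acc parallel loop async(1)
        for (int i = 0; i < n; ++i) {
            u[i] *= scale;
        }

        #pragma acc parallel loop async(1)
        for (int i = 0; i < n; ++i) {
            u[i] += shift;
        }

        /* Wait for queue 1 before the data region copies u back. */
        #pragma acc wait(1)
    }
}
```

Without the enclosing data region, each `parallel loop` would transfer `u` to and from the device on its own, which is usually the first cost to remove when porting.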