Experiences with GPGPUs at HLRS

Stefan Wesner, Managing Director, High Performance Computing Centre Stuttgart (HLRS)

HLRS Context and Challenges Ahead
- PRACE Tier-0 centre: 1 PF in 2011, 4-5 PF in 2013, 10 PF in 2015?
- HPC service for industry
- Research towards exascale: CREST!
- National supercomputing centre: HPC service for ~100 projects with several hundred users, targeting different levels of parallelism and different disciplines, but with a focus on engineering

Drivers and Issues for GPGPU Adoption @ HLRS
Issues:
- Complex codes with a long history
- Legacy codes designed and adapted for "dinosaur" computing system architectures
- The high level of innovation is paralyzing!
- APIs are too low-level or not standardized (protection of my investment?)
- Industrial customers of HLRS demand stable environments
GPGPU expectations:
- Very high performance
- High memory bandwidth
- The high level of innovation is exciting!
- APIs allowing full low-level control are exciting!

GPGPU Deployment History and Future
- Starting point: research activities, mostly in the visualization department
- Initial deployment on a national resource: Laki (Intel Nehalem / Tesla S1070)
- Hermit1 will be equipped with GPGPUs (2012)
- Hermit2, the PRACE Tier-0 system, will have a visible accelerator share
Timeline:
- <2008: GPGPU research
- 2008: NEC cluster Laki, 32 × S1070, 62 TF peak
- Q3/2011: Cray XE6 Hermit1, Phase 1 Step 1, ~1 PF peak
- 2012: Update of Hermit1 with 32 Cray XK6 nodes
- 2013: Cray Cascade Hermit2, Phase 1 Step 2, ~4-5 PF peak

Use Case: Erosion in Turbine Runners
- CFD simulation: Ansys CFX
- Unstructured grid with 15,215,488 elements
Contact: Florian Niebling, niebling@hlrs.de; Dr. Uwe Wössner, woessner@hlrs.de

Use Case: Parallel Surface Extraction (GPU)
- Parallelization of iso-surface and cutting-surface extraction for interactive post-processing on unstructured grids
- NVIDIA Fermi GPU: more than 5x faster than 16 Xeon E5472
[Figure: architecture diagram showing two identical stacks (Renderer, two Module 1 instances, Module 2, Datamanager, shared memory, GPU) linked by an MPI transport layer]
Contact: Florian Niebling, niebling@hlrs.de; Dr. Uwe Wössner, woessner@hlrs.de
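Iso-surface extraction on an unstructured grid is data-parallel per cell, which is what makes it a good fit for a GPU. The sketch below is an illustration under stated assumptions, not the HLRS implementation: it assumes a tetrahedral mesh with one scalar value per vertex and uses marching-tetrahedra classification with linear interpolation along the cut edges. `extract`, `cell_case` and `cut_points` are hypothetical names; each iteration of the loop in `extract` corresponds to the work one GPU thread would take in a CUDA port.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

# The six edges of a tetrahedron, as pairs of local vertex indices.
TET_EDGES = ((0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3))


def cell_case(values: Sequence[float], iso: float) -> int:
    """4-bit case index: bit i is set if vertex i lies below the iso-value."""
    case = 0
    for i, v in enumerate(values):
        if v < iso:
            case |= 1 << i
    return case


def cut_points(coords: Sequence[Point], values: Sequence[float], iso: float) -> List[Point]:
    """Interpolate the iso-value crossing on every edge whose endpoints straddle iso."""
    points: List[Point] = []
    for a, b in TET_EDGES:
        va, vb = values[a], values[b]
        if (va < iso) != (vb < iso):  # edge crosses the iso-value, so vb != va
            t = (iso - va) / (vb - va)
            pa, pb = coords[a], coords[b]
            points.append(tuple(pa[k] + t * (pb[k] - pa[k]) for k in range(3)))
    return points


def extract(cells, coords, values, iso):
    """Process every cell independently: the per-thread step of a GPU kernel."""
    polygons = []
    for cell in cells:
        cv = [values[n] for n in cell]
        case = cell_case(cv, iso)
        if case in (0, 0b1111):  # cell entirely above or below: no surface here
            continue
        polygons.append(cut_points([coords[n] for n in cell], cv, iso))
    return polygons


# Usage: one tetrahedron, vertex 0 below the iso-value -> one triangle.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(extract([(0, 1, 2, 3)], coords, [0.0, 1.0, 1.0, 1.0], 0.5))
```

The cut points are returned in edge order. A full implementation would also use a per-case lookup table to order them and split the four-point case into two triangles, and on a GPU it would compact the variable-length per-cell output (for example with a prefix sum).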

Industrial Collaboration: HMI-Tec
AI Neuro-Sorter: enhanced software to analyse, sort and further process written text.
- Objective: parallelize using CUDA
- The existing code, based on the Boost library, has been rewritten and optimized for high single-core performance
- The CUDA version was tested and compared on NVIDIA Fermi and Tesla, showing speedups of up to 30x
- Only a few weeks of porting effort, done as part of a master's thesis!
Contact: Dr. Rainer Keller, keller@hlrs.de

Industrial Collaboration: HMI-Tec
[Figure: speedup of the training phase]
- Factor >20 compared to the CPU version (Nehalem)
- Factor 2 compared to the original Boost-based version
- Pattern: 3766 words, 3766 input neurons, varying number of inner neurons
Data: Zaheer Ahmed
Contact: Dr. Rainer Keller, keller@hlrs.de
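The training pattern above (3766 input neurons, a varying number of inner neurons) hints at why this workload suits a GPU: each inner neuron's weighted sum over the inputs is independent of every other neuron's. The slides do not describe the Neuro-Sorter network itself, so the sketch below assumes a plain fully connected layer with a logistic activation; all function names are hypothetical. It evaluates the layer neuron by neuron, the unit a CUDA port could assign to one thread or thread block, and counts the multiply-adds per pattern, which grow linearly with the number of inner neurons.

```python
import math
from typing import List, Sequence


def logistic(s: float) -> float:
    """Numerically stable logistic function (no overflow for large |s|)."""
    if s >= 0.0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def neuron_output(weights: Sequence[float], bias: float, inputs: Sequence[float]) -> float:
    """One inner neuron: logistic(bias + sum_i w_i * x_i).

    Independent of all other neurons, so it is a natural per-thread work item.
    """
    return logistic(bias + sum(w * x for w, x in zip(weights, inputs)))


def hidden_layer(weight_rows: Sequence[Sequence[float]], biases: Sequence[float],
                 inputs: Sequence[float]) -> List[float]:
    """Forward pass of a fully connected layer, evaluated neuron by neuron."""
    return [neuron_output(w, b, inputs) for w, b in zip(weight_rows, biases)]


def multiply_adds_per_pattern(n_inputs: int, n_inner: int) -> int:
    """Multiply-adds for one forward pass from the input layer to the inner layer."""
    return n_inputs * n_inner


# Usage with the slide's input size and an assumed 4 inner neurons.
n_inputs = 3766
out = hidden_layer([[0.0] * n_inputs] * 4, [0.0] * 4, [1.0] * n_inputs)
print(out, multiply_adds_per_pattern(n_inputs, 4))
```

The Python loop is sequential; the point is the decomposition: a GPU version would launch one thread per inner neuron (or per weight, followed by a reduction), so larger inner layers expose more parallel work.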

Summary of Experiences and Derived Next Steps
- For communities with well-developed open-source or ISV applications, GPGPUs already deliver benefits today in time-to-result and/or flops per euro
  → GPGPUs are part of the HLRS offering for academic and industrial users
- New application areas, in particular where the starting point is non-parallelized code, have a high speed-up potential
  → Seek collaborations with users from academia and industry to leverage this potential
- Applications combining visualization and computing, e.g. interactive or real-time scenarios, exploit the GPGPU architecture well
- What about legacy codes and very large parallelized applications?
  → Investigate new emerging programming approaches (HMPP, PGI, Cray Accelerator Compiler) and compare them to CUDA and OpenCL
  → Large applications need a more stable or standardized environment
  → Accelerator programming must become easier for the average developer
  → Communication between accelerators, and between host and accelerator, must improve

Thank you! Any questions?
Dr. Stefan Wesner, wesner@hlrs.de
Come and visit us at the HLRS booth (#134) and at the HPC User Forum, 6-7 October 2011, in Stuttgart.