Timothy Lanfear, NVIDIA HPC


GPU COMPUTING AND THE FUTURE OF HPC
Timothy Lanfear, NVIDIA

Exascale Computing will Enable Transformational Science Results
First-principles simulation of combustion for new high-efficiency, low-emission engines.
Predictive calculations for thermonuclear and core-collapse supernovae, allowing confirmation of theoretical models.
Comprehensive Earth System Model at 1 km scale, enabling modeling of cloud convection and ocean eddies.
Coupled simulation of entire cells at molecular, genetic, chemical and biological levels.

Exaflop Expectations
[Figure: performance over time on a scale from 100 MF to 10 EF, with trend lines labeled Sum, N=1 and N=500. Marked points: CM5 (~200 kW), Titan (8.2 MW), First Exaflop Computer. Annotation: growing size, cost and power.]

Power: This Time It's Different
In the Good Old Days:
Leakage was not important, and voltage scaled with feature size.
L' = L/2, V' = V/2, E' = C'V'^2 = E/8, f' = 2f, D' = 1/L'^2 = 4D, P' = P
Halve L and get 4x the transistors and 8x the capability for the same power!
MF to GF to TF and almost to PF.
Technology was giving us 68% per year in perf/W!
Processors realized ~50% per year in perf/W (spent it on single-thread performance).
The New Reality:
Leakage has limited threshold voltage, largely ending voltage scaling.
Halve L and get 2x the capability for the same power.
At constant voltage, technology gives us only 19% per year in perf/W.
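The per-year figures on this slide can be reproduced with a short calculation. The sketch below assumes that one halving of the feature size L takes about four years. That interval is not stated on the slide; it is inferred here only because it makes 8x and 2x per halving come out at 68% and 19% per year. The function and constant names are illustrative.

```python
# Sketch: convert per-generation gains into per-year perf/W growth.
# ASSUMPTION: one halving of feature size L takes ~4 years (inferred, not on the slide).
YEARS_PER_HALVING = 4.0


def annual_growth(gain_per_halving, years=YEARS_PER_HALVING):
    """Compound annual growth rate implied by a given gain per halving of L."""
    return gain_per_halving ** (1.0 / years) - 1.0


# Classic scaling: 4x transistors times 2x frequency at the same power -> 8x capability.
classic = annual_growth(4 * 2)
# Constant voltage: only 2x capability at the same power.
constant_voltage = annual_growth(2)

print(f"classic scaling:  {classic:.0%} per year")
print(f"constant voltage: {constant_voltage:.0%} per year")
```

Under that four-year assumption the two lines print 68% and 19%, matching the slide.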

The High Cost of Data Movement
Fetching operands costs more than computing on them.
64-bit DP: 20 pJ
256-bit access, 8 kB SRAM: 50 pJ
Efficient off-chip link: 500 pJ
DRAM Rd/Wr: 16 nJ
[Figure: 20 mm, 28 nm IC. Other labels in the figure: 256 bits, 26 pJ, 256 pJ, 1 nJ.]
Relative cost grows with each generation.
Wire delay (ps/mm) not improving.
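To put the slide's claim in numbers, the sketch below expresses each energy as a multiple of one 64-bit DP operation. It uses only the four values whose labels are unambiguous in the transcript; the dictionary keys and function name are illustrative. The accesses move different amounts of data (256 bits for the SRAM access, unspecified for DRAM), so this is a rough comparison rather than a per-operand cost.

```python
# Energy figures quoted on the slide, in picojoules (1 nJ = 1000 pJ).
ENERGY_PJ = {
    "64-bit DP operation": 20.0,
    "256-bit access, 8 kB SRAM": 50.0,
    "efficient off-chip link": 500.0,
    "DRAM read/write": 16_000.0,
}


def cost_relative_to_compute(energies):
    """Express each energy as a multiple of one 64-bit DP operation."""
    base = energies["64-bit DP operation"]
    return {name: pj / base for name, pj in energies.items()}


ratios = cost_relative_to_compute(ENERGY_PJ)
for name, ratio in ratios.items():
    print(f"{name:28s} {ratio:7.1f}x")
```

On these figures a DRAM read or write costs as much energy as 800 double-precision operations.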

HPC is Going Hybrid
x86 CPU: fast single threads (serial work). Sandy Bridge, 32 nm, 690 pJ/flop.
GPU: extreme power-efficiency (throughput work). Kepler, 28 nm, 132 pJ/flop.
Do most work by cores optimized for extreme energy efficiency.
Still need a few cores optimized for fast serial work.
[Diagram labels: PCIe; Xeon (AMD Fusion too); Intel MIC.]
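The two energy-per-flop figures can be converted into the GFLOPS-per-watt unit used elsewhere in the talk. This is a minimal sketch of that unit conversion only; the function name is illustrative, and the result says nothing about delivered application performance.

```python
def gflops_per_watt(picojoules_per_flop):
    """1 W = 1 J/s, so flop/s per watt is the reciprocal of joules per flop."""
    joules_per_flop = picojoules_per_flop * 1e-12
    return 1.0 / joules_per_flop / 1e9


cpu = gflops_per_watt(690)  # Sandy Bridge figure from the slide
gpu = gflops_per_watt(132)  # Kepler figure from the slide
print(f"CPU: {cpu:.2f} GFLOPS/W, GPU: {gpu:.2f} GFLOPS/W, ratio: {gpu / cpu:.1f}x")
```

The ratio is simply 690/132, a little over 5x in favor of the throughput-optimized cores.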

Kepler Generation of GPUs
Tesla K10: dual GK104 GPUs, 3x single precision. Video, signal, life sciences, seismic.
Tesla K20: GK110 GPU, 3x double precision. CFD, FEA, finance, physics, etc.

Overarching Goals for Tesla
Power Efficiency
Ease of Programming and Portability
Application Space Coverage

KEPLER GK110 GPU: THE WORLD'S FASTEST, MOST EFFICIENT HPC ACCELERATOR
SMX (power efficiency)
Hyper-Q and Dynamic Parallelism (programmability and application coverage)

Titan: World's #1 Supercomputer
18,688 Tesla K20X GPUs
27 Petaflops

Flagship Scientific Applications on Titan
Material Science (WL-LSMS): Role of material disorder, statistics, and fluctuations in nanoscale materials and systems.
Climate Change (CAM-SE): Answer questions about specific climate change adaptation and mitigation scenarios; realistically represent features like precipitation patterns/statistics and tropical storms.
Biofuels (LAMMPS): A multiple-capability molecular dynamics code.
Astrophysics (NRDF): Radiation transport critical to astrophysics, laser fusion, combustion, atmospheric dynamics, and medical imaging.
Combustion (S3D): Combustion simulations to enable the next generation of diesel/biofuels to burn more efficiently.
Nuclear Energy (Denovo): Unprecedented high-fidelity radiation transport calculations that can be used in a variety of nuclear energy and technology applications.

Kepler GPU Performance Results
Dual-socket comparison: CPU-GPU node vs. dual-CPU node. CPU = 8-core Sandy Bridge E5-2687w, 3.10 GHz.
[Figure: bar chart on an axis from 0 to 11 for Chroma, SPECFEM3D, AMBER, WS-LSMS and NAMD, each in three configurations: Single-CPU+K20X, Single-CPU+M2090, Dual-CPU. The Dual-CPU bars read 1.00. Remaining bar values in the transcript: 1.80, 1.71, 2.73, 3.46, 4.40, 5.41, 7.17, 8.00, 8.85, 10.20; their assignment to applications and configurations is not recoverable from the text.]

What Does The Future Hold?

The Future of HPC Programming
Computers are not getting faster, just wider.
Need to structure all HPC apps as throughput problems.
Locality within nodes much more important.
Need to expose locality (programming model) and explicitly manage memory hierarchy (compiler, runtime, autotuner).
How can we enable programmers to code for future processors in a portable way?
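One common way to expose locality in code is loop tiling (blocking). The sketch below is an illustration chosen for this transcript, not something shown in the talk: it restructures a matrix product so that each small block of the operands is reused before moving on. Pure Python lists will not show a speedup; the point is only the loop structure, and the tile size of 2 is an arbitrary assumption.

```python
def matmul_naive(a, b):
    """Reference triple loop over n x n lists of lists."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same product, computed tile by tile so each block of a, b and c
    is reused while it would still be resident in fast memory."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                # Work on one tile; min() handles n not divisible by tile.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + tile, n)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


a = [[float(i + j) for j in range(5)] for i in range(5)]
b = [[float(i * j) for j in range(5)] for i in range(5)]
print(matmul_tiled(a, b) == matmul_naive(a, b))
```

In a real HPC code the tile size would be matched to a level of the memory hierarchy, which is the kind of decision the slide assigns to the compiler, runtime or autotuner.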

Evolution of GPUs This Decade
Integration (memory, processor types, network)
Further concentration on locality (both HW and SW)
Reducing overheads (intra-node, inter-node)
Continued convergence with consumer technology

GPU Roadmap
[Figure: DP GFLOPS per watt (axis ticks 0.5, 1, 2, 4, 8, 16, 32) versus year (2008, 2010, 2012, 2014). Labeled generations: Tesla (CUDA), Fermi (FP64), Kepler (Dynamic Parallelism), Maxwell (Unified Virtual Memory), Volta (Stacked DRAM).]

The Future of HPC is Green
Power is the constraint.
Vast majority of work must be done by cores designed for efficiency.
GPU computing has a sustainable model: aligned with technology trends, supported by consumer markets.
Future evolution will focus on:
Integration (CPU, network, memory)
Increased generality: efficient on any code with high parallelism
This is simply how computers will be built.