Mathematical computations with GPUs

Similar documents
Mathematical computations with GPUs

Titan - Early Experience with the Titan System at Oak Ridge National Laboratory

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center

Timothy Lanfear, NVIDIA HPC

CUDA. Matthew Joyner, Jeremy Williams

High Performance Computing. What is it used for and why?

GPU COMPUTING AND THE FUTURE OF HPC. Timothy Lanfear, NVIDIA

Presentations: Jack Dongarra, University of Tennessee & ORNL. The HPL Benchmark: Past, Present & Future. Mike Heroux, Sandia National Laboratories

Trends in HPC (hardware complexity and software challenges)

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620

The Mont-Blanc approach towards Exascale

HPC Algorithms and Applications

High-Performance Scientific Computing

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA

Hybrid Architectures Why Should I Bother?

Hybrid KAUST Many Cores and OpenACC. Alain Clo - KAUST Research Computing Saber Feki KAUST Supercomputing Lab Florent Lebeau - CAPS

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist

Introduction CPS343. Spring Parallel and High Performance Computing. CPS343 (Parallel and HPC) Introduction Spring / 29

High Performance Computing. What is it used for and why?

Parallel Computing & Accelerators. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist

Supercomputers. Alex Reid & James O'Donoghue

GPUS FOR NGVLA. M Clark, April 2015

NVIDIA Update and Directions on GPU Acceleration for Earth System Models

Carlos Reaño, Javier Prades and Federico Silla Technical University of Valencia (Spain)

ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation

Steve Scott, Tesla CTO SC 11 November 15, 2011

The Era of Heterogeneous Computing

CRAY XK6 REDEFINING SUPERCOMPUTING. - Sanjana Rakhecha - Nishad Nerurkar

COMPUTING ELEMENT EVOLUTION AND ITS IMPACT ON SIMULATION CODES

Pedraforca: a First ARM + GPU Cluster for HPC

SATGPU - A Step Change in Model Runtimes

ENDURING DIFFERENTIATION Timothy Lanfear

ENDURING DIFFERENTIATION. Timothy Lanfear

Fra superdatamaskiner til grafikkprosessorer og

Fujitsu s Technologies to the K Computer

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins

NVIDIA GPUs in Earth System Modelling Thomas Bradley

Selecting the right Tesla/GTX GPU from a Drunken Baker's Dozen

HPC Issues for DFT Calculations. Adrian Jackson EPCC

Preparing GPU-Accelerated Applications for the Summit Supercomputer

Aim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group

GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS

The Future of GPU Computing

Experiences of the Development of the Supercomputers

GPU for HPC. October 2010

Using Graphics Chips for General Purpose Computation

CSE5351: Parallel Procesisng. Part 1B. UTA Copyright (c) Slide No 1

An Overview of High Performance Computing and Challenges for the Future

HPC Technology Update Challenges or Chances?

Towards Exascale Computing with the Atmospheric Model NUMA

Current Status of the Next- Generation Supercomputer in Japan. YOKOKAWA, Mitsuo Next-Generation Supercomputer R&D Center RIKEN

On the Path to Energy-Efficient Exascale: A Perspective from the Green500

HPC with GPU and its applications from Inspur. Haibo Xie, Ph.D

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

HPC as a Driver for Computing Technology and Education

EE 7722 GPU Microarchitecture. Offered by: Prerequisites By Topic: Text EE 7722 GPU Microarchitecture. URL:

Exploring Emerging Technologies in the Extreme Scale HPC Co- Design Space with Aspen

Jack Dongarra University of Tennessee Oak Ridge National Laboratory

Jack Dongarra University of Tennessee Oak Ridge National Laboratory University of Manchester

ELEC 5200/6200. Computer Architecture & Design. Victor P. Nelson Broun 326

Introduction to Parallel Programming for Multicore/Manycore Clusters Introduction

Digital Signal Processor Supercomputing

China's supercomputer surprises U.S. experts

The next generation supercomputer. Masami NARITA, Keiichi KATAYAMA Numerical Prediction Division, Japan Meteorological Agency

Debugging CUDA Applications with Allinea DDT. Ian Lumb Sr. Systems Engineer, Allinea Software Inc.

John W. Romein. Netherlands Institute for Radio Astronomy (ASTRON) Dwingeloo, the Netherlands

Status and Directions of NVIDIA GPUs for Earth System Modeling

n N c CIni.o ewsrg.au

Experiences with GPGPUs at HLRS

CS2214 COMPUTER ARCHITECTURE & ORGANIZATION SPRING Top 10 Supercomputers in the World as of November 2013*

Vectorisation and Portable Programming using OpenCL

Managing Hardware Power Saving Modes for High Performance Computing

The Mont-Blanc Project

INSPUR and HPC Innovation

European energy efficient supercomputer project

ACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015

Confessions of an Accidental Benchmarker

INSPUR and HPC Innovation. Dong Qi (Forrest) Oversea PM

Parallel Compact Roadmap Construction of 3D Virtual Environments on the GPU

NVIDIA GPU TECHNOLOGY UPDATE

Power Profiling of Cholesky and QR Factorizations on Distributed Memory Systems

CS 668 Parallel Computing Spring 2011

L évolution des architectures et des technologies d intégration des circuits intégrés dans les Data centers

Computational Challenges and Opportunities for Nuclear Astrophysics

CS5222 Advanced Computer Architecture. Lecture 1 Introduction

High-Performance Computing - and why Learn about it?

Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST

An Introduction to OpenACC

Parallel Programming Concepts. Tom Logan Parallel Software Specialist Arctic Region Supercomputing Center 2/18/04. Parallel Background. Why Bother?

CUDA Experiences: Over-Optimization and Future HPC

Technology for a better society. hetcomp.com

General Purpose GPU Computing in Partial Wave Analysis

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

There s STILL plenty of room at the bottom! Andreas Olofsson

Building supercomputers from commodity embedded chips

HPC with Multicore and GPUs

Piz Daint: Application driven co-design of a supercomputer based on Cray s adaptive system design

Transcription:

Master Educational Program Information technology in applications Mathematical computations with GPUs Introduction Alexey A. Romanenko arom@ccfit.nsu.ru Novosibirsk State University

How to.. Process terabytes of seismic data on desktop PC. Estimate tsunami impact faster then wave reaches dry earth. Rasterize 3D scenes in real-time.

Topics of the course architecture of a GPU and its difference from generalpurpose CPU; general notions related to CUDA; contents and functionality of GPU-optimized libraries; area of applicability of GPUs for solving scientific and applied problems; general principles of optimizing programs for GPUs; methods of debugging and profiling programs running on GPUs; tools and instruments for developing, debugging and profiling programs running on GPUs.

Performance Computer power «Titan» 2012 299 008 Opteron Cores 18 688 k20 NVIDIA GPU 20+ Pflops «K computer» 2011 68 544 x 8-core SPARC64 VIIIfx. 8,162 Pflops 280 Tflops 212,992 CPUs Time

Growth of performance Growth of clock rate Number of cores/cpus CPU architecture complexity Number of registers Pipe-line Digital capacity (4bit CPU, 16, 32, 64) etc

Computer graphics

Computer graphics Small vectors Small matrixes filters/post-processing Calculation of projections etc. Huge mumber of identical small operations.

Performance of video cards

March 2013 Kepler: 28nm, 1.4 Tflops DP Maxwell: 22nm, 4 Tflops DP

Memory bandwidth

Heterogeneous systems + 3 of 5 leaders in top500 list!

Heterogeneous vs. Homogeneous computing systems (06.2012) Tianhe-1A (2 place) Xeon X5670 6C 2.93 GHz, NVIDIA 2050 Jaguar (3 place) Opteron 6-core 2.6 GHz 4,7 Pflops (peak) 4040 kw 2,3 Pflops (peak) 6950 kw

Toward exaflop Direct scaling Tianhe-1A 4,7 Pflops (peak) * 213 = 1001,1 Pflops 4040 kw * 213 = 860 MW Jaguar 2,3 Pflops (peak) * 435 = 1000,5 Pflops 6950 kw * 435 = 3023 MW exa 10 18 peta 10 15 tera 10 12 giga 10 9 mega 10 6 kilo 10 3

Sayano Shushenskaya Dam Turbines Installed capacity Maximum capacity Annual generation 10 640 MW (initial) 6 640 MW (current) 5,120 MW (current) 6,400 MW 23.5 TW

GFLOPs per Watt Green500 (November 2013) www.green500.org/lists List of most power effective supercomputer Ten of top have NVIDIA GPUs. Two 6 month ago. Performance / Watt 4,5 now 0,3 November, 2007 Reaching exascale means boosting speeds by 50-100 times, while keeping power relatively static.

Application speedup Example Applications URL Application Speedup Seismic Database http://www.headwave.com 66x to 100x Mobile Phone Antenna Simulation http://www.acceleware.com Molecular Dynamics http://www.ks.uiuc.edu/research/vmd 45x 21x to 100x Neuron Simulation http://www.evolvedmachines.com 100x MRI processing http://bic-test.beckman.uiuc.edu 245x to 415x Atmospheric Cloud Simulation http://www.cs.clemson.edu/~jesteel/clouds.h tml 50x

Approaches to GPU programming Application Optimized libraries Compiler directives Programming languages (С/С++/FORTRAN) Fast development Maximum performance

CUDA Roadmap 2014 7 years! CUDA 1.0 2007 CUDA 2.0 2008 CUDA 3.0 2009 CUDA 4.0 2011 CUDA 5.0 2012 CUDA 5.5 2013 CUDA 6.0 2014

CUDA vs. OpenCL CUDA NVIDIA GPU (Cray, HP, IBM, T-Platforms, NextIO ) Close to peak performance Functionality Comfort for developers (debugger, performance analyzer, etc.) Support Teaching materials, libraries OpenCL Architecture is not fixed, universality Performance is not high priority

CUDA vs. OpenCL performance CUDA applications have up to 30% higher performance than OpenCL application. http://arxiv.org/ftp/arxiv/papers/1005/1005.2581.pdf http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumb er=6047190

Useful links http://www.nvidia.com/object/cuda_home.html http://www.gputechconf.com/page/home.html http://docs.nvidia.com/

Questions