Parallel Computing & Accelerators. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist

Purpose of this talk

This is the 50,000 ft. view of the parallel computing landscape. We want to orient you a bit before parachuting you down into the trenches to deal with OpenACC. The plan is that you walk away with knowledge of not just OpenACC, but also of where it fits into the world of High Performance Computing.

FLOPS we need: Climate change analysis

Simulations:
- Cloud resolution, quantifying uncertainty, understanding tipping points, etc., will drive climate simulation to exascale platforms.
- New math, models, and systems support will be needed.

Extreme data:
- Reanalysis projects need 100x more computing to analyze observations.
- Machine learning and other analytics are needed today for petabyte data sets.
- Combined simulation/observation will empower policy makers and scientists.

Courtesy Horst Simon, LBNL

Qualitative Improvement of Simulation with Higher Resolution (2011)

Michael Wehner, Prabhat, Chris Algieri, Fuyu Li, Bill Collins, Lawrence Berkeley National Laboratory; Kevin Reed, University of Michigan; Andrew Gettelman, Julio Bacmeister, Richard Neale, National Center for Atmospheric Research

Exascale combustion simulations

- Goal: 50% improvement in engine efficiency.
- Center for Exascale Simulation of Combustion in Turbulence (ExaCT).
- Combines modeling & simulation (M&S) with experimentation.
- Uses new algorithms, programming models, and computer science.

Courtesy Horst Simon, LBNL

Modha Group at IBM Almaden

                Mouse          Rat            Cat                 Monkey          Human
N (neurons):    16 x 10^6      56 x 10^6      763 x 10^6          2 x 10^9        22 x 10^9
S (synapses):   128 x 10^9     448 x 10^9     6.1 x 10^12         20 x 10^12      220 x 10^12
System:         Almaden BG/L   Watson BG/L    WatsonShaheen BG/P  LLNL Dawn BG/P  LLNL Sequoia BG/Q
Date:           December 2006  April 2007     March 2009          May 2009        June 2012

Recent simulations achieve an unprecedented scale of 65 x 10^9 neurons and 16 x 10^12 synapses.

Courtesy Horst Simon, LBNL

Waiting for Moore's Law to save your serial code started getting bleak in 2004.

Source: published SPECint data

Moore's Law is not at all dead

Intel process technology capabilities (High Volume Manufacturing):

Year:                                  2004  2006  2008  2010  2012  2014  2016  2018
Feature size:                          90nm  65nm  45nm  32nm  22nm  16nm  11nm  8nm
Integration capacity
(billions of transistors):             2     4     8     16    32    64    128   256

[Slide images: a 50nm transistor from the 90nm process (Source: Intel), shown at the scale of an influenza virus (Source: CDC)]

That Power and Clock Inflection Point in 2004 didn't get better.

Source: Kogge and Shalf, IEEE CISE. Courtesy Horst Simon, LBNL

Not a new problem, just a new scale.

[Figure: power (W) over the years; photo of a Cray-2 with cooling tower in foreground, circa 1985]

And how to get more performance from more transistors with the same power.

RULE OF THUMB: a 15% reduction in voltage yields:

  Frequency reduction:    15%
  Power reduction:        45%
  Performance reduction:  10%

SINGLE CORE:  Area = 1   Voltage = 1     Freq = 1     Power = 1   Perf = 1
DUAL CORE:    Area = 2   Voltage = 0.85  Freq = 0.85  Power = 1   Perf = ~1.8
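A back-of-the-envelope check of that rule, assuming the textbook dynamic-power model P ∝ V²f (the model is our assumption; the slide states only the rule of thumb):

\[
\frac{P'}{P} \;=\; \left(\frac{V'}{V}\right)^{\!2} \frac{f'}{f}
\;=\; (0.85)^2 \times 0.85 \;\approx\; 0.6
\]

which is roughly the 45% per-core power reduction above. Two such cores together draw about 2 x 0.55 ≈ 1.1, essentially the original power budget, while delivering about 2 x 0.9 = 1.8x the performance, because a 15% frequency cut costs only about 10% of single-core performance.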

Parallel Computing

- One woman can make a baby in 9 months.
- Can 9 women make a baby in 1 month?
- But 9 women can make 9 babies in 9 months.

The first two bullets are Brooks' Law, from The Mythical Man-Month.

Prototypical Application: Serial Weather Model

[Diagram: a single processor working through one map held in MEMORY]
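To make this and the next three slides concrete, here is a minimal serial sketch of the kind of update such a model performs. The 1-D "temperature map" and its averaging rule are our own illustrative assumptions, not the actual model:

/* Hypothetical serial "weather" kernel: one processor sweeps the
   whole map in memory, one cell at a time. */
#include <stdio.h>

#define N 1000000
static double t_new[N], t_old[N];

int main(void) {
    for (int i = 0; i < N; i++) t_old[i] = 1.0;

    for (int i = 1; i < N - 1; i++)          /* the entire map, serially */
        t_new[i] = 0.5 * (t_old[i - 1] + t_old[i + 1]);

    printf("t_new[N/2] = %f\n", t_new[N / 2]);
    return 0;
}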

First Parallel Weather Modeling Algorithm: Richardson in 1917 Courtesy John Burkhardt, Virginia Tech

Weather Model: Shared Memory (OpenMP)

[Diagram: four cores attached to one shared MEMORY]

Four meteorologists in the same room sharing the map.
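The same illustrative kernel as on the serial slide, now as a minimal OpenMP sketch: the four "meteorologists" become threads that all read and write one shared map.

/* Hypothetical shared-memory version: all threads update pieces
   of the SAME arrays, which live in one shared memory. */
#include <omp.h>
#include <stdio.h>

#define N 1000000
static double t_new[N], t_old[N];

int main(void) {
    for (int i = 0; i < N; i++) t_old[i] = 1.0;

    #pragma omp parallel for            /* split iterations across cores */
    for (int i = 1; i < N - 1; i++)
        t_new[i] = 0.5 * (t_old[i - 1] + t_old[i + 1]);

    printf("computed with up to %d threads\n", omp_get_max_threads());
    return 0;
}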

Weather Model: Distributed Memory (MPI)

[Diagram: many separate processors, each with its own MEMORY]

50 meteorologists using telegraphs.
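The distributed version, again as our own minimal sketch: each rank owns a private patch of the map, and the "telegraph" is an explicit message exchanging boundary cells with its neighbors.

/* Hypothetical distributed-memory version: each rank owns a private
   patch and trades halo cells with its neighbors by message. */
#include <mpi.h>

#define LOCAL_N 1000

int main(int argc, char **argv) {
    int rank, size;
    double patch[LOCAL_N + 2];                 /* local cells + 2 halo cells */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 0; i < LOCAL_N + 2; i++) patch[i] = rank;

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Exchange one boundary cell with each neighbor ("telegraph"). */
    MPI_Sendrecv(&patch[1],           1, MPI_DOUBLE, left,  0,
                 &patch[LOCAL_N + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&patch[LOCAL_N],     1, MPI_DOUBLE, right, 1,
                 &patch[0],           1, MPI_DOUBLE, left,  1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}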

Weather Model: Accelerator (OpenACC)

[Diagram: CPU memory connected to GPU memory over the PCI Bus]

One meteorologist coordinating 1000 savants using tin cans and a string.
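And the accelerated version, a minimal OpenACC sketch of the same illustrative kernel: the host ships the map across the PCI bus, and the GPU's many lightweight cores share the arithmetic.

/* Hypothetical accelerated version: the data clauses describe the
   PCI-bus traffic; thousands of GPU threads then split the loop. */
#include <stdio.h>

#define N 1000000
static double t_new[N], t_old[N];

int main(void) {
    for (int i = 0; i < N; i++) t_old[i] = 1.0;

    #pragma acc parallel loop copyin(t_old) copy(t_new)
    for (int i = 1; i < N - 1; i++)
        t_new[i] = 0.5 * (t_old[i - 1] + t_old[i + 1]);

    printf("t_new[N/2] = %f\n", t_new[N / 2]);
    return 0;
}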

The pieces fit like this

[Diagram: OpenACC and OpenMP nested inside MPI - MPI spans the nodes, while OpenMP (cores) and OpenACC (accelerators) work within each node]
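A skeletal sketch of that nesting in one hypothetical application (all names here are illustrative): MPI carries messages between nodes, and the loop inside each rank is handed to an OpenACC device, or to OpenMP threads on a CPU-only node.

/* Hypothetical hybrid skeleton: MPI between nodes, OpenACC (or
   OpenMP) within each node. Illustrative only. */
#include <mpi.h>

#define N 1024

static void update(const double *in, double *out, int n) {
    /* On a CPU-only node this could be: #pragma omp parallel for */
    #pragma acc parallel loop copyin(in[0:n]) copy(out[0:n])
    for (int i = 1; i < n - 1; i++)
        out[i] = 0.5 * (in[i - 1] + in[i + 1]);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);              /* MPI: across nodes        */
    double a[N] = {0}, b[N] = {0};
    update(a, b, N);                     /* OpenACC: within the node */
    /* ...halo exchange with MPI_Sendrecv would go here...           */
    MPI_Finalize();
    return 0;
}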

Top 10 Systems as of November 2017

 1. Sunway TaihuLight - National Supercomputing Center in Wuxi, China (NRCPC)
    Sunway SW26010 260C 1.45 GHz
    Cores: 10,649,600   Rmax: 93,014 TFlops   Rpeak: 125,435 TFlops   Power: 15.3 MW
    [Slide callout: OpenACC is a first class API!]

 2. Tianhe-2 (MilkyWay-2) - National Super Computer Center in Guangzhou, China (NUDT)
    Intel Xeon E5-2692 2.2 GHz, TH Express-2, Intel Xeon Phi 31S1P
    Cores: 3,120,000   Rmax: 33,862 TFlops   Rpeak: 54,902 TFlops   Power: 17.8 MW

 3. Piz Daint - Swiss National Supercomputing Centre (CSCS), Switzerland (Cray)
    Cray XC50, Xeon E5-2690 2.6 GHz, Aries, NVIDIA P100
    Cores: 361,760   Rmax: 19,590 TFlops   Rpeak: 25,326 TFlops   Power: 2.2 MW

 4. Gyoukou - Japan Agency for Marine-Earth Science, Japan (ExaScaler)
    Xeon D-1571 1.3 GHz, InfiniBand EDR, PEZY-SC2
    Cores: 19,860,000   Rmax: 19,135 TFlops   Rpeak: 28,192 TFlops   Power: 1.3 MW

 5. Titan - DOE/SC/Oak Ridge National Laboratory, United States (Cray)
    Cray XK7, Opteron 6274 2.2 GHz, Gemini, NVIDIA K20x
    Cores: 560,640   Rmax: 17,590 TFlops   Rpeak: 27,112 TFlops   Power: 8.2 MW

 6. Sequoia - DOE/NNSA/LLNL, United States (IBM)
    BlueGene/Q, Power BQC 1.6 GHz, custom interconnect
    Cores: 1,572,864   Rmax: 17,173 TFlops   Rpeak: 20,132 TFlops   Power: 7.8 MW

 7. Trinity - DOE/NNSA/LANL/SNL, United States (Cray)
    Cray XC40, Xeon E5-2698v3 2.3 GHz, Aries, Intel Xeon Phi 7250
    Cores: 979,968   Rmax: 14,137 TFlops   Rpeak: 43,903 TFlops   Power: 3.8 MW

 8. Cori - DOE/SC/LBNL/NERSC, United States (Cray)
    Cray XC40, Aries, Intel Xeon Phi 7250
    Cores: 622,336   Rmax: 14,014 TFlops   Rpeak: 27,880 TFlops   Power: 3.9 MW

 9. Oakforest-PACS - Joint Center for Advanced High Performance Computing, Japan (Fujitsu)
    PRIMERGY, Intel Omni-Path, Intel Xeon Phi 7250
    Cores: 556,104   Rmax: 13,554 TFlops   Rpeak: 24,913 TFlops   Power: 2.7 MW

10. K computer - RIKEN Advanced Institute for Computational Science, Japan (Fujitsu)
    SPARC64 VIIIfx 2.0 GHz, Tofu interconnect
    Cores: 705,024   Rmax: 10,510 TFlops   Rpeak: 11,280 TFlops   Power: 12.6 MW

We can do better. We have a role model.

- Straightforward extrapolation results in a real-time human-brain-scale simulation at about 1-10 Exaflop/s with 4 PB of memory.
- Current predictions envision exascale computers in 2020 with a power consumption of, at best, 20-30 MW.
- The human brain takes 20 W.
- Even under the best assumptions, in 2020 our brain will still be a million times more power efficient.

Courtesy Horst Simon, LBNL
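(Checking that last bullet's arithmetic, using the low end of the 20-30 MW estimate: 20 MW / 20 W = 10^6, i.e. a factor of a million.)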

Why you should be (extra) motivated. This parallel computing thing is no fad. The laws of physics are drawing this roadmap. If you get on board (the right bus), you can ride this trend for a long, exciting trip. Let s learn how to use these things!