High-Performance Computing - and why Learn about it?

Tarek El-Ghazawi, The George Washington University, Washington, D.C., USA

Outline
What is High-Performance Computing?
Why is High-Performance Computing Important?
Advances in Performance and Architectures
Heterogeneous Accelerated Computing
Advances in Parallel Programming
Making Progress: The HPCS Program, near-term
Making Progress: Exascale and DOE
Conclusions

What Are Supercomputing and Parallel Architectures?
Also called High-Performance Computing (HPC) and Parallel Computing: research and innovation in the architecture, programming, and applications of computer systems that are orders of magnitude faster (10x to 1000x or more) than modern desktop and laptop computers.
Supercomputers achieve their speed through massive parallelism, that is, parallel architectures in which many processors work together.
http://www.collegehumor.com/video:1828443

Why is HPC Important?
It is critical for economic competitiveness because of its wide range of applications (through simulations and intensive data analyses).
It drives hardware and software innovations that later reach conventional computing.
It is becoming ubiquitous: all computing and information technology is turning parallel.
Is that why HPC is turning into an international muscle-flexing contest?

Why is HPC Important?
Traditional cycle: Design → Build → Test
With HPC: Design → Model → Simulate → Build

Why is HPC Important? National and Economic Competitiveness
HPC application examples:
Molecular dynamics: an HIV-1 protease inhibitor drug simulation for 2 ns takes 2 weeks on a desktop and 6 hours on a supercomputer.
Gene sequence alignment: a phylogenetic analysis takes 32 days on a desktop and 1.5 hours on a supercomputer.
Car crash simulations: a 2-million-element simulation takes 4 days on a desktop and 25 minutes on a supercomputer.
Understanding the fundamental structure of matter: requires a billion billion (10^18) calculations per second.
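The speedups implied by these examples follow from simple arithmetic: speedup = desktop time / supercomputer time. A minimal Python sketch, with all times converted to hours (taking 2 weeks as 14 days); the dictionary keys and function name are illustrative only:

```python
# Wall-clock times from the slide, converted to hours: (desktop, supercomputer)
CASES = {
    "HIV-1 protease inhibitor, 2 ns MD": (14 * 24, 6.0),   # 2 weeks vs 6 hours
    "Phylogenetic analysis": (32 * 24, 1.5),                # 32 days vs 1.5 hours
    "Car crash, 2 million elements": (4 * 24, 25 / 60),     # 4 days vs 25 minutes
}


def speedup(desktop_hours: float, super_hours: float) -> float:
    """Speedup = time on the desktop / time on the supercomputer."""
    return desktop_hours / super_hours


if __name__ == "__main__":
    for name, (desk, sup) in CASES.items():
        print(f"{name}: {speedup(desk, sup):.0f}x")
    # A "billion billion" operations per second is 10**18, i.e. one exaFLOPS
    print(f"billion billion = {10**9 * 10**9:.0e} operations per second")
```

Running it prints speedups of about 56x, 512x, and 230x for the three cases.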

Why is HPC Important? National and Economic Competitiveness
Industrial competitiveness: computational models that run on HPC systems are not only for designing NASA space shuttles; they can also help with business intelligence (e.g., IBM) and Watson, and with designing effective shapes and/or materials for potato chips and Clorox bottles.

HPC Technology of Today is Conventional Computing of Tomorrow: Multi-core and Many-core Chips in Desktops and Laptops
Intel 80-core chip: 1 chip and 1 TFLOPS in 2007.
The ASCI Red supercomputer: 9,000 chips for 3 TFLOPS in 1997.
Intel 72-core chip, Xeon Phi KNL: 1 chip and 3 TFLOPS in 2016.

Why is HPC Important? HPC is Ubiquitous
Sony PS3: uses the Cell processor.
iPhone 7: 4 cores at 2.34 GHz.
Roadrunner: the fastest supercomputer in 2008; uses Cell processors.
Xeon Phi KNL: a 72-core chip.
HPC is ubiquitous! All computing is becoming HPC. Can we afford to be bystanders?

Why Is This Happening? The End of Moore's Law in Clocking
The exponential improvement of processors was observed by Intel co-founder Gordon Moore in 1965. It is commonly stated in three forms:
1. The speed of a microprocessor doubles every 18 to 24 months, assuming the price of the processor stays the same. Wrong, not anymore!
2. The price of a microchip drops about 48% every 18 to 24 months, assuming the same processor speed and on-chip memory capacity. OK, for now.
3. The number of transistors on a microchip doubles every 18 to 24 months, assuming the price of the chip stays the same. OK, for now.
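Each doubling statement is an exponential: a quantity that doubles every T months grows by a factor of 2^(12N/T) over N years. A short Python sketch of that arithmetic, also compounding the 48% price drop (the function names are illustrative, not from the source):

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Factor by which a quantity grows if it doubles every `doubling_months`."""
    return 2 ** (12 * years / doubling_months)


def price_fraction(periods: int, drop: float = 0.48) -> float:
    """Fraction of the original price left after `periods` drops of `drop` each."""
    return (1 - drop) ** periods


if __name__ == "__main__":
    for months in (18, 24):
        print(f"10 years, doubling every {months} months: x{growth_factor(10, months):.0f}")
    print(f"price after 5 periods of 48% drops: {price_fraction(5):.3f} of the original")
```

Over a decade the choice of doubling period matters a great deal: roughly 102x at 18 months versus 32x at 24 months.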

No Faster Clocking, but More Cores?
(Figure; source: Ed Davis, Intel.)

Cores and Power Efficiency
(Figure; source: Ed Davis, Intel.)

Comparative View of Processors and Accelerators
Device | Process (nm) | Freq (GHz) | Cores | Peak SP (GFLOPS) | Peak DP (GFLOPS) | Peak Power (W) | DP GFLOPS/W | Memory BW (GB/s) | Memory Type
PowerXCell 8i | 65 | 3.2 | 1 + 8 | 204 | 102.4 | 92 | 1.11 | 25.6 | XDR
NVIDIA Fermi Tesla M2090 | 40 | 1.3 | 512 | 1330 | 665 | 225 | 2.9 | 177 | GDDR5
NVIDIA Kepler K20X | 28 | 0.73 | 2688 | 3950 | 1310 | 235 | 5.6 | 250 | GDDR5
NVIDIA Kepler K80 | 28 | 0.88 | 2x2496 | 8749 | 2910 | 300 | 9.7 | 480 | GDDR5
Intel Xeon Phi 5110P (KNC) | 22 | 1.05 | 60 (240 threads) | - | 1011 | 225 | 4.5 | 320 | GDDR5
Intel Xeon Phi 7290 (KNL) | 14 | 1.7 | 72 (288 threads) | - | ~3500 | 245 | 14.3 | 115.2 | DDR4
Intel Xeon E7-8870 | 32 | 2.4 (2.8) | 10 | 202.6 | 101.3 | 130 | 0.78 | 42.7 | DDR3-1333
AMD Opteron 6176 SE | 45 | 2.5 | 12 | 240 | 120 | 140 | 0.86 | 42.7 | DDR3-1333
Xilinx V6 SX475T | 40 | - | - | - | 98.8 | 50 | 3.3 | - | -
Altera Stratix V GSB8 | 28 | - | - | - | 210 | 60 | 3.5 | - | -
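The DP GFLOPS/W column is peak DP GFLOPS divided by peak power. As a consistency check, a short Python sketch (function names are illustrative) recomputes that ratio from the table's own columns, taking the KNL's "~3500" as 3500, and lists rows whose listed value differs from the computed one by more than 5%:

```python
# (device, peak DP GFLOPS, peak power in W, DP GFLOPS/W as listed in the table)
ROWS = [
    ("PowerXCell 8i", 102.4, 92, 1.11),
    ("NVIDIA Fermi Tesla M2090", 665, 225, 2.9),
    ("NVIDIA Kepler K20X", 1310, 235, 5.6),
    ("NVIDIA Kepler K80", 2910, 300, 9.7),
    ("Intel Xeon Phi 5110P (KNC)", 1011, 225, 4.5),
    ("Intel Xeon Phi 7290 (KNL)", 3500, 245, 14.3),  # table gives "~3500"
    ("Intel Xeon E7-8870", 101.3, 130, 0.78),
    ("AMD Opteron 6176 SE", 120, 140, 0.86),
    ("Xilinx V6 SX475T", 98.8, 50, 3.3),
    ("Altera Stratix V GSB8", 210, 60, 3.5),
]


def gflops_per_watt(dp_gflops: float, watts: float) -> float:
    """Peak double-precision energy efficiency in GFLOPS per watt."""
    return dp_gflops / watts


def mismatches(rows, rel_tol: float = 0.05):
    """Names of rows whose listed efficiency differs from DP / power by more than rel_tol."""
    return [
        name
        for name, dp, watts, listed in rows
        if abs(gflops_per_watt(dp, watts) - listed) / listed > rel_tol
    ]


if __name__ == "__main__":
    # Print devices from most to least efficient by the computed ratio
    for name, dp, watts, listed in sorted(
        ROWS, key=lambda r: gflops_per_watt(r[1], r[2]), reverse=True
    ):
        print(f"{name:28s} computed {gflops_per_watt(dp, watts):6.2f}  listed {listed}")
    print("inconsistent rows:", mismatches(ROWS))
```

The KNL comes out on top at about 14.3 GFLOPS/W. Every row agrees within a few percent except the Xilinx V6 SX475T, where 98.8 / 50 gives about 2.0 rather than the listed 3.3; the table alone does not say which of the two figures is off.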

Most Power-Efficient Architectures: Green500
(Figure; see https://www.top500.org/green500/lists/2016/11/)

How Is the Supercomputing Race Conducted? TOP500 Supercomputers and LINPACK
The TOP500 list is published twice a year, in June and in November.
Rmax: maximal LINPACK performance achieved.
Rpeak: theoretical peak performance.
In the TOP500 list table, computers are ordered first by their Rmax value. When different computers have equal Rmax, they are ordered by Rpeak. Sites with the same performance are ordered by memory size and then alphabetically.
See www.top500.org for more information.
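These ordering rules amount to a lexicographic sort key. A minimal Python sketch, assuming that a larger memory ranks higher (the slide says only "by memory size"); the `Entry` type and the machines in the example are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    rmax: float       # maximal LINPACK performance achieved
    rpeak: float      # theoretical peak performance
    memory_gb: float  # total memory (assumption: larger memory ranks higher)


def top500_order(entries):
    """Sort by Rmax, then Rpeak, then memory size (all descending), then name A-Z."""
    return sorted(entries, key=lambda e: (-e.rmax, -e.rpeak, -e.memory_gb, e.name))


if __name__ == "__main__":
    # Hypothetical machines, chosen only to exercise each tie-breaking rule
    machines = [
        Entry("Gamma", 10.0, 12.0, 100.0),
        Entry("Beta", 10.0, 12.0, 100.0),
        Entry("Delta", 10.0, 15.0, 50.0),
        Entry("Epsilon", 10.0, 12.0, 200.0),
        Entry("Alpha", 12.0, 13.0, 10.0),
    ]
    print([e.name for e in top500_order(machines)])
```

Alpha wins on Rmax, Delta on Rpeak, Epsilon on memory, and Beta precedes Gamma alphabetically. Negating the numeric fields lets a single ascending sort handle the mix of descending and ascending criteria.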

Top 10 Supercomputers: November 2016 (www.top500.org)
Rank | Site | Country | Computer | Cores | Rmax (PFLOPS)
1 | National Supercomputing Center in Wuxi | China | Sunway TaihuLight - Sunway MPP, Sunway SW26010 260C 1.45GHz, Sunway, NRCPC | 10,649,600 | 93.0
2 | National University of Defense Technology | China | Tianhe-2 (MilkyWay-2) - TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P | 3,120,000 | 33.9
3 | Oak Ridge National Laboratory | United States | Titan - Cray XK7, Opteron 16 Cores 2.2GHz, Gemini, NVIDIA K20X | 560,640 | 17.6
4 | Lawrence Livermore National Laboratory | United States | Sequoia - BlueGene/Q, Power BQC 16 Cores, custom interconnect | 1,572,864 | 16.3
5 | DOE/SC/LBNL/NERSC | United States | Cori - Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries interconnect, Cray Inc. | 622,336 | 14.0
6 | Joint Center for Advanced High Performance Computing | Japan | Oakforest-PACS - PRIMERGY CX1640 M1, Intel Xeon Phi 7250 68C 1.4GHz, Intel Omni-Path, Fujitsu | 556,104 | 13.6
7 | RIKEN Advanced Institute for Computational Science | Japan | K computer - SPARC64 VIIIfx 2.0GHz, Tofu interconnect | 705,024 | 10.5
8 | Swiss National Supercomputing Centre (CSCS) | Switzerland | Piz Daint - Cray XC30, Xeon E5-2670 8C 2.600GHz, Aries interconnect, NVIDIA K20x, Cray Inc. | 206,720 | 9.8
9 | Argonne National Laboratory | United States | Mira - BlueGene/Q, Power BQC 16 Cores, custom interconnect | 786,432 | 8.16
10 | DOE/NNSA/LANL/SNL | United States | Trinity - Cray XC40, Xeon E5-2698v3 16C 2.3GHz, Aries interconnect, Cray Inc. | 301,056 | 8.1

History
(Figure; source: top500.org. Also see: http://spectrum.ieee.org/tech-talk/computing/hardware/china-builds-worldsfastest-supercomputer)

Supercomputers - History
Computer | Processor | Processors | Year | Rmax (TFLOPS)
Sunway TaihuLight - Sunway MPP | Sunway SW26010 260C 1.45GHz | 10,649,600 | 2016 | 93,014
Tianhe-2 (MilkyWay-2) - TH-IVB-FEP Cluster | Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P | 3,120,000 | 2013 | 33,862
Titan - Cray XK7 | Opteron 16 Cores 2.2GHz, NVIDIA K20X | 560,640 | 2012 | 17,600
K computer, Japan | SPARC64 VIIIfx 2.0GHz | 705,024 | 2011 | 10,510
Tianhe-1A, China | Intel EM64T Xeon X56xx (Westmere-EP) 2930 MHz (11.72 GFLOPS) + NVIDIA GPU, FT-1000 8C | 186,368 | 2010 | 2,566
Jaguar, Cray | Cray XT5-HE Opteron Six Core 2.6 GHz | 224,162 | 2009 | 1,759
Roadrunner, IBM | PowerXCell 8i 3200 MHz (12.8 GFLOPS) | 122,400 | 2008 | 1,026
BlueGene/L - eServer Blue Gene Solution, IBM | PowerPC 440 700 MHz (2.8 GFLOPS) | 212,992 | 2007 | 478
BlueGene/L - eServer Blue Gene Solution, IBM | PowerPC 440 700 MHz (2.8 GFLOPS) | 131,072 | 2005 | 280
BlueGene/L beta-system, IBM | PowerPC 440 700 MHz (2.8 GFLOPS) | 32,768 | 2004 | 70.7
Earth Simulator, NEC | NEC 1000 MHz (8 GFLOPS) | 5,120 | 2002 | 35.8
ASCI White, SP, IBM | POWER3 375 MHz (1.5 GFLOPS) | 8,192 | 2001 | 7.2
ASCI White, SP, IBM | POWER3 375 MHz (1.5 GFLOPS) | 8,192 | 2000 | 4.9
ASCI Red, Intel | Intel IA-32 Pentium Pro 333 MHz (0.333 GFLOPS) | 9,632 | 1999 | 2.4
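The history table can be summarized by an average growth rate. A minimal Python sketch, assuming steady exponential growth between the first and last entries (ASCI Red in 1999 and Sunway TaihuLight in 2016); the constant and function names are illustrative:

```python
import math

# (year, Rmax in TFLOPS, number of processors/cores) from the history table
ASCI_RED_1999 = (1999, 2.4, 9_632)
TAIHULIGHT_2016 = (2016, 93_014, 10_649_600)


def doubling_time_years(start, end):
    """Years needed for Rmax to double, assuming steady exponential growth."""
    (y0, p0, _), (y1, p1, _) = start, end
    return (y1 - y0) / math.log2(p1 / p0)


def annual_growth(start, end):
    """Average year-over-year growth factor of Rmax."""
    (y0, p0, _), (y1, p1, _) = start, end
    return (p1 / p0) ** (1 / (y1 - y0))


if __name__ == "__main__":
    dt = doubling_time_years(ASCI_RED_1999, TAIHULIGHT_2016)
    print(f"Rmax doubled every {dt:.2f} years (~{12 * dt:.0f} months)")
    print(f"average growth: x{annual_growth(ASCI_RED_1999, TAIHULIGHT_2016):.2f} per year")
    procs = TAIHULIGHT_2016[2] / ASCI_RED_1999[2]
    per_proc = (TAIHULIGHT_2016[1] / ASCI_RED_1999[1]) / procs
    print(f"processor count grew x{procs:.0f}; Rmax per processor grew x{per_proc:.0f}")
```

By this measure, top-end Rmax doubled roughly every 13 months (about 1.86x per year), faster than the 18 to 24 months of the doubling statements. The processor count grew about 1,100x while Rmax per processor grew about 35x, so more of the gain came from parallelism than from faster processors.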

Historical Analysis
(Figure: performance, from TeraFLOPS to PetaFLOPS, versus time. Eras shown: vector machines; massively parallel processors (MPPs), from 1993 and the HPCC program; MPPs with multicores and heterogeneous accelerators, first discrete and then integrated, from 2008-2011 with the end of Moore's law in clocking; and tons of lightweight cores by 2016.)

DARPA High-Productivity Computing Systems (HPCS)
Launched in 2002, with the goal of next-generation supercomputers by 2010.
The focus is not only performance but productivity, where Productivity = f(execution time, development time); typically, Productivity = utility/cost.
Addresses everything, hardware and software.

HPCS Structure
Each team is led by a company and includes university research groups.
Three phases:
Phase I, research concepts: SGI, HP, Cray, IBM, and Sun
Phase II, R&D: Cray, IBM, and Sun
Phase III, deployment: Cray and IBM
GWU worked with SGI in Phase I and with IBM in Phase II.

IBM, Sun & Cray's Efforts in HPCS
Vendor | Project | Hardware Architecture | Language
IBM | PERCS | PowerPC | X10
Sun | Hero | Rock, multi-core SPARC | Fortress
Cray | Cascade | - | Chapel

HPCS on IBM, Sun & Cray
IBM PERCS (Productive, Easy-to-use, Reliable Computing System): Power architecture
Sun Hero: multi-core Rock SPARC
Cray Cascade

What is New in HPCS
Architecture: lots of parallelism on the chip; intelligent and transactional memory; innovative co-processing (streaming, processing-in-memory (PIM)); computations migrate to the data instead of data moving to the computations.
Programming: PGAS (partitioned global address space) programming models; Parallel MATLAB and other simple interfaces; multiple types of parallelism and locality; transactions.
Reliability: self-healing.
Plus more that is proprietary.

What is Next: Exascale and DOE
Goals of the DOE Exascale Computing Project:
Deliver 50x the performance of today's systems (20 PF).
Operate within 20-30 MW of power.
Be sufficiently resilient (mean time to interrupt, MTTI, under 1 week).
Provide a software stack supporting a wide range of applications.
(Figure: growth of supercomputing capability; figure modified from singularity.com.)
Source: https://energy.gov/sites/prod/files/2013/09/f2/20130913-SEAB-DOE-Exascale-Initiative.pdf
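The goals imply a concrete efficiency target: 50x today's 20 PF is 1,000 PF, i.e., 1 exaFLOPS, and delivering that within 20-30 MW fixes the required FLOPS per watt. A minimal Python sketch of the arithmetic, treating the target as system-wide FLOPS against total system power (an assumption; the slide does not say peak or sustained):

```python
PETA = 1e15
MEGA = 1e6

TODAY_PFLOPS = 20     # "today's systems (20 PF)"
SPEEDUP_TARGET = 50   # "Deliver 50x performance"
POWER_MW = (20, 30)   # "Operate with 20-30 MW power"


def target_flops() -> float:
    """50 x 20 PF = 1e18 FLOPS, i.e. one exaFLOPS."""
    return SPEEDUP_TARGET * TODAY_PFLOPS * PETA


def required_gflops_per_watt(power_mw: float) -> float:
    """System-level efficiency needed to reach the target within `power_mw`."""
    return target_flops() / (power_mw * MEGA) / 1e9


if __name__ == "__main__":
    print(f"target: {target_flops():.0e} FLOPS")
    for mw in POWER_MW:
        print(f"at {mw} MW: {required_gflops_per_watt(mw):.1f} GFLOPS/W needed")
```

The result, about 33 to 50 GFLOPS/W for the whole system, is 2 to 3.5 times the 14.3 peak DP GFLOPS/W of the most efficient chip in the earlier comparison table, and system power also has to cover memory, interconnect, and cooling.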

Technical Challenges on the Road to Exascale
Source: Bill Dally, "Technical Challenges on the Road to Exascale," http://developer.download.nvidia.com/gtc/pdf/gtc2012/presentationpdf/billdally_nvidia_sc12.pdf

Technical Challenges on the Road to Exascale: The High Cost of Data Movement
Fetching operands costs more than computing on them.
(Figure: energy in picojoules per 64-bit operation, on a log scale from 1 to 10,000 pJ, for a DP FLOP, a register access, 1 mm, 5 mm, and 15 mm on-chip transfers, off-chip/DRAM access, local interconnect, and cross-system transfers, comparing 2008 (45 nm) and 2018 (11 nm) technology.)
Source: ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems. Courtesy: John Shalf, PGAS 2015.

Three Pre-Exascale Supercomputers from DOE's CORAL Initiative
System | Summit | Sierra | Aurora
Contract budget | $325M (combined with Sierra) | (combined with Summit) | $200M
Location | Oak Ridge | Livermore | Argonne
Delivery date | 2017-18 | 2017-18 | 2018-19
Vendor | IBM | IBM | Cray
Node architecture | Multiple IBM POWER9 CPUs, multiple NVIDIA Volta GPUs | Multiple IBM POWER9 CPUs, multiple NVIDIA Volta GPUs | Intel Knights Hill many-core CPUs
Node performance | 40+ TFLOPS | - | 3+ TFLOPS
Interconnect | Mellanox dual-rail EDR InfiniBand | Mellanox dual-rail EDR InfiniBand | Intel Omni-Path
Rpeak | 150 PFLOPS | 120-150 PFLOPS | 180 PFLOPS
Nodes | ~3,400 | - | 50,000+
Power | ~10 MW | ~10 MW | ~13 MW

Aurora Highlights
Available data:
Cray Shasta compute platform
Intel Knights Hill many-core CPUs: 3rd-generation many-core, 10 nm process node
3+ TFLOPS per node; 50,000+ nodes; 180 PFLOPS; 13 MW
Intel Omni-Path (2nd generation) with silicon photonics
500+ TB/s bisection bandwidth
2.5+ PB/s aggregate node link bandwidth
Prediction for the next generation:
1 processor per node: one 100-core CPU capable of 4.5 TFLOPS peak, or 3+ TFLOPS sustained
Dual Omni-Path, 400 Gb/s aggregate bandwidth per node
50,000 nodes; 4 nodes per blade: 12,500 blades
16 blades per chassis: 782 chassis
6 chassis per group: 130 groups
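The packaging prediction is a chain of divisions. A Python sketch of the arithmetic, rounding up at each level (an assumption: partially filled chassis and groups still count), reproduces the 12,500 blades and 782 chassis; the same rounding gives 131 groups rather than 130, since 130 groups of 6 hold only 780 chassis. It also derives per-node peak and system efficiency from the 180 PFLOPS and 13 MW figures:

```python
import math

NODES = 50_000
NODES_PER_BLADE = 4
BLADES_PER_CHASSIS = 16
CHASSIS_PER_GROUP = 6
PEAK_PFLOPS = 180
POWER_MW = 13


def packaging(nodes: int):
    """Round up at each level so that every node gets a slot."""
    blades = math.ceil(nodes / NODES_PER_BLADE)
    chassis = math.ceil(blades / BLADES_PER_CHASSIS)
    groups = math.ceil(chassis / CHASSIS_PER_GROUP)
    return blades, chassis, groups


if __name__ == "__main__":
    print("blades, chassis, groups:", packaging(NODES))
    print(f"{PEAK_PFLOPS * 1e3 / NODES:.1f} TFLOPS per node")
    print(f"{PEAK_PFLOPS * 1e15 / (POWER_MW * 1e6) / 1e9:.1f} GFLOPS/W")
```

The 3.6 TFLOPS per node is consistent with "3+ TFLOPS per node", and 180 PFLOPS in 13 MW works out to about 13.8 GFLOPS/W.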

GWU HPCL Facility

Historical Highlights of the Facility
~50 tons of cooling, 2,000 sq ft of elevated floor, 0.25 MW of power
Small experimental parallel systems that represent a wide spectrum of architectural ideas
Systems with GPU accelerators from Cray and ACI
A system with Intel Xeon Phi accelerators from ACI
Systems with FPGA accelerators from SRC, SGI, Cray, and Starbridge
Homegrown clusters with InfiniBand and Myrinet
Many experimental boards and workstations from Xilinx, Intel, ...

GW Cray XE6m/XK7m
1,856 processor cores, based on 12-core 64-bit AMD Opteron 6100 Series processors and 16-core AMD Bulldozer processors
32 NVIDIA K20 GPUs
64 GB of registered ECC DDR3 SDRAM per compute node
1 Gemini routing and communications ASIC per two compute nodes

Conclusions
HPC is critical for economic competitiveness at all levels, and it is turning into an international race!
Advances in HPC today become advances in conventional computing tomorrow.
HPC is ubiquitous, as all computing turns into HPC.
Multicores and heterogeneous accelerator architectures are attracting growing attention but lack software infrastructure and hardware support; they will require new programming models and OS support, which is an opportunity for leadership in research.

Light Reading
http://spectrum.ieee.org/computing/hardware/ibmreclaims-supercomputer-lead (2005)
http://spectrum.ieee.org/techtalk/computing/hardware/china-builds-worldsfastest-supercomputer (2010)
http://spectrum.ieee.org/computing/hardware/china s-homegrown-supercomputers (2012)