Exascale: challenges and opportunities in a power constrained world


Carlo Cavazzoni, c.cavazzoni@cineca.it
SuperComputing Applications and Innovation Department, CINECA

CINECA
CINECA is a non-profit consortium made up of 70 Italian universities, the National Institute of Oceanography and Experimental Geophysics (OGS), the CNR (National Research Council), and the Ministry of Education, University and Research (MIUR). CINECA is the largest Italian computing centre and one of the most important worldwide. The HPC department manages the HPC infrastructure, provides support to Italian and European researchers, and promotes technology transfer initiatives for industry.

PRACE
The mission of PRACE (Partnership for Advanced Computing in Europe) is to enable high-impact scientific discovery and engineering research and development across all disciplines, enhancing European competitiveness for the benefit of society. PRACE seeks to realize this mission by offering world-class computing and data management resources and services through a peer-review process. PRACE also seeks to strengthen European industrial users of HPC through various initiatives, and has a strong interest in improving the energy efficiency of computing systems and reducing their environmental impact.
http://www.prace-ri.eu/call-announcements/
http://www.prace-ri.eu/prace-resources/

SuperComputing Applications and Innovation
- Accelerate scientific discovery by providing high-performance computing resources, data management and storage systems, tools, HPC services, and expertise at large.
- Develop and promote technical and scientific services related to high-performance computing for the Italian and European research community.
- Enable world-class scientific research by operating and supporting leading-edge supercomputing technologies and by managing a state-of-the-art and effective environment for the different scientific communities.
- Provide support and consultancy in HPC tools and techniques and in several scientific domains, such as physics, particle physics, material science, chemistry, and fluid dynamics.

FERMI
- Name: Fermi
- Architecture: BlueGene/Q (10 racks)
- Processor type: IBM PowerA2 @ 1.6 GHz
- Computing nodes: 10,240 (each node: 16 cores and 16 GB of RAM)
- Computing cores: 163,840
- RAM: 1 GB/core (163 TB total)
- Internal network: 5D torus
- Disk space: 2 PB of scratch space
- Peak performance: 2 PFlop/s
- Power consumption: 820 kW
- High-end system, only for extremely scalable applications
- No. 7 in the Top500 list (June 2012)
- National and PRACE Tier-0 calls

GALILEO
- Name: Galileo
- Model: IBM/Lenovo NeXtScale, x86-based system for production runs of medium-scalability applications
- Processor type: Intel Xeon Haswell @ 2.4 GHz
- Computing nodes: 516 (each node: 16 cores and 128 GB of RAM)
- Computing cores: 8,256
- RAM: 66 TB
- Internal network: InfiniBand 4x QDR switches (40 Gb/s)
- Accelerators: 768 Intel Xeon Phi 7120P (2 per node on 384 nodes) + 80 NVIDIA K80
- Peak performance: 1.5 PFlop/s
- National and PRACE Tier-1 calls

PICO
Storage and processing of large volumes of data.
- Model: IBM NeXtScale InfiniBand Linux cluster
- Processor type: Intel Xeon E5-2670 v2 @ 2.5 GHz
- Computing nodes: 80 (each node: 20 cores and 128 GB of RAM), plus 2 visualization nodes, 2 big-memory nodes, and 4 data-mover nodes
- Storage: 50 TB of SSD, 5 PB on-line repository (on the same fabric as the cluster), 16 PB of tape
- Services: Hadoop & PBS, OpenStack cloud, NGS pipelines, workflows (weather/sea forecasts), analytics, high-throughput workloads

Cineca road-map
- Today: Tier-0: Fermi (BGQ); Tier-1: Galileo; BigData: Pico
- Q1 2016: Tier-0: new system (procurement ongoing, targeting the HPC Top10); BigData: Galileo/Pico
- Q1 2019: combined Tier-0/BigData system: 50 PFlop/s, 50 PByte

Dennard scaling law (downscaling)
From the old VLSI generation to the new one, classical Dennard scaling gives:
  L' = L / 2, V' = V / 2, F' = 2F, D' = 1 / L'^2 = 4D, P' = P
(L: feature size, V: supply voltage, F: frequency, D: device density, P: power.)
This does not hold anymore! Core frequency and performance no longer grow following Moore's law. What we get instead is:
  L' = L / 2, V' ≈ V, F' ≈ F, D' = 1 / L'^2 = 4D, P' = 4P
The response is to increase the number of cores to keep architecture evolution on Moore's law. The power crisis! The programming crisis!
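A short worked check, assuming the standard dynamic-power model P_device ∝ C V^2 F (not stated on the slide), of why constant voltage turns density scaling into a power problem:

```latex
\text{Dennard regime: } C' = \tfrac{C}{2},\; V' = \tfrac{V}{2},\; F' = 2F,\; D' = 4D
\;\Rightarrow\;
P'_{\mathrm{chip}} \propto D' C' V'^2 F' = 4D \cdot \tfrac{C}{2} \cdot \tfrac{V^2}{4} \cdot 2F = D C V^2 F = P_{\mathrm{chip}}

\text{Post-Dennard: } V' \approx V,\; F' \approx F
\;\Rightarrow\;
P'_{\mathrm{chip}} \propto 4D \cdot \tfrac{C}{2} \cdot V^2 F = 2\,P_{\mathrm{chip}}
\quad (\text{up to } 4\,P_{\mathrm{chip}} \text{ if device capacitance stops shrinking, as the slide assumes})
```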

Moore's law
The number of transistors per chip doubles every 18 months (in practice, it doubles roughly every 24 months). Oh-oh, Houston!

The silicon lattice
The Si lattice constant is 0.54 nm, so at these scales a device feature is only about 50 atoms across! There are still 4~6 cycles (technology generations) left until we reach 11~5.5 nm technologies, at which point we hit the downscaling limit, some year between 2020 and 2030 (H. Iwai, IWJT2008).

Amdahl's law
The upper limit for the scalability of parallel applications is determined by the fraction of the overall execution time spent in non-scalable operations (Amdahl's law). The maximum speedup tends to 1 / (1 - P), where P is the parallel fraction. To exploit 1,000,000 cores you need P = 0.999999, i.e. a serial fraction of 0.000001.
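A minimal sketch (not from the slides) that evaluates Amdahl's bound for the parallel fraction and core count quoted above:

```c
#include <stdio.h>

/* Amdahl's law: speedup on n cores with parallel fraction p */
static double amdahl_speedup(double p, double n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    double p = 0.999999;   /* parallel fraction from the slide */
    double n = 1.0e6;      /* one million cores                */

    printf("speedup on %.0f cores : %.0f\n", n, amdahl_speedup(p, n));
    printf("asymptotic limit      : %.0f\n", 1.0 / (1.0 - p)); /* 1/(1-P) */
    return 0;
}
```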

HPC trends (constrained by the three laws)
- Peak performance: exaflops (Moore's law), the opportunity
- FPU performance: gigaflops (Dennard's law)
- Number of FPUs: 10^9 (Moore + Dennard)
- Application parallelism: serial fraction of at most 1/10^9 (Amdahl's law), the challenge
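Spelled out, under the assumption that each FPU delivers on the order of a gigaflop/s:

```latex
10^{18}\ \mathrm{flop/s} \;=\; \underbrace{10^{9}\ \mathrm{FPUs}}_{\text{Moore}} \times \underbrace{10^{9}\ \mathrm{flop/s\ per\ FPU}}_{\text{Dennard-limited}}
\qquad\Rightarrow\qquad
S_{\max} = \frac{1}{1-P} \gtrsim 10^{9}
\;\Leftrightarrow\;
1-P \lesssim 10^{-9}\ \ \text{(Amdahl)}
```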

Energy trends
Traditional RISC and CISC chips are designed for maximum performance across all possible workloads, spending a lot of silicon to maximize single-thread performance. (Diagram: energy vs. datacenter capacity vs. compute power.)

Change of paradigm
New chips are designed for maximum performance on a small set of workloads: simple functional units, poor single-thread performance, but maximum throughput. (Diagram: energy vs. datacenter capacity vs. compute power.)

Architecture toward exascale
CPU + accelerator (GPU/MIC/FPGA): the CPU provides single-thread performance, the accelerator provides throughput.
- CPU + discrete accelerator: OpenPOWER + NVIDIA GPU (the CPU-accelerator link is the bottleneck)
- CPU + integrated accelerator: AMD APU
- SoC: KNL, ARM
Enabling technologies: photonics (platform flexibility), TSV (3D stacking), active memory.

Exascale architecture: two models
- Hybrid: CPU + accelerator (NVIDIA GPU, AMD APU)
- Homogeneous: SoC (ARM, Intel)

Accelerator / GPGPU
Example: sum of a 1D array offloaded to the accelerator.
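A minimal sketch of the 1D-array sum on an accelerator, written here with OpenACC directives (one of the programming models named later in the talk); the slide's own listing is not in the transcription, so this is an illustration, not the original code:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int n = 1 << 20;            /* 2^20 elements                     */
    double *a = malloc(n * sizeof *a);
    double sum = 0.0;

    for (int i = 0; i < n; ++i)       /* initialise on the host            */
        a[i] = 1.0;

    /* Offload the reduction to the accelerator: copy the array in,
       bring the scalar result back. Falls back to the host if no
       accelerator is available.                                           */
    #pragma acc parallel loop reduction(+:sum) copyin(a[0:n])
    for (int i = 0; i < n; ++i)
        sum += a[i];

    printf("sum = %f (expected %d)\n", sum, n);
    free(a);
    return 0;
}
```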

Intel vector units
Next to come: AVX-512, with up to 16 multiply-add operations per clock.
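A small sketch (not from the slides) using AVX-512 intrinsics: a single _mm512_fmadd_ps instruction performs 16 single-precision multiply-adds at once. Compile with -mavx512f on a CPU that supports it.

```c
#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    float a[16], b[16], c[16], r[16];

    for (int i = 0; i < 16; ++i) {   /* fill the 16 lanes */
        a[i] = (float)i;
        b[i] = 2.0f;
        c[i] = 1.0f;
    }

    __m512 va = _mm512_loadu_ps(a);
    __m512 vb = _mm512_loadu_ps(b);
    __m512 vc = _mm512_loadu_ps(c);

    /* r = a*b + c : 16 multiply-adds in one instruction */
    __m512 vr = _mm512_fmadd_ps(va, vb, vc);
    _mm512_storeu_ps(r, vr);

    for (int i = 0; i < 16; ++i)
        printf("%5.1f%c", r[i], i == 15 ? '\n' : ' ');
    return 0;
}
```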

Application challenges
- Programming model
- Scalability
- I/O
- Resiliency / fault tolerance
- Numerical stability
- Algorithms
- Energy awareness / efficiency

www.quantum-espresso.org (H2020 MaX Center of Excellence)

Scalability: the case of Quantum ESPRESSO
QE parallelization hierarchy.

OK for petascale, not enough for exascale. Ab-initio simulations -> numerical solution of the quantum mechanical equations.
(Plot: CNT10POR8 CP benchmark on BGQ; seconds per step, broken down by routine (calphi, dforce, rhoofr, updatc, ortho), versus virtual cores, real cores, and band groups.)

QE evolution
- High throughput / ensemble simulations (multiple coupled QE instances)
- Communication avoiding
- New algorithms: CG vs. Davidson
- Coupled applications (DSL, LAMMPS)
- Task-level parallelism
- Double buffering (see the sketch after this list)
- Reliability, completeness, robustness
- Standard interfaces
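A generic double-buffering sketch, an illustration of the technique rather than QE's actual implementation (the block sizes and the process_block kernel are stand-ins): nonblocking MPI receives fill one buffer while the other is being processed, overlapping communication with computation.

```c
#include <mpi.h>
#include <stdio.h>

#define N        (1 << 16)   /* elements per block */
#define NBLOCKS  8           /* number of blocks   */

/* trivial stand-in for a real compute kernel */
static double process_block(const double *buf, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += buf[i];
    return s;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) MPI_Abort(MPI_COMM_WORLD, 1);

    static double buf[2][N];

    if (rank == 0) {                       /* producer: sends NBLOCKS blocks */
        for (int k = 0; k < NBLOCKS; ++k) {
            for (int i = 0; i < N; ++i) buf[0][i] = (double)k;
            MPI_Send(buf[0], N, MPI_DOUBLE, 1, k, MPI_COMM_WORLD);
        }
    } else if (rank == 1) {                /* consumer: double buffering */
        MPI_Request req;
        double total = 0.0;

        MPI_Irecv(buf[0], N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
        for (int k = 0; k < NBLOCKS; ++k) {
            int cur = k % 2, nxt = (k + 1) % 2;
            MPI_Wait(&req, MPI_STATUS_IGNORE);          /* block k is here    */
            if (k + 1 < NBLOCKS)                        /* prefetch block k+1 */
                MPI_Irecv(buf[nxt], N, MPI_DOUBLE, 0, k + 1,
                          MPI_COMM_WORLD, &req);
            total += process_block(buf[cur], N);        /* overlap compute    */
        }
        printf("total = %f\n", total);
    }

    MPI_Finalize();
    return 0;
}
```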

Multi-level parallelism
- Workload management: system level, high throughput
- Python: ensemble simulations, workflows
- MPI: domain partitioning
- OpenMP: node-level shared memory
- CUDA/OpenCL/OpenACC/OpenMP4: floating-point accelerators
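A minimal hybrid MPI + OpenMP sketch (not from the slides) showing the two middle layers of this hierarchy: MPI ranks partition the domain across nodes, OpenMP threads share memory within a node.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* Request thread support so OpenMP can be used inside each MPI rank */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local = 0.0;

    /* Node level: OpenMP threads work on this rank's part of the domain */
    #pragma omp parallel reduction(+:local)
    {
        local += 1.0;          /* stand-in for per-thread work */
    }

    /* Domain level: combine the per-rank partial results with MPI */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks = %d, total thread contributions = %.0f\n", nranks, global);

    MPI_Finalize();
    return 0;
}
```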

QE (Al2O3 small benchmark): energy to solution as a function of the clock frequency.

Conclusions
- Exascale systems will be there.
- Power is the main architectural constraint.
- Exascale applications? Yes, but: concurrency, fault tolerance, I/O, energy awareness.