GOING ARM: A CODE PERSPECTIVE

ISC18, June 2018
Guillaume Colin de Verdière (GCdV)
CEA, DAM, DIF, F-91297 Arpajon, France

A history of disruptions
All dates are installation dates of the machines at CEA/DAM.

The scalar era
- CDC 7600 (1976-1987)
  - Small central memory
  - Larger out-of-core memory
  - 60-bit words
  - Fortran
  - Scalar codes
- Before
  - IBM 7094 (1963-1966)
  - IBM 360/50 (1966-1973)
  - CDC 6600 (1974-1982)

The first disruption: vectorization
- CRAY 1S (1982-1990)
  - 64-bit words
  - 1 proc, 80 MHz
  - 160 Mflops

Stability period
- CRAY XMP (1990-1993): 4 procs, 0.96 Gflops
- CRAY YMP (1990-1997): 8 procs, 2.7 Gflops
- CRAY T90 (1996-2002): 24 procs, 1.8 Gflops/proc, 454 MHz, IEEE 754
- Main use: concurrent scalar jobs

Introduction of parallelism (R&D)
- CRAY T3D (1994-1997): 64 nodes, 128 procs (Alpha @ 150 MHz), 2x64MB
- CRAY T3E (1996-2001): 192 procs (Alpha)
- Mainly PVM, a bit of MPI

The second disruption: cluster supercomputers
- TERA-1 (2002-2006): 640 nodes, 4 x EV68, 5 TFlops
- TERA-10 (2006-2011): 624 nodes, Intel Montecito @ 1.5 GHz, 48 GB/node
- TERA-100 (2011-2018): 4370 nodes, Intel Xeon 7500, 5 MW, HPL: 1.05 Pflops
- MPI, mainly domain decomposition

The third disruption: multi-level parallelism
- TERA1000-2: 8004 nodes, KNL, 4 MW, HPL: 11.96 Pflops
- Strong impact on codes: 3-level parallelization
  - MPI across nodes
  - OpenMP across cores
  - Vectorization inside a core
- The dawn of the fourth disruption: energy awareness
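To put the "energy awareness" remark in numbers, the HPL and power figures quoted for TERA-100 and TERA1000-2 can be turned into a rough flops-per-watt ratio. This is only a sketch: it assumes the quoted megawatt figures are comparable to the power drawn during the HPL runs, which the slides do not state, and the function name `gflops_per_watt` is chosen here for illustration.

```python
# HPL scores and power figures as quoted on the slides.
# Assumption: the MW figure is comparable to the power drawn during HPL.
machines = {
    "TERA-100": {"hpl_pflops": 1.05, "power_mw": 5.0},
    "TERA1000-2": {"hpl_pflops": 11.96, "power_mw": 4.0},
}


def gflops_per_watt(hpl_pflops, power_mw):
    """Convert an HPL score (Pflops) and a power draw (MW) to Gflops/W."""
    flops = hpl_pflops * 1e15
    watts = power_mw * 1e6
    return flops / watts / 1e9


efficiency = {name: gflops_per_watt(**m) for name, m in machines.items()}
for name, value in efficiency.items():
    print(f"{name}: {value:.2f} Gflops/W")
```

Under that assumption the ratio moves from about 0.21 Gflops/W to about 2.99 Gflops/W, roughly a factor of 14 between the two machine generations.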

Reasons to go Arm

Code portability is essential
- Our codes are long-lived
  - 20 to 30 years = several generations of supercomputers
  - Most are mission critical
- Tera 10, 100 and 1000 are Intel based
- Need an alternative architecture to validate codes
  - Different compilers: standards interpretations
  - Different optimized libraries: rounding influences
  - Different ISA: difference in optimizations
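The "rounding influences" point can be illustrated with a few lines of Python. Floating-point addition is not associative, so a library or compiler that reorders a reduction may legitimately return a slightly different result. The sketch below is a generic illustration, not taken from any CEA code; `naive_sum` is a helper written for this example.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation, as a simple scalar loop would do."""
    total = 0.0
    for v in values:
        total += v
    return total


# Grouping matters: the two expressions differ in the last bit.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Ordering matters even more when magnitudes differ widely.
forward = naive_sum([1e16, 1.0, -1e16])    # the 1.0 is absorbed by 1e16
reordered = naive_sum([1e16, -1e16, 1.0])  # the large terms cancel first
reference = math.fsum([1e16, 1.0, -1e16])  # correctly rounded sum
print(forward, reordered, reference)       # 0.0 1.0 1.0
```

A validation run on a second architecture helps separate this kind of benign reordering difference from a genuine bug or a reliance on non-standard behaviour.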

Codes in the future
- Energy is becoming more and more important
  - Our focus is on exascale-class machines for ~2022
- Balance between Energy to Solution (E2S) and Time to Solution (T2S)
  - Certain classes of codes are better suited to E2S
  - Older ones are more T2S
- Codes will have to adapt to this new constraint
  - Better dialog with the system: hints to SLURM, frequency regulation, ...
  - New energy-aware algorithms
  - Minimize data movements
- Make the developers conscious of the resources used
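The trade-off between Energy to Solution and Time to Solution comes down to energy = average power x elapsed time. The sketch below uses purely hypothetical power and timing figures (they are not measurements from any CEA machine) to show how a run at reduced frequency can lose on T2S and still win on E2S.

```python
def energy_to_solution(avg_power_w, time_to_solution_s):
    """Energy in joules = average power (W) x elapsed time (s)."""
    return avg_power_w * time_to_solution_s


# Hypothetical figures for the same job at two CPU frequency settings.
runs = {
    "nominal frequency": {"avg_power_w": 400.0, "time_to_solution_s": 1000.0},
    "reduced frequency": {"avg_power_w": 300.0, "time_to_solution_s": 1200.0},
}

energy = {name: energy_to_solution(**r) for name, r in runs.items()}
best_t2s = min(runs, key=lambda name: runs[name]["time_to_solution_s"])
best_e2s = min(energy, key=energy.get)

for name in runs:
    print(f"{name}: {runs[name]['time_to_solution_s']:.0f} s, "
          f"{energy[name] / 1000:.0f} kJ")
print("best T2S:", best_t2s)
print("best E2S:", best_e2s)
```

With these made-up numbers the reduced-frequency run is 20% slower but uses 10% less energy, which is the kind of choice a scheduler hint or a frequency setting would expose to the code owner.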

Arm Proof of Concept: future partition
- Goal: study emerging high-efficiency architectures
- Architecture based on the Mont-Blanc3 (*) project results
- To be installed later in 2018

Future partition
  Node type         2 x THX2 (30 cores @ 2.2 GHz, tbc)
  # compute nodes   160 (tbc)
  Memory size       256 GB / node
  I/O router        20 GB/s
  Interconnect      EDR, pruning 1:2
  Cooling           DLC (Sequana infrastructure)

(*) This project has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No. 671697.

Challenges going Arm

The compiler and runtime matter
- Lulesh V2.0.3, OpenMP
- Compilers: GCC 7.3 and Cce/8.6.2
- Platform: Cavium ThunderX2, A2 stepping
[Figure: OpenMP parallel efficiency (y-axis, 0 to 1.2) plotted over an x-axis running from 1 to 55, with one curve for the Cray compiler ("OpenMP efficiency, CRAY") and one for GNU ("OpenMP efficiency, GNU"). Credit: JC Weill]
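The slide does not say how parallel efficiency was computed. A common definition, assumed here, is the single-thread run time divided by the thread count times the multi-thread run time. The timings in this sketch are hypothetical and only show how such a curve is built for one compiler.

```python
def parallel_efficiency(t_serial, t_parallel, n_threads):
    """Assumed definition: T(1) / (n * T(n)); 1.0 means perfect scaling."""
    return t_serial / (n_threads * t_parallel)


# Hypothetical wall-clock times in seconds, keyed by OpenMP thread count.
timings = {1: 100.0, 4: 26.0, 16: 8.0}

curve = {
    n: parallel_efficiency(timings[1], t, n) for n, t in timings.items()
}
for n, eff in curve.items():
    print(f"{n:2d} threads: efficiency {eff:.2f}")
```

Repeating the measurement with a second compiler and runtime on the same node gives the second curve, which is how a compiler or OpenMP runtime effect becomes visible.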

Optimized libraries matter (1/2)
- A standard Arm version of Intel SVML
  - SVML is automatically used by the Intel compiler
  - https://developer.arm.com/products/software-development-tools/hpc/documentation/vector-math-routines
  - -fsimdmath for C/C++
  - Based on the SLEEF Vectorized Math Library: http://sleef.org/
- What about optimized FTN? Is flang up to the task?

Optimized libraries matter (2/2)
- The Intel MKL library is heavily used in production
  - BLAS, LAPACK, FFT, ...
  - In most cases hand-coded algorithms don't beat MKL
- In the Arm world, optimized libraries start to exist
  - https://developer.arm.com/products/software-development-tools/hpc/arm-performance-libraries
  - BLAS: Basic Linear Algebra Subprograms (including XBLAS, the extended precision BLAS)
  - LAPACK: a comprehensive package of higher-level linear algebra routines
  - FFT: a set of Fast Fourier Transform routines for real and complex data
  - Math routines: optimized exp, pow and log routines
- Multithreaded versions? TBB? Tasks in general?

Prepare our code port: CEA-Arm/Allinea collaboration
- Longstanding collaboration between CEA and Allinea
  - CEA has funded quite a lot
  - DDT, MAP and Performance Report
    - Scalability (large number of cores, large number of libraries, KNL, ...)
    - Robustness
    - Thread support (MPC, OpenMP)
    - C++ friendliness
  - Allinea Metric Plugin Interface
    - Compatibility with open-source profiling tools like MALP (http://malp.hpcframework.com)
- Collaboration extension to Arm (WiP)
  - Idea: have the same developer experience on Arm and on X86
  - New items for co-design
    - Compiler: MPC support in LLVM, linker optimization, ...
    - Optimized scientific libraries
    - Profiling and debugging tools for Arm: MPI, perf counters, vectorization, ...
    - Thread debugging
    - OS support (Work in Progress)

Conclusion
- Energy to Solution will gain importance
- Arm-based solutions are to be investigated seriously
  - Arm-based (super)computers are available now
    - Thanks to MontBlanc3 (Atos) but also CRAY, HPE, ...
  - The software ecosystem is maturing fast
    - Compiler and libraries provided by Arm
    - Allinea tools
- It is time to start porting codes to Arm
  - And investigate new algorithms that take energy into account

Commissariat à l'énergie atomique et aux énergies alternatives
Centre DAM Ile-de-France, 91297 Arpajon Cedex, France
T. +33 (0)1 69 26 40 00
DAM DSSI ED GCdV
Public industrial and commercial establishment (Etablissement public à caractère industriel et commercial), RCS Paris B 775 685 019