Update of Post-K Development - Yutaka Ishikawa, RIKEN AICS


Update of Post-K Development - Yutaka Ishikawa, RIKEN AICS - 11:20AM-11:40AM, 2nd of November, 2017

FLAGSHIP2020 Project

Missions:
- Building the Japanese national flagship supercomputer, post-K.
- Developing a wide range of HPC applications that run on post-K, in order to solve social and scientific issues in Japan.

Project organization:
- Post-K computer development: RIKEN AICS is in charge of development; Fujitsu is the vendor partner. International collaborations: DOE, JLESC, CEA.
- Applications: the government selected 9 social and scientific priority issues and their R&D organizations, plus 4 exploratory issues.

[System diagram: I/O network, maintenance servers, portal servers, login servers, hierarchical storage system]

Status:
- The basic design has been finalized; the project is now in the design and implementation phase.
- We have decided on Armv8 with SVE as the ISA for the post-K manycore processor.
- We are working on detailed evaluation with simulators and compilers.

CPU Architecture

- Armv8-A + SVE (Scalable Vector Extension); FP64/FP32/FP16 (https://developer.arm.com/products/architecture/a-profile/docs)
- Fujitsu's extensions: inter-core barrier to reduce fork/join synchronization, sector cache, hardware prefetch assist
(A vector-length-agnostic SVE code sketch follows below.)
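To make the vector-length-agnostic nature of SVE concrete, here is a minimal FP64 AXPY loop written with the ACLE SVE intrinsics. It is a generic sketch (the function name daxpy_sve and the loop structure are illustrative), not Fujitsu compiler output, and it does not exercise the Fujitsu-specific extensions listed above.

```c
/* Vector-length-agnostic FP64 AXPY (y = a*x + y) using ACLE SVE intrinsics.
 * The same binary works for any SVE vector length: svcntd() and the
 * svwhilelt predicate adapt the loop to the hardware at run time.
 * Build with an SVE-capable compiler, e.g. -march=armv8-a+sve. */
#include <arm_sve.h>
#include <stdint.h>

void daxpy_sve(double a, const double *x, double *y, int64_t n)
{
    svfloat64_t va = svdup_n_f64(a);
    for (int64_t i = 0; i < n; i += (int64_t)svcntd()) {
        svbool_t pg = svwhilelt_b64_s64(i, n);            /* mask off the tail */
        svfloat64_t vx = svld1_f64(pg, &x[i]);
        svfloat64_t vy = svld1_f64(pg, &y[i]);
        svst1_f64(pg, &y[i], svmla_f64_x(pg, vy, vx, va)); /* vy + vx * a */
    }
}
```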

Ease of Use

Ease of use is one of our KPIs (Key Performance Indicators). Both software stacks are Linux distributions.

Fujitsu stack, providing a wide range of applications/tools/libraries/compilers:
- Languages: Fortran, C11/C++, OpenMP, Java, etc.
- Math libraries
- Communication and I/O: MPI (OpenMPI), ...
- OS kernel: Linux

RIKEN Linux for post-K:
- Languages and domain-specific languages: XMP, FDPS, Formura
- Communication and I/O: MPICH, LLC, PiP, DTF, FTAR
- OS kernel: IHK/McKernel

Linaro HPC SIG (https://www.linaro.org/sig/hpc/): the HPC SIG drives the adoption of Arm in HPC through the creation of a data center ecosystem. It is a collaborative project comprised of members and an advisory board. Current members include Arm, HiSilicon, Qualcomm, Fujitsu, Cavium, Red Hat, and HPE. CERN and RIKEN are on the advisory board.

OpenHPC is a Linux Foundation Collaborative Project whose mission is to provide a reference collection of open-source HPC software components and best practices, lowering barriers to deployment, advancement, and use of modern HPC methods and tools.
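As a small illustration of the programming environment both stacks target (C, OpenMP, and MPI), the following hybrid hello-world is the kind of code these toolchains are expected to build and run. It is generic and assumes nothing specific to the Fujitsu or RIKEN components listed above.

```c
/* Hybrid MPI + OpenMP hello world: MPI ranks across nodes,
 * OpenMP threads inside each rank. Generic code, nothing here
 * is specific to the post-K software stack. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, size;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    #pragma omp parallel
    {
        #pragma omp critical
        printf("rank %d/%d thread %d/%d\n",
               rank, size, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```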

IHK/McKernel (developed at RIKEN)

- Partitions node resources (CPU cores, memory):
  - A full Linux kernel runs on some cores and hosts system daemons, in-situ non-HPC applications, and device drivers.
  - A lightweight kernel (LWK), McKernel, runs on the other cores and hosts the HPC applications.
- Daemons and kernel activities are the source of OS noise, so they stay on the Linux cores. The complex functionality (TCP stack, device drivers, interrupt handling, VFS and file system drivers, complex memory management, general scheduler) remains in Linux, while McKernel is kept thin (very simple memory management, process/thread management, a core per user thread), giving a no-jitter environment.
- McKernel is a loadable module of Linux; no changes to Linux are required.
- McKernel supports the Linux API (glibc, /sys/, /proc/): the same binary that runs on Linux runs on McKernel.
- New features are easily implemented, and a customized OS is dynamically launched.
- McKernel currently runs on Intel Xeon and Xeon Phi, and experimentally on Fujitsu FX10 and FX100. McKernel is being ported to Arm by Fujitsu; enhancement for Arm will start in FY2018.
(A generic core-partitioning sketch follows below.)
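IHK/McKernel does its partitioning at the kernel level; as a loose user-level illustration of the same idea, keeping noise sources and compute on disjoint cores, the sketch below pins a process to a dedicated core set with plain Linux CPU affinity. This is not the IHK/McKernel interface, and the core numbers are assumptions chosen for illustration.

```c
/* Generic illustration of the core-partitioning idea behind IHK/McKernel:
 * confine the compute process to a dedicated set of cores so that system
 * daemons (left on the remaining cores) cannot inject noise into it.
 * Uses plain Linux sched_setaffinity(), not the IHK/McKernel API. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    cpu_set_t compute_cores;
    CPU_ZERO(&compute_cores);

    /* Assumption for illustration: cores 2-63 are reserved for HPC work,
     * cores 0-1 are left to Linux daemons and interrupt handling. */
    for (int c = 2; c < 64; c++)
        CPU_SET(c, &compute_cores);

    if (sched_setaffinity(0, sizeof(compute_cores), &compute_cores) != 0) {
        perror("sched_setaffinity");
        return EXIT_FAILURE;
    }

    /* ... run the noise-sensitive compute phase here ... */
    printf("compute process confined to cores 2-63\n");
    return EXIT_SUCCESS;
}
```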

How to deploy McKernel

- Linux kernel + loadable LWK (McKernel).
- The Linux kernel is resident, and daemons for the job scheduler etc. run on Linux.
- McKernel is dynamically reloaded (rebooted) for each application; no hardware reboot is needed.
- For example: App A, requiring an LWK without a scheduler, is invoked and finishes; App B, requiring an LWK with a scheduler, is invoked and finishes; App C, using full Linux capability, is invoked and finishes.

OS Noise is Harmful in Large-Scale HPC

FWQ (Fixed Work Quanta) benchmark: measurement of jitter on a single node, McKernel vs. Linux. [Figure: FWQ jitter traces for McKernel and Linux] (A sketch of the FWQ idea follows below.)
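FWQ times a fixed quantum of work over and over; on a noise-free kernel the samples are nearly identical, and OS interference shows up as outliers. The sketch below captures that idea under simple assumptions (a plain arithmetic loop as the fixed work, min/max spread as the summary); it is not the actual FWQ benchmark source.

```c
/* Minimal FWQ-style jitter probe: repeatedly execute a fixed amount of
 * work and record how long each quantum takes. The spread between the
 * fastest and slowest samples is a rough measure of OS noise. */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

#define SAMPLES 1000
#define WORK    (1u << 22)   /* fixed work quantum: iterations per sample */

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

int main(void)
{
    static volatile uint32_t sink;  /* keeps the work loop from being optimized away */
    uint64_t min = UINT64_MAX, max = 0;

    for (int s = 0; s < SAMPLES; s++) {
        uint64_t t0 = now_ns();
        for (uint32_t i = 0; i < WORK; i++)
            sink += i;
        uint64_t dt = now_ns() - t0;
        if (dt < min) min = dt;
        if (dt > max) max = dt;
    }
    printf("min %llu ns  max %llu ns  spread %llu ns\n",
           (unsigned long long)min, (unsigned long long)max,
           (unsigned long long)(max - min));
    return 0;
}
```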

GeoFEM (University of Tokyo)

- ICCG with additive Schwarz domain decomposition, weak scaling.
- Up to 18% improvement for IHK/McKernel over Linux, using the same binary.
- Figure of merit (solved problem size normalized to execution time, higher is better) plotted against the number of physical cores, from 1024 to 128k.
- Acknowledgement: Kengo Nakajima, University of Tokyo, for providing GeoFEM.
- Results on the Oakforest-PACS supercomputer (25 PF peak) at JCAHPC, organized by the University of Tsukuba and the University of Tokyo.

CCS-QCD (University of Tsukuba)

- Lattice quantum chromodynamics code, weak scaling.
- Up to 38% improvement for IHK/McKernel over Linux, using the same binary.
- MFlop/sec/node (higher is better) plotted against the number of physical cores, from 1024 to 128k.
- Acknowledgement: Ken'ichi Ishikawa, Hiroshima University, for providing CCS-QCD.
- Results on the Oakforest-PACS supercomputer (25 PF peak) at JCAHPC, organized by the University of Tsukuba and the University of Tokyo.

miniFE (CORAL benchmark suite)

- Conjugate gradient, strong scaling.
- Up to 3.5x improvement for IHK/McKernel over Linux (Linux falls over), using the same binary.
- Total CG MFlops (higher is better) plotted against the number of physical cores, from 1024 to 64k.
- Results on the Oakforest-PACS supercomputer (25 PF peak) at JCAHPC, organized by the University of Tsukuba and the University of Tokyo.

Concluding Remarks

The system software stack for post-K is being designed and implemented leveraging international collaborations with CEA, DOE labs, and JLESC.

- DOE-MEXT: optimized memory management, efficient MPI for exascale, dynamic execution runtime, storage architectures, metadata and active storage, storage as a service, parallel I/O libraries, mini-apps for exascale co-design, performance models for proxy apps, OpenMP/XMP runtime, programming models for heterogeneity, LLVM for vectorization, power monitoring and control, power steering, resilience API, shared fault data, etc.
- CEA: programming language, runtime environment, energy-aware batch job scheduler, large DFT calculations and QM/MM, application of high-performance computing to earthquake-related issues of nuclear power plant facilities, KPIs (Key Performance Indicators).
- JLESC (Joint Laboratory on Extreme Scale Computing: NCSA, ANL, UTK, JSC, BSC, INRIA, RIKEN): Simplified Sustained System Performance benchmark, developer tools for porting and tuning parallel applications on extreme-scale parallel systems, HPC libraries for solving dense symmetric eigenvalue problems, DTF+SZ (using lossy data compression for direct data transfer in multi-component applications), Process-in-Process (techniques for practical address space sharing).

The software stack developed at RIKEN is open source.