IHK/McKernel: A Lightweight Multi-kernel Operating System for Extreme-Scale Supercomputing

1 IHK/McKernel: A Lightweight Multi-kernel Operating System for Extreme-Scale Supercomputing
Balazs Gerofi, Exascale System Software Team, RIKEN Center for Computational Science
2018/Nov/15, SC18 Intel eXtreme Performance Users Group (IXPUG) BoF

2 Motivation
System software/OS challenges for high-end HPC:
- Node architecture: increasing complexity. Large numbers of (possibly heterogeneous) processing cores, deep memory hierarchy, complex cache/NUMA topology.
- Applications: increasing diversity. Traditional/regular HPC + in-situ data analytics + Big Data processing + AI/machine learning + workflows, etc.
What do we need from the system software/OS?
- Performance and scalability for large-scale parallel apps
- Support for APIs: tools, productivity, monitoring, etc.
- Full control over HW resources
- Ability to adapt to HW changes: emerging memory technologies, parallelism, power constraints
- Performance isolation and dynamic reconfiguration: according to workload characteristics, support for co-location
We need performance and compatibility at the same time!

3 IHK/McKernel: Lightweight Multi-kernel Architecture
Interface for Heterogeneous Kernels (IHK):
- Allows dynamic partitioning of node resources (i.e., cores, physical memory, etc.)
- Enables management of multi-kernels (assign resources, load, boot, destroy, etc.)
- Provides inter-kernel communication (IKC), messaging and notification
McKernel:
- A lightweight kernel developed from scratch, boots from IHK
- Designed for HPC: noiseless and simple; implements only performance-sensitive system calls (roughly process and memory management), and the rest are offloaded to Linux
- OS jitter is contained in Linux; the LWK is isolated
[Figure: architecture diagram with two partitions. The Linux side holds the system daemon, kernel daemon, proxy process, delegator module, and IHK; the McKernel side holds the HPC application, McKernel, and the IHK co-kernel. System calls and interrupts cross the partition boundary; memory is partitioned between the two.]
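The offloading split described above can be illustrated with a toy model. This is a minimal sketch, not McKernel's implementation: the set of locally handled calls is a hypothetical choice that only follows the slide's "roughly process and memory management", and the IKC channel is stood in for by two Python queues, with a thread playing the Linux-side proxy process.

```python
import queue
import threading

# Hypothetical: calls the LWK serves itself ("roughly process and memory
# management"). This is NOT McKernel's real dispatch table.
LOCAL_SYSCALLS = {"mmap", "munmap", "brk", "clone", "futex", "getpid", "exit"}


class IKCChannel:
    """Stand-in for an IHK inter-kernel communication channel: two queues."""

    def __init__(self):
        self.requests = queue.Queue()
        self.replies = queue.Queue()


def proxy_process(chan):
    """Plays the Linux-side proxy: serves offloaded calls until told to stop."""
    while True:
        req = chan.requests.get()
        if req is None:          # shutdown message
            return
        name, args = req
        # A real proxy would issue the call on Linux; here we only tag it.
        chan.replies.put(("linux", name, args))


def lwk_syscall(name, args, chan):
    """Handle a performance-sensitive call locally, offload everything else."""
    if name in LOCAL_SYSCALLS:
        return ("mckernel", name, args)
    chan.requests.put((name, args))
    return chan.replies.get()    # block until the proxy replies


chan = IKCChannel()
proxy = threading.Thread(target=proxy_process, args=(chan,))
proxy.start()
results = [lwk_syscall(n, (), chan) for n in ("mmap", "open", "getpid", "write")]
chan.requests.put(None)
proxy.join()

for handler, name, _ in results:
    print(f"{name:<7} -> {handler}")
```

The point of the split is that the calls an HPC application issues in its hot loops never leave the LWK, while rarely used or complex calls (file I/O, for example) reuse Linux's full implementation at the cost of a round trip.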

4 IHK/McKernel: Lightweight Multi-kernel Architecture (as on slide 3, plus:)
- No kernel modifications! No node reboot during reconfiguration and LWK initialization.
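The reconfiguration-without-reboot property can be pictured as a small resource lifecycle: cores are taken from a running Linux, handed to an LWK instance, and later returned. The sketch below is hypothetical; the class and method names are invented for illustration and are not IHK's API, and only CPUs are modeled (physical memory would follow the same reserve/assign/release pattern).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    linux_cpus: set                                   # CPUs currently online in Linux
    reserved: set = field(default_factory=set)        # offlined from Linux, held by IHK
    lwks: dict = field(default_factory=dict)          # instance name -> assigned CPUs

    def reserve(self, cpus):
        """Take CPUs away from Linux at run time instead of rebooting the node."""
        if not cpus <= self.linux_cpus:
            raise ValueError("some CPUs are not owned by Linux")
        self.linux_cpus -= cpus
        self.reserved |= cpus

    def boot(self, name, cpus):
        """Assign reserved CPUs to a new LWK instance, then load and boot it."""
        if not cpus <= self.reserved:
            raise ValueError("some CPUs are not reserved")
        self.reserved -= cpus
        self.lwks[name] = cpus

    def destroy(self, name):
        """Shut an LWK instance down; its CPUs return to the reserved pool."""
        self.reserved |= self.lwks.pop(name)

    def release(self):
        """Give every reserved CPU back to Linux."""
        self.linux_cpus |= self.reserved
        self.reserved = set()


node = Node(linux_cpus=set(range(68)))
lwk_cpus = set(range(4, 68))       # arbitrary choice: Linux keeps CPUs 0-3
node.reserve(lwk_cpus)
node.boot("mck0", lwk_cpus)
print(len(node.linux_cpus), len(node.lwks["mck0"]))  # 4 64
node.destroy("mck0")
node.release()
print(node.linux_cpus == set(range(68)))             # True
```

Keeping "reserved" separate from "assigned" mirrors the slide's distinction between partitioning node resources and managing kernels on them: an LWK can be destroyed and rebooted on the same reserved cores without touching Linux again.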

5 Linux vs. McKernel cores on Xeon Phi KNL
- LWK runs on the majority of the chip
- A few cores are reserved for Linux
- Mechanism to map inter-core communication to MPI process layout
[Figure: KNL chip layout showing Linux and McKernel cores across NUMA 0 to NUMA 3]
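One plausible reading of "map inter-core communication to MPI process layout" is to route each McKernel core's offloaded calls to a Linux core in the same NUMA domain, so ranks pinned there talk to a nearby proxy. The sketch below assumes four equal domains of 17 cores and gives Linux the first core of each domain; both are simplifications chosen for illustration (real KNL quadrants are not exactly equal), not the talk's actual mechanism.

```python
CORES = 68
DOMAINS = 4
PER_DOMAIN = CORES // DOMAINS   # 17; real KNL quadrants are not exactly equal


def numa_of(core):
    """NUMA domain of a core under the equal-size simplification."""
    return core // PER_DOMAIN


def partition():
    """Give Linux the first core of every domain and McKernel the rest."""
    linux = {d * PER_DOMAIN for d in range(DOMAINS)}
    mckernel = [c for c in range(CORES) if c not in linux]
    return linux, mckernel


def ikc_targets(linux, mckernel):
    """Route each McKernel core's offloaded calls to the Linux core in its domain."""
    linux_by_domain = {numa_of(c): c for c in linux}
    return {c: linux_by_domain[numa_of(c)] for c in mckernel}


linux, mck = partition()
targets = ikc_targets(linux, mck)
# With MPI ranks pinned in order (rank i -> mck[i]), ranks in the same domain
# share one nearby Linux core for their offloaded system calls.
rank_to_linux = {rank: targets[core] for rank, core in enumerate(mck)}
print(sorted(linux), len(mck))               # [0, 17, 34, 51] 64
print(rank_to_linux[0], rank_to_linux[63])   # 0 51
```

This split yields 64 McKernel cores and 4 Linux cores, which matches the core counts given for Oakforest-PACS on the next slide.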

6 Oakforest-PACS Configuration
- 8k Intel Xeon Phi (Knights Landing) compute nodes
- Intel Omni-Path v1 interconnect
- Peak performance: ~25 PF
- Intel Xeon Phi 7250: 1.4 GHz, 68 cores, 4 HW threads per core, 272 logical OS CPUs in total
- 64 cores used for McKernel, 4 for Linux
- 16 GB MCDRAM high-bandwidth memory (hot-pluggable in BIOS)
- 96 GB DRAM
- Quadrant flat mode
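The figures above are consistent with each other, as a quick back-of-the-envelope check shows. Two inputs are assumptions not stated on the slide: 32 double-precision flops per cycle per core (two AVX-512 FMA units, each doing 8 doubles times 2 operations) and the publicly reported count of 8,208 nodes behind "8k". The result is a nominal peak at the 1.4 GHz clock, not a sustained rate.

```python
CORES_PER_NODE = 68        # Xeon Phi 7250
THREADS_PER_CORE = 4
CLOCK_GHZ = 1.4            # nominal clock of the 7250
DP_FLOPS_PER_CYCLE = 32    # assumption: 2 AVX-512 FMA units x 8 doubles x 2 flops
MCK_CORES = 64             # from the slide: 64 cores for McKernel
NODES = 8208               # assumption: published OFP node count ("8k" on the slide)

logical_cpus = CORES_PER_NODE * THREADS_PER_CORE
mck_cpus = MCK_CORES * THREADS_PER_CORE
linux_cpus = logical_cpus - mck_cpus
node_gflops = CORES_PER_NODE * CLOCK_GHZ * DP_FLOPS_PER_CYCLE
system_pflops = node_gflops * NODES / 1e6
mem_per_node_gb = 16 + 96  # MCDRAM + DRAM

print(logical_cpus, mck_cpus, linux_cpus)                          # 272 256 16
print(f"{node_gflops:.0f} GF/node, {system_pflops:.1f} PF total")  # 3046 GF/node, 25.0 PF total
print(mem_per_node_gb, "GB per node")                              # 112 GB per node
```

So McKernel owns 256 of the 272 logical CPUs on each node, and roughly 3 TF per node across about 8k nodes gives the ~25 PF peak.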

7 Mini-applications on full-scale OFP
[Figure: results for AMG2013, MiniFE, Lulesh, and MILC, comparing configurations that include corespec; annotated differences: AMG2013 19%, MiniFE 2.8X, Lulesh ~2X, MILC 21%]

8 Mini-applications on full-scale OFP
[Figure: results for LAMMPS, GeoFEM, HPCG, and GAMERA, comparing configurations that include corespec; axis label: analysis runtime (seconds); annotation: 27%]
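The annotations on the last two slides mix multiplicative factors (2.8X, ~2X) with percentages (19%, 21%, 27%), and some panels report a figure of merit while others report runtime in seconds. The slides do not state which convention each percentage uses, so the sketch below only shows how the common conventions differ; the input numbers are invented for illustration and are not the slides' data.

```python
def speedup_from_runtime(baseline_s, new_s):
    """Runtime-style metric (lower is better): ratio = baseline / new."""
    return baseline_s / new_s


def speedup_from_fom(baseline, new):
    """Figure-of-merit-style metric (higher is better): ratio = new / baseline."""
    return new / baseline


def as_percent_gain(ratio):
    """Express a ratio such as 1.27 as a percentage gain (27%)."""
    return (ratio - 1.0) * 100.0


def runtime_reduction_percent(baseline_s, new_s):
    """Alternative convention: how much shorter the new runtime is, in percent."""
    return (1.0 - new_s / baseline_s) * 100.0


# Invented numbers for illustration only; they are not the slides' data.
r = speedup_from_runtime(127.0, 100.0)
print(f"{r:.2f}x speedup = {as_percent_gain(r):.0f}% gain")                   # 1.27x speedup = 27% gain
print(f"runtime reduction = {runtime_reduction_percent(127.0, 100.0):.1f}%")  # runtime reduction = 21.3%
print(f"2.8x = {as_percent_gain(2.8):.0f}% gain")                             # 2.8x = 180% gain
```

The same pair of runtimes reads as a 27% gain or a 21.3% reduction depending on the convention, which is why a percentage annotation is only comparable across panels when its definition is known.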

9 Thank you for your attention! Questions?


More information

Performance optimization of the Smoothed Particle Hydrodynamics code Gadget3 on 2nd generation Intel Xeon Phi

Performance optimization of the Smoothed Particle Hydrodynamics code Gadget3 on 2nd generation Intel Xeon Phi Performance optimization of the Smoothed Particle Hydrodynamics code Gadget3 on 2nd generation Intel Xeon Phi Dr. Luigi Iapichino Leibniz Supercomputing Centre Supercomputing 2017 Intel booth, Nerve Center

More information

Aim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group

Aim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group Aim High Intel Technical Update Teratec 07 Symposium June 20, 2007 Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group Risk Factors Today s s presentations contain forward-looking statements.

More information

Practical Near-Data Processing for In-Memory Analytics Frameworks

Practical Near-Data Processing for In-Memory Analytics Frameworks Practical Near-Data Processing for In-Memory Analytics Frameworks Mingyu Gao, Grant Ayers, Christos Kozyrakis Stanford University http://mast.stanford.edu PACT Oct 19, 2015 Motivating Trends End of Dennard

More information

The Stampede is Coming: A New Petascale Resource for the Open Science Community

The Stampede is Coming: A New Petascale Resource for the Open Science Community The Stampede is Coming: A New Petascale Resource for the Open Science Community Jay Boisseau Texas Advanced Computing Center boisseau@tacc.utexas.edu Stampede: Solicitation US National Science Foundation

More information

Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL

Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL SABELA RAMOS, TORSTEN HOEFLER Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL spcl.inf.ethz.ch Microarchitectures are becoming more and more complex CPU L1 CPU L1 CPU L1 CPU

More information

Carlos Reaño, Javier Prades and Federico Silla Technical University of Valencia (Spain)

Carlos Reaño, Javier Prades and Federico Silla Technical University of Valencia (Spain) Carlos Reaño, Javier Prades and Federico Silla Technical University of Valencia (Spain) 4th IEEE International Workshop of High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB

More information

Decoupling Cores, Kernels and Operating Systems

Decoupling Cores, Kernels and Operating Systems Decoupling Cores, Kernels and Operating Systems Gerd Zellweger, Simon Gerber, Kornilios Kourtis, Timothy Roscoe Systems Group, ETH Zürich 10/6/2014 1 Outline Motivation Trends in hardware and software

More information

CMCP: A Novel Page Replacement Policy for System Level Hierarchical Memory Management on Many-cores

CMCP: A Novel Page Replacement Policy for System Level Hierarchical Memory Management on Many-cores CMCP: A Novel Page Replacement Policy for System Level Hierarchical Memory Management on Many-cores Balazs Gerofi, Akio Shimada, Atsushi Hori, Takagi Masamichi, Yutaka Ishikawa, Graduate School of Information

More information

Preparing your Application for Advanced Manycore Architectures

Preparing your Application for Advanced Manycore Architectures Preparing your Application for Advanced Manycore Architectures Katie Antypas Services Dept Head, NERSC-8 Project Lead CSGF HPC Workshop July 17, 2014-1 - What is Manycore? No precise definition Multicore

More information

Umeå University

Umeå University HPC2N @ Umeå University Introduction to HPC2N and Kebnekaise Jerry Eriksson, Pedro Ojeda-May, and Birgitte Brydsö Outline Short presentation of HPC2N HPC at a glance. HPC2N Abisko, Kebnekaise HPC Programming

More information

Performance Profiler. Klaus-Dieter Oertel Intel-SSG-DPD IT4I HPC Workshop, Ostrava,

Performance Profiler. Klaus-Dieter Oertel Intel-SSG-DPD IT4I HPC Workshop, Ostrava, Performance Profiler Klaus-Dieter Oertel Intel-SSG-DPD IT4I HPC Workshop, Ostrava, 08-09-2016 Faster, Scalable Code, Faster Intel VTune Amplifier Performance Profiler Get Faster Code Faster With Accurate

More information

Umeå University

Umeå University HPC2N: Introduction to HPC2N and Kebnekaise, 2017-09-12 HPC2N @ Umeå University Introduction to HPC2N and Kebnekaise Jerry Eriksson, Pedro Ojeda-May, and Birgitte Brydsö Outline Short presentation of HPC2N

More information

Big Data Systems on Future Hardware. Bingsheng He NUS Computing

Big Data Systems on Future Hardware. Bingsheng He NUS Computing Big Data Systems on Future Hardware Bingsheng He NUS Computing http://www.comp.nus.edu.sg/~hebs/ 1 Outline Challenges for Big Data Systems Why Hardware Matters? Open Challenges Summary 2 3 ANYs in Big

More information

Current and Future Challenges of the Tofu Interconnect for Emerging Applications

Current and Future Challenges of the Tofu Interconnect for Emerging Applications Current and Future Challenges of the Tofu Interconnect for Emerging Applications Yuichiro Ajima Senior Architect Next Generation Technical Computing Unit Fujitsu Limited June 22, 2017, ExaComm 2017 Workshop

More information

Welcome. Virtual tutorial starts at BST

Welcome. Virtual tutorial starts at BST Welcome Virtual tutorial starts at 15.00 BST Using KNL on ARCHER Adrian Jackson With thanks to: adrianj@epcc.ed.ac.uk @adrianjhpc Harvey Richardson from Cray Slides from Intel Xeon Phi Knights Landing

More information