Impact of Manufacturing Variability in Power Constrained Supercomputing. Koji Inoue. Kyushu University
Transcription
1 Impact of Manufacturing Variability in Power Constrained Supercomputing Koji Inoue Kyushu University
2 Trends of Supercomputing. World-wide next target: 1 ExaFLOPS (10^18 floating-point operations per second). [Figure: 33.9 Peta (current) vs. 1 Exa (target), with power in MW; annotated 30X and 1.7X.] Need to improve power efficiency!
3 Overprovisioned Systems
Under-provisioned (conventional). HW design: ensures the PEAK system power does NOT exceed the limit. SW design: tries to maximize the activity of HW components.
Over-provisioned. HW design: allows installing HW without considering the power limit; provides power-performance knobs. SW design: tunes the knobs to maximize performance based on SW workloads; ensures the ACTUAL system power does NOT exceed the limit.
4 Experimental Setup
Benchmarks (on the slide, blue marks EP-type benchmarks and red marks those with communication and synchronization):
HPC Challenge: star DGEMM, star STREAM (Triad)
NPB: BT, SP, EP
Magneto Hydro-Dynamics (MHD) simulation: typical stencil app. to simulate space plasma; calculations and communications appear in turn
Fiber benchmark suite: mvmc-mini (mvmc), a variational Monte-Carlo simulation for strongly correlated electron systems
Systems: Cab (LLNL) Intel E Sandy Bridge 1, RAPL; BG/Q Vulcan (LLNL) IBM PowerPC A2 24, (compute) EMON; Teller (SNL) AMD A K Piledriver PI; HA8K (Kyushu Univ.) Intel E5-2697v2 Ivy Bridge RAPL
5 Terminology [Diagram: two modules, each consisting of a CPU (cores, cache, MC) and the memory modules attached to it.] CPU: processor chip (including cores, cache, MC, etc.). Module: a pair of a CPU and the DRAMs directly connected to it.
6 Impact on CPU Frequency (star DGEMM) [Plots: module (CPU+DRAM) power [W] vs. module ID, split into CPU power and DRAM power, annotated 30%; CPU power [W] vs. CPU clock frequency [GHz], with no power constraint and with a uniform power constraint (CPU power cap).] Power variation is translated into CPU frequency variation when a uniform power constraint is applied!
7 Impact on Application Performance (star DGEMM) [Plot: module (CPU+DRAM) power [W] vs. normalized execution time, with no power constraint and with a uniform power constraint of Cm = 110 W, 100 W, 90 W, 80 W, and 70 W, where Cm is the target average power constraint per module; annotated 64%.]
8 Problem and Goal. Problem: power-constrained supercomputing will be applied to future HPC systems, and manufacturing variability leads to performance variation under a power constraint. Our goal: mitigate the impact of manufacturing variability on the performance of HPC apps under a power constraint!
9 Concept: Variation-Aware Power Budgeting [Figure: power vs. performance (= CPU frequency) for three cases: without power constraint, with power constraint (conventional), and with power constraint (proposed). Labels: power variation; mitigate variability; same total power budget.]
10 Variation-Aware Power Budgeting Strategy [Flow diagram, inputs to output: HPC application -> source code analysis to insert PMMDs -> HPC application with PMMDs -> test runs on a single module (with app. input data) -> app.-specific power profile -> variation-aware power budgeting algorithm (also fed by the power-performance model for all modules, the module allocation from the scheduler, and the app.-level power constraint) -> module-level power allocations -> final app. runs. Legend: predicted / measured.]
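The slides do not spell out the budgeting algorithm itself, so the following is only a minimal sketch of the idea on slides 9 and 10: spend the same total power budget, but pick per-module caps so that every module reaches the same CPU frequency. The linear power model (`slope`, `intercept`), the frequency range, and all numbers are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class ModuleModel:
    """Hypothetical per-module model: power [W] = slope * freq [GHz] + intercept."""
    slope: float
    intercept: float


def uniform_frequencies(models, budget_watts):
    """Frequency each module reaches when every module gets the same cap."""
    cap = budget_watts / len(models)
    return [(cap - m.intercept) / m.slope for m in models]


def variation_aware_allocation(models, budget_watts, f_min, f_max):
    """Pick one common frequency whose total power fits the budget.

    Returns (frequency, per-module power caps).
    """
    total_slope = sum(m.slope for m in models)
    total_intercept = sum(m.intercept for m in models)
    freq = min(f_max, (budget_watts - total_intercept) / total_slope)
    if freq < f_min:
        raise ValueError("budget too small to run every module at f_min")
    caps = [m.slope * freq + m.intercept for m in models]
    return freq, caps


# Two made-up modules: the second one is "leakier" and needs more power
# for the same frequency.
models = [ModuleModel(30.0, 20.0), ModuleModel(36.0, 24.0)]
budget = 180.0  # watts in total, i.e. 90 W per module on average

uniform = uniform_frequencies(models, budget)
freq, caps = variation_aware_allocation(models, budget, f_min=1.2, f_max=2.7)
print("uniform cap frequencies [GHz]:", [round(f, 2) for f in uniform])
print("common frequency [GHz]:", round(freq, 2))
print("per-module caps [W]:", [round(c, 1) for c in caps])
```

In this toy setting the uniform cap leaves the two modules at different frequencies, while the variation-aware caps differ per module but sum to the same budget and yield one common frequency that is higher than the slowest module under the uniform cap.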
11 Power Model Calibration
Application-independent Power Variation Table (PVT), obtained once at system installation (module ID: normalized power): k: 1.2; N: 0.8.
Test run on a module! Measured power on single module k: 120W.
Estimated application-specific power consumption (module ID: power consumption): k: 120W.
12 Power Model Calibration
Same PVT and measurement as above. Application-dependent average power: 120W/1.2. Estimate for module N: (120W/1.2) x 0.8.
13 Power Model Calibration
Same PVT and measurement as above. Application-dependent power on module N: (120W/1.2) x 0.8 = 80W.
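The calibration arithmetic on slides 11 to 13 can be written out as a short sketch. The function name and the dictionary layout are illustrative assumptions; only the PVT values (1.2 and 0.8) and the 120W measurement come from the slides.

```python
def estimate_app_power(pvt, ref_module, measured_watts):
    """Scale one module's measured power to every module using the PVT.

    pvt maps module ID -> normalized power (application independent).
    """
    # Application-dependent average power: the measurement divided by the
    # reference module's normalized power (120W / 1.2 on the slides).
    avg_watts = measured_watts / pvt[ref_module]
    # Per-module estimate: average power times that module's PVT entry.
    return {module: avg_watts * factor for module, factor in pvt.items()}


pvt = {"k": 1.2, "N": 0.8}          # values shown on the slides
estimates = estimate_app_power(pvt, ref_module="k", measured_watts=120.0)
for module, watts in estimates.items():
    print(f"module {module}: {watts:.1f} W")
```

With the slide values this reproduces the 80W estimate for module N and returns the measured 120W for module k itself.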
14 Options for Power Setting. Two options for power settings: Power Capping (Pc) using RAPL, and Frequency Selection (Fs) using CPUFreqlibs.
                          Power Capping (Pc)   Frequency Selection (Fs)
Power constraint          Guaranteed           Not guaranteed
Performance equivalence   Not guaranteed       Guaranteed
15 Tested Power Budgeting Methods
Method   Application Specific   Variation Aware   Power-Performance Model   Pwr. Set.
Naive    No                     No                -                         Power Cap
Pc       Yes                    No                Calibration               Power Cap
VaPc     Yes                    Yes               Calibration               Power Cap
VaFs     Yes                    Yes               Calibration               Freq. Sel.
VaPcOr   Yes                    Yes               Oracle                    Power Cap
VaFsOr   Yes                    Yes               Oracle                    Freq. Sel.
Va = Variation-Aware, Pc = Power Capping, Fs = Frequency Selection, Or = observed power data are used
16 Speedup Ratios Normalized to Naïve (star DGEMM on 1,920 modules) [Bar charts of speedup ratios; tick values omitted.] [Plots, before and after: module (CPU+DRAM) power [W] vs. normalized execution time, with no power constraint and with Cs = 211.2 kW, 192.0 kW, 172.8 kW, 153.6 kW, and 134.4 kW, where Cs is the application-level power constraint; annotated 64% (before) and 12% (after).]
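As a sanity check, the Cs values on this slide appear to be the per-module constraints Cm from slide 7 (110W down to 70W) multiplied by the 1,920 modules. The snippet below is plain arithmetic under that reading and is not part of the original talk.

```python
MODULES = 1920
cs_kw = [211.2, 192.0, 172.8, 153.6, 134.4]  # application-level constraints Cs

# Average power available per module, in watts, for each Cs.
per_module_w = [round(cs * 1000 / MODULES, 1) for cs in cs_kw]
for cs, watts in zip(cs_kw, per_module_w):
    print(f"Cs = {cs} kW -> {watts} W per module on average")
```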
17 Speedup Ratios Normalized to Naïve (all results on 1,920 modules) [Bar charts of speedup ratios; tick values omitted.] 5.4X speedup at maximum (NPB-BT); 1.8X speedup on average.
18 Conclusions. Power-constrained computing becomes mainstream! Manufacturing variability causes a serious performance issue! Optimize power resource allocation!
19 Acknowledgements. This research was supported by JST CREST. Special thanks to Dr. Yuichi Inadomi, Prof. Masaaki Kondo, Dr. Tapasya Patki, Dr. Martin Schulz, and all other members of this project.