Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems


1 Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems
Ayse K. Coskun, Electrical and Computer Engineering Department, Boston University. Feb 15, 2012

2 Energy Efficiency and Temperature
Temperature-induced challenges: cooling cost, leakage, performance, reliability. Thermal challenges accelerate in high-performance systems!
Energy problem: high cost (a 10MW datacenter spends millions of dollars per year for operational and cooling costs); adverse effects on the environment.

3 Is Energy Management Sufficient?
[Figure: % time spent at various temperature ranges]
Energy- or performance-aware methods are not always effective for managing temperature.
Dynamic techniques specifically addressing temperature-induced problems.
Efficient framework for evaluating dynamic techniques.

4 Outline
Modeling: integrated simulation of performance, power, temperature and reliability.
Analysis: importance of modeling thermal variations; effect of thread migration policies.
Novel policies: 2X increase in processor lifetime with a performance cost of less than 4%.
Proactive management: learning workload characteristics for better runtime adaptation.

5 Modeling Framework
[Diagram labels: Performance Simulator; Power Modeling; Instruction-Level; Thermal Modeling; Phase Profile (SimPoint); Phase-Based Performance & Power Modeling (M5 / Wattch); Database; Performance / Power Query Tool; Scheduling Manager; Thermal Modeling (HotSpot); Reliability Computation. The diagram is divided into Offline and Runtime stages.]
[Sigmetrics 09]

6 Long-Term Performance Modeling
SimPoint [Sherwood, ASPLOS 02]: captures representative phases.
Complete phase profile of each application: similar to Co-Phase Matrix for multi-threaded simulation [Biesbrouck, ISPASS 04]; all available voltage/frequency settings; stored in the database.

7 Phase Modeling
[Figure: power (Watts) vs. time (ms) for bzip, M5/Wattch compared with Phase-Based]
Complete phase profile: every 100 M instructions.
Profile is recorded in database: Phase-ID trace; power & performance values.
Queried by scheduler during simulation.
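
A minimal sketch of how a scheduler might query such a phase database during simulation. The table layout, the function name `query_phase`, and all power and IPC numbers are hypothetical placeholders, not values from the talk; only the 100 M instruction profiling interval comes from the slide.

```python
# Hypothetical phase database: (phase_id, V/f setting in %) -> (power in W, IPC).
# All numbers below are made-up placeholders for illustration.
PHASE_TABLE = {
    (0, 100): (14.0, 1.6),
    (0, 85): (9.5, 1.4),
    (1, 100): (8.0, 0.6),
    (1, 85): (5.5, 0.58),
}


def query_phase(phase_trace, instructions_retired, vf_percent,
                interval=100_000_000):
    """Return (power, IPC) for the phase the application is currently in.

    phase_trace holds one phase ID per profiling interval; the slide states
    that the profile is taken every 100 M instructions.
    """
    phase_id = phase_trace[instructions_retired // interval]
    return PHASE_TABLE[(phase_id, vf_percent)]
```

For example, `query_phase([0, 0, 1], 250_000_000, 85)` looks up the third interval of the trace (phase 1) at the 85% setting.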

8 Power Modeling and Management
Dynamic power: ALU operations, cache accesses, branch predictions; M5 [Binkert, CAECW 03], Wattch [Brooks, ISCA 00].
Leakage power: component area, temperature, voltage setting; leakage model [Su, ISLPED 03].
L2 caches: CACTI [Tarjan, HP Labs], dynamic & leakage.
Output: POWER TRACE.
Dynamic Power Management: fixed timeout. Put a core into sleep mode after it has been idle for t_timeout.
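
A minimal sketch of the fixed-timeout policy named on this slide, assuming the lengths of a core's idle periods are known in milliseconds. The function name and the example values are illustrative and not from the talk.

```python
def sleep_time_fixed_timeout(idle_periods_ms, timeout_ms):
    """Total time a core spends asleep under a fixed-timeout DPM policy.

    The core enters sleep mode only after it has been idle for timeout_ms,
    so each idle period contributes its length minus the timeout (or nothing
    if the period is shorter than the timeout). Wake-up delay is ignored here.
    """
    return sum(max(0.0, idle - timeout_ms) for idle in idle_periods_ms)
```

With idle periods of 10, 50, and 200 ms and a 50 ms timeout, only the last period is long enough to put the core to sleep.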

9 Thread Management
Scheduling Manager: uses performance and/or temperature info; controls DVFS, DPM, migration, clock-gating, job scheduling.
Parameters:
  Sampling interval: 50ms
  Wake-up delay: 25ms
  Application startup: syscall + cold start
Delay model (V/f change, core sleep/wake-up, migration):
  DVFS: syscall + 20 us
  Migration: syscall + cold start
syscall: measured in Linux-M5 (<3us).
Cold start: average delay 204us (range: 2 to 740us); distinct penalty for each benchmark.
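
The delay model on this slide can be sketched as two small helpers. The constants are the values quoted on the slide (the syscall cost is taken at its 3 us upper bound, which is an assumption); the function and constant names are illustrative.

```python
SYSCALL_US = 3.0           # slide: measured in Linux-M5, below 3 us (upper bound assumed)
DVFS_EXTRA_US = 20.0       # slide: DVFS costs syscall + 20 us
COLD_START_AVG_US = 204.0  # slide: average cold-start delay (range 2 to 740 us)


def dvfs_overhead_us(syscall_us=SYSCALL_US):
    """Delay charged for one V/f change: syscall + 20 us."""
    return syscall_us + DVFS_EXTRA_US


def migration_overhead_us(cold_start_us=COLD_START_AVG_US,
                          syscall_us=SYSCALL_US):
    """Delay charged for one migration: syscall + cold start.

    The slide notes a distinct cold-start penalty for each benchmark, so
    cold_start_us would normally be looked up per benchmark.
    """
    return syscall_us + cold_start_us
```

Under these assumptions a migration costs roughly an order of magnitude more than a V/f change, which is consistent with the talk's later guideline that minimizing the number of migrations helps performance.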

10 Thermal Modeling
[Diagram labels: Scheduling Manager; Database; POWER TRACE; Thermal Model: HotSpot [Skadron, ISCA 03]; Die and Package Properties (65nm). Example shown for bzip.]

11 Reliability Modeling
Thermal hot spots [Failure Mechanisms for Semiconductor Devices, JEDEC]: electromigration, time-dependent dielectric breakdown.
$\lambda \propto e^{-E_a / (kT)}$, where $\lambda$ is the failure rate, $T$ the temperature, $E_a$ the activation energy, and $k$ Boltzmann's constant.
[...] °C increase in temperature causes ~2X increase in failure rate.
Thermal cycling [JEDEC]: fatigue failures $\propto (\Delta T)^{q} f$, where $\Delta T$ is the magnitude of variation and $f$ the frequency of cycles.
10 °C increase in $\Delta T$: failures happen 16 times more frequently.
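
A small sketch of the two relative failure-rate models this slide refers to: an Arrhenius-type temperature dependence (failure rate proportional to exp(-E_a / (kT))) and a cycling term proportional to (ΔT)^q · f. The function names are illustrative. The activation energy and the exponent q are inputs because the slide does not state them, and any specific values passed in are assumptions.

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann's constant in eV/K


def arrhenius_acceleration(t1_c, t2_c, ea_ev):
    """Ratio of failure rates at t2_c vs. t1_c (both in degrees Celsius),
    assuming failure rate is proportional to exp(-Ea / (k T))."""
    t1 = t1_c + 273.15
    t2 = t2_c + 273.15
    return math.exp(ea_ev / K_BOLTZMANN_EV * (1.0 / t1 - 1.0 / t2))


def cycling_acceleration(dt1, dt2, q, f1=1.0, f2=1.0):
    """Ratio of fatigue failure rates, assuming rate is proportional to
    (delta T)^q * f for cycle magnitude delta T and cycle frequency f."""
    return (dt2 / dt1) ** q * (f2 / f1)
```

As an illustration only: with an assumed exponent q = 4, doubling the cycle magnitude from 10 to 20 °C gives `cycling_acceleration(10, 20, 4) == 16.0`. That would match the 16x figure on the slide if the baseline ΔT were 10 °C, but the slide states neither q nor the baseline.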

12 Migration and Clock Gating
Stop-Go: T > T_threshold: stop clock.
Migration: T > T_threshold: migrate job to coolest core.
Balance: highest IPC job (high power) to coolest core.
Balance_Location: highest IPC job to expected coolest location.
IPC_1 > IPC_2 > ... > IPC_16
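
A minimal sketch of the Balance idea on this slide: rank jobs by IPC, rank cores by temperature, and pair the highest-IPC job with the coolest core. The function name and the dictionary-based interface are assumptions for illustration. Balance_Location would use the same pairing with an expected-temperature ranking of core locations in place of measured temperatures.

```python
def balance_assign(job_ipc, core_temp_c):
    """Pair jobs with cores: highest-IPC job goes to the coolest core,
    second-highest to the second-coolest, and so on.

    job_ipc:      {job_name: IPC}
    core_temp_c:  {core_id: temperature in degrees Celsius}
    Returns {job_name: core_id}. Extra jobs or cores are left unassigned.
    """
    jobs = sorted(job_ipc, key=job_ipc.get, reverse=True)
    cores = sorted(core_temp_c, key=core_temp_c.get)
    return dict(zip(jobs, cores))
```

For example, `balance_assign({"a": 1.2, "b": 0.4, "c": 0.9}, {0: 70.0, 1: 55.0, 2: 62.0})` places job `a` on core 1, job `c` on core 2, and job `b` on core 0.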

13 Voltage/Frequency Scaling
DVFS-Threshold: T > T_threshold: reduce V/f one step.
DVFS-Location: [Figure: per-location V/f settings of 100%, 95%, and 85%]
DVFS-Performance: memory-bound: low V/f; CPU-bound: high V/f.
µ: CPI-based metric [Dhiman, ISLPED 07]. Low µ: 85%; medium µ: 95%; high µ: 100%.
5-6% worst-case performance cost.
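
A minimal sketch of two of the policies on this slide. The three V/f levels (85%, 95%, 100%) and the low/medium/high µ mapping come from the slide. The numeric cut-offs that separate low, medium, and high µ are not given in the talk and are hypothetical here, as are the function names. What DVFS-Threshold does once the core cools down is also not stated, so this sketch simply holds the current level.

```python
VF_LEVELS = (0.85, 0.95, 1.00)  # V/f settings shown on the slide


def dvfs_threshold_step(level_idx, temp_c, threshold_c):
    """DVFS-Threshold: reduce V/f by one step when T exceeds the threshold.

    Returns the new index into VF_LEVELS. Behaviour below the threshold is
    an assumption (hold the current level).
    """
    if temp_c > threshold_c and level_idx > 0:
        return level_idx - 1
    return level_idx


def dvfs_performance_setting(mu, low_cut=0.33, high_cut=0.66):
    """DVFS-Performance: choose V/f from the CPI-based metric mu.

    low_cut and high_cut are hypothetical cut-offs, not from the talk.
    """
    if mu < low_cut:
        return 0.85   # low mu (memory-bound): low V/f
    if mu < high_cut:
        return 0.95   # medium mu
    return 1.00       # high mu (CPU-bound): high V/f
```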

14 Systems with Full Utilization
[Chart: MTTF, performance, and energy for the policies balance_loc & dvfs_t, dvfs_t, balance_loc & dvfs_perf_t, dvfs_perf_t, balance_loc & loc_dvfs, location_dvfs]

15 Partial Utilization
System 87.5% utilized.
[Chart: MTTF, performance, and energy for the policies balance, balance_loc, balance_loc & dvfs_t, balance_loc & dvfs_perf_t, balance_loc & loc_dvfs, dvfs_perf_t, dvfs_perf, dvfs_t, migration, location_dvfs, stopgo]

16 Temporal Thermal Profiles
[Figure: temperature (C) vs. time (s) per core, two panels: Migration; Balance_Location & Location_DVFS]
Balance_Location & Location_DVFS: low & stable profile for all the cores.

17 Breakdown of Failures
Dynamic power management: sleep state; accelerated thermal cycling.

18 Guidelines for Runtime Management
Modeling thermal cycling is critical, especially for partially utilized systems.
Policies that minimize # of migrations help with both performance and reliability.
Thermal asymmetries should be considered for effective thermal management.
Proactive techniques can raise the performance of the entire system.

19 Reactive vs. Proactive Management
[Figure: temperature (C) vs. time, two panels: Reactive; Proactive]
Reactive: e.g., DVFS, fetch-gating, workload migration, ...
Proactive: forecast; reduce and balance temperature; adjust workload, V/f setting, etc. The plot marks T after proactive management.

20 Proactive Management Flow
Temperature data from thermal sensors -> Predictor (ARMA), with periodic ARMA model validation & model update -> temperature at time (t_current + t_n) for all cores -> SCHEDULER: temperature-aware allocation on cores.
[Transactions on CAD 09]
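
A minimal sketch of multi-step forecasting with an already-fitted ARMA model, as one way to obtain the temperature at (t_current + t_n) named on this slide. Coefficient fitting, validation, and model update are outside this sketch. The function name, argument layout, and the example coefficients are assumptions, not the talk's implementation.

```python
def arma_forecast(history, residuals, phi, theta, steps, const=0.0):
    """Forecast `steps` future samples with an ARMA(p, q) model.

    history:   past temperature samples, most recent last (len >= len(phi))
    residuals: past one-step prediction errors, most recent last
               (len >= len(theta))
    phi:       AR coefficients, phi[0] multiplies the most recent sample
    theta:     MA coefficients, theta[0] multiplies the most recent error
    Future errors are unknown, so they are set to their expected value 0.
    """
    y = list(history)
    e = list(residuals)
    forecast = []
    for _ in range(steps):
        ar_part = sum(p * y[-1 - i] for i, p in enumerate(phi))
        ma_part = sum(t * e[-1 - j] for j, t in enumerate(theta))
        nxt = const + ar_part + ma_part
        forecast.append(nxt)
        y.append(nxt)
        e.append(0.0)
    return forecast
```

A scheduler would call this once per core at each sampling interval and pass the last element of the returned list to its allocation decision.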

21 Temperature Prediction

22 What else can we predict?
[Figure: bzip]
How about parallel workloads?

23 System Model
Dispatching queues: threads are queued per core (Core-1, Core-2, Core-3, ...).
Allocation policy, Dynamic Load Balancing (DLB):
  Recently run thread: allocate to the core it ran previously on.
  Otherwise: allocate to the core that has the lowest priority thread.
Significant imbalance at runtime: balance.
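
A minimal sketch of the DLB allocation rule described on this slide. The data structures and the function name are hypothetical. The sketch also assumes that a smaller number means a lower priority, which the slide does not specify.

```python
def dlb_pick_core(thread_id, last_core, lowest_priority_on_core):
    """Choose a dispatch queue for a thread under the DLB rule.

    last_core:               {thread_id: core_id} for recently run threads
    lowest_priority_on_core: {core_id: lowest thread priority in its queue}
                             (assumed: smaller number = lower priority)
    """
    if thread_id in last_core:
        # Recently run thread: return it to the core it ran on previously.
        return last_core[thread_id]
    # Otherwise: pick the core whose queue holds the lowest-priority thread.
    return min(lowest_priority_on_core, key=lowest_priority_on_core.get)
```

Because neither branch looks at temperature, a rule of this shape can concentrate hot threads on a few cores, which is the thermal imbalance the proactive policy targets.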

24 Proactive Temperature Balancing
Uses principle of locality, as in the default load balancing policy, at initial assignment.
Utilizes ARMA predictor & thermal forecast: acts when a core is projected to have a hot spot OR ΔT_spatial is projected to be large.
Move waiting threads first to balance temperature.
Migrate threads as a last resort.
[Diagram: waiting and running threads on Core-1 and Core-2]
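
A minimal sketch of the decision order described on this slide: act only when the forecast shows a hot spot or a large spatial gradient, move waiting threads first, and migrate running threads only as a last resort. The function name and return format are hypothetical. The default thresholds of 85 °C and 15 °C are borrowed from the hot-spot and gradient metrics used in the talk's result charts and are an assumption as trigger values.

```python
def ptb_action(forecast_c, waiting, hot_c=85.0, max_gradient_c=15.0):
    """Decide a proactive balancing action from per-core forecasts.

    forecast_c: {core_id: forecast temperature in degrees Celsius}
    waiting:    {core_id: number of threads waiting in that core's queue}
    Returns (action, source_core, destination_core).
    """
    hottest = max(forecast_c, key=forecast_c.get)
    coolest = min(forecast_c, key=forecast_c.get)
    gradient = forecast_c[hottest] - forecast_c[coolest]

    # No projected hot spot and no large spatial gradient: leave as is.
    if forecast_c[hottest] <= hot_c and gradient <= max_gradient_c:
        return ("none", None, None)
    # Prefer moving a waiting thread, which avoids a migration penalty.
    if waiting.get(hottest, 0) > 0:
        return ("move_waiting", hottest, coolest)
    # Last resort: migrate the running thread off the hot core.
    return ("migrate_running", hottest, coolest)
```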

25 Experimental Setup: Workload and Power
Workload characterization: measured on Sun's UltraSPARC T1 (Niagara-1).
Power values: average power for each unit (Niagara-1: peak power close to average power).
Core utilization, cache misses, # instructions, etc.
Figure: Leon et al., ISSCC 06.
Simulation framework: scheduler, power manager, thermal simulator.

26 Simulation Framework
Scheduler (a. simulator; b. OS scheduler). Inputs: workload information; floorplan, package; temperature (for dynamic policies).
Power Manager (DPM, DVFS). Inputs: workload information; activity of cores.
Thermal Simulator: HotSpot [Skadron, ISCA 03]. Inputs: power trace for each unit; floorplan, package and die properties.
Output: transient temperature response for each unit.

27 Hot Spots and Performance
[Chart: % hot spots > 85 C and average performance (right axis) for Load Balancing, Reactive Migration, Reactive DVFS, Proactive DVFS, Proactive Balancing; workloads Web-med, Web-high, Web & Database, Mplayer & Web, AVG]
(a) Simulator

28 Hot Spots
Proactive Balancing (PTB) reduces hot spots by 60% on average w.r.t. Reactive Migration.
[Chart: % hot spots > 85 C for DLB, R-Mig, PTB; workloads Web-med, Database, Web&DB, Mplayer, and AVG across all 8 benchmarks]
(b) Implementation in Solaris Scheduler

29 Thermal Gradients
Proactive Balancing bounds gradients to <3%.
[Chart: % of gradients > 15 C for DLB, R-Mig, PTB, with No PM and with DPM]
Spatially balanced temperature improves: cooling efficiency, reliability, performance.
(b) Implementation in Solaris Scheduler

30 Thermal Cycles
Frequency of cycles reduced to below 5% for the worst case.
[Chart: % of cycles > 20 C, AVG and MAX (Web-med), for DLB, R-Mig, PTB]
Benefits of reducing cycling:
  Chip level: higher reliability.
  Datacenter level: higher cooling efficiency. Fan speed or liquid flow rate does not need to vary frequently.
(b) Implementation in Solaris Scheduler

31 Performance
Proactive Balancing achieves significant reduction in performance cost in comparison to migration.
[Chart: performance of R-Mig and PTB for Web-med, Database, Web&DB, Mplayer]
*Performance relative to Dynamic Load Balancing. Performance metric is load average.
(b) Implementation in Solaris Scheduler

32 Summary & On-going Research
We need joint analysis & management of power, performance, and temperature for achieving true energy efficiency.
Intelligent management provides significant lifetime improvement at minimal performance cost.
Proactive strategies learn system and workload dynamics and leverage this information for better decision making.
On-going research: energy-aware software tuning for high performance computing (HPC) applications; power capping of multicore systems running multithreaded workloads.
[TEMM 11] [HPEC 11] [ICCAD 11] [MICRO 11]

33 Performance and Energy Aware Computing Laboratory For more information: Funding

Exploring Performance, Power, and Temperature Characteristics of 3D Systems with On-Chip DRAM

Exploring Performance, Power, and Temperature Characteristics of 3D Systems with On-Chip DRAM Exploring Performance, Power, and Temperature Characteristics of 3D Systems with On-Chip DRAM Jie Meng, Daniel Rossell, and Ayse K. Coskun Electrical and Computer Engineering Department, Boston University,

More information

Dynamic Cache Pooling for Improving Energy Efficiency in 3D Stacked Multicore Processors

Dynamic Cache Pooling for Improving Energy Efficiency in 3D Stacked Multicore Processors Dynamic Cache Pooling for Improving Energy Efficiency in 3D Stacked Multicore Processors Jie Meng, Tiansheng Zhang, and Ayse K. Coskun Electrical and Computer Engineering Department, Boston University,

More information

Predictive Thermal Management for Hard Real-Time Tasks

Predictive Thermal Management for Hard Real-Time Tasks Predictive Thermal Management for Hard Real-Time Tasks Albert Mo Kim Cheng and Chen Feng Real-Time System Laboratory, Department of Computer Science University of Houston, Houston, TX 77204, USA {cheng,

More information

Energy efficient mapping of virtual machines

Energy efficient mapping of virtual machines GreenDays@Lille Energy efficient mapping of virtual machines Violaine Villebonnet Thursday 28th November 2013 Supervisor : Georges DA COSTA 2 Current approaches for energy savings in cloud Several actions

More information

Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors

Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Resource-Conscious Scheduling for Energy Efficiency on Andreas Merkel, Jan Stoess, Frank Bellosa System Architecture Group KIT The cooperation of Forschungszentrum Karlsruhe GmbH and Universität Karlsruhe

More information

Thermal Modeling and Active Cooling

Thermal Modeling and Active Cooling Thermal Modeling and Active Cooling for 3D MPSoCs Prof. David Atienza, Embedded Systems Laboratory (ESL), EE Institute, Faculty of Engineering MPSoC 09, 2-7 August 2009 (Savannah, Georgia, USA) Thermal-Reliability

More information

Performance of Multithreaded Chip Multiprocessors and Implications for Operating System Design

Performance of Multithreaded Chip Multiprocessors and Implications for Operating System Design Performance of Multithreaded Chip Multiprocessors and Implications for Operating System Design Based on papers by: A.Fedorova, M.Seltzer, C.Small, and D.Nussbaum Pisa November 6, 2006 Multithreaded Chip

More information

Integrated CPU and Cache Power Management in Multiple Clock Domain Processors

Integrated CPU and Cache Power Management in Multiple Clock Domain Processors Integrated CPU and Cache Power Management in Multiple Clock Domain Processors Nevine AbouGhazaleh, Bruce Childers, Daniel Mossé & Rami Melhem Department of Computer Science University of Pittsburgh HiPEAC

More information

Leakage Mitigation Techniques in Smartphone SoCs

Leakage Mitigation Techniques in Smartphone SoCs Leakage Mitigation Techniques in Smartphone SoCs 1 John Redmond 1 Broadcom International Symposium on Low Power Electronics and Design Smartphone Use Cases Power Device Convergence Diverse Use Cases Camera

More information

POWER MANAGEMENT AND ENERGY EFFICIENCY

POWER MANAGEMENT AND ENERGY EFFICIENCY POWER MANAGEMENT AND ENERGY EFFICIENCY * Adopted Power Management for Embedded Systems, Minsoo Ryu 2017 Operating Systems Design Euiseong Seo (euiseong@skku.edu) Need for Power Management Power consumption

More information

A Simple Model for Estimating Power Consumption of a Multicore Server System

A Simple Model for Estimating Power Consumption of a Multicore Server System , pp.153-160 http://dx.doi.org/10.14257/ijmue.2014.9.2.15 A Simple Model for Estimating Power Consumption of a Multicore Server System Minjoong Kim, Yoondeok Ju, Jinseok Chae and Moonju Park School of

More information

T. N. Vijaykumar School of Electrical and Computer Engineering Purdue University, W Lafayette, IN

T. N. Vijaykumar School of Electrical and Computer Engineering Purdue University, W Lafayette, IN Resource Area Dilation to Reduce Power Density in Throughput Servers Michael D. Powell 1 Fault Aware Computing Technology Group Intel Massachusetts, Inc. T. N. Vijaykumar School of Electrical and Computer

More information

A task migration algorithm for power management on heterogeneous multicore Manman Peng1, a, Wen Luo1, b

A task migration algorithm for power management on heterogeneous multicore Manman Peng1, a, Wen Luo1, b 5th International Conference on Advanced Materials and Computer Science (ICAMCS 2016) A task migration algorithm for power management on heterogeneous multicore Manman Peng1, a, Wen Luo1, b 1 School of

More information

Transistors and Wires

Transistors and Wires Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis Part II These slides are based on the slides provided by the publisher. The slides

More information

Temperature Aware Thread Block Scheduling in GPGPUs

Temperature Aware Thread Block Scheduling in GPGPUs Temperature Aware Thread Block Scheduling in GPGPUs Rajib Nath University of California, San Diego rknath@ucsd.edu Raid Ayoub Strategic CAD Labs, Intel Corporation raid.ayoub@intel.com Tajana Simunic Rosing

More information

A2E: Adaptively Aggressive Energy Efficient DVFS Scheduling for Data Intensive Applications

A2E: Adaptively Aggressive Energy Efficient DVFS Scheduling for Data Intensive Applications A2E: Adaptively Aggressive Energy Efficient DVFS Scheduling for Data Intensive Applications Li Tan 1, Zizhong Chen 1, Ziliang Zong 2, Rong Ge 3, and Dong Li 4 1 University of California, Riverside 2 Texas

More information

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference The 2017 IEEE International Symposium on Workload Characterization Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference Shin-Ying Lee

More information

Phase-Based Application-Driven Power Management on the Single-chip Cloud Computer

Phase-Based Application-Driven Power Management on the Single-chip Cloud Computer Phase-Based Application-Driven Power Management on the Single-chip Cloud Computer Nikolas Ioannou, Michael Kauschke, Matthias Gries, and Marcelo Cintra University of Edinburgh Intel Labs Braunschweig Introduction

More information

Reducing the Energy Cost of Computing through Efficient Co-Scheduling of Parallel Workloads

Reducing the Energy Cost of Computing through Efficient Co-Scheduling of Parallel Workloads Reducing the Energy Cost of Computing through Efficient Co-Scheduling of Parallel Workloads Can Hankendi Ayse K. Coskun Electrical and Computer Engineering Department, Boston University, Boston, MA, 2215

More information

Coordinating Liquid and Free Air Cooling with Workload Allocation for Data Center Power Minimization

Coordinating Liquid and Free Air Cooling with Workload Allocation for Data Center Power Minimization Coordinating Liquid and Free Air Cooling with Workload Allocation for Data Center Power Minimization Li Li, Wenli Zheng, Xiaodong Wang, and Xiaorui Wang Dept. of Electrical and Computer Engineering The

More information

Temperature-Sensitive Loop Parallelization for Chip Multiprocessors

Temperature-Sensitive Loop Parallelization for Chip Multiprocessors Temperature-Sensitive Loop Parallelization for Chip Multiprocessors Sri Hari Krishna Narayanan, Guilin Chen, Mahmut Kandemir, Yuan Xie Department of CSE, The Pennsylvania State University {snarayan, guilchen,

More information

A Comparison of Capacity Management Schemes for Shared CMP Caches

A Comparison of Capacity Management Schemes for Shared CMP Caches A Comparison of Capacity Management Schemes for Shared CMP Caches Carole-Jean Wu and Margaret Martonosi Princeton University 7 th Annual WDDD 6/22/28 Motivation P P1 P1 Pn L1 L1 L1 L1 Last Level On-Chip

More information

Power-Aware Scheduling of Virtual Machines in DVFS-enabled Clusters

Power-Aware Scheduling of Virtual Machines in DVFS-enabled Clusters Power-Aware Scheduling of Virtual Machines in DVFS-enabled Clusters Gregor von Laszewski, Lizhe Wang, Andrew J. Younge, Xi He Service Oriented Cyberinfrastructure Lab Rochester Institute of Technology,

More information

CS152 Computer Architecture and Engineering. Lecture 9 Performance Dave Patterson. John Lazzaro. www-inst.eecs.berkeley.

CS152 Computer Architecture and Engineering. Lecture 9 Performance Dave Patterson. John Lazzaro. www-inst.eecs.berkeley. CS152 Computer Architecture and Engineering Lecture 9 Performance 2004-09-28 Dave Patterson (www.cs.berkeley.edu/~patterson) John Lazzaro (www.cs.berkeley.edu/~lazzaro) www-inst.eecs.berkeley.edu/~cs152/

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 10 Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 Chapter 6: CPU Scheduling Basic Concepts

More information

Improving Virtual Machine Scheduling in NUMA Multicore Systems

Improving Virtual Machine Scheduling in NUMA Multicore Systems Improving Virtual Machine Scheduling in NUMA Multicore Systems Jia Rao, Xiaobo Zhou University of Colorado, Colorado Springs Kun Wang, Cheng-Zhong Xu Wayne State University http://cs.uccs.edu/~jrao/ Multicore

More information

Survey of Energy-Cognizant Scheduling Techniques

Survey of Energy-Cognizant Scheduling Techniques 1 Survey of Energy-Cognizant Scheduling Techniques Sergey Zhuravlev, Juan Carlos Saez, Sergey Blagodurov, Alexandra Fedorova and Manuel Prieto Abstract Execution time is no longer the only metric by which

More information

How much energy can you save with a multicore computer for web applications?

How much energy can you save with a multicore computer for web applications? How much energy can you save with a multicore computer for web applications? Peter Strazdins Computer Systems Group, Department of Computer Science, The Australian National University seminar at Green

More information

Announcements. Program #1. Program #0. Reading. Is due at 9:00 AM on Thursday. Re-grade requests are due by Monday at 11:59:59 PM.

Announcements. Program #1. Program #0. Reading. Is due at 9:00 AM on Thursday. Re-grade requests are due by Monday at 11:59:59 PM. Program #1 Announcements Is due at 9:00 AM on Thursday Program #0 Re-grade requests are due by Monday at 11:59:59 PM Reading Chapter 6 1 CPU Scheduling Manage CPU to achieve several objectives: maximize

More information

Virtual Melting Temperature: Managing Server Load to Minimize Cooling Overhead with Phase Change Materials

Virtual Melting Temperature: Managing Server Load to Minimize Cooling Overhead with Phase Change Materials Virtual Melting Temperature: Managing Server Load to Minimize Cooling Overhead with Phase Change Materials Matt Skach1, Manish Arora2,3, Dean Tullsen3, Lingjia Tang1, Jason Mars1 University of Michigan1

More information

Energy consumption in embedded systems; abstractions for software models, programming languages and verification methods

Energy consumption in embedded systems; abstractions for software models, programming languages and verification methods Energy consumption in embedded systems; abstractions for software models, programming languages and verification methods Florence Maraninchi orcid.org/0000-0003-0783-9178 thanks to M. Moy, L. Mounier,

More information

Chapter 6: CPU Scheduling. Operating System Concepts 9 th Edition

Chapter 6: CPU Scheduling. Operating System Concepts 9 th Edition Chapter 6: CPU Scheduling Silberschatz, Galvin and Gagne 2013 Chapter 6: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Thread Scheduling Multiple-Processor Scheduling Real-Time

More information

Network Swapping. Outline Motivations HW and SW support for swapping under Linux OS

Network Swapping. Outline Motivations HW and SW support for swapping under Linux OS Network Swapping Emanuele Lattanzi, Andrea Acquaviva and Alessandro Bogliolo STI University of Urbino, ITALY Outline Motivations HW and SW support for swapping under Linux OS Local devices (CF, µhd) Network

More information

A Cool Scheduler for Multi-Core Systems Exploiting Program Phases

A Cool Scheduler for Multi-Core Systems Exploiting Program Phases IEEE TRANSACTIONS ON COMPUTERS, VOL. 63, NO. 5, MAY 2014 1061 A Cool Scheduler for Multi-Core Systems Exploiting Program Phases Zhiming Zhang and J. Morris Chang, Senior Member, IEEE Abstract Rapid growth

More information

Performance Measurement (as seen by the customer)

Performance Measurement (as seen by the customer) CS5 Computer Architecture and Engineering Last Time: Microcode, Multi-Cycle Lecture 9 Performance 004-09-8 Inputs sequencer control datapath control microinstruction (µ) µ-code ROM Dave Patterson (www.cs.berkeley.edu/~patterson)

More information

CPU Scheduling: Objectives

CPU Scheduling: Objectives CPU Scheduling: Objectives CPU scheduling, the basis for multiprogrammed operating systems CPU-scheduling algorithms Evaluation criteria for selecting a CPU-scheduling algorithm for a particular system

More information

Hot vs Cold Energy Efficient Data Centers. - SVLG Data Center Center Efficiency Summit

Hot vs Cold Energy Efficient Data Centers. - SVLG Data Center Center Efficiency Summit Hot vs Cold Energy Efficient Data Centers - SVLG Data Center Center Efficiency Summit KC Mares November 2014 The Great Debate about Hardware Inlet Temperature Feb 2003: RMI report on high-performance data

More information

Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University

Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University Moore s Law Moore, Cramming more components onto integrated circuits, Electronics, 1965. 2 3 Multi-Core Idea:

More information

Accurate and Stable Empirical CPU Power Modelling for Multi- and Many-Core Systems

Accurate and Stable Empirical CPU Power Modelling for Multi- and Many-Core Systems Accurate and Stable Empirical CPU Power Modelling for Multi- and Many-Core Systems Matthew J. Walker*, Stephan Diestelhorst, Geoff V. Merrett* and Bashir M. Al-Hashimi* *University of Southampton Arm Ltd.

More information

Multithreaded Processors. Department of Electrical Engineering Stanford University

Multithreaded Processors. Department of Electrical Engineering Stanford University Lecture 12: Multithreaded Processors Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 12-1 The Big Picture Previous lectures: Core design for single-thread

More information

3D MPSoCs with Active Cooling

3D MPSoCs with Active Cooling System-Level Thermal Management of 3D MPSoCs with Active Cooling Prof. David Atienza, Embedded Systems Laboratory (ESL), Ecole Polytechnique Fédérale de Lausanne (EPFL) MPSoC 11, July 4 th 8 th 2011 (Beaune,

More information

Reconfigurable Multicore Server Processors for Low Power Operation

Reconfigurable Multicore Server Processors for Low Power Operation Reconfigurable Multicore Server Processors for Low Power Operation Ronald G. Dreslinski, David Fick, David Blaauw, Dennis Sylvester, Trevor Mudge University of Michigan, Advanced Computer Architecture

More information

Advanced Computer Architecture (CS620)

Advanced Computer Architecture (CS620) Advanced Computer Architecture (CS620) Background: Good understanding of computer organization (eg.cs220), basic computer architecture (eg.cs221) and knowledge of probability, statistics and modeling (eg.cs433).

More information

Performance and Power Analysis of RCCE Message Passing on the Intel Single-Chip Cloud Computer

Performance and Power Analysis of RCCE Message Passing on the Intel Single-Chip Cloud Computer Performance and Power Analysis of RCCE Message Passing on the Intel Single-Chip Cloud Computer John-Nicholas Furst Ayse K. Coskun Electrical and Computer Engineering Department, Boston University, Boston,

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

Power Modeling and Thermal Management Techniques for Manycores

Power Modeling and Thermal Management Techniques for Manycores Power Modeling and Thermal Management Techniques for Manycores Rajib Nath Computer Science and Engineering University of California, San Diego Douglas Carmean Extreme Technology Lab Intel Lab, Oregon Tajana

More information

MANAGING LIFETIME RELIABILITY, PERFORMANCE, AND POWER TRADEOFFS IN MULTICORE MICROARCHITECTURES

MANAGING LIFETIME RELIABILITY, PERFORMANCE, AND POWER TRADEOFFS IN MULTICORE MICROARCHITECTURES MANAGING LIFETIME RELIABILITY, PERFORMANCE, AND POWER TRADEOFFS IN MULTICORE MICROARCHITECTURES A Dissertation Presented to The Academic Faculty By William J. Song In Partial Fulfillment Of the Requirements

More information

Temperature measurement in the Intel CoreTM Duo Processor

Temperature measurement in the Intel CoreTM Duo Processor Temperature measurement in the Intel CoreTM Duo Processor E. Rotem, J. Hermerding, A. Cohen, H. Cain To cite this version: E. Rotem, J. Hermerding, A. Cohen, H. Cain. Temperature measurement in the Intel

More information

Initial Results on the Performance Implications of Thread Migration on a Chip Multi-Core

Initial Results on the Performance Implications of Thread Migration on a Chip Multi-Core 3 rd HiPEAC Workshop IBM, Haifa 17-4-2007 Initial Results on the Performance Implications of Thread Migration on a Chip Multi-Core, P. Michaud, L. He, D. Fetis, C. Ioannou, P. Charalambous and A. Seznec

More information

Achieving Fair or Differentiated Cache Sharing in Power-Constrained Chip Multiprocessors

Achieving Fair or Differentiated Cache Sharing in Power-Constrained Chip Multiprocessors Achieving Fair or Differentiated Cache Sharing in Power-Constrained Chip Multiprocessors Xiaorui Wang, Kai Ma, and Yefu Wang Department of Electrical Engineering and Computer Science University of Tennessee,

More information

PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor

PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor Taeho Kgil, Shaun D Souza, Ali Saidi, Nathan Binkert, Ronald Dreslinski, Steve Reinhardt, Krisztian Flautner,

More information

Security-Aware Processor Architecture Design. CS 6501 Fall 2018 Ashish Venkat

Security-Aware Processor Architecture Design. CS 6501 Fall 2018 Ashish Venkat Security-Aware Processor Architecture Design CS 6501 Fall 2018 Ashish Venkat Agenda Common Processor Performance Metrics Identifying and Analyzing Bottlenecks Benchmarking and Workload Selection Performance

More information

Dynamic Cache Pooling in 3D Multicore Processors

Dynamic Cache Pooling in 3D Multicore Processors Dynamic Cache Pooling in 3D Multicore Processors TIANSHENG ZHANG, JIE MENG, and AYSE K. COSKUN, BostonUniversity Resource pooling, where multiple architectural components are shared among cores, is a promising

More information

Efficient Program Power Behavior Characterization

Efficient Program Power Behavior Characterization Efficient Program Power Behavior Characterization Chunling Hu Daniel A. Jiménez Ulrich Kremer Department of Computer Science {chunling, djimenez, uli}@cs.rutgers.edu Rutgers University, Piscataway, NJ

More information

Power Control in Virtualized Data Centers

Power Control in Virtualized Data Centers Power Control in Virtualized Data Centers Jie Liu Microsoft Research liuj@microsoft.com Joint work with Aman Kansal and Suman Nath (MSR) Interns: Arka Bhattacharya, Harold Lim, Sriram Govindan, Alan Raytman

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 5

ECE 571 Advanced Microprocessor-Based Design Lecture 5 ECE 571 Advanced Microprocessor-Based Design Lecture 5 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 6 February 2018 Announcements HW#1 graded HW#2 due Thursday 1 HW#1 Review

More information

Evaluating the Performance Impact of Hardware Thread Priorities in Simultaneous Multithreaded Processors using SPEC CPU2000
