Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems
1 Ayse K. Coskun, Electrical and Computer Engineering Department, Boston University. Feb 15, 2012
2 Energy Efficiency and Temperature. Temperature-induced challenges: cooling cost, leakage, performance, reliability. Thermal challenges accelerate in high-performance systems! Energy problem: high cost (a 10MW datacenter spends millions of dollars per year on operational and cooling costs) and adverse effects on the environment.
3 Is Energy Management Sufficient? [Figure: % time spent at various temperature ranges.] Energy- or performance-aware methods are not always effective for managing temperature. Dynamic techniques specifically addressing temperature-induced problems. Efficient framework for evaluating dynamic techniques.
4 Outline. Modeling: integrated simulation of performance, power, temperature and reliability. Analysis: importance of modeling thermal variations; effect of thread migration policies. Novel policies: 2X increase in processor lifetime with a performance cost of less than 4%. Proactive management: learning workload characteristics for better runtime adaptation.
5 Modeling Framework [Sigmetrics 09]. [Block diagram with offline and runtime stages. Components: performance simulator; power modeling; phase profile (SimPoint); phase-based, instruction-level performance & power modeling (M5 / Wattch); database; performance / power query tool; scheduling manager; thermal modeling (HotSpot); reliability computation.]
6 Long-Term Performance Modeling. SimPoint [Sherwood, ASPLOS 02] captures representative phases. Complete phase profile of each application, similar to the Co-Phase Matrix for multi-threaded simulation [Biesbrouck, ISPASS 04]. Profiles cover all available voltage/frequency settings and are stored in the database.
7 Phase Modeling. [Figure: power (Watts) vs. time (ms) for bzip, M5/Wattch vs. phase-based.] Complete phase profile: every 100 M instructions. Profile is recorded in the database: phase-ID trace; power & performance values. Queried by the scheduler during simulation.
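The database lookup on this slide can be sketched as follows. This is only an illustration, assuming a hypothetical in-memory table keyed by (benchmark, phase ID, V/f setting in percent); the names `PhaseRecord`, `PHASE_DB`, `PHASE_TRACE`, and `query`, and every numeric value, are placeholders and not data from the talk.

```python
from typing import NamedTuple


class PhaseRecord(NamedTuple):
    power_w: float  # average power in this phase (placeholder value)
    ipc: float      # average instructions per cycle (placeholder value)


# Hypothetical database: (benchmark, phase_id, vf_percent) -> PhaseRecord
PHASE_DB = {
    ("bzip", 0, 100): PhaseRecord(10.0, 1.2),
    ("bzip", 0, 85): PhaseRecord(7.0, 1.1),
    ("bzip", 1, 100): PhaseRecord(6.0, 0.5),
    ("bzip", 1, 85): PhaseRecord(4.5, 0.48),
}

# Phase-ID trace: one phase ID per 100 M-instruction interval (placeholder).
PHASE_TRACE = {"bzip": [0, 0, 1, 0]}


def query(benchmark, interval, vf_percent):
    """Return power/performance for one interval at a given V/f setting."""
    phase_id = PHASE_TRACE[benchmark][interval]
    return PHASE_DB[(benchmark, phase_id, vf_percent)]


record = query("bzip", 2, 85)  # third interval, 85% V/f setting
```

A scheduler model would call `query` once per interval instead of re-running the cycle-level simulator, which is the point of recording the profile offline.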
8 Power Modeling and Management. Dynamic power: M5 [Binkert, CAECW 03] and Wattch [Brooks, ISCA 00], from ALU operations, cache accesses, branch predictions. Leakage power: leakage model [Su, ISLPED 03], from component area, temperature, voltage setting. L2 caches: CACTI [Tarjan, HP Labs], dynamic & leakage. Together these produce the POWER TRACE. Dynamic power management: fixed timeout; put a core into sleep mode after it has been idle for t_timeout.
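The fixed-timeout rule can be sketched as below. This is a minimal illustration, assuming a sampled busy/idle trace per core; the function name `dpm_fixed_timeout`, the state labels, and the example numbers are hypothetical, and the wake-up delay is deliberately ignored.

```python
def dpm_fixed_timeout(busy_trace, step_ms, timeout_ms):
    """Label each sample of a busy/idle trace as 'active', 'idle', or 'sleep'.

    A core enters sleep once it has already been idle for timeout_ms.
    """
    states = []
    idle_ms = 0
    for busy in busy_trace:
        if busy:
            idle_ms = 0
            states.append("active")
        else:
            states.append("sleep" if idle_ms >= timeout_ms else "idle")
            idle_ms += step_ms
    return states


# Hypothetical trace sampled every 50 ms with a 100 ms timeout.
states = dpm_fixed_timeout([True, False, False, False, False, True], 50, 100)
```

In this example the core stays idle for two samples, sleeps for the next two, and becomes active again when work arrives.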
9 Thread Management. The scheduling manager takes performance and/or temperature info and controls DVFS, DPM, migration, clock-gating, and job scheduling. Parameters: sampling interval 50ms; wake-up delay 25ms. Delay model (V/f change, core sleep/wake-up, migration, application startup): DVFS: syscall + 20 us; migration: syscall + cold start. syscall: measured in Linux-M5 (<3us). Cold start: average delay 204us (range: 2 to 740us), with a distinct penalty for each benchmark.
10 Thermal Modeling. [Block diagram: scheduling manager, database, POWER TRACE, thermal model HotSpot [Skadron, ISCA 03], die and package properties (65nm). Example shown for bzip.]
11 Reliability Modeling. Thermal hot spots [Failure Mechanisms for Semiconductor Devices, JEDEC]: electromigration and time-dependent dielectric breakdown, with $\lambda \propto e^{-E_a/(kT)}$, where $\lambda$ is the failure rate, $T$ the temperature, $E_a$ the activation energy, and $k$ Boltzmann's constant. A … °C increase in temperature causes a ~2X increase in failure rate. Thermal cycling [JEDEC]: fatigue failures occur at a rate $\propto (\Delta T)^{q} f$, where $\Delta T$ is the magnitude of variation and $f$ the frequency of cycles. With a 10 °C increase in $\Delta T$, failures happen 16 times more frequently.
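The two proportionalities on this slide can be turned into rate ratios, as in the sketch below. The activation energy (0.7 eV), the exponent q = 4, and the baseline ΔT of 10 °C are assumptions chosen for illustration; the slide does not state them. The function names are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann's constant in eV/K


def arrhenius_rate_ratio(t_low_c, t_high_c, ea_ev):
    """lambda(T_high) / lambda(T_low) for lambda proportional to exp(-Ea/(kT))."""
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_low_k - 1.0 / t_high_k)
    return math.exp(exponent)


def cycling_rate_ratio(dt_old_c, dt_new_c, q, f_old=1.0, f_new=1.0):
    """Fatigue failure rate ratio for a rate proportional to (delta T)^q * f."""
    return (dt_new_c / dt_old_c) ** q * (f_new / f_old)


# Assumed Ea = 0.7 eV; hot-spot temperature rises from 70 C to 80 C.
hot_spot_ratio = arrhenius_rate_ratio(70.0, 80.0, ea_ev=0.7)

# Assumed q = 4; cycle magnitude grows from 10 C to 20 C at the same frequency.
cycling_ratio = cycling_rate_ratio(10.0, 20.0, q=4)
```

Under these assumed values the hot-spot ratio comes out a little under 2, and doubling ΔT with q = 4 gives a factor of 16, which is one combination consistent with the figures quoted on the slide.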
12 Migration and Clock Gating. Stop-Go: if T > T_threshold, stop the clock. Migration: if T > T_threshold, migrate the job to the coolest core. Balance: highest-IPC (high-power) job to the coolest core. Balance_Location: highest-IPC job to the expected coolest location. Jobs are ordered IPC_1 > IPC_2 > ... > IPC_16.
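The Stop-Go and Balance rules can be sketched as follows. This is a simplified illustration, assuming one job per core and dictionaries of per-job IPC and per-core temperature; the function names and sample values are hypothetical.

```python
def stop_go(core_temp_c, t_threshold_c):
    """Return, per core, whether its clock should be stopped."""
    return {core: temp > t_threshold_c for core, temp in core_temp_c.items()}


def balance(job_ipc, core_temp_c):
    """Pair jobs with cores: the highest-IPC job goes to the coolest core."""
    jobs = sorted(job_ipc, key=job_ipc.get, reverse=True)
    cores = sorted(core_temp_c, key=core_temp_c.get)
    return dict(zip(jobs, cores))


# Hypothetical snapshot: three jobs, three cores.
temps = {0: 70.0, 1: 60.0, 2: 80.0}
assignment = balance({"a": 0.5, "b": 2.0, "c": 1.0}, temps)
gated = stop_go(temps, t_threshold_c=75.0)
```

Balance_Location would use the same pairing but rank cores by their expected temperature given floorplan location rather than by the current reading.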
13 Voltage/Frequency Scaling. DVFS-Threshold: when T exceeds T_threshold, reduce V/f one step. DVFS-Location: [figure: V/f settings of 100%, 95%, and 85% assigned by core location]. DVFS-Performance: memory-bound, low V/f; CPU-bound, high V/f. µ: CPI-based metric [Dhiman, ISLPED 07]. Low µ: 85%; medium µ: 95%; high µ: 100%. 5-6% worst-case performance cost.
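A sketch of the two rule-based DVFS policies follows. The three V/f levels come from the slide; the cut-off values separating low, medium, and high µ are assumptions, as are the function names. The sketch does not model stepping V/f back up once a core cools, since the slide does not describe it.

```python
VF_LEVELS = [1.00, 0.95, 0.85]  # V/f settings from the slide, highest first


def dvfs_threshold(level_index, temp_c, t_threshold_c):
    """Reduce V/f by one step when the temperature exceeds the threshold."""
    if temp_c > t_threshold_c:
        return min(level_index + 1, len(VF_LEVELS) - 1)
    return level_index


def dvfs_performance(mu, low_cut=0.33, high_cut=0.66):
    """Map the CPI-based metric µ to a V/f setting.

    low_cut and high_cut are hypothetical; the slide gives only the levels.
    """
    if mu < low_cut:
        return 0.85  # memory-bound: low V/f
    if mu < high_cut:
        return 0.95
    return 1.00      # CPU-bound: high V/f


next_level = VF_LEVELS[dvfs_threshold(0, temp_c=90.0, t_threshold_c=85.0)]
```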
14 Systems with Full Utilization. [Figure: MTTF, performance, and energy for the policies balance_loc & dvfs_t, dvfs_t, balance_loc & dvfs_perf_t, dvfs_perf_t, balance_loc & loc_dvfs, location_dvfs.]
15 Partial Utilization. System 87.5% utilized. [Figure: MTTF, performance, and energy for the policies balance, balance_loc, balance_loc & dvfs_t, balance_loc & dvfs_perf_t, balance_loc & loc_dvfs, dvfs_perf_t, dvfs_perf, dvfs_t, migration, location_dvfs, stopgo.]
16 Temporal Thermal Profiles. [Figures: per-core temperature (C) vs. time (s), under Migration and under Balance_Location & Location_DVFS.] Balance_Location & Location_DVFS: low & stable profile for all the cores.
17 Breakdown of Failures. Dynamic power management: the sleep state leads to accelerated thermal cycling.
18 Guidelines for Runtime Management. Modeling thermal cycling is critical, especially for partially utilized systems. Policies that minimize the number of migrations help with both performance and reliability. Thermal asymmetries should be considered for effective thermal management. Proactive techniques can raise the performance of the entire system.
19 Reactive vs. Proactive Management. [Figures: temperature (C) vs. time for reactive and proactive management; the proactive plot shows T after proactive management.] Reactive: e.g., DVFS, fetch-gating, workload migration. Proactive: forecast, then reduce and balance temperature by adjusting workload, V/f setting, etc.
20 Proactive Management Flow [Transactions on CAD 09]. Temperature data from thermal sensors feeds the predictor (ARMA), with periodic ARMA model validation & model update. The predictor outputs the temperature at time (t_current + t_n) for all cores. SCHEDULER: temperature-aware allocation on cores.
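The forecasting step of this flow can be sketched as below. The sketch assumes the common ARMA form y_t = Σ φ_i y_(t-i) + e_t + Σ θ_j e_(t-j) with coefficients that have already been fitted; the coefficient values, the sample readings, and the function name `arma_forecast` are hypothetical, and model fitting, validation, and update are not shown.

```python
def arma_forecast(history, residuals, phi, theta, steps):
    """Iterated multi-step ARMA forecast.

    history:   past temperature readings, oldest first
    residuals: past one-step prediction errors, oldest first
    phi:       autoregressive coefficients (phi[0] applies to the newest value)
    theta:     moving-average coefficients (theta[0] applies to the newest error)
    """
    y = list(history)
    e = list(residuals)
    forecast = []
    for _ in range(steps):
        ar_part = sum(p * y[-1 - i] for i, p in enumerate(phi))
        ma_part = sum(t * e[-1 - j] for j, t in enumerate(theta))
        prediction = ar_part + ma_part
        forecast.append(prediction)
        y.append(prediction)
        e.append(0.0)  # future noise is replaced by its expected value, zero
    return forecast


# Hypothetical core temperatures (C) and coefficients.
predicted = arma_forecast([60.0, 62.0], [0.0, 1.0], phi=[0.5, 0.5],
                          theta=[0.5], steps=2)
```

A scheduler would run one such forecast per core and pass the values at t_current + t_n to the allocation policy.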
21 Temperature Prediction
22 What else can we predict? [Figure: bzip.] How about parallel workloads?
23 System Model. Threads are placed in dispatching queues for Core-1, Core-2, Core-3, ... Allocation policy, Dynamic Load Balancing (DLB): a recently run thread is allocated to the core it previously ran on; otherwise, the thread is allocated to the core that has the lowest priority thread. On significant imbalance at runtime: balance.
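The DLB allocation rule can be sketched as follows. This is an illustration only, assuming that a larger number means a higher priority and that an empty queue counts as the lowest possible priority; those conventions, the data layout, and the name `dlb_allocate` are assumptions, not details from the talk.

```python
def dlb_allocate(thread_id, last_core, queues):
    """Pick a core for a thread under the DLB rule on the slide.

    last_core: thread_id -> core the thread recently ran on
    queues:    core -> list of (priority, thread_id) in its dispatching queue
    """
    # Locality: a recently run thread returns to its previous core.
    if thread_id in last_core:
        return last_core[thread_id]

    # Otherwise choose the core holding the lowest-priority thread.
    def lowest_priority(core):
        priorities = [priority for priority, _ in queues[core]]
        return min(priorities) if priorities else float("-inf")

    return min(queues, key=lowest_priority)


# Hypothetical queues for three cores.
queues = {0: [(5, "a")], 1: [(2, "b"), (9, "c")], 2: [(7, "d")]}
new_thread_core = dlb_allocate("x", {}, queues)
returning_core = dlb_allocate("a", {"a": 2}, queues)
```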
24 Proactive Temperature Balancing. Uses the principle of locality, as in the default load balancing policy, at initial assignment. Utilizes the ARMA predictor & thermal forecast: if a core is projected to have a hot spot OR ΔT_spatial is projected to be large, move waiting threads first to balance temperature; migrate threads as a last resort. [Figure: waiting and running threads on Core-1 and Core-2.]
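The decision rule on this slide can be sketched as below. The thresholds (85 C for a hot spot, 15 C for the spatial difference) are assumed example values, and the choice to move the most recently queued waiting thread from the hottest to the coolest core is a simplification; the function name and return format are hypothetical.

```python
def proactive_balance(forecast_c, waiting, t_hot_c=85.0, dt_spatial_c=15.0):
    """Decide one balancing action from forecast core temperatures.

    forecast_c: core -> predicted temperature (C)
    waiting:    core -> list of waiting thread ids on that core
    Returns None, or (action, thread_id, source_core, target_core).
    """
    hottest = max(forecast_c, key=forecast_c.get)
    coolest = min(forecast_c, key=forecast_c.get)
    hot_spot = forecast_c[hottest] > t_hot_c
    large_gradient = forecast_c[hottest] - forecast_c[coolest] > dt_spatial_c
    if not (hot_spot or large_gradient):
        return None
    if waiting[hottest]:
        # Prefer moving a waiting thread: no running thread is disturbed.
        return ("move_waiting", waiting[hottest][-1], hottest, coolest)
    # Last resort: migrate the thread currently running on the hottest core.
    return ("migrate_running", None, hottest, coolest)


action = proactive_balance({0: 88.0, 1: 70.0}, {0: ["t1", "t2"], 1: []})
```

Moving a waiting thread avoids the cold-start penalty that a running thread would pay after migration, which is why it is tried first.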
25 Experimental Setup: Workload and Power. Workload characterization: measured on Sun's UltraSPARC T1 (Niagara-1); core utilization, cache misses, # instructions, etc. Power values: average power for each unit (Niagara-1: peak power close to average power). Figure: Leon et al., ISSCC 06. Simulation framework: scheduler, power manager, thermal simulator.
26 Simulation Framework. Scheduler (a. simulator; b. OS scheduler). Inputs: workload information; floorplan, package; temperature (for dynamic policies). Power manager (DPM, DVFS). Inputs: workload information; activity of cores. Thermal simulator: HotSpot [Skadron, ISCA 03]. Inputs: power trace for each unit; floorplan, package and die properties. Output: transient temperature response for each unit.
27 Hot Spots and Performance. [Figure (a) Simulator: % hot spots > 85 C and average performance (right axis) for Load Balancing, Reactive Migration, Reactive DVFS, Proactive DVFS, and Proactive Balancing on Web-med, Web-high, Web & Database, Mplayer & Web, and AVG.]
28 Hot Spots. Proactive Balancing (PTB) reduces hot spots by 60% on average w.r.t. Reactive Migration. [Figure (b) Implementation in Solaris Scheduler: % hot spots > 85 C for DLB, R-Mig, and PTB on Web-med, Database, Web&DB, Mplayer, and AVG across all 8 benchmarks.]
29 Thermal Gradients. Proactive Balancing bounds gradients to <3%. Spatially balanced temperature improves cooling efficiency, reliability, and performance. [Figure (b) Implementation in Solaris Scheduler: % of gradients > 15 C for DLB, R-Mig, and PTB, with No PM and with DPM.]
30 Thermal Cycles. Frequency of cycles reduced to below 5% for the worst case. Benefits of reducing cycling: at chip level, higher reliability; at datacenter level, higher cooling efficiency (fan speed or liquid flow rate does not need to vary frequently). [Figure (b) Implementation in Solaris Scheduler: % of cycles > 20 C, AVG and MAX (Web-med), for DLB, R-Mig, and PTB.]
31 Performance. Proactive Balancing achieves significant reduction in performance cost in comparison to migration. [Figure (b) Implementation in Solaris Scheduler: performance of R-Mig and PTB on Web-med, Database, Web&DB, and Mplayer.] *Performance relative to Dynamic Load Balancing. Performance metric is load average.
32 Summary & On-going Research. We need joint analysis & management of power, performance, and temperature for achieving true energy efficiency. Intelligent management provides significant lifetime improvement at minimal performance cost. Proactive strategies learn system and workload dynamics and leverage this information for better decision making. On-going research: energy-aware software tuning for high performance computing (HPC) applications; power capping of multicore systems running multithreaded workloads. [TEMM 11] [HPEC 11] [ICCAD 11] [MICRO 11]
33 Performance and Energy Aware Computing Laboratory For more information: Funding
More informationMinimizing Thermal Variation in Heterogeneous HPC System with FPGA Nodes
Minimizing Thermal Variation in Heterogeneous HPC System with FPGA Nodes Yingyi Luo, Xiaoyang Wang, Seda Ogrenci-Memik, Gokhan Memik, Kazutomo Yoshii, Pete Beckman @ICCD 2018 Motivation FPGAs in data centers
More informationLoad Balancing. Minsoo Ryu. Department of Computer Science and Engineering. Hanyang University. Real-Time Computing and Communications Lab.
Load Balancing Minsoo Ryu Department of Computer Science and Engineering 2 1 Concepts of Load Balancing Page X 2 Load Balancing Algorithms Page X 3 Overhead of Load Balancing Page X 4 Load Balancing in
More informationAn Approach for Adaptive DRAM Temperature and Power Management
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1 An Approach for Adaptive DRAM Temperature and Power Management Song Liu, Yu Zhang, Seda Ogrenci Memik, and Gokhan Memik Abstract High-performance
More informationEmbedded processors. Timo Töyry Department of Computer Science and Engineering Aalto University, School of Science timo.toyry(at)aalto.
Embedded processors Timo Töyry Department of Computer Science and Engineering Aalto University, School of Science timo.toyry(at)aalto.fi Comparing processors Evaluating processors Taxonomy of processors
More informationOUTLINE Introduction Power Components Dynamic Power Optimization Conclusions
OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions 04/15/14 1 Introduction: Low Power Technology Process Hardware Architecture Software Multi VTH Low-power circuits Parallelism
More informationMediaTek CorePilot 2.0. Delivering extreme compute performance with maximum power efficiency
MediaTek CorePilot 2.0 Heterogeneous Computing Technology Delivering extreme compute performance with maximum power efficiency In July 2013, MediaTek delivered the industry s first mobile system on a chip
More informationCS 152 Computer Architecture and Engineering
CS 152 Computer Architecture and Engineering Lecture 7 Performance 2005-2-8 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/ Last Time: Tips
More informationTowards Energy-Proportional Datacenter Memory with Mobile DRAM
Towards Energy-Proportional Datacenter Memory with Mobile DRAM Krishna Malladi 1 Frank Nothaft 1 Karthika Periyathambi Benjamin Lee 2 Christos Kozyrakis 1 Mark Horowitz 1 Stanford University 1 Duke University
More informationCS425 Computer Systems Architecture
CS425 Computer Systems Architecture Fall 2017 Thread Level Parallelism (TLP) CS425 - Vassilis Papaefstathiou 1 Multiple Issue CPI = CPI IDEAL + Stalls STRUC + Stalls RAW + Stalls WAR + Stalls WAW + Stalls
More informationCOL862: Low Power Computing Maximizing Performance Under a Power Cap: A Comparison of Hardware, Software, and Hybrid Techniques
COL862: Low Power Computing Maximizing Performance Under a Power Cap: A Comparison of Hardware, Software, and Hybrid Techniques Authors: Huazhe Zhang and Henry Hoffmann, Published: ASPLOS '16 Proceedings
More informationCS3350B Computer Architecture CPU Performance and Profiling
CS3350B Computer Architecture CPU Performance and Profiling Marc Moreno Maza http://www.csd.uwo.ca/~moreno/cs3350_moreno/index.html Department of Computer Science University of Western Ontario, Canada
More informationTEMPERATURE MANAGEMENT IN DATA CENTERS: WHY SOME (MIGHT) LIKE IT HOT
TEMPERATURE MANAGEMENT IN DATA CENTERS: WHY SOME (MIGHT) LIKE IT HOT Nosayba El-Sayed, Ioan Stefanovici, George Amvrosiadis, Andy A. Hwang, Bianca Schroeder {nosayba, ioan, gamvrosi, hwang, bianca}@cs.toronto.edu
More informationPERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah
PERFORMANCE METRICS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Sept. 5 th : Homework 1 release (due on Sept.
More informationHeatWatch Yixin Luo Saugata Ghose Yu Cai Erich F. Haratsch Onur Mutlu
HeatWatch Improving 3D NAND Flash Memory Device Reliability by Exploiting Self-Recovery and Temperature Awareness Yixin Luo Saugata Ghose Yu Cai Erich F. Haratsch Onur Mutlu Storage Technology Drivers
More informationProfiling: Understand Your Application
Profiling: Understand Your Application Michal Merta michal.merta@vsb.cz 1st of March 2018 Agenda Hardware events based sampling Some fundamental bottlenecks Overview of profiling tools perf tools Intel
More information