Power-Aware Compile Technology. Xiaoming Li
1 Power-Aware Compile Technology Xiaoming Li
2 Frying Eggs
3 Future CPU? [Figure: power density in Watts/cm² for the i386, i486, Pentium, Pentium Pro, Pentium II, and Pentium III processors, compared against a hot plate, a nuclear reactor, a rocket nozzle, and the Sun's surface.] (Source: Fred Pollack, Intel, Micro 1999 keynote)
4 Maximizing power efficiency is within reach. Hardware support: Enhanced SpeedStep provides low-overhead frequency/voltage scaling at 10 µs per transition. Opportunities: CPU frequency and front-side bus frequency are decoupled; programs have memory-bound segments and CPU-bound segments.
5 Is any energy wasted when executing programs? Current status: programs run at a single frequency from start to end, neglecting the segmental behavior of execution; the CPU idles in memory-bound segments. Our plan: find out how a program executes; remove the fat in energy consumption.
6 Searching for the most power-efficient FFT. Select the proper frequency/voltage for different regions in the FFT code. Challenges include: where to switch? Which frequency? Schedule the code to reveal more opportunities for frequency scaling. Challenges include: how to schedule for power?
7 What does previous research do? Hardware/OS [Berkeley, MIT, UMD, UMich, UVa, ...]. Interactive applications: predict the memory access pattern; fixed window size -> wrong prediction. Batch applications: predict the execution time of every task; distribute unused time to the remaining tasks. Low granularity; of no use for DSP programs.
8 Previous Compiler-based DVS Algorithms: "The Design, Implementation, and Evaluation of a Compiler Algorithm for CPU Energy Reduction", Chung-Hsing Hsu and Ulrich Kremer, PLDI '03. Select basic blocks from the program structure; entry/exit is unique. Candidate regions: loop, call site, if statement, sequence of regions, entire procedure. [Figure: control-flow graph from ENTRY to EXIT with regions C1, C2, L3, L4, and C5.]
9 Chung-Hsing Hsu and Ulrich Kremer's Approach (Cont.): Measure the execution time and the power consumption of every region at every possible frequency. Change the frequency of only one region. Exhaustive search for the best region and the optimal frequency.
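The single-region exhaustive search described on this slide can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the flat table layout, the names `dvs_choice` and `search_single_region`, the convention that the last frequency index is the fastest setting, and the `max_slowdown` bound used as the selection criterion are all assumptions made for the example.

```c
#include <assert.h>

/* Result of the search: which single region to slow down, and to which setting. */
struct dvs_choice {
    int region;     /* -1 means "leave every region at the top frequency" */
    int freq;       /* index into the frequency table */
    double energy;  /* whole-program energy for this choice (joules) */
    double time;    /* whole-program time for this choice (seconds) */
};

/*
 * time_s[r * num_freqs + f] and energy_j[r * num_freqs + f] hold the measured
 * time and energy of region r at frequency setting f. Assumption: index
 * num_freqs - 1 is the fastest setting. Exactly one region is moved away from
 * the fastest setting; all others stay there. max_slowdown is an assumed
 * bound, e.g. 0.05 tolerates a 5% longer run.
 */
struct dvs_choice search_single_region(const double *time_s,
                                       const double *energy_j,
                                       int num_regions, int num_freqs,
                                       double max_slowdown)
{
    assert(num_regions > 0 && num_freqs > 0 && max_slowdown >= 0.0);
    const int top = num_freqs - 1;
    double base_time = 0.0, base_energy = 0.0;

    /* Baseline: every region runs at the top frequency. */
    for (int r = 0; r < num_regions; r++) {
        base_time += time_s[r * num_freqs + top];
        base_energy += energy_j[r * num_freqs + top];
    }

    struct dvs_choice best = { -1, top, base_energy, base_time };

    /* Try every (region, lower frequency) pair and keep the lowest energy
     * that still meets the time bound. */
    for (int r = 0; r < num_regions; r++) {
        for (int f = 0; f < top; f++) {
            double t = base_time - time_s[r * num_freqs + top]
                                 + time_s[r * num_freqs + f];
            double e = base_energy - energy_j[r * num_freqs + top]
                                   + energy_j[r * num_freqs + f];
            if (t <= base_time * (1.0 + max_slowdown) && e < best.energy) {
                best.region = r;
                best.freq = f;
                best.energy = e;
                best.time = t;
            }
        }
    }
    return best;
}
```

With made-up measurements in which a memory-bound region barely slows down at the low setting while a CPU-bound region doubles its time, the search picks the memory-bound region; with a zero slowdown budget it leaves everything at the top frequency.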
10 Run programs on the simulator: "Compile-time Dynamic Voltage Scaling Settings: Opportunities and Limits", Fen Xie, Margaret Martonosi, and Sharad Malik, PLDI '03. Divide the execution into memory accesses and CPU operations. Assume the processor has a continuous frequency spectrum. Model power consumption in memory-access phases and CPU-active phases. Use existing optimization software to find the best single region for scheduling.
11 Re-examine our goal: what do we really want to optimize? Power ~ O(v): trivial solution if the goal is just to reduce power. Energy ~ O(v²): minimal energy consumption at the lowest frequency. Energy × Delay: SPEC / Joules. Energy × Delay²: test if we really make an improvement.
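To make the difference between these metrics concrete, here is a small sketch that computes the energy-delay product and the energy-delay-squared product for measured runs and picks the best run under either one. The struct and function names are hypothetical, and the numbers in the usage note are invented for illustration.

```c
#include <assert.h>

/* One measured run of a program version. */
struct run {
    double energy_j;  /* energy in joules */
    double delay_s;   /* execution time in seconds */
};

/* Energy-delay product: E * D. */
double edp(struct run r)
{
    return r.energy_j * r.delay_s;
}

/* Energy-delay-squared product: E * D^2, which penalizes slowdowns harder. */
double ed2p(struct run r)
{
    return r.energy_j * r.delay_s * r.delay_s;
}

/* Returns the index of the run with the smallest value of the given metric. */
int best_by(const struct run *runs, int n, double (*metric)(struct run))
{
    assert(n > 0);
    int best = 0;
    for (int i = 1; i < n; i++) {
        if (metric(runs[i]) < metric(runs[best]))
            best = i;
    }
    return best;
}
```

For example, given a fast run (10 J, 1.0 s) and a slow run (6 J, 1.5 s), the slow run wins on energy-delay (9 versus 10), while the fast run wins on energy-delay-squared (10 versus 13.5). The choice of metric decides which version counts as an improvement.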
12 Energy vs. Delay Landscape: How to affect tradeoffs? How to compare tradeoffs? [Figure: points A, B, and C plotted on energy and delay axes.]
13 Optimization Space [Figure: energy vs. delay plot with the Pareto-optimal points marked.]
14 Projection of Compile Optimizations [Figure: energy vs. runtime plot showing the effect of parallelizing, scaling, and energy-aware compilation on F; F =(F, s, p, -O).]
15 Our Goal: a new, higher-quality Pareto front for any metric. [Figure: energy vs. runtime plot.]
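As a rough illustration of what "Pareto optimal" means in the energy-delay plane, the sketch below marks the measured points that no other point beats in both energy and delay. It is a straightforward quadratic-time check; the type and function names are hypothetical and the slides do not say how the front was actually computed.

```c
#include <assert.h>
#include <stdbool.h>

/* One program version, measured in the energy-delay plane. */
struct point {
    double energy_j;
    double delay_s;
};

/* a dominates b if a is no worse in both dimensions and strictly better in one. */
static bool dominates(struct point a, struct point b)
{
    return a.energy_j <= b.energy_j && a.delay_s <= b.delay_s &&
           (a.energy_j < b.energy_j || a.delay_s < b.delay_s);
}

/*
 * Sets on_front[i] to true when no other point dominates pts[i], and
 * returns how many points are on the front. Runs in O(n^2) time.
 */
int pareto_front(const struct point *pts, int n, bool *on_front)
{
    assert(n >= 0);
    int count = 0;
    for (int i = 0; i < n; i++) {
        on_front[i] = true;
        for (int j = 0; j < n; j++) {
            if (j != i && dominates(pts[j], pts[i])) {
                on_front[i] = false;
                break;
            }
        }
        if (on_front[i])
            count++;
    }
    return count;
}
```

A point such as (8 J, 1.6 s) is dropped when another version achieves (6 J, 1.5 s), since the latter is better on both axes; points that trade energy for delay against each other all stay on the front.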
16 Simulator vs. Real Machine? Simulator (Watt, SimPower): the power model should be verified; not the best environment for compiler research. Real machine: how to identify phases in the program? How to measure power consumption? How to search the energy-delay front line?
17 Identify Program Phases: Use hardware counters (low overhead, limited number). Find the correct events. Memory access: L2_Cache_Load_In, L2_Prefetch_Load_In. Instruction number: Instruction_Retired. Execution time: Cycle.
18 Insert Reading Points: Control the overhead of reading. Read evenly during the execution. Use a simplified model of memory accesses and working cycles. Understand how the compiler translates instructions: constant loading, array access.
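One way the counter events named on this slide could be turned into a phase label is sketched below. It assumes cumulative counter values are sampled at each reading point; how the counters are actually read is platform specific and not shown. The struct layout, function names, and the threshold are hypothetical choices for the example, not values from the talk.

```c
#include <assert.h>
#include <stdint.h>

/* Cumulative counter values captured at one reading point. */
struct counter_sample {
    uint64_t l2_load_in;      /* L2_Cache_Load_In */
    uint64_t l2_prefetch_in;  /* L2_Prefetch_Load_In */
    uint64_t instr_retired;   /* Instruction_Retired */
    uint64_t cycles;          /* Cycle */
};

enum phase { PHASE_CPU_BOUND, PHASE_MEMORY_BOUND };

/* L2 line fills (demand plus prefetch) per 1000 cycles between two samples. */
double l2_fills_per_kcycle(struct counter_sample prev, struct counter_sample cur)
{
    uint64_t cycles = cur.cycles - prev.cycles;
    assert(cycles > 0);
    uint64_t fills = (cur.l2_load_in - prev.l2_load_in) +
                     (cur.l2_prefetch_in - prev.l2_prefetch_in);
    return 1000.0 * (double)fills / (double)cycles;
}

/* Instructions retired per cycle between two samples. */
double ipc(struct counter_sample prev, struct counter_sample cur)
{
    uint64_t cycles = cur.cycles - prev.cycles;
    assert(cycles > 0);
    return (double)(cur.instr_retired - prev.instr_retired) / (double)cycles;
}

/* Labels the interval memory bound when the fill rate reaches the threshold. */
enum phase classify(struct counter_sample prev, struct counter_sample cur,
                    double fills_per_kcycle_threshold)
{
    return l2_fills_per_kcycle(prev, cur) >= fills_per_kcycle_threshold
               ? PHASE_MEMORY_BOUND
               : PHASE_CPU_BOUND;
}
```

Working on differences between consecutive samples keeps the per-read cost to a few subtractions, which matters because the reading points themselves add overhead to the measured program.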
19 Are there really program phases? [Figure: L2 cache misses per 10 µs vs. cycle (PM).]
20 [Figure: L2 cache misses per 10 µs vs. cycle (PM).]
21 Iterations have different patterns
22 Frequency Scaling: Select the program region with the highest cache miss ratio. Lower the processor frequency before entering the region. Restore the frequency after exiting the region.
23 WHT-2^19 (out-of-cache) [Figure: cache miss ratio vs. time, with low-frequency and high-frequency regions marked. Each point shows the cache miss ratio every 100 µs.]
24 Example: code with voltage/frequency scaling instructions

    setfreq(2);                /* decrease frequency */
    i14 = 0;
    while (i14 <= 32767) {
        s277 = T2[i14];
        s278 = T2[ i14];
        t459 = s277 - s278;
        i14++;
    }
    setfreq(3);                /* increase frequency */
25 Frequency Scaling: Select the program region with the highest cache miss ratio. Lower the processor frequency before entering the region. Restore the frequency after exiting the region. Transform the program to reveal more opportunities for frequency scaling.
27 Measure Energy Consumption: Energy = Volt * Amp * Time. Volt: constant. Amp: oscilloscope. Time: Cycle / Frequency.
28 Pentium-M 2.13GHz: Six frequency settings. 2.13 GHz at volt (max performance); 800 MHz at volt (min performance/energy). The change in performance/energy tradeoff is dramatic.
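The measurement formula on this slide can be worked through in code as follows. This is a sketch under the assumption that the oscilloscope yields current samples at a fixed interval, so the product Amp * Time becomes a sum over samples; the function names and the sample values in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Time = Cycle / Frequency. */
double seconds_from_cycles(double cycles, double freq_hz)
{
    assert(freq_hz > 0.0);
    return cycles / freq_hz;
}

/*
 * Energy = Volt * Amp * Time with a constant supply voltage and current
 * samples taken every sample_interval_s seconds:
 *   E = V * sum over k of (I_k * dt)
 */
double energy_joules(double volts, const double *amps, size_t n,
                     double sample_interval_s)
{
    assert(sample_interval_s > 0.0);
    double amp_seconds = 0.0;
    for (size_t k = 0; k < n; k++)
        amp_seconds += amps[k] * sample_interval_s;
    return volts * amp_seconds;
}
```

For instance, three samples of 1 A, 2 A, and 1 A taken 0.5 s apart at a constant 12 V give 12 * (0.5 + 1.0 + 0.5) = 24 J, and 2.13e9 cycles at 2.13 GHz correspond to one second.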
31 WHT-2^20 Experiment Results: energy versus execution time Pareto curve. [Figure: two panels, Fixed and Dynamic, plotting Energy (Joules) against Execution Time (Seconds); each panel is annotated with the % energy reduction within 5% of the execution time of the fastest version.]
32 DCT-2^20: energy versus execution time Pareto curve. [Figure: two panels, Fixed and Dynamic, plotting Energy (Joules) against Execution Time (Seconds).]
33 Real DFT-2^20: energy versus execution time Pareto curve. [Figure: two panels, Fixed and Dynamic, plotting Energy (Joules) against Execution Time (Seconds).]
34 DFT-2^20: energy versus execution time Pareto curve. [Figure: two panels, Fixed and Dynamic, plotting Energy (Joules) against Execution Time (Seconds).]
35 Future Directions: loop transformation, global optimization, strength reduction, parallelization for power, ...