Modeling CPU Energy Consumption for Energy Efficient Scheduling

Abhishek Jaiantilal, Yifei Jiang, Shivakant Mishra
University of Colorado - Boulder
GCM '10: Proceedings of the 1st Workshop on Green Computing, 2010, ACM

Outline
Introduction; Energy Model Overview; Power Consumed and CPU Cycles; Experimental Results; Conclusions

Introduction (1/2)
Among the components of a computer system, the processor consumes the most power.

Introduction (2/2)
Dynamic Voltage and Frequency Scaling (DVFS) is used in CPUs; its operating points are referred to as P-states.
Per-Core Power Gating (PCPG), also called Dynamic Core Gating (DCG), is a hardware feature that allows the cores of a multicore CPU to shut themselves off. Its levels are referred to as C-states:
C0: active state
C1: inactive state, with the core not running during idle cycles
C3: inactive state with the cache saved
C6: all PLLs turned off

Energy Model Overview (1/3)
Black-box approach
PCPG is hardware controlled, so we use a black-box approach and obtain statistics from the /proc/stat file.
A scheduling policy that confines these loops to a few cores might not be the best choice compared with running them on all the cores: running on all cores still gives a low power profile and a shorter execution time.
So we need to know the power consumption of a task.
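As a rough illustration of the /proc/stat statistics mentioned above, the sketch below computes a per-CPU busy fraction from two snapshots of the file's text. The function names are hypothetical, and the choice to count iowait as idle time is an assumption, not something the slides specify.

```python
def parse_cpu_times(stat_text):
    """Return {cpu_name: (busy_jiffies, total_jiffies)} from /proc/stat text."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("cpu"):
            continue  # skip intr, ctxt, btime and other non-CPU lines
        # Columns: user nice system idle iowait irq softirq steal.
        # The trailing guest columns are already included in user/nice,
        # so they are left out of the total.
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values)
        times[fields[0]] = (total - idle, total)
    return times


def utilization(before, after):
    """Per-CPU busy fraction between two parse_cpu_times() snapshots."""
    result = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before[name]
        elapsed = total1 - total0
        result[name] = (busy1 - busy0) / elapsed if elapsed else 0.0
    return result


def snapshot(path="/proc/stat"):
    """Read and parse the live file (Linux only); not called in this sketch."""
    with open(path) as f:
        return parse_cpu_times(f.read())
```

On a Linux machine, calling `snapshot()` twice about a second apart and passing both results to `utilization()` gives the load of each core over that interval. This is the only view of core activity a black-box model has, which is why load alone cannot distinguish tasks with different cycle mixes.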

Energy Model Overview (2/3)
Even though the processes all run at 100% load, the power consumed differs from task to task, because some tasks are floating-point-cycle intensive while others are integer- or memory-cycle intensive.

Energy Model Overview (3/3)
Modified black-box approach
If we know how much power a task consumes, we can find a schedule that allows a shorter execution time and a lower energy consumption.
We need training data to choose the best task schedule, depending on the tradeoff between power consumption and execution time.
Disadvantages:
Training data is needed from all possible tasks first.
Computers should have the same configuration.

Power Consumed and CPU Cycles (1/7)
System power consumption:
$$P_{System} \approx f(P_{CPU} + P_{Memory} + P_{Fans} + P_{HDD} + P_{Northbridge} + P_{Southbridge} + P_{Graphics} + P_{Other\ components})$$
where f() is the efficiency of the power supply.

Power Consumed and CPU Cycles (2/7)
Simplified system power consumption:
$$P_{System} \approx P_{CPU} + P_{Memory} + P_{Bias}$$
where Bias is the power of the fans, motherboard, northbridge, southbridge, graphics, HDD, and other components.

Power Consumed and CPU Cycles (3/7)
We propose that if we know the CPU cycle profile of a task, we can build a simple linear model relating the CPU load to the energy consumed.
$$P_{System} \propto Cycles_{FPU} + Cycles_{INT} + Cycles_{Memory} + P(Bias)$$
$$P(Task_i) \propto Cycles_{FPU} + Cycles_{IU} + Cycles_{Cache}$$
$$P_{System} \approx \sum_{i=1}^{N} Power_{Task_i} + Bias$$

Power Consumed and CPU Cycles (4/7)
We need to know the counts and the types of CPU cycles executed by a task. Available tools:
DTrace for Solaris
OProfile
Intel VTune for Linux
We used VTune in an offline manner: we sampled the application and stored the cycle time over some period (30 minutes to 1 hour).

Power Consumed and CPU Cycles (5/7)
Linear regression model:
$$Power_{Task_i} = F \times (\text{number of FP cycles}) + I \times (\text{number of Int cycles}) + M \times (\text{number of Memory cycles})$$
F, I, and M are multipliers giving the watt cost of running a single FP, INT, or memory cycle. There is no direct way to find them.
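The linear model can be sketched in a few lines. This is only an illustration: the `CycleProfile` type, the function names, and any coefficient values passed in are hypothetical, and the system-level sum follows the "sum of task powers plus Bias" form from the earlier slide.

```python
from dataclasses import dataclass


@dataclass
class CycleProfile:
    """Cycle counts of one task (hypothetical container, units up to the caller)."""
    fp_cycles: float
    int_cycles: float
    mem_cycles: float


def task_power(profile, f, i, m):
    """Power of one task: F * FP cycles + I * Int cycles + M * Memory cycles."""
    return f * profile.fp_cycles + i * profile.int_cycles + m * profile.mem_cycles


def system_power(profiles, f, i, m, bias):
    """System power: sum of the task powers plus the constant Bias term."""
    return sum(task_power(p, f, i, m) for p in profiles) + bias
```

For example, `system_power([CycleProfile(2, 1, 0), CycleProfile(0, 0, 4)], f=3, i=2, m=1, bias=50)` adds the two task contributions to the bias. Two tasks at the same load can therefore differ in predicted power whenever their cycle mixes differ.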

Power Consumed and CPU Cycles (6/7)
We use the statistical approach of minimizing the squared error to find these unknown variables:
$$\min_{F,I,M} \left\| Y - \hat{Y} \right\|^2$$
where Y is the measured wattage and $\hat{Y}$ is the predicted wattage:
$$\hat{Y} = F \times (\text{number of FP cycles}) + I \times (\text{number of Int cycles}) + M \times (\text{number of Memory cycles}) + Bias = X\beta$$
with $F, I, M > 0$ and $\beta = [F\ I\ M]^T$.
Once we know X and Y, then F, I, and M (stored in the β vector) can be obtained as:
$$\beta = (X^T X + \lambda I)^{-1} X^T Y$$
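The closed-form solution above is the regularized (ridge) least-squares estimate. A minimal pure-Python sketch follows, solving the normal equations by Gaussian elimination rather than forming the inverse explicitly. Two assumptions are made here that the slides do not state: the Bias is fitted by appending a column of ones to X, and the constraint F, I, M > 0 is not enforced (it would have to be checked after the fit or handled by a constrained solver). The function name and the sample data are hypothetical.

```python
def ridge_fit(X, Y, lam=0.0):
    """Solve (X^T X + lam * I) beta = X^T Y and return beta as a list."""
    n = len(X[0])
    # Build the normal-equation matrix A = X^T X + lam * I and vector b = X^T Y.
    A = [[sum(row[j] * row[k] for row in X) + (lam if j == k else 0.0)
          for k in range(n)] for j in range(n)]
    b = [sum(row[j] * y for row, y in zip(X, Y)) for j in range(n)]

    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a larger lam")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    beta = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(A[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (b[r] - tail) / A[r][r]
    return beta


# Synthetic samples: [FP cycles, Int cycles, Memory cycles, 1 (bias column)].
samples = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1], [1, 1, 1, 1], [2, 1, 0, 1]]
watts = [12.0, 11.0, 13.0, 16.0, 15.0]
F, I, M, bias = ridge_fit(samples, watts, lam=0.0)
print(F, I, M, bias)
```

With `lam=0` this reduces to ordinary least squares; a small positive `lam` keeps the system solvable when cycle counts are strongly correlated across samples, at the cost of shrinking the coefficients (including the bias column in this simplified version).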

Power Consumed and CPU Cycles (7/7)
We also used another statistical algorithm, Random Forests, in our experiments. Random Forests is a popular machine learning/statistical approach that uses decision trees. Unlike the linear regression formulation, it is a non-linear algorithm.

Experimental Results (1/6)
Regression model training
We first obtained training data from the following benchmarks: memcpy, While-float, mprime.
Then we obtained separate test data for: SPECjvm, While-Int, While-Branch.

Experimental Results (2/6)
Results of the regression model

Experimental Results (3/6)
Energy-efficient scheduler
We propose not waking a core from its idle state until it is needed. Cores that are not allocated any tasks are shut off.
A core cannot execute more than a specific number of processor cycles.
We used the average number of cycles executed to predict the energy consumed and then chose the most energy-efficient schedule.
Ideally, this would be done online: the task schedule would be re-evaluated every second based on the current load and cycles executed.
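The slides do not give the scheduling algorithm itself, so the following is only a sketch of the stated rules under assumptions: tasks are placed largest first, a new core is woken only when no awake core has room under the per-core cycle limit, and the final choice among candidate schedules is delegated to a caller-supplied energy predictor. All names are hypothetical.

```python
def pack_first_fit(task_cycles, core_capacity, n_cores):
    """Assign tasks to cores, waking a new core only when none has room.

    task_cycles maps a task name to its average cycle demand.
    Returns a list of n_cores task lists; an empty list means the core stays off.
    """
    cores = [[] for _ in range(n_cores)]
    load = [0.0] * n_cores
    awake = 0
    # Place the largest tasks first so that smaller ones can fill the gaps.
    for name, cycles in sorted(task_cycles.items(), key=lambda kv: -kv[1]):
        if cycles > core_capacity:
            raise ValueError(f"{name} exceeds the per-core cycle limit")
        target = next(
            (c for c in range(awake) if load[c] + cycles <= core_capacity), None
        )
        if target is None:
            if awake == n_cores:
                raise ValueError("not enough cores for this task set")
            target = awake  # wake one more core only now that it is needed
            awake += 1
        cores[target].append(name)
        load[target] += cycles
    return cores


def best_schedule(candidates, predict_energy):
    """Return the candidate schedule with the lowest predicted energy."""
    return min(candidates, key=predict_energy)
```

Here `predict_energy` would wrap a fitted power model (linear or Random Forests) multiplied by the expected execution time. Keeping it as a parameter means the packing rule and the energy model can be changed independently.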

Experimental Results (4/6)

Experimental Results (5/6)

Experimental Results (6/6)

Conclusions
We showed that linear and Random Forests models can be used to predict energy consumption.
We also proposed a simple scheduler that uses this model to minimize power consumption while maintaining a similar execution time.
In the future, we propose to develop a better mathematical model for the scheduler.
We also propose to use the model in an online fashion, allowing the OS to limit processes that consume more power than a fixed limit.


More information

USB 3.0 Dual Hard Drive Docking Station with UASP for 2.5/3.5in SSD / HDD SATA 6 Gbps

USB 3.0 Dual Hard Drive Docking Station with UASP for 2.5/3.5in SSD / HDD SATA 6 Gbps USB 3.0 Dual Hard Drive Docking Station with UASP for 2.5/3.5in SSD / HDD SATA 6 Gbps Product ID: SDOCK2U33 The SDOCK2U33 Dual 2.5/3.5" SATA hard drive docking station lets you dock and swap drives from

More information

Charles Lefurgy IBM Research, Austin

Charles Lefurgy IBM Research, Austin Super-Dense Servers: An Energy-efficient Approach to Large-scale Server Clusters Outline Problem Internet data centers use a lot of energy Opportunity Load-varying applications Servers can be power-managed

More information

Kaisen Lin and Michael Conley

Kaisen Lin and Michael Conley Kaisen Lin and Michael Conley Simultaneous Multithreading Instructions from multiple threads run simultaneously on superscalar processor More instruction fetching and register state Commercialized! DEC

More information

Outline. How Fast is -fast? Performance Analysis of KKD Applications using Hardware Performance Counters on UltraSPARC-III

Outline. How Fast is -fast? Performance Analysis of KKD Applications using Hardware Performance Counters on UltraSPARC-III Outline How Fast is -fast? Performance Analysis of KKD Applications using Hardware Performance Counters on UltraSPARC-III Peter Christen and Adam Czezowski CAP Research Group Department of Computer Science,

More information

Chapter 5. Introduction ARM Cortex series

Chapter 5. Introduction ARM Cortex series Chapter 5 Introduction ARM Cortex series 5.1 ARM Cortex series variants 5.2 ARM Cortex A series 5.3 ARM Cortex R series 5.4 ARM Cortex M series 5.5 Comparison of Cortex M series with 8/16 bit MCUs 51 5.1

More information

Hakam Zaidan Stephen Moore

Hakam Zaidan Stephen Moore Hakam Zaidan Stephen Moore Outline Vector Architectures Properties Applications History Westinghouse Solomon ILLIAC IV CDC STAR 100 Cray 1 Other Cray Vector Machines Vector Machines Today Introduction

More information

Motion Control Computing Architectures for Ultra Precision Machines

Motion Control Computing Architectures for Ultra Precision Machines Motion Control Computing Architectures for Ultra Precision Machines Mile Erlic Precision MicroDynamics, Inc., #3-512 Frances Avenue, Victoria, B.C., Canada, V8Z 1A1 INTRODUCTION Several computing architectures

More information

Parallelizing Inline Data Reduction Operations for Primary Storage Systems

Parallelizing Inline Data Reduction Operations for Primary Storage Systems Parallelizing Inline Data Reduction Operations for Primary Storage Systems Jeonghyeon Ma ( ) and Chanik Park Department of Computer Science and Engineering, POSTECH, Pohang, South Korea {doitnow0415,cipark}@postech.ac.kr

More information

Multithreaded Value Prediction

Multithreaded Value Prediction Multithreaded Value Prediction N. Tuck and D.M. Tullesn HPCA-11 2005 CMPE 382/510 Review Presentation Peter Giese 30 November 2005 Outline Motivation Multithreaded & Value Prediction Architectures Single

More information

Performance Analysis in the Real World of Online Services

Performance Analysis in the Real World of Online Services Performance Analysis in the Real World of Online Services Dileep Bhandarkar, Ph. D. Distinguished Engineer 2009 IEEE International Symposium on Performance Analysis of Systems and Software My Background:

More information

A2E: Adaptively Aggressive Energy Efficient DVFS Scheduling for Data Intensive Applications

A2E: Adaptively Aggressive Energy Efficient DVFS Scheduling for Data Intensive Applications A2E: Adaptively Aggressive Energy Efficient DVFS Scheduling for Data Intensive Applications Li Tan 1, Zizhong Chen 1, Ziliang Zong 2, Rong Ge 3, and Dong Li 4 1 University of California, Riverside 2 Texas

More information

Ron Kalla, Balaram Sinharoy, Joel Tendler IBM Systems Group

Ron Kalla, Balaram Sinharoy, Joel Tendler IBM Systems Group Simultaneous Multi-threading Implementation in POWER5 -- IBM's Next Generation POWER Microprocessor Ron Kalla, Balaram Sinharoy, Joel Tendler IBM Systems Group Outline Motivation Background Threading Fundamentals

More information

ADVANCED ELECTRONIC SOLUTIONS AVIATION SERVICES COMMUNICATIONS AND CONNECTIVITY MISSION SYSTEMS

ADVANCED ELECTRONIC SOLUTIONS AVIATION SERVICES COMMUNICATIONS AND CONNECTIVITY MISSION SYSTEMS The most important thing we build is trust ADVANCED ELECTRONIC SOLUTIONS AVIATION SERVICES COMMUNICATIONS AND CONNECTIVITY MISSION SYSTEMS UT840 LEON Quad Core First Silicon Results Cobham Semiconductor

More information

Exploring different level of parallelism Instruction-level parallelism (ILP): how many of the operations/instructions in a computer program can be performed simultaneously 1. e = a + b 2. f = c + d 3.

More information

MICROPROCESSOR TECHNOLOGY

MICROPROCESSOR TECHNOLOGY MICROPROCESSOR TECHNOLOGY Assis. Prof. Hossam El-Din Moustafa Lecture 20 Ch.10 Intel Core Duo Processor Architecture 2-Jun-15 1 Chapter Objectives Understand the concept of dual core technology. Look inside

More information

USB 3.0/eSATA Dual 3.5 SATA III Hard Drive External RAID Enclosure w/ UASP and Fan Black

USB 3.0/eSATA Dual 3.5 SATA III Hard Drive External RAID Enclosure w/ UASP and Fan Black USB 3.0/eSATA Dual 3.5 SATA III Hard Drive External RAID Enclosure w/ UASP and Fan Black Product ID: S3520BU33ER The S3520BU33ER 2-Bay RAID Enclosure offers a high-performance external storage solution,

More information

Potentials and Limitations for Energy Efficiency Auto-Tuning

Potentials and Limitations for Energy Efficiency Auto-Tuning Center for Information Services and High Performance Computing (ZIH) Potentials and Limitations for Energy Efficiency Auto-Tuning Parco Symposium Application Autotuning for HPC (Architectures) Robert Schöne

More information

Performance Profiling

Performance Profiling Performance Profiling Minsoo Ryu Real-Time Computing and Communications Lab. Hanyang University msryu@hanyang.ac.kr Outline History Understanding Profiling Understanding Performance Understanding Performance

More information

IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM

IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM I5 AND I7 PROCESSORS Juan M. Cebrián 1 Lasse Natvig 1 Jan Christian Meyer 2 1 Depart. of Computer and Information

More information

Variations on Regression Models. Prof. Bennett Math Models of Data Science 2/02/06

Variations on Regression Models. Prof. Bennett Math Models of Data Science 2/02/06 Variations on Regression Models Prof. Bennett Math Models of Data Science 2/02/06 Outline Steps in modeling Review of Least Squares model Model in E & K pg 24-29 Aqualsol version of E&K Other loss functions

More information

Computer Architecture Homework Set # 1 COVER SHEET Please turn in with your own solution

Computer Architecture Homework Set # 1 COVER SHEET Please turn in with your own solution CSCE 614 (Fall 2017) Computer Architecture Homework Set # 1 COVER SHEET Please turn in with your own solution Eun Jung Kim Write your answers on the sheets provided. Submit with the COVER SHEET. If you

More information

A Smart Port Card Tutorial --- Hardware

A Smart Port Card Tutorial --- Hardware A Smart Port Card Tutorial --- Hardware John DeHart Washington University jdd@arl.wustl.edu http://www.arl.wustl.edu/~jdd 1 References: New Links from Kits References Page Intel Embedded Module: Data Sheet

More information

Hierarchical PLABs, CLABs, TLABs in Hotspot

Hierarchical PLABs, CLABs, TLABs in Hotspot Hierarchical s, CLABs, s in Hotspot Christoph M. Kirsch ck@cs.uni-salzburg.at Hannes Payer hpayer@cs.uni-salzburg.at Harald Röck hroeck@cs.uni-salzburg.at Abstract Thread-local allocation buffers (s) are

More information