Energy Introspector: Standard Physical Library Interface for Full-System Microarchitecture and Multi-Physics Simulations
1 Energy Introspector: Standard Physical Library Interface for Full-System Microarchitecture and Multi-Physics Simulations
William Song, Saibal Mukhopadhyay, Arun Rodrigues, and Sudhakar Yalamanchili
Georgia Institute of Technology, Atlanta, GA
Sandia National Laboratories, Albuquerque, NM

2 Major Challenges
- A modeling methodology and infrastructure for full-system application + microarchitecture + multi-physics simulations
- Interactions between applications and physical phenomena
- Interactive dynamics between multiple, distinct physical phenomena, e.g., temperature and reliability
- Interface design between system/microarchitecture models and physical models
- Standard library and API interface
- Ease of re-use
[Figure: cross-section of a 3D-stacked IC package: back-side air cooling, Cu heat spreader, Tiers 1-4, BT substrate, PCB]

3 State of the Practice
1. Current practices (e.g., trace-driven simulations) are insufficient at scale to address research problems at the intersection of applications, microarchitecture, power, thermal, reliability, etc. → We need a holistic modeling environment.
[Figure: Applications, Architecture, Physics]
2. A single model does not provide all necessary details and modeling capabilities for different research problems. → We need a framework that is open to the composition of new or updated physical models (universal). The physical models and simulators should not have cross-dependencies in their implementations (standard interface).

4 Problem Description
In this talk, we address two major problems:
1. Implementing integrated power-thermal-reliability simulations
- Why do we need a microarchitecture and multi-physics co-simulation environment (vs. conventional trace-driven simulations)?
[Figure: trace-driven simulations (simulations → performance counters → power modeling → power traces → thermal modeling → thermal traces → wear modeling → instantaneous failure rates) vs. full-system integrated simulations (functional emulation, timing simulation, and power, thermal, and reliability modeling coupled through multi-physics interactions such as leakage feedback, with a controller running management algorithms that sets clock frequency and voltage)]
Which simulation model should we use?

5 Problem Description (cont.)
2. Incorporating various implementations of modeling tools via standard libraries
- A single model doesn't provide all necessary details or modeling capabilities for different research problems.
- Open to integration: Can we standardize the interface and integration of models? → a software engineering problem
[Figure: tools grouped into libraries (McPAT, DSENT, etc. → Power Library; HotSpot, 3D-ICE, etc. → Thermal Library; BTI, TDDB, etc. → Reliability Library; Noise, Delay, etc. → Other Libraries), all connected to the simulation framework through a Standard Multi-Physics Library Interface]

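To make the standard-library idea concrete, here is a minimal C++ sketch of a tool-agnostic interface. It assumes a hypothetical `PhysicalLibrary` base class with a single `compute` entry point and a string-keyed `Data` map. These names and the two toy models are illustrative only and are not the Energy Introspector's actual classes. Real wrappers around McPAT, HotSpot, and similar tools would implement the same entry point.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical data record exchanged between libraries:
// named physical quantities such as "power" or "temperature".
using Data = std::map<std::string, double>;

// Hypothetical standard interface: every physical library (power,
// thermal, reliability, ...) exposes the same entry point, so the
// simulation framework never depends on a specific tool.
class PhysicalLibrary {
 public:
  virtual ~PhysicalLibrary() = default;
  // Reads inputs from `in` and writes outputs into `out`.
  virtual void compute(const Data& in, Data& out) const = 0;
};

// Toy stand-in for a power library (e.g., a McPAT wrapper).
class ToyPowerLibrary : public PhysicalLibrary {
 public:
  void compute(const Data& in, Data& out) const override {
    // Energy per access (J) times access rate (1/s) gives power (W).
    out["power"] = in.at("energy_per_access") * in.at("access_rate");
  }
};

// Toy stand-in for a thermal library (e.g., a HotSpot wrapper):
// lumped steady-state model T = T_ambient + R_th * P.
class ToyThermalLibrary : public PhysicalLibrary {
 public:
  void compute(const Data& in, Data& out) const override {
    assert(in.at("r_th") >= 0.0);  // thermal resistance must be non-negative
    out["temperature"] = in.at("ambient") + in.at("r_th") * in.at("power");
  }
};

// The framework only holds base-class pointers, so libraries are swappable.
std::unique_ptr<PhysicalLibrary> make_library(const std::string& kind) {
  if (kind == "power") return std::make_unique<ToyPowerLibrary>();
  if (kind == "thermal") return std::make_unique<ToyThermalLibrary>();
  return nullptr;  // unknown library kind
}
```

Because the calling code only sees `PhysicalLibrary`, replacing the toy thermal model with a wrapped HotSpot or 3D-ICE model would not change the framework side.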
6 General Practices: Trace-Driven Simulations
[Figure: simulations → performance counters → power modeling → power traces → thermal modeling → thermal traces → wear modeling → instantaneous failure rates]
- Trace-driven simulation is the most commonly used approach to analyze the physical impacts of microarchitectural operations.
- Each step is an independent simulation.
- This approach does not capture feedback interactions between the models (e.g., temperature → leakage power feedback).
- This simulation approach can only be used to model monotonous processor executions (i.e., steady-state analysis).
Ref: A. Coskun, T. Rosing, K. Mihic, G. Micheli, and Y. Leblebici, "Analysis and Optimization of MPSoC Reliability," JOLPE, Jan. 2006.

7 Full-System & Multi-Physics Co-Simulations
- We propose an integrated microarchitecture and multi-physics simulation model:
[Figure: integrated simulation flow. Benchmarks → functional emulation (frontend) → instruction stream → timing simulation → access counter statistics → power modeling (dynamic + leakage energy) → floor-plan power → thermal modeling → floor-plan temperature → reliability modeling (cumulative failure rate); multi-physics interactions include leakage feedback into power modeling; a controller running management algorithms sets clock frequency and voltage; configuration covers physical, package, and floor-planning parameters]

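The temperature → leakage feedback that trace-driven flows miss can be shown with a lumped, single-core C++ sketch. It assumes an exponential leakage model, P_leak = P_leak0 * exp(beta * (T - T_ref)), and a steady-state thermal model, T = T_amb + R_th * P. Both models and all parameter values are illustrative assumptions, not taken from the talk. The open-loop function mimics a trace-driven flow, where power is computed once and then fed forward. The closed-loop function alternates the two models until temperature converges, as a co-simulation would.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical lumped parameters for one core (illustrative values).
struct CoreParams {
  double p_dyn;    // dynamic power (W), temperature independent here
  double p_leak0;  // leakage power at the reference temperature (W)
  double beta;     // leakage temperature sensitivity (1/degC)
  double t_ref;    // reference temperature for p_leak0 (degC)
  double t_amb;    // ambient temperature (degC)
  double r_th;     // lumped thermal resistance (degC/W)
};

// Power model: leakage grows exponentially with temperature.
double total_power(const CoreParams& c, double temp) {
  return c.p_dyn + c.p_leak0 * std::exp(c.beta * (temp - c.t_ref));
}

// Thermal model: steady state of a lumped RC, T = T_amb + R_th * P.
double temperature(const CoreParams& c, double power) {
  return c.t_amb + c.r_th * power;
}

// Trace-driven style: power is evaluated once at the reference
// temperature and fed forward; temperature never updates leakage.
double open_loop_temperature(const CoreParams& c) {
  return temperature(c, total_power(c, c.t_ref));
}

// Co-simulation style: alternate the power and thermal models until the
// temperature stops changing (fixed-point iteration on the feedback).
double closed_loop_temperature(const CoreParams& c, double tol = 1e-9,
                               int max_iter = 1000) {
  double t = c.t_ref;
  for (int i = 0; i < max_iter; ++i) {
    double t_next = temperature(c, total_power(c, t));
    if (std::fabs(t_next - t) < tol) return t_next;
    t = t_next;
  }
  assert(false && "leakage-temperature iteration did not converge");
  return t;
}
```

With `CoreParams{20.0, 5.0, 0.02, 45.0, 45.0, 1.0}`, the open-loop estimate is 70 °C, while the coupled estimate settles near 73.9 °C. The gap arises because extra leakage raises temperature, which in turn raises leakage again. The iteration only converges while R_th * dP/dT < 1. Otherwise the loop models thermal runaway and the assertion fires in debug builds.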
8 Library Integration and Multi-Physics Interactions
- By standardizing individual tools into a set of libraries, linking library models becomes a problem of describing physical interactions instead of integrating the tools' software.
[Figure: the integrated simulation flow with power models (Power Library: CACTI/McPAT, Orion/DSENT, etc.), thermal models (Thermal Library: 3D-ICE, HotSpot, microfluidics, etc.), and wear models (Reliability Library: NBTI, TDDB, HCI, electromigration, etc.) plugged into the power, thermal, and reliability modeling stages]

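Once tools are wrapped as libraries, the integration work reduces to declaring which model consumes which quantity from which other model. The minimal C++ sketch below assumes a hypothetical `Interaction` record (producer, consumer, quantity) and derives an evaluation order with Kahn's topological sort. The slide does not say how the framework actually orders its models, so treat this as one plausible approach.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical description of one physical interaction:
// `consumer` needs `quantity` produced by `producer`.
struct Interaction {
  std::string producer;
  std::string consumer;
  std::string quantity;
};

// Orders models so that every producer runs before its consumers
// (Kahn's topological sort). Returns an empty vector if the declared
// interactions form a cycle within one step.
std::vector<std::string> evaluation_order(const std::vector<std::string>& models,
                                          const std::vector<Interaction>& links) {
  std::map<std::string, int> indegree;
  std::map<std::string, std::vector<std::string>> consumers;
  for (const auto& m : models) indegree[m] = 0;
  for (const auto& l : links) {
    // Every interaction must refer to declared models.
    assert(indegree.count(l.producer) && indegree.count(l.consumer));
    consumers[l.producer].push_back(l.consumer);
    ++indegree[l.consumer];
  }
  std::queue<std::string> ready;
  for (const auto& m : models)
    if (indegree[m] == 0) ready.push(m);
  std::vector<std::string> order;
  while (!ready.empty()) {
    std::string m = ready.front();
    ready.pop();
    order.push_back(m);
    for (const auto& next : consumers[m])
      if (--indegree[next] == 0) ready.push(next);
  }
  if (order.size() != models.size()) return {};  // cycle detected
  return order;
}
```

With power → thermal and thermal → reliability declared, the order is power, thermal, reliability. Adding thermal → power (leakage feedback) within the same step creates a cycle, and the function returns an empty vector. A framework would typically break such a cycle by feeding back the temperature from the previous sampling interval. That strategy is an assumption of this sketch, not a documented behavior.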
9 Software Engineering Problems
- With multiple models integrated into the same framework, several engineering problems must be resolved:
1. Unified Processor Configuration:
- There has to be a way to associate different physical models with different processor components, e.g., SRAM vs. logic vs. interconnect, thermal vs. reliability, etc.
- Different physical phenomena are characterized at different levels of processor abstraction, e.g., package, floor-plan, architectural unit, etc.
2. Data Synchronization and Manipulation:
- Physical interactions and data across different models must be synchronized.
Ref: W. Song, S. Mukhopadhyay, and S. Yalamanchili, "Energy Introspector: A Parallel, Composable Framework for Integrated Power-Reliability-Thermal Modeling for Multicore Architectures," ISPASS (Short Paper), Mar. 2014.

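For the data synchronization problem, one simple scheme is a shared store of time-stamped samples. Producers publish each quantity per sampling tick, and a consumer either finds the sample for the tick it needs or learns that it must wait. The C++ sketch below assumes a hypothetical `SyncedStore` keyed by (component, quantity) with integer sampling ticks. It is not the Energy Introspector's implementation.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical time-stamped store shared by the physical models.
class SyncedStore {
 public:
  using Key = std::pair<std::string, std::string>;  // (component, quantity)

  // A producer publishes one sample per sampling tick.
  void push(const std::string& comp, const std::string& qty, long long tick,
            double value) {
    auto& series = data_[Key{comp, qty}];
    // Producers must publish in nondecreasing tick order.
    assert(series.empty() || series.rbegin()->first <= tick);
    series[tick] = value;
  }

  // Returns the sample for `tick` if the producer has already published
  // it, or std::nullopt so the consumer knows it must wait.
  std::optional<double> pull(const std::string& comp, const std::string& qty,
                             long long tick) const {
    auto it = data_.find(Key{comp, qty});
    if (it == data_.end()) return std::nullopt;
    auto s = it->second.find(tick);
    if (s == it->second.end()) return std::nullopt;
    return s->second;
  }

 private:
  std::map<Key, std::map<long long, double>> data_;
};
```

Integer ticks avoid the exact-match pitfalls of floating-point timestamps. A parallel implementation would also need locking or message passing around `push` and `pull`.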
10 Associating Physical Models and Processor Components
- A pseudo component is a physically defined unit for which a model can estimate physical phenomena, e.g., L2 cache power.
- A processor is configured by composing a pseudo component hierarchy.
- Libraries are attached to pseudo components and simulate different physical phenomena at different levels.
[Figure: pseudo component hierarchy across packaging, floor-planning, and architecture decomposition. It shows package, intermediate (Cores), floor-plan (Core0, Core1, ..., CoreN, Uncore), and source (Inst$, Registers, ALUs, L2$) pseudo components, with libraries attached per level: Energy Library at source components, Reliability Library at floor-plan components, a Thermal Library, and NULL (no model) at some levels. Source components map to architectural units (instruction cache, instruction TLB, fetch buffer, branch prediction, instruction decoder, instruction window, data TLB, register files, L1 data cache, ALU, FPU, ST, LD, L2 data cache, on-chip network). Arrows mark data references (physical interactions)]

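The pseudo component hierarchy can be sketched as a tree. Each node names the library attached to it, and a floor-plan or package node aggregates the power of its energy-library descendants before handing it to a thermal or reliability library. The C++ types below (`PseudoComponent`, `Library`, `subtree_power`) are hypothetical illustrations of the concept, not the framework's data structures.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Library attached to a pseudo component; None means no model at that node.
enum class Library { None, Energy, Thermal, Reliability };

// Hypothetical pseudo component: one node in the processor hierarchy.
struct PseudoComponent {
  std::string name;
  Library library = Library::None;
  double power = 0.0;  // set by the energy library at source components
  std::vector<std::unique_ptr<PseudoComponent>> children;

  // Adds a child node and returns a non-owning pointer to it.
  PseudoComponent* add(std::string child_name, Library lib) {
    assert(!child_name.empty());
    auto child = std::make_unique<PseudoComponent>();
    child->name = std::move(child_name);
    child->library = lib;
    children.push_back(std::move(child));
    return children.back().get();
  }
};

// Total power of a subtree: the input a floor-plan or package node
// would pass to a thermal or reliability library.
double subtree_power(const PseudoComponent& node) {
  double sum = node.library == Library::Energy ? node.power : 0.0;
  for (const auto& c : node.children) sum += subtree_power(*c);
  return sum;
}
```

For instance, suppose a package node holds a `core0` node with 1.5 W and 2.5 W source components and an `uncore` node with a 3.0 W `l2_cache`. Then `subtree_power(*core0)` returns 4.0 W, and the whole package sums to 7.0 W.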
11 Standard API Functions
- The Energy Introspector framework provides a set of API functions to be called by user architecture simulators.

    while (simulation runs) {
        do (architecture timing simulation);  // User Architecture Simulation
        if (sampling point) {
            // Power Calculation
            for (all architecture components to calculate power) {
                EI_client->calculate_power(arch_component_id, current_time,
                                           sampling_interval, access_counters);
            }
            // Thermal Calculation (models are internally synchronized)
            EI_client->calculate_temperature(package_component_id, current_time,
                                             sampling_interval);
            // Reliability (Failure Rate) Calculation
            for (all components to calculate reliability) {
                EI_client->calculate_failure_rate(block_component_id, current_time,
                                                  sampling_interval);
            }
            // Probe any component to collect data
            int err_code = EI_client->pull_data(component_id, current_time,
                                                sampling_interval, data_type, &data);
            // Apply execution control (e.g., voltage scaling)
            err_code = EI_client->push_and_synchronize_data(component_id, current_time,
                                                            sampling_interval, EI_DATA_VOLTAGE, &new_voltage);
        }
    }

12 Energy Introspector Is an Enabler
- The primary goal of the Energy Introspector is to enable exploration at the intersection of microarchitecture, power, thermal, and reliability.
- We provide several exemplary studies:
1. Multicore and Microfluidic Cooling in 3D ICs
2. Power, Thermal, and Throughput Regulation via Adaptive Control Algorithms in Multicore Processors
3. GPU Power Modeling with McPAT
4. Lifetime Reliability Characterization and Management in Multicore Processors

13 Case I: Multicore and Microfluidic Cooling in 3D ICs
- How much leakage power saving or improvement in energy efficiency can microfluidic cooling achieve as a function of layering, pin-fin geometry, and pumping power?
- The Energy Introspector captures interactions between multiple physical metrics.
1. Z. Wan, H. Xiao, Y. Joshi, and S. Yalamanchili, "Co-Design of Multicore Architectures and Microfluidic Cooling for 3D Stacked ICs," Therminic, 2013.
2. H. Xiao, Z. Wan, S. Yalamanchili, and Y. Joshi, "Leakage Power Characterization and Minimization over 3D Stacked Multi-core Chip with Microfluidic Cooling," SemiTherm, 2014.

14 Case II: Power, Thermal, and Throughput Regulation via Adaptive Control Algorithms in Multicore Processors
- Adaptive control algorithms use the DVFS capability of microprocessors to regulate power, temperature, or throughput to a constant level.
- The Energy Introspector provides an interface to apply dynamic execution controls, e.g., DVFS.
1. N. Almoosa, W. Song, S. Yalamanchili, and Y. Wardi, "Throughput Regulation in Multicore Processors via IPA," CDC, 2012.
2. N. Almoosa, W. Song, S. Yalamanchili, and Y. Wardi, "A Power Capping Controller for Multicore Processors," ACC, 2012.

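The C++ sketch below illustrates the kind of DVFS-based regulation this case study refers to. It caps the power of a single core with a fixed-gain integral controller. It assumes a toy power model, P(f) = k * f^3 + P_static, in which voltage scales with frequency, and it uses illustrative constants. It is not the IPA-based or power-capping controller from the cited papers. In the framework, the new setting would be pushed back to the models through an execution-control call such as `push_and_synchronize_data`, shown on slide 11.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical core power model under DVFS: voltage scales with
// frequency, so dynamic power ~ C * V^2 * f ~ k * f^3, plus static power.
double core_power(double f_ghz) {
  const double k = 2.0;         // W / GHz^3 (illustrative)
  const double p_static = 5.0;  // W (illustrative)
  return k * f_ghz * f_ghz * f_ghz + p_static;
}

// One step of a fixed-gain integral controller: raise the frequency when
// power is under the cap, lower it when over, within the DVFS range.
double next_frequency(double f_ghz, double p_cap, double gain,
                      double f_min = 0.5, double f_max = 3.0) {
  assert(gain > 0.0 && f_min < f_max);
  double f = f_ghz + gain * (p_cap - core_power(f_ghz));
  return std::clamp(f, f_min, f_max);  // respect the supported DVFS range
}
```

Starting from 3.0 GHz, iterating `f = next_frequency(f, 21.0, 0.01)` converges to 2.0 GHz, where the modeled power equals the 21 W cap. A larger gain converges faster but can overshoot or oscillate.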
15 Case III: GPU Power Modeling Using McPAT
- Basic models of McPAT (e.g., caches, interconnects, latches, etc.) are reorganized to configure a GPU architecture.
- The Energy Introspector is configurable to model different microarchitectures or processor designs.
1. J. Lim, N. Lakshminarayana, H. Kim, W. Song, S. Yalamanchili, and W. Sung, "Power Modeling for GPU Architectures Using McPAT," TODAES, June 2014.

16 Case IV: Lifetime Reliability Characterization and Management in Multicore Processors
- Variance reduction: Adaptive control of core execution reduces the variance in the lifetime reliability distribution across the multicore die and improves overall processor-level lifetime reliability.
- The Energy Introspector provides an integrated application-microarchitecture-power-thermal-lifetime reliability simulation.
[Figure: two plots of normalized MTTF (0.5 to 1.0) versus performability threshold (100% down to 50%). Left: curves for μ=1.0 with σ=0.05, 0.10, 0.15, and 0.20. Right: μ=1.0, σ=0.20 vs. μ=0.80, σ=0.05, with the practical region of lifetime marked]
1. W. Song, S. Mukhopadhyay, and S. Yalamanchili, "Architectural Reliability: Lifetime Reliability Characterization and Management of Many-Core Processors," CAL, 2014.

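A simple stand-in model shows why reducing variance across cores helps processor-level lifetime: the sum-of-failure-rates (SOFR) model. It assumes each core has a constant failure rate 1/MTTF_i and that the processor fails at the first core failure. This is a swapped-in, simplified model. The slide's characterization instead uses a performability threshold, the fraction of cores that must still work. The C++ sketch below computes the SOFR processor MTTF.

```cpp
#include <cassert>
#include <vector>

// Sum-of-failure-rates (SOFR) model: with constant per-core failure
// rates and failure at the first core failure,
// lambda_proc = sum(1 / MTTF_i) and MTTF_proc = 1 / lambda_proc.
double processor_mttf_sofr(const std::vector<double>& core_mttf) {
  assert(!core_mttf.empty());
  double rate = 0.0;
  for (double m : core_mttf) {
    assert(m > 0.0);  // each core MTTF must be positive
    rate += 1.0 / m;
  }
  return 1.0 / rate;
}
```

Consider four cores with the same mean MTTF of 1.0. Equal values give a processor MTTF of 0.25, whereas {0.8, 1.2, 0.8, 1.2} gives 0.24. Because 1/x is convex, spreading core MTTFs raises the summed failure rate even when the mean is unchanged.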
17 Conclusion and Future Work
- In general practice, microarchitecture and physical properties have been analyzed in separate models and simulations.
- Such an approach has difficulty capturing the inter-dependencies between various physical phenomena and their impacts on the microarchitecture.
- Therefore, a holistic modeling and simulation environment is essential to enable exploration at the intersection of applications, microarchitecture, power/energy, thermal/cooling, reliability, etc.
- Our standard library interface is scalable to incorporate further physical phenomena and their models.

18 Summary
- What is the major contribution of your research?
  - An infrastructure to enable integrated application, microarchitecture, and multi-physics simulations, based on library implementation/integration of physical models → integrated infrastructure, scalable framework, standard API
- What are the gaps you identify in the research coverage in your area?
  - Lack of standards in using physical models
  - Lack of parallel implementation of computationally intensive models or large core-count systems
  - Fast compact models for thermal and power delivery
  - Need for agreement on power models across technologies
  - Need for higher-level (application-level) models

19 Summary (cont.)
- What is the bigger picture for your research (i.e., synergetic or complementary projects)?
  - Application-architecture co-design
  - Measurement-based validation infrastructure
- What major opportunities do you see for cross-pollination between your projects and others?
  - Acquiring new, better physical models
  - Participation in co-design activities, e.g., architecture-application, power delivery-power management
  - Platform-neutral models of physical behaviors from applications
  - Interaction with compiler and runtime projects

20 Summary (cont.)
- What is one thing that would make it easier or possible to leverage the results of other projects to further your own research?
  - Standardization of APIs for accessing/exercising physical models at multiple levels of abstraction
- What would you most like to see solved or addressed other than what you are working on?
  - Characterization of fault behaviors as a function of physical phenomena and application demand
  - Higher-level application models to drive power-thermal-reliability analysis
Module 5: "MIPS R10000: A Case Study" Lecture 9: "MIPS R10000: A Case Study" MIPS R10000 A case study in modern microarchitecture Overview Stage 1: Fetch Stage 2: Decode/Rename Branch prediction Branch
More informationAn overview of standard cell based digital VLSI design
An overview of standard cell based digital VLSI design Implementation of the first generation AsAP processor Zhiyi Yu and Tinoosh Mohsenin VCL Laboratory UC Davis Outline Overview of standard cellbased
More informationHigh-level Power Estimation. Naehyuck Chang Dept. of EECS/CSE Seoul National University
High-level Power Estimation Naehyuck Chang Dept. of EECS/CSE Seoul National University naehyuck@snu.ac.kr 1 Power Macromodeling Motivation Hard to estimate the switching activity at higher levels Logic
More informationStrober: Fast and Accurate Sample-Based Energy Simulation Framework for Arbitrary RTL
Strober: Fast and Accurate Sample-Based Energy Simulation Framework for Arbitrary RTL Donggyu Kim, Adam Izraelevitz, Christopher Celio, Hokeun Kim, Brian Zimmer, Yunsup Lee, Jonathan Bachrach, Krste Asanović
More informationThe CPU Pipeline. MIPS R4000 Microprocessor User's Manual 43
The CPU Pipeline 3 This chapter describes the basic operation of the CPU pipeline, which includes descriptions of the delay instructions (instructions that follow a branch or load instruction in the pipeline),
More informationIF1 --> IF2 ID1 ID2 EX1 EX2 ME1 ME2 WB. add $10, $2, $3 IF1 IF2 ID1 ID2 EX1 EX2 ME1 ME2 WB sub $4, $10, $6 IF1 IF2 ID1 ID2 --> EX1 EX2 ME1 ME2 WB
EE 4720 Homework 4 Solution Due: 22 April 2002 To solve Problem 3 and the next assignment a paper has to be read. Do not leave the reading to the last minute, however try attempting the first problem below
More informationOPENSPARC T1 OVERVIEW
Chapter Four OPENSPARC T1 OVERVIEW Denis Sheahan Distinguished Engineer Niagara Architecture Group Sun Microsystems Creative Commons 3.0United United States License Creative CommonsAttribution-Share Attribution-Share
More informationFull Name: NetID: Midterm Summer 2017
Full Name: NetID: Midterm Summer 2017 OAKLAND UNIVERSITY, School of Engineering and Computer Science CSE 564: Computer Architecture Please write and/or mark your answers clearly and neatly; answers that
More informationLecture 4: RISC Computers
Lecture 4: RISC Computers Introduction Program execution features RISC characteristics RISC vs. CICS Zebo Peng, IDA, LiTH 1 Introduction Reduced Instruction Set Computer (RISC) is an important innovation
More informationProcessors, Performance, and Profiling
Processors, Performance, and Profiling Architecture 101: 5-Stage Pipeline Fetch Decode Execute Memory Write-Back Registers PC FP ALU Memory Architecture 101 1. Fetch instruction from memory. 2. Decode
More informationCS 33. Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.
CS 33 Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved. Hyper Threading Instruction Control Instruction Control Retirement Unit
More informationPowerPC TM 970: First in a new family of 64-bit high performance PowerPC processors
PowerPC TM 970: First in a new family of 64-bit high performance PowerPC processors Peter Sandon Senior PowerPC Processor Architect IBM Microelectronics All information in these materials is subject to
More informationAdvanced Caching Techniques
Advanced Caching Approaches to improving memory system performance eliminate memory operations decrease the number of misses decrease the miss penalty decrease the cache/memory access times hide memory
More informationData-flow prescheduling for large instruction windows in out-of-order processors. Pierre Michaud, André Seznec IRISA / INRIA January 2001
Data-flow prescheduling for large instruction windows in out-of-order processors Pierre Michaud, André Seznec IRISA / INRIA January 2001 2 Introduction Context: dynamic instruction scheduling in out-oforder
More informationCS/EE 260. Digital Computers Organization and Logical Design
CS/EE 260. Digital Computers Organization and Logical Design David M. Zar Computer Science and Engineering Department Washington University dzar@cse.wustl.edu http://www.cse.wustl.edu/~dzar/class/260 Digital
More informationLecture 1 An Overview of High-Performance Computer Architecture. Automobile Factory (note: non-animated version)
Lecture 1 An Overview of High-Performance Computer Architecture ECE 463/521 Fall 2002 Edward F. Gehringer Automobile Factory (note: non-animated version) Automobile Factory (note: non-animated version)
More informationXMTSim: A Simulator of the XMT Many-core Architecture
XMTSim: A Simulator of the XMT Many-core Architecture Fuat Keceli Intel Corporation Hillsboro, OR 97124 fuat.keceli@intel.com Uzi Vishkin Department of Electrical and Computer Engineering University of
More informationCS3350B Computer Architecture CPU Performance and Profiling
CS3350B Computer Architecture CPU Performance and Profiling Marc Moreno Maza http://www.csd.uwo.ca/~moreno/cs3350_moreno/index.html Department of Computer Science University of Western Ontario, Canada
More informationCourse on Advanced Computer Architectures
Surname (Cognome) Name (Nome) POLIMI ID Number Signature (Firma) SOLUTION Politecnico di Milano, July 9, 2018 Course on Advanced Computer Architectures Prof. D. Sciuto, Prof. C. Silvano EX1 EX2 EX3 Q1
More informationProfiling: Understand Your Application
Profiling: Understand Your Application Michal Merta michal.merta@vsb.cz 1st of March 2018 Agenda Hardware events based sampling Some fundamental bottlenecks Overview of profiling tools perf tools Intel
More informationWilliam Stallings Computer Organization and Architecture
William Stallings Computer Organization and Architecture Chapter 11 CPU Structure and Function Rev. 3.2.1 (2005-06) by Enrico Nardelli 11-1 CPU Functions CPU must: Fetch instructions Decode instructions
More informationStanislav Bratanov; Roman Belenov; Ludmila Pakhomova 4/27/2015
Stanislav Bratanov; Roman Belenov; Ludmila Pakhomova 4/27/2015 What is Intel Processor Trace? Intel Processor Trace (Intel PT) provides hardware a means to trace branching, transaction, and timing information
More informationComputer Architecture and Organization (CS-507)
Computer Architecture and Organization (CS-507) Muhammad Zeeshan Haider Ali Lecturer ISP. Multan ali.zeeshan04@gmail.com https://zeeshanaliatisp.wordpress.com/ Lecture 4 Basic Computer Function, Instruction
More informationIntegrated CPU and Cache Power Management in Multiple Clock Domain Processors
Integrated CPU and Cache Power Management in Multiple Clock Domain Processors Nevine AbouGhazaleh, Bruce Childers, Daniel Mossé & Rami Melhem Department of Computer Science University of Pittsburgh HiPEAC
More informationComputer Architecture
Computer Architecture Pipelined and Parallel Processor Design Michael J. Flynn Stanford University Technische Universrtat Darmstadt FACHBEREICH INFORMATIK BIBLIOTHEK lnventar-nr.: Sachgebiete: Standort:
More informationTechniques for Mitigating Memory Latency Effects in the PA-8500 Processor. David Johnson Systems Technology Division Hewlett-Packard Company
Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor David Johnson Systems Technology Division Hewlett-Packard Company Presentation Overview PA-8500 Overview uction Fetch Capabilities
More informationComputer Architecture: Multithreading (III) Prof. Onur Mutlu Carnegie Mellon University
Computer Architecture: Multithreading (III) Prof. Onur Mutlu Carnegie Mellon University A Note on This Lecture These slides are partly from 18-742 Fall 2012, Parallel Computer Architecture, Lecture 13:
More informationLEON4: Fourth Generation of the LEON Processor
LEON4: Fourth Generation of the LEON Processor Magnus Själander, Sandi Habinc, and Jiri Gaisler Aeroflex Gaisler, Kungsgatan 12, SE-411 19 Göteborg, Sweden Tel +46 31 775 8650, Email: {magnus, sandi, jiri}@gaisler.com
More informationLecture: Out-of-order Processors. Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ
Lecture: Out-of-order Processors Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ 1 An Out-of-Order Processor Implementation Reorder Buffer (ROB)
More informationLecture 8: RISC & Parallel Computers. Parallel computers
Lecture 8: RISC & Parallel Computers RISC vs CISC computers Parallel computers Final remarks Zebo Peng, IDA, LiTH 1 Introduction Reduced Instruction Set Computer (RISC) is an important innovation in computer
More informationSuperscalar Processors
Superscalar Processors Increasing pipeline length eventually leads to diminishing returns longer pipelines take longer to re-fill data and control hazards lead to increased overheads, removing any a performance
More information