Energy Introspector: Standard Physical Library Interface for Full-System Microarchitecture and Multi-Physics Simulations


1 Energy Introspector: Standard Physical Library Interface for Full-System Microarchitecture and Multi-Physics Simulations
William Song, Saibal Mukhopadhyay, Arun Rodrigues, and Sudhakar Yalamanchili
Georgia Institute of Technology, Atlanta, GA
Sandia National Laboratories, Albuquerque, NM

2 Major Challenges
- A modeling methodology and infrastructure for full-system application + microarchitecture + multi-physics simulations
- Interactions between applications and physical phenomena
- Interactive dynamics between multiple, distinct physical phenomena, e.g., temperature and reliability
- Interface design between system/microarchitecture models and physical models
- Standard library and API interface
- Ease of re-use
[Figure: 3D package cross-section with back-side air cooling, Cu heat spreader, Tier 1 to Tier 4, BT substrate, and PCB]

3 State of the Practice
1. Current practices (e.g., trace-driven simulations) are insufficient at scale to address research problems at the intersection of applications, microarchitecture, power, thermal, reliability, etc.
   -> We need a holistic modeling environment.
2. A single model does not provide all necessary details and modeling capabilities for different research problems.
   -> We need a framework that is open to the composition of new or updated physical models (universal).
   -> The physical models and simulators should not have cross-dependency in implementations (standard interface).
[Figure labels: Applications, Architecture, Physics]

4 Problem Description
In this talk, we address two major problems:
1. Implementing integrated power-thermal-reliability simulations
- Why do we need a microarchitecture, multi-physics co-simulation environment (vs. conventional trace-driven simulations)?
- Trace-driven simulations vs. full-system integrated simulations: which simulation model should we use?
[Figure: the trace-driven flow (Simulations -> Performance Counters -> Power Modeling -> Power Traces -> Thermal Modeling -> Thermal Traces -> Wear Modeling -> Instantaneous Failure Rates) shown beside the full-system integrated flow. Labels in the integrated flow: Benchmarks, Functional Emulation (Frontend), Instruction Stream, Timing Simulation, Access Counter Statistics, Power Modeling (Dynamic Energy + Leakage Energy), Leakage Feedback, Floor-plan Power, Thermal Modeling, Floor-plan Temperature, Reliability Modeling, Cumulative Failure Rate, Multi-Physics Interactions, Management Algorithms, Controller, Clock Frequency, Voltage, Configuration, Physical Configuration, Package Configuration and Floorplanning]

5 Problem Description (cont.)
2. Incorporating Various Implementations of Modeling Tools via Standard Libraries
- A single model doesn't provide all necessary details or modeling capabilities for different research problems.
- Open to Integration: Can we standardize the interface and integration of models? -> software engineering problem
[Figure: Power Library (McPAT, DSENT, etc.), Thermal Library (HotSpot, 3D-ICE, etc.), Reliability Library (BTI, TDDB, etc.), and Other Libraries (Noise, Delay, etc.) attached to the Standard Multi-Physics Library Interface of the Simulation Framework]
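The standard-interface idea on this slide can be illustrated with a small C++ sketch. Everything here is hypothetical: `ModelLibrary`, `DataMap`, the two toy models, and `run_chain` are invented for illustration and are not the Energy Introspector API or any wrapper around McPAT or HotSpot. The point is only that the caller depends on one abstract interface, so a model can be replaced without touching the simulator.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical record passed between libraries; not an Energy Introspector type.
using DataMap = std::map<std::string, double>;

// Hypothetical common interface that each wrapped tool would implement.
class ModelLibrary {
public:
    virtual ~ModelLibrary() = default;
    // Advance the model over one sampling interval and return its outputs.
    virtual DataMap compute(double interval_s, const DataMap& inputs) = 0;
};

// Toy power model: average power = accesses * energy per access / interval.
class ToyPowerLibrary : public ModelLibrary {
public:
    explicit ToyPowerLibrary(double energy_per_access_j)
        : energy_per_access_j_(energy_per_access_j) {}
    DataMap compute(double interval_s, const DataMap& inputs) override {
        assert(interval_s > 0.0);
        const double accesses = inputs.at("accesses");
        return {{"power_w", accesses * energy_per_access_j_ / interval_s}};
    }
private:
    double energy_per_access_j_;
};

// Toy thermal model: steady-state T = T_ambient + R_thermal * P.
class ToyThermalLibrary : public ModelLibrary {
public:
    ToyThermalLibrary(double ambient_k, double r_thermal_k_per_w)
        : ambient_k_(ambient_k), r_thermal_(r_thermal_k_per_w) {}
    DataMap compute(double /*interval_s*/, const DataMap& inputs) override {
        return {{"temperature_k", ambient_k_ + r_thermal_ * inputs.at("power_w")}};
    }
private:
    double ambient_k_;
    double r_thermal_;
};

// The caller depends only on ModelLibrary, so either model can be swapped out.
double run_chain(ModelLibrary& power, ModelLibrary& thermal,
                 double accesses, double interval_s) {
    const DataMap p = power.compute(interval_s, {{"accesses", accesses}});
    return thermal.compute(interval_s, p).at("temperature_k");
}
```

Passing data as a keyed map is one possible design choice; it keeps the interface identical across power, thermal, and reliability models at the cost of compile-time type checking.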

6 General Practices: Trace-Driven Simulations
- Trace-driven simulation is the most commonly used approach to analyze physical impacts of microarchitectural operations.
- Each step is an independent simulation.
- This approach does not capture feedback interactions between the models (e.g., temperature-leakage power feedback).
- This simulation approach can only be used to model monotonous processor executions (i.e., steady-state analysis).
[Figure: Simulations -> Performance Counters -> Power Modeling -> Power Traces -> Thermal Modeling -> Thermal Traces -> Wear Modeling -> Instantaneous Failure Rates]
Ref: A. Coskun, T. Rosing, K. Mihic, G. Micheli, and Y. Leblebici, "Analysis and Optimization of MPSoC Reliability," JOLPE, Jan. 2006.
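The missing temperature-leakage feedback can be made concrete with a toy calculation. The sketch below assumes a lumped thermal resistance and an exponential dependence of leakage on temperature; both the model form and every parameter name are illustrative assumptions, not values or equations taken from the slides. `open_loop_temp_k` mimics the trace-driven flow (leakage evaluated once, then handed to the thermal step), while `coupled_temp_k` iterates until temperature and leakage agree.

```cpp
#include <cassert>
#include <cmath>

// Illustrative parameters only; none of these values come from the slides.
struct ThermalParams {
    double dynamic_power_w;    // activity-driven power, held fixed here
    double leak_ref_w;         // leakage power at the reference temperature
    double leak_coeff_per_k;   // assumed exponential temperature sensitivity
    double ref_temp_k;         // temperature at which leak_ref_w was characterized
    double ambient_k;          // ambient temperature
    double r_thermal_k_per_w;  // lumped thermal resistance to ambient
};

double leakage_w(const ThermalParams& p, double temp_k) {
    return p.leak_ref_w * std::exp(p.leak_coeff_per_k * (temp_k - p.ref_temp_k));
}

double steady_temp_k(const ThermalParams& p, double total_power_w) {
    return p.ambient_k + p.r_thermal_k_per_w * total_power_w;
}

// Trace-driven style: leakage is evaluated once at a fixed temperature,
// then the resulting power is handed to the thermal step.
double open_loop_temp_k(const ThermalParams& p) {
    return steady_temp_k(p, p.dynamic_power_w + leakage_w(p, p.ref_temp_k));
}

// Coupled style: temperature feeds back into leakage until both agree.
double coupled_temp_k(const ThermalParams& p, int max_iters = 100,
                      double tol_k = 1e-9) {
    assert(max_iters > 0);
    double temp = p.ambient_k;
    for (int i = 0; i < max_iters; ++i) {
        const double next =
            steady_temp_k(p, p.dynamic_power_w + leakage_w(p, temp));
        if (std::fabs(next - temp) < tol_k) return next;
        temp = next;
    }
    return temp;  // last iterate; convergence is not guaranteed for strong feedback
}
```

With these assumptions the coupled result is always at least as hot as the open-loop one whenever the chip runs above the reference temperature, which is the effect a chain of independent trace-driven steps cannot show.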

7 Full-System Microarchitecture & Multi-Physics Co-Simulations
- We propose an integrated microarchitecture and multi-physics simulation model:
[Figure: block diagram of the integrated flow. Labels: Benchmarks, Functional Emulation (Frontend), Instruction Stream, Timing Simulation, Access Counter Statistics, Power Modeling (Dynamic Energy + Leakage Energy), Leakage Feedback, Floor-plan Power, Thermal Modeling, Floor-plan Temperature, Reliability Modeling, Cumulative Failure Rate, Multi-Physics Interactions, Management Algorithms, Controller, Clock Frequency, Voltage, Configuration, Physical Configuration, Package Configuration and Floorplanning]

8 Library Integration and Multi-Physics Interactions
- By standardizing individual tools into a set of libraries, linking library models becomes a problem of describing physical interactions, instead of the software integration of tools.
[Figure: the integrated simulation flow annotated with the library behind each model.
 Power Models (Power Library): Cacti/McPAT, Orion/DSENT, etc.
 Thermal Models (Thermal Library): 3D-ICE, HotSpot, Microfluidics, etc.
 Wear Models (Reliability Library): NBTI, TDDB, HCI, Electromigration, etc.]

9 Software Engineering Problems
With multiple models integrated into the same framework, there are several engineering problems to be resolved:
1. Unified Processor Configuration:
- There has to be a way to associate different physical models with different processor components, e.g., SRAM vs. logic vs. interconnect, thermal vs. reliability, etc.
- Different physical phenomena are characterized at different levels of processor abstraction, i.e., package, floor-plan, architectural unit, etc.
2. Data Synchronization and Manipulation:
- Physical interactions/data across different models must be synchronized.
Ref: W. Song, S. Mukhopadhyay, and S. Yalamanchili, "Energy Introspector: A Parallel, Composable Framework for Integrated Power-Reliability-Thermal Modeling for Multicore Architectures," ISPASS (Short Paper), Mar. 2014.

10 Associating Physical Models and Processor Components
- A pseudo component is a physically defined unit where a model can estimate physical phenomena, e.g., L2 cache power.
- A processor is configured by composing a pseudo component hierarchy.
- Libraries are attached to pseudo components and simulate different physical phenomena at different levels.
[Figure: pseudo component hierarchy across three levels (Packaging, Floor-planning, Architecture Decomposition), linked by data references (physical interactions).
 Pseudo components shown: Package; Cores: Intermediate; Floor-plan: Core0, Core1, ..., CoreN; Uncore: Floor-plan; Source: Inst$, Registers, ALUs, L2$.
 Model Library labels shown: Thermal Library, NULL, Reliability Library (floor-plan components), Energy Library (source components).
 Architectural units shown: Instruction Cache, Instruction TLB, Fetch Buffer, Branch Prediction, Instruction Decoder, Instruction Window, Data TLB, Register Files, L1 Data Cache, ALU, FPU, ST, LD, L2 Data Cache, On-Chip Network]
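A pseudo component hierarchy of this kind can be sketched as a small tree in C++. The type names, the `LibraryKind` enum, and the power values below are hypothetical, and the assignment of the Thermal library to the package and no library to the intermediate node is an assumption about the figure rather than something the transcript states unambiguously.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Which model library is attached to a node (None stands in for NULL).
enum class LibraryKind { None, Energy, Thermal, Reliability };

struct PseudoComponent {
    std::string name;
    LibraryKind library;
    double power_w = 0.0;  // would be filled in by an attached energy library
    std::vector<std::unique_ptr<PseudoComponent>> children;

    PseudoComponent(std::string n, LibraryKind k)
        : name(std::move(n)), library(k) {}

    // Create a child node and return a reference to it for further composition.
    PseudoComponent& add_child(std::string n, LibraryKind k) {
        children.push_back(std::make_unique<PseudoComponent>(std::move(n), k));
        return *children.back();
    }
};

// Data reference up the hierarchy: a node's power is the sum of the
// energy-library components at or beneath it.
double total_power_w(const PseudoComponent& node) {
    double sum = (node.library == LibraryKind::Energy) ? node.power_w : 0.0;
    for (const auto& child : node.children) sum += total_power_w(*child);
    return sum;
}

// Count nodes that have a given library attached.
int count_with_library(const PseudoComponent& node, LibraryKind kind) {
    int n = (node.library == kind) ? 1 : 0;
    for (const auto& child : node.children) n += count_with_library(*child, kind);
    return n;
}

// Hypothetical example loosely following the figure's naming.
std::unique_ptr<PseudoComponent> build_example_package() {
    auto package = std::make_unique<PseudoComponent>("Package", LibraryKind::Thermal);
    PseudoComponent& cores = package->add_child("Cores: Intermediate", LibraryKind::None);
    PseudoComponent& core0 = cores.add_child("Floor-plan: Core0", LibraryKind::Reliability);
    core0.add_child("Source: Inst$", LibraryKind::Energy).power_w = 0.5;
    core0.add_child("Source: ALUs", LibraryKind::Energy).power_w = 1.5;
    PseudoComponent& uncore = package->add_child("Uncore: Floor-plan", LibraryKind::Reliability);
    uncore.add_child("Source: L2$", LibraryKind::Energy).power_w = 2.0;
    return package;
}
```

In this sketch a thermal model attached to the package would call `total_power_w` on each floor-plan child to obtain its input, which is one way to read the "data reference" arrows in the figure.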

11 Standard API Functions
- The Energy Introspector framework provides a set of API functions to be called by user architecture simulators.

    while (simulation runs) {
        do (architecture timing simulation);  // User Architecture Simulation
        if (sampling point) {
            // Power Calculation
            for (all architecture components to calculate power) {
                EI_client->calculate_power(arch_component_id, current_time,
                                           sampling_interval, access_counters);
            }
            // Thermal Calculation: models are internally synchronized.
            EI_client->calculate_temperature(package_component_id, current_time,
                                             sampling_interval);
            // Reliability (Failure Rate) Calculation
            for (all components to calculate reliability) {
                EI_client->calculate_failure_rate(block_component_id, current_time,
                                                  sampling_interval);
            }
            // Probe any component to collect data
            int err_code = EI_client->pull_data(component_id, current_time,
                                                sampling_interval, data_type, &data);
            // Apply execution control (i.e., voltage scaling)
            err_code = EI_client->push_and_synchronize_data(component_id, current_time,
                                                            sampling_interval, EI_DATA_VOLTAGE,
                                                            &new_voltage);
        }
    }
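The call sequence on this slide can be exercised against a stand-in client. `MockEIClient` below is not the Energy Introspector implementation: only the method names and argument order follow the slide, while the parameter types, the return codes, the `EI_DATA_POWER` constant, the `AccessCounters` struct, and the toy energy model (1 nJ per access at 1.0 V, scaling with the square of voltage) are all assumptions made so the example compiles on its own. The temperature and failure-rate calls are omitted to keep it short.

```cpp
#include <cassert>
#include <map>
#include <string>

// EI_DATA_VOLTAGE is named on the slide; EI_DATA_POWER is a hypothetical addition.
enum DataType { EI_DATA_POWER, EI_DATA_VOLTAGE };

struct AccessCounters {
    unsigned long reads = 0;
    unsigned long writes = 0;
};

// Stand-in for the framework client. Parameter types, return codes, and the
// toy energy model are assumptions; only the method names follow the slide.
class MockEIClient {
public:
    void calculate_power(const std::string& id, double /*current_time*/,
                         double sampling_interval, const AccessCounters& counters) {
        assert(sampling_interval > 0.0);
        const double v = voltage(id);
        // Toy model: 1 nJ per access at 1.0 V, scaling with V^2.
        const double energy_j =
            1e-9 * v * v * static_cast<double>(counters.reads + counters.writes);
        power_w_[id] = energy_j / sampling_interval;
    }

    // Returns 0 on success, nonzero if the requested data is unavailable.
    int pull_data(const std::string& id, double /*current_time*/,
                  double /*sampling_interval*/, DataType type, double* out) const {
        if (out == nullptr) return 1;
        if (type == EI_DATA_VOLTAGE) { *out = voltage(id); return 0; }
        const auto it = power_w_.find(id);
        if (it == power_w_.end()) return 1;
        *out = it->second;
        return 0;
    }

    // Accepts only a positive voltage in this sketch; returns 0 on success.
    int push_and_synchronize_data(const std::string& id, double /*current_time*/,
                                  double /*sampling_interval*/, DataType type,
                                  const double* in) {
        if (in == nullptr || type != EI_DATA_VOLTAGE || *in <= 0.0) return 1;
        voltage_v_[id] = *in;
        return 0;
    }

private:
    double voltage(const std::string& id) const {
        const auto it = voltage_v_.find(id);
        return it == voltage_v_.end() ? 1.0 : it->second;  // default 1.0 V
    }
    std::map<std::string, double> power_w_;
    std::map<std::string, double> voltage_v_;
};

// One sampling point: compute power, probe it, and return the value in watts
// (or a negative number if the probe fails).
double sample_power_w(MockEIClient& ei, const std::string& id, double now,
                      double interval, const AccessCounters& counters) {
    ei.calculate_power(id, now, interval, counters);
    double power = 0.0;
    return ei.pull_data(id, now, interval, EI_DATA_POWER, &power) == 0 ? power : -1.0;
}
```

Calling `sample_power_w`, then `push_and_synchronize_data` with a lower voltage, then `sample_power_w` again reproduces the probe-then-control pattern of the slide: the second sample reflects the pushed voltage.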

12 Energy Introspector is an Enabler.
- The primary goal of the Energy Introspector is to enable exploration at the intersection of microarchitecture, power, thermal, and reliability.
- We provide several exemplary studies:
1. Multicore and Microfluidics Cooling in 3D ICs
2. Power, thermal, and throughput regulation via adaptive control algorithms in multicore processors
3. GPU power modeling with McPAT
4. Lifetime reliability characterization and management in multicore processors

13 Case I: Multicore and Microfluidics Cooling in 3D ICs
- How much leakage power saving or improvement in energy efficiency can microfluidics cooling achieve as a function of layering, pin-fin geometry, and pumping power?
- Energy Introspector captures interactions between multiple physical metrics.
1. Z. Wan, H. Xiao, Y. Joshi, and S. Yalamanchili, "Co-Design of Multicore Architectures and Microfluidic Cooling for 3D Stacked ICs," Therminic, 2013.
2. H. Xiao, Z. Wan, S. Yalamanchili, and Y. Joshi, "Leakage Power Characterization and Minimization over 3D Stacked Multi-core Chip with Microfluidic Cooling," SemiTherm, 2014.

14 Case II: Power, Thermal, and Throughput Regulation via Adaptive Control Algorithms in Multicore Processors
- Adaptive control algorithms utilize the DVFS capability of microprocessors to regulate power, thermal, or throughput to a constant level.
- Energy Introspector provides an interface to apply dynamic execution controls, e.g., DVFS.
1. N. Almoosa, W. Song, S. Yalamanchili, and Y. Wardi, "Throughput Regulation in Multicore Processors via IPA," CDC, 2012.
2. N. Almoosa, W. Song, S. Yalamanchili, and Y. Wardi, "A Power Capping Controller for Multicore Processors," ACC, 2012.
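Regulating power to a constant level through DVFS can be sketched with a very simple feedback loop. This is not the controller from the cited papers: it is a plain integral controller chosen for brevity, and the cubic power model, its 2.0 W/GHz^3 coefficient, the gain, and the frequency limits are all illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Toy plant: power grows with the cube of frequency (voltage is assumed to
// scale with frequency). The 2.0 W/GHz^3 coefficient is arbitrary.
double toy_power_w(double freq_ghz) {
    return 2.0 * freq_ghz * freq_ghz * freq_ghz;
}

// Integral controller: nudge frequency in proportion to the power error,
// then clamp to the supported DVFS range.
struct PowerCapController {
    double gain_ghz_per_w;
    double min_ghz;
    double max_ghz;

    double next_frequency(double freq_ghz, double target_w, double measured_w) const {
        const double proposed = freq_ghz + gain_ghz_per_w * (target_w - measured_w);
        return std::min(max_ghz, std::max(min_ghz, proposed));
    }
};

// Run the loop for a fixed number of sampling intervals and return the
// final frequency.
double regulate(const PowerCapController& ctrl, double start_ghz,
                double target_w, int steps) {
    assert(steps >= 0);
    double freq = start_ghz;
    for (int i = 0; i < steps; ++i) {
        freq = ctrl.next_frequency(freq, target_w, toy_power_w(freq));
    }
    return freq;
}
```

In a co-simulation, `toy_power_w` would be replaced by power probed from the framework at each sampling point, and the new operating point would be pushed back as an execution control. The gain must be small enough for the loop to settle; with the values used in the test below the frequency approaches the target from below without overshoot.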

15 Case III: GPU Power Modeling Using McPAT
- Basic models of McPAT (i.e., caches, interconnects, latches, etc.) are re-organized to configure a GPU architecture.
- Energy Introspector is configurable to model different microarchitecture or processor designs.
1. J. Lim, N. Lakshminarayana, H. Kim, W. Song, S. Yalamanchili, and W. Sung, "Power Modeling for GPU Architectures Using McPAT," TODAES, June 2014.

16 Case IV: Lifetime Reliability Characterization and Management in Multicore Processors
- Variance Reduction: Adaptive control on core execution reduces variances in lifetime reliability distribution across the multicore dies and improves overall processor-level lifetime reliability.
- Energy Introspector provides an integrated application-microarchitecture-power-thermal-lifetime reliability simulation.
[Figure: two plots of Normalized MTTF (0.5 to 1) versus Performability Threshold (100% to 50%). The first shows curves for μ=1.0 with σ=0.05, 0.10, 0.15, and 0.20. The second shows curves for μ=1.0, σ=0.20 and μ=0.80, σ=0.05, with the annotation "Practical Region of Lifetime".]
1. W. Song, S. Mukhopadhyay, and S. Yalamanchili, "Architectural Reliability: Lifetime Reliability Characterization and Management of Many-Core Processors," CAL, 2014.

17 Conclusion and Future Work
- In general practice, microarchitecture and physical properties have been analyzed in separate models and simulations.
- Such an approach has difficulty capturing the inter-dependency between various physical phenomena and their impacts on microarchitecture.
- Therefore, a holistic modeling and simulation environment is essential to enable explorations at the intersection of applications, microarchitecture, power/energy, thermal/cooling, reliability, etc.
- Our standard library interface is scalable to incorporate further physical phenomena and their models.

18 Summary
What is the major contribution of your research?
- An infrastructure to enable integrated application, microarchitecture, and multi-physics simulations, based on library implementation/integration of physical models. -> integrated infrastructure, scalable framework, standard API
What are the gaps you identify in the research coverage in your area?
- Lack of standards in using physical models
- Lack of parallel implementation of computationally intensive models or large core-count systems
- Fast compact models for thermal and power delivery
- Need for agreement on power models across technologies
- Need for higher-level (application-level) models

19 Summary (cont.)
What is the bigger picture for your research (i.e., synergetic or complementary projects)?
- Application-Architecture Co-design
- Measurement-based validation infrastructure
What major opportunities do you see for cross-pollination between your projects and others?
- Acquiring new, better physical models
- Participation in co-design activities, e.g., architecture-application, power delivery-power management
- Platform-neutral models of physical behaviors from applications
- Interaction with compiler and run time projects

20 Summary (cont.)
What is one thing that would make it easier/possible to leverage/use the results of other projects to further your own research?
- Standardization of APIs for accessing/exercising physical models at multiple levels of abstraction
What would you like to most see solved/addressed other than what you are working on?
- Characterization of fault behaviors as a function of physical phenomena and application demand
- Higher level application models to drive power-thermal-reliability analysis

Energy Introspector: Coordinated Architecture-Level Simulation of Processor Physics

Energy Introspector: Coordinated Architecture-Level Simulation of Processor Physics Energy Introspector: Coordinated Architecture-Level Simulation of Processor Physics William J. Song, Saibal Mukhopadhyay, Arun Rodrigues and Sudhakar Yalamanchili School of Electrical and Computer Engineering,

More information

Manifold: A Parallel Simulation Framework for Multicore Systems

Manifold: A Parallel Simulation Framework for Multicore Systems Manifold: A Parallel Simulation Framework for Multi Systems Jun Wang, Jesse Beu, Rishiraj Bheda, Tom Conte, Zhenjiang Dong, Chad Kersey, Mitchelle Rasquinha, George Riley, William Song, He Xiao, Peng Xu,

More information

Highly Parallel Wafer Level Reliability Systems with PXI SMUs

Highly Parallel Wafer Level Reliability Systems with PXI SMUs Highly Parallel Wafer Level Reliability Systems with PXI SMUs Submitted by National Instruments Overview Reliability testing has long served as a method of ensuring that semiconductor devices maintain

More information

MANAGING LIFETIME RELIABILITY, PERFORMANCE, AND POWER TRADEOFFS IN MULTICORE MICROARCHITECTURES

MANAGING LIFETIME RELIABILITY, PERFORMANCE, AND POWER TRADEOFFS IN MULTICORE MICROARCHITECTURES MANAGING LIFETIME RELIABILITY, PERFORMANCE, AND POWER TRADEOFFS IN MULTICORE MICROARCHITECTURES A Dissertation Presented to The Academic Faculty By William J. Song In Partial Fulfillment Of the Requirements

More information

Power and Thermal Models. for RAMP2

Power and Thermal Models. for RAMP2 Power and Thermal Models for 2 Jose Renau Department of Computer Engineering, University of California Santa Cruz http://masc.cse.ucsc.edu Motivation Performance not the only first order design parameter

More information

The University of Texas at Austin

The University of Texas at Austin EE382 (20): Computer Architecture - Parallelism and Locality Lecture 4 Parallelism in Hardware Mattan Erez The University of Texas at Austin EE38(20) (c) Mattan Erez 1 Outline 2 Principles of parallel

More information

Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems

Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems Ayse K. Coskun Electrical and Computer Engineering Department Boston University http://people.bu.edu/acoskun

More information

ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS

ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CHRISTOS KOZYRAKIS STANFORD ISCA-40 JUNE 27, 2013 Introduction 2 Current detailed simulators are slow (~200

More information

William Stallings Computer Organization and Architecture. Chapter 11 CPU Structure and Function

William Stallings Computer Organization and Architecture. Chapter 11 CPU Structure and Function William Stallings Computer Organization and Architecture Chapter 11 CPU Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data Write data Registers

More information

CS 2410 Mid term (fall 2015) Indicate which of the following statements is true and which is false.

CS 2410 Mid term (fall 2015) Indicate which of the following statements is true and which is false. CS 2410 Mid term (fall 2015) Name: Question 1 (10 points) Indicate which of the following statements is true and which is false. (1) SMT architectures reduces the thread context switch time by saving in

More information

ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS

ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CHRISTOS KOZYRAKIS STANFORD ISCA-40 JUNE 27, 2013 Introduction 2 Current detailed simulators are slow (~200

More information

Lecture 5: Instruction Pipelining. Pipeline hazards. Sequential execution of an N-stage task: N Task 2

Lecture 5: Instruction Pipelining. Pipeline hazards. Sequential execution of an N-stage task: N Task 2 Lecture 5: Instruction Pipelining Basic concepts Pipeline hazards Branch handling and prediction Zebo Peng, IDA, LiTH Sequential execution of an N-stage task: 3 N Task 3 N Task Production time: N time

More information

A Scheme of Predictor Based Stream Buffers. Bill Hodges, Guoqiang Pan, Lixin Su

A Scheme of Predictor Based Stream Buffers. Bill Hodges, Guoqiang Pan, Lixin Su A Scheme of Predictor Based Stream Buffers Bill Hodges, Guoqiang Pan, Lixin Su Outline Background and motivation Project hypothesis Our scheme of predictor-based stream buffer Predictors Predictor table

More information

Itanium 2 Processor Microarchitecture Overview

Itanium 2 Processor Microarchitecture Overview Itanium 2 Processor Microarchitecture Overview Don Soltis, Mark Gibson Cameron McNairy, August 2002 Block Diagram F 16KB L1 I-cache Instr 2 Instr 1 Instr 0 M/A M/A M/A M/A I/A Template I/A B B 2 FMACs

More information

AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors

AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors Computer Sciences Department University of Wisconsin Madison http://www.cs.wisc.edu/~ericro/ericro.html ericro@cs.wisc.edu High-Performance

More information

XT Node Architecture

XT Node Architecture XT Node Architecture Let s Review: Dual Core v. Quad Core Core Dual Core 2.6Ghz clock frequency SSE SIMD FPU (2flops/cycle = 5.2GF peak) Cache Hierarchy L1 Dcache/Icache: 64k/core L2 D/I cache: 1M/core

More information

VLIW Digital Signal Processor. Michael Chang. Alison Chen. Candace Hobson. Bill Hodges

VLIW Digital Signal Processor. Michael Chang. Alison Chen. Candace Hobson. Bill Hodges VLIW Digital Signal Processor Michael Chang. Alison Chen. Candace Hobson. Bill Hodges Introduction Functionality ISA Implementation Functional blocks Circuit analysis Testing Off Chip Memory Status Things

More information

Problem Set 1 Solutions

Problem Set 1 Solutions CSE 260 Digital Computers: Organization and Logical Design Jon Turner Problem Set 1 Solutions 1. Give a brief definition of each of the following parts of a computer system: CPU, main memory, floating

More information

ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013

ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013 ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013 Professor: Sherief Reda School of Engineering, Brown University 1. [from Debois et al. 30 points] Consider the non-pipelined implementation of

More information

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language.

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language. Architectures & instruction sets Computer architecture taxonomy. Assembly language. R_B_T_C_ 1. E E C E 2. I E U W 3. I S O O 4. E P O I von Neumann architecture Memory holds data and instructions. Central

More information

Sam Naffziger. Gary Hammond. Next Generation Itanium Processor Overview. Lead Circuit Architect Microprocessor Technology Lab HP Corporation

Sam Naffziger. Gary Hammond. Next Generation Itanium Processor Overview. Lead Circuit Architect Microprocessor Technology Lab HP Corporation Next Generation Itanium Processor Overview Gary Hammond Principal Architect Enterprise Platform Group Corporation August 27-30, 2001 Sam Naffziger Lead Circuit Architect Microprocessor Technology Lab HP

More information

Hardware-Based Speculation

Hardware-Based Speculation Hardware-Based Speculation Execute instructions along predicted execution paths but only commit the results if prediction was correct Instruction commit: allowing an instruction to update the register

More information

Evaluation of RISC-V RTL with FPGA-Accelerated Simulation

Evaluation of RISC-V RTL with FPGA-Accelerated Simulation Evaluation of RISC-V RTL with FPGA-Accelerated Simulation Donggyu Kim, Christopher Celio, David Biancolin, Jonathan Bachrach, Krste Asanovic CARRV 2017 10/14/2017 Evaluation Methodologies For Computer

More information

Computer and Hardware Architecture I. Benny Thörnberg Associate Professor in Electronics

Computer and Hardware Architecture I. Benny Thörnberg Associate Professor in Electronics Computer and Hardware Architecture I Benny Thörnberg Associate Professor in Electronics Hardware architecture Computer architecture The functionality of a modern computer is so complex that no human can

More information

Computer Performance Evaluation and Benchmarking. EE 382M Dr. Lizy Kurian John

Computer Performance Evaluation and Benchmarking. EE 382M Dr. Lizy Kurian John Computer Performance Evaluation and Benchmarking EE 382M Dr. Lizy Kurian John Desirable features for modeling/evaluation techniques Accurate Not expensive Non-invasive User-friendly Fast Easy to change

More information

TEAPOT: A Toolset for Evaluating Performance, Power and Image Quality on Mobile Graphics Systems

TEAPOT: A Toolset for Evaluating Performance, Power and Image Quality on Mobile Graphics Systems International Conference on Supercomputing June 2013 TEAPOT: A Toolset for Evaluating Performance, Power and Image Quality on Mobile Graphics Systems Joan-Manuel Parcerisa Polychronis Xekalakis Computer

More information

Multithreaded Processors. Department of Electrical Engineering Stanford University

Multithreaded Processors. Department of Electrical Engineering Stanford University Lecture 12: Multithreaded Processors Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 12-1 The Big Picture Previous lectures: Core design for single-thread

More information

CPS104 Computer Organization and Programming Lecture 20: Superscalar processors, Multiprocessors. Robert Wagner

CPS104 Computer Organization and Programming Lecture 20: Superscalar processors, Multiprocessors. Robert Wagner CS104 Computer Organization and rogramming Lecture 20: Superscalar processors, Multiprocessors Robert Wagner Faster and faster rocessors So much to do, so little time... How can we make computers that

More information

Processors. Young W. Lim. May 12, 2016

Processors. Young W. Lim. May 12, 2016 Processors Young W. Lim May 12, 2016 Copyright (c) 2016 Young W. Lim. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version

More information

CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. VLIW, Vector, and Multithreaded Machines

CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. VLIW, Vector, and Multithreaded Machines CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture VLIW, Vector, and Multithreaded Machines Assigned 3/24/2019 Problem Set #4 Due 4/5/2019 http://inst.eecs.berkeley.edu/~cs152/sp19

More information

Good luck and have fun!

Good luck and have fun! Midterm Exam October 13, 2014 Name: Problem 1 2 3 4 total Points Exam rules: Time: 90 minutes. Individual test: No team work! Open book, open notes. No electronic devices, except an unprogrammed calculator.

More information

Module 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1

Module 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1 Module 2 Embedded Processors and Memory Version 2 EE IIT, Kharagpur 1 Lesson 8 General Purpose Processors - I Version 2 EE IIT, Kharagpur 2 In this lesson the student will learn the following Architecture

More information

" # " $ % & ' ( ) * + $ " % '* + * ' "

 #  $ % & ' ( ) * + $  % '* + * ' ! )! # & ) * + * + * & *,+,- Update Instruction Address IA Instruction Fetch IF Instruction Decode ID Execute EX Memory Access ME Writeback Results WB Program Counter Instruction Register Register File

More information

SRAMs to Memory. Memory Hierarchy. Locality. Low Power VLSI System Design Lecture 10: Low Power Memory Design

SRAMs to Memory. Memory Hierarchy. Locality. Low Power VLSI System Design Lecture 10: Low Power Memory Design SRAMs to Memory Low Power VLSI System Design Lecture 0: Low Power Memory Design Prof. R. Iris Bahar October, 07 Last lecture focused on the SRAM cell and the D or D memory architecture built from these

More information

are Softw Instruction Set Architecture Microarchitecture are rdw

are Softw Instruction Set Architecture Microarchitecture are rdw Program, Application Software Programming Language Compiler/Interpreter Operating System Instruction Set Architecture Hardware Microarchitecture Digital Logic Devices (transistors, etc.) Solid-State Physics

More information

Mesocode: Optimizations for Improving Fetch Bandwidth of Future Itanium Processors

Mesocode: Optimizations for Improving Fetch Bandwidth of Future Itanium Processors : Optimizations for Improving Fetch Bandwidth of Future Itanium Processors Marsha Eng, Hong Wang, Perry Wang Alex Ramirez, Jim Fung, and John Shen Overview Applications of for Itanium Improving fetch bandwidth

More information

A Framework for Modeling GPUs Power Consumption

A Framework for Modeling GPUs Power Consumption A Framework for Modeling GPUs Power Consumption Sohan Lal, Jan Lucas, Michael Andersch, Mauricio Alvarez-Mesa, Ben Juurlink Embedded Systems Architecture Technische Universität Berlin Berlin, Germany January

More information

Agenda. What is Ryzen? History. Features. Zen Architecture. SenseMI Technology. Master Software. Benchmarks

Agenda. What is Ryzen? History. Features. Zen Architecture. SenseMI Technology. Master Software. Benchmarks Ryzen Agenda What is Ryzen? History Features Zen Architecture SenseMI Technology Master Software Benchmarks The Ryzen Chip What is Ryzen? CPU chip family released by AMD in 2017, which uses their latest

More information

The ARM10 Family of Advanced Microprocessor Cores

The ARM10 Family of Advanced Microprocessor Cores The ARM10 Family of Advanced Microprocessor Cores Stephen Hill ARM Austin Design Center 1 Agenda Design overview Microarchitecture ARM10 o o Memory System Interrupt response 3. Power o o 4. VFP10 ETM10

More information

CPU ARCHITECTURE. QUESTION 1 Explain how the width of the data bus and system clock speed affect the performance of a computer system.

CPU ARCHITECTURE. QUESTION 1 Explain how the width of the data bus and system clock speed affect the performance of a computer system. CPU ARCHITECTURE QUESTION 1 Explain how the width of the data bus and system clock speed affect the performance of a computer system. ANSWER 1 Data Bus Width the width of the data bus determines the number

More information

CS252 Prerequisite Quiz. Solutions Fall 2007

CS252 Prerequisite Quiz. Solutions Fall 2007 CS252 Prerequisite Quiz Krste Asanovic Solutions Fall 2007 Problem 1 (29 points) The followings are two code segments written in MIPS64 assembly language: Segment A: Loop: LD r5, 0(r1) # r5 Mem[r1+0] LD

More information

Micro-programmed Control Ch 15

Micro-programmed Control Ch 15 Micro-programmed Control Ch 15 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics 1 Hardwired Control (4) Complex Fast Difficult to design Difficult to modify Lots of

More information

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU 1-6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU Product Overview Introduction 1. ARCHITECTURE OVERVIEW The Cyrix 6x86 CPU is a leader in the sixth generation of high

More information

Machine Instructions vs. Micro-instructions. Micro-programmed Control Ch 15. Machine Instructions vs. Micro-instructions (2) Hardwired Control (4)

Machine Instructions vs. Micro-instructions. Micro-programmed Control Ch 15. Machine Instructions vs. Micro-instructions (2) Hardwired Control (4) Micro-programmed Control Ch 15 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics 1 Machine Instructions vs. Micro-instructions Memory execution unit CPU control memory

More information

Micro-programmed Control Ch 15

Micro-programmed Control Ch 15 Micro-programmed Control Ch 15 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics 1 Hardwired Control (4) Complex Fast Difficult to design Difficult to modify Lots of

More information
