PIPELINING AND PROCESSOR PERFORMANCE

Size: px
Start display at page:

Download "PIPELINING AND PROCESSOR PERFORMANCE"

Transcription

1 PIPELINING AND PROCESSOR PERFORMANCE Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 1, John L. Hennessy and David A. Patterson, Morgan Kaufmann, 2011 ADVANCED COMPUTER ARCHITECTURES ARQUITECTURAS AVANÇADAS DE COMPUTADORES (AAC)

2 Outline 2 Revision Single cycle processor Multi cycle processor Processor pipelining Instruction flow in a pipeline processor Execution conflicts Evaluating processor performance

3 Revision of a RISC architectures Single cycle processor 3

4 Revision of a RISC architectures Multi-cycle processor 4 Enable &OF Enable &OF EX Enable EX MEM Enable MEM WB Enable WB NEXT PC 4 JMP CTRL COND AD FLAGS Clock + PC Address Instructions Memory COND OP SEL Decoder SEL A SEL B IMM R[AA] MUX SEL A SEL B A ALU OP Flags S Data Data Memory Address WE CLK Data Clock Data INSTRUCTION INST IMM MEM WRITE SEL OUT WE DA BA AA IMM R[BA] MUX B MUX AA A BA B Asynchronous read ports Register File (RF) DA WE DATA Synchronous write ports CLK

5 Instruction flow Single cycle processor 5 Each instruction takes 1 cycle to execute Clock period limited by the worst case path of the whole processor Example: for (i=0,aux=0; i<100; i++) { } if (V[i] > aux){ aux=v[i]; } LOOP_NXT: LI R2,100 SLL R2,R2,2 LI R1,4 MOVE R3,R0 LW R4,100(R1) ADDI R1,R1,4 SUB R5,R4,R3 BLEZ R5,LOOP_END MOVE R3,R4 Some of the used instructions (e.g., LI and MOVE) do not actual exist in MIPS64 instruction set. Hence, they should be replaced with an equivalent instruction such as OR DR,R0,operand. However, they are left here to simplify the reading of the Assembly code. LOOP_END: BNE R2,R1,LOOP_NXT

6 Instruction flow Single cycle processor 6 Each instruction takes 1 cycle to execute Clock period limited by the worst case path of the whole processor Cycle 1 Cycle 2 Cycle 3 Cycle 4 LI R2,100,,EX,MEM,WB SLL R2,R2,2,,EX,MEM,WB LI R1,4,,EX,MEM,WB MOVE R3,R0,,EX,MEM,WB LOOP_NXT: LW R4,100(R1) ADDI R1,R1,4 SUB R5,R4,R3 BLEZ R5,LOOP_END MOVE R3,R4 LOOP_END: BNE R2,R1,LOOP_NXT

7 Instruction flow Multi cycle processor 7 Each instruction takes 5 cycles to execute Clock period limited by the worst case path of all stages The working frequency is higher but the instruction throughput is lower LI R2,100 SLL R2,R2,2 LI R1,4 MOVE R3,R0 LOOP_NXT: LW R4,100(R1) ADDI R1,R1,4 SUB R5,R4,R3 BLEZ R5,LOOP_END MOVE R3,R4 LOOP_END: BNE R2,R1,LOOP_NXT

8 Instruction flow Pipeline processor 8 The processor simultaneously executes a part of up to 5 different instructions each clock cycle The instruction throughput increases by up to 5x LI R2,100 SLL R2,R2,2 LI R1,4 Pipeline overview during clock cycle 7 MOVE R3,R0 LOOP_NXT: LW R4,100(R1) ADDI R1,R1,4 SUB R5,R4,R3 BLEZ R5,LOOP_END MOVE R3,R4 LOOP_END: BNE R2,R1,LOOP_NXT

9 Instruction flow Pipeline processor 9 The instruction throughput can increase by 5x (potential) Much higher performance but LI R2,100 SLL R2,R2,2 LI R1,4 MOVE R3,R0 LOOP_NXT: LW R4,100(R1) ADDI R1,R1,4 SUB R5,R4,R3 BLEZ R5,LOOP_END MOVE R3,R4 LOOP_END: BNE R2,R1,LOOP_NXT

10 Instruction flow Pipeline processor 10 The instruction throughput can increase by 5x (potential) Much higher performance but it generates conflicts that must be solved to guarantee the correct behaviour LI R2,100 R2 is valid SLL R2,R2,2 Read R2 EX MEM WB LI R1,4 R1 is valid MOVE R3,R0 R3 is valid LOOP_NXT: LW R4,100(R1) Read R1 EX MEM WB R4 is valid ADDI R1,R1,4 SUB R5,R4,R3 BLEZ R5,LOOP_END MOVE R3,R4 Read R4,R3 EX MEM WB Read R5 EX MEM WB R5 is valid LOOP_END: BNE R2,R1,LOOP_NXT

11 Instruction flow Solving conflicts from pipelining 11 The conflicts can be solved by delaying instruction issue whenever it is necessary Whenever a conflict is found the instruction pipeline is stalled The real instruction throughput from pipelining is smaller than 5x LI R2,100 R2 is valid SLL R2,R2,2 Read R2 EX MEM WB LI R1,4 R1 is valid MOVE R3,R0 R3 is valid LOOP_NXT: LW R4,100(R1) Read R1 EX MEM WB R4 is valid ADDI R1,R1,4 SUB R5,R4,R3 Read R4 EX BLEZ R5,LOOP_END MOVE R3,R4 LOOP_END: BNE R2,R1,LOOP_NXT

12 Instruction flow Solving conflicts from pipelining 12 The additional logic to detect and solve conflicts increases the clock period The performance increase is even smaller than expected The clock period must increase to allow for conflict detection and resolution LI R2,100 SLL R2,R2,2 Read R2 EX MEM WB LI R1,4 MOVE R3,R0 LOOP_NXT: LW R4,100(R1) Read R1 EX MEM W ADDI R1,R1,4 EX ME SUB R5,R4,R3 Sta BLEZ R5,LOOP_END Sta MOVE R3,R4 LOOP_END: BNE R2,R1,LOOP_NXT

13 Processor performance 13 The processor performance depends on a number of factors: Clock frequency Instruction Set Architecture (e.g., RISC vs CISC) ISA implementation (e.g., single cycle vs multi cycle vs pipeline) Benchmarks (programs) used Compiler optimizations Memory bandwidth and latency What is the best metric to assess processor performance?

14 Measuring processor performance 14 Frequency (GHz) Does not take into account architectural differences (e.g., ISA) MIPS (million instructions per second) MIPS = #Instructions Execution Time [μs] = Clock Frequency CPI 10 6 Valid only when using the exact same program, compiler and OS Requires that both processors use the same ISA 1 CISC instruction is equivalent to several RISC instructions, but takes longer to execute MFLOPS (million floating point operations per second) Has the same problems as the MIPS metric Valid only for floating point intensive programs e.g., does not make sense for H.264 video compression CPI = Cycles Per Instruction

15 Measuring processor performance 15 Use time to measure processor performance Requires the implementation (or at least simulation) of the proposed processor 1 Performance P = Execution Time P What does it mean to say: Processor P A is x times faster than processor P B x = Performance P A Performance P B = Execution Time P B Execuction Time P A Speed-up x is also called the speedup of processor P A versus processor P B

16 Measuring processor performance: Task selection 16 What are the best benchmarks? Real applications of interest to the user Different users different benchmarks Representative programs, e.g., SPEC CPU 2006 Synthetic Programs, e.g., Dhrystone More realistic Fit for real systems Simpler programs Easier to test in simulation Different metrics: Execution rhythm (tasks/second) vs latency (seconds/task)

17 17 Measuring processor performance: SPEC CPU 2006 (integer) Benchmark Lang. Application Area Description 400.Perlbench C Programming Language 401.bzip2 C Compression Derived from Perl V The workload includes SpamAssassin, MHonArc (an indexer), and specdiff (SPEC's tool that checks benchmark outputs). Julian Seward's bzip2 version 1.0.3, modified to do most work in memory, rather than doing I/O. 403.gcc C C Compiler Based on gcc Version 3.2, generates code for Opteron. 429.mcf C Combinatorial Optimization Vehicle scheduling. Uses a network simplex algorithm (which is also used in commercial products) to schedule public transport. 445.gobmk C Artificial Intelligence: Go Plays the game of Go, a simply described but deeply complex game. 456.hmmer C Search Gene Sequence Protein sequence analysis using profile hidden Markov models (profile HMMs) 458.sjeng C Artificial Intelligence: chess A highly-ranked chess program that also plays several chess variants. 462.libquantum C Physics / Quantum Computing 464.h264ref C Video Compression 471.omnetpp C++ Discrete Event Simulation Simulates a quantum computer, running Shor's polynomial-time factorization algorithm. A reference implementation of H.264/AVC, encodes a videostream using 2 parameter sets. The H.264/AVC standard is expected to replace MPEG2 Uses the OMNet++ discrete event simulator to model a large Ethernet campus network. 473.astar C++ Path-finding Algorithms Pathfinding library for 2D maps, including the well known A* algorithm. 483.xalancbmk C++ XML Processing A modified version of Xalan-C++, which transforms XML documents to other document types.

18 18 Measuring processor performance: SPEC CPU 2006 (floating point) Benchmark Lang. Application Area Description 410.bwaves Fortran Fluid Dynamics Computes 3D transonic transient laminar viscous flow. 416.gamess Fortran Quantum Chemistry. Gamess implements a wide range of quantum chemical computations. 433.milc C Quantum Chromodynamics A gauge field generating program for lattice gauge theory programs. 434.zeusmp Fortran Physics / CFD Computational fluid dynamics code for simulating of astrophysical phenomena. 435.gromacs Biochemistry / Molecular C,Fortran Dynamics 436.cactusADM C,Fortran Physics / General Relativity Solves the Einstein evolution equations using a staggered-leapfrog method 437.leslie3d Fortran Fluid Dynamics Large-Eddy Simulations with Linear-Eddy Model in 3D. 444.namd C++ Biology / Molecular Dynamics Simulates large biomolecular systems. Molecular dynamics, i.e. simulate Newtonian equations of motion for hundreds to millions of particles. The test case simulates protein Lysozyme in a solution. 447.dealII C++ Finite Element Analysis Program library targeted at adaptive finite elements and error estimation. 450.soplex C++ Linear Programming, Optimization Solves a linear program using a simplex algorithm and sparse linear algebra. 453.povray C++ Image Ray-tracing Image rendering of a 1280x1024 anti-aliased landscape. 454.calculix C,Fortran Structural Mechanics Finite element code for linear and nonlinear 3D structural applications. 459.GemsFDTD Fortran Computational Electromagnetics 465.Tonto Fortran Quantum Chemistry An open source quantum chemistry package 470.lbm C Fluid Dynamics Simulates incompressible fluids in 3D 481.wrf C,Fortran Weather Weather modeling from scales of meters to thousands of kilometers. 482.sphinx3 C Speech recognition A widely-known speech recognition system from Carnegie Mellon University Solves the 3D Maxwell equations in 3D using the finite-difference time-domain (FDTD) method.

19 19 Measuring processor performance: Averaging performance Normal All benchmarks have the same weight Weighted (W i ) Benchmarks are weighted by frequency or relevance Arithmetic Mean 1 N T i N W i T i N W i N i=1 i=1 i=1 Harmonic Mean (Less sensitive to large outliers and increases the influence of small values) 1 N N i=1 T i 1 = N i=1 n w i i=1 w i T i n i=1 N 1 T i Alternative: Instead of using Execution Time T i Use speedup regarding a standard reference, Speedup i = SPECs use a SPARStation (SUN Sparc10) as reference T i Ref T i

20 20 Measuring processor performance: Amdahl's Law Consider that we improve processor performance by better designing some part of it E.g., improve floating point calculations by 3x What is the actual improvement? Speedup = T T = T T FP + T(Non FP) The Non-FP instructions have the same execution time: T Non FP = T(Non FP) FP Instructions execution time: T(FP) T FP = = 1 Speedup (FP) 3 T(FP) T Execution time in the original processor T Execution time in the improved processor

21 21 Measuring processor performance: Amdahl's Law Consider that we improve processor performance by better designing some part of it E.g., improve floating point calculations by 3x What is the actual improvement? Speedup = T = 1 T(FP) Speedup(FP) +T(Non FP) T(FP)/T +T(Non FP)/T Speedup(FP) Lets us consider that, in the original processor, the fraction of time executing floating point instructions is α FP = T FP = 0.25 T Speedup = 1 = 1 α(fp) Speedup(FP) + 1 α(fp) = 1.2 T Execution time in the original processor T Execution time in the improved processor

22 22 Measuring processor performance: Amdahl's Law (corollary) Consider that we improve processor performance by better designing some part of it E.g., improve floating point calculations by 3x What is the maximum improvement possible? Speedup = 1 = 1 α(fp) Speedup(FP) + 1 α(fp) = 1.2 Consider that Speedup FP + Maximum achievable Speedup = 1 = 1 = 1.33(3) 1 α(fp) 0.75 T Execution time in the original processor T Execution time in the improved processor

23 Measuring processor performance: Amdahl's Law (summary) 23 Execution Time: T Improved machine = T Reference Machine 1 Fraction Improved + Fraction Improved Speedup Improved Actual Speedup: Speedup Improved machine = 1 Fraction Improved 1 + Fraction Improved Speedup Improved Maximum achievable Speedup (Speedup Improved + ): Maximum Speedup Improved Machine = 1 1 Fraction Improved

24 24 Next lesson More on processor pipelining Conflict identification Solving conflicts

SEN361 Computer Organization. Prof. Dr. Hasan Hüseyin BALIK (2 nd Week)

SEN361 Computer Organization. Prof. Dr. Hasan Hüseyin BALIK (2 nd Week) + SEN361 Computer Organization Prof. Dr. Hasan Hüseyin BALIK (2 nd Week) + Outline 1. Overview 1.1 Basic Concepts and Computer Evolution 1.2 Performance Issues + 1.2 Performance Issues + Designing for

More information

EKT 303 WEEK Pearson Education, Inc., Hoboken, NJ. All rights reserved.

EKT 303 WEEK Pearson Education, Inc., Hoboken, NJ. All rights reserved. + EKT 303 WEEK 2 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. Chapter 2 + Performance Issues + Designing for Performance The cost of computer systems continues to drop dramatically,

More information

UCB CS61C : Machine Structures

UCB CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 36 Performance 2010-04-23 Lecturer SOE Dan Garcia How fast is your computer? Every 6 months (Nov/June), the fastest supercomputers in

More information

Performance. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Performance. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Performance Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Defining Performance (1) Which airplane has the best performance? Boeing 777 Boeing

More information

UCB CS61C : Machine Structures

UCB CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 38 Performance 2008-04-30 Lecturer SOE Dan Garcia How fast is your computer? Every 6 months (Nov/June), the fastest supercomputers in

More information

Architecture of Parallel Computer Systems - Performance Benchmarking -

Architecture of Parallel Computer Systems - Performance Benchmarking - Architecture of Parallel Computer Systems - Performance Benchmarking - SoSe 18 L.079.05810 www.uni-paderborn.de/pc2 J. Simon - Architecture of Parallel Computer Systems SoSe 2018 < 1 > Definition of Benchmark

More information

INSTRUCTION LEVEL PARALLELISM

INSTRUCTION LEVEL PARALLELISM INSTRUCTION LEVEL PARALLELISM Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 2 and Appendix H, John L. Hennessy and David A. Patterson,

More information

CPU Performance Evaluation: Cycles Per Instruction (CPI) Most computers run synchronously utilizing a CPU clock running at a constant clock rate:

CPU Performance Evaluation: Cycles Per Instruction (CPI) Most computers run synchronously utilizing a CPU clock running at a constant clock rate: CPI CPU Performance Evaluation: Cycles Per Instruction (CPI) Most computers run synchronously utilizing a CPU clock running at a constant clock rate: Clock cycle where: Clock rate = 1 / clock cycle f =

More information

A Fast Instruction Set Simulator for RISC-V

A Fast Instruction Set Simulator for RISC-V A Fast Instruction Set Simulator for RISC-V Maxim.Maslov@esperantotech.com Vadim.Gimpelson@esperantotech.com Nikita.Voronov@esperantotech.com Dave.Ditzel@esperantotech.com Esperanto Technologies, Inc.

More information

Computer Architecture. Introduction

Computer Architecture. Introduction to Computer Architecture 1 Computer Architecture What is Computer Architecture From Wikipedia, the free encyclopedia In computer engineering, computer architecture is a set of rules and methods that describe

More information

Pipelining. CS701 High Performance Computing

Pipelining. CS701 High Performance Computing Pipelining CS701 High Performance Computing Student Presentation 1 Two 20 minute presentations Burks, Goldstine, von Neumann. Preliminary Discussion of the Logical Design of an Electronic Computing Instrument.

More information

Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors

Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Resource-Conscious Scheduling for Energy Efficiency on Andreas Merkel, Jan Stoess, Frank Bellosa System Architecture Group KIT The cooperation of Forschungszentrum Karlsruhe GmbH and Universität Karlsruhe

More information

CS377P Programming for Performance Single Thread Performance Out-of-order Superscalar Pipelines

CS377P Programming for Performance Single Thread Performance Out-of-order Superscalar Pipelines CS377P Programming for Performance Single Thread Performance Out-of-order Superscalar Pipelines Sreepathi Pai UTCS September 14, 2015 Outline 1 Introduction 2 Out-of-order Scheduling 3 The Intel Haswell

More information

Lightweight Memory Tracing

Lightweight Memory Tracing Lightweight Memory Tracing Mathias Payer*, Enrico Kravina, Thomas Gross Department of Computer Science ETH Zürich, Switzerland * now at UC Berkeley Memory Tracing via Memlets Execute code (memlets) for

More information

Last time. Lecture #29 Performance & Parallel Intro

Last time. Lecture #29 Performance & Parallel Intro CS61C L29 Performance & Parallel (1) inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #29 Performance & Parallel Intro 2007-8-14 Scott Beamer, Instructor Paper Battery Developed by Researchers

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Class Project Report

Class Project Report 15-745 Class Project Report Brendan Meeder (bmeeder@cs), Jamie Morgenstern (jaimiemmt@cs), Richard Peng (richard.peng@gmail.com) April 27, 2011 Abstract Selecting an ordering for running optimizations

More information

Improving Cache Performance by Exploi7ng Read- Write Disparity. Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A.

Improving Cache Performance by Exploi7ng Read- Write Disparity. Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Improving Cache Performance by Exploi7ng Read- Write Disparity Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Jiménez Summary Read misses are more cri?cal than write misses

More information

Improving Cache Performance by Exploi7ng Read- Write Disparity. Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A.

Improving Cache Performance by Exploi7ng Read- Write Disparity. Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Improving Cache Performance by Exploi7ng Read- Write Disparity Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Jiménez Summary Read misses are more cri?cal than write misses

More information

Performance Characterization of SPEC CPU Benchmarks on Intel's Core Microarchitecture based processor

Performance Characterization of SPEC CPU Benchmarks on Intel's Core Microarchitecture based processor Performance Characterization of SPEC CPU Benchmarks on Intel's Core Microarchitecture based processor Sarah Bird ϕ, Aashish Phansalkar ϕ, Lizy K. John ϕ, Alex Mericas α and Rajeev Indukuru α ϕ University

More information

Information System Architecture Natawut Nupairoj Ph.D. Department of Computer Engineering, Chulalongkorn University

Information System Architecture Natawut Nupairoj Ph.D. Department of Computer Engineering, Chulalongkorn University 2110684 Information System Architecture Natawut Nupairoj Ph.D. Department of Computer Engineering, Chulalongkorn University Agenda Capacity Planning Determining the production capacity needed by an organization

More information

Footprint-based Locality Analysis

Footprint-based Locality Analysis Footprint-based Locality Analysis Xiaoya Xiang, Bin Bao, Chen Ding University of Rochester 2011-11-10 Memory Performance On modern computer system, memory performance depends on the active data usage.

More information

Adaptive prefetch for multicores

Adaptive prefetch for multicores Departament d Informàtica de Sistemes i Computadors Universitat Politècnica de València Adaptive prefetch for multicores MASTER S THESIS Máster en Ingeniería de Computadores Author Vicent Selfa Oliver

More information

Computing System Fundamentals/Trends + Review of Performance Evaluation and ISA Design

Computing System Fundamentals/Trends + Review of Performance Evaluation and ISA Design Computing System Fundamentals/Trends + Review of Performance Evaluation and ISA Design Computing Element Choices: Computing Element Programmability Spatial vs. Temporal Computing Main Processor Types/Applications

More information

DYNAMIC INSTRUCTION SCHEDULING WITH SCOREBOARD

DYNAMIC INSTRUCTION SCHEDULING WITH SCOREBOARD DYNAMIC INSTRUCTION SCHEDULING WITH SCOREBOARD Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 3, John L. Hennessy and David A. Patterson,

More information

Detection of Weak Spots in Benchmarks Memory Space by using PCA and CA

Detection of Weak Spots in Benchmarks Memory Space by using PCA and CA Leonardo Electronic Journal of Practices and Technologies ISSN 1583-1078 Issue 16, January-June 2010 p. 43-52 Detection of Weak Spots in Memory Space by using PCA and CA Abdul Kareem PARCHUR *, Fazal NOORBASHA

More information

Energy Models for DVFS Processors

Energy Models for DVFS Processors Energy Models for DVFS Processors Thomas Rauber 1 Gudula Rünger 2 Michael Schwind 2 Haibin Xu 2 Simon Melzner 1 1) Universität Bayreuth 2) TU Chemnitz 9th Scheduling for Large Scale Systems Workshop July

More information

NightWatch: Integrating Transparent Cache Pollution Control into Dynamic Memory Allocation Systems

NightWatch: Integrating Transparent Cache Pollution Control into Dynamic Memory Allocation Systems NightWatch: Integrating Transparent Cache Pollution Control into Dynamic Memory Allocation Systems Rentong Guo 1, Xiaofei Liao 1, Hai Jin 1, Jianhui Yue 2, Guang Tan 3 1 Huazhong University of Science

More information

A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach

A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. Mishra Onur Mutlu Chita R. Das Executive summary Problem: Current day NoC designs are agnostic to application requirements

More information

Course web site: teaching/courses/car. Piazza discussion forum:

Course web site:   teaching/courses/car. Piazza discussion forum: Announcements Course web site: http://www.inf.ed.ac.uk/ teaching/courses/car Lecture slides Tutorial problems Courseworks Piazza discussion forum: http://piazza.com/ed.ac.uk/spring2018/car Tutorials start

More information

TDT4255 Computer Design. Lecture 1. Magnus Jahre

TDT4255 Computer Design. Lecture 1. Magnus Jahre 1 TDT4255 Computer Design Lecture 1 Magnus Jahre 2 Outline Practical course information Chapter 1: Computer Abstractions and Technology 3 Practical Course Information 4 TDT4255 Computer Design TDT4255

More information

Computer Performance Evaluation: Cycles Per Instruction (CPI)

Computer Performance Evaluation: Cycles Per Instruction (CPI) Computer Performance Evaluation: Cycles Per Instruction (CPI) Most computers run synchronously utilizing a CPU clock running at a constant clock rate: where: Clock rate = 1 / clock cycle A computer machine

More information

Addressing End-to-End Memory Access Latency in NoC-Based Multicores

Addressing End-to-End Memory Access Latency in NoC-Based Multicores Addressing End-to-End Memory Access Latency in NoC-Based Multicores Akbar Sharifi, Emre Kultursay, Mahmut Kandemir and Chita R. Das The Pennsylvania State University University Park, PA, 682, USA {akbar,euk39,kandemir,das}@cse.psu.edu

More information

Energy Proportional Datacenter Memory. Brian Neel EE6633 Fall 2012

Energy Proportional Datacenter Memory. Brian Neel EE6633 Fall 2012 Energy Proportional Datacenter Memory Brian Neel EE6633 Fall 2012 Outline Background Motivation Related work DRAM properties Designs References Background The Datacenter as a Computer Luiz André Barroso

More information

DYNAMIC AND SPECULATIVE INSTRUCTION SCHEDULING

DYNAMIC AND SPECULATIVE INSTRUCTION SCHEDULING DYNAMIC AND SPECULATIVE INSTRUCTION SCHEDULING Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 3, John L. Hennessy and David A. Patterson,

More information

Sandbox Based Optimal Offset Estimation [DPC2]

Sandbox Based Optimal Offset Estimation [DPC2] Sandbox Based Optimal Offset Estimation [DPC2] Nathan T. Brown and Resit Sendag Department of Electrical, Computer, and Biomedical Engineering Outline Motivation Background/Related Work Sequential Offset

More information

Performance, Cost and Amdahl s s Law. Arquitectura de Computadoras

Performance, Cost and Amdahl s s Law. Arquitectura de Computadoras Performance, Cost and Amdahl s s Law Arquitectura de Computadoras Arturo Díaz D PérezP Centro de Investigación n y de Estudios Avanzados del IPN adiaz@cinvestav.mx Arquitectura de Computadoras Performance-

More information

Bias Scheduling in Heterogeneous Multi-core Architectures

Bias Scheduling in Heterogeneous Multi-core Architectures Bias Scheduling in Heterogeneous Multi-core Architectures David Koufaty Dheeraj Reddy Scott Hahn Intel Labs {david.a.koufaty, dheeraj.reddy, scott.hahn}@intel.com Abstract Heterogeneous architectures that

More information

MEASURING COMPUTER TIME. A computer faster than another? Necessity of evaluation computer performance

MEASURING COMPUTER TIME. A computer faster than another? Necessity of evaluation computer performance Necessity of evaluation computer performance MEASURING COMPUTER PERFORMANCE For comparing different computer performances User: Interested in reducing the execution time (response time) of a task. Computer

More information

Designing for Performance. Patrick Happ Raul Feitosa

Designing for Performance. Patrick Happ Raul Feitosa Designing for Performance Patrick Happ Raul Feitosa Objective In this section we examine the most common approach to assessing processor and computer system performance W. Stallings Designing for Performance

More information

Instructor Information

Instructor Information CS 203A Advanced Computer Architecture Lecture 1 1 Instructor Information Rajiv Gupta Office: Engg.II Room 408 E-mail: gupta@cs.ucr.edu Tel: (951) 827-2558 Office Times: T, Th 1-2 pm 2 1 Course Syllabus

More information

ISA-Aging. (SHRINK: Reducing the ISA Complexity Via Instruction Recycling) Accepted for ISCA 2015

ISA-Aging. (SHRINK: Reducing the ISA Complexity Via Instruction Recycling) Accepted for ISCA 2015 ISA-Aging (SHRINK: Reducing the ISA Complexity Via Instruction Recycling) Accepted for ISCA 2015 Bruno Cardoso Lopes, Rafael Auler, Edson Borin, Luiz Ramos, Rodolfo Azevedo, University of Campinas, Brasil

More information

Open Access Research on the Establishment of MSR Model in Cloud Computing based on Standard Performance Evaluation

Open Access Research on the Establishment of MSR Model in Cloud Computing based on Standard Performance Evaluation Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2015, 7, 821-825 821 Open Access Research on the Establishment of MSR Model in Cloud Computing based

More information

Performance COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Performance COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals Performance COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals What is Performance? How do we measure the performance of

More information

CACHE MEMORIES ADVANCED COMPUTER ARCHITECTURES. Slides by: Pedro Tomás

CACHE MEMORIES ADVANCED COMPUTER ARCHITECTURES. Slides by: Pedro Tomás CACHE MEMORIES Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 2 and Appendix B, John L. Hennessy and David A. Patterson, Morgan Kaufmann,

More information

This Unit. CIS 501 Computer Architecture. As You Get Settled. Readings. Metrics Latency and throughput. Reporting performance

This Unit. CIS 501 Computer Architecture. As You Get Settled. Readings. Metrics Latency and throughput. Reporting performance This Unit CIS 501 Computer Architecture Metrics Latency and throughput Reporting performance Benchmarking and averaging Unit 2: Performance Performance analysis & pitfalls Slides developed by Milo Martin

More information

1.6 Computer Performance

1.6 Computer Performance 1.6 Computer Performance Performance How do we measure performance? Define Metrics Benchmarking Choose programs to evaluate performance Performance summary Fallacies and Pitfalls How to avoid getting fooled

More information

Linux Performance on IBM zenterprise 196

Linux Performance on IBM zenterprise 196 Martin Kammerer martin.kammerer@de.ibm.com 9/27/10 Linux Performance on IBM zenterprise 196 visit us at http://www.ibm.com/developerworks/linux/linux390/perf/index.html Trademarks IBM, the IBM logo, and

More information

The Processor: Instruction-Level Parallelism

The Processor: Instruction-Level Parallelism The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy

More information

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction Instruction Level Parallelism ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction Basic Block A straight line code sequence with no branches in except to the entry and no branches

More information

IS360 - High Performance Computing. Basavaraj Talawar CSE, NITK

IS360 - High Performance Computing. Basavaraj Talawar CSE, NITK IS360 - High Performance Computing Basavaraj Talawar CSE, NITK Course Syllabus Definition, RISC ISA, RISC Pipeline, Performance Quantification Instruction Level Parallelism Pipeline Hazards, Combating

More information

Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding effects of underlying architecture

Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding effects of underlying architecture Chapter 2 Note: The slides being presented represent a mix. Some are created by Mark Franklin, Washington University in St. Louis, Dept. of CSE. Many are taken from the Patterson & Hennessy book, Computer

More information

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds? Chapter 4: Assessing and Understanding Performance 1. Define response (execution) time. 2. Define throughput. 3. Describe why using the clock rate of a processor is a bad way to measure performance. Provide

More information

Data Prefetching by Exploiting Global and Local Access Patterns

Data Prefetching by Exploiting Global and Local Access Patterns Journal of Instruction-Level Parallelism 13 (2011) 1-17 Submitted 3/10; published 1/11 Data Prefetching by Exploiting Global and Local Access Patterns Ahmad Sharif Hsien-Hsin S. Lee School of Electrical

More information

Thesis Defense Lavanya Subramanian

Thesis Defense Lavanya Subramanian Providing High and Predictable Performance in Multicore Systems Through Shared Resource Management Thesis Defense Lavanya Subramanian Committee: Advisor: Onur Mutlu Greg Ganger James Hoe Ravi Iyer (Intel)

More information

Chapter 1. Computer Abstractions and Technology. Adapted by Paulo Lopes, IST

Chapter 1. Computer Abstractions and Technology. Adapted by Paulo Lopes, IST Chapter 1 Computer Abstractions and Technology Adapted by Paulo Lopes, IST The Computer Revolution Progress in computer technology Sustained by Moore s Law Makes novel and old applications feasible Computers

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

A Front-end Execution Architecture for High Energy Efficiency

A Front-end Execution Architecture for High Energy Efficiency A Front-end Execution Architecture for High Energy Efficiency Ryota Shioya, Masahiro Goshima and Hideki Ando Department of Electrical Engineering and Computer Science, Nagoya University, Aichi, Japan Information

More information

Security-Aware Processor Architecture Design. CS 6501 Fall 2018 Ashish Venkat

Security-Aware Processor Architecture Design. CS 6501 Fall 2018 Ashish Venkat Security-Aware Processor Architecture Design CS 6501 Fall 2018 Ashish Venkat Agenda Common Processor Performance Metrics Identifying and Analyzing Bottlenecks Benchmarking and Workload Selection Performance

More information

CMSC 411 Practice Exam 1 w/answers. 1. CPU performance Suppose we have the following instruction mix and clock cycles per instruction.

CMSC 411 Practice Exam 1 w/answers. 1. CPU performance Suppose we have the following instruction mix and clock cycles per instruction. CMSC 4 Practice Exam w/answers General instructions. Be complete, yet concise. You may leave arithmetic expressions in any form that a calculator could evaluate.. CPU performance Suppose we have the following

More information

CpE 442 Introduction to Computer Architecture. The Role of Performance

CpE 442 Introduction to Computer Architecture. The Role of Performance CpE 442 Introduction to Computer Architecture The Role of Performance Instructor: H. H. Ammar CpE442 Lec2.1 Overview of Today s Lecture: The Role of Performance Review from Last Lecture Definition and

More information

Which is the best? Measuring & Improving Performance (if planes were computers...) An architecture example

Which is the best? Measuring & Improving Performance (if planes were computers...) An architecture example 1 Which is the best? 2 Lecture 05 Performance Metrics and Benchmarking 3 Measuring & Improving Performance (if planes were computers...) Plane People Range (miles) Speed (mph) Avg. Cost (millions) Passenger*Miles

More information

Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative

Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative Emre Kültürsay *, Mahmut Kandemir *, Anand Sivasubramaniam *, and Onur Mutlu * Pennsylvania State University Carnegie Mellon University

More information

Lecture 4: Instruction Set Architectures. Review: latency vs. throughput

Lecture 4: Instruction Set Architectures. Review: latency vs. throughput Lecture 4: Instruction Set Architectures Last Time Performance analysis Amdahl s Law Performance equation Computer benchmarks Today Review of Amdahl s Law and Performance Equations Introduction to ISAs

More information

Perceptron Learning for Reuse Prediction

Perceptron Learning for Reuse Prediction Perceptron Learning for Reuse Prediction Elvira Teran Zhe Wang Daniel A. Jiménez Texas A&M University Intel Labs {eteran,djimenez}@tamu.edu zhe2.wang@intel.com Abstract The disparity between last-level

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Pipeline Thoai Nam Outline Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy

More information

HOTL: a Higher Order Theory of Locality

HOTL: a Higher Order Theory of Locality HOTL: a Higher Order Theory of Locality Xiaoya Xiang Chen Ding Hao Luo Department of Computer Science University of Rochester {xiang, cding, hluo}@cs.rochester.edu Bin Bao Adobe Systems Incorporated bbao@adobe.com

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

CSE2021 Computer Organization. Computer Abstractions and Technology

CSE2021 Computer Organization. Computer Abstractions and Technology CSE2021 Computer Organization Chapter 1 Computer Abstractions and Technology Instructor: Prof. Peter Lian Department of Electrical Engineering & Computer Science Lassonde School of Engineering York University

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

Review: latency vs. throughput

Review: latency vs. throughput Lecture : Performance measurement and Instruction Set Architectures Last Time Introduction to performance Computer benchmarks Amdahl s law Today Take QUIZ 1 today over Chapter 1 Turn in your homework on

More information

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017 Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation

More information

Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems

Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems Min Kyu Jeong, Doe Hyun Yoon^, Dam Sunwoo*, Michael Sullivan, Ikhwan Lee, and Mattan Erez The University of Texas at Austin Hewlett-Packard

More information

Instruction Pipelining

Instruction Pipelining Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages

More information

ECE 252 / CPS 220 Advanced Computer Architecture I. Administrivia. Instructors. Where to Get Answers

ECE 252 / CPS 220 Advanced Computer Architecture I. Administrivia. Instructors. Where to Get Answers ECE 252 / CPS 220 Advanced Computer Architecture I Fall 2007 Duke University Prof. Daniel Sorin (sorin@ee.duke.edu) Administrivia addresses, email, website, etc. list of topics expected background course

More information

The Application Slowdown Model: Quantifying and Controlling the Impact of Inter-Application Interference at Shared Caches and Main Memory

The Application Slowdown Model: Quantifying and Controlling the Impact of Inter-Application Interference at Shared Caches and Main Memory The Application Slowdown Model: Quantifying and Controlling the Impact of Inter-Application Interference at Shared Caches and Main Memory Lavanya Subramanian* Vivek Seshadri* Arnab Ghosh* Samira Khan*

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

From CISC to RISC. CISC Creates the Anti CISC Revolution. RISC "Philosophy" CISC Limitations

From CISC to RISC. CISC Creates the Anti CISC Revolution. RISC Philosophy CISC Limitations 1 CISC Creates the Anti CISC Revolution Digital Equipment Company (DEC) introduces VAX (1977) Commercially successful 32-bit CISC minicomputer From CISC to RISC In 1970s and 1980s CISC minicomputers became

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Thoai Nam Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy & David a Patterson,

More information

Near-Threshold Computing: How Close Should We Get?

Near-Threshold Computing: How Close Should We Get? Near-Threshold Computing: How Close Should We Get? Alaa R. Alameldeen Intel Labs Workshop on Near-Threshold Computing June 14, 2014 Overview High-level talk summarizing my architectural perspective on

More information

Computer Science 246. Computer Architecture

Computer Science 246. Computer Architecture Computer Architecture Spring 2010 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture Outline Performance Metrics Averaging Amdahl s Law Benchmarks The CPU Performance Equation Optimal

More information

(Advanced) Computer Architechture. Prof. Dr. Hasan Hüseyin BALIK (2 nd Week)

(Advanced) Computer Architechture. Prof. Dr. Hasan Hüseyin BALIK (2 nd Week) (Advanced) Computer Architechture Prof. Dr. Hasan Hüseyin BALIK (2 nd Week) Outline 1. Overview Basic Concepts and Computer Evolution Performance Issues + 1.2 Performance Issues 1.2 Outline Designing for

More information

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Instruction Set Principles The Role of Compilers MIPS 2 Main Content Computer

More information

Lecture 4: Instruction Set Architecture

Lecture 4: Instruction Set Architecture Lecture 4: Instruction Set Architecture ISA types, register usage, memory addressing, endian and alignment, quantitative evaluation Reading: Textbook (5 th edition) Appendix A Appendix B (4 th edition)

More information

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation

More information

Performance. CS 3410 Computer System Organization & Programming. [K. Bala, A. Bracy, E. Sirer, and H. Weatherspoon]

Performance. CS 3410 Computer System Organization & Programming. [K. Bala, A. Bracy, E. Sirer, and H. Weatherspoon] Performance CS 3410 Computer System Organization & Programming [K. Bala, A. Bracy, E. Sirer, and H. Weatherspoon] Performance Complex question How fast is the processor? How fast your application runs?

More information

What is Good Performance. Benchmark at Home and Office. Benchmark at Home and Office. Program with 2 threads Home program.

What is Good Performance. Benchmark at Home and Office. Benchmark at Home and Office. Program with 2 threads Home program. Performance COMP375 Computer Architecture and dorganization What is Good Performance Which is the best performing jet? Airplane Passengers Range (mi) Speed (mph) Boeing 737-100 101 630 598 Boeing 747 470

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes

More information

! CS61C : Machine Structures. Lecture 27 Performance II & Inter-machine Parallelism. !!Instructor Paul Pearce!

! CS61C : Machine Structures. Lecture 27 Performance II & Inter-machine Parallelism. !!Instructor Paul Pearce! inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 27 Performance II & Inter-machine Parallelism 2010-08-05!!!Instructor Paul Pearce! DENSITY LIMITS IN HARD DRIVES?! Yesterday Samsung! announced

More information

Baikal-T1 Microprocessor Performance Tests

Baikal-T1 Microprocessor Performance Tests Baikal-T1 Microprocessor Performance Tests Revision list Revision Date Author Description 1.0 15.03.2017 Initial version 1.1 08.08.2017 Added SPEC CPU2006 Int, iperf results Revision list... 1 1. List

More information

The bottom line: Performance. Measuring and Discussing Computer System Performance. Our definition of Performance. How to measure Execution Time?

The bottom line: Performance. Measuring and Discussing Computer System Performance. Our definition of Performance. How to measure Execution Time? The bottom line: Performance Car to Bay Area Speed Passengers Throughput (pmph) Ferrari 3.1 hours 160 mph 2 320 Measuring and Discussing Computer System Performance Greyhound 7.7 hours 65 mph 60 3900 or

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science Cases that affect instruction execution semantics

More information

IAENG International Journal of Computer Science, 40:4, IJCS_40_4_07. RDCC: A New Metric for Processor Workload Characteristics Evaluation

IAENG International Journal of Computer Science, 40:4, IJCS_40_4_07. RDCC: A New Metric for Processor Workload Characteristics Evaluation RDCC: A New Metric for Processor Workload Characteristics Evaluation Tianyong Ao, Pan Chen, Zhangqing He, Kui Dai, Xuecheng Zou Abstract Understanding the characteristics of workloads is extremely important

More information

Lecture - 4. Measurement. Dr. Soner Onder CS 4431 Michigan Technological University 9/29/2009 1

Lecture - 4. Measurement. Dr. Soner Onder CS 4431 Michigan Technological University 9/29/2009 1 Lecture - 4 Measurement Dr. Soner Onder CS 4431 Michigan Technological University 9/29/2009 1 Acknowledgements David Patterson Dr. Roger Kieckhafer 9/29/2009 2 Computer Architecture is Design and Analysis

More information

Introduction to Pipelined Datapath

Introduction to Pipelined Datapath 14:332:331 Computer Architecture and Assembly Language Week 12 Introduction to Pipelined Datapath [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 W12.1 Review:

More information

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14 MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK

More information

Pipelining! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar DEIB! 30 November, 2017!

Pipelining! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar DEIB! 30 November, 2017! Advanced Topics on Heterogeneous System Architectures Pipelining! Politecnico di Milano! Seminar Room @ DEIB! 30 November, 2017! Antonio R. Miele! Marco D. Santambrogio! Politecnico di Milano! 2 Outline!

More information

HOTL: A Higher Order Theory of Locality

HOTL: A Higher Order Theory of Locality HOTL: A Higher Order Theory of Locality Xiaoya Xiang Chen Ding Hao Luo Department of Computer Science University of Rochester {xiang, cding, hluo}@cs.rochester.edu Bin Bao Adobe Systems Incorporated bbao@adobe.com

More information

Getting CPI under 1: Outline

Getting CPI under 1: Outline CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more

More information