Computer Hardware Engineering

Size: px
Start display at page:

Download "Computer Hardware Engineering"

Transcription

1 Computer Hardware ngineering IS2, spring 25 Lecture 6: Pipelined Processors ssociate Professor, KTH Royal Institute of Technology ssistant Research ngineer, University of California, Berkeley Slides version. 2 Course Structure Module : Logic Design Module : I/O Systems L L DCÖ DCÖ2 Lab:dicom L7 6 7 Lab: nios2io 9 Lab: nios2int Module 2: C and ssembly Programming Module 5: Hierarchy L L2 L8 8 Home Lab: cache L 2 Lab: nios2time L Home lab: C Module : Processor Design Module 6: Parallel Processors and Programs L5 L6 5 L9 L

2 bstractions in Computer Systems Networked Systems and Systems of Systems Computer System pplication Software Software Operating System Hardware/Software Interface Set rchitecture Microarchitecture Digital Hardware Design Logic and Building Blocks Digital Circuits nalog Circuits nalog Design and Physics Devices and Physics genda by Open Grid Scheduler / Grid ngine - Own work. Licensed under Creative Commons Zero, Public Domain

3 5 cknowledgement: The structure and several of the good examples are derived from the book Digital Design and Computer rchitecture (2) by D. M. Harris and S. L. Harris. 6 Jump Path (Revisited) Inst W 25:2 Zero 2:6 PC 2 2 :28 27: 25: 5: RegWrite Branch RegDst LUSrc LUControl 2:6 5: Sign xtend LU W MemWrite MemToReg

4 7 Control Unit (Revisited) Decoding the instruction op 5 Main Decoder RegWrite RegDst LUSrc Branch MemWrite MemToReg Jump Control signals to the data path LUOp 2 funct 6 LU Decoder LUControl 8 Performance nalysis (Revisited) xecution time (in seconds) = # instructions clock cycles instruction seconds clock cycle Number of instructions in a program (# = number of) Determined by programmer or the compiler or both. verage cycles per instruction (CPI) Determined by the microarchitecture implementation. Seconds per cycle = clock period T C. Determined by the critical path in the logic. For the single-cycle processor, each instruction takes one clock cycle. That is, CPI =. The main problem with the single-cycle processor design (last lecture) is the long critical path. Solution: Pipelining

5 Parallelism and Pipelining (/6) Definitions 9 Processing System: system that takes input and produces outputs. Token: n input that is processed by the processing system and results in an output. Latency: The time it takes for the system to process one token. Throughput: The number of tokens that can be processed per time unit. Parallelism and Pipelining (2/6) Sequential Processing xample: ssume we have a Christmas card factory with two machines (M and M2). pproach. Process tokens sequentially. In this case a token is a card. The latency is 6 = s M: Prints out the card (takes 6s) M2: Puts on a stamp (takes s) The throughput is / =. tokens per second or 6 tokens per minute. M M2 M M s

6 Parallelism and Pipelining (/6) Parallel Processing (Spatial Parallelism) xample: ssume we have a Christmas card factory with four machines. pproach 2. Process tokens in parallel using more machines. The latency is 6 = s M: Prints out the card (takes 6s) M2: Puts on a stamp (takes s) M: Prints out the card (takes 6s) M: Puts on a stamp (takes s) The throughput is 2 * / =.2 tokens per second or 2 tokens per minute. M M2 M M2 M M M M s Parallelism and Pipelining (/6) Pipelining (Temporal Parallelism) 2 xample: ssume we have a Christmas card factory with two machines. pproach. Process tokens by pipelining using only two machines. The latency is still 6 = s M: Prints out the card (takes 6s) M2: Puts on a stamp (takes s) The throughput is /6 =.666 tokens per second or tokens per minute. M M2 The factory starts the production of a new card every 6 second M M2 M M s

7 Parallelism and Pipelining (5/6) Summary pproach. Process tokens sequentially using two machines pproach 2. Process tokens in parallel using four machines pproach. Process tokens by pipelining using only two machines. M Latency: s Throughput: 6 tokens/min We improve throughput, but not latency Latency: s Throughput: 2 tokens/min Latency: s Throughput: tokens/min M2 Spatial parallelism adds extra machines, but pipelining does not Throughput improvements are limited by the slowest machine (in this case M) Parallelism and Pipelining (6/6) Performance nalysis for Pipelining Idea: We introduce a pipeline in the processor How does this affect the execution time? xecution time (in seconds) = # instructions clock cycles instruction seconds clock cycle Pipelining does not change the number of instructions Pipelining will not improve the CPI (actually, make it slightly worse) Pipelining will improve the cycle period (make the critical path shorter)

8 Towards a Pipelined (/8) 5 Recall the single-cycle data path (the logic for the j and beq instructions is hidden) PC next PC Inst W 25:2 2: :6 5: 5: Sign xtend LU W Towards a Pipelined (2/8) Fetch Stage 6 register splits the datapath into stages, forming a pipeline. First, we introduce a instruction fetch stage. PC next Inst 25:2 2:6 5: 2 W 2 2:6 5: Sign xtend LU W Fetch (F)

9 Towards a Pipelined (/8) Decode Stage 7 decode stage decodes an instruction and reads out values from the register file. PC next W 2 2 2:6 5: LU W Sign xtend Fetch (F) Decode (D) Towards a Pipelined (/8) xecute Stage 8 n execute stage performs the computation using the LU. PC next W 2 2 2:6 5: LU W Sign xtend Fetch (F) Decode (D) xecute ()

10 Towards a Pipelined (5/8) Stage 9 PC next 2 W 2 2:6 5: Reading and writing to memory is done in the memory stage. LU W Sign xtend Fetch (F) Decode (D) xecute () (M) Towards a Pipelined (6/8) Writeback Stage Can you see a problem with the writeback? PC next 2 W 2 2:6 5: Sign xtend The results are written back to the register file in the writeback stage. LU W 2 Fetch (F) Decode (D) xecute () (M) Writeback (W)

11 PC next Towards a Pipelined (7/8) Writeback Stage Note that the register file is read in the decode stage, but written to in the writeback stage 2 W 2 2:6 5: Sign xtend The address must be forwarded to the correct stage! LU W 2 Fetch (F) Decode (D) xecute () (M) Writeback (W) Towards a Pipelined (8/8) nother issue 22 Can you see another issue? PC next The program counter can be updated in the wrong stage (PC increment by or when branching). Solution not shown in the slides. W W 2 2 2:6 5: Sign xtend LU Fetch (F) Decode (D) xecute () (M) Writeback (W)

12 2 cknowledgement: The structure and several of the good examples are derived from the book Digital Design and Computer rchitecture (2) by D. M. Harris and S. L. Harris. Five-Stage Pipeline In each cycle, a new instruction is fetched, but it takes 5 cycles to complete the instruction. In each cycle all stages are handling different instructions in parallel. 2 xample. In cycle 6, the result of the sub instruction is written back to register $t. add $s, $s, $s F D M sub $t, $t, $t2 F D addi $t, $, 55 F D xori $t, $t5, F and $t6, $s, $s We can fill the pipeline because there are no dependencies between instructions xercise: What is the LU doing in cycle 5? nswer: dding together values and 55

13 Hazards (/) Read after Write (RW) add $s, $s, $s2 The add instruction writes back the value $s in cycle 5 But $s is used in the decode phase in cycle data hazard occurs when an instruction reads a register that has not yet been written to. This kind of data hazard is called read after write (RW) hazard. sub $t, $s, $t and $t2, $t, $s xori $t, $s, 2 and will also use the wrong value for $s. xercise: For MIPS, will instruction xori result in a hazard? Stand for yes, sleep for no. nswer: No. xori is OK for MIPS, because it writes on the first part of the cycle (falling edge) and reads on the second part (rising edge) Hazards (2/) Solution : Forwarding The result from the execute stage for add can be forwarded (also called bypassing) to the execute stage for sub. add $s, $s, $s Hazard detection is implemented using a hazard detection unit that gives control signals to the datapath if data should be forwarded. sub $t, $s, $t and $t2, $t, $s xori $t, $s, 2 Can all data hazards be solved using forwarding? The and instruction s hazard is solved by forwarding as well.

14 Hazards (/) Solution : Forwarding (partially) 27 xercise: Which of the instructions sub, and, and xori have data hazards? Which can be solved using forwarding? nswer: Hazards: sub and and Can use forwarding: and 2 5 The sub instruction cannot be solved using forwarding because the memory access is available at the end of cycle, but is needed in the beginning of cycle. lw $s, 2($s2) sub $t, $s, $t and $t2, $t, $s xori $t, $s, 2 The and instruction memory result can be forwarded after the memory stage to execution. xori can read the data from the write stage (writes in first part of cycle, reads in second part) Hazards (/) Solution 2: Stalling 28 Solution when forwarding does not work: stalling fter stalling, the result can be forwarded to the execute stage. 2 5 lw $s, 2($s2) sub $t, $s, $t and $t2, $t, $s xori $t, $s, 2 F D D M W F We need to stall the pipeline. Stages are repeated and the fetch of xori is delayed. Stalling results in more than one cycle per instruction. The unused stage is called a bubble.

15 (/5) ssume Branch Not Taken 29 2: beq $s, $s2, 2: sub $t, $s, $t 2 5 Computes the branch target address and compares for equality in the execute () stage. If branch taken, update the PC in the memory (M) stage. 28: and $t2, $t, $s 2C: xori $t, $s, : addi $t, $s, If the branch is taken, we need to flush the pipeline. We have a branch misprediction penalty of cycles. Can we improve this? (2/5) Improving the Pipeline dd an equality comparison for beq in the decode phase (not shown here) PC next 2 W 2 2:6 5: Sign xtend Move the branch address calculation to the decode stage LU Right now, branch comparison is done in the execute stage W Fetch (F) Decode (D) xecute () (M) Writeback (W)

16 (/5) ssume Branch Not Taken 2: beq $s, $s2, 2: sub $t, $s, $t 2 5 The decode phase can change the next PC, so that the instruction at the branch taken address is fetched. 28: and $t2, $t, $s 2C: xori $t, $s, : addi $t, $s, Branch misprediction penalty is now reduced to cycle. Note that we may now introduce another data hazard (if operands are not available in the decode stage). Can be solved with forwarding or stalling (/5) Deeper Pipelines Why do we sometimes want more stages than 5? The critical path can be shorter with less logic in the slowest stage. The processor can have higher clock frequency. For instance, Intel s Core 2 duo has more than pipeline stages. Why not always have more pipeline stages? dds hardware (registers) The branch mispredication penalty increases!

17 (/5) Deeper Pipelines How can we handle deep pipelines, Static Branch Predictors and minimize misprediction? Statically (at compile time) determine if a branch is taken or not. For instance, predict branch not taken. Dynamic Branch Predictors Dynamically (at runtime) predict if a branch will be taken or note. Operates in the fetch state. Maintains a table, called the branch target buffer, that contains hundreds or thousands of executed branch instructions, their destinations, and information if the branches were taken or not. by Open Grid Scheduler / Grid ngine - Own work. Licensed under Creative Commons Zero, Public Domain

18 5 rmv7 The most popular IS for embedded devices (9 billon devices in 2, growth 2 billion a year) More complex addressing modes than MIPS (can do shift and add of addresses in registers in one instruction) RMv7 Condition results are saved in special flags: negative, zero, carry, overflow. 6 registers, each -bit (integers) size -bit (Thumb-mode, 6-bits encoding). Conditional execution of instructions, depending on condition code. xample: RM Cortex-8, a processor at GHz, -stage pipeline, with branch predictor. 6 Standard in laptops, PCs, and in the cloud CISC instructions are more powerful than for RM and MIPS, but requires more complex hardware architecture has evolved over the last 5 years, There are 6,, and 6 bits variants. 8 general purpose registers (eax, ebx, ecx, edx, esp, ebp, esi, edi). Variable length of instruction encoding (between and 5 bytes) rithmetic operations allow destination operand to be in memory. Major manufacturers are Intel and MD.

19 7 Summary Some key take away points: Pipelining is a temporal way of achieving parallelism Pipelining processors improve performance by reducing the clock period (shorter critical path) Pipelining introduces pipeline hazards. There are two main kind of hazards: data hazards and control hazards. hazards are solved by forwarding or stalling Control hazards are solved by flushing the pipeline and improved by branch prediction. Thanks for listening!

Computer Hardware Engineering

Computer Hardware Engineering Computer Hardware Engineering IS2, spring 2 Lecture : LU and s ssociate Professor, KTH Royal itute of Technology ssistant Research Engineer, University of California, Berkeley Revision v., June 7, 2: Minor

More information

Computer Hardware Engineering

Computer Hardware Engineering 2 Coue Structure Computer Hardware Engineering IS2, spring 2 Lecture : LU and s Module : I/O Systems Module : Logic Design L DCÖ L DCÖ2 Lab:dicom L7 Module 2: C and ssembly Programming ssociate Professor,

More information

Computer Hardware Engineering

Computer Hardware Engineering Computer Hardware Engineering IS2, spring 27 Lecture 9: LU and s ssociate Professor, KTH Royal Institute of Technology Slides version. 2 Course Structure Module : C and ssembly Programming LE LE2 LE EX

More information

Computer Organization and Components

Computer Organization and Components 2 Computer Organization and Components Coue Structure Module : C and ssembly Programming IS, fall 2 Lecture 9: LU and s LE LE2 LE LE Module : Processor Design EX LB S LB2 LE9 LE ssistant Research Engineer,

More information

ENGN1640: Design of Computing Systems Topic 04: Single-Cycle Processor Design

ENGN1640: Design of Computing Systems Topic 04: Single-Cycle Processor Design ENGN64: Design of Computing Systems Topic 4: Single-Cycle Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

Computer Organization and Components

Computer Organization and Components 2 Course Structure Computer Organization and Components Module 4: Memory Hierarchy Module 1: Logic Design IS1500, fall 2014 Lecture 4: and F1 DC Ö1 F2 DC Ö2 F7b Lab: dicom F8 Module 2: C and Associate

More information

Computer Hardware Engineering

Computer Hardware Engineering 2 Course Structure Computer Hardware ngineering IS1200, spring 2015 Lecture 3: Languages Module 4: I/O Systems Module 1: Logic Design L DCÖ1 L DCÖ2 Lab:dicom L7 Module 2: C and Assembly Programming Associate

More information

Fundamentals of Computer Systems

Fundamentals of Computer Systems Fundamentals of Computer Systems Single Cycle MIPS Processor Stephen. Edwards Columbia University Summer 26 Illustrations Copyright 27 Elsevier The path The lw The sw R-Type s The beq The Controller Encoding

More information

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction EECS150 - Digital Design Lecture 10- CPU Microarchitecture Feb 18, 2010 John Wawrzynek Spring 2010 EECS150 - Lec10-cpu Page 1 Processor Microarchitecture Introduction Microarchitecture: how to implement

More information

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141 EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 13 Project Introduction You will design and optimize a RISC-V processor Phase 1: Design

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

ENGN1640: Design of Computing Systems Topic 04: Single-Cycle Processor Design

ENGN1640: Design of Computing Systems Topic 04: Single-Cycle Processor Design ENGN6: Design of Computing Systems Topic : Single-Cycle Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer EECS150 - Digital Design Lecture 9- CPU Microarchitecture Feb 15, 2011 John Wawrzynek Spring 2011 EECS150 - Lec09-cpu Page 1 Watson: Jeopardy-playing Computer Watson is made up of a cluster of ninety IBM

More information

Lecture 4: Review of MIPS. Instruction formats, impl. of control and datapath, pipelined impl.

Lecture 4: Review of MIPS. Instruction formats, impl. of control and datapath, pipelined impl. Lecture 4: Review of MIPS Instruction formats, impl. of control and datapath, pipelined impl. 1 MIPS Instruction Types Data transfer: Load and store Integer arithmetic/logic Floating point arithmetic Control

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Pipelined Processor Design

Pipelined Processor Design Pipelined Processor Design Pipelined Implementation: MIPS Virendra Singh Indian Institute of Science Bangalore virendra@computer.org Lecture 20 SE-273: Processor Design Courtesy: Prof. Vishwani Agrawal

More information

Pipelined Processor Design

Pipelined Processor Design Pipelined Processor Design Pipelined Implementation: MIPS Virendra Singh Computer Design and Test Lab. Indian Institute of Science (IISc) Bangalore virendra@computer.org Advance Computer Architecture http://www.serc.iisc.ernet.in/~viren/courses/aca/aca.htm

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle? CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined

More information

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu CENG 342 Computer Organization and Design Lecture 6: MIPS Processor - I Bei Yu CEG342 L6. Spring 26 The Processor: Datapath & Control q We're ready to look at an implementation of the MIPS q Simplified

More information

LECTURE 3: THE PROCESSOR

LECTURE 3: THE PROCESSOR LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

Pipelining. CSC Friday, November 6, 2015

Pipelining. CSC Friday, November 6, 2015 Pipelining CSC 211.01 Friday, November 6, 2015 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Chapter 7. Microarchitecture. Copyright 2013 Elsevier Inc. All rights reserved.

Chapter 7. Microarchitecture. Copyright 2013 Elsevier Inc. All rights reserved. Chapter 7 Microarchitecture 1 Figure 7.1 State elements of MIPS processor 2 Figure 7.2 Fetch instruction from memory 3 Figure 7.3 Read source operand from register file 4 Figure 7.4 Sign-extend the immediate

More information

CPE 335 Computer Organization. Basic MIPS Architecture Part I

CPE 335 Computer Organization. Basic MIPS Architecture Part I CPE 335 Computer Organization Basic MIPS Architecture Part I Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/courses/cpe335_s8/index.html CPE232 Basic MIPS Architecture

More information

Design of the MIPS Processor (contd)

Design of the MIPS Processor (contd) Design of the MIPS Processor (contd) First, revisit the datapath for add, sub, lw, sw. We will augment it to accommodate the beq and j instructions. Execution of branch instructions beq $at, $zero, L add

More information

CPSC 313, 04w Term 2 Midterm Exam 2 Solutions

CPSC 313, 04w Term 2 Midterm Exam 2 Solutions 1. (10 marks) Short answers. CPSC 313, 04w Term 2 Midterm Exam 2 Solutions Date: March 11, 2005; Instructor: Mike Feeley 1a. Give an example of one important CISC feature that is normally not part of a

More information

Chapter 4 The Processor 1. Chapter 4A. The Processor

Chapter 4 The Processor 1. Chapter 4A. The Processor Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

CENG 3420 Lecture 06: Datapath

CENG 3420 Lecture 06: Datapath CENG 342 Lecture 6: Datapath Bei Yu byu@cse.cuhk.edu.hk CENG342 L6. Spring 27 The Processor: Datapath & Control q We're ready to look at an implementation of the MIPS q Simplified to contain only: memory-reference

More information

ECE 341. Lecture # 15

ECE 341. Lecture # 15 ECE 341 Lecture # 15 Instructor: Zeshan Chishti zeshan@ece.pdx.edu November 19, 2014 Portland State University Pipelining Structural Hazards Pipeline Performance Lecture Topics Effects of Stalls and Penalties

More information

CENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs.

CENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs. Exam 2 April 12, 2012 You have 80 minutes to complete the exam. Please write your answers clearly and legibly on this exam paper. GRADE: Name. Class ID. 1. (22 pts) Circle the selected answer for T/F and

More information

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds? Chapter 4: Assessing and Understanding Performance 1. Define response (execution) time. 2. Define throughput. 3. Describe why using the clock rate of a processor is a bad way to measure performance. Provide

More information

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14 MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK

More information

Design of the MIPS Processor

Design of the MIPS Processor Design of the MIPS Processor We will study the design of a simple version of MIPS that can support the following instructions: I-type instructions LW, SW R-type instructions, like ADD, SUB Conditional

More information

Processor (I) - datapath & control. Hwansoo Han

Processor (I) - datapath & control. Hwansoo Han Processor (I) - datapath & control Hwansoo Han Introduction CPU performance factors Instruction count - Determined by ISA and compiler CPI and Cycle time - Determined by CPU hardware We will examine two

More information

Chapter 4 The Processor 1. Chapter 4B. The Processor

Chapter 4 The Processor 1. Chapter 4B. The Processor Chapter 4 The Processor 1 Chapter 4B The Processor Chapter 4 The Processor 2 Control Hazards Branch determines flow of control Fetching next instruction depends on branch outcome Pipeline can t always

More information

Complex Pipelines and Branch Prediction

Complex Pipelines and Branch Prediction Complex Pipelines and Branch Prediction Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L22-1 Processor Performance Time Program Instructions Program Cycles Instruction CPI Time Cycle

More information

Lecture 7 Pipelining. Peng Liu.

Lecture 7 Pipelining. Peng Liu. Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

Design of Digital Circuits 2017 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan)

Design of Digital Circuits 2017 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan) Microarchitecture Design of Digital Circuits 27 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan) http://www.syssec.ethz.ch/education/digitaltechnik_7 Adapted from Digital

More information

Perfect Student CS 343 Final Exam May 19, 2011 Student ID: 9999 Exam ID: 9636 Instructions Use pencil, if you have one. For multiple choice

Perfect Student CS 343 Final Exam May 19, 2011 Student ID: 9999 Exam ID: 9636 Instructions Use pencil, if you have one. For multiple choice Instructions Page 1 of 7 Use pencil, if you have one. For multiple choice questions, circle the letter of the one best choice unless the question specifically says to select all correct choices. There

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University The Processor Logic Design Conventions Building a Datapath A Simple Implementation Scheme An Overview of Pipelining Pipelined

More information

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count

More information

Systems Architecture

Systems Architecture Systems Architecture Lecture 15: A Simple Implementation of MIPS Jeremy R. Johnson Anatole D. Ruslanov William M. Mongan Some or all figures from Computer Organization and Design: The Hardware/Software

More information

Computer Architecture. Lecture 6.1: Fundamentals of

Computer Architecture. Lecture 6.1: Fundamentals of CS3350B Computer Architecture Winter 2015 Lecture 6.1: Fundamentals of Instructional Level Parallelism Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin.   School of Information Science and Technology SIST CS 110 Computer Architecture Pipelining Guest Lecture: Shu Yin http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS61C

More information

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes NAME: STUDENT NUMBER: EE557--FALL 1999 MAKE-UP MIDTERM 1 Closed books, closed notes Q1: /1 Q2: /1 Q3: /1 Q4: /1 Q5: /15 Q6: /1 TOTAL: /65 Grade: /25 1 QUESTION 1(Performance evaluation) 1 points We are

More information

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath COMP33 - Computer Architecture Lecture 8 Designing a Single Cycle Datapath The Big Picture The Five Classic Components of a Computer Processor Input Control Memory Datapath Output The Big Picture: The

More information

CS 152 Computer Architecture and Engineering Lecture 4 Pipelining

CS 152 Computer Architecture and Engineering Lecture 4 Pipelining CS 152 Computer rchitecture and Engineering Lecture 4 Pipelining 2014-1-30 John Lazzaro (not a prof - John is always OK) T: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: 1 otorola 68000 Next week

More information

CSC258: Computer Organization. Microarchitecture

CSC258: Computer Organization. Microarchitecture CSC258: Computer Organization Microarchitecture 1 Wrap-up: Function Conventions 2 Key Elements: Caller Ensure that critical registers like $ra have been saved. Save caller-save registers. Place arguments

More information

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control ELEC 52/62 Computer Architecture and Design Spring 217 Lecture 4: Datapath and Control Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn, AL 36849

More information

CS3350B Computer Architecture Quiz 3 March 15, 2018

CS3350B Computer Architecture Quiz 3 March 15, 2018 CS3350B Computer Architecture Quiz 3 March 15, 2018 Student ID number: Student Last Name: Question 1.1 1.2 1.3 2.1 2.2 2.3 Total Marks The quiz consists of two exercises. The expected duration is 30 minutes.

More information

The Processor: Improving the performance - Control Hazards

The Processor: Improving the performance - Control Hazards The Processor: Improving the performance - Control Hazards Wednesday 14 October 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU Summary

More information

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 4 The Processor: A Based on P&H Introduction We will examine two MIPS implementations A simplified version A more realistic pipelined

More information

Lecture 16: Pipeline Controls. Spring 2018 Jason Tang

Lecture 16: Pipeline Controls. Spring 2018 Jason Tang Lecture 16: Pipeline Controls Spring 2018 Jason Tang 1 Topics Designing pipelined path Controlling pipeline operations 2 Pipelining Fetch Decode Execute Write ack Time Fetch Decode Execute Write ack Fetch

More information

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19 CO2-3224 Computer Architecture and Programming Languages CAPL Lecture 8 & 9 Dr. Kinga Lipskoch Fall 27 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

COMP2611: Computer Organization. The Pipelined Processor

COMP2611: Computer Organization. The Pipelined Processor COMP2611: Computer Organization The 1 2 Background 2 High-Performance Processors 3 Two techniques for designing high-performance processors by exploiting parallelism: Multiprocessing: parallelism among

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN ARM COMPUTER ORGANIZATION AND DESIGN Edition The Hardware/Software Interface Chapter 4 The Processor Modified and extended by R.J. Leduc - 2016 To understand this chapter, you will need to understand some

More information

14:332:331 Pipelined Datapath

14:332:331 Pipelined Datapath 14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate

More information

CS 61C: Great Ideas in Computer Architecture. Lecture 13: Pipelining. Krste Asanović & Randy Katz

CS 61C: Great Ideas in Computer Architecture. Lecture 13: Pipelining. Krste Asanović & Randy Katz CS 61C: Great Ideas in Computer Architecture Lecture 13: Pipelining Krste Asanović & Randy Katz http://inst.eecs.berkeley.edu/~cs61c/fa17 RISC-V Pipeline Pipeline Control Hazards Structural Data R-type

More information

Computer Organization and Components

Computer Organization and Components Computer Organization and Components Course Structure IS500, fall 05 Lecture :, Concurrency,, and ILP Module : C and Assembly Programming L L L L Module : Processor Design X LAB S LAB L9 L5 Assistant Research

More information

LECTURE 5. Single-Cycle Datapath and Control

LECTURE 5. Single-Cycle Datapath and Control LECTURE 5 Single-Cycle Datapath and Control PROCESSORS In lecture 1, we reminded ourselves that the datapath and control are the two components that come together to be collectively known as the processor.

More information

Chapter 4. The Processor. Computer Architecture and IC Design Lab

Chapter 4. The Processor. Computer Architecture and IC Design Lab Chapter 4 The Processor Introduction CPU performance factors CPI Clock Cycle Time Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS

More information

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many

More information

CS/COE0447: Computer Organization

CS/COE0447: Computer Organization CS/COE0447: Computer Organization and Assembly Language Datapath and Control Sangyeun Cho Dept. of Computer Science A simple MIPS We will design a simple MIPS processor that supports a small instruction

More information

CS/COE0447: Computer Organization

CS/COE0447: Computer Organization A simple MIPS CS/COE447: Computer Organization and Assembly Language Datapath and Control Sangyeun Cho Dept. of Computer Science We will design a simple MIPS processor that supports a small instruction

More information

CS 3330 Exam 2 Spring 2016 Name: EXAM KEY Computing ID: KEY

CS 3330 Exam 2 Spring 2016 Name: EXAM KEY Computing ID: KEY S 3330 Spring 2016 xam 2 Variant U page 1 of 6 mail I: KY S 3330 xam 2 Spring 2016 Name: XM KY omputing I: KY Letters go in the boxes unless otherwise specified (e.g., for 8 write not 8 ). Write Letters

More information

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University The Processor (1) Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16 4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt

More information

EXAM #1. CS 2410 Graduate Computer Architecture. Spring 2016, MW 11:00 AM 12:15 PM

EXAM #1. CS 2410 Graduate Computer Architecture. Spring 2016, MW 11:00 AM 12:15 PM EXAM #1 CS 2410 Graduate Computer Architecture Spring 2016, MW 11:00 AM 12:15 PM Directions: This exam is closed book. Put all materials under your desk, including cell phones, smart phones, smart watches,

More information

Computer Architecture V Fall Practice Exam Questions

Computer Architecture V Fall Practice Exam Questions Computer Architecture V22.0436 Fall 2002 Practice Exam Questions These are practice exam questions for the material covered since the mid-term exam. Please note that the final exam is cumulative. See the

More information

CS 351 Exam 2, Fall 2012

CS 351 Exam 2, Fall 2012 CS 351 Exam 2, Fall 2012 Your name: Rules You may use one handwritten 8.5 x 11 cheat sheet (front and back). This is the only resource you may consult during this exam. Include explanations and comments

More information

ECS 154B Computer Architecture II Spring 2009

ECS 154B Computer Architecture II Spring 2009 ECS 154B Computer Architecture II Spring 2009 Pipelining Datapath and Control 6.2-6.3 Partially adapted from slides by Mary Jane Irwin, Penn State And Kurtis Kredo, UCD Pipelined CPU Break execution into

More information

Cycle Time for Non-pipelined & Pipelined processors

Cycle Time for Non-pipelined & Pipelined processors Cycle Time for Non-pipelined & Pipelined processors Fetch Decode Execute Memory Writeback 250ps 350ps 150ps 300ps 200ps For a non-pipelined processor, the clock cycle is the sum of the latencies of all

More information

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI CSCI 42: Computer Architectures The Processor (2) Fengguang Song Department of Computer & Information Science IUPUI Full Datapath Branch Target Instruction Fetch Immediate 4 Today s Contents We have looked

More information

Computer Architecture CS372 Exam 3

Computer Architecture CS372 Exam 3 Name: Computer Architecture CS372 Exam 3 This exam has 7 pages. Please make sure you have all of them. Write your name on this page and initials on every other page now. You may only use the green card

More information

EECS150 - Digital Design Lecture 9 Project Introduction (I), Serial I/O. Announcements

EECS150 - Digital Design Lecture 9 Project Introduction (I), Serial I/O. Announcements EECS150 - Digital Design Lecture 9 Project Introduction (I), Serial I/O September 22, 2011 Elad Alon Electrical Engineering and Computer Sciences University of California, Berkeley http://www-inst.eecs.berkeley.edu/~cs150

More information

Single Instructions Can Execute Several Low Level

Single Instructions Can Execute Several Low Level We have made it easy for you to find a PDF Ebooks without any digging. And by having access to our ebooks online or by storing it on your computer, you have convenient answers with single instructions

More information

Improving Performance: Pipelining

Improving Performance: Pipelining Improving Performance: Pipelining Memory General registers Memory ID EXE MEM WB Instruction Fetch (includes PC increment) ID Instruction Decode + fetching values from general purpose registers EXE EXEcute

More information

The Processor: Instruction-Level Parallelism

The Processor: Instruction-Level Parallelism The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy

More information

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Pipeline Hazards Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hazards What are hazards? Situations that prevent starting the next instruction

More information

Processor Architecture

Processor Architecture Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong (jinkyu@skku.edu)

More information

These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously on different instructions.

These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously on different instructions. MIPS Pipe Line 2 Introduction Pipelining To complete an instruction a computer needs to perform a number of actions. These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously

More information

Processor Design Pipelined Processor (II) Hung-Wei Tseng

Processor Design Pipelined Processor (II) Hung-Wei Tseng Processor Design Pipelined Processor (II) Hung-Wei Tseng Recap: Pipelining Break up the logic with pipeline registers into pipeline stages Each pipeline registers is clocked Each pipeline stage takes one

More information

Computer Architecture

Computer Architecture Computer Architecture An Introduction Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/

More information

Lecture 9 Pipeline and Cache

Lecture 9 Pipeline and Cache Lecture 9 Pipeline and Cache Peng Liu liupeng@zju.edu.cn 1 What makes it easy Pipelining Review all instructions are the same length just a few instruction formats memory operands appear only in loads

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes

More information

EECS Digital Design

EECS Digital Design EECS 150 -- Digital Design Lecture 11-- Processor Pipelining 2010-2-23 John Wawrzynek Today s lecture by John Lazzaro www-inst.eecs.berkeley.edu/~cs150 1 Today: Pipelining How to apply the performance

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Processor Design Pipelined Processor. Hung-Wei Tseng

Processor Design Pipelined Processor. Hung-Wei Tseng Processor Design Pipelined Processor Hung-Wei Tseng Pipelining 7 Pipelining Break up the logic with isters into pipeline stages Each stage can act on different instruction/data States/Control signals of

More information