Pipelined Processor Design

Pipelined Implementation: MIPS
Virendra Singh, Indian Institute of Science, Bangalore (virendra@computer.org)
Lecture 20, SE-273: Processor Design

Courtesy: Prof. Vishwani Agrawal
Mar 17, 2008, SE-273@SERC

Pipeline Registers
[Figure: single-cycle MIPS datapath (PC and PC+4 adder, instruction memory, register file, sign extension, shift left 2, ALU, data memory, multiplexers). The control unit decodes the opcode (IW 26-31) and drives RegDst, RegWrite, Branch, ALU source and operation control, MemRead, MemWrite and MemtoReg.]
Pipelining this datapath requires control that is not very different from the single-cycle control.

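Pipeline Register Functions
Four pipeline registers are added, one between each pair of adjacent stages:
Register   Data held
IF/ID      PC+4, instruction word (IW)
ID/EX      PC+4, R1, R2, sign-extended IW(0-15), IW(16-20), IW(11-15)
EX/MEM     branch target (PC+4 + 4 x offset), zero, Result, R2, IW(11-15) or IW(16-20)
MEM/WB     M[Result], Result, IW(11-15) or IW(16-20)
R1 and R2 are the two values read from the register file, Result is the ALU output, and M[Result] is the word read from data memory.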

Pipelined Datapath
[Figure: the datapath with the four pipeline registers inserted between the IF, ID, EX, MEM and WB stages. The write-register number comes from IW(11-15) for R-type instructions and from IW(16-20) for I-type instructions such as lw.]

Five-Cycle Pipeline
[Figure: an instruction passes through five stages, IF, ID, EX, MEM and WB, in clock cycles CC1 to CC5.]

Add Instruction
add $t0, $s1, $s2
Machine instruction word:
000000  10001  10010  01000  00000  100000
opcode  $s1    $s2    $t0    shamt  function
CC1 (IF): fetch the instruction
CC2 (ID): read $s1 and $s2
CC3 (EX): compute $s1 + $s2
CC4 (MEM): no memory access
CC5 (WB): write the sum into $t0
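
A small Python sketch reproduces the field layout of this word. It assumes the standard MIPS register numbers ($s1 = 17, $s2 = 18, $t0 = 8) and the function code 100000 for add. `encode_rtype` and `REG` are hypothetical names used only for illustration.

```python
# Standard MIPS register numbers for the registers used in this example.
REG = {"$zero": 0, "$t0": 8, "$t1": 9, "$s0": 16, "$s1": 17, "$s2": 18}

ADD_FUNCT = 0b100000  # function code of add (decimal 32)


def encode_rtype(funct: int, rd: str, rs: str, rt: str, shamt: int = 0) -> str:
    """Return an R-type word as space-separated binary fields:
    opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6). The opcode is 0 for R-type."""
    fields = [(0, 6), (REG[rs], 5), (REG[rt], 5), (REG[rd], 5), (shamt, 5), (funct, 6)]
    return " ".join(format(value, f"0{width}b") for value, width in fields)


word = encode_rtype(ADD_FUNCT, rd="$t0", rs="$s1", rt="$s2")
print(word)  # 000000 10001 10010 01000 00000 100000
```

The destination $t0 comes first in the assembly syntax, but it occupies the third register field (rd) of the machine word.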

Pipelined Datapath Executing add
[Figure: add $t0, $s1, $s2 in the pipelined datapath. $s1 and $s2 are read from the register file and added by the ALU. The sum passes through the data-memory stage unused and is written to $t0, selected by IW(11-15).]

Load Instruction
lw $t0, 1200($t1)
100011  01001  01000  0000 0100 1011 0000
opcode  $t1    $t0    1200
CC1 (IF): fetch the instruction
CC2 (ID): read $t1; sign-extend 1200
CC3 (EX): compute the address $t1 + 1200
CC4 (MEM): read M[addr]
CC5 (WB): write the loaded word into $t0

Pipelined Datapath Executing lw
[Figure: lw $t0, 1200($t1) in the pipelined datapath. $t1 and the sign-extended 1200 feed the ALU, and the sum addresses the data memory. The word read is written to $t0, selected by IW(16-20).]

Store Instruction
sw $t0, 1200($t1)
101011  01001  01000  0000 0100 1011 0000
opcode  $t1    $t0    1200
CC1 (IF): fetch the instruction
CC2 (ID): read $t1 and $t0; sign-extend 1200
CC3 (EX): compute the address $t1 + 1200
CC4 (MEM): write $t0 into M[addr]
CC5 (WB): nothing to write back
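
A similar sketch for the I-type format shows how lw and sw pack the base register, the data register and the 16-bit offset. The register numbers ($t0 = 8, $t1 = 9) and opcodes (lw = 35, sw = 43) are the standard MIPS values. `encode_itype` is a hypothetical helper.

```python
# Standard MIPS register numbers and opcodes for the two instructions above.
REG = {"$t0": 8, "$t1": 9}
OPCODE = {"lw": 0b100011, "sw": 0b101011}  # 35 and 43


def encode_itype(op: str, rt: str, base: str, offset: int) -> str:
    """Return an I-type word as opcode(6) rs(5) rt(5) immediate(16), with the
    16-bit immediate printed in 4-bit groups. The base register goes in rs."""
    imm = format(offset & 0xFFFF, "016b")  # 16-bit two's-complement offset
    imm_groups = " ".join(imm[i:i + 4] for i in range(0, 16, 4))
    return f"{OPCODE[op]:06b} {REG[base]:05b} {REG[rt]:05b} {imm_groups}"


lw_word = encode_itype("lw", rt="$t0", base="$t1", offset=1200)
sw_word = encode_itype("sw", rt="$t0", base="$t1", offset=1200)
print(lw_word)  # 100011 01001 01000 0000 0100 1011 0000
print(sw_word)  # 101011 01001 01000 0000 0100 1011 0000
```

The two words differ only in the opcode field. For lw, the rt field names the register written in WB. For sw, it names the register whose value is written to memory in MEM.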

Pipelined Datapath Executing sw
[Figure: sw $t0, 1200($t1) in the pipelined datapath. $t1 and $t0 are read from the register file, and $t1 plus the sign-extended 1200 forms the address. $t0 is written to data memory at that address.]

Executing a Program
Consider a five-instruction segment:
lw  $10, 20($1)
sub $11, $2, $3
add $12, $3, $4
lw  $13, 24($1)
add $14, $5, $6

Program Execution
[Figure: pipeline diagram of the five instructions against time. Each instruction enters IF one clock cycle after the previous one, so all five stages are busy in CC5.]

Pipeline in CC5
IF:  add $14, $5, $6
ID:  lw  $13, 24($1)
EX:  add $12, $3, $4
MEM: sub $11, $2, $3
WB:  lw  $10, 20($1)
[Figure: the pipelined datapath with each stage labeled by the instruction it holds in CC5.]
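
The stage assignment in CC5 follows from a simple rule. Instruction i (counting from 0) is fetched in cycle i + 1 and moves one stage per cycle. Here is a minimal Python sketch of that rule, assuming no stalls. The names `STAGES`, `PROGRAM` and `stage_occupancy` are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "add $12, $3, $4",
    "lw $13, 24($1)",
    "add $14, $5, $6",
]


def stage_occupancy(program: list[str], cycle: int) -> dict[str, str]:
    """Map each busy stage to the instruction it holds in clock cycle `cycle`
    (1-based). Instruction i (0-based) enters IF in cycle i + 1 and advances
    one stage per cycle; stalls are not modeled."""
    occupancy = {}
    for i, instr in enumerate(program):
        stage_index = cycle - (i + 1)
        if 0 <= stage_index < len(STAGES):
            occupancy[STAGES[stage_index]] = instr
    return occupancy


# Print the pipeline filling, running full, and draining (CC1 to CC9).
for cc in range(1, len(PROGRAM) + len(STAGES)):
    print(f"CC{cc}:", stage_occupancy(PROGRAM, cc))
```

Five instructions need 5 + 4 = 9 cycles, and only CC5 has every stage occupied.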

Advantages of Pipeline
After the fifth cycle (CC5), one instruction completes every cycle. The CPI is therefore ≈ 1 if the initial pipeline latency of 5 cycles is neglected.
Pipeline latency is the number of stages in the pipeline, that is, the number of clock cycles after which the first instruction completes.
The clock cycle is about one-fourth as long as that of the single-cycle datapath and about the same as that of the multicycle datapath.
For the multicycle datapath, CPI ≥ 3: each instruction takes 3 to 5 cycles, depending on its type.
So pipelined execution is faster, but...
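
Counting cycles makes the CPI ≈ 1 claim concrete. With no hazards, a k-stage pipeline completes N instructions in k + (N - 1) cycles. The effective CPI is therefore (k + N - 1)/N, which approaches 1 as N grows. A short Python sketch of that arithmetic follows; the function names are illustrative.

```python
def pipelined_cycles(n_instructions: int, n_stages: int = 5) -> int:
    """Cycles for an ideal pipeline with no hazards: the first instruction
    completes after n_stages cycles (the pipeline latency), then one more
    instruction completes in every following cycle."""
    return n_stages + (n_instructions - 1)


def effective_cpi(n_instructions: int, n_stages: int = 5) -> float:
    """Average cycles per instruction, including the initial fill latency."""
    return pipelined_cycles(n_instructions, n_stages) / n_instructions


print(pipelined_cycles(5))       # 9: the five-instruction segment finishes in CC9
print(effective_cpi(5))          # 1.8
print(effective_cpi(1_000_000))  # 1.000004
```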

"Science is always wrong. It never solves a problem without creating ten more."
George Bernard Shaw

Pipeline Hazards
Definition: a hazard is a situation in which the next instruction cannot complete execution one clock cycle after the present instruction completes.
There are three types of hazards: (i) structural hazard (resource conflict), (ii) data hazard and (iii) control hazard.

Structural Hazard
Two instructions cannot execute in the same cycle because they need the same hardware resource.
Example: consider a computer with a single memory shared by data and instructions. In its fourth cycle (MEM), a lw instruction must read data from memory. In the same cycle, the fourth instruction of the sequence is in its first cycle (IF) and must fetch its instruction from memory. Both accesses need the one memory, which causes a resource conflict.

Example of Structural Hazard
[Figure: lw $10, 20($1); sub $11, $2, $3; add $12, $3, $4; lw $13, 24($1) against time (CC1 to CC5). With a common data and instruction memory, two instructions need the memory in CC4: the data read of lw $10 and the instruction fetch of lw $13.]
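
The conflict in this example can be located mechanically. The sketch below assumes a single memory shared by instructions and data, one fetch per cycle with no stalls, and only lw and sw as data-memory users. `memory_conflicts` is a hypothetical helper.

```python
PROGRAM = ["lw $10, 20($1)", "sub $11, $2, $3", "add $12, $3, $4", "lw $13, 24($1)"]
MEM_STAGE_OFFSET = 3  # MEM is the 4th stage: IF, ID, EX, MEM, WB


def memory_conflicts(program: list[str]) -> list[tuple[int, list[str]]]:
    """Return (cycle, instructions) pairs where a single memory shared by
    instructions and data is needed more than once in the same cycle.
    Assumes instruction i is fetched in cycle i + 1 with no stalls."""
    uses: dict[int, list[str]] = {}
    for i, instr in enumerate(program):
        fetch_cycle = i + 1
        uses.setdefault(fetch_cycle, []).append(instr)  # instruction fetch
        if instr.split()[0] in ("lw", "sw"):
            # data access in the MEM stage
            uses.setdefault(fetch_cycle + MEM_STAGE_OFFSET, []).append(instr)
    return [(cycle, who) for cycle, who in sorted(uses.items()) if len(who) > 1]


print(memory_conflicts(PROGRAM))
# [(4, ['lw $10, 20($1)', 'lw $13, 24($1)'])]
```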

Possible Remedies for Structural Hazards
(i) Provide duplicate hardware resources in the datapath, for example separate instruction and data memories.
(ii) Have the control unit or the compiler insert delays (no-op cycles) between instructions. This is known as a pipeline stall or bubble.

Stall (Bubble) for Structural Hazard
[Figure: lw $10, 20($1); sub $11, $2, $3; add $12, $3, $4; then a stall (bubble); then lw $13, 24($1). The fetch of lw $13 is delayed by one cycle, so it no longer collides with the data access of lw $10.]
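
A small scheduling sketch derives the bubble shown above. A fetch is postponed while the shared memory is reserved for the MEM stage of an earlier load or store. The sketch models only this structural hazard, under the same single-memory assumption. The function name is hypothetical.

```python
PROGRAM = ["lw $10, 20($1)", "sub $11, $2, $3", "add $12, $3, $4", "lw $13, 24($1)"]


def schedule_fetches(program: list[str]) -> list[int]:
    """Return the fetch (IF) cycle of each instruction on a machine with one
    memory for instructions and data. A fetch is postponed by one bubble for
    every cycle in which the memory is reserved for the MEM stage of an
    earlier lw or sw. Only this structural hazard is modeled."""
    reserved: set[int] = set()  # cycles in which memory serves a data access
    fetch_cycles = []
    cycle = 1
    for instr in program:
        while cycle in reserved:
            cycle += 1  # stall: a bubble enters the pipeline instead
        fetch_cycles.append(cycle)
        if instr.split()[0] in ("lw", "sw"):
            reserved.add(cycle + 3)  # MEM is three cycles after IF
        cycle += 1
    return fetch_cycles


print(schedule_fetches(PROGRAM))  # [1, 2, 3, 5]
```

For the four instructions above, the fetch of lw $13 moves from CC4 to CC5. That is exactly one bubble.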

Data Hazard
A data hazard means that an instruction cannot complete because the data it needs is not yet available. That data is to be produced by another instruction still in the pipeline.
Example: consider two instructions:
add $s0, $t0, $t1
sub $t2, $s0, $t3    # needs $s0

Example of Data Hazard
[Figure: add $s0, $t0, $t1 followed by sub $t2, $s0, $t3 against time, CC1 to CC5.]
sub must read $s0 (and $t3) from the register file in CC3, but add does not write $s0 into the register file until CC5.
However, sub only uses $s0 in CC4, its EX stage. The value is already available at the end of CC3, when the ALU operation of add finishes.
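
A short sketch can check the timing argument above. For each source register, it compares the cycle in which the register is read (ID) with the cycle in which its most recent producer writes it (WB). It assumes no forwarding. As an additional assumption not stated on the slide, the register file is written in the first half of a cycle and read in the second half. `Instr` and `raw_hazards` are illustrative names.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[str]       # register written, or None (e.g. sw, beq)
    sources: tuple[str, ...]  # registers read


def raw_hazards(program: list[Instr]) -> list[tuple[str, str, str]]:
    """Find read-after-write hazards in a 5-stage pipeline without forwarding.
    Instruction j (0-based) reads registers in ID, cycle j + 2; its most recent
    earlier writer i writes in WB, cycle i + 5. Assumes a split-cycle register
    file, so a read in the write cycle itself is safe."""
    hazards = []
    for j, consumer in enumerate(program):
        read_cycle = j + 2
        for reg in consumer.sources:
            if reg == "$zero":
                continue  # $zero is never written
            for i in range(j - 1, -1, -1):  # most recent writer first
                if program[i].dest == reg:
                    if read_cycle < i + 5:
                        hazards.append((program[i].text, consumer.text, reg))
                    break
    return hazards


example = [
    Instr("add $s0, $t0, $t1", "$s0", ("$t0", "$t1")),
    Instr("sub $t2, $s0, $t3", "$t2", ("$s0", "$t3")),
]
print(raw_hazards(example))
# [('add $s0, $t0, $t1', 'sub $t2, $s0, $t3', '$s0')]
```

Suppose two independent instructions separate the producer and the consumer. The read then falls in the write cycle itself, and under the split-cycle assumption no hazard is reported.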

Forwarding or Bypassing
The output of a resource used by one instruction is forwarded directly to the input of a resource used by another instruction, without waiting for the register-file write.
Forwarding can eliminate some, but not all, data hazards.

Forwarding for Data Hazard
[Figure: add $s0, $t0, $t1 and sub $t2, $s0, $t3 against time. sub reads $s0 and $t3 in CC3, and add writes $s0 in CC5. The ALU result of add is forwarded to the ALU input of sub, so sub does not wait for the write.]

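Forwarding Unit Hardware
[Figure: forwarding unit. It takes the source register IDs from the instruction, together with the control signals. It drives two forwarding multiplexers (FORW. MUX) at the ALU inputs. These select either the register-file value or a result coming back from the data-memory stage or from the write-back path to the register file.]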
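
The forwarding unit's decision can be sketched as follows. The sketch uses the selection conditions commonly given for the classic five-stage MIPS pipeline (for example, in Patterson and Hennessy's textbook). It is not necessarily the exact circuit on this slide. The field names and the `REGFILE`/`EX/MEM`/`MEM/WB` labels are assumptions.

```python
from typing import NamedTuple


class ForwardingInputs(NamedTuple):
    """Fields the forwarding unit compares (names are illustrative)."""
    id_ex_rs: int          # source register numbers of the instruction in EX
    id_ex_rt: int
    ex_mem_regwrite: bool  # instruction one ahead (now in MEM) writes a register
    ex_mem_rd: int
    mem_wb_regwrite: bool  # instruction two ahead (now in WB) writes a register
    mem_wb_rd: int


def select_source(f: ForwardingInputs, src: int) -> str:
    """Pick the ALU operand source for register `src` (forwarding-mux setting)."""
    if f.ex_mem_regwrite and f.ex_mem_rd != 0 and f.ex_mem_rd == src:
        return "EX/MEM"   # newest value: ALU result of the previous instruction
    if f.mem_wb_regwrite and f.mem_wb_rd != 0 and f.mem_wb_rd == src:
        return "MEM/WB"   # value from two instructions back
    return "REGFILE"      # no dependence: use the value read in ID


def forwarding_unit(f: ForwardingInputs) -> tuple[str, str]:
    """Settings of the two forwarding multiplexers (ALU inputs A and B)."""
    return select_source(f, f.id_ex_rs), select_source(f, f.id_ex_rt)


# CC4 of the example: add $s0 ($16) is in MEM, sub $t2, $s0, $t3 ($11) is in EX.
cc4 = ForwardingInputs(id_ex_rs=16, id_ex_rt=11, ex_mem_regwrite=True,
                       ex_mem_rd=16, mem_wb_regwrite=False, mem_wb_rd=0)
print(forwarding_unit(cc4))  # ('EX/MEM', 'REGFILE')
```

The unit checks EX/MEM before MEM/WB. When both earlier instructions write the same register, this gives priority to the most recent result.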

Forwarding Alone May Not Work
[Figure: lw $s0, 20($s1) followed by sub $t2, $s0, $t3 against time, CC1 to CC5.]
sub reads $s0 and $t3 in CC3 and needs $s0 at the start of CC4 (data hazard). The loaded data is available from memory only at the end of CC4, and lw writes $s0 in CC5. Forwarding cannot send a value backward in time.

Use Bubble and Forwarding
[Figure: lw $s0, 20($s1), then one stall (bubble), then sub $t2, $s0, $t3. After the stall, the loaded value is forwarded from the end of the MEM stage of lw to the ALU input of sub. lw writes $s0 in CC5.]

Hazard Detection Unit Hardware
[Figure: hazard detection unit. It decides whether to stall from the instruction and its source register IDs. On a stall, it disables the PC write, and a NOP multiplexer sends control 0 in place of the normal control signals. It works alongside the forwarding unit and its two forwarding multiplexers.]
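
The stall decision for the lw/sub case can be sketched in the same style. The condition follows the usual textbook load-use check. Disabling the IF/ID write alongside the PC write is an assumption drawn from that design, since the slide shows only the PC write disable and the NOP multiplexer. All names are hypothetical.

```python
def load_use_hazard(id_ex_memread: bool, id_ex_rt: int,
                    if_id_rs: int, if_id_rt: int) -> bool:
    """True if the instruction in ID reads the register that the load now in
    EX will write, so forwarding alone cannot supply the value in time."""
    return id_ex_memread and id_ex_rt in (if_id_rs, if_id_rt)


def hazard_detection_unit(id_ex_memread: bool, id_ex_rt: int,
                          if_id_rs: int, if_id_rt: int) -> dict[str, bool]:
    """Control actions for one cycle. On a stall, the PC and IF/ID are not
    written (the same instructions are fetched and decoded again). Zeros are
    sent as control signals into ID/EX, which turns that slot into a NOP."""
    stall = load_use_hazard(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt)
    return {"pc_write": not stall, "if_id_write": not stall, "insert_nop": stall}


# lw $s0, 20($s1) in EX (rt = $s0 = 16); sub $t2, $s0, $t3 in ID (rs = 16, rt = 11).
print(hazard_detection_unit(True, 16, 16, 11))
# {'pc_write': False, 'if_id_write': False, 'insert_nop': True}
```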

Resolving Hazards
Hazards are resolved by the hazard detection and forwarding units. A compiler that understands how these units work can improve performance by ordering instructions to avoid stalls.

Avoiding Stall by Code Reorder
C code:
A = B + E;
C = B + F;
MIPS code:
lw  $t1, 0($t0)     # $t1 written
lw  $t2, 4($t0)     # $t2 written
add $t3, $t1, $t2   # $t1, $t2 needed
sw  $t3, 12($t0)
lw  $t4, 8($t0)     # $t4 written
add $t5, $t1, $t4   # $t4 needed
sw  $t5, 16($t0)
Each add uses a register loaded by the instruction immediately before it, so even with forwarding each add must stall for one cycle.

Reordered Code
C code:
A = B + E;
C = B + F;
MIPS code:
lw  $t1, 0($t0)
lw  $t2, 4($t0)
lw  $t4, 8($t0)
add $t3, $t1, $t2   # no stall needed
sw  $t3, 12($t0)
add $t5, $t1, $t4   # no stall needed
sw  $t5, 16($t0)
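
A sketch that counts load-use stalls quantifies the benefit of the reordering. It assumes full forwarding, so only an instruction that uses the result of the immediately preceding lw must wait one cycle. The tuple encoding of instructions is an illustrative assumption.

```python
# (opcode, register written or None, registers read) for each instruction.
ORIGINAL = [
    ("lw",  "$t1", ("$t0",)),
    ("lw",  "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),
    ("sw",  None,  ("$t3", "$t0")),
    ("lw",  "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),
    ("sw",  None,  ("$t5", "$t0")),
]
# Same instructions in the reordered sequence: lw, lw, lw, add, sw, add, sw.
REORDERED = [ORIGINAL[i] for i in (0, 1, 4, 2, 3, 5, 6)]


def load_use_stalls(program) -> int:
    """Count one-cycle stalls assuming full forwarding: only an instruction that
    reads the register loaded by the lw immediately before it must wait."""
    return sum(
        1
        for prev, cur in zip(program, program[1:])
        if prev[0] == "lw" and prev[1] in cur[2]
    )


def total_cycles(program, n_stages: int = 5) -> int:
    """Ideal pipeline fill plus one extra cycle per load-use stall."""
    return n_stages + len(program) - 1 + load_use_stalls(program)


print(load_use_stalls(ORIGINAL), total_cycles(ORIGINAL))    # 2 13
print(load_use_stalls(REORDERED), total_cycles(REORDERED))  # 0 11
```

Moving the third lw up separates each load from its use by at least one instruction. This saves two of the thirteen cycles.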

Control Hazard
The instruction to be fetched next is not known!
Example: the instruction being executed is a branch, which determines the next instruction:
    add $4, $5, $6
    beq $1, $2, 40
    next instruction ...
40: and $7, $8, $9

Stall on Branch
[Figure: add $4, $5, $6; beq $1, $2, 40; one stall (bubble); then either the next instruction or and $7, $8, $9, depending on the branch outcome.]

Why Only One Stall?
Extra hardware in the ID stage resolves the branch one cycle after it is fetched: (i) an additional adder computes the branch address, and (ii) a comparator generates the zero (equal) signal. The hazard detection unit then writes the branch address into the PC.
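
Below is a behavioral sketch of the ID-stage branch logic described above. The comparator and the extra adder together determine the next PC one stage earlier than EX. It assumes the usual MIPS convention that the beq offset counts words relative to PC+4. The addresses in the usage lines are made up for illustration.

```python
def next_pc_after_beq(pc: int, rs_value: int, rt_value: int, offset: int) -> int:
    """Resolve beq in the ID stage.
    pc: byte address of the beq instruction.
    offset: sign-extended 16-bit immediate, counted in words.
    The comparator tests rs_value == rt_value (the zero/equal signal), and the
    extra adder forms PC+4 + offset*4. The outcome is known at the end of ID,
    so only one instruction slot after the branch is lost."""
    pc_plus_4 = pc + 4
    branch_target = pc_plus_4 + (offset << 2)  # shift left 2: words -> bytes
    return branch_target if rs_value == rt_value else pc_plus_4


print(hex(next_pc_after_beq(0x1000, 5, 5, 10)))  # 0x102c (taken)
print(hex(next_pc_after_beq(0x1000, 5, 7, 10)))  # 0x1004 (not taken)
```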

Ways to Handle Branch
(i) Stall or bubble.
(ii) Branch prediction: heuristics (e.g., always fetch the next instruction), prediction based on statistics (dynamic), or a hardware decision (dynamic). A prediction error requires a pipeline flush.
(iii) Delayed branch.
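
The sketch below implements a 2-bit saturating counter for a single branch, as one concrete example of a dynamic, statistics-based scheme. It is a widely used predictor chosen here for illustration; the slide does not say which dynamic scheme is meant. The branch outcome pattern is hypothetical.

```python
class TwoBitPredictor:
    """A 2-bit saturating counter: states 0-1 predict not taken, states 2-3
    predict taken. The prediction flips only after two consecutive
    mispredictions."""

    def __init__(self, state: int = 0) -> None:
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def count_mispredictions(outcomes: list[bool], state: int = 0) -> int:
    """Each misprediction means the wrongly fetched instructions are flushed."""
    predictor = TwoBitPredictor(state)
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop-closing branch: taken 9 times, then not taken on exit; the loop runs twice.
loop_branch = ([True] * 9 + [False]) * 2
print(count_mispredictions(loop_branch))  # 4
```

On this pattern the counter mispredicts twice while warming up and once at each loop exit. After the first exit it stays in a taken state, so re-entering the loop costs nothing.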

Delayed Branch Example
Stall on branch:
      add $4, $5, $6
      beq $1, $2, skip
      next instruction ...
skip: or  $7, $8, $9
Delayed branch:
      beq $1, $2, skip
      add $4, $5, $6     # executed irrespective of branch decision
      next instruction ...
skip: or  $7, $8, $9
The add can be moved after the beq because the branch does not depend on $4.

Delayed Branch
[Figure: beq $1, $2, skip, followed by add $4, $5, $6 in the cycle that would otherwise be a bubble. Then comes either the next instruction or the instruction at skip (or $7, $8, $9).]
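
A small sketch traces the execution order with a delay slot. It assumes a one-instruction delay slot and uses the example program above, with `next instruction` as a placeholder string. Further simplifications are noted in the docstring. The function name is hypothetical.

```python
def trace_delayed_branch(program: list[str], labels: dict[str, int],
                         taken: bool, max_steps: int = 100) -> list[str]:
    """Return instructions in execution order with a one-slot delayed branch.
    The instruction right after a beq (the delay slot) always executes, and
    only then does control move to the target if the branch is taken.
    Simplifications: every beq has the same outcome `taken`, the label is the
    last operand, branches inside delay slots are not handled, and at most
    max_steps instructions are traced."""
    order: list[str] = []
    pc, pending_target = 0, None
    while pc < len(program) and len(order) < max_steps:
        instr = program[pc]
        order.append(instr)
        next_pc = pc + 1
        if pending_target is not None:  # this instruction was the delay slot
            next_pc, pending_target = pending_target, None
        if instr.startswith("beq") and taken:
            pending_target = labels[instr.split(",")[-1].strip()]
        pc = next_pc
    return order


# "next instruction" is a placeholder for whatever follows the delay slot.
program = ["beq $1, $2, skip", "add $4, $5, $6", "next instruction", "or $7, $8, $9"]
labels = {"skip": 3}
print(trace_delayed_branch(program, labels, taken=True))
print(trace_delayed_branch(program, labels, taken=False))
```

In both cases add $4, $5, $6 executes right after the beq. Moving it into the delay slot therefore does not change the program's result.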

Summary: Hazards
Structural hazards. Cause: resource conflict. Remedies: (i) duplicate hardware resources, (ii) stall (bubble).
Data hazards. Cause: data unavailability. Remedies: (i) forwarding, (ii) stall (bubble), (iii) code reordering.
Control hazards. Cause: out-of-sequence execution (branch or jump). Remedies: (i) stall (bubble), (ii) branch prediction/pipeline flush, (iii) delayed branch/pipeline flush.

Thank You