EE2011 Computer Organization Lecture 10: Enhancing Performance with Pipelining ~ Pipelined Datapath

Wen-Yen Lin, Ph.D.
Department of Electrical Engineering, Chang Gung University
Email: wylin@mail.cgu.edu.tw
May 2018

Pipelined Datapath (Ch. 4.6)
Computer Organization, 106/2, EE/CGU, W.Y. Lin

Basic Idea for Pipelined Datapath ~ from Single-cycle Datapath (Fig. 4.33)
Let's borrow the datapath from the single-cycle design as much as possible. Why?
What do we need to add to actually split the datapath into stages?
=> Add internal registers, i.e. pipeline registers, to hold the internal data between the stages of the pipeline.

Pipelined Datapath (Fig. 4.35)
Pipeline register widths shown in the figure: IF/ID 64 bits, ID/EX 128 bits, EX/MEM 97 bits, MEM/WB 64 bits.
Each pipeline register has to be large enough to hold all the data produced in the previous cycle and passed on to the next stage.
Can you find a problem even if there are no dependencies? What instructions can we execute to manifest the problem?

Details of Pipeline Registers

IF/ID (64 bits)
  IF/ID.IR : register for the instruction code (32 bits)
  IF/ID.PC : register for the incremented Program Counter (32 bits)

ID/EX (128 bits)
  ID/EX.RegRS : register for the Rs value (32 bits)
  ID/EX.RegRT : register for the Rt value (32 bits)
  ID/EX.PC : register for the passage of the incremented Program Counter (32 bits)
  ID/EX.S32 : register for the 32-bit sign-extended value (32 bits)

EX/MEM (97 bits)
  EX/MEM.BrAddr : register for the computed branch target address (32 bits)
  EX/MEM.Zero : register for the register comparison result (1 bit)
  EX/MEM.ALUResu : register for the ALU result (32 bits)
  EX/MEM.RegRT : register for the passage of the Rt value (32 bits)

MEM/WB (64 bits)
  MEM/WB.DMData : register for the data read from memory (32 bits)
  MEM/WB.ALUResu : register for the passage of the ALU result (32 bits)
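The register widths quoted above can be checked by summing the field widths. The following is only a minimal sketch: the field names and widths come from the slide, while the dictionary layout and the helper name `register_width` are assumptions made for illustration.

```python
# Field widths in bits, as listed on the slide.
# The dict layout and helper name are illustrative assumptions.
PIPELINE_REGISTERS = {
    "IF/ID":  {"IR": 32, "PC": 32},
    "ID/EX":  {"RegRS": 32, "RegRT": 32, "PC": 32, "S32": 32},
    "EX/MEM": {"BrAddr": 32, "Zero": 1, "ALUResu": 32, "RegRT": 32},
    "MEM/WB": {"DMData": 32, "ALUResu": 32},
}


def register_width(name: str) -> int:
    """Total width of one pipeline register: the sum of its field widths."""
    return sum(PIPELINE_REGISTERS[name].values())


for name in PIPELINE_REGISTERS:
    print(f"{name}: {register_width(name)} bits")
```

The totals agree with the 64, 128, 97 and 64 bits given for IF/ID, ID/EX, EX/MEM and MEM/WB.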

Pipeline Operation
Cycle-by-cycle flow of instructions through the pipelined datapath
Single-clock-cycle pipeline diagram
  Shows pipeline usage in a single cycle
  Highlights the resources used
c.f. multi-clock-cycle diagram
  Graph of operation over time
We'll look at single-clock-cycle diagrams for load & store
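A single-clock-cycle view answers the question "which instruction is in which stage right now?". The sketch below assumes an ideal five-stage pipeline with one instruction entering per cycle and no stalls; the instruction list is the five-instruction program from the lecture's worked example, and the function name `stage_occupancy` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Five-instruction program from the lecture's worked example.
PROGRAM = [
    "add $1, $2, $3",
    "lw $4, 4($5)",
    "sw $6, 8($5)",
    "and $7, $2, $3",
    "sub $8, $0, $9",
]


def stage_occupancy(cycle: int) -> dict:
    """Map each stage to the instruction occupying it in `cycle` (1-based).

    Assumes no stalls: instruction i (0-based) is in stage s during
    cycle i + s + 1. A stage with no instruction maps to None.
    """
    occupancy = {}
    for s, stage in enumerate(STAGES):
        idx = cycle - 1 - s
        occupancy[stage] = PROGRAM[idx] if 0 <= idx < len(PROGRAM) else None
    return occupancy


for stage, instr in stage_occupancy(5).items():
    print(f"{stage:>3}: {instr}")
```

For cycle 5 this places sub in IF, and in ID, sw in EX, lw in MEM and add in WB.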

IF for Load, Store, ... (Fig. 4.36)

ID for Load, Store, ... (Fig. 4.36)

EX for Load (Fig. 4.37)

MEM for Load (Fig. 4.38)

WB for Load (Fig. 4.38)

EX for Store (Fig. 4.39)

MEM for Store (Fig. 4.40)

WB for Store (Fig. 4.40)

Example: Cycle 1

Program (address: instruction):
  8000: add $1, $2, $3
  8004: lw $4, 4($5)
  8008: sw $6, 8($5)
  8012: and $7, $2, $3
  8016: sub $8, $0, $9

Initial register values:
  Reg#    0   1   2   3   4    5   6   7   8   9
  Value   0   0  20  50  10  600  88   1   2   3

IF stage: add $1, $2, $3
  Actions during the cycle: I_Mem[8000]; PC + 4
  Figure values: PC = 8000; PC + 4 = 8004; fetched instruction fields (op rs rt rd shamt funct) = 0 2 3 1 0 32

Computer Organization, 99/2, EE/CGU, W.Y. Lin

Example: Cycle 2 (same program and initial register values as in Cycle 1)

IF stage: lw $4, 4($5)
  Actions during the 2nd cycle: I_Mem[8004]; PC + 4
  Figure values: PC + 4 = 8008; fetched instruction fields (op rs rt imm) = 35 5 4 4

ID stage: add $1, $2, $3
  Actions at the 2nd clock tick: IF/ID.IR = 0 2 3 1 0 32; IF/ID.PC = 8004; PC = 8004
  Actions during the 2nd cycle: Reg[2] & Reg[3]; S16_to_S32(2080)
  Figure values: read data = 20 and 50; sign-extended value = 2080

Example: Cycle 3 (same program and initial register values as in Cycle 1)

IF stage: sw $6, 8($5)
  Actions during the 3rd cycle: I_Mem[8008]; PC + 4
  Figure values: PC + 4 = 8012; fetched instruction fields (op rs rt imm) = 43 5 6 8

ID stage: lw $4, 4($5)
  Actions at the 3rd clock tick: IF/ID.IR = 35 5 4 4; IF/ID.PC = 8008; PC = 8008
  Actions during the 3rd cycle: Reg[5] & Reg[4]; S16_to_S32(4)
  Figure values: read data = 600 and 10; sign-extended value = 4

EX stage: add $1, $2, $3
  Actions at the 3rd clock tick: ID/EX.RegRS = 20; ID/EX.RegRT = 50; ID/EX.PC = 8004; ID/EX.S32 = 2080
  Actions during the 3rd cycle: ID/EX.RegRS + ID/EX.RegRT; ID/EX.PC + Shf_L_2(ID/EX.S32)
  Figure values: ALU result = 20 + 50 = 70; Zero = 0; branch adder = 8004 + 8320 = 16324

Example: Cycle 4 (same program and initial register values as in Cycle 1)

IF stage: and $7, $2, $3
  Actions during the 4th cycle: I_Mem[8012]; PC + 4
  Figure values: PC + 4 = 8016; fetched instruction fields (op rs rt rd shamt funct) = 0 2 3 7 0 36

ID stage: sw $6, 8($5)
  Actions at the 4th clock tick: IF/ID.IR = 43 5 6 8; IF/ID.PC = 8012; PC = 8012
  Actions during the 4th cycle: Reg[5] & Reg[6]; S16_to_S32(8)
  Figure values: read data = 600 and 88; sign-extended value = 8

EX stage: lw $4, 4($5)
  Actions at the 4th clock tick: ID/EX.RegRS = 600; ID/EX.RegRT = 10; ID/EX.PC = 8008; ID/EX.S32 = 4
  Actions during the 4th cycle: ID/EX.RegRS + ID/EX.S32; ID/EX.PC + Shf_L_2(ID/EX.S32)
  Figure values: ALU result = 600 + 4 = 604; Zero = 0; branch adder = 8008 + 16 = 8024

MEM stage: add $1, $2, $3
  Actions at the 4th clock tick: EX/MEM.BrAddr = 16324; EX/MEM.Zero = 0; EX/MEM.ALUResu = 70; EX/MEM.RegRT = 50
  Actions during the 4th cycle: no action

Example: Cycle 5 (same program and initial register values as in Cycle 1)

IF stage: sub $8, $0, $9
  Actions during the 5th cycle: I_Mem[8016]; PC + 4
  Figure values: PC + 4 = 8020; fetched instruction fields (op rs rt rd shamt funct) = 0 0 9 8 0 34

ID stage: and $7, $2, $3
  Actions at the 5th clock tick: IF/ID.IR = 0 2 3 7 0 36; IF/ID.PC = 8016; PC = 8016
  Actions during the 5th cycle: Reg[2] & Reg[3]; S16_to_S32(14372)

EX stage: sw $6, 8($5)
  Actions at the 5th clock tick: ID/EX.RegRS = 600; ID/EX.RegRT = 88; ID/EX.PC = 8012; ID/EX.S32 = 8
  Actions during the 5th cycle: ID/EX.RegRS + ID/EX.S32; ID/EX.PC + Shf_L_2(ID/EX.S32)
  Figure values: ALU result = 600 + 8 = 608; Zero = 0; branch adder = 8012 + 32 = 8044

MEM stage: lw $4, 4($5)
  Actions at the 5th clock tick: EX/MEM.BrAddr = 8024; EX/MEM.Zero = 0; EX/MEM.ALUResu = 604; EX/MEM.RegRT = 10
  Actions during the 5th cycle: D_Mem[604]
  Figure values: address = 604; write-data input = 10; read data = 1234

WB stage: add $1, $2, $3
  Actions at the 5th clock tick: MEM/WB.ALUResu = 70; MEM/WB.DMData = xxxx
  Actions during the 5th cycle: Reg[IF/ID.RegRd] = MEM/WB.ALUResu
  Figure values: write register = 7; write data = 70

Register table change shown in the figure: Reg 7: 1->70. The write register number is taken from IF/ID, which by now holds and $7, $2, $3, so the result of add lands in $7 instead of $1.
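The EX-stage numbers in the worked example can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the ALU is taken to perform an addition for add, lw and sw, the low 16 bits of an R-type instruction are laid out as rd (5 bits), shamt (5 bits), funct (6 bits) per the standard MIPS encoding, and the helper names are hypothetical.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of `value` as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def r_type_low16(rd: int, shamt: int, funct: int) -> int:
    """Low 16 bits of an R-type instruction: rd | shamt | funct (standard MIPS layout)."""
    return (rd << 11) | (shamt << 6) | funct


def ex_stage(operand_a: int, operand_b: int, id_ex_pc: int, id_ex_s32: int):
    """ALU addition plus the branch adder: ID/EX.PC + Shf_L_2(ID/EX.S32)."""
    alu_result = operand_a + operand_b
    branch_addr = id_ex_pc + (id_ex_s32 << 2)
    zero = int(alu_result == 0)
    return alu_result, branch_addr, zero


# add $1, $2, $3: the sign-extension unit sees rd/shamt/funct as a 16-bit value.
add_s32 = sign_extend16(r_type_low16(rd=1, shamt=0, funct=32))
print(add_s32)                          # 2080
print(ex_stage(20, 50, 8004, add_s32))  # (70, 16324, 0)   add in EX, cycle 3
print(ex_stage(600, 4, 8008, 4))        # (604, 8024, 0)   lw in EX, cycle 4
print(ex_stage(600, 8, 8012, 8))        # (608, 8044, 0)   sw in EX, cycle 5
```

The branch adder runs for every instruction, which is why add also produces a BrAddr value even though it is never used.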

WB for Load: wrong register number

Corrected Datapath (Fig. 4.41)
All the information (including control signals) that an instruction needs in its later stages has to be carried along with it through the pipeline registers!
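The effect of carrying the destination register number can be sketched with a small model. It assumes the ideal five-stage timing of the worked example (no stalls), reduces each instruction to its name and the register it should write, and uses the hypothetical function `wb_destinations`; it is an illustration of the idea, not a model of the full datapath.

```python
# (instruction, register it should write); sw writes no register.
PROGRAM = [("add", 1), ("lw", 4), ("sw", None), ("and", 7), ("sub", 8)]


def wb_destinations(carry_dest: bool) -> dict:
    """Return {cycle: (instruction in WB, register number used for the write)}.

    carry_dest=False models the uncorrected datapath: the write register
    number comes from IF/ID, which holds the instruction three slots behind
    the one in WB. carry_dest=True models the corrected datapath, where the
    number travels with the instruction through ID/EX, EX/MEM and MEM/WB.
    None means the number comes from whatever follows the program.
    """
    result = {}
    for wb_idx, (name, dest) in enumerate(PROGRAM):
        if dest is None:
            continue  # no register write for this instruction
        cycle = wb_idx + 5  # instruction i reaches WB in cycle i + 5
        if carry_dest:
            used = dest
        else:
            id_idx = wb_idx + 3  # instruction currently sitting in IF/ID
            used = PROGRAM[id_idx][1] if id_idx < len(PROGRAM) else None
        result[cycle] = (name, used)
    return result


print(wb_destinations(carry_dest=False)[5])  # ('add', 7)
print(wb_destinations(carry_dest=True)[5])   # ('add', 1)
```

In cycle 5 the uncorrected model writes the result of add into register 7, matching the register table change in the worked example, while the corrected model writes register 1.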

Corrected Datapath for Load

Graphically Representing Pipelines (p. 286, Fig. 4.43)
Multiple-clock-cycle pipelining diagram of five instructions, in the form showing resource usage

Graphically Representing Pipelines (Fig. 4.44)
Traditional multiple-clock-cycle pipelining diagram of five instructions
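A traditional multiple-clock-cycle diagram can be generated as text. The sketch below assumes an ideal pipeline with no stalls and uses the five instructions from the lecture's worked example rather than the ones drawn in the textbook figure; the function name `pipeline_diagram` and the `CC` column labels are illustrative choices.

```python
def pipeline_diagram(program, stages=("IF", "ID", "EX", "MEM", "WB")) -> str:
    """Render one row per instruction and one column per clock cycle.

    Instruction i (0-based) occupies stage s in column i + s, which is the
    staircase shape of the traditional diagram when nothing stalls.
    """
    n_cycles = len(program) + len(stages) - 1
    width = max(len(instr) for instr in program)
    header = " " * width + "".join(("CC" + str(c)).rjust(5)
                                   for c in range(1, n_cycles + 1))
    lines = [header]
    for i, instr in enumerate(program):
        cells = []
        for c in range(n_cycles):
            s = c - i
            cells.append((stages[s] if 0 <= s < len(stages) else "").rjust(5))
        lines.append(instr.ljust(width) + "".join(cells))
    return "\n".join(line.rstrip() for line in lines)


program = [
    "add $1, $2, $3",
    "lw $4, 4($5)",
    "sw $6, 8($5)",
    "and $7, $2, $3",
    "sub $8, $0, $9",
]
print(pipeline_diagram(program))
```

Reading one column of the output top to bottom gives the single-clock-cycle view for that cycle.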

Graphically Representing Pipelines (Fig. 4.45)
The single-clock-cycle diagram corresponding to clock cycle 5 of the pipeline.