PS Midterm 2. Pipelining


Sequential Laundry
[Figure: timeline from 6 PM to midnight; task order A, B, C, D; each load takes 30, 40, and 20 minutes for its three steps]
Sequential laundry takes 6 hours for 4 loads.
If they learned pipelining, how long would laundry take?

Pipelined Laundry: Start work ASAP
[Figure: timeline from 6 PM to midnight; task order A, B, C, D; segment times 30, 40, 40, 40, 40, 20 minutes]
Pipelined laundry takes 3.5 hours for 4 loads.
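The two laundry totals can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the three steps take 30, 40, and 20 minutes (values consistent with the 6-hour and 3.5-hour totals on the slides) and that a finished load can wait for the next step to become free. The function names are hypothetical.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Loads overlap; after the first load, one load finishes per slowest-stage time.

    Assumption: identical loads, one load per stage at a time.
    """
    return sum(stage_minutes) + (loads - 1) * max(stage_minutes)


stages = [30, 40, 20]  # assumed step times in minutes
print(sequential_minutes(stages, 4) / 60)  # hours for 4 loads, one after another
print(pipelined_minutes(stages, 4) / 60)   # hours for 4 loads, overlapped
```

The slowest step (40 minutes here) sets the rate at which loads finish, which is why the pipelined total is 30 + 4 x 40 + 20 minutes.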

Total Time for Eight Instructions
Instruction class   Instruction fetch   Register read   ALU operation   Data access   Register write   Total time
Load word           2 ns                1 ns            2 ns            2 ns          1 ns             8 ns
Store word          2 ns                1 ns            2 ns            2 ns          -                7 ns
R-type              2 ns                1 ns            2 ns            -             1 ns             6 ns
Branch              2 ns                1 ns            2 ns            -             -                5 ns
R-type instructions: add, sub, and, or, slt
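The totals in the table are just sums of the stage latencies each instruction class uses. A small sketch, assuming 2 ns for memory and ALU operations and 1 ns for register file accesses (the dictionary names are illustrative, not from the slides):

```python
# Assumed latency of each functional step, in nanoseconds.
STAGE_NS = {"fetch": 2, "reg_read": 1, "alu": 2, "data_access": 2, "reg_write": 1}

# Which steps each instruction class actually uses.
USES = {
    "Load word": ["fetch", "reg_read", "alu", "data_access", "reg_write"],
    "Store word": ["fetch", "reg_read", "alu", "data_access"],
    "R-type": ["fetch", "reg_read", "alu", "reg_write"],
    "Branch": ["fetch", "reg_read", "alu"],
}


def total_ns(instruction_class):
    """Sum the latencies of the steps used by one instruction class."""
    return sum(STAGE_NS[step] for step in USES[instruction_class])


totals = {name: total_ns(name) for name in USES}
single_cycle_clock_ns = max(totals.values())  # the clock must fit the slowest class
print(totals, single_cycle_clock_ns)
```

In a single-cycle design every instruction pays for the slowest class, so the load word total sets the clock period.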

The Five Stages of the Load Instruction
        Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
Load    Ifetch    Reg/Dec   Exec      Mem       Wr
Ifetch: Instruction Fetch. Fetch the instruction from the Instruction Memory.
Reg/Dec: Registers Fetch and Instruction Decode.
Exec: Calculate the memory address.
Mem: Read the data from the Data Memory.
Wr: Write the data back to the register file.

Single Cycle, Multiple Cycle, vs. Pipeline
Single Cycle Implementation (Cycle 1, Cycle 2):
    Load | Store, followed by wasted time
Multiple Cycle Implementation (Cycle 1 to Cycle 10):
    Load:   Ifetch Reg Exec Mem Wr
    Store:  Ifetch Reg Exec Mem
    R-type: Ifetch ...
Pipeline Implementation (each instruction starts one cycle after the previous one):
    Load:   Ifetch Reg Exec Mem Wr
    Store:  Ifetch Reg Exec Mem Wr
    R-type: Ifetch Reg Exec Mem Wr
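The cycle counts in the timing comparison can be reproduced with a short sketch. It assumes the multiple-cycle design spends 5 cycles on a load and 4 on a store, as drawn above, and that the pipeline has 5 stages with no stalls. The names are hypothetical.

```python
# Cycles per instruction class in the multiple-cycle design (load and store only,
# because those are the two whose full stage sequences appear on the slide).
MULTICYCLE_CYCLES = {"load": 5, "store": 4}
PIPELINE_DEPTH = 5  # Ifetch, Reg, Exec, Mem, Wr


def multicycle_cycles(program):
    """Instructions run back to back, each taking its own number of cycles."""
    return sum(MULTICYCLE_CYCLES[instr] for instr in program)


def pipelined_cycles(program):
    """The first instruction fills the pipeline; each later one finishes a cycle later."""
    if not program:
        return 0
    return PIPELINE_DEPTH + len(program) - 1


program = ["load", "store"]
print(multicycle_cycles(program), pipelined_cycles(program))
```

For this two-instruction program the multiple-cycle design needs 9 cycles, while the ideal pipeline needs 6.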

Why Pipeline?
Suppose 100 instructions are executed.
The single cycle machine has a cycle time of 45 ns.
The multicycle and pipeline machines have cycle times of 10 ns.
The multicycle machine has a CPI of 3.6.
Single Cycle Machine: 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
Multicycle Machine: 10 ns/cycle x 3.6 CPI x 100 inst = 3600 ns
Ideal pipelined machine: 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
Ideal pipelined vs. single cycle speedup: 4500 ns / 1040 ns = 4.33
What has not yet been considered?
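The comparison above is the product of cycle time, CPI, and instruction count, with a few extra cycles to drain the pipeline. A minimal sketch, assuming 100 instructions and a 10 ns cycle for the multicycle and pipelined machines (the function name is hypothetical):

```python
def exec_time_ns(cycle_ns, cpi, instructions, drain_cycles=0):
    """Execution time = cycle time x (CPI x instruction count + drain cycles)."""
    return cycle_ns * (cpi * instructions + drain_cycles)


single_cycle = exec_time_ns(45, 1, 100)
multicycle = exec_time_ns(10, 3.6, 100)
pipelined = exec_time_ns(10, 1, 100, drain_cycles=4)  # 4 cycles to drain 5 stages

speedup = single_cycle / pipelined
print(single_cycle, multicycle, pipelined, round(speedup, 2))
```

The drain term matters less as the instruction count grows, so the ideal speedup over the single-cycle machine approaches 45 / 10 = 4.5 for long programs.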

Single Cycle Datapath
IF: Instruction Fetch
ID: Instruction decode / Register file read
EX: Execute / address calculation
MEM: Memory access
WB: Write back
[Figure: single-cycle datapath with PC, instruction memory, register file (ports ra, rb, rw; buses busW, busA, busB), sign extend, shift left 2, ALU, and data memory]

Pipelined Version of the Datapath
[Figure: datapath with the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB inserted between the stages]

An Example to Clarify Pipelining
Because many instructions execute simultaneously in a single datapath in every clock cycle, pipelining can be difficult to understand. The following code will be examined:
    lw  $10, 20($1)
    sub $11, $2, $3
[Figure: multiple-clock-cycle pipeline diagram for the two instructions, CC 1 to CC 7]

Clock 1
lw $10, 20($1): Instruction fetch

Clock 2
sub $11, $2, $3: Instruction fetch
lw $10, 20($1): Instruction decode

Clock 3
sub $11, $2, $3: Instruction decode
lw $10, 20($1): Execution

Clock 4
sub $11, $2, $3: Execution
lw $10, 20($1): Memory

Clock 5
sub $11, $2, $3: Memory
lw $10, 20($1): Write back

Pipelined Datapath with Control Signals
[Figure: pipelined datapath annotated with the control signals PCSrc, RegWr, ALUSrc, ALUop, RegDst, Branch, MemRd, MemWr, and MemtoReg]

An Example to Clarify Pipelined Control
Let's look at what's happening in the pipeline for the following program.
    lw  $10, 20($1)
    sub $11, $2, $3
    and $12, $4, $5
    or  $13, $6, $7
    add $14, $8, $9
The code does not have any data, control, or structural hazards.

Clock 1
IF: lw $10, 20($1)

Clock 2
IF: sub $11, $2, $3
ID: lw $10, 20($1)

Clock 3
IF: and $12, $4, $5
ID: sub $11, $2, $3
EX: lw $10, 20($1)

Clock 4
IF: or $13, $6, $7
ID: and $12, $4, $5
EX: sub $11, $2, $3
MEM: lw $10, 20($1)

Clock 5
IF: add $14, $8, $9
ID: or $13, $6, $7
EX: and $12, $4, $5
MEM: sub $11, $2, $3
WB: lw $10, 20($1)

Single Memory is a Structural Hazard
[Figure: pipeline diagram over time (clock cycles); instruction order Load, Instr 1, Instr 2, Instr 3, Instr 4]
Detection is easy in this case! (right half highlight means read, left half write)

Solution 3: Stall
[Figure: pipeline diagram over time (clock cycles); instruction order Load, Instr 1, Instr 2, nop, Instr 3, Instr 4; the nop appears as a row of bubbles that delays the later instructions]

Control Hazard Solutions
Predict: guess one direction, then back up if wrong
Predict not taken
[Figure: pipeline diagram over time (clock cycles) for Beq followed by Load]
Impact: 1 clock cycle per branch instruction if right, 2 if wrong (right ~50% of time)
More dynamic scheme: history of 1 branch (~90%)
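The impact of prediction accuracy can be expressed as an expected cost per branch. This is a rough sketch, assuming a correctly predicted branch costs 1 cycle and a mispredicted one costs 2, with accuracies of about 50% for predict-not-taken and about 90% for a history-based scheme. The function name is hypothetical.

```python
def avg_branch_cycles(accuracy, right_cycles=1, wrong_cycles=2):
    """Expected clock cycles per branch for a given prediction accuracy (0.0 to 1.0)."""
    return accuracy * right_cycles + (1.0 - accuracy) * wrong_cycles


static_cost = avg_branch_cycles(0.5)   # predict not taken, right about half the time
dynamic_cost = avg_branch_cycles(0.9)  # history-based prediction
print(static_cost, dynamic_cost)
```

Under these assumptions the average cost drops from about 1.5 cycles per branch to about 1.1.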

Data Hazard on r1
Problem: r1 cannot be read by other instructions before it is written by the add.
    add r1, r2, r3
    sub r4, r1, r3
    and r6, r1, r7
    or  r8, r1, r9
    xor r10, r1, r11

Data Hazard on r1: Dependencies backwards in time are hazards
[Figure: pipeline diagram over time (clock cycles) with stages IF, ID/RF, EX, MEM, WB]
    add r1, r2, r3
    sub r4, r1, r3
    and r6, r1, r7
    or  r8, r1, r9
    xor r10, r1, r11

Data Hazard Solution: Forward result from one stage to another
[Figure: pipeline diagram over time (clock cycles) with stages IF, ID/RF, EX, MEM, WB]
    add r1, r2, r3
    sub r4, r1, r3
    and r6, r1, r7
    or  r8, r1, r9
    xor r10, r1, r11
The "or" instruction is OK if register read/write is defined properly.

Forwarding (or Bypassing): What about Loads?
Dependencies backwards in time are hazards.
[Figure: pipeline diagram over time (clock cycles) with stages IF, ID/RF, EX, MEM, WB]
    lw  r1, 0(r2)
    sub r4, r1, r3
Can't solve with forwarding: must delay/stall the instruction dependent on the load.

Data Hazards
The previous example shows us how independent instructions, which do not use the results calculated by prior instructions, are executed. This is not the case with real programs. Let's look at the following code sequence.
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
The last four instructions are all dependent on the register $2 of the first instruction. Assume that register $2 had the value of 10 before the subtract instruction and -20 afterwards.

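Data Hazards (Cont.)
                 CC1   CC2   CC3   CC4   CC5      CC6   CC7   CC8   CC9
Value of $2:     10    10    10    10    10/-20   -20   -20   -20   -20
[Figure: multiple-clock-cycle pipeline diagram for the sequence below]
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)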

Hazard Detection & Forwarding
It is possible to detect a data hazard and then forward the proper value to resolve the hazard. A hazard occurs when an instruction tries to read a register in its EX stage that an earlier instruction intends to write in its WB stage. This is the case between the sub and and instructions below:
    sub $2, $1, $3
    and $12, $2, $5
This hazard can be detected by simply checking:
    EX/MEM.RegisterRd = ID/EX.RegisterRs = $2

Hazard Detection & Forwarding (Cont.)
Another hazard is between the sub and or instructions:
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
This hazard can be detected by simply checking:
    MEM/WB.RegisterRd = ID/EX.RegisterRt = $2
There is no data hazard between the sub-add and sub-sw instructions.

Summary of Hazard Conditions
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
RegisterRd here actually refers to the destination field of an instruction: the rd field in R-type instructions and the rt field in I-type instructions. The mux in the EX stage chooses the correct one; therefore, the EX/MEM and MEM/WB pipeline registers store this information as an rd field (EX/MEM.RegisterRd and MEM/WB.RegisterRd).
Since some instructions (e.g., sw, beq) do not write to the register file, the above policy is inaccurate. Consider the following code sequence:
    sw $, ($5)
    add $3, $, $2
    EX/MEM.RegisterRd = ID/EX.RegisterRs    $ = +$5
This problem can be solved simply by checking the RegWr signal.

Summary of Hazard Conditions (Cont.)
Another problem: What happens if $0 is used as a destination register? A non-zero value should not be forwarded. Therefore, hazard detection should be the following:
EX hazard:
    if (EX/MEM.RegWr and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
    if (EX/MEM.RegWr and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10
MEM hazard:
    if (MEM/WB.RegWr and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
    if (MEM/WB.RegWr and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01

Summary of Hazard Conditions (Cont.)
Consider the following sequence:
    add $1, $1, $2
    add $1, $1, $3
    add $1, $1, $4
    ...
In this case, the result is forwarded from the MEM stage because the result in the MEM stage is more recent than the result in the WB stage. Thus, the control for the MEM hazard becomes:
MEM hazard:
    if (MEM/WB.RegWr and (MEM/WB.RegisterRd ≠ 0) and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRs) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
    if (MEM/WB.RegWr and (MEM/WB.RegisterRd ≠ 0) and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRt) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01
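The forwarding conditions can be read as a small priority decision for each ALU operand. The sketch below is one possible software model, not the hardware itself. It assumes the usual textbook encodings (10 = forward from EX/MEM, 01 = forward from MEM/WB, 00 = use the register file value), and it gives EX/MEM priority by testing the full EX hazard first, which is slightly stricter than the EX/MEM.RegisterRd ≠ ID/EX.RegisterRs term above. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PipelineReg:
    """Fields of a pipeline register that the forwarding unit looks at."""
    reg_write: bool  # RegWr control bit carried with the instruction
    rd: int          # destination register number (RegisterRd)


def forward_select(ex_mem, mem_wb, src_reg):
    """Return the forwarding mux setting for one ALU source (ID/EX Rs or Rt)."""
    # EX hazard: the newest result lives in EX/MEM, so it is checked first.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src_reg:
        return 0b10
    # MEM hazard: only reached when there is no EX hazard on the same register.
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src_reg:
        return 0b01
    return 0b00  # no hazard: read the value from the register file


# Three back-to-back adds that all write and read $1: the third add must
# take the value from EX/MEM (second add), not MEM/WB (first add).
forward_a = forward_select(PipelineReg(True, 1), PipelineReg(True, 1), src_reg=1)
print(format(forward_a, "02b"))
```

Calling the function once with ID/EX.RegisterRs and once with ID/EX.RegisterRt yields ForwardA and ForwardB. The reg_write check covers instructions such as sw that do not write a register, and the rd != 0 check covers $0.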

Summary of Hazard Conditions (Cont.)

The Pipelined Datapath with Forwarding
[Figure: pipelined datapath with a forwarding unit; its inputs are the Rs and Rt fields carried from IF/ID.RegisterRs and IF/ID.RegisterRt, plus EX/MEM.RegisterRd and MEM/WB.RegisterRd]

Signed Immediate Enhancement

Data Hazards and Stalls
[Figure: program execution order (in instructions) against time (in clock cycles), CC1 to CC9]
    lw  $2, 20($1)
    and $4, $2, $5
    or  $8, $2, $6
    add $9, $4, $2
    slt $1, $6, $7
Forwarding does not solve the problem. We need a hazard detection unit. The and instruction needs to be stalled one cycle.

Hazard Detection
The control for the hazard detection unit is:
    if (ID/EX.MemRd and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt))) stall the pipeline
The ID/EX.MemRd term checks if the instruction is a load.
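The stall condition translates directly into a boolean function. This is a minimal sketch with hypothetical parameter names that mirror the pipeline register fields, not a description of any particular hardware implementation.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return True when the instruction in ID needs a value that a load in EX
    has not yet read from memory (a load-use hazard)."""
    return bool(id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))


# A load into $2 is in EX while "and $4, $2, $5" is in ID: stall one cycle.
print(must_stall(True, id_ex_rt=2, if_id_rs=2, if_id_rt=5))
# The same register match after a non-load instruction is left to forwarding.
print(must_stall(False, id_ex_rt=2, if_id_rs=2, if_id_rt=5))
```

The load's destination is its rt field, which is why the comparison uses ID/EX.RegisterRt rather than RegisterRd.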

Data Hazards and Stalls (Cont.)
[Figure: program execution order (in instructions) against time (in clock cycles), CC1 to CC10, with a bubble inserted after the lw]
    lw  $2, 20($1)
    and $4, $2, $5
    or  $8, $2, $6
    add $9, $4, $2
    slt $1, $6, $7
The and and or instructions repeat in CC4 what they did in CC3.

The Pipelined Datapath with Forwarding and Hazard Detection Unit
[Figure: pipelined datapath with a forwarding unit and a hazard detection unit; the hazard detection unit takes ID/EX.MemRead, ID/EX Rt, IF/ID.RegisterRs, and IF/ID.RegisterRt as inputs and drives PCWrite and IF/IDWrite]

Branch Hazards
[Figure: program execution order (in instructions) against time (in clock cycles), CC1 to CC9]
    Addr.  Instruction
    40     beq $1, $3, 7
    44     and $12, $2, $5
    48     or  $13, $6, $2
    52     add $14, $2, $2
    72     lw  $4, 50($7)
Actually, the number of instructions that need to be flushed can be reduced from 3 to 1 instruction (shown in the following slide) when the direction of the branch is mispredicted.
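The target address in the branch example follows from the MIPS convention that a branch offset counts words relative to the instruction after the branch. A short check, taking the beq to be at address 40 with an offset of 7 (the function name is hypothetical):

```python
def branch_target(branch_addr, word_offset):
    """MIPS branch target: (address of branch + 4) + offset shifted left by 2."""
    return branch_addr + 4 + (word_offset << 2)


print(branch_target(40, 7))  # address of the instruction executed when the branch is taken
```

The result matches the address of the lw that follows the fall-through instructions in the listing.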

Datapath for Branch (including HW to flush the pipeline)
[Figure: pipelined datapath with the branch adder and register comparator (=) moved into the ID stage, an IF.Flush control signal, the hazard detection unit, and the forwarding unit]

Question 1

Question 2

Answer 2

Question 3

Answer 3