Solutions for Chapter 6 Exercises

6.1 a. Shortening the ALU operation will not affect the speedup obtained from pipelining. It would not affect the clock cycle.
    b. If the ALU operation takes 25% more time, it becomes the bottleneck in the pipeline. The clock cycle needs to be 250 ps. The speedup would be 20% less.

6.2 a. It takes 100 ps * 10^6 instructions = 100 microseconds to execute on a nonpipelined processor (ignoring start and end transients in the pipeline).
    b. A perfect 20-stage pipeline would speed up the execution by 20 times.
    c. Pipeline overhead impacts both latency and throughput.

6.3 See the following figure (program execution order, in instructions, runs top to bottom; time runs left to right, one pipeline stage per clock cycle):

add $3, $4, $6  IF     ID     EX     MEM    WB
sub $5, $3, $2         IF     ID     EX     MEM    WB
lw  $7, ($5)                  IF     ID     EX     MEM    WB
bubble                               bubble bubble bubble bubble bubble
add $8, $7, $2                              IF     ID     EX     MEM    WB

6.4 There is a dependency through $3 between the first instruction and each subsequent instruction. There is a dependency through $6 between the lw instruction and the last instruction. For a five-stage pipeline as shown in Figure 6.7, the dependencies between the first instruction and each subsequent instruction can be resolved by using forwarding. The dependency between the load and the last add instruction cannot be resolved by using forwarding.
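The arithmetic behind Exercises 6.1 b and 6.2 a can be checked with a short Python sketch. The 200 ps baseline stage time is an assumption (it is the value for which a 25% slower ALU yields a 250 ps clock), and the function names are hypothetical.

```python
def slowed_stage(base_ps, extra_fraction):
    """Return (new clock cycle in ps, fractional loss in speedup) when the
    slowest pipeline stage takes extra_fraction more time than base_ps."""
    new_cycle = base_ps * (1 + extra_fraction)
    # Throughput scales with 1/cycle, so the speedup shrinks by this fraction.
    loss = 1 - base_ps / new_cycle
    return new_cycle, loss


def nonpipelined_time_us(latency_ps, instructions):
    """Total time in microseconds when instructions run back to back."""
    return latency_ps * instructions / 1e6  # 1 microsecond = 10^6 ps


# Assumed baseline: a 200 ps ALU stage that becomes 25% slower.
cycle, loss = slowed_stage(200, 0.25)
print(cycle, round(loss, 2))
print(nonpipelined_time_us(100, 10**6))
```

The loss is computed relative to the original clock, which is why a 25% longer cycle costs 20% of the speedup rather than 25%.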

6.6 Any part of the following figure not marked as active is inactive.

[Figure: pipelined datapath with control, Stage 1. Labeled elements: PCSrc, IF/ID, ID/EX, EX/MEM, MEM/WB, PC, Registers, Shift left 2, Sign extend, ALU, ALU control, Zero, Branch, RegWrite, ALUSrc, ALUOp, RegDst, MemtoReg.]

[Figure: the same datapath, Stage 2.]

[Figure: the same datapath, Stage 3.]

[Figure: the same datapath, Stage 4.]

Since this is an sw instruction, there is no work done in the WB stage.

6.12 No solution provided.

6.13 No solution provided.

6.14 No solution provided.

6.17 At the end of the first cycle, instruction 1 is fetched. At the end of the second cycle, instruction 1 reads registers. At the end of the third cycle, instruction 2 reads registers. At the end of the fourth cycle, instruction 3 reads registers. At the end of the fifth cycle, instruction 4 reads registers, and instruction 1 writes registers. Therefore, at the end of the fifth cycle of execution, registers $6 and $1 are being read and register $2 will be written.

6.18 The forwarding unit is checking whether it needs to forward. It looks at the instructions in the fourth and fifth stages and checks whether they intend to write to the register file and whether the register written is being used as an ALU input. Thus, it is comparing 3 = 4? 3 = 2? 7 = 4? 7 = 2?

6.19 The hazard detection unit is checking whether the instruction in the ALU stage is an lw instruction and whether the instruction in the ID stage is reading the register that the lw will be writing. If it is, it needs to stall. If there is an lw instruction, it checks whether the destination is register 6 or 1 (the registers being read).

6.21 a. There will be a bubble of 1 cycle between an lw and the dependent add since the load value is available after the MEM stage. There is no bubble between an add and the dependent lw since the add result is available after the EX stage and it can be forwarded to the EX stage for the dependent lw. Each lw/add pair therefore takes 3 cycles, so CPI = 3 cycles / 2 instructions = 1.5.
     b. Without forwarding, a value being written into a register can be read in that same cycle at the earliest. As a result, there will be a bubble of 2 cycles between an lw and the dependent add since the load value is written to the register after the MEM stage. Similarly, there will be a bubble of 2 cycles between an add and the dependent lw. Therefore, CPI = 3.
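The two CPI values in the lw/add exercise follow from counting one issue cycle plus the bubbles behind each instruction of the repeating pattern. The following Python sketch shows that count; the function name and the dictionary layout are hypothetical, and pipeline fill time is ignored.

```python
def steady_state_cpi(bubbles_after):
    """CPI of a repeating instruction pattern, ignoring pipeline fill.

    bubbles_after maps each instruction of the pattern to the number of
    bubble cycles inserted between it and its dependent successor.
    """
    cycles = sum(1 + bubbles for bubbles in bubbles_after.values())
    return cycles / len(bubbles_after)


# With forwarding: one bubble after each lw, none after each add.
with_forwarding = steady_state_cpi({"lw": 1, "add": 0})
# Without forwarding: two bubbles after every instruction.
without_forwarding = steady_state_cpi({"lw": 2, "add": 2})
print(with_forwarding, without_forwarding)
```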

6.22 It will take 8 cycles to execute this code, including a bubble of 1 cycle due to the dependency between the lw and sub instructions.

Program execution order (in instructions) against time (in clock cycles), without the bubble:

                  CC 1  CC 2  CC 3  CC 4  CC 5  CC 6  CC 7
lw  $4, ($2)      IM    Reg   ALU   DM    Reg
sub $6, $4, $3          IM    Reg   ALU   DM    Reg
add $2, $3, $5                IM    Reg   ALU   DM    Reg

With the bubble:

                  CC 1  CC 2  CC 3  CC 4  CC 5  CC 6  CC 7  CC 8
lw  $4, ($2)      IM    Reg   ALU   DM    Reg
sub $6, $4, $3          IM    Reg   Reg   ALU   DM    Reg
add $2, $3, $5                IM    IM    Reg   ALU   DM    Reg
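The 8-cycle total in Exercise 6.22 can be reproduced by counting load-use hazards. This is a minimal Python sketch that assumes a five-stage pipeline with forwarding, where only a load followed immediately by a consumer of its result costs one bubble; the tuple layout and function name are hypothetical.

```python
def count_cycles(program, num_stages=5):
    """Cycles to run program on a pipeline with forwarding.

    program is a list of (opcode, destination, sources) tuples. A bubble is
    charged only when an lw is immediately followed by a reader of its result.
    """
    bubbles = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            bubbles += 1
    # The first instruction needs num_stages cycles; each later one adds one.
    return num_stages + len(program) - 1 + bubbles


program = [
    ("lw", "$4", ("$2",)),
    ("sub", "$6", ("$4", "$3")),
    ("add", "$2", ("$3", "$5")),
]
print(count_cycles(program))
```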

6.23

Input               Number of bits   Usage
ID/EX.RegisterRs    5                operand reg number, compare to see if match
ID/EX.RegisterRt    5                operand reg number, compare to see if match
EX/MEM.RegisterRd   5                destination reg number, compare to see if match
EX/MEM.RegWrite     1                TRUE if writes to the destination reg
MEM/WB.RegisterRd   5                destination reg number, compare to see if match
MEM/WB.RegWrite     1                TRUE if writes to the destination reg

Output              Number of bits   Usage
ForwardA            2                forwarding signal
ForwardB            2                forwarding signal

6.29 No solution provided.

6.30 The performance for the single-cycle design will not change since the clock cycle remains the same. For the multicycle design, the number of cycles for each instruction class becomes the following: loads: 7, stores: 6, ALU instructions: 5, branches: 4, jumps: 4. CPI = 0.25 * 7 + 0.10 * 6 + 0.52 * 5 + 0.11 * 4 + 0.02 * 4 = 5.47. The cycle time for the multicycle design is now 100 ps. The average instruction time becomes 5.47 * 100 = 547 ps. Now the multicycle design performs better than the single-cycle design.

6.33 See the following figure.

Pipeline stages: IF1 IF2 ID EX MEM1 MEM2 WB

When a value is defined by lw:
  used in i1 => 2-cycle stall
  used in i2 => 1-cycle stall
  used in i3 => forward
When a value is defined by an R-type instruction:
  used in i1 => forward
  used in i2 => forward
  used in i3 => forward
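The multicycle CPI in the exercise above is a weighted average over the instruction mix, which a few lines of Python can confirm. The dictionary keys and the function name are hypothetical; the 600 ps single-cycle time is taken from the 600/... ratio used in Exercise 6.34.

```python
# Instruction mix and cycles per class, as listed in the solution.
mix = {"load": 0.25, "store": 0.10, "alu": 0.52, "branch": 0.11, "jump": 0.02}
cycles = {"load": 7, "store": 6, "alu": 5, "branch": 4, "jump": 4}


def average_cpi(mix, cycles):
    """Weighted average of cycles per instruction class."""
    return sum(mix[name] * cycles[name] for name in mix)


cpi = average_cpi(mix, cycles)
avg_time_ps = cpi * 100          # 100 ps multicycle clock
print(round(cpi, 2), round(avg_time_ps))
print(avg_time_ps < 600)         # compared with the single-cycle design
```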

6.34 Branches take 1 cycle when predicted correctly, 3 cycles when not (including one more access cycle). So the average number of clock cycles per branch is 0.75 * 1 + 0.25 * 3 = 1.5.

For loads, if the instruction immediately following it is dependent on the load, the load takes 3 cycles. If the next instruction is not dependent on the load but the second following instruction is dependent on the load, the load takes 2 cycles. If neither of the two following instructions is dependent on the load, the load takes 1 cycle.

The probability that the next instruction is dependent on the load is 0.5. The probability that the next instruction is not dependent on the load, but the second following instruction is dependent, is 0.5 * 0.25 = 0.125. The probability that neither of the two following instructions is dependent on the load is 0.375.

Thus the effective CPI for loads is 0.5 * 3 + 0.125 * 2 + 0.375 * 1 = 2.125.

Using the data from the example on page 425, the average CPI is 0.25 * 2.125 + 0.10 * 1 + 0.52 * 1 + 0.11 * 1.5 + 0.02 * 3 = 1.38 (rounded).

Average instruction time is 1.38 * 100 ps = 138 ps. The relative performance of the restructured pipeline to the single-cycle design is 600/138 = 4.35.

6.35 The opportunity for both forwarding and hazards that cannot be resolved by forwarding exists when a branch is dependent on one or more results that are still in the pipeline. Following is an example:

lw $, $2()
add $, $,
beq $, $2, 7

6.36 Prediction accuracy = 100% * PredictRight/TotalBranches

a. Branch 1: prediction: T-T-T, right: 3, wrong: 0
   Branch 2: prediction: T-T-T-T, right: 0, wrong: 4
   Branch 3: prediction: T-T-T-T-T-T, right: 3, wrong: 3
   Branch 4: prediction: T-T-T-T-T, right: 4, wrong: 1
   Branch 5: prediction: T-T-T-T-T-T-T, right: 5, wrong: 2
   Total: right: 15, wrong: 10
   Accuracy = 100% * 15/25 = 60%
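The probabilities and effective CPIs in Exercise 6.34 can be recomputed step by step. The Python sketch below uses only the figures stated in the solution; the variable names are hypothetical.

```python
# Branches: 1 cycle when predicted correctly (75%), 3 cycles otherwise.
branch_cpi = 0.75 * 1 + 0.25 * 3

# Loads: cost depends on which of the next two instructions uses the result.
p_next = 0.5                         # next instruction depends on the load
p_second = (1 - p_next) * 0.25       # only the second following one depends
p_neither = 1 - p_next - p_second    # neither depends
load_cpi = p_next * 3 + p_second * 2 + p_neither * 1

# Weighted average over the instruction mix (loads, stores, ALU, branches, jumps).
average_cpi = (0.25 * load_cpi + 0.10 * 1 + 0.52 * 1
               + 0.11 * branch_cpi + 0.02 * 3)
print(branch_cpi, load_cpi, round(average_cpi, 2))
```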

b. Branch 1: prediction: N-N-N, right: 0, wrong: 3
   Branch 2: prediction: N-N-N-N, right: 4, wrong: 0
   Branch 3: prediction: N-N-N-N-N-N, right: 3, wrong: 3
   Branch 4: prediction: N-N-N-N-N, right: 1, wrong: 4
   Branch 5: prediction: N-N-N-N-N-N-N, right: 2, wrong: 5
   Total: right: 10, wrong: 15
   Accuracy = 100% * 10/25 = 40%

c. Branch 1: prediction: T-T-T, right: 3, wrong: 0
   Branch 2: prediction: T-N-N-N, right: 3, wrong: 1
   Branch 3: prediction: T-T-N-T-N-T, right: 1, wrong: 5
   Branch 4: prediction: T-T-T-T-N, right: 3, wrong: 2
   Branch 5: prediction: T-T-T-N-T-T-N, right: 3, wrong: 4
   Total: right: 13, wrong: 12
   Accuracy = 100% * 13/25 = 52%

d. Branch 1: prediction: T-T-T, right: 3, wrong: 0
   Branch 2: prediction: T-N-N-N, right: 3, wrong: 1
   Branch 3: prediction: T-T-T-T-T-T, right: 3, wrong: 3
   Branch 4: prediction: T-T-T-T-T, right: 4, wrong: 1
   Branch 5: prediction: T-T-T-T-T-T-T, right: 5, wrong: 2
   Total: right: 18, wrong: 7
   Accuracy = 100% * 18/25 = 72%

6.37 No solution provided.

6.38 No solution provided.

6.39 Rearrange the instruction sequence such that the instruction reading a value produced by a load instruction is right after the load. In this way, there will be a stall after the load since the load value is not available until after its MEM stage.

lw  $2, ($6)
add $4, $2, $3
lw  $3, 2($7)
add $6, $3, $5
sub $8, $4, $6
lw  $7, 3($8)
beq $7, $8, Loop
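The totals and accuracies for parts a through d of Exercise 6.36 follow from summing the per-branch right and wrong counts. A small Python sketch of that tally, with hypothetical names and the (right, wrong) pairs copied from the solution:

```python
# (right, wrong) per branch, Branch 1 through Branch 5, for each part.
counts = {
    "a": [(3, 0), (0, 4), (3, 3), (4, 1), (5, 2)],
    "b": [(0, 3), (4, 0), (3, 3), (1, 4), (2, 5)],
    "c": [(3, 0), (3, 1), (1, 5), (3, 2), (3, 4)],
    "d": [(3, 0), (3, 1), (3, 3), (4, 1), (5, 2)],
}


def accuracy(per_branch):
    """Return (total right, total wrong, accuracy in percent)."""
    right = sum(r for r, _ in per_branch)
    wrong = sum(w for _, w in per_branch)
    return right, wrong, 100 * right / (right + wrong)


for part, per_branch in counts.items():
    print(part, accuracy(per_branch))
```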

6.40 Yes. When it is determined that the branch is taken (in WB), the pipeline will be flushed. At the same time, the lw instruction will stall the pipeline since the load value is not available for add. Both flush and stall will zero the control signals. The flush should take priority since the lw stall should not have occurred: both instructions are on the wrong path. One solution is to add the flush pipeline signal to the Hazard Detection Unit. If the pipeline needs to be flushed, no stall will take place.

6.41 The store instruction can read the value from the register if it is produced at least 3 cycles earlier. Therefore, we only need to consider forwarding the results produced by the two instructions right before the store. When the store is in the EX stage, the instruction 2 cycles ahead is in the WB stage. That instruction can be either an lw or an ALU instruction.

    assign EXMEMrt = EXMEMIR[20:16];
    assign bypassvfromwb = (IDEXop == SW) & (IDEXrt != 0) &
        ( ((MEMWBop == LW) & (IDEXrt == MEMWBrt)) |
          ((MEMWBop == ALUop) & (IDEXrt == MEMWBrd)) );

This signal controls the store value that goes into the EX/MEM register. The value produced by the instruction 1 cycle ahead of the store can be bypassed from the MEM/WB register. Though the value from an ALU instruction is available 1 cycle earlier, we need to wait for the load instruction anyway.

    assign bypassvfromwb2 = (EXMEMop == SW) & (EXMEMrt != 0) & (!bypassvfromwb) &
        ( ((MEMWBop == LW) & (EXMEMrt == MEMWBrt)) |
          ((MEMWBop == ALUop) & (EXMEMrt == MEMWBrd)) );

This signal controls the store value that goes into the data memory and the MEM/WB register.

6.42

    assign bypassafrommem = (IDEXrs != 0) &
        ( ((EXMEMop == LW) & (IDEXrs == EXMEMrt)) |
          ((EXMEMop == ALUop) & (IDEXrs == EXMEMrd)) );
    assign bypassafromwb = (IDEXrs != 0) & (!bypassafrommem) &
        ( ((MEMWBop == LW) & (IDEXrs == MEMWBrt)) |
          ((MEMWBop == ALUop) & (IDEXrs == MEMWBrd)) );
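The priority between the two bypass sources for ALU operand A in Exercise 6.42 can be modeled in software. The Python sketch below mirrors the structure of the two assign statements; the dictionary layout for the pipeline registers, the opcode strings, and the function name are all hypothetical.

```python
LW, ALUOP, SW = "lw", "alu", "sw"


def bypass_a(idex_rs, exmem, memwb):
    """Return (from_mem, from_wb) bypass selects for ALU operand A.

    exmem and memwb are dicts with keys "op", "rt", "rd" describing the
    instruction held in the EX/MEM and MEM/WB pipeline registers.
    """
    def written_register(stage):
        # A load writes rt, an ALU instruction writes rd, others write nothing.
        if stage["op"] == LW:
            return stage["rt"]
        if stage["op"] == ALUOP:
            return stage["rd"]
        return None

    # Register 0 is never forwarded.
    from_mem = idex_rs != 0 and written_register(exmem) == idex_rs
    # The younger result in EX/MEM takes priority over the one in MEM/WB.
    from_wb = (idex_rs != 0 and not from_mem
               and written_register(memwb) == idex_rs)
    return from_mem, from_wb


older = {"op": LW, "rt": 3, "rd": 0}
newer = {"op": ALUOP, "rt": 0, "rd": 3}
print(bypass_a(3, newer, older))
```

When both stages write the register being read, only the first select is raised, which matches the `!bypassafrommem` term in the second assign.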

6.43 The branch cannot be resolved in the ID stage if one branch operand is being calculated in the EX stage (assume there is no dumb branch having two identical operands; if so, it is a jump), or is still to be loaded (in EX and MEM).

    assign branchstallinid = (IFIDop == BEQ) &
        ( ((IDEXop == ALUop) & ((IFIDrs == IDEXrd) | (IFIDrt == IDEXrd))) |    // ALU in EX
          ((IDEXop == LW) & ((IFIDrs == IDEXrt) | (IFIDrt == IDEXrt))) |       // lw in EX
          ((EXMEMop == LW) & ((IFIDrs == EXMEMrt) | (IFIDrt == EXMEMrt))) );   // lw in MEM

Therefore, we can forward the result from an ALU instruction in the MEM stage, and an ALU or lw in the WB stage.

    assign bypassida = (EXMEMop == ALUop) & (IFIDrs == EXMEMrd);
    assign bypassidb = (EXMEMop == ALUop) & (IFIDrt == EXMEMrd);

Thus, the operands of the branch become the following:

    assign IDAin = bypassida ? EXMEMALUout : Regs[IFIDrs];
    assign IDBin = bypassidb ? EXMEMALUout : Regs[IFIDrt];

And the branch outcome becomes:

    assign takebranch = (IFIDop == BEQ) & (IDAin == IDBin);

6.44 For a delayed branch, the instruction following the branch will always be executed. We only need to update the PC after fetching this instruction.

    if (~stall) begin
        IFIDIR <= IMemory[PC];
        PC <= PC + 4;
    end
    // sign-extend the 16-bit branch offset held in IFIDIR[15:0]
    if (takebranch) PC <= PC + {{16{IFIDIR[15]}}, IFIDIR[15:0]} + 4;
    end

6.45

    module PredictPC(currentPC, nextPC, miss, update, destination);
        input  [31:0] currentPC, destination;
        input         update;
        output reg [31:0] nextPC;
        output reg        miss;
        integer index, tag;
        // 512 entries, direct-mapped
        reg [31:0] brtargetbuf[0:511], brtargetbuftag[0:511];

        always @(currentPC or update or destination) begin
            index = (currentPC >> 2) & 511;
            tag   = currentPC >> (2 + 9);
            if (update) begin                               // update the destination and tag
                brtargetbuf[index]    = destination;
                brtargetbuftag[index] = tag;
            end
            else if (tag == brtargetbuftag[index]) begin    // a hit!
                nextPC = brtargetbuf[index];
                miss   = 1'b0;
            end
            else miss = 1'b1;
        end
    endmodule

6.46 No solution provided.

6.47

Loop: lw   $2, ($)
      lw   $5, 4($)
      sub  $4, $2, $3
      sub  $6, $5, $3
      sw   $4, ($)
      sw   $6, 4($)
      addi $, $, 8
      bne  $, $3, Loop
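The index and tag arithmetic of the branch-target buffer in Exercise 6.45 can be exercised in software. The following Python class is a behavioral sketch only, not the hardware description: the class and method names are hypothetical, and an empty entry is represented with a tag of None, which the Verilog module does not model.

```python
class BranchTargetBuffer:
    """Direct-mapped branch-target buffer with 512 entries."""

    ENTRIES = 512
    INDEX_BITS = 9  # 2**9 == 512

    def __init__(self):
        self.targets = [0] * self.ENTRIES
        self.tags = [None] * self.ENTRIES  # None marks an unused entry

    def _split(self, pc):
        # Drop the two byte-offset bits, then take 9 index bits; the rest is tag.
        index = (pc >> 2) & (self.ENTRIES - 1)
        tag = pc >> (2 + self.INDEX_BITS)
        return index, tag

    def update(self, pc, destination):
        """Record the branch destination and tag for pc."""
        index, tag = self._split(pc)
        self.targets[index] = destination
        self.tags[index] = tag

    def predict(self, pc):
        """Return (next_pc, miss); next_pc is None on a miss."""
        index, tag = self._split(pc)
        if self.tags[index] == tag:
            return self.targets[index], False
        return None, True


btb = BranchTargetBuffer()
first_lookup = btb.predict(0x400)           # nothing stored yet
btb.update(0x400, 0x1000)
second_lookup = btb.predict(0x400)          # same index, same tag
alias_lookup = btb.predict(0x400 + 2048)    # same index, different tag
```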

6.48 The code can be unrolled twice and rescheduled. The leftover part of the code can be handled at the end. We will need to test at the beginning to see if it has reached the leftover part (other solutions are possible).

Loop:     addi $, $, 2
          bgt  $, $3, Leftover
          lw   $2, 2($)
          lw   $5, 8($)
          lw   $7, 4($)
          sub  $4, $2, $3
          sub  $6, $5, $3
          sub  $8, $7, $3
          sw   $4, 2($)
          sw   $6, 8($)
          sw   $8, 4($)
          bne  $, $3, Loop
          jump Finish
Leftover: lw   $2, 2($)
          sub  $4, $2, $3
          sw   $4, 2($)
          addi $, $, 8
          beq  $, $3, Finish
          lw   $5, 4($)
          sub  $6, $5, $3
          sw   $6, 4($)
Finish:   ...

6.49

          ALU or branch           lw/sw
Loop:     addi $2, $,             lw $2, ($)
                                  lw $5, 4($)
          sub $4, $2, $3          lw $7, 8($)
          sub $6, $5, $3          lw $8, 2($)
          sub $, $7, $3           sw $4, ($)
          sub $2, $8, $3          sw $6, 4($)
          addi $, $, 6            sw $, 8($2)
          bne $, $3, Loop         sw $2, 2($2)

6.50 The pipe stages added for wire delays do not produce any useful work. With imperfect pipelining due to pipeline overhead, the overhead associated with these stages reduces throughput. These extra stages increase the control logic complexity since more pipe stages need to be covered. When considering penalties for branch mispredictions and the like, adding more pipe stages increases both the penalties and the execution latency.

6.51 No solution provided.
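The unroll-and-leftover structure described in Exercise 6.48 can be illustrated at a higher level. The Python sketch below assumes the loop subtracts a constant from every array element (the lw, sub, sw pattern) and processes three elements per pass; the function name and the test data are hypothetical, and the leftover test here sits in the loop condition rather than mirroring the MIPS branch layout.

```python
def subtract_unrolled(values, constant):
    """Subtract constant from each element, three elements per iteration."""
    out = list(values)
    n = len(out)
    i = 0
    # Main body: three loads, three subtracts, three stores per pass.
    while i + 3 <= n:
        a, b, c = out[i], out[i + 1], out[i + 2]
        out[i], out[i + 1], out[i + 2] = a - constant, b - constant, c - constant
        i += 3
    # Leftover part: at most two elements remain.
    while i < n:
        out[i] -= constant
        i += 1
    return out


print(subtract_unrolled([10, 20, 30, 40, 50], 1))
```

Grouping the loads ahead of the subtracts is what gives a scheduler room to hide the load-use delay, at the cost of the extra leftover code.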