ECE Exam II - Solutions November 8 th, 2017

ECE 3056 Exam II - Solutions, November 8th, 2017

1. (15 pts) To the base pipeline we add data forwarding to EX, data hazard detection and stall generation, and branches implemented in MEM and predicted not taken with flushing. For a load-to-use hazard, a check generates a stall = 1 signal when the hazard is detected.

a. (5 pts) Write the Boolean expression to generate this stall signal. Use the signal notation from the figures, defining any other variables if you feel you need them.

If (ID/EX.MemRead = 1) & ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt)) Stall = 1

An equivalent solution is possible by checking between EX and MEM and inserting the stall in the MEM stage.

b. (15 pts) How should the datapath be modified to use this stall signal to ensure correct implementation of stalls for the load-to-use dependency?

The following specific actions are necessary when performing the check between ID and EX:

- Stall the IF/ID register and the PC. This can be done by performing the logical AND of the stall signal and the clock at the PC and the IF/ID register.
- The control signals generated in ID should be zero, to create a stall cycle in EX at the next clock.

If the check is performed between EX and MEM, equivalent actions have to be performed, including stalling the EX stage.
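The load-to-use check in part (a) can be sketched as a small Python function. This is only an illustration of the Boolean expression above: the function name and argument names are hypothetical, and register fields are passed as plain integers rather than read from pipeline registers.

```python
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return 1 when the instruction in ID reads the register that a
    load currently in EX will write (hypothetical helper, not from the exam)."""
    # Stall = MemRead & ((ID/EX.Rt == IF/ID.Rs) or (ID/EX.Rt == IF/ID.Rt))
    depends = id_ex_rt in (if_id_rs, if_id_rt)
    return int(bool(id_ex_mem_read) and depends)


# lw $6, -4($3) in EX and add $7, $6, $7 in ID: the check fires.
print(load_use_stall(1, 6, 6, 7))
# Same registers, but the instruction in EX is not a load: no stall.
print(load_use_stall(0, 6, 6, 7))
```

When the function returns 1, the actions in part (b) apply: hold the PC and IF/ID, and zero the control signals entering ID/EX.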

2. (20 pts) To the base pipeline we add data forwarding to EX with data hazard detection (load-to-use) with stall generation. Now imagine that the implementation of the branch has been moved to ID and predicted not taken with flushing, but there is no forwarding to ID. Consider the following code sequence at the stages indicated in cycle i.

sub $8, $12, $11     -- WB
add $4, $8, $9       -- MEM
add $4, $4, $7       -- EX
beq $4, $6, loop     -- ID
subi $2, $2, 4       -- IF

This code will not execute correctly on this pipeline.

a) (5 pts) Show what the correct state of the pipeline should be at cycle i+1, i.e., what should be in each stage.

IF     subi $2, $2, 4
ID     beq $4, $6, loop
EX     <stall>
MEM    add $4, $4, $7
WB     add $4, $8, $9

b) (15 pts) Describe a general hardware solution to ensure correct execution when there are data hazards on branches. Clearly define i) the checks to be made, and ii) how they are implemented, i.e., what are the main functional blocks. It is probably easier to show additions to one of the figures.

Check for hazards and insert the correct number of stalls. Hazards can occur when beq is in ID and rformat or lw instructions are in EX or MEM. Stalls hold the branch in ID until the producer instruction reaches WB, since reads and writes of the register file can happen in the same cycle.

Check dependencies between EX and ID and insert a stall in EX if necessary, for both rformat and lw dependencies. The checks compare source and destination registers and check the instruction types, i.e., beq in ID and rformat or lw in EX.

If ((ID/EX.ALUOp = 10) & (IF/ID.Opcode = beq)) & ((ID/EX.RegisterRd = IF/ID.RegisterRs) or (ID/EX.RegisterRd = IF/ID.RegisterRt)) Stall = 1;

If (ID/EX.MemRead = 1) & ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt)) Stall = 1; (The question notes that this check is already present.)

Similar checks are performed between MEM and ID and may insert a stall in EX. Both the EX and MEM checks could be positive concurrently, so generate the logical OR of the outcomes of the checks in each stage. There are also alternative combinations of variables for these checks, but they look very similar. These checks will insert 1 or 2 stalls.

Alternatively, one could add forwarding to ID from MEM and add checks between EX and ID. However, a check between MEM and ID is still needed for lw.
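As a rough sketch of the combined check, the Python below ORs an EX-to-ID check with a MEM-to-ID check for a branch sitting in ID. The data representation is an assumption made for illustration: each producer stage is described by a hypothetical `(kind, dest)` tuple instead of the actual ALUOp, MemRead, and register-field signals.

```python
def branch_stall(branch_in_id, rs, rt, ex_producer, mem_producer):
    """Return 1 if a branch in ID must stall.

    ex_producer / mem_producer are None (bubble or non-writing instruction)
    or a (kind, dest) tuple with kind in {"rformat", "lw"}. This encoding is
    hypothetical; real hardware would test the pipeline-register fields.
    """
    def hazard(producer):
        if producer is None:
            return False
        kind, dest = producer
        return kind in ("rformat", "lw") and dest in (rs, rt)

    # Logical OR of the EX check and the MEM check, gated by "beq is in ID".
    return int(branch_in_id and (hazard(ex_producer) or hazard(mem_producer)))


# Cycle i: beq $4, $6 in ID, add $4, $4, $7 in EX, add $4, $8, $9 in MEM.
print(branch_stall(True, 4, 6, ("rformat", 4), ("rformat", 4)))
# Cycle i+1: bubble in EX, add $4, $4, $7 in MEM: still stalled.
print(branch_stall(True, 4, 6, None, ("rformat", 4)))
# Cycle i+2: the producer is in WB, so the branch can read $4 in ID.
print(branch_stall(True, 4, 6, None, None))
```

Under these assumptions the branch stalls for two cycles in the example sequence, which matches the "1 or 2 stalls" noted in the solution.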

3. (15 pts) To the base pipeline we add data forwarding to EX, data hazard detection with stall generation, and branches are implemented in MEM and predicted not taken with flushing. Consider the following code sequence in the pipeline at cycle 10. In the attached forwarding figure in the collateral, ForwardA is the top mux and ForwardB is the bottom mux. The inputs are numbered 00, 01, 10 from top to bottom.

add $1, $2, $3     -- WB
add $1, $1, $4     -- MEM
add $1, $1, $5     -- EX
add $1, $1, $6     -- ID
add $1, $1, $7     -- IF

a) (5 pts) What should be the value of ForwardA and ForwardB during cycles 10 and 11?

ForwardA @ cycle 10    10
ForwardB @ cycle 10    00
ForwardA @ cycle 11    10
ForwardB @ cycle 11    00

The MEM stage will always have the latest value for $1. Hence rs will be forwarded from MEM; the mux value from the figure is 10. The rt value is unique, has not been updated, and therefore will be whatever was read from the register file; hence the mux value (from the figure) will be 00.

b) (10 pts) Now consider that we pipeline some stages in the interest of increasing the clock rate. For each approach separately, what are the number of clock cycles for the branch and load-to-use penalties?

Approach                              Branch Penalty    Load-to-Use Penalty
i.   MEM is pipelined to 3 stages     3                 3
ii.  IF is pipelined to two stages    4                 1
iii. EX is pipelined to two stages    4                 2

Pipelining MEM to 3 stages will not add to the branch penalty, since a branch instruction does not need the result from MEM. The AND gate operates on the equality test result from EX in the first cycle of MEM. However, the load-to-use case needs the result from MEM, and hence its penalty increases by 2 cycles.

When IF is pipelined to 2 stages, the branch penalty increases by 1 cycle, but the load-to-use penalty does not increase since this test is between ID and EX.

If EX is pipelined to 2 stages, the branch penalty increases by 1 cycle, since the branch test has to complete and all preceding stages are flushed when the branch is taken. The load-to-use penalty increases by 1 cycle. The hazard test can be performed in the first cycle of EX. However, there has to be a 2-cycle gap between the lw and the dependent instruction: when lw is in WB, the dependent instruction should be in the first cycle of EX.
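The forwarding decision in part (a) can be illustrated with a short Python sketch. It assumes the usual textbook forwarding conditions: mux value 10 selects the result held in EX/MEM and 00 selects the register-file value, as stated in the solution, while the use of 01 for the MEM/WB value is an assumption about the figure. The function and argument names are hypothetical.

```python
def forward_select(src, ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd):
    """Return the mux select ("00", "01" or "10") for one ALU source register."""
    # Newest value first: the instruction currently in MEM (EX/MEM register).
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src:
        return "10"
    # Otherwise the instruction currently in WB (MEM/WB register); assumed 01.
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src:
        return "01"
    # No hazard: use the value read from the register file.
    return "00"


# Cycle 10: add $1, $1, $5 is in EX (rs = $1, rt = $5);
# add $1, $1, $4 is in MEM and add $1, $2, $3 is in WB, both writing $1.
forward_a = forward_select(1, True, 1, True, 1)
forward_b = forward_select(5, True, 1, True, 1)
print(forward_a, forward_b)
```

Because every instruction in the sequence writes $1, the EX/MEM match takes priority in each cycle, which is why ForwardA stays at 10 in cycle 11 as well.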

4. (20 pts) To the base pipeline we add data forwarding to EX, data hazard detection with stall generation, and branches are implemented in MEM and predicted not taken with flushing. Consider the execution of the code below. The first instruction is at 0x00400000.

1        add $7, $0, $0
2        addi $5, $0, 72
3        addi $3, $0, 0x1028
4  loop: lw $4, 0($3)
5        lw $9, -8($3)
6        lw $6, -4($3)
7        add $7, $6, $7
8        add $7, $9, $7
9        add $7, $4, $7
10       addi $3, $3, -12
11       addi $5, $5, -12
12       bne $5, $0, loop

a. (10 pts) Identify all data and control hazards, i.e., I → J where I and J are instruction numbers.

Data hazard from 6 → 7.
Control hazard on the bne.

The question is asking for hazards, not dependencies.

b. (10 pts) Fill in the values below for cycle 9 (start counting at 0!). Use the notation in the base pipeline in the collateral.

Pipeline Signal        Instruction in the stage    Value
IF/ID.PC4              add $7, $9, $7              0x00400020
ID/EX.ALUSrc           add $7, $6, $7              0
ID/EX.RegDst           add $7, $6, $7              1
EX/MEM.MEM.Address     <stall>                     0x1024
MEM/WB.MemToReg        lw $6, -4($3)               1
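To make the distinction between hazards and dependencies in part (a) concrete, here is a minimal Python sketch that scans the listing for load-to-use pairs. It assumes, as the question states, that forwarding to EX resolves every other register dependency, so only a load immediately followed by a consumer counts. The tuple encoding of the instructions is hypothetical and was transcribed by hand from the listing.

```python
# (number, opcode, destination register or None, source registers)
PROGRAM = [
    (1, "add", 7, (0, 0)),
    (2, "addi", 5, (0,)),
    (3, "addi", 3, (0,)),
    (4, "lw", 4, (3,)),
    (5, "lw", 9, (3,)),
    (6, "lw", 6, (3,)),
    (7, "add", 7, (6, 7)),
    (8, "add", 7, (9, 7)),
    (9, "add", 7, (4, 7)),
    (10, "addi", 3, (3,)),
    (11, "addi", 5, (5,)),
    (12, "bne", None, (5, 0)),
]


def load_use_hazards(program):
    """Return (producer, consumer) pairs where a lw is followed directly
    by an instruction that reads the loaded register."""
    hazards = []
    for prev, cur in zip(program, program[1:]):
        prev_num, prev_op, prev_dest, _ = prev
        cur_num, _, _, cur_srcs = cur
        if prev_op == "lw" and prev_dest in cur_srcs:
            hazards.append((prev_num, cur_num))
    return hazards


print(load_use_hazards(PROGRAM))
```

The loads at 4 and 5 also feed later adds, but those consumers are far enough away that the loaded values are available without a stall, so they are dependencies rather than hazards.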

5. (20 pts) Consider a 32 Kbyte direct-mapped cache with 64-byte lines operating with 32-bit addresses.

i. (5 pts) How many lines are there in the cache?

2^15 / 2^6 = 2^9 = 512 lines

ii. (5 pts) Show the breakdown of the address bits into fields used to address the cache.

Tag: 17 bits | Index: 9 bits | Byte offset: 6 bits

iii. (10 pts) Consider a read to address 0x004000F8 that misses in the cache and is processed to bring the corresponding line into the cache. For each of the following addresses, if it is the next reference to the cache, identify whether it will be a reference to a word in the same line (Y/N).

i.   0x00401000    N
ii.  0x00400020    N
iii. 0x004000C4    Y
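The field breakdown and the same-line answers can be checked with a few lines of Python. This is a sketch for the cache parameters given in the question; the helper names `split` and `same_line` are made up for the example.

```python
CACHE_BYTES = 32 * 1024
LINE_BYTES = 64
ADDRESS_BITS = 32

NUM_LINES = CACHE_BYTES // LINE_BYTES              # 512 lines
OFFSET_BITS = LINE_BYTES.bit_length() - 1          # log2(64) = 6
INDEX_BITS = NUM_LINES.bit_length() - 1            # log2(512) = 9
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS  # 17


def split(addr):
    """Split a 32-bit address into (tag, index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def same_line(a, b):
    """Two addresses fall in the same cache line when tag and index match."""
    return split(a)[:2] == split(b)[:2]


MISS = 0x004000F8
for addr in (0x00401000, 0x00400020, 0x004000C4):
    print(hex(addr), "Y" if same_line(MISS, addr) else "N")
```

Only 0x004000C4 shares both tag and index with 0x004000F8; the other two addresses map to different index values.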

6. (10 pts) To the base pipeline we add data forwarding to EX, branches are implemented in MEM, and support for jumps is implemented in ID. There is no hardware support for hazard detection or associated flushing/stall generation. Compiler support maintains correctness via the insertion of nops for any possible data or control hazards. Executing programs results in the following statistics: 40% of branches are taken and 20% of all load operations produce a hazard. The compiler can reorganize the code to fill 60% of branch delay slots, 30% of the jump delay slots, and 50% of load delay slots. What is the improvement in execution time achieved by such instruction scheduling? Assume a base CPI of 1.0. You may leave answers in expression form.

Instruction       Frequency
Loads             22%
Stores            13%
ALU Operations    42%
Branches          18%
Jumps             5%

EX_1 = #I * CPI_1 * clock_cycle

CPI_1 = CPI_base + 0.22 * 0.2 * 1 + 0.18 * 3 + 0.05 * 1 = 1.0 + 0.634 = 1.634

Note that the branch-taken probability does not matter. There is no hazard detection, so every branch will incur a 3-cycle penalty. The load-to-use and jump penalties are 1 cycle.

CPI_2 = CPI_base + 0.22 * 0.2 * 0.5 * 1 + 0.18 * 0.4 * 3 + 0.05 * 0.7 * 1 = 1.0 + 0.273 = 1.273

Without flushing, every branch will incur a penalty, but this penalty has been reduced by 60% with instruction scheduling. Similarly, the load-to-use and jump penalties have been reduced.

EX_2 = #I * CPI_2 * clock_cycle

Improvement in execution time = EX_1 - EX_2
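The two CPI expressions can be evaluated numerically with the sketch below. The function name and its parameters are hypothetical; the frequencies, penalties, and fill rates are the ones given in the question, and the fill rates default to zero to model the unscheduled case.

```python
def cpi(load_fill=0.0, branch_fill=0.0, jump_fill=0.0, base=1.0):
    """CPI with compiler-inserted nops; each *_fill is the fraction of
    delay slots the compiler manages to fill with useful instructions."""
    load_nops = 0.22 * 0.2 * (1 - load_fill) * 1   # 20% of loads, 1 slot
    branch_nops = 0.18 * (1 - branch_fill) * 3     # every branch, 3 slots
    jump_nops = 0.05 * (1 - jump_fill) * 1         # every jump, 1 slot
    return base + load_nops + branch_nops + jump_nops


cpi_1 = cpi()                                      # no scheduling
cpi_2 = cpi(load_fill=0.5, branch_fill=0.6, jump_fill=0.3)

print(round(cpi_1, 3), round(cpi_2, 3))
# Per-instruction saving in cycles; multiply by #I * clock_cycle for EX_1 - EX_2.
print(round(cpi_1 - cpi_2, 3))
```

With the same instruction count and clock cycle in both cases, the difference in execution time is (CPI_1 - CPI_2) * #I * clock_cycle.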

Forwarding Paths