CS 251, Winter 2018, Assignment 5.0.4, 3% of course mark


CS 251, Winter 2018, Assignment 5.0.4
3% of course mark
Due Wednesday, March 21st, 4:30 PM
Lates accepted until 10:00 AM March 22nd with a 15% penalty

1. (10 points) The code sequence below executes on a pipelined datapath where branching is determined in the ID stage. You must consider branch data hazards that might exist between the branch instruction and the instruction immediately before the branch. A one clock cycle delay is needed if the instruction immediately before the branch is an R-format instruction or an addi/subi instruction and a data dependency exists. A two clock cycle delay is needed if the instruction immediately before the branch is a load word instruction and a load-use hazard exists. You may assume that if a branch data hazard exists, the datapath will add the necessary stalls (NOPs).

(a) (5 points) Assume the datapath implements data forwarding and load-use stalling but does not implement branch flushing. Mark with (*) each instruction that has a hazard with a prior instruction. Rearrange the code to remove the load-use hazards and branch data hazards if they exist. Fill the branch delay slot if possible. If code rearrangement cannot be used, you may use NOPs.

  *   Original                    Rearranged Code
      100  addi $1, $0, 50
      104  add  $5, $0, $0
      108  lw   $2, 100($4)
      112  lw   $3, 200($4)
      116  sw   $3, 300($4)
      120  addi $4, $4, 4
      124  add  $5, $5, $2
      128  addi $1, $1, -1
      132  bne  $1, $0, -7
      136  add  $8, $5, $0

(b) (5 points) This question asks for calculations for the original sequence of instructions above running on a pipelined datapath where the branch is determined in the ID stage. You should assume that branch flushing exists for instructions that are not needed following the branch and that the datapath implements a one clock cycle stall if a branch data hazard exists. You should also assume that data forwarding and load-use stalling exist in the datapath.

    (i) What is the total number of instructions that are flushed?

    (ii) State the total number of clock cycles required to run the original sequence of code, including pipeline start-up time.

2. (6 points) Given the following execution times for individual components on the pipelined datapath, find the minimum time that can be assigned to the clock cycle length (i.e., in class we always used a 200 ps clock cycle for the pipelined datapath). You may assume the branch is determined in the MEM stage for this question.

    Memory accesses: 120 ps
    Register file access: 70 ps (read or write)
    ALU computations: 150 ps
    Adders: 150 ps
    Sign extension: 5 ps
    Shift left by two: 5 ps
    MUXes: 10 ps
    Writing to intermediate pipeline registers (IF/ID etc.): negligible
    Reading data from any pipeline register: negligible
    Control unit decode of instruction opcode bits: 10 ps
    ALU control: 5 ps

Assume all other components are negligible and that many operations occur in parallel. Complete the following table, giving the minimum time needed for each stage to execute correctly. Be careful with the ID stage.

    Stage   Min Time
    IF
    ID
    EX
    MEM
    WB

State the shortest clock cycle time we could allow on the pipelined datapath:

3. (6 points) Given a simple high-level loop:

    for (register int k=1; k<11; k++)
        A[k] = A[k-1] + k;

The following MIPS code implements the above high-level code fragment. It is run on a pipelined datapath that performs the branch in the MEM stage, has data forwarding and load-use stalls, and implements branch flushing for unwanted instructions following the branch.

    096  addi $1, $0, 1      # k
    100  addi $2, $0, 0      # index into A
    104  addi $3, $0, 11     # end value of k
    108  lw   $4, 200($2)
    112  add  $4, $4, $1
    116  sw   $4, 204($2)
    120  addi $2, $2, 4
    124  addi $1, $1, 1
    128  bne  $1, $3, -6
    132  slt  $2, $4, $0
    136  add  $1, $1, $1
    140  add  $2, $1, $2

(a) (2 points) How many total clock cycles (including flushed instructions and stalls) does the above code need to execute, ending after executing line 140? Be sure to include the time to start up the pipeline.

(b) (4 points) Rewrite this code using code rearrangement to solve any possible hazards. If hazards cannot be solved completely, you may use NOPs. Note: Instructions that are not part of the loop should not be moved into the loop for any reason.

    Line number   Rearranged Code
    096
    100
    104
    108
    112
    116
    120
    124
    128
    132
    136
    140
    144
    148
    152
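A common way to count cycles on a five-stage pipeline is to add the pipeline fill time (4 cycles) to the number of executed instructions, stall bubbles, and flushed instructions. The sketch below applies that bookkeeping to the loop in question 3. It is one reading of the problem, not an official solution: it assumes three wrong-path instructions are flushed per taken branch when the branch resolves in MEM, and one stall per load-use pair. The function name `total_cycles` and the variable names are illustrative.

```python
def total_cycles(executed, stalls, flushed, stages=5):
    """Pipeline fill time (stages - 1) plus one cycle per executed
    instruction, stall bubble, and flushed instruction."""
    return (stages - 1) + executed + stalls + flushed


# Question 3 loop, counted under the assumptions stated in the lead-in.
iterations = 10                  # k = 1 .. 10
setup, body, tail = 3, 6, 3      # instructions before, inside, and after the loop
executed = setup + iterations * body + tail
stalls = iterations * 1          # lw $4 is followed immediately by add $4 (load-use)
taken = iterations - 1           # bne falls through on the final iteration
flushed = taken * 3              # assumption: 3 instructions flushed per taken branch (branch in MEM)

print(total_cycles(executed, stalls, flushed))
```

Changing `stages`, the per-branch flush count, or the stall count adapts the same bookkeeping to a datapath that resolves branches in ID.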

4. (7 points) The datapath on the next page shows the hardware needed to execute a branch in the ID stage. The zero bit ANDed with the Branch control bit is missing from this diagram; however, you may assume it exists and that all the hardware necessary to take a branch in ID exists in the datapath. As noted in question 1 of the assignment, when branching is determined in the ID stage, data hazards may now exist between the branch instruction and the instruction that immediately precedes it. In class we discussed data hazards between instructions in the EX stage and instructions in the MEM or WB stages. A condition to detect a data hazard between two instructions, copied from the course notes, is given below:

    if (MEM/WB.RegWrite)
       and (MEM/WB.RegisterRd != 0)
       and (EX/MEM.RegisterRd != ID/EX.RegisterRs)
       and (MEM/WB.RegisterRd = ID/EX.RegisterRs)

This condition detects a data hazard between an instruction in the WB stage and an instruction in the EX stage.

(a) (4 points) State the conditions necessary to detect a data hazard between a branch instruction in the ID stage and an R-format instruction in the EX stage. You need only state the conditions necessary to detect a data hazard for the $rt register in the ID stage. There are no forwarding control bits that need to be set.

(b) (3 points) If a branch data hazard exists between a branch instruction in ID and an R-format instruction in EX, state how many stalls would be required between the two instructions. You may assume the necessary forwarding hardware was added to allow forwarding to the ID stage from the EX/MEM or MEM/WB pipeline registers. State which instruction would need to be stalled, which instruction would need to move forward, and how you would implement the stall.
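The condition quoted from the course notes can be read as a boolean predicate over pipeline-register fields. The sketch below transliterates that quoted condition only (the WB-to-EX hazard on $rs); it is not an answer to part (a). The function and parameter names are hypothetical stand-ins for the pipeline-register fields.

```python
def wb_to_ex_hazard_rs(mem_wb_reg_write, mem_wb_rd, ex_mem_rd, id_ex_rs):
    """True when the instruction in WB writes the register that the
    instruction in EX reads as $rs, and the instruction in MEM does not
    target that same register (as in the quoted condition)."""
    return bool(
        mem_wb_reg_write
        and mem_wb_rd != 0          # register $0 is never forwarded
        and ex_mem_rd != id_ex_rs   # the MEM-stage instruction does not write $rs
        and mem_wb_rd == id_ex_rs   # the WB-stage destination matches the EX-stage source
    )


print(wb_to_ex_hazard_rs(True, 5, 3, 5))   # WB writes $5, EX reads $5
print(wb_to_ex_hazard_rs(True, 5, 5, 5))   # MEM also targets $5, so this condition does not fire
```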

[Figure: Pipelined datapath with forwarding, branch in ID stage.] This is the WRONG datapath to use for question 2!

5. (15 points) Here is a series of address references given as 4-bit word addresses in both decimal and binary; we also list the relative time at which these references occur:

    Time    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
    Addr    0    1    2    3    4    0    8    0    4    0    8    5    4    2    0
    Binary  0000 0001 0010 0011 0100 0000 1000 0000 0100 0000 1000 0101 0100 0010 0000

Below are four different 8-word caches (similar to Figure 5.14 of the text). For each cache type, assuming the cache is initially empty, show the final contents of the cache, and in the table at the bottom, show how many cache hits and misses there are for each type of cache. Write your solution in the tables below, assuming the above word addresses are 4-bit binary numbers. You should write the binary form of the tag in the tables below, except for the fully associative cache, where you may write the decimal form of the tag. In the data column, write M[3] for data at memory address 3, M[8] for data at memory address 8. Assume an LRU replacement scheme. When inserting an element into the cache, if there are multiple empty slots for that index, you should put the new element in the left-most empty slot.

Direct mapped

    Block  Tag  Data
    0
    1
    2
    3
    4
    5
    6
    7

Two-way set associative

    Set  Tag  Data  Tag  Data
    0
    1
    2
    3

Four-way set associative

    Set  Tag  Data  Tag  Data  Tag  Data  Tag  Data
    0
    1

Fully associative

    Tag  Data  Tag  Data  Tag  Data  Tag  Data  Tag  Data  Tag  Data  Tag  Data  Tag  Data

Write the number of cache hits and misses for each scheme in the table below:

              Direct Mapped   Two-way s.a.   Four-way s.a.   Fully Associative
    Hits
    Misses
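The hit and miss counts can be checked by simulating each organization. The following is a minimal sketch, assuming one-word blocks, a set index of `address mod number_of_sets`, and true LRU replacement within each set; the function name `simulate` is illustrative. It counts hits and misses only and does not track which slot within a set holds each block.

```python
def simulate(addresses, num_blocks, ways):
    """Count (hits, misses) for an LRU set-associative cache with
    one-word blocks. ways=1 is direct mapped; ways=num_blocks is
    fully associative."""
    num_sets = num_blocks // ways
    # Each set is a list of resident addresses, least recently used first.
    sets = [[] for _ in range(num_sets)]
    hits = misses = 0
    for addr in addresses:
        entries = sets[addr % num_sets]
        if addr in entries:
            hits += 1
            entries.remove(addr)      # will be re-appended as most recently used
        else:
            misses += 1
            if len(entries) == ways:  # set is full: evict the LRU entry
                entries.pop(0)
        entries.append(addr)
    return hits, misses


references = [0, 1, 2, 3, 4, 0, 8, 0, 4, 0, 8, 5, 4, 2, 0]
for name, ways in [("direct mapped", 1), ("two-way", 2),
                   ("four-way", 4), ("fully associative", 8)]:
    print(name, simulate(references, num_blocks=8, ways=ways))
```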

6. (6 points) Suppose we have a 16-word, 2-way set associative cache that is partially filled as indicated below (a missing tag indicates that the cache entry is invalid; a tag indicates that the cache entry is valid). Only the tags are shown in the cache (i.e., we have omitted the data stored in the cache).

    index  tag0  tag1
    000    00    10
    001    00    01
    010    11    00
    011    00    00
    100    00    01
    101    11    10
    110    01
    111    00

(a) (1 point) What is wrong with the cache entries in this cache?

(b) (4 points) Assume the cache starts partially full as shown above. Given the following word accesses, fill in the table with cache hits or misses.

    Binary Addr   Hit/miss
    00 000
    01 000
    00 100
    11 101
    10 110
    10 100        Miss

(c) (1 point) We labeled the last cache access of the previous part as a Miss. After fetching this word from memory, we will need to replace one word in the cache. Assuming we have executed the sequence of memory accesses listed in the previous part of this question, which of tag0 and tag1 would you replace? Justify your answer.
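A 16-word, 2-way cache has 8 sets, so a 5-bit word address splits into a 2-bit tag and a 3-bit index, matching the "tag index" layout used in the access table. A small sketch of that split follows; the helper name `split_word_address` is hypothetical.

```python
def split_word_address(addr, index_bits=3):
    """Split a word address into (tag, index): the low index_bits select
    the set, and the remaining high bits form the tag."""
    index = addr & ((1 << index_bits) - 1)
    tag = addr >> index_bits
    return tag, index


tag, index = split_word_address(0b10100)
print(f"tag={tag:02b} index={index:03b}")
```

Looking up an access then amounts to comparing `tag` against the valid tags stored in row `index` of the table.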

7. (2 points) AMAT (average memory access time) is the average (expected) time it takes for a memory access, considering both hits and misses. It can be calculated using the formula:

    AMAT = hit time + miss rate × miss penalty

The miss rate is the percentage of memory accesses that are not in the cache. The miss penalty is the additional time it takes for a memory access to the next level in the hierarchy. Suppose you only have a level 1 cache, and that level 1 cache hits in 1 clock cycle with a miss rate of 4%. The cost to access main memory (the miss penalty) is 120 clock cycles.

AMAT =

8. (3 points) CPI is a measure of clock cycles per instruction that is used to compare instruction set architectures based on a particular instruction mix. Assume an instruction mix of 14% load words, 10% store words, 60% R-format, 10% branches, and 6% jumps. Assume a pipelined datapath where the branch is determined in the MEM stage and the datapath implements data forwarding, load-use stalling, and branch flushing when necessary. Assume half of all branch instructions cause flushing of unwanted instructions following the branch. A quarter of all load words are followed by a use and generate a load-use hazard. The jump instruction is determined in the ID stage, and every jump instruction requires flushing the 1 instruction behind it. State the average CPI and be sure to show your work.

CPI =
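Both quantities are weighted sums, so they are easy to sanity-check in a few lines. The sketch below evaluates the AMAT formula with the values given in question 7, and expresses average CPI as a base CPI of 1 plus the expected penalty cycles per instruction. It is a sketch under assumptions, not an official solution: it assumes that a flushing branch resolved in MEM costs 3 cycles and that a load-use hazard costs 1 cycle. The function names are illustrative.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time."""
    return hit_time + miss_rate * miss_penalty


def average_cpi(base_cpi, penalties):
    """penalties: iterable of (fraction_of_all_instructions, extra_cycles)."""
    return base_cpi + sum(fraction * cycles for fraction, cycles in penalties)


print(amat(hit_time=1, miss_rate=0.04, miss_penalty=120))

cpi = average_cpi(1.0, [
    (0.10 * 0.50, 3),  # half of branches flush; assumed 3 cycles each (branch in MEM)
    (0.14 * 0.25, 1),  # a quarter of loads stall; assumed 1 cycle each
    (0.06, 1),         # every jump flushes the 1 instruction behind it
])
print(cpi)
```

If the course datapath charges a different number of cycles per flushed branch, only the first penalty entry changes.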

9. BONUS (5 points) Below is a diagram showing the Forwarding Unit in the pipelined datapath. The inputs and outputs of the forwarding unit are indicated in the diagram. You need only consider a data hazard between the instruction in the EX stage of the pipeline and an instruction in the MEM stage. In the space below, draw the circuit that will partially implement the ForwardB signal to the multiplexor before the ALU. You only need to generate the signal to detect a hazard between the $rt source register of the instruction in the EX stage and an instruction in the MEM stage of the pipeline. You must show and correctly label all of the necessary inputs and outputs that you use, and indicate with a slash the width of each input/output. You may use any of the gates that we discussed in class. (You may not use decoders, multiplexors, or comparators.) This question will be marked all or nothing, meaning it must be exactly correct for 5 bonus marks; otherwise you will receive zero.
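One way to reason about such a circuit is to model it with bit-level operations that each correspond to a basic gate. The sketch below assumes the usual textbook form of the EX-hazard test (EX/MEM.RegWrite set, EX/MEM.RegisterRd nonzero, EX/MEM.RegisterRd equal to ID/EX.RegisterRt); the course's expected circuit and signal names may differ, and the function names here are hypothetical. Equality of the two 5-bit register numbers is built from five XNOR gates feeding an AND, and the nonzero test is a 5-input OR.

```python
def xnor(a, b):
    """1 when the two bits match, else 0."""
    return 1 - (a ^ b)


def bits5(n):
    """The five low-order bits of a register number, least significant first."""
    return [(n >> i) & 1 for i in range(5)]


def forward_b_from_mem(ex_mem_reg_write, ex_mem_rd, id_ex_rt):
    """1 when the MEM-stage instruction writes the register that the
    EX-stage instruction reads as $rt (assumed textbook condition)."""
    rd, rt = bits5(ex_mem_rd), bits5(id_ex_rt)
    equal = 1
    for a, b in zip(rd, rt):
        equal &= xnor(a, b)      # AND of five XNOR outputs
    nonzero = 0
    for a in rd:
        nonzero |= a             # OR of the five Rd bits
    return ex_mem_reg_write & equal & nonzero


print(forward_b_from_mem(1, 9, 9))
print(forward_b_from_mem(1, 0, 0))
```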

The remaining questions will NOT be used to compute your assignment mark; they are included here as additional questions you may want to try to aid your understanding of the course material.

Exercises from the textbook: 5.2.1, 5.2.2, 5.3, 5.7.1, 5.7.2, 5.7.3.