EXAM #1. CS 2410 Graduate Computer Architecture. Spring 2016, MW 11:00 AM 12:15 PM

Similar documents
Final Exam Fall 2007

CENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs.

CS2100 Computer Organisation Tutorial #10: Pipelining Answers to Selected Questions

c. What are the machine cycle times (in nanoseconds) of the non-pipelined and the pipelined implementations?

CS 2410 Mid term (fall 2015) Indicate which of the following statements is true and which is false.

5008: Computer Architecture HW#2

CS433 Midterm. Prof Josep Torrellas. October 16, Time: 1 hour + 15 minutes

Instruction Frequency CPI. Load-store 55% 5. Arithmetic 30% 4. Branch 15% 4

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25

Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4

Please state clearly any assumptions you make in solving the following problems.

CS146 Computer Architecture. Fall Midterm Exam

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

Page # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer

CSE Lecture 13/14 In Class Handout For all of these problems: HAS NOT CANNOT Add Add Add must wait until $5 written by previous add;

Load1 no Load2 no Add1 Y Sub Reg[F2] Reg[F6] Add2 Y Add Reg[F2] Add1 Add3 no Mult1 Y Mul Reg[F2] Reg[F4] Mult2 Y Div Reg[F6] Mult1

CS433 Midterm. Prof Josep Torrellas. October 19, Time: 1 hour + 15 minutes

Updated Exercises by Diana Franklin

ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013

Page 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer

CS433 Homework 3 (Chapter 3)

EE 4683/5683: COMPUTER ARCHITECTURE

Comprehensive Exams COMPUTER ARCHITECTURE. Spring April 3, 2006

CS/CoE 1541 Mid Term Exam (Fall 2018).

For this problem, consider the following architecture specifications: Functional Unit Type Cycles in EX Number of Functional Units

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

Pipeline Architecture RISC

Midterm I March 21 st, 2007 CS252 Graduate Computer Architecture

CS252 Graduate Computer Architecture Midterm 1 Solutions

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

CS 352H Computer Systems Architecture Exam #1 - Prof. Keckler October 11, 2007

Page 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls

Four Steps of Speculative Tomasulo cycle 0

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

CS152 Computer Architecture and Engineering. Complex Pipelines

CSE 490/590 Computer Architecture Homework 2

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes

Computer Organization and Structure

Pipelining. CSC Friday, November 6, 2015

2. [3 marks] Show your work in the computation for the following questions involving CPI and performance.

Computer Architecture, EIT090 exam

Preventing Stalls: 1

Week 11: Assignment Solutions

The Processor: Instruction-Level Parallelism

Computer System Architecture Final Examination Spring 2002

Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls

Pipelining and Exploiting Instruction-Level Parallelism (ILP)

Advanced Instruction-Level Parallelism

CS420/520 Homework Assignment: Pipelining

Instr. execution impl. view

Processor (IV) - advanced ILP. Hwansoo Han

Instruction-Level Parallelism (ILP)

SOLUTION. Midterm #1 February 26th, 2018 Professor Krste Asanovic Name:

Question 1: (20 points) For this question, refer to the following pipeline architecture.

CS252 Graduate Computer Architecture

CS 251, Winter 2019, Assignment % of course mark

Alexandria University

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

OPEN BOOK, OPEN NOTES. NO COMPUTERS, OR SOLVING PROBLEMS DIRECTLY USING CALCULATORS.

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

Advanced Computer Architecture CMSC 611 Homework 3. Due in class Oct 17 th, 2012

COSC 6385 Computer Architecture - Pipelining

Static, multiple-issue (superscaler) pipelines

Computer Architecture CS372 Exam 3

Lecture 6 MIPS R4000 and Instruction Level Parallelism. Computer Architectures S

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

Computer Architecture. Lecture 6.1: Fundamentals of

What is Pipelining? RISC remainder (our assumptions)

Chapter 7. Microarchitecture. Copyright 2013 Elsevier Inc. All rights reserved.

CS 341l Fall 2008 Test #2

Computer Architecture Practical 1 Pipelining

Lecture 9: Case Study MIPS R4000 and Introduction to Advanced Pipelining Professor Randy H. Katz Computer Science 252 Spring 1996

ISA Instruction Operation

ECE 4750 Computer Architecture, Fall 2017 T05 Integrating Processors and Memories

Course on Advanced Computer Architectures

CS / ECE 6810 Midterm Exam - Oct 21st 2008

Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW. Computer Architectures S

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

Solutions to exercises on Instruction Level Parallelism

/ : Computer Architecture and Design Fall Midterm Exam October 16, Name: ID #:

Copyright 2012, Elsevier Inc. All rights reserved.

CS146: Computer Architecture Spring 2004 Homework #2 Due March 10, 2003 (Wednesday) Evening

ECE473 Computer Architecture and Organization. Pipeline: Control Hazard

Final Exam Fall 2008

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution

ECE 313 Computer Organization FINAL EXAM December 13, 2000

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Computer Architecture I Midterm I (solutions)

Advanced Computer Architecture

SI232 Set #20: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Chapter 6 ADMIN. Reading for Chapter 6: 6.1,

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

Chapter 4. The Processor

Good luck and have fun!

Structure of Computer Systems

Transcription:

EXAM #1 CS 2410 Graduate Computer Architecture Spring 2016, MW 11:00 AM 12:15 PM Directions: This exam is closed book. Put all materials under your desk, including cell phones, smart phones, smart watches, headphones, laptops, tablets, e-readers, etc. All questions are marked with their point value. There should be plenty of workspace provided in the exam booklet, but if you need extra pages, you may use blank pieces of paper. Show work: Be sure to show all work and turn in any extra pages that you use. If you do not show your work, you may not receive full or partial credit for a correct or wrong answer. Write legibly. If your handwriting cannot be read, then you will not receive credit for an answer. 5-stage MIPS pipeline: Recall that the MIPS 5-stage pipeline is Fetch (IF), Decode/Register Read (ID), Execute (EX), Memory (MEM) and Writeback (WB). You should assume for all questions, unless stated otherwise, the pipeline has full forwarding, branches are resolved in the Decode/Register Read stage (i.e., the branch comparison is done here). Be careful: When needed, the problems will state whether to assume a delayed branch or branch prediction. 1

1. 20 points each. write T or F to indicate whether the statement is true or false. A RISC instruction set usually has variable length instructions. A RISC instruction set usually has displacement and scaled addressing modes. An accumulator instruction set usually minimizes memory accesses. A register-memory instruction set architecture is typical of CISC. A typical CISC instruction set permits only two source operands. Branches usually use PC-relative addressing rather than absolute addressing. Instruction-level parallelism is a form of task-level parallelism. SIMD execution is a form of data-level parallelism. The number of anti- and output dependences limits the loop unroll factor. Unrolling a loop cannot harm program performance. 2. The CPU time equation is CPU time = IC + CPI + CC. For each term in the equation, explain what aspects of the computer architecture or the microarchitecture affect the term. 3 points. IC (instruction count): 4 points. CPI (clock cycles per instruction): 4 points. CC (clock cycle time): 2

3. 7 points. Let s consider a 4-stage pipeline that has the stages Fetch, Decode/Read, Execute, Writeback. Branches are resolved (i.e., the branch comparison is computed) in Decode/Read. Loads and stores compute their effective address in Execute and access memory in Writeback. Assume the instruction set is the same as MIPS. Where is forwarding needed in this pipeline? You may answer by giving a list of forwarding paths: indicate the producer and consumer stages where you need to forward from/to. Alternatively, you may draw the pipeline with all forwarding paths clearly shown. 4. 6 points. Let s again consider the same pipeline from the previous problem. Where are pipeline interlocks required? Recall that an interlock is a situation where a hazard must be handled by stalling the pipeline. In answering this question, list the pipeline stage involved in the interlock and briefly explain why an interlock is required. 3

5. Consider the classic 5-stage MIPS pipeline (Fetch, Decode, Execute, Memory, Writeback). Suppose branch prediction is used (there is no delay slot instruction); the branch target address is known in the Fetch stage and a branch can be resolved in Decode, where the registers are read. The branch prediction (correct) accuracy is 80% for the code below. Further assume the clock cycle time is 1000 MHz; there is full forwarding; indexed addressing mode is available (reg + reg = effective address); and, an instruction needs only 1 cycle in Execute (including the mul). The number on the left of the code is an instruction number. ;; for (i = 0; i < 10; i++) ;; A[i] = A[i] + A[i] * k; ;; assume registers are initialized prior to loop ;; R1 is k, R2 is base address of A ;; R4 is index i (as word, i.e., i * 4) 00 loop: lw R5,(R2+R4) ; R5 = A[i] 01 mul R6,R5,R1 ; R6 = R5 * k 02 add R6,R5,R6 ; R6 = R5 + R6 03 sw R6,(R2+R4) ; A[i] = R6 04 addi R4,R4,4 ; increment word offset for index i 05 sub R7,R4,40 ; iterate 10 times (10 * 4 bytes = 40) 06 bnez R7,loop ; branch to loop if R5!= 0 For the loop above, answer: 2 points. What is the instruction count? 4 points. What is the CPI? 4 points. What is the CPU time? 4

6. 10 points. Suppose the 5-stage MIPS pipeline is changed into a 7-stage pipeline. In the 7-stage pipeline, the original Decode stage is split into 2 stages: Decode, followed by Register Read. The original Memory stage is split into 2 separate stages: Access, followed by Data Delivery. A read data value is available after Data Delivery. Assume there are forwarding paths between all producer and consumer stages with two exceptions. First, a store (sw) can execute in EX without the source value to be stored. The effective address can be computed in EX but the register value to be stored can not be forwarded to EX from a later stage. Second, there is no forwarding path from Access back to Access. When forwarding cannot be done, the pipeline stalls until the hazard clears. Fill in the table with the movement of the instructions. In the table, use the abbreviations in parentheses: Fetch (F), Decode (D), Register Read (R), Execute (E), Access (A), Data Delivery (DD), Writeback (W), Stall (S). The first instruction is done as an example. Complete only the 15 cycles shown. Instruction 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 lw R5,(R2+R4) F D R E A DD W mul R6,R5,R1 add R6,R5,R6 sw R6,(R2+R4) addi R4,R4,4 sub R7,R4,40 bnez R7,loop 7. 5 points. For your completed table, explain why the pipeline is stopped for any of the stalls that you might have listed. 5

8. 6 points. Consider the code below and indicate the instructions (01 to 07) that are parallel with respect to one another. Two parallel instructions i and j can be executed as i before j or j before i. In the table, write Y if a pair of instructions are parallel. Otherwise, write a N. The number on the left of the code is the instruction number used in the table. ;; int A[12]; ;; for (i = 1 to 10) do { ;; A[i] = A[i] + A[i+1]; ;; A[i-1] = 2*A[i+1] + k; } ;; assume R1 is k, R2 is address A[i], R3 is address A[10] 01 loop: lw R5,0(R2) ; R5 = A[i] 02 lw R6,4(R2) ; R6 = A[i+1] 03 add R5,R5,R6 ; R5 = R5 + R6 04 sw R5,0(R2) ; A[i] = R5 05 add R6,R6,R6 ; R6 = 2*A[i+1] 06 add R6,R6,R1 ; R6 = R6 + k 07 sw R6,-4(R2) ; A[i-1] = R6 08 sub R5,R3,R2 ; check for last array element 09 addi R2,R2,4 ; increment A address by word size 10 bnez R5,loop ; branch to loop if R5!= 0 11 nop 01 02 03 04 05 06 07 01 N 02 N 03 N 04 N 05 N 06 N 07 N 9. 6 points. Consider the code from the previous problem. Using register renaming, rewrite instructions 01 to 10 to remove all name dependences. Assume registers R15 to R31 are available to use for renaming. When you rename registers, start with R15. Do not reorder the instructions. 6

10. 10 points. For your renamed code from the previous problem, schedule the whole loop (instructions 01 to 11) to remove as many stalls as possible. You may reorder/modify the instructions. Assume the 5-stage MIPS pipeline with delayed branches. 7

11. 9 points. Consider a 1-bit and a 2-bit predictor. In the table below, indicate the prediction for the branch and whether the preidction is correct. For the 1-bit predictor use the notation T for taken, and NT for not taken states. For the 2-bit predictor, use ST for stronglytaken state, T for weakly-taken state, SNT for strongly-not-taken state, and NT for weakly-not-taken state. Assuming the first entry in the table, fill in the predictor s state after branch resolution for each of the remaining branch outcomes. Outcome 1-bit predictor Prediction Next State Correct? 2-bit predictor Prediction Next State Correct? Taken N T No NT ST No Taken Taken Taken Taken 8