CS 351 Final Review Quiz

Notes: You must explain your answers to receive partial credit. You will lose points for incorrect extraneous information, even if the answer is otherwise correct.

Question 1: Short answer [25 points].

a. Define the term basic block. Are long basic blocks or short basic blocks better for optimization? Why?

b. Write a snippet of code (MIPS or high-level pseudocode) exhibiting DLP. What kinds of ISAs allow DLP to be exploited in hardware?
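For review, a minimal DLP sketch in C (an illustrative example, not part of the original quiz): every iteration applies the same operation to independent array elements, which is exactly the pattern that vector and SIMD ISAs exploit in hardware.

    /* Each iteration performs the same add on independent elements,
       so a vector/SIMD machine can execute many of them at once. */
    void vector_add(int *a, const int *b, const int *c, int n) {
        for (int i = 0; i < n; i++)
            a[i] = b[i] + c[i];   /* no loop-carried dependence */
    }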

c. Explain the recent emergence of multicore processors. What type of parallelism do they exploit?

d. Many of the fastest supercomputers in the world are clusters rather than special-purpose machines. What are the advantages of using clusters? What are the disadvantages?

e. What is false sharing? Show pseudocode that exhibits false sharing for a dual-core machine with 8-word cache blocks.
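For review, a minimal false-sharing sketch in C (illustrative; the thread-spawning boilerplate is omitted and the array name is hypothetical): with 8-word blocks, x[0] and x[1] occupy the same cache block, so the two cores' writes repeatedly invalidate each other's copy even though they never touch the same word.

    int x[8];   /* assume x[] begins a cache block: all 8 words share one block */

    /* Runs on core 0. */
    void worker0(void) { for (int i = 0; i < 1000000; i++) x[0]++; }

    /* Runs on core 1: a different word in the same block, so each
       write forces the block to ping-pong between the two caches. */
    void worker1(void) { for (int i = 0; i < 1000000; i++) x[1]++; }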

Question 2: Compiler optimizations [20 points]. For the following C code:

    for (int i = 0; i < N; i++) {
        a[i] = b[i] + c[i];
        d[i] = e[i] + f[i];
    }

Assume the following:
- The arrays are integer arrays, where an integer is 1 word long.
- N is very large.
- The loop counter i, the array size N, and the base addresses of the six arrays are kept in registers at all times.
- The code runs on a machine whose L1 cache is fully associative and consists of four 4-word blocks.

(a) What is the cache hit rate of this code, assuming that the cache is initially empty?

(b) Restructure the code to increase the cache hit rate.

(c) What is this optimization called?
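For review, one possible restructuring, offered as an illustrative sketch rather than the official answer key: loop fission (also called loop distribution) splits the body so that each loop streams through only three arrays, which fit in the four-block cache at the same time.

    /* Loop fission sketch: each loop now touches only 3 array
       streams, so the four-block cache can hold all of them and
       each 4-word block gives 1 miss followed by hits. */
    for (int i = 0; i < N; i++)
        a[i] = b[i] + c[i];
    for (int i = 0; i < N; i++)
        d[i] = e[i] + f[i];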

(d) What is the cache hit rate of the modified code?

(e) This question does not refer to the snippet of code used in Parts (a)-(d). In MIPS or a higher-level language, show an example of code motion.
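For review, a minimal code-motion sketch in C (illustrative, not part of the original quiz): the loop-invariant product x * y is hoisted out of the loop so it is computed once rather than N times.

    /* Before: x * y is recomputed on every iteration. */
    for (int i = 0; i < N; i++)
        a[i] = b[i] + x * y;

    /* After code motion: the invariant product is hoisted. */
    int t = x * y;
    for (int i = 0; i < N; i++)
        a[i] = b[i] + t;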

Question 3: Exploiting ILP [25 points]. For the following MIPS code:

    1) LW   $t1, 0($t2)
    2) ADDI $t1, $t1, 1
    3) SW   $t1, 0($t2)
    4) ADDI $t2, $t2, 4
    5) SUB  $t4, $t3, $t2
    6) ADDI $t4, $t4, 1

(a) List the dependencies and antidependencies. For each, give the 2 instructions and the register involved, and state whether it is a true dependency or an antidependency.

(b) Rename the instructions using physical registers $p0 through $p63.
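For review, a minimal C illustration of the two dependence kinds asked about in (a) (illustrative, not tied to the MIPS snippet above):

    /* True (RAW) dependency: the second statement reads the value
       the first one writes, so their order must be preserved. */
    t = u + v;
    w = t * 2;

    /* Antidependency (WAR): the second statement overwrites a value
       the first one still needs to read; renaming removes it. */
    p = q + 1;
    q = 0;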

(c) Assume a 5-stage pipeline that works as follows:

Stage 1: Fetch 2 instructions from memory.
Stage 2: Decode and rename the instructions; choose up to 2 ready instructions to issue to the execution units.
Stage 3: (identical to MIPS EX stage, except that 2 instructions can be processed at once)
Stage 4: (identical to MIPS MEM stage, except that 2 instructions can be processed at once)
Stage 5: Write back to the physical register file; update the reorder buffer; officially complete up to 2 instructions by removing them from the reorder buffer.

Assume that an instruction can start Stage 3 in the cycle immediately after its operands are produced. That is, assume that the issue logic can figure out that an instruction's operands will be ready in time for the next cycle.

List the instructions (by number) that will be fetched in Cycle 1:

List the instructions (by number) that will be fetched in Cycle 2:

List the instructions (by number) that will be decoded and renamed in Cycle 2. Label each instruction either with READY or with the physical register(s) it must wait for before it can execute. Mark instructions that will advance to Stage 3 in the next cycle with a *.

List the instructions (by number) that will be fetched in Cycle 3:

List the instructions (by number) that will be decoded and renamed in Cycle 3:

List the instructions that are currently in the instruction window during Cycle 3. Label each instruction either with READY or with the physical register(s) it must wait for before it can execute. If an instruction's operands will be ready at the end of Cycle 3, mark it as READY. Mark instructions that will advance to Stage 3 in the next cycle with a *.

List the instructions (by number) that will be decoded and renamed in Cycle 4:

List the instruction(s) (by number) whose results will be ready at the end of Cycle 4:

List the instructions that are currently in the instruction window during Cycle 4. Label each instruction either with READY or with the physical register(s) it must wait for before it can execute. If an instruction's operands will be ready at the end of Cycle 4, mark it as READY. Mark instructions that will advance to Stage 3 in the next cycle with a *.

List the instructions that are currently in the instruction window during Cycle 5. Label each instruction either with READY or with the physical register(s) it must wait for before it can execute. If an instruction's operands will be ready at the end of Cycle 5, mark it as READY.

List the instruction(s) (by number) that will reach the reorder buffer at the end of Cycle 5. Will these instruction(s) be able to officially commit, or will they remain in the reorder buffer?


Question 4: Cache coherence [20 points]. For this problem, see the state machine on the final page of this test, which is identical to Figure 9.3.4 in your textbook.

(a) Fill out the following chart for a two-processor system executing the following sequence of instructions and adhering to the cache protocol in the diagram. Assume the following:
- The data from addresses 110 and 120 are initially stored in both processors' caches and marked as clean/shared.
- The cache block size is 1 word.
- "Before" and "After" refer to the block's state in cache before the instruction is executed and after it is executed, respectively.

Processor: Instruction | Hit or miss? | Block's state in P0 cache (before) | Block's state in P0 cache (after) | Block's state in P1 cache (before) | Block's state in P1 cache (after)
P0: read 120  |              |                                    |                                   |                                    |
P0: write 120 |              |                                    |                                   |                                    |
P1: write 120 |              |                                    |                                   |                                    |
P1: read 110  |              |                                    |                                   |                                    |
P0: write 110 |              |                                    |                                   |                                    |
P1: read 120  |              |                                    |                                   |                                    |

(b) The MESI protocol is similar to the protocol in the diagram, except that it has an additional state called Exclusive. If a block is marked Exclusive, that means that it is present in only one processor's cache and not shared in any other processor's cache, and that it is clean (has not been written). What is the advantage of this additional state?

Question 5: Parallel decomposition [10 points].

(a) What does it mean for a parallel computation to have a reduction?

(b) To find the maximum element of an array of size N using 4 processors in parallel, where processor P0 initially contains the array elements, you can follow this algorithm:

(1) Send N/4 elements to each of the other processors.
(2) Have all processors find the maximum of their N/4 elements.
(3) Have processors P0 and P1 communicate to find the larger of their individual maximum elements; in parallel, have P2 and P3 do the same thing.
(4) Compare the results from step (3) to find the overall maximum element.

Alternatively, you could just do the entire computation on P0. We call this the sequential algorithm.

If it takes X cycles to send/receive one element between processors, and Y cycles to compare two elements, how much time does the sequential algorithm take?

How much time does each step of the parallel algorithm take?

Using the above results, what relationship between X and Y must be true for the parallel algorithm to be preferable to the sequential algorithm? (e.g., X < Y)
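For review, a minimal sequential C simulation of the tree reduction (illustrative; it models the combining pattern of steps (2)-(4) on one machine rather than real message passing, and the array contents are made up):

    #include <stdio.h>

    /* Step (2) on one processor: maximum of an N/4-element slice. */
    static int local_max(const int *a, int n) {
        int m = a[0];
        for (int i = 1; i < n; i++)
            if (a[i] > m) m = a[i];
        return m;
    }

    int main(void) {
        int a[8] = {3, 9, 1, 7, 4, 8, 2, 6};   /* N = 8, so N/4 = 2 */
        int q = 8 / 4;
        int m0 = local_max(a + 0 * q, q);      /* P0's local max */
        int m1 = local_max(a + 1 * q, q);      /* P1's local max */
        int m2 = local_max(a + 2 * q, q);      /* P2's local max */
        int m3 = local_max(a + 3 * q, q);      /* P3's local max */
        int m01 = m0 > m1 ? m0 : m1;           /* step (3), pair P0/P1 */
        int m23 = m2 > m3 ? m2 : m3;           /* step (3), pair P2/P3 */
        printf("max = %d\n", m01 > m23 ? m01 : m23);   /* step (4) */
        return 0;
    }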

[State machine diagram (reference for Question 4): a three-state write-invalidate protocol with states Invalid (not valid cache block), Shared (clean), and Modified (dirty). Part a shows cache state transitions using signals from the processor: read hits, write hits, read misses, and write misses, with a dirty block written back to memory on a miss and an invalidate sent on a processor write to a Shared block. Part b shows cache state transitions using signals from the bus: another processor's read miss or write miss for this block (with write-back of the old block), or an invalidate, seen on the bus.]

FIGURE 9.3.4 A write-invalidate cache coherence protocol. Part a of the diagram shows state transitions based on actions of the processor associated with this cache; part b shows transitions based on actions of other processors seen as operations on the bus. There is really only one state machine in a cache block, although there are two represented here to clarify when a transition occurs. The black arrows and actions specified in black text would be found in caches without coherency; the colored arrows and actions are added to achieve cache coherency.