University of Toronto Faculty of Applied Science and Engineering

Print: First Name: Solution    Last Name:                    Student Number:

University of Toronto
Faculty of Applied Science and Engineering
Final Examination, December 16, 2013
ECE552F Computer Architecture
Examiner: Natalie Enright Jerger

1. There are 6 questions and 12 pages. Do all questions. The total number of marks is 101.
   The duration of the test is 2.5 hours.
2. ALL WORK IS TO BE DONE ON THESE SHEETS! Use the back of the pages if you need more space.
   Be sure to indicate clearly if your work continues elsewhere.
3. Please put your final solution in the box if one is provided.
4. Clear and concise answers will be considered more favourably than ones that ramble.
   Do not fill space just because it exists!
5. You may use two 8.5x11 aid sheets.
6. You may use faculty approved non-programmable calculators.
7. Always give some explanations or reasoning for how you arrived at your solutions to help
   the marker understand your thinking.

Marks: 1 [18]   2 [25]   3 [18]   4 [16]   5 [9]   6 [15]   Total [101]

1. From Lab

[2 marks] (a) Multiple virtual networks are often used for cache coherence. Briefly explain
what purpose these virtual networks serve. (Lab 6)

    Multiple virtual networks prevent protocol-level deadlock by ensuring that different
    coherence message types do not block each other in the network or queues. This prevents
    cyclic dependences from forming between in-flight coherence messages.

[5 marks] (b) Why are transient states needed in coherence protocols? Give one example of a
transient state you implemented in Lab 6 and explain why this state was needed.

    1st part: Transient states are needed because transitions between protocol states are not
    atomic.
    2nd part: Many valid answers.

[3 marks] (c) Give an example of a data structure where a stride prefetcher would work
perfectly but a next-line prefetcher would fail (would not produce useful prefetches). (Lab 5)

    Many possible answers. Consider:

        struct a {
            int x;
            int y[31];   // assumes 16 ints per line
        } array[N];

        for (int i = 0; i < N; i++) {
            array[i].x = i;
        }
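
To make the contrast concrete, here is a hedged sketch of per-PC stride detection (it is not
the Lab 5 prefetcher itself; the table layout, confidence threshold and the main() driver are
illustrative assumptions). The loop above touches one word every sizeof(struct a) = 128 bytes,
so a next-line prefetcher fetches lines the loop never reads, while a stride detector locks
onto the constant 128-byte stride.

    /* Hypothetical per-PC stride detector (illustrative, not the Lab 5 design):
     * remember the last address seen by a load PC and the last stride; once the
     * same stride repeats, prefetch last_addr + stride. */
    #include <stdio.h>
    #include <stdint.h>

    typedef struct {
        uint64_t last_addr;
        int64_t  stride;
        int      confidence;   /* consecutive times the same stride repeated */
    } stride_entry;

    /* Returns the address to prefetch, or 0 if not confident yet. */
    static uint64_t stride_access(stride_entry *e, uint64_t addr)
    {
        int64_t delta = (int64_t)(addr - e->last_addr);
        if (delta == e->stride && delta != 0) {
            e->confidence++;
        } else {
            e->stride = delta;
            e->confidence = 0;
        }
        e->last_addr = addr;
        return (e->confidence >= 1) ? addr + (uint64_t)e->stride : 0;
    }

    int main(void)
    {
        /* Mimic the access pattern of the loop above: array[i].x, i.e. one
         * access every 128 bytes (4-byte ints), skipping the adjacent line. */
        stride_entry e = {0, 0, 0};
        for (uint64_t addr = 0x1000; addr < 0x1000 + 5 * 128; addr += 128) {
            uint64_t pf = stride_access(&e, addr);
            printf("access 0x%llx -> prefetch 0x%llx\n",
                   (unsigned long long)addr, (unsigned long long)pf);
        }
        return 0;
    }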

[3 marks] (d) If two instructions compete for a resource in Tomasulo in the same cycle, which
instruction would you choose to access the resource first? Why? (Lab 4)

    The older instruction. This prevents starvation and is likely better for performance,
    since the older instruction may have dependent instructions waiting on it.

[5 marks] (e) Write a short microbenchmark that you would use to validate that you are
correctly tracking load-to-use dependences in a 6-stage in-order pipeline. The stages of this
pipeline are: Fetch, Decode, Execute1, Execute2, Memory, Writeback. Operands are needed at the
start of Execute1 to compute the correct value. Correct syntax is not important for your
microbenchmark but it must be clear what your code is doing (use comments as needed). (Lab 1)

    Many correct answers. Consider:

        LOOP: ADDI R1, 1  -> R1
              LW   [R3]   -> R2
              ADD  R2, R3 -> R4   // 2-cycle stall, twice
              LW   [R3]   -> R2
              ADD  R2, R3 -> R4
              LW   [R3]   -> R2
              SUB  R5, R5 -> R5
              ADD  R2, R3 -> R4   // 1-cycle stall, once
              LW   [R3]   -> R2
              SUB  R5, R5 -> R5
              SUB  R6, R6 -> R6
              ADD  R2, R3 -> R4   // no stall
              BNE  R1, R7, LOOP
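
A minimal sketch of the hazard check this microbenchmark exercises, assuming full forwarding
from the end of Memory into the start of Execute1 (the struct layout and function names here
are made up, not part of the lab infrastructure): the instruction in Decode must stall while a
load it depends on is still in Execute1 or Execute2, which yields exactly the 2-cycle, 1-cycle
and 0-cycle stalls annotated above.

    #include <stdio.h>
    #include <stdbool.h>

    struct instr {
        bool is_load;
        int  dest;          /* destination register, -1 if none */
        int  src1, src2;    /* source registers, -1 if unused   */
    };

    /* 'decode' is the instruction in Decode; 'ex1' and 'ex2' are the instructions
     * currently in Execute1 and Execute2 (NULL if the stage is empty).  Returns
     * true if Decode must stall this cycle, because the load's value is not
     * available until the end of Memory. */
    static bool load_use_stall(const struct instr *decode,
                               const struct instr *ex1,
                               const struct instr *ex2)
    {
        const struct instr *older[2] = { ex1, ex2 };
        for (int i = 0; i < 2; i++) {
            const struct instr *ld = older[i];
            if (ld && ld->is_load && ld->dest >= 0 &&
                (ld->dest == decode->src1 || ld->dest == decode->src2))
                return true;
        }
        return false;
    }

    int main(void)
    {
        struct instr lw  = { true,  2, 3, -1 };   /* LW  [R3]   -> R2 */
        struct instr add = { false, 4, 2,  3 };   /* ADD R2, R3 -> R4 */
        /* Back-to-back case: the load sits in Execute1 when the ADD reaches Decode. */
        printf("stall? %d\n", load_use_stall(&add, &lw, NULL));
        return 0;
    }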

2. Multiprocessors

(a) Consider the following code executed on processors P1 and P2:

    Initially: A = B = 0

    P1:  A = 1          P2:  Print B
         B = 1               Print A

[5 marks] Considering a sequentially consistent memory model, list all valid combinations that
can be printed by P2. If certain combinations are not possible, provide a brief explanation as
to why not.

    Possible combinations (B, A): (0, 0), (0, 1), (1, 1)
    Not possible: (1, 0) - if B prints 1, then A will also print 1, since the update to B by
    P1 occurs after the update to A.

[7 marks] (b) Using load locked (LL) and store conditional (SC), write the assembly code to
implement an atomic compare and swap: CAS Rx Ry X, where the value of Rx is first compared to
the value of X and if they are equal, the values in Ry and X are swapped. X is located in
memory and the address of X is in R3.

    CAS:  LL   R1, [R3]        // R1 = current value of X
          ADD  Ry, R0 -> R2    // save Ry in R2
          BNE  R1, Rx, exit    // values differ: do not swap
          ADD  R1, R0 -> R2    // R2 = old value of X
          ADD  Ry, R0 -> R1    // R1 = Ry, the value to store to X
    exit: SC   R1, [R3]        // attempt the store
          BEQZ R1, CAS         // SC failed: retry
          ADD  R2, R0 -> Ry    // Ry = R2 (old X if swapped, original Ry otherwise)
          return
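
For reference, the same semantics can be expressed with C11 atomics rather than LL/SC. This is
only a sketch of what the instruction sequence is meant to achieve, not an answer to the
question, and the function name is made up.

    #include <stdio.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    /* CAS Rx Ry X: if *X == rx, atomically swap *X and *ry. */
    static bool cas_swap(_Atomic int *X, int rx, int *ry)
    {
        int expected = rx;
        if (atomic_compare_exchange_strong(X, &expected, *ry)) {
            *ry = expected;      /* the old value of X (== rx) moves into Ry */
            return true;
        }
        return false;            /* *X != rx: nothing is modified */
    }

    int main(void)
    {
        _Atomic int X = 5;
        int ry = 9;
        bool swapped = cas_swap(&X, 5, &ry);
        printf("swapped=%d X=%d Ry=%d\n", swapped, atomic_load(&X), ry);
        return 0;
    }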

(c) Below, you are given cache, memory and coherence state. This represents the initial state
for each subpart of this question. Do NOT use your answer from part i in part ii or part iii;
each subpart is independent.

This multiprocessor uses a directory coherence protocol; it has 4 processors. Each processor
has a direct-mapped cache with 2 sets; each set holds two words. The 4th bit in the address
indicates the set. To simplify the format, the cache address tag contains the full address and
each word shows only two hex characters, with the least significant word on the right. The
directory coherence states are M (modified), S (shared) and U (uncached), while the cache has
states M, S and I.

Each part of this question specifies a sequence of one or more CPU operations of the form:

    P#: op address [value]

where P# designates the CPU (P0, P1, P2, P3), op is the CPU operation (e.g. read or write),
address denotes the memory address, and value indicates the new word to be assigned on a write
operation.

What is the resulting state (coherence state, address tags and data) of the caches and memory
(including directory state) after the given sequence of actions? Show only the blocks that
change; for example, P0.Set0: (S, 0x110, AB 33) indicates that CPU P0's set 0 has the final
state of S, address of 0x110 and data contents of 33 (address 0x110) and AB (address 0x114).
Use a similar format to show changes in the directory. Also, what value is sent to the
processor by a read operation? Write comments to help the marker understand your thinking.

Directory and Memory contents

    Address   State   Sharers     Data
    0x100     U                   EF 01
    0x108     S       1,
    0x110     S       0, 1, 3     AB 33
    0x118     M
    0x120     S
    0x128     M
    0x130     U
    0x138     U

Cache contents

    P0:  Set 0:  S   0x110   AB 33        P1:  Set 0:  S   0x110   AB 33
         Set 1:  M   0x                        Set 1:  S   0x

    P2:  Set 0:  S   0x                   P3:  Set 0:  S   0x110   AB 33
         Set 1:  S   0x                        Set 1:  M   0x

[2 marks] i. P3: write 0x130 84

    Dir: (S, 0x110, (0, 1), AB 33)
    Dir: (M, 0x130, 3, 78 11)
    P3.Set0: (M, 0x130, 78 84)

[6 marks] ii. P0: read 0x128
              P1: read 0x128

    Dir: (U, 0x118, -, 34 04)
    P0.Set0: (S, 0x128, 03 02)
    Dir: (S, 0x108, 2, 10 20)
    Dir: (S, 0x128, (0, 1, 3), 03 02)
    P1.Set0: (S, 0x128, 03 02)
    Returns 02 to both P0 and P1.

[5 marks] iii. P1: write 0x110 11
               P1: read 0x110
               P2: read 0x110

    Dir: (S, 0x110, (1, 2), AB 11)
    Dir: (U, 0x120, -, 56 22)
    P0.Set0: (I, 0x110, AB 33)
    P3.Set0: (I, 0x110, AB 33)
    P1.Set0: (S, 0x110, AB 11)
    P2.Set0: (S, 0x110, AB 11)
    Reads return 11 to both P1 and P2.
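
The following is a simplified sketch of how the directory reacts to a write request. It is my
illustration, not the exam's required machinery: it ignores the transient states discussed in
Question 1(b) and all message timing, and the state and function names are illustrative. Run
on the initial state of 0x110, it reproduces the invalidations seen in part iii.

    #include <stdio.h>

    enum dir_state { U, S, M };

    struct dir_entry {
        enum dir_state state;
        unsigned sharers;    /* bit i set => Pi has a copy; in M, exactly the owner's bit */
    };

    /* Processor p wants to write the block tracked by e. */
    static void handle_write(struct dir_entry *e, int p)
    {
        switch (e->state) {
        case S:                              /* invalidate all other sharers */
            for (int i = 0; i < 4; i++)
                if ((e->sharers & (1u << i)) && i != p)
                    printf("send invalidate to P%d\n", i);
            break;
        case M:                              /* recall the current owner's dirty copy */
            for (int i = 0; i < 4; i++)
                if (e->sharers & (1u << i))
                    printf("recall dirty block from P%d\n", i);
            break;
        case U:                              /* memory has the only copy; just supply it */
            break;
        }
        e->state = M;
        e->sharers = 1u << p;                /* requester becomes the sole owner */
    }

    int main(void)
    {
        /* Part iii: P1 writes 0x110, which is shared by P0, P1 and P3. */
        struct dir_entry e = { S, (1u << 0) | (1u << 1) | (1u << 3) };
        handle_write(&e, 1);
        printf("directory: state=M, owner bitmap=0x%x\n", e.sharers);
        return 0;
    }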

3. Dynamic Scheduling

[18 marks] (a) Assume that you have a single-issue processor that uses MIPS R10K dynamic
scheduling with a re-order buffer as discussed in lecture. There are 3 reservation stations
(Int 1, Int 2, Int 3) for integer operations and 3 integer execution units. Integer units are
capable of doing addition, subtraction and multiplication. There are 2 load reservation
stations, 1 store reservation station and 1 CDB. The ROB is initially empty and has 32
entries. Addition and subtraction take 2 cycles and multiplication takes 6 cycles. They write
the CDB in the cycle after execution is complete. A reservation station is available to a new
instruction on the cycle after the instruction in the reservation station writes the CDB. If
multiple instructions are ready to write the CDB in the same cycle, priority is given to the
instruction dispatched earliest. Instructions waiting for operands can complete issue in the
same cycle that the data appears on the CDB. Memory instructions (load and store) take 4
cycles to compute the address and access memory. The address calculation does not use the
integer units. Memory instructions write the CDB in the cycle after they finish accessing
memory. Assume you have 10 physical registers. Initially R1-R6 are mapped to P1-P6 and P7
through P10 are free. A physical register can be reused by another instruction the cycle after
it is freed. All reservation stations, execution units and ROB entries are free/available at
the start of this code sequence.

Consider the following code:

    LD   [R2+0]  -> R1
    MULT R1, R2  -> R4
    LD   [R5]    -> R6
    ADD  R6, R2  -> R6
    ST   R6      -> [R2+8]
    ADD  R3, R4  -> R1

Complete the following table for this code sequence. For each column, record the cycle at
which the instruction completes this stage. Also fill in the old (T_old) and new (T) register
mapping for each instruction. Write comments to help the marker; clearly indicate what you are
doing.

    Instruction           D    S    X    C    R    T      T_old    Comment
    LD   [R2+0]  -> R1                            P7     P1
    MULT R1, R2  -> R4                            P8     P4
    LD   [R5]    -> R6                            P9     P6
    ADD  R6, R2  -> R6                            P10    P9
    ST   R6 -> [R2+8]
    ADD  R3, R4  -> R1                            P1     P7
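
A small sketch of the rename bookkeeping behind the T / T_old columns (the data structures and
function names are illustrative, not the lecture's exact implementation): a map table from
architectural to physical registers plus a free list, with T_old returned to the free list
when the instruction commits. Run over this code sequence it reproduces the mappings in the
table, including the reuse of P1 by the final ADD once the first LD has committed.

    #include <stdio.h>

    #define NUM_ARCH 8           /* R0..R7 */
    #define NUM_PHYS 10          /* P1..P10 */

    static int map_table[NUM_ARCH];
    static int free_list[NUM_PHYS];
    static int head = 0, tail = 0;

    static void free_push(int p) { free_list[tail] = p; tail = (tail + 1) % NUM_PHYS; }
    static int  free_pop(void)   { int p = free_list[head]; head = (head + 1) % NUM_PHYS; return p; }

    /* Rename a destination: allocate T from the free list, remember T_old. */
    static int rename_dest(int r, int *t_old)
    {
        *t_old = map_table[r];
        int t = free_pop();
        map_table[r] = t;
        return t;
    }

    /* At commit, the overwritten mapping becomes reusable. */
    static void commit_free(int t_old) { free_push(t_old); }

    int main(void)
    {
        for (int r = 1; r <= 6; r++)  map_table[r] = r;   /* R1-R6 -> P1-P6 */
        for (int p = 7; p <= 10; p++) free_push(p);       /* P7-P10 free    */

        int t, t_old;
        t = rename_dest(1, &t_old); printf("LD  -> R1: T=P%d Told=P%d\n", t, t_old); /* P7,  P1 */
        t = rename_dest(4, &t_old); printf("MUL -> R4: T=P%d Told=P%d\n", t, t_old); /* P8,  P4 */
        t = rename_dest(6, &t_old); printf("LD  -> R6: T=P%d Told=P%d\n", t, t_old); /* P9,  P6 */
        t = rename_dest(6, &t_old); printf("ADD -> R6: T=P%d Told=P%d\n", t, t_old); /* P10, P9 */
        /* The store has no register destination.  Before the last ADD can rename,
         * the first LD must commit and free its T_old (P1). */
        commit_free(1);
        t = rename_dest(1, &t_old); printf("ADD -> R1: T=P%d Told=P%d\n", t, t_old); /* P1,  P7 */
        return 0;
    }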

4. Caches

[4 marks] (a) You're given a benchmark and a cache simulator, but you cannot modify either.
The simulator outputs the total number of cache misses in the benchmark. As inputs to the
simulator, you can configure the cache size (4B to infinite), the line size (4B to 128B) and
the associativity (1-way to fully associative). You are asked to estimate how many capacity
misses there would be if you were to use a 4-way 16kB cache with 64B lines. Describe how you
would use the simulator to do this.

    A: fully associative cache, 16kB, 64B lines gives you cold + capacity misses.
    B: infinite size, fully associative, 64B lines gives you cold misses.
    Capacity = A - B
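
As an illustration of that subtraction (a hypothetical harness, not the provided simulator;
the block addresses and the 4-block size are made up), the two runs can be mimicked by
replaying one block-address trace through a finite fully associative LRU cache and an infinite
"seen before" set: misses in the infinite cache are cold misses, and the difference is the
capacity misses.

    #include <stdio.h>
    #include <stdbool.h>

    #define WAYS 4
    #define TRACE_LEN 12

    static long cacheA[WAYS]; static int usedA = 0;      /* finite cache, MRU at index 0 */
    static long seenB[TRACE_LEN]; static int usedB = 0;  /* infinite cache = all blocks ever seen */

    static bool access_lru(long blk)          /* returns true on miss */
    {
        for (int i = 0; i < usedA; i++) {
            if (cacheA[i] == blk) {                       /* hit: move to MRU position */
                for (int j = i; j > 0; j--) cacheA[j] = cacheA[j - 1];
                cacheA[0] = blk;
                return false;
            }
        }
        if (usedA < WAYS) usedA++;                        /* miss: insert at MRU, drop LRU if full */
        for (int j = usedA - 1; j > 0; j--) cacheA[j] = cacheA[j - 1];
        cacheA[0] = blk;
        return true;
    }

    static bool access_infinite(long blk)     /* returns true on first-ever reference */
    {
        for (int i = 0; i < usedB; i++)
            if (seenB[i] == blk) return false;
        seenB[usedB++] = blk;
        return true;
    }

    int main(void)
    {
        long trace[TRACE_LEN] = {1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2};  /* made-up block addresses */
        int missA = 0, missB = 0;
        for (int i = 0; i < TRACE_LEN; i++) {
            missA += access_lru(trace[i]);
            missB += access_infinite(trace[i]);
        }
        printf("cold+capacity = %d, cold = %d, capacity = %d\n", missA, missB, missA - missB);
        return 0;
    }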

[12 marks] (b) Consider the following trace of accesses to a set. Each letter represents a
cache block:

    a b c a d e f f b e f e a b g c f e

Assume a fully associative cache that can hold 3 cache blocks. The cache is cold at the start
of the trace. Fill in the table below to show the contents of the cache after the access
indicated in that row for each of the three replacement policies: LRU, MRU (most recently
used) and Optimum. The first row has been filled in for you. What is the miss rate with each
replacement policy: LRU, MRU and Optimum? Also indicate if the access is a miss (enter Y/N).

    Block      LRU                  MRU                  Optimum
    accessed   contents     Miss?   contents     Miss?   contents     Miss?
    a          a, -, -      Y       a, -, -      Y       a, -, -      Y
    b          a, b         Y       a, b         Y       a, b         Y
    c          a, b, c      Y       a, b, c      Y       a, b, c      Y
    a          a, b, c      N       a, b, c      N       a, b, c      N
    d          a, c, d      Y       b, c, d      Y       a, b, d      Y
    e          a, d, e      Y       b, c, e      Y       a, b, e      Y
    f          d, e, f      Y       b, c, f      Y       b, e, f      Y
    f          d, e, f      N       b, c, f      N       b, e, f      N
    b          e, f, b      Y       b, c, f      N       b, e, f      N
    e          e, f, b      N       c, f, e      Y       b, e, f      N
    f          e, f, b      N       c, f, e      N       b, e, f      N
    e          e, f, b      N       c, f, e      N       b, e, f      N
    a          e, f, a      Y       c, f, a      Y       b, f, a      Y
    b          e, a, b      Y       c, f, b      Y       b, f, a      N
    g          a, b, g      Y       c, f, g      Y       b, f, g      Y
    c          b, g, c      Y       c, f, g      N       f, g, c      Y
    f          g, c, f      Y       c, f, g      N       f, g, c      N
    e          c, f, e      Y       c, g, e      Y       f, c, e      Y

    Miss rate: LRU 13/18 = 72.2%, MRU 11/18 = 61.1%, Optimum 10/18 = 55.6%
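
As a cross-check of the LRU and MRU columns (a small sketch written for this trace; the
Optimum policy is omitted since it needs knowledge of future accesses), the following replays
the trace above through a 3-entry fully associative cache, tracking recency order.

    #include <stdio.h>
    #include <string.h>

    #define WAYS 3

    static int simulate(const char *trace, int evict_mru)
    {
        char cache[WAYS]; int used = 0, misses = 0;
        for (int t = 0; trace[t]; t++) {
            char blk = trace[t];
            int pos = -1;
            for (int i = 0; i < used; i++)
                if (cache[i] == blk) pos = i;
            if (pos < 0) {                            /* miss */
                misses++;
                if (used == WAYS) {
                    /* index 0 = LRU, index used-1 = MRU */
                    int victim = evict_mru ? used - 1 : 0;
                    memmove(&cache[victim], &cache[victim + 1], used - victim - 1);
                    used--;
                }
                cache[used++] = blk;                  /* insert as MRU */
            } else {                                  /* hit: move to MRU position */
                memmove(&cache[pos], &cache[pos + 1], used - pos - 1);
                cache[used - 1] = blk;
            }
        }
        return misses;
    }

    int main(void)
    {
        const char *trace = "abcadeffbefeabgcfe";
        printf("LRU misses: %d/18\n", simulate(trace, 0));   /* table above says 13 */
        printf("MRU misses: %d/18\n", simulate(trace, 1));   /* table above says 11 */
        return 0;
    }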

5. Pipelining

(a) Consider a single-cycle CPU implementation. When the stages are split by functionality,
the stages do not require exactly the same amount of time. The original machine had a clock
cycle of 8 ns. After the stages were split, the measured times were F (Fetch): 2.0 ns; D
(Decode): 1.5 ns; E (Execute): 1.4 ns; M (Memory): 2.1 ns; W (Writeback): 1.0 ns. The total
pipeline register delay is 0.2 ns.

[2 marks] i. What is the clock cycle time of the 5-stage pipelined machine?

    Cycle time = longest stage + pipeline register delay = 2.1 + 0.2 = 2.3 ns

[2 marks] ii. If you could split one of the 5 stages into two stages, which stage would you
select and why?

    The longest stage (Memory), because this would reduce the cycle time.

[2 marks] iii. What negative impact on performance might arise from splitting 1 stage into two
stages?

    The pipeline is deeper, so flushing the pipeline becomes more expensive. The cost of RAW
    hazards increases.

[3 marks] iv. If the pipelined machine had an infinite number of stages (the amount of work
per stage can be divided into infinitely small chunks), what would its speedup be over the
single-cycle machine? (Ignore any stall cycles.)

    With infinite stages the cycle time approaches the register delay, so
    Speedup = 8 ns / 0.2 ns = 40
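
The arithmetic behind parts (i) and (iv), plus the finite 5-stage speedup for comparison, as a
small worked-numbers sketch under the ideal no-stall assumption:

    #include <stdio.h>

    int main(void)
    {
        double stage[5] = {2.0, 1.5, 1.4, 2.1, 1.0};   /* F, D, E, M, W in ns */
        double overhead = 0.2, single_cycle = 8.0;

        double longest = 0.0;
        for (int i = 0; i < 5; i++)
            if (stage[i] > longest) longest = stage[i];

        double t_pipe = longest + overhead;             /* 2.1 + 0.2 = 2.3 ns */
        printf("5-stage cycle time: %.1f ns\n", t_pipe);
        printf("5-stage speedup over single cycle: %.2f\n", single_cycle / t_pipe);
        printf("Infinite-stage speedup: %.0f\n", single_cycle / overhead);   /* 8 / 0.2 = 40 */
        return 0;
    }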

6. Control flow

(a) Consider the following code, where a and b can each have a value of 0 or 1. For this code,
a branch is considered taken (T) if the code in the if clause would execute and not taken (N)
otherwise.

    int my_func(int a, int b) {
        int c = 0;
        int d = 1;
        if (a == 0) {    // 1st if
            c = 1;
        }
        if (b == 0) {    // 2nd if
            d = 0;
        }
        if (c == d) {    // 3rd if
            return 1;
        } else {
            return 0;
        }
    }

[7 marks] i. Explain the branch prediction mechanism you would use to accurately predict the
3rd if statement. Your explanation could include a discussion of local history, global history
and PC indexing bits. Note: There are many other branch instructions in this program besides
those given in the code above.

    The 3rd if statement is taken if and only if the previous two if statements have different
    outcomes. We need global history to correlate the outcomes of B1 and B2 with the outcome
    of B3. For example, you could use 2 bits of global history and then use PC bits to index
    into private predictor tables to minimize aliasing with other branches.

    Values    Branch outcomes
    a   b     B1   B2   B3
    0   0     T    T    N
    0   1     T    N    T
    1   0     N    T    T
    1   1     N    N    N
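
A hedged sketch of such a predictor follows; the table size, the XOR hash and the update
policy are illustrative choices, not a prescribed design. A few bits of global history are
combined with PC bits to index a table of 2-bit saturating counters, so the counter consulted
for B3 depends on the outcomes of B1 and B2.

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define HIST_BITS  2
    #define TABLE_BITS 10
    #define TABLE_SIZE (1u << TABLE_BITS)

    static uint8_t  counters[TABLE_SIZE];   /* 2-bit counters, 0..3; >= 2 predicts taken */
    static uint32_t ghr;                    /* global history of recent branch outcomes  */

    static uint32_t index_of(uint32_t pc)
    {
        /* XOR a couple of global-history bits into the low PC bits (gshare style). */
        return ((pc >> 2) ^ (ghr & ((1u << HIST_BITS) - 1))) & (TABLE_SIZE - 1);
    }

    static bool predict(uint32_t pc)
    {
        return counters[index_of(pc)] >= 2;
    }

    static void update(uint32_t pc, bool taken)
    {
        uint8_t *c = &counters[index_of(pc)];
        if (taken  && *c < 3) (*c)++;
        if (!taken && *c > 0) (*c)--;
        ghr = (ghr << 1) | (taken ? 1u : 0u);   /* shift the outcome into global history */
    }

    int main(void)
    {
        /* Tiny usage demo: record B1 taken and B2 not taken, then ask about B3. */
        uint32_t b3_pc = 0x4000;
        update(0x1000, true);    /* B1 taken (a == 0)      */
        update(0x2000, false);   /* B2 not taken (b != 0)  */
        printf("predict B3 taken? %d\n", predict(b3_pc));
        update(b3_pc, true);
        return 0;
    }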

[4 marks] ii. How many times would you need to call my_func (with different values of a and b)
in order to fully train your predictor? Clearly state any assumptions you make to arrive at
your answer.

    There are four possible combinations for (a, b). Assuming there is no aliasing, it depends
    on the initial state of the 2-bit saturating counter. If you assume it starts from a
    weakly-taken position, you would need to see the input values once for the ones that
    result in a taken outcome, (a=0, b=1) and (a=1, b=0), and see the (a, b) combinations that
    result in a not-taken outcome twice (weakly taken -> weakly not taken -> strongly not
    taken). This is 6 calls in total to guarantee correct predictions. Multiple correct
    answers.

[4 marks] (b) Calculate the CPI for a 6-stage pipelined processor where the branch prediction
is verified in stage 4 and the branch target is calculated in stage 2 (there is no BTB). 25%
of instructions are branches. 40% of branches are taken. Your branch direction predictor has
an accuracy of 75%. 40% of correctly predicted branches are taken. There are no data hazards.

    Taken: 40%, 1 cycle to get the target (correct prediction), 3 cycles penalty (incorrect)
    Not taken: 60%, 0 cycles (correct prediction), 3 cycles penalty (incorrect)

    CPI = 1 + 0.25 x [0.4 x (0.75 x 1 + 0.25 x 3) + 0.6 x (0.75 x 0 + 0.25 x 3)]
        = 1 + 0.25 x [0.4 x 1.5 + 0.6 x 0.75]
        = 1 + 0.25 x 1.05
        = 1.2625
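
The same computation in code form, a worked-arithmetic sketch under the penalty assumptions
listed above (1 cycle to obtain a taken target even when the direction is predicted correctly,
3 cycles for any misprediction since the branch resolves in stage 4):

    #include <stdio.h>

    int main(void)
    {
        double branch_frac = 0.25, accuracy = 0.75;
        double taken_frac  = 0.40;   /* of branches, and of correctly predicted branches */

        double correct   = accuracy * (taken_frac * 1.0 + (1 - taken_frac) * 0.0);
        double incorrect = (1 - accuracy) * 3.0;
        double cpi = 1.0 + branch_frac * (correct + incorrect);
        printf("CPI = %.4f\n", cpi);   /* 1 + 0.25 * 1.05 = 1.2625 */
        return 0;
    }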
