University of Toronto Faculty of Applied Science and Engineering
Print: First Name: Solution    Last Name:    Student Number:

University of Toronto Faculty of Applied Science and Engineering
Final Examination, December 16, 2013
ECE552F Computer Architecture
Examiner: Natalie Enright Jerger

1. There are 6 questions and 12 pages. Do all questions. The total number of marks is 101. The duration of the test is 2.5 hours.
2. ALL WORK IS TO BE DONE ON THESE SHEETS! Use the back of the pages if you need more space. Be sure to indicate clearly if your work continues elsewhere.
3. Please put your final solution in the box if one is provided.
4. Clear and concise answers will be considered more favourably than ones that ramble. Do not fill space just because it exists!
5. You may use two 8.5x11 aid sheets.
6. You may use faculty approved non-programmable calculators.
7. Always give some explanations or reasoning for how you arrived at your solutions to help the marker understand your thinking.

Marks: 1 [18], 2 [25], 3 [18], 4 [16], 5 [9], 6 [15], Total [101]

Page 1 of 12
1. From Lab

[2 marks] (a) Multiple virtual networks are often used for cache coherence. Briefly explain what purpose these virtual networks serve. (Lab 6)

Multiple virtual networks prevent protocol-level deadlock by ensuring that different coherence message types do not block each other in the network or its queues. This prevents cyclic dependences from forming between in-flight coherence messages.

[5 marks] (b) Why are transient states needed in coherence protocols? Give one example of a transient state you implemented in Lab 6 and explain why this state was needed. (Lab 6)

1st part: Transient states are needed because transitions between protocol states are not atomic.
2nd part: Many valid answers.

[3 marks] (c) Give an example of a data structure where a stride prefetcher would work perfectly but a next-line prefetcher would fail (would not produce useful prefetches). (Lab 5)

Many possible answers. Consider:

    struct a {
        int x;
        int y[31];   // assumes 16 ints per line
    } array[N];

    for (int i = 0; i < N; i++) {
        array[i].x = i;
    }
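The struct example above can be checked with a small sketch (illustrative Python, not part of the exam; `LINE`, `STRIDE`, and `useful` are made-up names): with 64B lines and a 128B struct, a next-line prefetcher only ever fetches the unused `y` lines, while a stride prefetcher fetches the next `x` line.

```python
# Hypothetical sketch: count useful prefetches for next-line vs. stride
# prefetching on the access pattern of the loop over array[i].x.

LINE = 64         # assumed cache line size in bytes
STRIDE = 128      # sizeof(struct a) = 32 ints * 4 bytes

accesses = [i * STRIDE for i in range(64)]   # addresses of array[i].x

def useful(prefetches, demand):
    """Count prefetched lines that some demand access actually touches."""
    demand_lines = {a // LINE for a in demand}
    return sum(1 for p in prefetches if p // LINE in demand_lines)

# Next-line: on a miss to line L, prefetch line L+1 (always a y[] line here).
next_line = [a + LINE for a in accesses]
# Stride: prefetch addr + observed stride (the next array element's x).
stride = [a + STRIDE for a in accesses]

print(useful(next_line, accesses), useful(stride, accesses))
```

The next-line prefetcher scores zero useful prefetches; the stride prefetcher's prefetches are all useful except the one past the end of the array.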
[3 marks] (d) If two instructions compete for a resource in Tomasulo in the same cycle, which instruction would you choose to access the resource first? Why? (Lab 4)

The older instruction. This prevents starvation and is likely better for performance, since the older instruction may have dependent instructions waiting on it.

[5 marks] (e) Write a short microbenchmark that you would use to validate that you are correctly tracking load-to-use dependences in a 6-stage in-order pipeline. The stages of this pipeline are: Fetch, Decode, Execute1, Execute2, Memory, Writeback. Operands are needed at the start of Execute1 to compute the correct value. Correct syntax is not important for your microbenchmark, but it must be clear what your code is doing (use comments as needed). (Lab 1)

Many correct answers. Consider:

    LOOP: ADDI R1, 1 -> R1
          LW   [R3] -> R2
          ADD  R2, R3 -> R4   // 2-cycle stall, twice
          LW   [R3] -> R2
          ADD  R2, R3 -> R4
          LW   [R3] -> R2
          SUB  R5, R5 -> R5
          ADD  R2, R3 -> R4   // 1-cycle stall, once
          LW   [R3] -> R2
          SUB  R5, R5 -> R5
          SUB  R6, R6 -> R6
          ADD  R2, R3 -> R4   // no stall
          BNE  R1, R7, LOOP
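The stall counts in the microbenchmark's comments follow from the pipeline timing in the question (load value available at the end of Memory, consumer needs it at the start of Execute1). A minimal sketch of that model, with `load_use_stalls` as an illustrative name:

```python
# Sketch of the load-to-use stall model: a dependent instruction that sits
# 1, 2, or 3+ slots after the load stalls 2, 1, or 0 cycles respectively,
# assuming a forward from the end of Memory to the start of Execute1.

def load_use_stalls(distance):
    """distance = number of instructions between issue of the load and the
    first use, counting the use itself (1 = use immediately follows load)."""
    return max(0, 3 - distance)

print([load_use_stalls(d) for d in (1, 2, 3)])
```

This reproduces the comments in the microbenchmark: back-to-back LW/ADD stalls 2 cycles, one intervening instruction leaves 1 stall, two leave none.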
2. Multiprocessors

(a) Consider the following code executed on processors P1 and P2:

    Initially: A = B = 0

    P1:          P2:
    A = 1        Print B
    B = 1        Print A

[5 marks] Considering a sequentially consistent memory model, list all valid combinations that can be printed by P2. If certain combinations are not possible, provide a brief explanation as to why not.

Possible combinations (B, A): (0, 0), (0, 1), (1, 1)
Not possible: (1, 0). If B prints 1, then A must also print 1, since P1's update to B occurs after its update to A.

[7 marks] (b) Using load locked (LL) and store conditional (SC), write the assembly code to implement an atomic compare and swap: CAS Rx, Ry, X, where the value of Rx is first compared to the value of X and, if they are equal, the values in Ry and X are swapped. X is located in memory and the address of X is in R3.

    CAS:  LL   R1, [R3]       // R1 = X
          ADD  Ry, R0 -> R2   // R2 = Ry (preserved for the not-equal path)
          BNE  R1, Rx, exit   // no swap if X != Rx
          ADD  R1, R0 -> R2   // R2 = old X
          ADD  Ry, R0 -> R1   // R1 = Ry
    exit: SC   R1, [R3]       // attempt to store R1 back to X
          BEQZ R1, CAS        // retry if SC failed
          ADD  R2, R0 -> Ry   // Ry = old X (or unchanged Ry if no swap)
          return
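A functional sketch of the semantics the LL/SC sequence implements (illustrative Python, ignoring the retry loop, which only matters under contention):

```python
# CAS Rx, Ry, X: compare Rx to the memory word X; if equal, swap Ry and X.
# Ry always ends up holding whatever the assembly copied through R2.

def cas(mem, addr, rx, ry):
    old = mem[addr]          # LL R1, [R3]
    if old == rx:            # falls through BNE when equal
        mem[addr] = ry       # SC R1, [R3] with R1 = Ry
        return old           # new Ry = old X
    return ry                # no match: memory and Ry unchanged

mem = {0x100: 5}
ry = cas(mem, 0x100, 5, 9)   # match: X becomes 9, Ry receives old X
print(mem[0x100], ry)
```

On a match this prints the swapped values; on a mismatch both the memory word and Ry are left as they were, which is what the initial copy of Ry into R2 guarantees.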
(c) On the next page, you are given cache, memory and coherence state. This represents the initial state for each subpart of this question. Do NOT use your answer from part i in part ii or part iii; each subpart is independent.

This multiprocessor uses a directory coherence protocol and has 4 processors. Each processor has a direct-mapped cache with 2 sets; each set holds two words. The 4th bit in the address indicates the set. To simplify the format, the cache address tag contains the full address, and each word shows only two hex characters with the least significant word on the right. The directory coherence states are M (modified), S (shared) and U (uncached), while the cache has states M, S and I.

Each part of this question signifies a sequence of one or more CPU operations of the form:

    P#: op address [value]

where P# designates the CPU (P0, P1, P2, P3), op is the CPU operation (e.g. read or write), address denotes the memory address, and value indicates the new word to be assigned on a write operation.

What is the resulting state (coherence state, address tags and data) of the caches and memory (including directory state) after the given sequence of actions? Show only the blocks that change; for example, P0.Set0: (S, 0x110, AB 33) indicates that CPU P0's set 0 has the final state of S, address of 0x110 and data contents of 33 (address 0x110) and AB (address 0x114). Use a similar format to show changes in the directory. Also, what value is sent to the processor by a read operation? Write comments to help the marker understand your thinking.
Directory and Memory contents

    Address  State  Sharers    Data
    0x100    U                 EF 01
    0x108    S      1
    0x110    S      0, 1, 3    AB 33
    0x118    M
    0x120    S
    0x128    M
    0x130    U
    0x138    U

Cache Contents

    P0     State  Addr   Data      P1     State  Addr   Data
    Set 0  S      0x110  AB 33     Set 0  S      0x110  AB 33
    Set 1  M      0x               Set 1  S      0x

    P2     State  Addr   Data      P3     State  Addr   Data
    Set 0  S      0x                Set 0  S      0x110  AB 33
    Set 1  S      0x                Set 1  M      0x

[2 marks] i. P3: write 0x

    Dir: (S, 0x110, (0, 1), AB 33)
    Dir: (M, 0x130, 3, 78 11)
    P3.Set0: (M, 0x130, 78 84)

[6 marks] ii. P0: read 0x128; P1: read 0x128

    Dir: (U, 0x118, -, 34 04)
    P0.Set0: (S, 0x128, 03 02)
    Dir: (S, 0x108, 2, 10 20)
    Dir: (S, 0x128, (0, 1, 3), 03 02)
    P1.Set0: (S, 0x128, 03 02)
    Returns 02 to both P0 and P1.

[5 marks] iii. P1: write 0x; P1: read 0x110; P2: read 0x110

    Dir: (S, 0x110, (1, 2), AB 11)
    Dir: (U, 0x120, -, 56 22)
    P0.Set0: (I, 0x110, AB 33)
    P3.Set0: (I, 0x110, AB 33)
    P1.Set0: (S, 0x110, AB 11)
    P2.Set0: (S, 0x110, AB 11)
    Reads return 11 to both P1 and P2.
3. Dynamic Scheduling

[18 marks] (a) Assume that you have a single-issue processor that uses MIPS R10K dynamic scheduling with a re-order buffer, as discussed in lecture. There are 3 reservation stations (Int 1, Int 2, Int 3) for integer operations and 3 integer execution units. Integer units are capable of doing addition, subtraction and multiplication. There are 2 load reservation stations, 1 store reservation station and 1 CDB. The ROB is initially empty and has 32 entries. Addition and subtraction take 2 cycles and multiplication takes 6 cycles. They write the CDB in the cycle after execution is complete. A reservation station is available to a new instruction on the cycle after the instruction in the reservation station writes the CDB. If multiple instructions are ready to write the CDB in the same cycle, priority is given to the instruction dispatched earliest. Instructions waiting for operands can complete issue in the same cycle that the data appears on the CDB. Memory instructions (load and store) take 4 cycles to compute the address and access memory. The address calculation does not use the integer units. Memory instructions write the CDB in the cycle after they finish accessing memory.

Assume you have 10 physical registers. Initially R1-R6 are mapped to P1-P6, and P7 through P10 are free. A physical register can be reused by another instruction the cycle after it is freed. All reservation stations, execution units and ROB entries are free/available at the start of this code sequence. Consider the following code:

    LD   [R2+0] -> R1
    MULT R1, R2 -> R4
    LD   [R5] -> R6
    ADD  R6, R2 -> R6
    ST   R6 -> [R2+8]
    ADD  R3, R4 -> R1

Complete the following table for this code sequence. For each column, record the cycle at which the instruction completes this stage. Also fill in the old (T_old) and new (T) register mapping for each instruction. Write comments to help the marker; clearly indicate what you are doing.
    Instruction         D   S   X   C   R   T    T_old   Comment
    LD   [R2+0] -> R1                       P7   P1
    MULT R1, R2 -> R4                       P8   P4
    LD   [R5] -> R6                         P9   P6
    ADD  R6, R2 -> R6                       P10  P9
    ST   R6 -> [R2+8]                       -    -
    ADD  R3, R4 -> R1                       P1   P7
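The T / T_old columns follow mechanically from R10K-style renaming. A minimal sketch (illustrative Python; the timing assumption that P1 is back on the free list before the final ADD renames is mine, forced here by having only 10 physical registers):

```python
# Each destination register takes the next free physical register (T);
# T_old records the previous mapping, freed when the instruction commits.
# The store has no register destination, so it allocates nothing.

map_table = {f"R{i}": f"P{i}" for i in range(1, 7)}   # R1-R6 -> P1-P6
free_list = ["P7", "P8", "P9", "P10",
             "P1"]   # P1 (first LD's T_old) assumed freed by its commit

dests = ["R1", "R4", "R6", "R6", None, "R1"]   # destinations, in order
renames = []
for d in dests:
    if d is None:                         # ST writes no register
        renames.append((None, None))
        continue
    t = free_list.pop(0)                  # T: next free physical register
    renames.append((t, map_table[d]))     # (T, T_old)
    map_table[d] = t                      # update the map table

print(renames)
```

Running this reproduces the (T, T_old) pairs in the table, including the reuse of P1 by the last ADD.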
4. Caches

[4 marks] (a) You're given a benchmark and a cache simulator, but you cannot modify either. The simulator outputs the total number of cache misses in the benchmark. As inputs to the simulator, you can configure the cache size (4B to infinite), the line size (4B to 128B) and the associativity (1-way to fully). You are asked to estimate how many capacity misses there would be if you were to use a 4-way 16kB cache with 64B lines. Describe how you would use the simulator to do this.

A: a fully associative 16kB cache with 64B lines gives cold + capacity misses.
B: an infinite-size, fully associative cache with 64B lines gives cold misses only.
Capacity misses = A - B
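The A - B method can be sketched with a tiny fully associative LRU simulator (illustrative Python; the trace and the `misses` helper are made up for demonstration, not from the exam):

```python
# Run the same trace through a fully associative LRU cache twice: once at
# the target capacity (cold + capacity misses) and once effectively
# infinite (cold misses only); the difference is the capacity misses.
from collections import OrderedDict

def misses(trace, capacity_lines, line=64):
    cache, miss = OrderedDict(), 0
    for addr in trace:
        l = addr // line
        if l in cache:
            cache.move_to_end(l)          # refresh LRU position
        else:
            miss += 1
            cache[l] = True
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict least recently used
    return miss

trace = [(i % 512) * 64 for i in range(2048)]   # loops over 512 lines
A = misses(trace, 16 * 1024 // 64)   # 16kB fully associative
B = misses(trace, 10**9)             # effectively infinite
print(A, B, A - B)                   # capacity misses = A - B
```

On this trace the 256-line cache thrashes (every access misses), so A = 2048, the cold misses B = 512, and the capacity-miss estimate is 1536.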
[12 marks] (b) Consider the following trace of accesses to a set. Each letter represents a cache block:

    a b c a d e f f b e f e a b g c f e

Assume a fully associative cache that can hold 3 cache blocks. The cache is cold at the start of the trace. Fill in the table below to show the contents of the cache after the access indicated in that row for each of the three replacement policies: LRU, MRU (most recently used), Optimum. The first row has been filled in for you. What is the miss rate with each replacement policy: LRU, MRU and Optimum? Also indicate if the access is a miss (enter Y/N).

    Block    LRU contents  Miss?   MRU contents  Miss?   Optimum       Miss?
    a        a, -, -       Y       a, -, -       Y       a, -, -       Y
    b        a, b          Y       a, b          Y       a, b          Y
    c        a, b, c       Y       a, b, c       Y       a, b, c       Y
    a        a, b, c       N       a, b, c       N       a, b, c       N
    d        a, c, d       Y       b, c, d       Y       a, b, d      Y
    e        a, d, e       Y       b, c, e       Y       a, b, e      Y
    f        d, e, f       Y       b, c, f       Y       b, e, f      Y
    f        d, e, f       N       b, c, f       N       b, e, f      N
    b        e, f, b       Y       b, c, f       N       b, e, f      N
    e        e, f, b       N       c, f, e       Y       b, e, f      N
    f        e, f, b       N       c, f, e       N       b, e, f      N
    e        e, f, b       N       c, f, e       N       b, e, f      N
    a        e, f, a       Y       c, f, a       Y       b, f, a      Y
    b        e, a, b       Y       c, f, b       Y       b, f, a      N
    g        a, b, g       Y       c, f, g       Y       b, f, g      Y
    c        b, g, c       Y       c, f, g       N       f, g, c      Y
    f        g, c, f       Y       c, f, g       N       f, g, c      N
    e        c, f, e       Y       c, g, e       Y       f, c, e      Y

    Miss rate:  13/18 = 72.2%      11/18 = 61.1%         10/18 = 55.6%
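The miss counts in the table can be reproduced with a short simulation (illustrative Python; `simulate` is a made-up helper, and "OPT" uses Belady's rule of evicting the block reused furthest in the future):

```python
# Simulate LRU, MRU, and optimal replacement on the given trace with a
# 3-block fully associative cache; the list keeps most-recent at the end.

trace = list("abcadeffbefeabgcfe")

def simulate(policy):
    cache, miss = [], 0
    for i, blk in enumerate(trace):
        if blk in cache:
            cache.remove(blk); cache.append(blk)   # refresh recency
            continue
        miss += 1
        if len(cache) == 3:
            if policy == "LRU":
                victim = cache[0]                  # least recently used
            elif policy == "MRU":
                victim = cache[-1]                 # most recently used
            else:                                  # OPT (Belady)
                future = trace[i + 1:]
                victim = max(cache, key=lambda b: future.index(b)
                             if b in future else len(trace))
            cache.remove(victim)
        cache.append(blk)
    return miss

print([simulate(p) for p in ("LRU", "MRU", "OPT")])
```

This yields 13, 11, and 10 misses respectively, matching the miss rates in the table.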
5. Pipelining

(a) Consider a single-cycle CPU implementation. When the stages are split by functionality, the stages do not require exactly the same amount of time. The original machine had a clock cycle of 8 ns. After the stages were split, the measured times were: F (Fetch) 2.0 ns; D (Decode) 1.5 ns; E (Execute) 1.4 ns; M (Memory) 2.1 ns; W (Writeback) 1.0 ns. The total pipeline register delay is 0.2 ns.

[2 marks] i. What is the clock cycle time of the 5-stage pipelined machine?

Cycle time = 2.1 ns + 0.2 ns = 2.3 ns

[2 marks] ii. If you could split one of the 5 stages into two stages, which stage would you select and why?

The longest stage (Memory), because this would reduce the cycle time.

[2 marks] iii. What negative impact on performance might arise from splitting 1 stage into two stages?

The pipeline is deeper, so flushing the pipeline becomes more expensive; the cost of RAW hazards increases.

[3 marks] iv. If the pipelined machine had an infinite number of stages (the amount of work per stage can be divided into infinitely small chunks), what would its speedup be over the single-cycle machine (ignore any stall cycles)?

Speedup = 8 ns / 0.2 ns = 40
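The arithmetic behind parts i and iv can be written out directly (illustrative Python; variable names are mine):

```python
# Pipelined cycle time = longest stage + register overhead; with infinitely
# many stages, only the register overhead remains, bounding the speedup.

stages = {"F": 2.0, "D": 1.5, "E": 1.4, "M": 2.1, "W": 1.0}  # ns
overhead = 0.2                        # total pipeline register delay, ns
single_cycle = 8.0                    # original clock cycle, ns

cycle = max(stages.values()) + overhead
speedup_infinite = single_cycle / overhead
print(round(cycle, 2), speedup_infinite)
```

This confirms the 2.3 ns cycle time and the limiting speedup of 40.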
6. Control flow

(a) Consider the following code, where a and b can each have a value of 0 or 1. For this code, a branch is considered taken (T) if the code in the if clause would execute and not taken (N) otherwise.

    int my_func(int a, int b) {
        int c = 0;
        int d = 1;
        if (a == 0) {   // 1st if
            c = 1;
        }
        if (b == 0) {   // 2nd if
            d = 0;
        }
        if (c == d) {   // 3rd if
            return 1;
        } else {
            return 0;
        }
    }

[7 marks] i. Explain the branch prediction mechanism you would use to accurately predict the 3rd if statement. Your explanation could include a discussion of local history, global history and PC indexing bits. Note: there are many other branch instructions in this program besides those given in the code above.

The 3rd if statement is taken if and only if the previous two if statements have different outcomes. We need global history to correlate the outcomes of B1 and B2 with the outcome of B3. For example, you could use 2 bits of global history and then use PC bits to index into private predictor tables to minimize aliasing with other branches.

    Values    Branch outcomes
    a  b      B1  B2  B3
    0  0      T   T   N
    0  1      T   N   T
    1  0      N   T   T
    1  1      N   N   N
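The correlation argument can be demonstrated with a toy predictor (illustrative Python; `predict`, `train`, and `my_func_branches` are made-up names, and counters start weakly taken, matching the assumption in part ii below): a table of 2-bit counters indexed by the outcomes of the previous two branches predicts B3 perfectly once trained.

```python
# Two bits of global history (B1, B2) index a table of 2-bit saturating
# counters; B3 is taken iff B1 and B2 differ, so each history pattern maps
# to a single outcome and the counters converge.

counters = {}                        # (B1, B2) history -> 2-bit counter

def predict(hist):
    return counters.get(hist, 2) >= 2          # init weakly taken (2)

def train(hist, taken):
    c = counters.get(hist, 2)
    counters[hist] = min(3, c + 1) if taken else max(0, c - 1)

def my_func_branches(a, b):
    b1, b2 = (a == 0), (b == 0)                # 1st and 2nd if outcomes
    b3 = b1 != b2                              # 3rd if: taken iff they differ
    return (b1, b2), b3

for _ in range(2):                             # two training passes
    for a in (0, 1):
        for b in (0, 1):
            hist, b3 = my_func_branches(a, b)
            train(hist, b3)

ok = all(predict(my_func_branches(a, b)[0]) == my_func_branches(a, b)[1]
         for a in (0, 1) for b in (0, 1))
print(ok)
```

A purely local (per-branch) predictor cannot do this: B3's own history alternates with the inputs, so only global history makes it predictable.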
[4 marks] ii. How many times would you need to call my_func (with different values of a and b) in order to fully train your predictor? Clearly state any assumptions you make to arrive at your answer.

There are four possible combinations of (a, b). Assuming there is no aliasing, the answer depends on the initial state of the 2-bit saturating counters. If each counter starts weakly taken, the predictor needs to see each input combination that results in a taken outcome, (a=0, b=1) and (a=1, b=0), once, and each combination that results in a not-taken outcome twice (weakly taken -> weakly not taken -> strongly not taken). This is 6 calls in total to guarantee correct predictions. Multiple correct answers.

[4 marks] (b) Calculate the CPI for a 6-stage pipelined processor where the branch prediction is verified in stage 4 and the branch target is calculated in stage 2 (there is no BTB). 25% of instructions are branches. 40% of branches are taken. Your branch direction predictor has an accuracy of 75%. 40% of correctly predicted branches are taken. There are no data hazards.

Taken: 40%; 1 cycle to get target (correct), 3 cycles penalty (incorrect)
Not taken: 60%; 0 cycles (correct), 3 cycles penalty (incorrect)

CPI = 1 + 0.25 x [0.75 x (0.4 x 1 + 0.6 x 0) + 0.25 x 3] = 1 + 0.25 x 1.05 = 1.2625
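The CPI calculation spelled out as code (illustrative Python; variable names are mine, and the per-case penalties are the ones listed above):

```python
# CPI = base 1 + branch fraction * expected penalty per branch, where a
# correct taken prediction costs 1 cycle (target known after stage 2),
# a correct not-taken prediction costs 0, and a misprediction costs 3
# cycles (resolved in stage 4).

branch_frac = 0.25
acc = 0.75                  # direction predictor accuracy
taken_if_correct = 0.40     # of correctly predicted branches, 40% taken

penalty = (acc * (taken_if_correct * 1 + (1 - taken_if_correct) * 0)
           + (1 - acc) * 3)
cpi = 1 + branch_frac * penalty
print(round(cpi, 4))
```

This gives the 1.2625 CPI derived above.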
More informationA Cache Hierarchy in a Computer System
A Cache Hierarchy in a Computer System Ideally one would desire an indefinitely large memory capacity such that any particular... word would be immediately available... We are... forced to recognize the
More informationPage 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer
CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer Pipeline CPI http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson
More informationChapter 4 The Processor 1. Chapter 4D. The Processor
Chapter 4 The Processor 1 Chapter 4D The Processor Chapter 4 The Processor 2 Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline
More informationCS433 Midterm. Prof Josep Torrellas. October 19, Time: 1 hour + 15 minutes
CS433 Midterm Prof Josep Torrellas October 19, 2017 Time: 1 hour + 15 minutes Name: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 4 Questions. Please budget your time.
More informationCSE 490/590 Computer Architecture Homework 2
CSE 490/590 Computer Architecture Homework 2 1. Suppose that you have the following out-of-order datapath with 1-cycle ALU, 2-cycle Mem, 3-cycle Fadd, 5-cycle Fmul, no branch prediction, and in-order fetch
More informationEE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University
EE382A Lecture 7: Dynamic Scheduling Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 7-1 Announcements Project proposal due on Wed 10/14 2-3 pages submitted
More informationThe basic structure of a MIPS floating-point unit
Tomasulo s scheme The algorithm based on the idea of reservation station The reservation station fetches and buffers an operand as soon as it is available, eliminating the need to get the operand from
More informationMidterm Exam 1 Wednesday, March 12, 2008
Last (family) name: Solution First (given) name: Student I.D. #: Department of Electrical and Computer Engineering University of Wisconsin - Madison ECE/CS 752 Advanced Computer Architecture I Midterm
More informationCS 152 Computer Architecture and Engineering. Lecture 12 - Advanced Out-of-Order Superscalars
CS 152 Computer Architecture and Engineering Lecture 12 - Advanced Out-of-Order Superscalars Dr. George Michelogiannakis EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory
More informationCS Mid-Term Examination - Fall Solutions. Section A.
CS 211 - Mid-Term Examination - Fall 2008. Solutions Section A. Ques.1: 10 points For each of the questions, underline or circle the most suitable answer(s). The performance of a pipeline processor is
More informationECE/CS 757: Homework 1
ECE/CS 757: Homework 1 Cores and Multithreading 1. A CPU designer has to decide whether or not to add a new micoarchitecture enhancement to improve performance (ignoring power costs) of a block (coarse-grain)
More informationCS/CoE 1541 Mid Term Exam (Fall 2018).
CS/CoE 1541 Mid Term Exam (Fall 2018). Name: Question 1: (6+3+3+4+4=20 points) For this question, refer to the following pipeline architecture. a) Consider the execution of the following code (5 instructions)
More informationLecture 8: Branch Prediction, Dynamic ILP. Topics: static speculation and branch prediction (Sections )
Lecture 8: Branch Prediction, Dynamic ILP Topics: static speculation and branch prediction (Sections 2.3-2.6) 1 Correlating Predictors Basic branch prediction: maintain a 2-bit saturating counter for each
More informationMemory Hierarchies 2009 DAT105
Memory Hierarchies Cache performance issues (5.1) Virtual memory (C.4) Cache performance improvement techniques (5.2) Hit-time improvement techniques Miss-rate improvement techniques Miss-penalty improvement
More informationInstruction Frequency CPI. Load-store 55% 5. Arithmetic 30% 4. Branch 15% 4
PROBLEM 1: An application running on a 1GHz pipelined processor has the following instruction mix: Instruction Frequency CPI Load-store 55% 5 Arithmetic 30% 4 Branch 15% 4 a) Determine the overall CPI
More informationReferences EE457. Out of Order (OoO) Execution. Instruction Scheduling (Re-ordering of instructions)
EE457 Out of Order (OoO) Execution Introduction to Dynamic Scheduling of Instructions (The Tomasulo Algorithm) By Gandhi Puvvada References EE557 Textbook Prof Dubois EE557 Classnotes Prof Annavaram s
More informationComputer Architecture Spring 2016
Computer Architecture Spring 2016 Final Review Shuai Wang Department of Computer Science and Technology Nanjing University Computer Architecture Computer architecture, like other architecture, is the art
More informationCS232 Final Exam May 5, 2001
CS232 Final Exam May 5, 2 Name: This exam has 4 pages, including this cover. There are six questions, worth a total of 5 points. You have 3 hours. Budget your time! Write clearly and show your work. State
More informationAdvanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017
Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation
More informationCourse on Advanced Computer Architectures
Surname (Cognome) Name (Nome) POLIMI ID Number Signature (Firma) SOLUTION Politecnico di Milano, July 9, 2018 Course on Advanced Computer Architectures Prof. D. Sciuto, Prof. C. Silvano EX1 EX2 EX3 Q1
More informationLecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections )
Lecture 9: More ILP Today: limits of ILP, case studies, boosting ILP (Sections 3.8-3.14) 1 ILP Limits The perfect processor: Infinite registers (no WAW or WAR hazards) Perfect branch direction and target
More informationCS146 Computer Architecture. Fall Midterm Exam
CS146 Computer Architecture Fall 2002 Midterm Exam This exam is worth a total of 100 points. Note the point breakdown below and budget your time wisely. To maximize partial credit, show your work and state
More informationComputer Architecture A Quantitative Approach, Fifth Edition. Chapter 3. Instruction-Level Parallelism and Its Exploitation
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation Introduction Pipelining become universal technique in 1985 Overlaps execution of
More information3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?
CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationInstruction-Level Parallelism Dynamic Branch Prediction. Reducing Branch Penalties
Instruction-Level Parallelism Dynamic Branch Prediction CS448 1 Reducing Branch Penalties Last chapter static schemes Move branch calculation earlier in pipeline Static branch prediction Always taken,
More informationEECS 470 Midterm Exam Winter 2015
EECS 470 Midterm Exam Winter 2015 Name: unique name: Sign the honor code: I have neither given nor received aid on this exam nor observed anyone else doing so. Scores: Page # Points 2 /20 3 /15 4 /9 5
More informationPlease state clearly any assumptions you make in solving the following problems.
Computer Architecture Homework 3 2012-2013 Please state clearly any assumptions you make in solving the following problems. 1 Processors Write a short report on at least five processors from at least three
More informationCS433 Midterm. Prof Josep Torrellas. October 16, Time: 1 hour + 15 minutes
CS433 Midterm Prof Josep Torrellas October 16, 2014 Time: 1 hour + 15 minutes Name: Alias: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 4 Questions. Please budget your
More informationCSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1
CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level
More information