University of Toronto Faculty of Applied Science and Engineering

Size: px
Start display at page:

Download "University of Toronto Faculty of Applied Science and Engineering"

Transcription

1 Print: First Name: SOLUTION Last Name: Student Number: University of Toronto Faculty of Applied Science and Engineering Midterm Examination November 3, 2014 ECE552F Computer Architecture Examiner Natalie Enright Jerger 1. There are 4 questions and 8 pages. Do all questions. The total number of marks is 48. The duration of the test is 50 minutes. 2. ALL WORK IS TO BE DONE ON THESE SHEETS! Use the back of the pages if you need more space. Be sure to indicate clearly if your work continues elsewhere. 3. Please put your final solution in the box if one is provided. 4. Clear and concise answers will be considered more favourably than ones that ramble. Do not fill space just because it exists! 5. You may use a single 8.5x11 aid sheet. 6. You may use faculty approved non-programmable calculators. 7. Always give some explanations or reasoning for how you arrived at your solutions to help the marker understand your thinking. 8. State your assumptions. Show your work. Use your time wisely as not all questions will require the same amount of time. If you think that assumptions must be made to answer a question, state them clearly. If there are multiple possibilities, comment that there are, explain why and then provide at least one possible answer and state the corresponding assumptions. 9. Only exams written in pen can be considered for remarking. Page 1 of 8

2 This page is for grading purposes only. The marks breakdown is given for each question. 1 [11] 2 [19] 3 [8] 4 [10] Total [48] Page 2 of 8

3 1. Pipelining [3 marks] (a) Branches represent 30% of dynamic instructions. Branches are statically predicted not-taken. 40% of branches are taken. Loads make up 30% of dynamic instructions. 65% of loads are followed immediately by a dependent ALU instruction in the dynamic instruction sequence. Consider a 4-stage pipeline where the Execute and Memory Access stages have been combined into 1 stage (called XM). The branch outcome is known at the end of the XM stage. Full forwarding exists in this pipeline. Calculate the CPI for this pipeline implementation. CP I = CPI = 1.24 Page 3 of 8

4 (b) This part of the question assumes the typical 5-stage in-order pipeline used in class (F, D, X, M, W). The table below gives the fraction of instructions that have a particular type of RAW data dependence. The type of RAW data dependence is identified by the stage that produces the result (X or M) and the instruction that consumes the result (1 st instruction that follows the one that produces the result, 2 nd instruction that follows, or both). Assume that the register write is done in the first half of the clock cycle and the register read is done in the second half of the cycle. X to 1 st M to 1 st X to 2 nd M to 2 nd X to 1 st Other RAW Only Only Only Only and M to 2 nd Dependences 5% 20% 5% 10% 10% 0% [8 marks] Let us assume that we cannot afford to have three-input muxes that are needed for full forwarding. We have to decide if it is better to implement MX forwarding or WX forwarding. Which of the two options results in fewer data stalls cycles? You must show your work to justify your answer. An answer without any justification will not receive full marks. WX forwarding is better. WX forwarding leads to 0.5 stall cycles while MX forwarding leads to 0.65 stall cycles. To receive full marks, answer should enumerate stall conditions and/or do calculations on stall cycles MX forwarding WX Forwarding Case 1: F D X M W F D X M W F D X M W F D d* X M W Case 2: F D X M W F D X M W F d* d* D X M W F D d* X M W Case 3: F D X M W F D X M W F D X M W F D X M W F d* D X M W F D X M W Case 4: F D X M W F D X M W F D X M W F D X M W F d* D X M W F D X M W Case 5: F D X M W F D X M W F D X M W F D d* X M W F d* D X M W F p* D X M W 0 * * * * * * * * * 0.1 =.35 1 * 0.1 = 0.65 Page 4 of 8

5 2. Dynamic Scheduling [16 marks] (a) Consider a MIPSR10K processor with the following execution latencies and one reservation station/functional unit for each type of instruction: 5 cycles for an add 12 cycles for a multiply 20 cycles for a divide Suppose the segment of the program with the four instructions is i1: DIV R3, R5 -> R2 i2: ADD R1, R4 -> R3 i3: MULT R2, R6 -> R4 i4: ADD R2, R3 -> R4 Considering the MIPSR10K implementation, if i1 is issued at cycle 0, in what cycle will each instruction complete and retire? The ROB size is infinite, as are the number of physical registers. Assumptions: Multiple instructions can issue in the same cycle (i3, i4). Instructions can issue in same cycle as dependent instruction broadcasts tag on CDB. Complete Retire i i i i [3 marks] (b) Explain how WAW hazards are avoided in Tomasulo s algorithm. WAW hazards are avoided through register map table. If the ID in the map table does not match the ID of the instruction writing the CDB, then that instruction does not update the register file. Answer must be more detailed than register renaming Page 5 of 8

6 [8 marks] 3. Consider two possible improvements to a processor design. The first improvement can speed up floating point arithmetic instructions by a factor of 8. The second improvement can speed up load and store instructions by a factor of 3. Let F fp and F ls be the fraction of execution time spent on floating point and load/store instructions respectively. The executions of these two sets of instructions are non-overlapping in time. What should the relation be between the fractions F fp and F ls such that a machine built with the first improvement outperforms a machine built with the second improvement. 1 1 F fp + F fp 8 f fp f ls > > 1 1 F ls + F ls 3 Page 6 of 8

7 4. Branch Prediction Consider the following code sequence. Assume that each instruction is encoded in one 32-bit word. Address Instruction 0x0038 L3:... 0x003C... 0x0040 SUBI R1, 2 -> R3 0x0044 BNEZ R3, L1 0x0048 ADD R0, R0 -> R1 0x004C L1: SUBI R2, 2 -> R3 0x0050 BNEZ R3, L2 0x0054 ADD R0, R0 -> R2 0x0058 L2: SUB R1, R2 -> R3 0x005c BEQZ R3, L3 [4 marks] (a) Show the contents of a 4-entry branch target buffer (BTB) after one execution of the code starting at PC = 0x0040. Assume the BTB is initially empty. Discard the lowest-order PC bits that never change and use the next set of bits to index into the BTB. Assume the initial register values are such that every branch is taken. Each entry can hold 16 bits of information. Entry # (index) 0 Contents 0x x004C 2 3 0x0038 Page 7 of 8

8 [6 marks] (b) Consider the case where we use a global branch direction predictor with a 3-bit global history register. Execution of the 13th iteration of the code on the previous page is about to start. Provide an example of i) the value of feasible 3-bit global branch history, and ii) the value of an infeasible global branch history. To receive marks, you must justify your answers. Simply writing some combination of N and T values without any explanation is not sufficient Other answers are possible provided correct/sufficient justification is given i. Feasible 3-bit global branch history Assume R1 initially has the value 2 and R2 initially has the value 2. The first two branches will be not taken and the third branch will be taken leading to a feasible global history of NNT ii. Infeasible global branch history From the previous answer, we can see that if both of the first two branches are not taken, the third branch can never be not taken so an infeasible history is: NNN. Page 8 of 8

University of Toronto Faculty of Applied Science and Engineering

University of Toronto Faculty of Applied Science and Engineering Print: First Name:............ Solutions............ Last Name:............................. Student Number:............................................... University of Toronto Faculty of Applied Science

More information

University of Toronto Faculty of Applied Science and Engineering

University of Toronto Faculty of Applied Science and Engineering Print: First Name:............ Solutions............ Last Name:............................. Student Number:............................................... University of Toronto Faculty of Applied Science

More information

University of Toronto Faculty of Applied Science and Engineering

University of Toronto Faculty of Applied Science and Engineering Print: First Name:............ Solutions............ Last Name:............................. Student Number:............................................... University of Toronto Faculty of Applied Science

More information

University of Toronto Faculty of Applied Science and Engineering

University of Toronto Faculty of Applied Science and Engineering Print: First Name:Solution...................... Last Name:............................. Student Number:............................................... University of Toronto Faculty of Applied Science

More information

CS433 Homework 2 (Chapter 3)

CS433 Homework 2 (Chapter 3) CS433 Homework 2 (Chapter 3) Assigned on 9/19/2017 Due in class on 10/5/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies

More information

CS433 Homework 2 (Chapter 3)

CS433 Homework 2 (Chapter 3) CS Homework 2 (Chapter ) Assigned on 9/19/2017 Due in class on 10/5/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration..

More information

Advanced Computer Architecture CMSC 611 Homework 3. Due in class Oct 17 th, 2012

Advanced Computer Architecture CMSC 611 Homework 3. Due in class Oct 17 th, 2012 Advanced Computer Architecture CMSC 611 Homework 3 Due in class Oct 17 th, 2012 (Show your work to receive partial credit) 1) For the following code snippet list the data dependencies and rewrite the code

More information

For this problem, consider the following architecture specifications: Functional Unit Type Cycles in EX Number of Functional Units

For this problem, consider the following architecture specifications: Functional Unit Type Cycles in EX Number of Functional Units CS333: Computer Architecture Spring 006 Homework 3 Total Points: 49 Points (undergrad), 57 Points (graduate) Due Date: Feb. 8, 006 by 1:30 pm (See course information handout for more details on late submissions)

More information

EE557--FALL 2001 MIDTERM 2. Open books

EE557--FALL 2001 MIDTERM 2. Open books NAME: SOLUTIONS STUDENT NUMBER: EE557--FALL 2001 MIDTERM 2 Open books Q1: /16 Q2: /12 Q3: /8 Q4: /8 Q5: /8 Q6: /8 TOTAL: /60 Grade: /25 1 QUESTION 1(Tomasulo with ROB) 16 points Consider the following

More information

Instruction Frequency CPI. Load-store 55% 5. Arithmetic 30% 4. Branch 15% 4

Instruction Frequency CPI. Load-store 55% 5. Arithmetic 30% 4. Branch 15% 4 PROBLEM 1: An application running on a 1GHz pipelined processor has the following instruction mix: Instruction Frequency CPI Load-store 55% 5 Arithmetic 30% 4 Branch 15% 4 a) Determine the overall CPI

More information

Instruction-Level Parallelism and Its Exploitation

Instruction-Level Parallelism and Its Exploitation Chapter 2 Instruction-Level Parallelism and Its Exploitation 1 Overview Instruction level parallelism Dynamic Scheduling Techniques es Scoreboarding Tomasulo s s Algorithm Reducing Branch Cost with Dynamic

More information

CS433 Midterm. Prof Josep Torrellas. October 19, Time: 1 hour + 15 minutes

CS433 Midterm. Prof Josep Torrellas. October 19, Time: 1 hour + 15 minutes CS433 Midterm Prof Josep Torrellas October 19, 2017 Time: 1 hour + 15 minutes Name: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 4 Questions. Please budget your time.

More information

Hardware-Based Speculation

Hardware-Based Speculation Hardware-Based Speculation Execute instructions along predicted execution paths but only commit the results if prediction was correct Instruction commit: allowing an instruction to update the register

More information

Hardware-based Speculation

Hardware-based Speculation Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions

More information

CMSC411 Fall 2013 Midterm 2 Solutions

CMSC411 Fall 2013 Midterm 2 Solutions CMSC411 Fall 2013 Midterm 2 Solutions 1. (12 pts) Memory hierarchy a. (6 pts) Suppose we have a virtual memory of size 64 GB, or 2 36 bytes, where pages are 16 KB (2 14 bytes) each, and the machine has

More information

Static vs. Dynamic Scheduling

Static vs. Dynamic Scheduling Static vs. Dynamic Scheduling Dynamic Scheduling Fast Requires complex hardware More power consumption May result in a slower clock Static Scheduling Done in S/W (compiler) Maybe not as fast Simpler processor

More information

Computer Architecture Homework Set # 3 COVER SHEET Please turn in with your own solution

Computer Architecture Homework Set # 3 COVER SHEET Please turn in with your own solution CSCE 6 (Fall 07) Computer Architecture Homework Set # COVER SHEET Please turn in with your own solution Eun Jung Kim Write your answers on the sheets provided. Submit with the COVER SHEET. If you need

More information

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25 CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25 http://inst.eecs.berkeley.edu/~cs152/sp08 The problem

More information

Lecture 8: Instruction Fetch, ILP Limits. Today: advanced branch prediction, limits of ILP (Sections , )

Lecture 8: Instruction Fetch, ILP Limits. Today: advanced branch prediction, limits of ILP (Sections , ) Lecture 8: Instruction Fetch, ILP Limits Today: advanced branch prediction, limits of ILP (Sections 3.4-3.5, 3.8-3.14) 1 1-Bit Prediction For each branch, keep track of what happened last time and use

More information

Good luck and have fun!

Good luck and have fun! Midterm Exam October 13, 2014 Name: Problem 1 2 3 4 total Points Exam rules: Time: 90 minutes. Individual test: No team work! Open book, open notes. No electronic devices, except an unprogrammed calculator.

More information

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31 4.16 Exercises 419 Exercise 4.11 In this exercise we examine in detail how an instruction is executed in a single-cycle datapath. Problems in this exercise refer to a clock cycle in which the processor

More information

Dynamic Scheduling. Better than static scheduling Scoreboarding: Tomasulo algorithm:

Dynamic Scheduling. Better than static scheduling Scoreboarding: Tomasulo algorithm: LECTURE - 13 Dynamic Scheduling Better than static scheduling Scoreboarding: Used by the CDC 6600 Useful only within basic block WAW and WAR stalls Tomasulo algorithm: Used in IBM 360/91 for the FP unit

More information

CS433 Midterm. Prof Josep Torrellas. October 16, Time: 1 hour + 15 minutes

CS433 Midterm. Prof Josep Torrellas. October 16, Time: 1 hour + 15 minutes CS433 Midterm Prof Josep Torrellas October 16, 2014 Time: 1 hour + 15 minutes Name: Alias: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 4 Questions. Please budget your

More information

EECS 470 Midterm Exam Answer Key Fall 2004

EECS 470 Midterm Exam Answer Key Fall 2004 EECS 470 Midterm Exam Answer Key Fall 2004 Name: unique name: Sign the honor code: I have neither given nor received aid on this exam nor observed anyone else doing so. Scores: # Points Part I /23 Part

More information

Website for Students VTU NOTES QUESTION PAPERS NEWS RESULTS

Website for Students VTU NOTES QUESTION PAPERS NEWS RESULTS Advanced Computer Architecture- 06CS81 Hardware Based Speculation Tomasulu algorithm and Reorder Buffer Tomasulu idea: 1. Have reservation stations where register renaming is possible 2. Results are directly

More information

CENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs.

CENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs. Exam 2 April 12, 2012 You have 80 minutes to complete the exam. Please write your answers clearly and legibly on this exam paper. GRADE: Name. Class ID. 1. (22 pts) Circle the selected answer for T/F and

More information

Processor: Superscalars Dynamic Scheduling

Processor: Superscalars Dynamic Scheduling Processor: Superscalars Dynamic Scheduling Z. Jerry Shi Assistant Professor of Computer Science and Engineering University of Connecticut * Slides adapted from Blumrich&Gschwind/ELE475 03, Peh/ELE475 (Princeton),

More information

CSE 490/590 Computer Architecture Homework 2

CSE 490/590 Computer Architecture Homework 2 CSE 490/590 Computer Architecture Homework 2 1. Suppose that you have the following out-of-order datapath with 1-cycle ALU, 2-cycle Mem, 3-cycle Fadd, 5-cycle Fmul, no branch prediction, and in-order fetch

More information

The basic structure of a MIPS floating-point unit

The basic structure of a MIPS floating-point unit Tomasulo s scheme The algorithm based on the idea of reservation station The reservation station fetches and buffers an operand as soon as it is available, eliminating the need to get the operand from

More information

6.823 Computer System Architecture

6.823 Computer System Architecture 6.823 Computer System Architecture Problem Set #4 Spring 2002 Students are encouraged to collaborate in groups of up to 3 people. A group needs to hand in only one copy of the solution to a problem set.

More information

Tomasulo Loop Example

Tomasulo Loop Example Tomasulo Loop Example Loop: LD F0 0 R1 MULTD F4 F0 F2 SD F4 0 R1 SUBI R1 R1 #8 BNEZ R1 Loop This time assume Multiply takes 4 clocks Assume 1st load takes 8 clocks, 2nd load takes 1 clock Clocks for SUBI,

More information

CS152 Computer Architecture and Engineering. Complex Pipelines

CS152 Computer Architecture and Engineering. Complex Pipelines CS152 Computer Architecture and Engineering Complex Pipelines Assigned March 6 Problem Set #3 Due March 20 http://inst.eecs.berkeley.edu/~cs152/sp12 The problem sets are intended to help you learn the

More information

Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.

Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2. Instruction-Level Parallelism and its Exploitation: PART 2 Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.8)

More information

4.1.3 [10] < 4.3>Which resources (blocks) produce no output for this instruction? Which resources produce output that is not used?

4.1.3 [10] < 4.3>Which resources (blocks) produce no output for this instruction? Which resources produce output that is not used? 2.10 [20] < 2.2, 2.5> For each LEGv8 instruction in Exercise 2.9 (copied below), show the value of the opcode (Op), source register (Rn), and target register (Rd or Rt) fields. For the I-type instructions,

More information

Floating Point/Multicycle Pipelining in DLX

Floating Point/Multicycle Pipelining in DLX Floating Point/Multicycle Pipelining in DLX Completion of DLX EX stage floating point arithmetic operations in one or two cycles is impractical since it requires: A much longer CPU clock cycle, and/or

More information

Metodologie di Progettazione Hardware-Software

Metodologie di Progettazione Hardware-Software Metodologie di Progettazione Hardware-Software Advanced Pipelining and Instruction-Level Paralelism Metodologie di Progettazione Hardware/Software LS Ing. Informatica 1 ILP Instruction-level Parallelism

More information

CPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation

CPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenković, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Tomasulo

More information

CS 2410 Mid term (fall 2018)

CS 2410 Mid term (fall 2018) CS 2410 Mid term (fall 2018) Name: Question 1 (6+6+3=15 points): Consider two machines, the first being a 5-stage operating at 1ns clock and the second is a 12-stage operating at 0.7ns clock. Due to data

More information

Page 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls

Page 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring

More information

Lecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections )

Lecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections ) Lecture 9: More ILP Today: limits of ILP, case studies, boosting ILP (Sections 3.8-3.14) 1 ILP Limits The perfect processor: Infinite registers (no WAW or WAR hazards) Perfect branch direction and target

More information

Page # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer

Page # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture,

More information

EECS 470 Midterm Exam Winter 2008 answers

EECS 470 Midterm Exam Winter 2008 answers EECS 470 Midterm Exam Winter 2008 answers Name: KEY unique name: KEY Sign the honor code: I have neither given nor received aid on this exam nor observed anyone else doing so. Scores: #Page Points 2 /10

More information

Instruction Level Parallelism

Instruction Level Parallelism Instruction Level Parallelism The potential overlap among instruction execution is called Instruction Level Parallelism (ILP) since instructions can be executed in parallel. There are mainly two approaches

More information

DYNAMIC AND SPECULATIVE INSTRUCTION SCHEDULING

DYNAMIC AND SPECULATIVE INSTRUCTION SCHEDULING DYNAMIC AND SPECULATIVE INSTRUCTION SCHEDULING Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 3, John L. Hennessy and David A. Patterson,

More information

ELEC 5200/6200 Computer Architecture and Design Fall 2016 Lecture 9: Instruction Level Parallelism

ELEC 5200/6200 Computer Architecture and Design Fall 2016 Lecture 9: Instruction Level Parallelism ELEC 5200/6200 Computer Architecture and Design Fall 2016 Lecture 9: Instruction Level Parallelism Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University,

More information

Hardware-Based Speculation

Hardware-Based Speculation Hardware-Based Speculation Execute instructions along predicted execution paths but only commit the results if prediction was correct Instruction commit: allowing an instruction to update the register

More information

CS425 Computer Systems Architecture

CS425 Computer Systems Architecture CS425 Computer Systems Architecture Fall 2018 Static Instruction Scheduling 1 Techniques to reduce stalls CPI = Ideal CPI + Structural stalls per instruction + RAW stalls per instruction + WAR stalls per

More information

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 3. Instruction-Level Parallelism and Its Exploitation

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 3. Instruction-Level Parallelism and Its Exploitation Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation Introduction Pipelining become universal technique in 1985 Overlaps execution of

More information

CS Mid-Term Examination - Fall Solutions. Section A.

CS Mid-Term Examination - Fall Solutions. Section A. CS 211 - Mid-Term Examination - Fall 2008. Solutions Section A. Ques.1: 10 points For each of the questions, underline or circle the most suitable answer(s). The performance of a pipeline processor is

More information

CISC 662 Graduate Computer Architecture. Lecture 10 - ILP 3

CISC 662 Graduate Computer Architecture. Lecture 10 - ILP 3 CISC 662 Graduate Computer Architecture Lecture 10 - ILP 3 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

CS146 Computer Architecture. Fall Midterm Exam

CS146 Computer Architecture. Fall Midterm Exam CS146 Computer Architecture Fall 2002 Midterm Exam This exam is worth a total of 100 points. Note the point breakdown below and budget your time wisely. To maximize partial credit, show your work and state

More information

/ : Computer Architecture and Design Fall Midterm Exam October 16, Name: ID #:

/ : Computer Architecture and Design Fall Midterm Exam October 16, Name: ID #: 16.482 / 16.561: Computer Architecture and Design Fall 2014 Midterm Exam October 16, 2014 Name: ID #: For this exam, you may use a calculator and two 8.5 x 11 double-sided page of notes. All other electronic

More information

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1)

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) ILP vs. Parallel Computers Dynamic Scheduling (Section 3.4, 3.5) Dynamic Branch Prediction (Section 3.3) Hardware Speculation and Precise

More information

CPE 631 Lecture 11: Instruction Level Parallelism and Its Dynamic Exploitation

CPE 631 Lecture 11: Instruction Level Parallelism and Its Dynamic Exploitation Lecture 11: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Instruction

More information

Four Steps of Speculative Tomasulo cycle 0

Four Steps of Speculative Tomasulo cycle 0 HW support for More ILP Hardware Speculative Execution Speculation: allow an instruction to issue that is dependent on branch, without any consequences (including exceptions) if branch is predicted incorrectly

More information

Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls

Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring

More information

CS433 Final Exam. Prof Josep Torrellas. December 12, Time: 2 hours

CS433 Final Exam. Prof Josep Torrellas. December 12, Time: 2 hours CS433 Final Exam Prof Josep Torrellas December 12, 2006 Time: 2 hours Name: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 6 Questions. Please budget your time. 3. Calculators

More information

Alexandria University

Alexandria University Alexandria University Faculty of Engineering Computer and Communications Department CC322: CC423: Advanced Computer Architecture Sheet 3: Instruction- Level Parallelism and Its Exploitation 1. What would

More information

CMSC 611: Advanced Computer Architecture

CMSC 611: Advanced Computer Architecture CMSC 611: Advanced Computer Architecture Instruction Level Parallelism Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson /

More information

CS252 Graduate Computer Architecture Midterm 1 Solutions

CS252 Graduate Computer Architecture Midterm 1 Solutions CS252 Graduate Computer Architecture Midterm 1 Solutions Part A: Branch Prediction (22 Points) Consider a fetch pipeline based on the UltraSparc-III processor (as seen in Lecture 5). In this part, we evaluate

More information

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

5008: Computer Architecture

5008: Computer Architecture 5008: Computer Architecture Chapter 2 Instruction-Level Parallelism and Its Exploitation CA Lecture05 - ILP (cwliu@twins.ee.nctu.edu.tw) 05-1 Review from Last Lecture Instruction Level Parallelism Leverage

More information

Load1 no Load2 no Add1 Y Sub Reg[F2] Reg[F6] Add2 Y Add Reg[F2] Add1 Add3 no Mult1 Y Mul Reg[F2] Reg[F4] Mult2 Y Div Reg[F6] Mult1

Load1 no Load2 no Add1 Y Sub Reg[F2] Reg[F6] Add2 Y Add Reg[F2] Add1 Add3 no Mult1 Y Mul Reg[F2] Reg[F4] Mult2 Y Div Reg[F6] Mult1 Instruction Issue Execute Write result L.D F6, 34(R2) L.D F2, 45(R3) MUL.D F0, F2, F4 SUB.D F8, F2, F6 DIV.D F10, F0, F6 ADD.D F6, F8, F2 Name Busy Op Vj Vk Qj Qk A Load1 no Load2 no Add1 Y Sub Reg[F2]

More information

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5)

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Instruction-Level Parallelism and its Exploitation: PART 1 ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Project and Case

More information

DAT105: Computer Architecture Study Period 2, 2009 Exercise 3 Chapter 2: Instruction-Level Parallelism and Its Exploitation

DAT105: Computer Architecture Study Period 2, 2009 Exercise 3 Chapter 2: Instruction-Level Parallelism and Its Exploitation Study Period 2, 2009 Exercise 3 Chapter 2: Instruction-Level Parallelism and Its Exploitation Mafijul Islam Department of Computer Science and Engineering November 19, 2009 Study Period 2, 2009 Goals:

More information

/ : Computer Architecture and Design Fall 2014 Midterm Exam Solution

/ : Computer Architecture and Design Fall 2014 Midterm Exam Solution 16.482 / 16.561: Computer Architecture and Design Fall 2014 Midterm Exam Solution 1. (8 points) UEvaluating instructions Assume the following initial state prior to executing the instructions below. Note

More information

Chapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST

Chapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST Chapter 4. Advanced Pipelining and Instruction-Level Parallelism In-Cheol Park Dept. of EE, KAIST Instruction-level parallelism Loop unrolling Dependence Data/ name / control dependence Loop level parallelism

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation 1 Branch Prediction Basic 2-bit predictor: For each branch: Predict taken or not

More information

Reorder Buffer Implementation (Pentium Pro) Reorder Buffer Implementation (Pentium Pro)

Reorder Buffer Implementation (Pentium Pro) Reorder Buffer Implementation (Pentium Pro) Reorder Buffer Implementation (Pentium Pro) Hardware data structures retirement register file (RRF) (~ IBM 360/91 physical registers) physical register file that is the same size as the architectural registers

More information

Exploiting ILP with SW Approaches. Aleksandar Milenković, Electrical and Computer Engineering University of Alabama in Huntsville

Exploiting ILP with SW Approaches. Aleksandar Milenković, Electrical and Computer Engineering University of Alabama in Huntsville Lecture : Exploiting ILP with SW Approaches Aleksandar Milenković, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Basic Pipeline Scheduling and Loop

More information

Final Exam Fall 2007

Final Exam Fall 2007 ICS 233 - Computer Architecture & Assembly Language Final Exam Fall 2007 Wednesday, January 23, 2007 7:30 am 10:00 am Computer Engineering Department College of Computer Sciences & Engineering King Fahd

More information

Advanced Computer Architecture. Chapter 4: More sophisticated CPU architectures

Advanced Computer Architecture. Chapter 4: More sophisticated CPU architectures Advanced Computer Architecture Chapter 4: More sophisticated CPU architectures Lecturer: Paul H J Kelly Autumn 2001 Department of Computing Imperial College Room 423 email: phjk@doc.ic.ac.uk Course web

More information

RECAP. B649 Parallel Architectures and Programming

RECAP. B649 Parallel Architectures and Programming RECAP B649 Parallel Architectures and Programming RECAP 2 Recap ILP Exploiting ILP Dynamic scheduling Thread-level Parallelism Memory Hierarchy Other topics through student presentations Virtual Machines

More information

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1 CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level

More information

ECSE 425 Lecture 15: More Dynamic Scheduling

ECSE 425 Lecture 15: More Dynamic Scheduling ECSE 425 Lecture 15: More Dynamic Scheduling H&P Chapter 2 2011 PaBerson, Gross, Hayward, Arbel, Vu, Meyer Textbook figures 2007 Elsevier Science AdministraPve Notes Midterm 1 50 minutes, in class, October

More information

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 15: Instruction Level Parallelism and Dynamic Execution March 11, 2002 Prof. David E. Culler Computer Science 252 Spring 2002

More information

Course on Advanced Computer Architectures

Course on Advanced Computer Architectures Surname (Cognome) Name (Nome) POLIMI ID Number Signature (Firma) SOLUTION Politecnico di Milano, July 9, 2018 Course on Advanced Computer Architectures Prof. D. Sciuto, Prof. C. Silvano EX1 EX2 EX3 Q1

More information

EECC551 Exam Review 4 questions out of 6 questions

EECC551 Exam Review 4 questions out of 6 questions EECC551 Exam Review 4 questions out of 6 questions (Must answer first 2 questions and 2 from remaining 4) Instruction Dependencies and graphs In-order Floating Point/Multicycle Pipelining (quiz 2) Improving

More information

Lecture 9: Dynamic ILP. Topics: out-of-order processors (Sections )

Lecture 9: Dynamic ILP. Topics: out-of-order processors (Sections ) Lecture 9: Dynamic ILP Topics: out-of-order processors (Sections 2.3-2.6) 1 An Out-of-Order Processor Implementation Reorder Buffer (ROB) Branch prediction and instr fetch R1 R1+R2 R2 R1+R3 BEQZ R2 R3

More information

Updated Exercises by Diana Franklin

Updated Exercises by Diana Franklin C-82 Appendix C Pipelining: Basic and Intermediate Concepts Updated Exercises by Diana Franklin C.1 [15/15/15/15/25/10/15] Use the following code fragment: Loop: LD R1,0(R2) ;load R1 from address

More information

Hardware-based Speculation

Hardware-based Speculation Hardware-based Speculation M. Sonza Reorda Politecnico di Torino Dipartimento di Automatica e Informatica 1 Introduction Hardware-based speculation is a technique for reducing the effects of control dependences

More information

Computer Architecture EE 4720 Final Examination

Computer Architecture EE 4720 Final Examination Name Computer Architecture EE 4720 Final Examination 1 May 2017, 10:00 12:00 CDT Alias Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 Exam Total (20 pts) (15 pts) (20 pts) (15 pts) (30 pts) (100 pts)

More information

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds? Chapter 4: Assessing and Understanding Performance 1. Define response (execution) time. 2. Define throughput. 3. Describe why using the clock rate of a processor is a bad way to measure performance. Provide

More information

Computer System Architecture Final Examination Spring 2002

Computer System Architecture Final Examination Spring 2002 Computer System Architecture 6.823 Final Examination Spring 2002 Name: This is an open book, open notes exam. 180 Minutes 22 Pages Notes: Not all questions are of equal difficulty, so look over the entire

More information

CS 2506 Computer Organization II Test 2. Do not start the test until instructed to do so! printed

CS 2506 Computer Organization II Test 2. Do not start the test until instructed to do so! printed Instructions: Print your name in the space provided below. This examination is closed book and closed notes, aside from the permitted fact sheet, with a restriction: 1) one 8.5x11 sheet, both sides, handwritten

More information

ILP: Instruction Level Parallelism

ILP: Instruction Level Parallelism ILP: Instruction Level Parallelism Tassadaq Hussain Riphah International University Barcelona Supercomputing Center Universitat Politècnica de Catalunya Introduction Introduction Pipelining become universal

More information

CS / ECE 6810 Midterm Exam - Oct 21st 2008

CS / ECE 6810 Midterm Exam - Oct 21st 2008 Name and ID: CS / ECE 6810 Midterm Exam - Oct 21st 2008 Notes: This is an open notes and open book exam. If necessary, make reasonable assumptions and clearly state them. The only clarifications you may

More information

Multi-cycle Instructions in the Pipeline (Floating Point)

Multi-cycle Instructions in the Pipeline (Floating Point) Lecture 6 Multi-cycle Instructions in the Pipeline (Floating Point) Introduction to instruction level parallelism Recap: Support of multi-cycle instructions in a pipeline (App A.5) Recap: Superpipelining

More information

Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW. Computer Architectures S

Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW. Computer Architectures S Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Computer Architectures 521480S Dynamic Branch Prediction Performance = ƒ(accuracy, cost of misprediction) Branch History Table (BHT) is simplest

More information

Multiple Instruction Issue. Superscalars

Multiple Instruction Issue. Superscalars Multiple Instruction Issue Multiple instructions issued each cycle better performance increase instruction throughput decrease in CPI (below 1) greater hardware complexity, potentially longer wire lengths

More information

HY425 Lecture 05: Branch Prediction

HY425 Lecture 05: Branch Prediction HY425 Lecture 05: Branch Prediction Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS October 19, 2011 Dimitrios S. Nikolopoulos HY425 Lecture 05: Branch Prediction 1 / 45 Exploiting ILP in hardware

More information

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:

More information

Solutions to exercises on Instruction Level Parallelism

Solutions to exercises on Instruction Level Parallelism Solutions to exercises on Instruction Level Parallelism J. Daniel García Sánchez (coordinator) David Expósito Singh Javier García Blas Computer Architecture ARCOS Group Computer Science and Engineering

More information

EECS 470 Midterm Exam

EECS 470 Midterm Exam EECS 470 Midterm Exam Winter 2014 Name: unique name: Sign the honor code: I have neither given nor received aid on this exam nor observed anyone else doing so. Scores: NOTES: # Points Page 2 /12 Page 3

More information

Page 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer

Page 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer Pipeline CPI http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson

More information

Getting CPI under 1: Outline

Getting CPI under 1: Outline CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more

More information

Chapter 03. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1

Chapter 03. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1 Chapter 03 Authors: John Hennessy & David Patterson Copyright 2011, Elsevier Inc. All rights Reserved. 1 Figure 3.3 Comparison of 2-bit predictors. A noncorrelating predictor for 4096 bits is first, followed

More information

Branch prediction ( 3.3) Dynamic Branch Prediction

Branch prediction ( 3.3) Dynamic Branch Prediction prediction ( 3.3) Static branch prediction (built into the architecture) The default is to assume that branches are not taken May have a design which predicts that branches are taken It is reasonable to

More information

COSC4201 Instruction Level Parallelism Dynamic Scheduling

COSC4201 Instruction Level Parallelism Dynamic Scheduling COSC4201 Instruction Level Parallelism Dynamic Scheduling Prof. Mokhtar Aboelaze Parts of these slides are taken from Notes by Prof. David Patterson (UCB) Outline Data dependence and hazards Exposing parallelism

More information

Scoreboard information (3 tables) Four stages of scoreboard control

Scoreboard information (3 tables) Four stages of scoreboard control Scoreboard information (3 tables) Instruction : issued, read operands and started execution (dispatched), completed execution or wrote result, Functional unit (assuming non-pipelined units) busy/not busy

More information