EECS 470 Midterm Exam Winter 2009
Name:
unique name:

Sign the honor code: I have neither given nor received aid on this exam nor observed anyone else doing so.

Scores:
# Points
1 / 18
2 / 12
3 / 29
4 / 21
5 / 20
Total / 100
Bonus / 3

NOTES:
Closed book/notes.
Calculators are allowed, but no PDAs, portables, cell phones, etc.
Don't spend too much time on any one problem.
You have about 90 minutes for the exam (avg. 18 minutes per problem).
There are 9 pages including this one. Please ensure you have all pages.
Be sure to show work and explain what you've done when asked to do so.
1) Short answer [18 points]

a) Does pipelining improve latency or throughput? [3 points]
Answer: Throughput.

b) Give 2 reasons why we don't build processors with massive register files (e.g., tens of thousands of registers). [6 points]
Answer: Requires too many bits in the opcode. Register file accesses would be extremely slow.

c) Give an example of a microprocessor component that exploits locality. [3 points]
Answer: Cache.

d) Most instruction sets include both PC-relative branches and branch-to-register instructions.

i) State an advantage of PC-relative branches and give an example of a software construct where they are used. [3 points]
Answer: PC-relative takes fewer bits in the opcode; allows relocatable code. Used for loops.

ii) State an advantage of branch-to-register instructions and give an example of a software construct where they are used. [3 points]
Answer: Allows branching arbitrary distances; allows dynamic targets and lookup tables for targets. Used for function calls, virtual functions, switch statements, function pointers.
2) Performance Analysis [12 points]

a) Suppose 80% of a program is parallelizable (performance scales linearly with the number of cores), while the other 20% is serial (must run on one core). What is the speedup over a uniprocessor when running the program on a quad-core machine? [6 points]
Answer: 1 / ((1 - 0.8) + (0.8 / 4)) = 2.5

b) Suppose you run two applications one after the other on a Core 2 Duo. The two applications contain the same number of instructions. The first application runs at 4 instructions per cycle (IPC), while the IPC of the second application is 2. What is the overall average IPC? [6 points]
Answer: 2 / (¼ + ½) = 8/3 ≈ 2.67
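Both answers in problem 2 are one-line formulas, so they are easy to check numerically. The sketch below is only an illustration: the function names are made up for this example, and part (b) assumes the first application's IPC is 4, which is what the ¼ term in the answer implies.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only parallel_fraction scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def average_ipc(ipcs: list[float]) -> float:
    """Overall IPC for programs with equal instruction counts run back to back.

    Total instructions divided by total cycles reduces to the harmonic
    mean of the individual IPCs.
    """
    return len(ipcs) / sum(1.0 / ipc for ipc in ipcs)


print(amdahl_speedup(0.8, 4))   # part (a): 80% parallel on four cores
print(average_ipc([4.0, 2.0]))  # part (b): IPC 4 followed by IPC 2
```

Note that the arithmetic mean (3) would be wrong for part (b): the slower application runs for twice as many cycles, so it weighs more heavily in the average.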
3) Reorder Buffers in the P6 microarchitecture [29 points]

c) Briefly explain the purpose of a reorder buffer. [3 points]
Answer: Enables precise state for speculation/exceptions.

d) What effect does a reorder buffer have on performance? [3 points]
Answer: It reduces performance due to a new structural hazard.

e) Draw a diagram showing the contents of a single reorder buffer (ROB) entry for a P6-like microarchitecture (i.e., one having an architectural register file). Identify all the fields stored within the entry and label the width (in bits) of each. Don't forget to include any "instruction status" bits used by any pipeline stages. Assume a 32-bit machine with 32 architectural registers, a 64-entry ROB (you only need to draw one entry) and 16 reservation stations. [5 points]
Answer:
Value 32 (1)
PC and/or calculated target 32 (1)
Dest Reg Name 5 (1)
Executed 1 (1)
Exception/Mispredict 1 (1)
(optional: opcode)
f) Finish the drawing below to indicate the input and output ports of the ROB module from part (c) for a 2-wide superscalar machine. For each port, label its width and indicate during which pipeline stage the port is used (assume the P6 pipeline stages discussed in class: Fetch, Decode, Issue, Execute, Complete, Retire). Assume that the head and tail pointers are maintained within the ROB module, and that the ROB does not support early branch resolution. (Note: this problem is significantly harder than it looks at first glance; think carefully about all the signals required to get instructions in and out of the ROB. I suggest doing this problem last.) [18 points]

Inputs:
Dispatch Enable (2) - Dispatch (0.5)
Dest Register x2 (5) - Dispatch (2)
PC x2 - Dispatch (0.5)
(optional: opcode - Dispatch)
(optional: retire enable x2)
(optional: squash)
CDB Value x2 (32) - Complete (2)
CDB Tag x2 (6) - Complete (1)
CDB Write Enable x2 (0.5)
CDB Exception/Mispredict x2 - Complete (0.5)
Source Operand Tag x4 (6) - Dispatch (1)
(optional: clock)

Outputs:
Full - Dispatch (1)
Almost Full - Dispatch (0.5)
Source Op. Value x4 (32) - Dispatch (2)
Correct PC (32) - Retire (0.5)
Retirement Value x2 (32) - Retire (2)
Retirement Register x2 (5) - Retire (2)
Head complete bits x2 - Retire (0.5)
Head Mispredict/Exception x2 - Retire (0.5)
Tail pointer / next tag - Dispatch (1)
(optional: head pointer)

Bonus) In 1-2 sentences, explain how a history buffer differs from a reorder buffer. [+3 bonus]
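The field and tag widths in problem 3 follow from the machine parameters. The sketch below tallies them under stated assumptions: a 64-entry ROB (which matches the 6-bit tags listed for the CDB ports), 32 architectural registers, and 32-bit values. The dictionary layout and the helper name `index_bits` are hypothetical, chosen only for this illustration.

```python
ROB_ENTRIES = 64   # assumed ROB size; yields the 6-bit tags used in part (f)
ARCH_REGS = 32     # architectural registers
WORD_BITS = 32     # 32-bit machine


def index_bits(count: int) -> int:
    """Bits needed to name one of `count` items (0 bits for a single item)."""
    return (count - 1).bit_length()


# Field name -> width in bits, following the part (e) answer.
rob_entry_fields = {
    "value": WORD_BITS,
    "pc_or_target": WORD_BITS,
    "dest_reg": index_bits(ARCH_REGS),
    "executed": 1,
    "exception_or_mispredict": 1,
}

entry_bits = sum(rob_entry_fields.values())
print("ROB tag width:", index_bits(ROB_ENTRIES))
print("bits per entry (without optional opcode):", entry_bits)
print("total ROB storage bits:", entry_bits * ROB_ENTRIES)
```

The same helper gives the port widths in part (f): destination and retirement register ports are `index_bits(ARCH_REGS)` wide, and tag ports are `index_bits(ROB_ENTRIES)` wide.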
4) Handling RAW Memory Dependences. [20 points]

a) Consider the following sequence of load and store instructions (the first operand contains the address for the load or store, the second is the source/destination register for the value):
(1) store [r1], r16
(2) store [r2], r17
(3) load [r4], r18
(4) store [r5], r19
(5) load [r6], r20

i) Explain the necessary/sufficient conditions to execute instruction #5 non-speculatively. [3 points]
Answer: Stores (1), (2), (4) have calculated their addresses; load (5)'s address can be calculated (its input registers are ready).

ii) Suppose we want to issue load (5) earlier, speculatively. What are the conditions to issue the load to the memory system? [3 points]
Answer: Its address has been calculated.

iii) What, precisely, are we speculating? (I.e., what is the hardware guessing about the values of the registers accessed by the loads and stores?) [3 points]
Answer: That r6 is different from r5, r2, r1.
iv) Describe a sequence of events where load #5 is issued speculatively, but the speculation fails (i.e., the conditions you specify in your answer to (iii) turn out to be false). [3 points]
Answer: r5 resolves, load issues, then r2 resolves to the same value.

v) What does the processor core have to do to fix the mis-speculation? [3 points]
Answer: Squash and re-execute load 5 and all subsequent instructions.

b) A Memory Dependence Predictor is a piece of hardware that tries to reduce the frequency of mis-speculation events like the scenario you described in your answer to (a.iv). You can think of the predictor as a black box that takes in some information about a load instruction and tries to guess if the load should execute speculatively or not. Internally, the predictor stores some state about what it has observed in the past. Propose a high-level design for a memory dependence predictor. In particular, describe the inputs to your predictor black box and the state it contains. Briefly argue why your design will provide high prediction accuracy and require reasonable resources. [6 points]
Answer: Save the PCs of load instructions that receive their value via forwarding in the table. Predict that the load should not execute speculatively if it has an entry in the table.
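The answer to 4(b) can be pictured as a small PC-indexed table. The sketch below is one possible software model, not the exam's reference design: the class name, the direct-mapped organization, the 256-entry size, and the 4-byte instruction assumption behind the index function are all choices made for this illustration.

```python
class MemDepPredictor:
    """Direct-mapped table of load PCs that previously needed store forwarding."""

    def __init__(self, num_entries: int = 256):
        self.num_entries = num_entries
        self.table = [None] * num_entries  # each slot holds a load PC or None

    def _index(self, pc: int) -> int:
        # Assumes 4-byte instructions, so the low two PC bits carry no information.
        return (pc >> 2) % self.num_entries

    def may_speculate(self, load_pc: int) -> bool:
        """True if the load may issue ahead of older stores with unknown addresses."""
        return self.table[self._index(load_pc)] != load_pc

    def train(self, load_pc: int) -> None:
        """Record a load that received its value from an older in-flight store."""
        self.table[self._index(load_pc)] = load_pc


predictor = MemDepPredictor()
print(predictor.may_speculate(0x400))  # unseen load: allowed to speculate
predictor.train(0x400)                 # the load was observed to depend on a store
print(predictor.may_speculate(0x400))  # now held until older stores resolve
```

The only input is the load's PC and the only state is the table, which keeps the hardware cost modest. A load that depended on a store once tends to do so again (same static instruction, same code path), which is the argument for accuracy; the cost of a direct-mapped table is that two loads mapping to the same slot evict each other.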
5) MIPS R10K Microarchitecture. [20 points]

On the next pages, you will find a set of charts showing a snapshot of a MIPS R10K-like microarchitecture after one cycle of executing a sequence of instructions. You must advance this machine 5 additional clock cycles (to the end of cycle #6). Use the cycle-by-cycle state tables to record the contents of each hardware structure at the end of each clock cycle.

Assume the following:
Assume the machine is a 2-wide superscalar (i.e., it can issue, complete, and retire at most 2 instructions per cycle). If there are conflicts among instructions, the machine always selects the oldest instructions first.
Ignore the fetch stage. Assume all instructions have been fetched and are ready for dispatch whenever the out-of-order core allows.
This machine has 4 architectural registers, a 5-entry ROB, 4 reservation stations, and 9 physical registers.
There are 2 add functional units with a 1-cycle latency, and 1 fully-pipelined multiply functional unit with a 2-cycle latency (fully-pipelined means the multiply unit has 2 pipeline stages; it can issue a new multiply each cycle, however, multiplies take 2 cycles to execute).
Assume there is no bypassing in the X stage, but C will bypass to S through the physical register file.
Assume reservation stations are freed as early as possible and can be reused as soon as they are freed.
Note that there are 6 instructions, but the cycle-by-cycle tables only have space for the 5 ROB entries. Be sure to wrap back to the top of the ROB if you dispatch the 6th instruction.

Here is the instruction sequence:
(1) R3 = R1 * R2
(2) R4 = R4 * 10
(3) R1 = R3 + R2
(4) R2 = R2 + 5
(5) R3 = R4 + R2
(6) R1 = R1 + R3

Pay attention to the cycle number on each chart; be sure you fill them out in the correct order! If you make a mistake and need additional blank copies of the fill-in sheet, ask the exam proctor. Make sure the old sheets are torn up and the new ones are stapled to your exam!!!
SOLUTION

R10K Cycle # 1
ht # Insn T Told S X C
t 1 R3=R1*R2 p5 p3 h 2 R4=R4*10 p6 p 3 5
Map Table Reg T+ r1 p1+ r2 p2+ r3 p5 p7,p8,p9

R10K Cycle # 4
ht # Insn T Told S X C
t 1 R3=R1*R2 p5 p R4=R4*10 p6 p 3-3 R1=R3+R2 p7 p1 R2=R2+5 p8 p2 3 h 5 R3=R4+R2 p9 p5
Map Table Reg T+ r1 p7 r2 p8 r3 p9

# op R3=R1*R2 p5 p1+ p2+ 2 R4=R4*10 p6 p+ 3
# op R3=R4+R2 p9 p6 p8 2 3 R1=R3+R2 p7 p5 p2+

R10K Cycle # 2
ht # Insn T Told S X C
t 1 R3=R1*R2 p5 p3 2 2 R4=R4*10 p6 p 3 R1=R3+R2 p7 p1 h R2=R2+5 p8 p2 5
Map Table Reg T+ r1 p7 r2 p8 r3 p5 p9

R10K Cycle # 5
ht # Insn T Told S X C
t 1 R3=R1*R2 p5 p R4=R4*10 p6 p R1=R3+R2 p7 p1 5 R2=R2+5 p8 p2 3 5 h 5 R3=R4+R2 p9 p5
Map Table Reg T+ r1 p7 r2 p8+ r3 p9 p5 p8

# op R3=R1*R2 p5 p1+ p2+ 2 R4=R4*10 p6 p+ 3 R1=R3+R2 p7 p5 p2+ R2=R2+5 p8 p2+
# op R3=R4+R2 p9 p6 p R1=R3+R2 p7 p5+ p2+

R10K Cycle # 3
ht # Insn T Told S X C
t 1 R3=R1*R2 p5 p R4=R4*10 p6 p 3 3 R1=R3+R2 p7 p1 R2=R2+5 p8 p2 3 h 5 R3=R4+R2 p9 p5
Map Table Reg T+ r1 p7 r2 p8 r3 p9

R10K Cycle # 6
ht # Insn T Told S X C
h 1 R1=R1+R3 p3 p7 t 2 R4=R4*10 p6 p R1=R3+R2 p7 p1 5 6 R2=R2+5 p8 p R3=R4+R2 p9 p5 6
Map Table Reg T+ r1 p3 r2 p8+ r3 p9 + p6

# op R3=R4+R2 p9 p6 p8 2 R4=R4*10 p6 p+ 3 R1=R3+R2 p7 p5 p2+ R2=R2+5 p8 p2+
# op R3=R4+R2 p9 p6+ p8+ 2 R1=R1+R3 p3 p3 p9 3
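The two physical-register columns next to each instruction in the charts (the new destination register and the one it replaces, called T and Told below) come from R10K-style register renaming. The sketch below replays renaming for the problem 5 instruction sequence under stated assumptions: r1..r4 initially map to p1..p4, the free list starts as p5..p9, and the register whose digit is missing in the sequence is R4. It models only renaming, not scheduling or timing, and the names `rename_one`, `map_table`, and `free_list` are hypothetical.

```python
from collections import deque

# Assumed initial state: r1..r4 map to p1..p4 and p5..p9 are free.
map_table = {"r1": "p1", "r2": "p2", "r3": "p3", "r4": "p4"}
free_list = deque(["p5", "p6", "p7", "p8", "p9"])

# (dest, src1, src2); integer sources are immediates and are not renamed.
program = [
    ("r3", "r1", "r2"),  # (1) R3 = R1 * R2
    ("r4", "r4", 10),    # (2) R4 = R4 * 10
    ("r1", "r3", "r2"),  # (3) R1 = R3 + R2
    ("r2", "r2", 5),     # (4) R2 = R2 + 5
    ("r3", "r4", "r2"),  # (5) R3 = R4 + R2
    ("r1", "r1", "r3"),  # (6) R1 = R1 + R3
]


def rename_one(insn):
    """Return (T, Told, renamed sources) and update the map table."""
    dest, src1, src2 = insn
    # Sources are looked up before the destination mapping is overwritten.
    sources = tuple(map_table.get(s, s) for s in (src1, src2))
    t_old = map_table[dest]
    t_new = free_list.popleft()  # raises IndexError if no register is free
    map_table[dest] = t_new
    return t_new, t_old, sources


renamed = [rename_one(insn) for insn in program[:5]]  # free list is now empty
free_list.append(renamed[0][1])  # retiring (1) frees its Told register, p3
renamed.append(rename_one(program[5]))

for number, (t_new, t_old, sources) in enumerate(renamed, start=1):
    print(number, t_new, t_old, sources)
```

With 9 physical registers and 4 architectural registers, only five instructions can hold a destination register at once, so instruction (6) cannot be renamed until instruction (1) retires and returns p3 to the free list.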
More informationEECC551 Exam Review 4 questions out of 6 questions
EECC551 Exam Review 4 questions out of 6 questions (Must answer first 2 questions and 2 from remaining 4) Instruction Dependencies and graphs In-order Floating Point/Multicycle Pipelining (quiz 2) Improving
More informationChapter 4 The Processor 1. Chapter 4D. The Processor
Chapter 4 The Processor 1 Chapter 4D The Processor Chapter 4 The Processor 2 Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline
More informationCS146 Computer Architecture. Fall Midterm Exam
CS146 Computer Architecture Fall 2002 Midterm Exam This exam is worth a total of 100 points. Note the point breakdown below and budget your time wisely. To maximize partial credit, show your work and state
More informationThis Set. Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Two Dynamic Scheduling Methods
10 1 Dynamic Scheduling 10 1 This Set Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Two Dynamic Scheduling Methods Not yet complete. (Material below may repeat
More informationFour Steps of Speculative Tomasulo cycle 0
HW support for More ILP Hardware Speculative Execution Speculation: allow an instruction to issue that is dependent on branch, without any consequences (including exceptions) if branch is predicted incorrectly
More informationMidterm Exam 1 Wednesday, March 12, 2008
Last (family) name: Solution First (given) name: Student I.D. #: Department of Electrical and Computer Engineering University of Wisconsin - Madison ECE/CS 752 Advanced Computer Architecture I Midterm
More informationProcessor: Superscalars Dynamic Scheduling
Processor: Superscalars Dynamic Scheduling Z. Jerry Shi Assistant Professor of Computer Science and Engineering University of Connecticut * Slides adapted from Blumrich&Gschwind/ELE475 03, Peh/ELE475 (Princeton),
More informationEITF20: Computer Architecture Part3.2.1: Pipeline - 3
EITF20: Computer Architecture Part3.2.1: Pipeline - 3 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Dynamic scheduling - Tomasulo Superscalar, VLIW Speculation ILP limitations What we have done
More informationHardware Speculation Support
Hardware Speculation Support Conditional instructions Most common form is conditional move BNEZ R1, L ;if MOV R2, R3 ;then CMOVZ R2,R3, R1 L: ;else Other variants conditional loads and stores nullification
More informationCS433 Midterm. Prof Josep Torrellas. October 19, Time: 1 hour + 15 minutes
CS433 Midterm Prof Josep Torrellas October 19, 2017 Time: 1 hour + 15 minutes Name: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 4 Questions. Please budget your time.
More informationECE 571 Advanced Microprocessor-Based Design Lecture 4
ECE 571 Advanced Microprocessor-Based Design Lecture 4 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 28 January 2016 Homework #1 was due Announcements Homework #2 will be posted
More informationAdvanced issues in pipelining
Advanced issues in pipelining 1 Outline Handling exceptions Supporting multi-cycle operations Pipeline evolution Examples of real pipelines 2 Handling exceptions 3 Exceptions In pipelined execution, one
More informationWide Instruction Fetch
Wide Instruction Fetch Fall 2007 Prof. Thomas Wenisch http://www.eecs.umich.edu/courses/eecs470 edu/courses/eecs470 block_ids Trace Table pre-collapse trace_id History Br. Hash hist. Rename Fill Table
More informationUniversity of Toronto Faculty of Applied Science and Engineering
Print: First Name:............ Solutions............ Last Name:............................. Student Number:............................................... University of Toronto Faculty of Applied Science
More informationSRAMs to Memory. Memory Hierarchy. Locality. Low Power VLSI System Design Lecture 10: Low Power Memory Design
SRAMs to Memory Low Power VLSI System Design Lecture 0: Low Power Memory Design Prof. R. Iris Bahar October, 07 Last lecture focused on the SRAM cell and the D or D memory architecture built from these
More information