CMSC22200 Computer Architecture Lecture 8: Out-of-Order Execution. Prof. Yanjing Li University of Chicago
2 Administrative Stuff
- Lab2 due tomorrow
  - 2 free late days
- Lab3 is out
  - Start early!
- My office hours today moved to tomorrow
  - Announcement on Piazza
3 Lecture Outline
- Review: branch prediction
- Out-of-order (OOO) execution
  - Motivation
  - How it works
  - Discussion
4 Review: Gshare Branch Predictor
[Diagram: the global branch history (which direction earlier branches went) is XORed with the program counter to index a direction predictor (e.g., 2-bit counters) that answers "taken?". In parallel, the address of the current instruction indexes the BTB (Branch Target Buffer), which on a hit supplies the target address. The next fetch address is either PC + 4 or the predicted target.]
5 Two Levels of Gshare
- First level: global branch history register (N bits) XOR PC
- Second level: a 2-bit counter for each history entry
  - The direction the branch took the last time the same history was seen
[Diagram: the GHR (global history register) XOR PC forms the index into the Pattern History Table (PHT).]
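The index formation described above can be sketched in a few lines of Python. This is an illustrative model, not any specific processor's predictor; the history length and PHT size below are assumed parameters.

```python
# Gshare index formation: XOR the global history with low-order PC bits.
# HIST_BITS and PHT_SIZE are assumed example parameters.
HIST_BITS = 10
PHT_SIZE = 1 << HIST_BITS

def gshare_index(pc, ghr):
    """Form the PHT index from the PC and the global history register."""
    # Instructions are 4-byte aligned, so drop the low 2 address bits first.
    return ((pc >> 2) ^ ghr) & (PHT_SIZE - 1)

def update_ghr(ghr, taken):
    """Shift the newest branch outcome into the history register."""
    return ((ghr << 1) | (1 if taken else 0)) & (PHT_SIZE - 1)
```

Because the history is XORed in, two branches at the same PC reached along different history paths map to different PHT entries, which is the point of gshare.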
6 Branch Prediction Using a 2-bit Counter
[State diagram: four states. "Strongly taken" and "weakly taken" predict taken; "weakly not-taken" and "strongly not-taken" predict not taken. "Actually taken" moves the counter one state toward strongly taken; "actually not taken" moves it one state toward strongly not-taken.]
- Change prediction after 2 consecutive mistakes
7 2-bit Counter: Another Scheme
[State diagram: the same four states and predictions, but with different transition edges out of the weak states, so the counter recovers differently after a misprediction.]
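The first scheme above, the saturating counter, can be sketched as a tiny state machine. This is a minimal model for illustration; the state encoding (0..3) is an assumption.

```python
# 2-bit saturating counter: states 0..3 = strongly not-taken,
# weakly not-taken, weakly taken, strongly taken.

def predict(state):
    return state >= 2          # predict taken in either "taken" state

def update(state, taken):
    # Saturate at the ends; each outcome moves one state.
    if taken:
        return min(state + 1, 3)
    return max(state - 1, 0)

state = 3                      # strongly taken
state = update(state, False)   # first mistake: weakly taken, still predicts taken
assert predict(state)
state = update(state, False)   # second mistake: now the prediction flips
assert not predict(state)
```

The hysteresis is visible in the example: a single anomalous not-taken outcome (e.g., a loop exit) does not flip the prediction; two consecutive mistakes do.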
8 Review: Dependency Handling in the Pipeline
- Software vs. hardware
  - Software-based instruction scheduling -> static scheduling
  - Hardware-based instruction scheduling -> dynamic scheduling
- What information does the compiler not know that makes static scheduling difficult?
  - Answer: anything that is determined at run time
    - Variable-length operation latencies, memory addresses, branch directions
9 Example: Load-Use Dependency
- Consider this sequence; it requires 1 stall:
  LDUR X2, [X1, #20]
  AND  X4, X2, X5
  OR   X8, X3, X6
- Static scheduling can re-order the instructions so that no stall is needed:
  LDUR X2, [X1, #20]
  OR   X8, X3, X6
  AND  X4, X2, X5
- What if the load sometimes takes 100 cycles to execute?
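The effect of the re-ordering above can be checked with a toy stall counter. This is a deliberately simplified model, assuming only the 1-cycle load-use delay of a forwarding 5-stage pipeline; instruction tuples and field names are illustrative.

```python
# Count load-use stalls: a stall is needed when an instruction reads the
# register loaded by the immediately preceding load.
def load_use_stalls(seq):
    stalls, last_load_dest = 0, None
    for op, dest, srcs in seq:
        if last_load_dest is not None and last_load_dest in srcs:
            stalls += 1
        last_load_dest = dest if op == 'LDUR' else None
    return stalls

orig      = [('LDUR', 'X2', ['X1']),
             ('AND',  'X4', ['X2', 'X5']),
             ('OR',   'X8', ['X3', 'X6'])]
scheduled = [('LDUR', 'X2', ['X1']),
             ('OR',   'X8', ['X3', 'X6']),
             ('AND',  'X4', ['X2', 'X5'])]
assert load_use_stalls(orig) == 1       # AND uses X2 right after the load
assert load_use_stalls(scheduled) == 0  # the independent OR fills the slot
```

The model also shows why a 100-cycle cache miss defeats this: the compiler would need 100 independent instructions to hide it, which it cannot know to schedule at compile time.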
10 Another Example: Instructions w/ Variable Latencies
[Pipeline diagram: after F and D, an integer add spends 1 cycle in E, an integer multiply 4 cycles, and an FP multiply 8 cycles before the final stages; a load that misses in the cache can occupy E far longer still.]
11 Dependency Handling
- Consider the following two pieces of code:
  (a) IMUL R3 <- R1, R2      (b) LD   R3 <- R1 (0)
      ADD  R3 <- R3, R1          ADD  R3 <- R3, R1
      ADD  R1 <- R6, R7          ADD  R1 <- R6, R7
      IMUL R5 <- R6, R8          IMUL R5 <- R6, R8
      ADD  R7 <- R3, R5          ADD  R7 <- R3, R5
- In both cases, the first ADD stalls the whole pipeline!
  - ADD cannot dispatch because its source registers are unavailable
  - Later independent instructions cannot get executed
- IMUL and LD can take a really long time
  - The latency of LD is unknown until runtime (cache hit vs. miss)
12 How to Do Better?
- Hardware has knowledge of dynamic events on a per-instruction basis (i.e., at a very fine granularity)
  - Cache misses
  - Branch mispredictions
  - Load/store addresses
- Wouldn't it be nice if hardware did the scheduling of instructions?
- Hardware-based dynamic instruction scheduling enables OOO execution
  - Tradeoffs vs. static scheduling?
13 Benefits of OOO
  IMUL R3 <- R1, R2
  ADD  R3 <- R3, R1
  ADD  R1 <- R6, R7
  IMUL R5 <- R6, R8
  ADD  R7 <- R3, R5
- Assume IMUL takes 4 execute cycles and ADD takes 1
[Pipeline diagrams: in order, each dependent ADD stalls behind the IMUL that produces its source, and everything behind it waits too; out of order, the independent ADD and IMUL execute underneath the waiting instructions.]
- 15 vs. 12 cycles
14 Out-of-Order Execution
15 Out-of-Order Execution
- Idea
  - Move the dependent instructions out of the way of independent ones (s.t. independent ones can execute)
- Approach
  - Monitor the source values of each instruction
  - When all source values of an instruction are available, fire (i.e., dispatch) the instruction
  - Retire each instruction in program order
- Benefit
  - Latency tolerance: allows independent instructions to execute and complete in the presence of a long-latency operation
16 Illustration of an OOO Pipeline
[Diagram: F and D proceed in order; a scheduler feeds functional units of different latencies (integer add, integer mul, FP mul, load/store, ...), which execute out of order and broadcast results on a TAG-and-VALUE broadcast bus; a reorder stage and W then retire instructions in order.]
- Two humps
  - Hump 1: reservation stations (scheduling window)
  - Hump 2: reorder buffer (instruction window or active window)
17 Dynamic Scheduling: Tomasulo's Algorithm
- Invented by Robert Tomasulo
  - Used in the IBM 360/91 floating-point units
  - Tomasulo, "An Efficient Algorithm for Exploiting Multiple Arithmetic Units," IBM Journal of R&D, Jan. 1967
- Variants are used in many high-performance processors
18 Key Ideas of Tomasulo's Algorithm
1. Register renaming
   - Track true dependencies by linking the consumer of a value to the producer
2. Buffer instructions in reservation stations until they are ready to execute
   - Keep track of the readiness of source values
   - An instruction wakes up and dispatches to the appropriate functional unit (FU) when all of its sources are ready
     - If multiple instructions are awake, one must be selected per FU
19 Register Renaming
- Output and anti dependencies are not true dependencies
  - Why? They exist only because there are not enough register IDs (i.e., names) in the ISA
- The register ID is renamed to the reservation station (RS) entry that will hold the register's value
  - Register ID -> RS entry ID
  - Architectural register ID -> physical register ID
  - After renaming, the RS entry ID is used to refer to the register
- This eliminates anti- and output dependencies
  - As if there were a large number of registers, even though the ISA supports only a small number
20 Register Renaming Using RAT
- RAT: Register Alias Table (aka Register Rename Table)
[Table: one entry per architectural register X0-X9, each with a valid bit, a tag, and a value. Valid entries (e.g., X1, X3, X6, X7) hold their value, with the tag a don't-care; invalid entries hold the tag of the RS entry that will produce the value (e.g., X2 -> RS entry 7, X4 -> RS entry 3, X5 -> RS entry 13, X8 -> RS entry 4), with the value a don't-care.]
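The RAT lookup and update rules above can be sketched as follows. This is a hypothetical, simplified structure for illustration; the class and method names are invented, not taken from any real design.

```python
# Sketch of renaming through a Register Alias Table (RAT).
# Each architectural register is either ready (valid, value held here)
# or waiting on the RS-entry tag that will produce it.

class RAT:
    def __init__(self, num_regs):
        # Entry = (valid, tag, value); registers start valid with value 0.
        self.entries = [(True, None, 0) for _ in range(num_regs)]

    def read_source(self, reg):
        """Return ('value', v) if ready, else ('tag', t) to wait on."""
        valid, tag, value = self.entries[reg]
        return ('value', value) if valid else ('tag', tag)

    def rename_dest(self, reg, rs_tag):
        # Overwrite the old mapping: later readers now wait on rs_tag.
        # This overwriting is what removes output (WAW) and anti (WAR)
        # dependencies on the architectural register name.
        self.entries[reg] = (False, rs_tag, None)

    def broadcast(self, rs_tag, value):
        # CDB broadcast: capture the value wherever this tag is still awaited.
        for r, (valid, tag, _) in enumerate(self.entries):
            if not valid and tag == rs_tag:
                self.entries[r] = (True, None, value)
```

A renamed destination only captures a broadcast if its tag still matches, so a register that has since been renamed again (a later writer) correctly ignores the stale result.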
21 Better Register Renaming Techniques
- Rename through the ROB
- Rename through a merged register file
- Hinton et al., "The Microarchitecture of the Pentium 4 Processor," Intel Technology Journal, 2001
22 Tomasulo's Machine: IBM 360/91
[Diagram: load buffers (from memory) and the FP registers feed the reservation stations in front of the FP functional units; operations arrive from the instruction unit on the operation bus; store buffers write to memory; results are broadcast on the common data bus.]
23 Tomasulo's Algorithm
- If no reservation station is available, stall; else insert the instruction plus its renamed operands (source value/tag) into the reservation station
- While in the reservation station, each instruction:
  - Watches the common data bus (CDB) for the tags of its sources
  - When a tag is seen, grabs the value for that source and keeps it in the reservation station
  - When both operands are available, the instruction is ready to be dispatched
- Dispatch the instruction to the functional unit (FU) when it is ready
  - If multiple instructions are ready at the same time and require the same FU, logic is needed to select one
- After the instruction finishes in the FU:
  - Arbitrate for the CDB
  - Put the tagged value onto the CDB (tag broadcast)
  - The register file, RS, and RAT are connected to the CDB
    - A register contains a tag indicating the latest writer to that register
    - If the tag in the register file, RS, or RAT matches the broadcast tag, write the broadcast value into that entry (and set the valid bit)
  - Reclaim the rename tag (i.e., free the corresponding RS entry)
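The "watch the CDB, grab the value, become ready" step above can be sketched with a minimal reservation-station entry. The data layout is illustrative, not the IBM 360/91's actual structures; tag 'r' below is an assumed example name.

```python
# One reservation-station entry: an op plus two sources, each either a
# waiting tag or a captured value.

class RSEntry:
    def __init__(self, op, src_tags, src_vals):
        self.op = op
        self.src_tags = list(src_tags)   # None once the value is present
        self.src_vals = list(src_vals)

    def ready(self):
        # Ready to dispatch when no source is still waiting on a tag.
        return all(t is None for t in self.src_tags)

    def snoop_cdb(self, tag, value):
        # Watch the CDB: capture the value for every source awaiting `tag`.
        for i, t in enumerate(self.src_tags):
            if t == tag:
                self.src_tags[i] = None
                self.src_vals[i] = value

e = RSEntry('ADD', src_tags=['r', None], src_vals=[None, 4])
assert not e.ready()        # still waiting on the producer tagged 'r'
e.snoop_cdb('r', 2)         # the producer broadcasts tag 'r', value 2
assert e.ready() and e.src_vals == [2, 4]
```

In hardware this snooping happens in every entry in parallel each cycle; a separate select step then picks one ready entry per functional unit.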
24 An Exercise
  MUL X3  <- X1, X2
  ADD X5  <- X3, X4
  ADD X7  <- X2, X6
  ADD X10 <- X8, X9
  MUL X11 <- X7, X10
  ADD X5  <- X5, X11
- Assume ADD (4-cycle execute), MUL (6-cycle execute)
- Assume one adder and one multiplier in HW
- Assume operations are done entirely using registers (no memory access)
- Pipeline stages: F D E W
[Diagram: the IBM 360/91 datapath from slide 22.]
25 Drawing Template
[Template: the six-instruction sequence above, alongside the register file (one row per register X1-X11, with valid?/tag/value fields), four ADD reservation-station entries (a, b, c, d), and four MUL reservation-station entries (r, s, t, v).]
26-43 Cycles 1-20 of the Exercise
[Per-cycle snapshots of the pipeline, the register file (valid?/tag/value, initially Xn holding value n), and the reservation stations. The key annotations per cycle:
- Cycle 3: the first MUL (in RS entry r) starts to execute since both of its operands are valid
- Cycle 4: the ADD in RS entry a waits since its source X3 is not valid yet
- Cycle 5: the ADD in RS entry b starts to execute
- Cycle 6: the ADD in RS entry c starts to execute
- Cycle 7: the second MUL (in RS entry t) waits; pay attention to register renaming removing the WAW dependency on X5
- Cycle 8: results are broadcast through the CDB to wake up dependent instructions (both the RAT and the RS are checked)
- Cycle 9: assuming 2 register write ports and forwarding, the ADD in RS entry a can dispatch; RS entries r and b are reclaimed
- Cycle 10: the second MUL (in RS entry t) dispatches
- Cycle 16: the last ADD dispatches
- Cycle 20: the last ADD writes back; the final register values include X3 = 2, X7 = 8, X10 = 17, X11 = 136, and X5 = 142]
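As a sanity check on the exercise, the architectural results can be computed directly in program order, with registers starting at Xn = n as in the drawing template:

```python
# Dataflow check of the worked example: evaluate the six instructions
# in program order and compare against the final register-file snapshot.
regs = {f'X{i}': i for i in range(12)}
regs['X3']  = regs['X1'] * regs['X2']    # MUL X3  <- X1, X2
regs['X5']  = regs['X3'] + regs['X4']    # ADD X5  <- X3, X4
regs['X7']  = regs['X2'] + regs['X6']    # ADD X7  <- X2, X6
regs['X10'] = regs['X8'] + regs['X9']    # ADD X10 <- X8, X9
regs['X11'] = regs['X7'] * regs['X10']   # MUL X11 <- X7, X10
regs['X5']  = regs['X5'] + regs['X11']   # ADD X5  <- X5, X11
assert (regs['X3'], regs['X7'], regs['X10'], regs['X11'], regs['X5']) \
       == (2, 8, 17, 136, 142)
```

Even though the hardware executed these instructions out of order, retiring in program order means the final register state must match this sequential evaluation, which it does.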
More informationCS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism
CS 252 Graduate Computer Architecture Lecture 4: Instruction-Level Parallelism Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://wwweecsberkeleyedu/~krste
More informationOutline Review: Basic Pipeline Scheduling and Loop Unrolling Multiple Issue: Superscalar, VLIW. CPE 631 Session 19 Exploiting ILP with SW Approaches
Session xploiting ILP with SW Approaches lectrical and Computer ngineering University of Alabama in Huntsville Outline Review: Basic Pipeline Scheduling and Loop Unrolling Multiple Issue: Superscalar,
More informationComplex Pipelining COE 501. Computer Architecture Prof. Muhamed Mudawar
Complex Pipelining COE 501 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Diversified Pipeline Detecting
More informationComplex Pipelines and Branch Prediction
Complex Pipelines and Branch Prediction Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L22-1 Processor Performance Time Program Instructions Program Cycles Instruction CPI Time Cycle
More informationE0-243: Computer Architecture
E0-243: Computer Architecture L1 ILP Processors RG:E0243:L1-ILP Processors 1 ILP Architectures Superscalar Architecture VLIW Architecture EPIC, Subword Parallelism, RG:E0243:L1-ILP Processors 2 Motivation
More informationCPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation
Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Instruction
More informationCISC 662 Graduate Computer Architecture Lecture 11 - Hardware Speculation Branch Predictions
CISC 662 Graduate Computer Architecture Lecture 11 - Hardware Speculation Branch Predictions Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis6627 Powerpoint Lecture Notes from John Hennessy
More informationESE 545 Computer Architecture Instruction-Level Parallelism (ILP): Speculation, Reorder Buffer, Exceptions, Superscalar Processors, VLIW
Computer Architecture ESE 545 Computer Architecture Instruction-Level Parallelism (ILP): Speculation, Reorder Buffer, Exceptions, Superscalar Processors, VLIW 1 Review from Last Lecture Leverage Implicit
More informationChapter 4 The Processor 1. Chapter 4D. The Processor
Chapter 4 The Processor 1 Chapter 4D The Processor Chapter 4 The Processor 2 Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline
More informationCS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming
CS 152 Computer Architecture and Engineering Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming John Wawrzynek Electrical Engineering and Computer Sciences University of California at
More informationCPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor
1 CPI < 1? How? From Single-Issue to: AKS Scalar Processors Multiple issue processors: VLIW (Very Long Instruction Word) Superscalar processors No ISA Support Needed ISA Support Needed 2 What if dynamic
More informationCMSC411 Fall 2013 Midterm 2 Solutions
CMSC411 Fall 2013 Midterm 2 Solutions 1. (12 pts) Memory hierarchy a. (6 pts) Suppose we have a virtual memory of size 64 GB, or 2 36 bytes, where pages are 16 KB (2 14 bytes) each, and the machine has
More informationDynamic Scheduling. Better than static scheduling Scoreboarding: Tomasulo algorithm:
LECTURE - 13 Dynamic Scheduling Better than static scheduling Scoreboarding: Used by the CDC 6600 Useful only within basic block WAW and WAR stalls Tomasulo algorithm: Used in IBM 360/91 for the FP unit
More informationUG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects
Announcements UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects Inf3 Computer Architecture - 2017-2018 1 Last time: Tomasulo s Algorithm Inf3 Computer
More informationCS433 Homework 2 (Chapter 3)
CS Homework 2 (Chapter ) Assigned on 9/19/2017 Due in class on 10/5/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration..
More informationLimitations of Scalar Pipelines
Limitations of Scalar Pipelines Superscalar Organization Modern Processor Design: Fundamentals of Superscalar Processors Scalar upper bound on throughput IPC = 1 Inefficient unified pipeline
More information15-740/ Computer Architecture Lecture 23: Superscalar Processing (III) Prof. Onur Mutlu Carnegie Mellon University
15-740/18-740 Computer Architecture Lecture 23: Superscalar Processing (III) Prof. Onur Mutlu Carnegie Mellon University Announcements Homework 4 Out today Due November 15 Midterm II November 22 Project
More informationScoreboard information (3 tables) Four stages of scoreboard control
Scoreboard information (3 tables) Instruction : issued, read operands and started execution (dispatched), completed execution or wrote result, Functional unit (assuming non-pipelined units) busy/not busy
More informationPortland State University ECE 587/687. Superscalar Issue Logic
Portland State University ECE 587/687 Superscalar Issue Logic Copyright by Alaa Alameldeen, Zeshan Chishti and Haitham Akkary 2017 Instruction Issue Logic (Sohi & Vajapeyam, 1987) After instructions are
More informationSuper Scalar. Kalyan Basu March 21,
Super Scalar Kalyan Basu basu@cse.uta.edu March 21, 2007 1 Super scalar Pipelines A pipeline that can complete more than 1 instruction per cycle is called a super scalar pipeline. We know how to build
More informationComputer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley
Computer Systems Architecture I CSE 560M Lecture 10 Prof. Patrick Crowley Plan for Today Questions Dynamic Execution III discussion Multiple Issue Static multiple issue (+ examples) Dynamic multiple issue
More informationDesign of Digital Circuits Lecture 18: Branch Prediction. Prof. Onur Mutlu ETH Zurich Spring May 2018
Design of Digital Circuits Lecture 18: Branch Prediction Prof. Onur Mutlu ETH Zurich Spring 2018 3 May 2018 Agenda for Today & Next Few Lectures Single-cycle Microarchitectures Multi-cycle and Microprogrammed
More informationAdapted from David Patterson s slides on graduate computer architecture
Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Basic Compiler Techniques for Exposing ILP Advanced Branch Prediction Dynamic Scheduling Hardware-Based Speculation
More informationMultiple Instruction Issue and Hardware Based Speculation
Multiple Instruction Issue and Hardware Based Speculation Soner Önder Michigan Technological University, Houghton MI www.cs.mtu.edu/~soner Hardware Based Speculation Exploiting more ILP requires that we
More informationComputer Architecture Spring 2016
Computer Architecture Spring 2016 Lecture 14: Speculation II Shuai Wang Department of Computer Science and Technology Nanjing University [Slides adapted from CS 246, Harvard University] Tomasulo+ROB Add
More informationDynamic Scheduling. CSE471 Susan Eggers 1
Dynamic Scheduling Why go out of style? expensive hardware for the time (actually, still is, relatively) register files grew so less register pressure early RISCs had lower CPIs Why come back? higher chip
More informationSuperscalar Processors
Superscalar Processors Superscalar Processor Multiple Independent Instruction Pipelines; each with multiple stages Instruction-Level Parallelism determine dependencies between nearby instructions o input
More informationCS433 Homework 2 (Chapter 3)
CS433 Homework 2 (Chapter 3) Assigned on 9/19/2017 Due in class on 10/5/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies
More informationEN2910A: Advanced Computer Architecture Topic 03: Superscalar core architecture
EN2910A: Advanced Computer Architecture Topic 03: Superscalar core architecture Prof. Sherief Reda School of Engineering Brown University Material from: Mostly from Modern Processor Design by Shen and
More informationECE/CS 552: Introduction to Superscalar Processors
ECE/CS 552: Introduction to Superscalar Processors Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim Smith Limitations of Scalar Pipelines
More informationSuperscalar Architectures: Part 2
Superscalar Architectures: Part 2 Dynamic (Out-of-Order) Scheduling Lecture 3.2 August 23 rd, 2017 Jae W. Lee (jaewlee@snu.ac.kr) Computer Science and Engineering Seoul NaMonal University Download this
More informationLecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections )
Lecture 9: More ILP Today: limits of ILP, case studies, boosting ILP (Sections 3.8-3.14) 1 ILP Limits The perfect processor: Infinite registers (no WAW or WAR hazards) Perfect branch direction and target
More informationLoad1 no Load2 no Add1 Y Sub Reg[F2] Reg[F6] Add2 Y Add Reg[F2] Add1 Add3 no Mult1 Y Mul Reg[F2] Reg[F4] Mult2 Y Div Reg[F6] Mult1
Instruction Issue Execute Write result L.D F6, 34(R2) L.D F2, 45(R3) MUL.D F0, F2, F4 SUB.D F8, F2, F6 DIV.D F10, F0, F6 ADD.D F6, F8, F2 Name Busy Op Vj Vk Qj Qk A Load1 no Load2 no Add1 Y Sub Reg[F2]
More informationFor this problem, consider the following architecture specifications: Functional Unit Type Cycles in EX Number of Functional Units
CS333: Computer Architecture Spring 006 Homework 3 Total Points: 49 Points (undergrad), 57 Points (graduate) Due Date: Feb. 8, 006 by 1:30 pm (See course information handout for more details on late submissions)
More informationCS433 Midterm. Prof Josep Torrellas. October 19, Time: 1 hour + 15 minutes
CS433 Midterm Prof Josep Torrellas October 19, 2017 Time: 1 hour + 15 minutes Name: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 4 Questions. Please budget your time.
More information