CS252 Graduate Computer Architecture Spring 2014 Lecture 6: Modern Out-of-Order Processors
1 CS252 Graduate Computer Architecture, Spring 2014
Lecture 6: Modern Out-of-Order Processors
Krste Asanovic
http://inst.eecs.berkeley.edu/~cs252/sp14
CS252, Spring 2014, Lecture 6
2 Last Time in Lecture 5
Decoupled execution
Simple out-of-order scoreboard for CDC6600
Tomasulo algorithm for register renaming
3 IBM 360/91 Floating-Point Unit (R. M. Tomasulo, 1967)
[Figure: instructions and load buffers (from memory) feed the Floating-Point Regfile and the reservation stations of the Adder and Mult units; store buffers (to memory); every buffer, register and reservation-station entry holds tag/data; results are broadcast as <tag, result> on a common bus]
Distribute reservation stations to functional units.
Common bus ensures that data is made available immediately to all the instructions waiting for it.
Match tag, if equal, copy value & set presence bit.
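The tag match on the common bus can be sketched in a few lines of Python. This is a minimal illustration, not the 360/91 hardware: the `Operand` class, its field names and the `broadcast` function are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Operand:
    """One source-operand slot of a reservation station (hypothetical layout)."""
    present: bool = False          # presence bit: the value field is valid
    tag: Optional[int] = None      # tag of the producing unit while waiting
    value: Optional[float] = None


def broadcast(slots, tag, result):
    """Common-bus broadcast: every waiting slot compares its tag with the bus tag."""
    for slot in slots:
        if not slot.present and slot.tag == tag:
            slot.value = result    # copy value
            slot.present = True    # set presence
            slot.tag = None


slots = [Operand(tag=2), Operand(present=True, value=1.5), Operand(tag=3), Operand(tag=2)]
broadcast(slots, tag=2, result=4.0)
```

Both slots waiting on tag 2 pick up the value in the same broadcast, while the slot waiting on tag 3 keeps waiting.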
4 Out-of-Order Fades into Background
Out-of-order processing was implemented commercially in the 1960s, but disappeared again until the 1990s as two major problems had to be solved:
Precise traps
- Imprecise traps complicate debugging and OS code
- Note, precise interrupts are relatively easy to provide
Branch prediction
- Amount of exploitable instruction-level parallelism (ILP) limited by control hazards
Also, simpler machine designs in new technology beat complicated machines in old technology
- Big advantage to fit processor & caches on one chip
- Microprocessors had era of 1%/week performance scaling
5 Separating Completion from Commit
Re-order buffer holds register results from completion until commit
- Entries allocated in program order during decode
- Buffers completed values and exception state until in-order commit point
- Completed values can be used by dependents before committed (bypassing)
- Each entry holds program counter, instruction type, destination register specifier and value if any, and exception status (info often compressed to save hardware)
Memory reordering needs special data structures
- Speculative store address and data buffers
- Speculative load address and data buffers
6 In-Order Commit for Precise Traps
[Figure: in-order Fetch and Decode feed the Reorder Buffer; Execute is out-of-order; Commit is in-order and raises "Trap?"; a trap sends Kill signals to the earlier stages and injects the handler PC]
In-order fetch and decode, and dispatch to reservation stations inside the reorder buffer
Instructions issue from the reservation stations out-of-order
Out-of-order completion, values stored in temporary buffers
Commit is in-order, checks for traps, and if none updates architectural state
7 Phases of Instruction Execution
[Figure: PC -> I-cache -> Fetch Buffer -> Decode/Rename -> Issue Buffer -> Functional Units -> Result Buffer -> Commit -> Architectural State]
Fetch: Instruction bits retrieved from instruction cache.
Decode: Instructions dispatched to appropriate issue buffer.
Execute: Instructions and operands issued to functional units. When execution completes, all results and exception flags are available.
Commit: Instruction irrevocably updates architectural state (aka "graduation"), or takes precise trap/interrupt.
8 In-Order versus Out-of-Order Phases
Fetch/decode/rename always in-order
- Need to parse ISA sequentially to get correct semantics
- Proposals for speculative OoO instruction fetch, e.g., Multiscalar. Predict control flow and data dependencies across sequential program segments fetched/decoded/executed in parallel, fixup if prediction wrong
Dispatch (place instruction into machine buffers to wait for issue) also always in-order
- Dispatch sometimes used to mean issue, but not in these lectures
9 In-Order versus Out-of-Order Issue
In-order issue:
- Issue stalls on RAW dependencies or structural hazards, or possibly WAR/WAW hazards
- Instruction cannot issue to execution units unless all preceding instructions have issued to execution units
Out-of-order issue:
- Instructions dispatched in program order to reservation stations (or other forms of instruction buffer) to wait for operands to arrive, or other hazards to clear
- While earlier instructions wait in issue buffers, following instructions can be dispatched and issued out-of-order
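The difference between the two issue policies can be made concrete with a toy model. The sketch below is an illustration under simplifying assumptions that are not from the slides: one instruction issues per cycle, results are bypassed as soon as the producer's latency has elapsed, registers are renamed so only RAW dependencies stall, and there are no structural hazards. The function name `issue_cycles` and the four-instruction `program` are hypothetical.

```python
def issue_cycles(program, in_order):
    """Return the issue cycle of each instruction.

    program: list of (dest, sources, latency) tuples in program order.
    """
    # For each source, find the most recent earlier writer (its RAW producer).
    last_writer, producers = {}, []
    for i, (dest, srcs, _lat) in enumerate(program):
        producers.append([last_writer[s] for s in srcs if s in last_writer])
        last_writer[dest] = i

    issued = [None] * len(program)
    cycle = 0
    while None in issued:
        for i in range(len(program)):
            if issued[i] is not None:
                continue
            ready = all(
                issued[p] is not None and issued[p] + program[p][2] <= cycle
                for p in producers[i]
            )
            if ready:
                issued[i] = cycle
                break          # at most one issue per cycle
            if in_order:
                break          # the oldest unissued instruction blocks younger ones
        cycle += 1
    return issued


program = [
    ("x1", ["x3"], 3),   # load with a 3-cycle latency
    ("x4", ["x1"], 1),   # depends on the load
    ("x6", ["x7"], 1),   # independent of the load
    ("x8", ["x6"], 1),   # depends on the previous instruction only
]
in_order_result = issue_cycles(program, in_order=True)
out_of_order_result = issue_cycles(program, in_order=False)
```

With in-order issue the two independent instructions wait behind the stalled consumer of the load; with out-of-order issue they slip ahead and fill the load's latency.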
10 In-Order versus Out-of-Order Completion
All but the simplest machines have out-of-order completion due to different latencies of functional units and desire to bypass values as soon as available
Classic RISC 5-stage integer pipeline just barely has in-order completion
- Load takes two cycles, but following one-cycle integer op completes at the same time, not earlier
- Adding pipelined FPU immediately brings OoO completion
11 In-Order versus Out-of-Order Commit
In-order commit supports precise traps, standard today
- Some proposals to reduce the cost of in-order commit by retiring some instructions early to compact reorder buffer, but this is just an optimized in-order commit
Out-of-order commit was effectively what early OoO machines implemented (imprecise traps) as completion irrevocably changed machine state
12 OoO Design Choices
Where are reservation stations?
- Part of reorder buffer, or in separate issue window?
- Distributed by functional units, or centralized?
How is register renaming performed?
- Tags and data held in reservation stations, with separate architectural register file
- Tags only in reservation stations, data held in unified physical register file
13 "Data-in-ROB" Design (HP PA8000, Pentium Pro, Core2Duo, Nehalem)
[Figure: ROB with Oldest and Free pointers; each entry holds v | i | Opcode | p Tag Src1 | p Tag Src2 | p Reg Result | Except?]
Managed as circular buffer in program order, new instructions dispatched to free slots, oldest instruction committed/reclaimed when done ("p" bit set on result)
Tag is given by index in ROB (Free pointer value)
In dispatch, non-busy source operands read from architectural register file and copied to Src1 and Src2 with presence bit "p" set. Busy operands copy tag of producer and clear "p" bit.
Set valid bit "v" on dispatch, set issued bit "i" on issue
On completion, search source tags, set "p" bit and copy data into src on tag match. Write result and exception flags to ROB.
On commit, check exception status, and copy result into architectural register file if no trap.
On trap, flush machine and ROB, set free=oldest, jump to handler
14 Managing Rename for Data-in-ROB
Rename table associated with architectural registers, managed in decode/dispatch
[Figure: table with columns p | Tag | Value, one entry per arch. register]
If "p" bit set, then use value in architectural register file
Else, tag field indicates instruction that will/has produced value
For dispatch, read source operands <p,tag,value> from arch. regfile, and also read <p,result> from producing instruction in ROB, bypassing as needed. Copy to ROB
Write destination arch. register entry with <0,Free,_>, to assign tag to ROB index of this instruction
On commit, update arch. regfile with <1, _, Result>
On trap, reset table (All p=1)
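The dispatch, completion and commit rules of the data-in-ROB design can be sketched as a small Python model. This is a simplified illustration, not the design of any of the named processors: the class `DataInROB`, its dictionary-based entries and the string return values of `commit` are hypothetical, `None` in the rename table stands for "p bit set", and the check that commit only sets p when no newer writer has renamed the register is an assumption added here.

```python
class DataInROB:
    """Toy data-in-ROB machine: the ROB index doubles as the rename tag."""

    def __init__(self, size, regfile):
        self.size = size
        self.entries = [None] * size
        self.oldest = self.free = self.count = 0
        self.regfile = dict(regfile)                # architectural values
        self.rename = {r: None for r in regfile}    # None plays the role of p=1

    def _read(self, reg):
        tag = self.rename[reg]
        if tag is None:                             # p set: use the regfile value
            return {"p": True, "tag": None, "value": self.regfile[reg]}
        producer = self.entries[tag]
        if producer["done"]:                        # bypass completed result from ROB
            return {"p": True, "tag": None, "value": producer["result"]}
        return {"p": False, "tag": tag, "value": None}

    def dispatch(self, op, rd, rs1, rs2):
        assert self.count < self.size, "ROB full"
        tag = self.free                             # tag = Free pointer value
        self.entries[tag] = {"op": op, "rd": rd, "done": False, "result": None,
                             "trap": False, "src": [self._read(rs1), self._read(rs2)]}
        self.rename[rd] = tag                       # <0, Free, _>
        self.free = (self.free + 1) % self.size
        self.count += 1
        return tag

    def complete(self, tag, result, trap=False):
        self.entries[tag].update(done=True, result=result, trap=trap)
        for other in self.entries:                  # search source tags
            if other is None:
                continue
            for src in other["src"]:
                if not src["p"] and src["tag"] == tag:
                    src.update(p=True, tag=None, value=result)

    def commit(self):
        entry = self.entries[self.oldest]
        if entry is None or not entry["done"]:
            return "wait"
        if entry["trap"]:                           # flush, free = oldest, all p = 1
            self.entries = [None] * self.size
            self.free, self.count = self.oldest, 0
            self.rename = {r: None for r in self.rename}
            return "trap"
        self.regfile[entry["rd"]] = entry["result"]
        if self.rename[entry["rd"]] == self.oldest:  # assumption: no newer writer
            self.rename[entry["rd"]] = None          # <1, _, Result>
        self.entries[self.oldest] = None
        self.oldest = (self.oldest + 1) % self.size
        self.count -= 1
        return "commit"


rob = DataInROB(size=4, regfile={"x1": 10, "x2": 20, "x3": 0})
t0 = rob.dispatch("add", "x3", "x1", "x2")    # both sources present
t1 = rob.dispatch("add", "x1", "x3", "x2")    # x3 must wait on tag t0
waiting_tag = rob.entries[t1]["src"][0]["tag"]
rob.complete(t0, 30)
woken_value = rob.entries[t1]["src"][0]["value"]
rob.complete(t1, 50)
status = [rob.commit(), rob.commit(), rob.commit()]
```

The second instruction captures the tag of the first at dispatch, receives the value on completion, and the architectural register file only changes at commit.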
15 Data Movement in Data-in-ROB Design
[Figure: data paths between the Architectural Register File, the ROB (Source Operands and Result Data fields) and the Functional Units]
Read operands during decode
Write sources in dispatch
Bypass newer values at dispatch
Read operands at issue
Write results at completion
Read results for commit
Write results at commit
16 Unified Physical Register File (MIPS R10K, Alpha 21264, Intel Pentium 4 & Sandy/Ivy Bridge)
Rename all architectural registers into a single physical register file during decode, no register values read
Functional units read and write from single unified register file holding committed and temporary registers in execute
Commit only updates mapping of architectural register to physical register, no data movement
[Figure: Decode Stage Register Mapping and Committed Register Mapping point into the Unified Physical Register File; Functional Units read operands at issue and write results at completion]
17 Lifetime of Physical Registers
Physical regfile holds committed and speculative values
Physical registers decoupled from ROB entries (no data in ROB)
Before rename:
ld x1, (x3)
addi x3, x1, #4
sub x6, x7, x9
add x3, x3, x6
ld x6, (x1)
add x6, x6, x3
sd x6, (x1)
ld x6, (x11)
After rename:
ld P1, (Px)
addi P2, P1, #4
sub P3, Py, Pz
add P4, P2, P3
ld P5, (P1)
add P6, P5, P4
sd P6, (P1)
ld P7, (Pw)
When can we reuse a physical register? When next writer of same architectural register commits
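The renaming of this instruction sequence can be reproduced with a short sketch. It assumes a free list that hands out P1 through P7 in order, which matches the P5, P6 and P7 destinations visible on the slide but is otherwise an assumption; the function name `rename` and the tuple encoding of instructions are hypothetical, and immediates are left out.

```python
def rename(program, initial_map, free_list):
    """Rename (op, dest, sources) tuples; dest is None for stores.

    Sources are looked up before the destination mapping is updated, so an
    instruction such as 'add x3, x3, x6' reads the old mapping of x3.
    """
    table, free = dict(initial_map), list(free_list)
    renamed = []
    for op, dest, srcs in program:
        psrcs = [table[s] for s in srcs]
        pdest = None
        if dest is not None:
            pdest = free.pop(0)        # allocate a new physical register
            table[dest] = pdest
        renamed.append((op, pdest, psrcs))
    return renamed


program = [
    ("ld", "x1", ["x3"]),
    ("addi", "x3", ["x1"]),
    ("sub", "x6", ["x7", "x9"]),
    ("add", "x3", ["x3", "x6"]),
    ("ld", "x6", ["x1"]),
    ("add", "x6", ["x6", "x3"]),
    ("sd", None, ["x6", "x1"]),
    ("ld", "x6", ["x11"]),
]
initial_map = {"x3": "Px", "x7": "Py", "x9": "Pz", "x11": "Pw"}
renamed = rename(program, initial_map, ["P1", "P2", "P3", "P4", "P5", "P6", "P7"])
```

Each write to x3 or x6 gets its own physical register, so the three writers of x6 no longer conflict with one another.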
18-25 Physical Register Management (eight-slide animation)
Instruction sequence:
ld x1, 0(x3)
addi x3, x1, #4
sub x6, x7, x6
add x3, x3, x6
ld x6, 0(x1)
Rename Table (slide 18): x1 -> P8, x3 -> P7, x6 -> P5, x7 -> P6
Physical Regs (slide 18): P5 = <x6>, P6 = <x7>, P7 = <x3>, P8 = <x1>
Free List: ending in Pn
ROB fields: use, ex, op, p1, PR1, p2, PR2, Rd, LPRd, PRd
(LPRd requires third read port on Rename Table for each instruction)
Slide 19: ld enters the ROB with PR1 = P7, Rd = x1, LPRd = P8
Slide 20: addi enters the ROB with Rd = x3, LPRd = P7
Slide 21: sub enters the ROB with PR1 = P6, PR2 = P5, Rd = x6, LPRd = P5
Slide 22: add enters the ROB with Rd = x3
Slide 23: ld enters the ROB with Rd = x6
Slide 24 (Execute & Commit): ld x1 is marked executed, its result <x1> appears in the physical registers, and P8 is returned to the Free List
Slide 25 (Execute & Commit): addi is marked executed, its result <x3> appears in the physical registers, and P7 is returned to the Free List
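The bookkeeping walked through in this animation can be sketched as follows. The rename table matches the slides, but the free-list contents (P0 to P4, in that order) are an assumption, since only the register names P5 to P8 and Pn survive in the transcript; the class `PhysRegManager` and its methods are hypothetical names.

```python
from collections import deque


class PhysRegManager:
    """Toy rename table, free list and ROB with Rd / LPRd / PRd fields."""

    def __init__(self, rename_table, free_list):
        self.table = dict(rename_table)
        self.free = deque(free_list)
        self.rob = deque()

    def dispatch(self, op, rd, srcs):
        entry = {
            "op": op,
            "PR": [self.table[s] for s in srcs],  # renamed sources
            "Rd": rd,
            "LPRd": self.table[rd],               # previous mapping of Rd
            "PRd": self.free.popleft(),           # newly allocated register
            "ex": False,
        }
        self.table[rd] = entry["PRd"]
        self.rob.append(entry)
        return entry

    def commit(self):
        """Retire the oldest entry if executed and recycle its LPRd."""
        if not self.rob or not self.rob[0]["ex"]:
            return None
        entry = self.rob.popleft()
        self.free.append(entry["LPRd"])  # old value of Rd can no longer be read
        return entry


mgr = PhysRegManager({"x1": "P8", "x3": "P7", "x6": "P5", "x7": "P6"},
                     ["P0", "P1", "P2", "P3", "P4"])   # assumed free list
ld1 = mgr.dispatch("ld", "x1", ["x3"])
addi = mgr.dispatch("addi", "x3", ["x1"])
sub = mgr.dispatch("sub", "x6", ["x7", "x6"])
add = mgr.dispatch("add", "x3", ["x3", "x6"])
ld2 = mgr.dispatch("ld", "x6", ["x1"])

blocked = mgr.commit()          # nothing has executed yet
ld1["ex"] = True
addi["ex"] = True
mgr.commit()                    # frees P8, the old mapping of x1
mgr.commit()                    # frees P7, the old mapping of x3
```

Freeing LPRd rather than PRd at commit is what implements the rule that a physical register is reusable once the next writer of the same architectural register commits.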
26 MIPS R10K Trap Handling
Rename table is repaired by unrenaming instructions in reverse order using the PRd/LPRd fields
The Alpha 21264 had similar physical register file scheme, but kept complete rename table snapshots for each instruction in ROB (80 snapshots total)
- Flash copy all bits from snapshot to active table in one cycle
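Both recovery schemes can be compared in a small sketch. It is illustrative only: the entry dictionaries, the register names and the helper `unrename` are hypothetical, and a Python dictionary copy stands in for the one-cycle flash copy of a snapshot.

```python
def unrename(table, free_list, squashed):
    """Undo squashed instructions youngest-first using Rd / LPRd / PRd."""
    for entry in reversed(squashed):
        table[entry["Rd"]] = entry["LPRd"]   # restore the previous mapping
        free_list.insert(0, entry["PRd"])    # return the allocated register
    return table, free_list


# Rename table snapshot taken before three speculative instructions.
snapshot = {"x1": "P8", "x3": "P7", "x6": "P5"}

# ROB entries of the speculative instructions, oldest first.
squashed = [
    {"Rd": "x1", "LPRd": "P8", "PRd": "P0"},
    {"Rd": "x3", "LPRd": "P7", "PRd": "P1"},
    {"Rd": "x3", "LPRd": "P1", "PRd": "P3"},
]

# State after renaming them: x3 has been renamed twice.
table = {"x1": "P0", "x3": "P3", "x6": "P5"}
free_list = ["P4"]

# Serial repair, one ROB entry at a time.
table, free_list = unrename(table, free_list, squashed)

# Snapshot repair: copy the saved table back in a single step.
restored = dict(snapshot)

# Walking the entries oldest-first instead would leave x3 mapped to P1.
wrong = {"x1": "P0", "x3": "P3", "x6": "P5"}
for entry in squashed:
    wrong[entry["Rd"]] = entry["LPRd"]
```

The serial walk needs no extra storage but takes time proportional to the number of squashed instructions, whereas the snapshot trades storage for a fast restore.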
27 Reorder Buffer Holds Active Instructions (Decoded but not Committed)
[Figure: ROB contents shown at Cycle t and Cycle t + 1, ordered from older to newer instructions, with Commit, Execute and Fetch pointers marking positions in the sequence]
ld x1, (x3)
add x3, x1, x2
sub x6, x7, x9
add x3, x3, x6
ld x6, (x1)
add x6, x6, x3
sd x6, (x1)
ld x6, (x1)
28 Separate Issue Window from ROB
The issue window holds only instructions that have been decoded and renamed but not issued into execution. Has register tags and presence bits, and pointer to ROB entry.
Issue window fields: use, ex, op, p1, PR1, p2, PR2, PRd, ROB#
Reorder buffer used to hold exception information for commit.
ROB fields (with Oldest and Free pointers): Done?, Rd, LPRd, PC, Except?
ROB is usually larger than issue window. Why?
29 Superscalar Register Renaming
During decode, instructions allocated new physical destination register
Source operands renamed to physical register with newest value
Execution unit only sees physical register numbers
[Figure: Inst 1 and Inst 2, each with Op, Dest, Src1, Src2; Src fields drive the Read Addresses of the Rename Table and Dest fields drive its Write Ports (Update Mapping); Read Data and the Register Free List produce Op, PDest, PSrc1, PSrc2 for each instruction]
Does this work?
30 Superscalar Register Renaming
[Figure: same rename datapath as the previous slide, with "=?" comparators between the Dest of Inst 1 and the Src fields of Inst 2 that select PDest of Inst 1 in place of the Rename Table read data]
Must check for RAW hazards between instructions issuing in same cycle. Can be done in parallel with rename lookup.
MIPS R10K renames 4 serially-RAW-dependent insts/cycle
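The comparator logic can be sketched as below. This is an illustrative model rather than the R10K implementation: all table reads use the mapping from the start of the cycle, as parallel read ports would, and the comparison loop then overrides any source that matches the destination of an earlier instruction in the same group. The function `rename_group`, the four-instruction group and the free-list contents are hypothetical.

```python
def rename_group(table, free_list, group):
    """Rename a group of (dest, sources) instructions in one cycle."""
    # Parallel lookups: every source is read from the start-of-cycle table.
    looked_up = [[table[s] for s in srcs] for _dest, srcs in group]
    # Parallel allocation: instruction i takes the i-th free register.
    pdests = [free_list[i] for i in range(len(group))]

    renamed = []
    for i, (_dest, srcs) in enumerate(group):
        psrcs = list(looked_up[i])
        for k, src in enumerate(srcs):
            for j in range(i):               # the "=?" comparators
                if group[j][0] == src:
                    psrcs[k] = pdests[j]     # youngest earlier writer wins
        renamed.append((pdests[i], psrcs))

    new_table = dict(table)
    for (dest, _srcs), pdest in zip(group, pdests):
        new_table[dest] = pdest              # last writer in the group wins
    return renamed, new_table, free_list[len(group):]


table = {"x1": "P8", "x3": "P7", "x6": "P5", "x7": "P6"}
group = [
    ("x1", ["x3"]),          # ld   x1, 0(x3)
    ("x3", ["x1"]),          # addi x3, x1, #4
    ("x6", ["x7", "x6"]),    # sub  x6, x7, x6
    ("x3", ["x3", "x6"]),    # add  x3, x3, x6
]
renamed, new_table, remaining = rename_group(table, ["P0", "P1", "P2", "P3"], group)
```

Without the comparison step the second instruction would read the stale mapping P8 for x1 instead of P0, which is why the table lookups alone do not work for dependent instructions renamed in the same cycle.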
31 Acknowledgements
This course is partly inspired by previous MIT and Berkeley CS252 computer architecture courses created by my collaborators and colleagues:
- Arvind (MIT)
- Joel Emer (Intel/MIT)
- James Hoe (CMU)
- John Kubiatowicz (UCB)
- David Patterson (UCB)
More informationThis Set. Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Two Dynamic Scheduling Methods
10 1 Dynamic Scheduling 10 1 This Set Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Two Dynamic Scheduling Methods Not yet complete. (Material below may repeat
More informationLecture 14: Multithreading
CS 152 Computer Architecture and Engineering Lecture 14: Multithreading John Wawrzynek Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~johnw
More information15-740/ Computer Architecture Lecture 8: Issues in Out-of-order Execution. Prof. Onur Mutlu Carnegie Mellon University
15-740/18-740 Computer Architecture Lecture 8: Issues in Out-of-order Execution Prof. Onur Mutlu Carnegie Mellon University Readings General introduction and basic concepts Smith and Sohi, The Microarchitecture
More informationESE 545 Computer Architecture Instruction-Level Parallelism (ILP): Speculation, Reorder Buffer, Exceptions, Superscalar Processors, VLIW
Computer Architecture ESE 545 Computer Architecture Instruction-Level Parallelism (ILP): Speculation, Reorder Buffer, Exceptions, Superscalar Processors, VLIW 1 Review from Last Lecture Leverage Implicit
More informationReduction of Data Hazards Stalls with Dynamic Scheduling So far we have dealt with data hazards in instruction pipelines by:
Reduction of Data Hazards Stalls with Dynamic Scheduling So far we have dealt with data hazards in instruction pipelines by: Result forwarding (register bypassing) to reduce or eliminate stalls needed
More informationCS252 Spring 2017 Graduate Computer Architecture. Lecture 14: Multithreading Part 2 Synchronization 1
CS252 Spring 2017 Graduate Computer Architecture Lecture 14: Multithreading Part 2 Synchronization 1 Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Last Time in Lecture
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationThis Set. Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Three Dynamic Scheduling Methods
10-1 Dynamic Scheduling 10-1 This Set Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Three Dynamic Scheduling Methods Not yet complete. (Material below may
More informationLecture 19: Instruction Level Parallelism
Lecture 19: Instruction Level Parallelism Administrative: Homework #5 due Homework #6 handed out today Last Time: DRAM organization and implementation Today Static and Dynamic ILP Instruction windows Register
More informationCS433 Homework 2 (Chapter 3)
CS Homework 2 (Chapter ) Assigned on 9/19/2017 Due in class on 10/5/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration..
More informationComputer Architecture Lecture 13: State Maintenance and Recovery. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/15/2013
18-447 Computer Architecture Lecture 13: State Maintenance and Recovery Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/15/2013 Reminder: Homework 3 Homework 3 Due Feb 25 REP MOVS in Microprogrammed
More informationCSC 631: High-Performance Computer Architecture
CSC 631: High-Performance Computer Architecture Spring 2017 Lecture 4: Pipelining Last Time in Lecture 3 icrocoding, an effective technique to manage control unit complexity, invented in era when logic
More informationPage 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution
CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 15: Instruction Level Parallelism and Dynamic Execution March 11, 2002 Prof. David E. Culler Computer Science 252 Spring 2002
More informationLecture 9: Dynamic ILP. Topics: out-of-order processors (Sections )
Lecture 9: Dynamic ILP Topics: out-of-order processors (Sections 2.3-2.6) 1 An Out-of-Order Processor Implementation Reorder Buffer (ROB) Branch prediction and instr fetch R1 R1+R2 R2 R1+R3 BEQZ R2 R3
More informationComputer Architecture Spring 2016
Computer rchitecture Spring 2016 Lecture 10: Out-of-Order Execution & Register Renaming Shuai Wang Department of Computer Science and Technology Nanjing University In Search of Parallelism Trivial Parallelism
More informationEITF20: Computer Architecture Part3.2.1: Pipeline - 3
EITF20: Computer Architecture Part3.2.1: Pipeline - 3 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Dynamic scheduling - Tomasulo Superscalar, VLIW Speculation ILP limitations What we have done
More informationCS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. Lecture 7 Memory III
CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture Lecture 7 Memory III Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste
More informationCPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation
Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Instruction
More informationInstruction-Level Parallelism and Its Exploitation
Chapter 2 Instruction-Level Parallelism and Its Exploitation 1 Overview Instruction level parallelism Dynamic Scheduling Techniques es Scoreboarding Tomasulo s s Algorithm Reducing Branch Cost with Dynamic
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3 Instructor: Dan Garcia inst.eecs.berkeley.edu/~cs61c! Compu@ng in the News At a laboratory in São Paulo,
More informationCISC 662 Graduate Computer Architecture. Lecture 10 - ILP 3
CISC 662 Graduate Computer Architecture Lecture 10 - ILP 3 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationCS 152 Computer Architecture and Engineering. Lecture 18: Multithreading
CS 152 Computer Architecture and Engineering Lecture 18: Multithreading Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste
More informationCS 152 Computer Architecture and Engineering. Lecture 8 - Address Translation
CS 152 Computer Architecture and Engineering Lecture 8 - Translation Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!
More informationHandout 2 ILP: Part B
Handout 2 ILP: Part B Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism Loop unrolling by compiler to increase ILP Branch prediction to increase ILP
More informationCS252 Spring 2017 Graduate Computer Architecture. Lecture 15: Synchronization and Memory Models Part 2
CS252 Spring 2017 Graduate Computer Architecture Lecture 15: Synchronization and Memory Models Part 2 Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Project Proposal
More informationComputer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling)
18-447 Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 2/13/2015 Agenda for Today & Next Few Lectures
More informationComputer Architecture Lecture 14: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/18/2013
18-447 Computer Architecture Lecture 14: Out-of-Order Execution Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/18/2013 Reminder: Homework 3 Homework 3 Due Feb 25 REP MOVS in Microprogrammed
More informationCSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1
CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level
More informationStatic vs. Dynamic Scheduling
Static vs. Dynamic Scheduling Dynamic Scheduling Fast Requires complex hardware More power consumption May result in a slower clock Static Scheduling Done in S/W (compiler) Maybe not as fast Simpler processor
More informationReview: Compiler techniques for parallelism Loop unrolling Ÿ Multiple iterations of loop in software:
CS152 Computer Architecture and Engineering Lecture 17 Dynamic Scheduling: Tomasulo March 20, 2001 John Kubiatowicz (http.cs.berkeley.edu/~kubitron) lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/
More informationComputer Architecture and Engineering CS152 Quiz #3 March 22nd, 2012 Professor Krste Asanović
Computer Architecture and Engineering CS52 Quiz #3 March 22nd, 202 Professor Krste Asanović Name: This is a closed book, closed notes exam. 80 Minutes 0 Pages Notes: Not all questions are
More informationAdapted from David Patterson s slides on graduate computer architecture
Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Basic Compiler Techniques for Exposing ILP Advanced Branch Prediction Dynamic Scheduling Hardware-Based Speculation
More information