CS252 Graduate Computer Architecture Spring 2014 Lecture 6: Modern Out-of-Order Processors


1 CS252 Graduate Computer Architecture Spring 2014 Lecture 6: Modern Out-of-Order Processors Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp14 CS252, Spring 2014, Lecture 6

2 Last Time in Lecture 5
Decoupled execution
Simple out-of-order scoreboard for CDC6600
Tomasulo algorithm for register renaming

3 IBM 360/91 Floating-Point Unit (R. M. Tomasulo, 1967)
[Figure: instructions and load buffers (from memory) feed the floating-point regfile and the tag/data reservation stations in front of the Adder and Mult units; store buffers (to memory); results return as <tag, result> on a common bus]
Distribute reservation stations to functional units.
Common bus ensures that data is made available immediately to all the instructions waiting for it. Match tag, if equal, copy value & set presence bit p.
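
The tag match on the common bus can be sketched in a few lines. This is a minimal illustration, not the 360/91 hardware: the `Operand` class, the `broadcast` and `ready` helpers, and the example tags are all hypothetical names chosen here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Operand:
    present: bool                  # presence bit: the value field is valid
    tag: Optional[int] = None      # tag of the producer being waited on
    value: Optional[float] = None

def broadcast(stations, tag, result):
    """Drive <tag, result> on the common bus; every waiting operand compares tags."""
    for operands in stations:
        for op in operands:
            if not op.present and op.tag == tag:
                op.value = result  # copy value
                op.present = True  # set presence
                op.tag = None

def ready(operands):
    """An entry may issue once all of its operands are present."""
    return all(op.present for op in operands)

# Two reservation-station entries; both wait on the instruction with tag 3.
rs_add = [Operand(True, value=1.5), Operand(False, tag=3)]
rs_mul = [Operand(False, tag=3), Operand(False, tag=5)]
broadcast([rs_add, rs_mul], tag=3, result=2.0)
```

After the broadcast the adder entry has both operands, while the multiplier entry still waits for tag 5.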

4 Out-of-Order Fades into Background
Out-of-order processing was implemented commercially in the 1960s, but disappeared again until the 1990s as two major problems had to be solved:
Precise traps
- Imprecise traps complicate debugging and OS code
- Note, precise interrupts are relatively easy to provide
Branch prediction
- Amount of exploitable instruction-level parallelism (ILP) limited by control hazards
Also, simpler machine designs in new technology beat complicated machines in old technology
- Big advantage to fit processor & caches on one chip
- Microprocessors had era of 1%/week performance scaling

5 Separating Completion from Commit
Re-order buffer holds register results from completion until commit
- Entries allocated in program order during decode
- Buffers completed values and exception state until in-order commit point
- Completed values can be used by dependents before committed (bypassing)
- Each entry holds program counter, instruction type, destination register specifier and value if any, and exception status (info often compressed to save hardware)
Memory reordering needs special data structures
- Speculative store address and data buffers
- Speculative load address and data buffers

6 In-Order Commit for Precise Traps
[Figure: Fetch and Decode (in-order) feed the Reorder Buffer; Execute (out-of-order) works from the reorder buffer; Commit (in-order) checks Trap?, which raises Kill signals to the earlier stages and Inject handler PC at fetch]
In-order instruction fetch and decode, and dispatch to reservation stations inside reorder buffer
Instructions issue from reservation stations out-of-order
Out-of-order completion, values stored in temporary buffers
Commit is in-order, checks for traps, and if none updates architectural state
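
As a rough sketch of how a reorder buffer separates completion from commit, the toy model below lets instructions complete in any order but only retires them from the head, so a trap is taken with precise state. The `ReorderBuffer` class, its dictionary fields, and the example PCs are hypothetical and far simpler than a real design.

```python
from collections import deque

class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # oldest entry at the left
        self.arch_regs = {}      # committed architectural state

    def allocate(self, pc, dest):
        """Allocate an entry in program order (done during decode)."""
        entry = {"pc": pc, "dest": dest, "value": None, "done": False, "trap": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value=None, trap=False):
        """Record a result or exception; may be called out of program order."""
        entry.update(value=value, done=True, trap=trap)

    def commit(self):
        """Retire completed instructions from the head; return a trapping PC or None."""
        while self.entries and self.entries[0]["done"]:
            head = self.entries.popleft()
            if head["trap"]:
                self.entries.clear()          # kill all younger instructions
                return head["pc"]
            if head["dest"] is not None:
                self.arch_regs[head["dest"]] = head["value"]
        return None

rob = ReorderBuffer()
a = rob.allocate(0x00, "x1")
b = rob.allocate(0x04, "x2")
c = rob.allocate(0x08, "x3")
rob.complete(c, value=30)      # youngest finishes first
first = rob.commit()           # nothing retires: the head is not done
rob.complete(b, trap=True)
rob.complete(a, value=10)
trap_pc = rob.commit()         # a retires, b traps, c is discarded
```

Even though `c` completed first, its result never reaches `arch_regs`, because the older instruction `b` trapped before `c` could commit.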

7 Phases of Instruction Execution
[Figure: PC, I-cache, Fetch Buffer, Decode/Rename, Issue Buffer, Functional Units, Result Buffer, Commit, Architectural State]
Fetch: Instruction bits retrieved from instruction cache.
Decode: Instructions dispatched to appropriate issue buffer.
Execute: Instructions and operands issued to functional units. When execution completes, all results and exception flags are available.
Commit: Instruction irrevocably updates architectural state (aka "graduation"), or takes precise trap/interrupt.

8 In-Order versus Out-of-Order Phases
Instruction fetch/decode/rename always in-order
- Need to parse ISA sequentially to get correct semantics
- Proposals for speculative OoO instruction fetch, e.g., Multiscalar. Predict control flow and data dependencies across sequential program segments fetched/decoded/executed in parallel, fix up if prediction wrong
Dispatch (place instruction into machine buffers to wait for issue) also always in-order
- Dispatch sometimes used to mean issue, but not in these lectures

9 In-Order Versus Out-of-Order Issue
In-order issue:
- Issue stalls on RAW dependencies or structural hazards, or possibly WAR/WAW hazards
- Instruction cannot issue to execution units unless all preceding instructions have issued to execution units
Out-of-order issue:
- Instructions dispatched in program order to reservation stations (or other forms of instruction buffer) to wait for operands to arrive, or other hazards to clear
- While earlier instructions wait in issue buffers, following instructions can be dispatched and issued out-of-order
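
The difference between the two issue policies can be seen with a toy scheduler. This is only a sketch under strong assumptions that are not from the lecture: one instruction issues per cycle, renaming has already removed WAR/WAW hazards, there are no structural hazards, and the latencies in the example are made up.

```python
def schedule(program, out_of_order):
    """Return the issue cycle of each instruction (at most one issue per cycle).

    program: list of (dest, sources, latency) tuples in program order.
    Only RAW dependencies constrain issue in this simplified model.
    """
    # Producer of each source = latest earlier instruction writing that register.
    producers, last_writer = [], {}
    for i, (dest, srcs, _) in enumerate(program):
        producers.append([last_writer[s] for s in srcs if s in last_writer])
        last_writer[dest] = i

    issue = [None] * len(program)
    cycle = 0
    while None in issue:
        waiting = [i for i, c in enumerate(issue) if c is None]
        # In-order issue may only consider the oldest waiting instruction.
        candidates = waiting if out_of_order else waiting[:1]
        for i in candidates:  # oldest first
            if all(issue[p] is not None and issue[p] + program[p][2] <= cycle
                   for p in producers[i]):
                issue[i] = cycle
                break
        cycle += 1
    return issue

program = [
    ("x1", ["x3"], 3),         # ld   x1, (x3)     assumed 3-cycle load
    ("x3", ["x1"], 1),         # addi x3, x1, #4
    ("x6", ["x7", "x9"], 1),   # sub  x6, x7, x9
    ("x3", ["x3", "x6"], 1),   # add  x3, x3, x6
]
in_order = schedule(program, out_of_order=False)   # [0, 3, 4, 5]
ooo = schedule(program, out_of_order=True)         # [0, 3, 1, 4]
```

With in-order issue the independent `sub` waits behind the stalled `addi`; with out-of-order issue it slips ahead in cycle 1 and the final `add` issues one cycle sooner.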

10 In-Order versus Out-of-Order Completion
All but the simplest machines have out-of-order completion due to different latencies of functional units and desire to bypass values as soon as available
Classic RISC 5-stage integer pipeline just barely has in-order completion
- Load takes two cycles, but following one-cycle integer op completes at the same time, not earlier
- Adding pipelined FPU immediately brings OoO completion

11 In-Order versus Out-of-Order Commit
In-order commit supports precise traps, standard today
- Some proposals to reduce the cost of in-order commit by retiring some instructions early to compact reorder buffer, but this is just an optimized in-order commit
Out-of-order commit was effectively what early OoO machines implemented (imprecise traps) as completion irrevocably changed machine state

12 OoO Design Choices
Where are reservation stations?
- Part of reorder buffer, or in separate issue window?
- Distributed by functional units, or centralized?
How is register renaming performed?
- Tags and data held in reservation stations, with separate architectural register file
- Tags only in reservation stations, data held in unified physical register file

13 Data-in-ROB Design (HP PA8000, Pentium Pro, Core2Duo, Nehalem)
[Figure: ROB entries between the Oldest and Free pointers; each entry has fields v, i, Opcode, p, Tag, Src1, p, Tag, Src2, p, Reg, Result, Except?]
Managed as circular buffer in program order, new instructions dispatched to free slots, oldest instruction committed/reclaimed when done (p bit set on result)
Tag is given by index in ROB (Free pointer value)
In dispatch, non-busy source operands read from architectural register file and copied to Src1 and Src2 with presence bit p set. Busy operands copy tag of producer and clear p bit.
Set valid bit v on dispatch, set issued bit i on issue
On completion, search source tags, set p bit and copy data into src on tag match. Write result and exception flags to ROB.
On commit, check exception status, and copy result into architectural register file if no trap.
On trap, flush machine and ROB, set free=oldest, jump to handler

14 Managing Rename for Data-in-ROB
Rename table associated with architectural registers, managed in decode/dispatch
[Figure: table with columns p, Tag, Value; one entry per arch. register]
If p bit set, then use value in architectural register file
Else, tag field indicates instruction that will/has produced value
For dispatch, read source operands <p,tag,value> from arch. regfile, and also read <p,result> from producing instruction in ROB, bypassing as needed. Copy to ROB
Write destination arch. register entry with <0,Free,_>, to assign tag to ROB index of this instruction
On commit, update arch. regfile with <1, _, Result>
On trap, reset table (All p=1)
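
The dispatch, completion, and commit rules for a data-in-ROB design can be sketched as follows. This is an illustrative model with hypothetical names (`DataInROB`, `_read`, the dictionary fields), and it simplifies in two ways that are assumptions of this sketch: the ROB is a growing list rather than a circular buffer, and the rename entry is only reset to p=1 at commit when no newer writer of that register is in flight.

```python
class DataInROB:
    def __init__(self, regs):
        self.regfile = dict(regs)                   # committed values
        self.table = {r: (1, None) for r in regs}   # arch reg -> (p, tag)
        self.rob = []                               # list index doubles as the tag
        self.oldest = 0

    def _read(self, reg):
        p, tag = self.table[reg]
        if p:                                       # latest value is committed
            return {"p": 1, "tag": None, "data": self.regfile[reg]}
        producer = self.rob[tag]
        if producer["done"]:                        # bypass completed result from ROB
            return {"p": 1, "tag": None, "data": producer["result"]}
        return {"p": 0, "tag": tag, "data": None}   # wait on the producer's tag

    def dispatch(self, rd, rs1, rs2):
        entry = {"rd": rd, "src": [self._read(rs1), self._read(rs2)],
                 "done": False, "result": None}
        tag = len(self.rob)                         # value of the Free pointer
        self.rob.append(entry)
        self.table[rd] = (0, tag)                   # <0, Free, _>
        return tag

    def complete(self, tag, result):
        for entry in self.rob[self.oldest:]:        # search source tags
            for src in entry["src"]:
                if not src["p"] and src["tag"] == tag:
                    src.update(p=1, tag=None, data=result)
        self.rob[tag].update(done=True, result=result)

    def commit(self):
        entry = self.rob[self.oldest]
        assert entry["done"], "oldest instruction has not completed"
        self.regfile[entry["rd"]] = entry["result"]
        if self.table[entry["rd"]] == (0, self.oldest):   # no newer writer in flight
            self.table[entry["rd"]] = (1, None)
        self.oldest += 1

m = DataInROB({"x1": 5, "x2": 7, "x3": 0})
t0 = m.dispatch("x3", "x1", "x2")   # both operands present at dispatch
t1 = m.dispatch("x1", "x3", "x2")   # x3 must wait on tag t0
m.complete(t0, 12)                  # wakes up the second instruction
m.commit()                          # x3 becomes architectural
```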

15 Data Movement in Data-in-ROB Design
[Figure: Architectural Register File, ROB (Source Operands, Result Data), and Functional Units, connected by the labeled transfers below]
Read operands during decode
Write sources in dispatch
Bypass newer values at dispatch
Read operands at issue
Write results at completion
Read results for commit
Write results at commit

16 Unified Physical Register File (MIPS R10K, Alpha 21264, Intel Pentium 4 & Sandy/Ivy Bridge)
Rename all architectural registers into a single physical register file during decode, no register values read
Functional units read and write from single unified register file holding committed and temporary registers in execute
Commit only updates mapping of architectural register to physical register, no data movement
[Figure: Decode Stage Register Mapping, Committed Register Mapping, Unified Physical Register File, Functional Units; read operands at issue, write results at completion]

17 Lifetime of Physical Registers
Physical regfile holds committed and speculative values
Physical registers decoupled from ROB entries (no data in ROB)
Original code:
ld x1, (x3)
addi x3, x1, #4
sub x6, x7, x9
add x3, x3, x6
ld x6, (x1)
add x6, x6, x3
sd x6, (x1)
ld x6, (x11)
After rename (several register names did not survive in this transcription):
ld, (Px)
addi,, #4
sub, Py, Pz
add,,
ld P5, ()
add P6, P5,
sd P6, ()
ld P7, (Pw)
When can we reuse a physical register? When next writer of same architectural register commits
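
A small renamer makes the mapping concrete. This is a sketch, not the slide's exact figure: the free-list order P1..P7 is an assumption chosen so that the results line up with the surviving P5, P6, and P7 entries above, and the `rename` function and tuple encoding are hypothetical.

```python
def rename(program, mapping, free_list):
    """Rename (op, dest, sources) tuples; dest is None for stores."""
    mapping, free = dict(mapping), list(free_list)
    renamed = []
    for op, dest, srcs in program:
        psrcs = [mapping[s] for s in srcs]   # read sources before remapping dest
        pdest = None
        if dest is not None:
            pdest = free.pop(0)              # allocate a fresh physical register
            mapping[dest] = pdest
        renamed.append((op, pdest, psrcs))
    return renamed, mapping

program = [
    ("ld",   "x1", ["x3"]),
    ("addi", "x3", ["x1"]),
    ("sub",  "x6", ["x7", "x9"]),
    ("add",  "x3", ["x3", "x6"]),
    ("ld",   "x6", ["x1"]),
    ("add",  "x6", ["x6", "x3"]),
    ("sd",   None, ["x6", "x1"]),
    ("ld",   "x6", ["x11"]),
]
initial = {"x3": "Px", "x7": "Py", "x9": "Pz", "x11": "Pw"}
free = ["P1", "P2", "P3", "P4", "P5", "P6", "P7"]   # assumed allocation order
renamed, final_map = rename(program, initial, free)
```

Under these assumptions x6 is mapped in turn to P3, P5, P6, and P7. Following the reuse rule above, P3 could be recycled once the `ld` that writes P5 commits, since no later instruction can name P3 after that point.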

18 Physical Register Management
[Figure: Rename Table (x1: P8, x3: P7, x6: P5, x7: P6); Physical Regs (P5 <x6>, P6 <x7>, P7 <x3>, P8 <x1>); Free List (entries through Pn); ROB with columns use, ex, op, p1, PR1, p2, PR2, Rd, LPRd, PRd, initially empty]
ld x1, 0(x3)
addi x3, x1, #4
sub x6, x7, x6
add x3, x3, x6
ld x6, 0(x1)
(LPRd requires third read port on Rename Table for each instruction)

19 Physical Register Management
[Figure: same Rename Table, Physical Regs, Free List, and instruction sequence as slide 18]
ROB entry added: use=x, op=ld, PR1=P7, Rd=x1, LPRd=P8

20 Physical Register Management
[Figure: same layout as slide 18]
ROB entry added: use=x, op=addi, Rd=x3, LPRd=P7

21 Physical Register Management
[Figure: same layout as slide 18]
ROB entry added: use=x, op=sub, PR1=P6, PR2=P5, Rd=x6, LPRd=P5

22 Physical Register Management
[Figure: same layout as slide 18]
ROB entry added: use=x, op=add, Rd=x3

23 Physical Register Management
[Figure: same layout as slide 18]
ROB entry added: use=x, op=ld, Rd=x6

24 Physical Register Management: Execute & Commit
[Figure: same layout as slide 18; the first ld entry now has ex set; Physical Regs show <x1>, <x6>, <x7>, <x3>, <x1>; Free List: P8, ..., Pn]

25 Physical Register Management: Execute & Commit
[Figure: same layout as slide 18; the addi entry now has ex set; Physical Regs show <x1>, <x3>, <x6>, <x7>, <x3>; Free List: P8, P7, ..., Pn]
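
The bookkeeping in this animation can be sketched as below. The rename table contents (x1: P8, x3: P7, x6: P5, x7: P6) come from the slides; the initial free-list contents P0..P4 and their order are assumptions of this sketch, as are the `Renamer` class and its field names.

```python
from collections import deque

class Renamer:
    def __init__(self, mapping, free):
        self.table = dict(mapping)   # arch reg -> physical reg
        self.free = deque(free)      # free list
        self.rob = deque()           # entries in program order

    def rename(self, op, rd, srcs):
        entry = {"op": op,
                 "srcs": [self.table[s] for s in srcs],   # PR1, PR2
                 "rd": rd,
                 "lprd": self.table[rd],                  # previous mapping of Rd
                 "prd": self.free.popleft()}              # newly allocated register
        self.table[rd] = entry["prd"]
        self.rob.append(entry)
        return entry

    def commit(self):
        """Retire the oldest entry; its LPRd can no longer be read, so free it."""
        entry = self.rob.popleft()
        self.free.append(entry["lprd"])
        return entry

r = Renamer({"x1": "P8", "x3": "P7", "x6": "P5", "x7": "P6"},
            ["P0", "P1", "P2", "P3", "P4"])               # assumed free list
entries = [
    r.rename("ld",   "x1", ["x3"]),         # ld   x1, 0(x3)
    r.rename("addi", "x3", ["x1"]),         # addi x3, x1, #4
    r.rename("sub",  "x6", ["x7", "x6"]),   # sub  x6, x7, x6
    r.rename("add",  "x3", ["x3", "x6"]),   # add  x3, x3, x6
    r.rename("ld",   "x6", ["x1"]),         # ld   x6, 0(x1)
]
r.commit()   # ld commits: P8 returns to the free list
r.commit()   # addi commits: P7 returns to the free list
```

The LPRd values produced for the first three instructions (P8, P7, P5) and the order in which P8 and P7 reappear on the free list agree with the fields that survive on slides 19 to 25.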

26 MIPS R10K Trap Handling
Rename table is repaired by unrenaming in reverse order using the PRd/LPRd fields
The Alpha had a similar physical register file scheme, but kept complete rename table snapshots for each instruction in ROB (80 snapshots total)
- Flash copy all bits from snapshot to active table in one cycle
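
Unrenaming in reverse order can be sketched as a walk over the ROB from youngest to oldest. This is an illustration only: the `unrename` function, the dictionary field names, and the register numbers are hypothetical, and it assumes every entry still in the ROB is being squashed.

```python
def unrename(table, free, rob):
    """Undo uncommitted renames, youngest first, using each entry's Rd/LPRd/PRd."""
    while rob:
        entry = rob.pop()                    # youngest remaining instruction
        table[entry["rd"]] = entry["lprd"]   # restore the previous mapping
        free.insert(0, entry["prd"])         # give back its destination register

# State after renaming three in-flight instructions, two of which write x3.
table = {"x1": "P0", "x3": "P3", "x6": "P5"}
free = ["P4"]
rob = [
    {"rd": "x1", "lprd": "P8", "prd": "P0"},
    {"rd": "x3", "lprd": "P7", "prd": "P1"},
    {"rd": "x3", "lprd": "P1", "prd": "P3"},
]
unrename(table, free, rob)
```

Reverse order matters here: walking oldest to youngest would leave x3 mapped to P1, a register belonging to a squashed instruction, instead of the committed mapping P7. The snapshot approach avoids the walk by copying a saved table back in one step, at the cost of storing one table per in-flight instruction.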

27 Reorder Buffer Holds Active Instructions (Decoded but not Committed)
[Figure: ROB contents at cycle t and cycle t + 1, with the Commit, Execute, and Fetch positions marked; both columns list the instructions below]
(Older instructions)
ld x1, (x3)
add x3, x1, x2
sub x6, x7, x9
add x3, x3, x6
ld x6, (x1)
add x6, x6, x3
sd x6, (x1)
ld x6, (x1)
(Newer instructions)

28 Separate Issue Window from ROB
The issue window holds only instructions that have been decoded and renamed but not issued into execution. Has register tags and presence bits, and pointer to ROB entry.
[Figure: issue window entry fields: use, ex, op, p1, PR1, p2, PR2, PRd, ROB#]
Reorder buffer used to hold exception information for commit.
[Figure: ROB entries between the Oldest and Free pointers, with fields Done?, Rd, LPRd, PC, Except?]
ROB is usually larger than issue window. Why?

29 Superscalar Register Renaming
During decode, instructions are allocated a new physical destination register
Source operands renamed to physical register with newest value
Execution unit only sees physical register numbers
[Figure: Inst 1 and Inst 2, each with Op, Dest, Src1, Src2; the Rename Table takes Read Addresses and returns Read Data; Write Ports apply Update Mapping from the Register Free List; each instruction leaves as Op, PDest, PSrc1, PSrc2]
Does this work?

30 Superscalar Register Renaming
[Figure: same datapath as slide 29, with =? comparators between Inst 1's Dest and Inst 2's sources selecting Inst 1's PDest in place of the Rename Table read data]
Must check for RAW hazards between instructions issuing in same cycle. Can be done in parallel with rename lookup.
MIPS R10K renames 4 serially-RAW-dependent insts/cycle
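
The intra-group RAW check can be sketched as follows. This is a software illustration of the idea rather than the R10K circuit: the `rename_group` function, the tuple encoding, and the register names are hypothetical. All table reads use the mapping from before the cycle, and comparators then override any source that matches the destination of an earlier instruction in the same group.

```python
def rename_group(group, table, free):
    """Rename instructions decoded in the same cycle.

    group: list of (dest, src1, src2) architectural register names.
    Returns a list of (pdest, psrc1, psrc2) and updates table and free in place.
    """
    old = dict(table)                        # read ports see pre-cycle state
    pdests = [free.pop(0) for _ in group]    # one new register per instruction
    renamed = []
    for i, (dest, *srcs) in enumerate(group):
        psrcs = []
        for s in srcs:
            p = old[s]
            for j in range(i):               # =? comparators; the latest match wins
                if group[j][0] == s:
                    p = pdests[j]
            psrcs.append(p)
        renamed.append((pdests[i], *psrcs))
    for (dest, *_), p in zip(group, pdests): # write ports, in program order
        table[dest] = p
    return renamed

table = {"x1": "P1", "x2": "P2", "x3": "P3"}
free = ["P10", "P11"]
group = [("x3", "x1", "x2"),    # inst 1: x3 <- x1 op x2
         ("x1", "x3", "x3")]    # inst 2: x1 <- x3 op x3, RAW on inst 1
renamed = rename_group(group, table, free)
```

Without the comparators, the second instruction would read the stale mapping P3 for x3 instead of P10, which is why the plain parallel lookup on the previous slide does not work on its own.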

31 Acknowledgements
This course is partly inspired by previous MIT and Berkeley CS252 computer architecture courses created by my collaborators and colleagues:
- Arvind (MIT)
- Joel Emer (Intel/MIT)
- James Hoe (CMU)
- John Kubiatowicz (UCB)
- David Patterson (UCB)


More information

Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.

Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2. Instruction-Level Parallelism and its Exploitation: PART 2 Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.8)

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture 1 L E C T U R E 4: D A T A S T R E A M S I N S T R U C T I O N E X E C U T I O N I N S T R U C T I O N C O M P L E T I O N & R E T I R E M E N T D A T A F L O W & R E G I

More information

Lecture: Out-of-order Processors. Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ

Lecture: Out-of-order Processors. Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ Lecture: Out-of-order Processors Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ 1 An Out-of-Order Processor Implementation Reorder Buffer (ROB)

More information

Donn Morrison Department of Computer Science. TDT4255 ILP and speculation

Donn Morrison Department of Computer Science. TDT4255 ILP and speculation TDT4255 Lecture 9: ILP and speculation Donn Morrison Department of Computer Science 2 Outline Textbook: Computer Architecture: A Quantitative Approach, 4th ed Section 2.6: Speculation Section 2.7: Multiple

More information

CS252 Graduate Computer Architecture Midterm 1 Solutions

CS252 Graduate Computer Architecture Midterm 1 Solutions CS252 Graduate Computer Architecture Midterm 1 Solutions Part A: Branch Prediction (22 Points) Consider a fetch pipeline based on the UltraSparc-III processor (as seen in Lecture 5). In this part, we evaluate

More information

Lecture-13 (ROB and Multi-threading) CS422-Spring

Lecture-13 (ROB and Multi-threading) CS422-Spring Lecture-13 (ROB and Multi-threading) CS422-Spring 2018 Biswa@CSE-IITK Cycle 62 (Scoreboard) vs 57 in Tomasulo Instruction status: Read Exec Write Exec Write Instruction j k Issue Oper Comp Result Issue

More information

Chapter 3 (CONT II) Instructor: Josep Torrellas CS433. Copyright J. Torrellas 1999,2001,2002,2007,

Chapter 3 (CONT II) Instructor: Josep Torrellas CS433. Copyright J. Torrellas 1999,2001,2002,2007, Chapter 3 (CONT II) Instructor: Josep Torrellas CS433 Copyright J. Torrellas 1999,2001,2002,2007, 2013 1 Hardware-Based Speculation (Section 3.6) In multiple issue processors, stalls due to branches would

More information

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!

More information

Computer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley

Computer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley Computer Systems Architecture I CSE 560M Lecture 10 Prof. Patrick Crowley Plan for Today Questions Dynamic Execution III discussion Multiple Issue Static multiple issue (+ examples) Dynamic multiple issue

More information

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University EE382A Lecture 7: Dynamic Scheduling Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 7-1 Announcements Project proposal due on Wed 10/14 2-3 pages submitted

More information

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 6 Pipelining Part 1

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 6 Pipelining Part 1 ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 6 Pipelining Part 1 Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall12.html

More information

Lecture 4 - Pipelining

Lecture 4 - Pipelining CS 152 Computer Architecture and Engineering Lecture 4 - Pipelining John Wawrzynek Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~johnw

More information

C 1. Last Time. CSE 490/590 Computer Architecture. ISAs and MIPS. Instruction Set Architecture (ISA) ISA to Microarchitecture Mapping

C 1. Last Time. CSE 490/590 Computer Architecture. ISAs and MIPS. Instruction Set Architecture (ISA) ISA to Microarchitecture Mapping CSE 49/59 Computer Architecture ISAs and MIPS Last Time Computer Architecture >> ISAs and RTL Comp. Arch. shaped by technology and applications Computer Architecture brings a quantitative approach to the

More information

Hardware-based Speculation

Hardware-based Speculation Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions

More information

CS 152 Computer Architecture and Engineering. Lecture 11 - Virtual Memory and Caches

CS 152 Computer Architecture and Engineering. Lecture 11 - Virtual Memory and Caches CS 152 Computer Architecture and Engineering Lecture 11 - Virtual Memory and Caches Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste

More information

This Set. Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Two Dynamic Scheduling Methods

This Set. Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Two Dynamic Scheduling Methods 10 1 Dynamic Scheduling 10 1 This Set Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Two Dynamic Scheduling Methods Not yet complete. (Material below may repeat

More information

Lecture 14: Multithreading

Lecture 14: Multithreading CS 152 Computer Architecture and Engineering Lecture 14: Multithreading John Wawrzynek Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~johnw

More information

15-740/ Computer Architecture Lecture 8: Issues in Out-of-order Execution. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 8: Issues in Out-of-order Execution. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 8: Issues in Out-of-order Execution Prof. Onur Mutlu Carnegie Mellon University Readings General introduction and basic concepts Smith and Sohi, The Microarchitecture

More information

ESE 545 Computer Architecture Instruction-Level Parallelism (ILP): Speculation, Reorder Buffer, Exceptions, Superscalar Processors, VLIW

ESE 545 Computer Architecture Instruction-Level Parallelism (ILP): Speculation, Reorder Buffer, Exceptions, Superscalar Processors, VLIW Computer Architecture ESE 545 Computer Architecture Instruction-Level Parallelism (ILP): Speculation, Reorder Buffer, Exceptions, Superscalar Processors, VLIW 1 Review from Last Lecture Leverage Implicit

More information

Reduction of Data Hazards Stalls with Dynamic Scheduling So far we have dealt with data hazards in instruction pipelines by:

Reduction of Data Hazards Stalls with Dynamic Scheduling So far we have dealt with data hazards in instruction pipelines by: Reduction of Data Hazards Stalls with Dynamic Scheduling So far we have dealt with data hazards in instruction pipelines by: Result forwarding (register bypassing) to reduce or eliminate stalls needed

More information

CS252 Spring 2017 Graduate Computer Architecture. Lecture 14: Multithreading Part 2 Synchronization 1

CS252 Spring 2017 Graduate Computer Architecture. Lecture 14: Multithreading Part 2 Synchronization 1 CS252 Spring 2017 Graduate Computer Architecture Lecture 14: Multithreading Part 2 Synchronization 1 Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Last Time in Lecture

More information

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation

More information

This Set. Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Three Dynamic Scheduling Methods

This Set. Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Three Dynamic Scheduling Methods 10-1 Dynamic Scheduling 10-1 This Set Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Three Dynamic Scheduling Methods Not yet complete. (Material below may

More information

Lecture 19: Instruction Level Parallelism

Lecture 19: Instruction Level Parallelism Lecture 19: Instruction Level Parallelism Administrative: Homework #5 due Homework #6 handed out today Last Time: DRAM organization and implementation Today Static and Dynamic ILP Instruction windows Register

More information

CS433 Homework 2 (Chapter 3)

CS433 Homework 2 (Chapter 3) CS Homework 2 (Chapter ) Assigned on 9/19/2017 Due in class on 10/5/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration..

More information

Computer Architecture Lecture 13: State Maintenance and Recovery. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/15/2013

Computer Architecture Lecture 13: State Maintenance and Recovery. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/15/2013 18-447 Computer Architecture Lecture 13: State Maintenance and Recovery Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/15/2013 Reminder: Homework 3 Homework 3 Due Feb 25 REP MOVS in Microprogrammed

More information

CSC 631: High-Performance Computer Architecture

CSC 631: High-Performance Computer Architecture CSC 631: High-Performance Computer Architecture Spring 2017 Lecture 4: Pipelining Last Time in Lecture 3 icrocoding, an effective technique to manage control unit complexity, invented in era when logic

More information

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 15: Instruction Level Parallelism and Dynamic Execution March 11, 2002 Prof. David E. Culler Computer Science 252 Spring 2002

More information

Lecture 9: Dynamic ILP. Topics: out-of-order processors (Sections )

Lecture 9: Dynamic ILP. Topics: out-of-order processors (Sections ) Lecture 9: Dynamic ILP Topics: out-of-order processors (Sections 2.3-2.6) 1 An Out-of-Order Processor Implementation Reorder Buffer (ROB) Branch prediction and instr fetch R1 R1+R2 R2 R1+R3 BEQZ R2 R3

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer rchitecture Spring 2016 Lecture 10: Out-of-Order Execution & Register Renaming Shuai Wang Department of Computer Science and Technology Nanjing University In Search of Parallelism Trivial Parallelism

More information

EITF20: Computer Architecture Part3.2.1: Pipeline - 3

EITF20: Computer Architecture Part3.2.1: Pipeline - 3 EITF20: Computer Architecture Part3.2.1: Pipeline - 3 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Dynamic scheduling - Tomasulo Superscalar, VLIW Speculation ILP limitations What we have done

More information

CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. Lecture 7 Memory III

CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. Lecture 7 Memory III CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture Lecture 7 Memory III Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste

More information

CPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation

CPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Instruction

More information

Instruction-Level Parallelism and Its Exploitation

Instruction-Level Parallelism and Its Exploitation Chapter 2 Instruction-Level Parallelism and Its Exploitation 1 Overview Instruction level parallelism Dynamic Scheduling Techniques es Scoreboarding Tomasulo s s Algorithm Reducing Branch Cost with Dynamic

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3 CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3 Instructor: Dan Garcia inst.eecs.berkeley.edu/~cs61c! Compu@ng in the News At a laboratory in São Paulo,

More information

CISC 662 Graduate Computer Architecture. Lecture 10 - ILP 3

CISC 662 Graduate Computer Architecture. Lecture 10 - ILP 3 CISC 662 Graduate Computer Architecture Lecture 10 - ILP 3 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

CS 152 Computer Architecture and Engineering. Lecture 18: Multithreading

CS 152 Computer Architecture and Engineering. Lecture 18: Multithreading CS 152 Computer Architecture and Engineering Lecture 18: Multithreading Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste

More information

CS 152 Computer Architecture and Engineering. Lecture 8 - Address Translation

CS 152 Computer Architecture and Engineering. Lecture 8 - Address Translation CS 152 Computer Architecture and Engineering Lecture 8 - Translation Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!

More information

Handout 2 ILP: Part B

Handout 2 ILP: Part B Handout 2 ILP: Part B Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism Loop unrolling by compiler to increase ILP Branch prediction to increase ILP

More information

CS252 Spring 2017 Graduate Computer Architecture. Lecture 15: Synchronization and Memory Models Part 2

CS252 Spring 2017 Graduate Computer Architecture. Lecture 15: Synchronization and Memory Models Part 2 CS252 Spring 2017 Graduate Computer Architecture Lecture 15: Synchronization and Memory Models Part 2 Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Project Proposal

More information

Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling)

Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) 18-447 Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 2/13/2015 Agenda for Today & Next Few Lectures

More information

Computer Architecture Lecture 14: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/18/2013

Computer Architecture Lecture 14: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/18/2013 18-447 Computer Architecture Lecture 14: Out-of-Order Execution Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/18/2013 Reminder: Homework 3 Homework 3 Due Feb 25 REP MOVS in Microprogrammed

More information

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1 CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level

More information

Static vs. Dynamic Scheduling

Static vs. Dynamic Scheduling Static vs. Dynamic Scheduling Dynamic Scheduling Fast Requires complex hardware More power consumption May result in a slower clock Static Scheduling Done in S/W (compiler) Maybe not as fast Simpler processor

More information

Review: Compiler techniques for parallelism Loop unrolling Ÿ Multiple iterations of loop in software:

Review: Compiler techniques for parallelism Loop unrolling Ÿ Multiple iterations of loop in software: CS152 Computer Architecture and Engineering Lecture 17 Dynamic Scheduling: Tomasulo March 20, 2001 John Kubiatowicz (http.cs.berkeley.edu/~kubitron) lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/

More information

Computer Architecture and Engineering CS152 Quiz #3 March 22nd, 2012 Professor Krste Asanović

Computer Architecture and Engineering CS152 Quiz #3 March 22nd, 2012 Professor Krste Asanović Computer Architecture and Engineering CS52 Quiz #3 March 22nd, 202 Professor Krste Asanović Name: This is a closed book, closed notes exam. 80 Minutes 0 Pages Notes: Not all questions are

More information

Adapted from David Patterson s slides on graduate computer architecture

Adapted from David Patterson s slides on graduate computer architecture Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Basic Compiler Techniques for Exposing ILP Advanced Branch Prediction Dynamic Scheduling Hardware-Based Speculation

More information