CS252 Graduate Computer Architecture Spring 2014 Lecture 6: Modern Out-of-Order Processors

1 CS252 Graduate Computer Architecture Spring 2014
Lecture 6: Modern Out-of-Order Processors
Krste Asanovic
http://inst.eecs.berkeley.edu/~cs252/s14
CS252, Spring 2014, Lecture 6

2 Last Time in Lecture 5
Decoupled execution
Simple out-of-order scoreboard for CDC6600
Tomasulo algorithm for register renaming

3 IBM 360/91 Floating-Point Unit (R. M. Tomasulo, 1967)
[Figure: instructions and load buffers (from memory) feed a floating-point regfile and reservation stations of tag/data entries in front of the Adder and Mult units; store buffers (to memory); a common bus carries <tag, result> pairs]
Distribute reservation stations to functional units
Common bus ensures that data is made available immediately to all the instructions waiting for it. Match tag, if equal, copy value & set presence "p".

4 Out-of-Order Fades into Background
Out-of-order processing implemented commercially in 1960s, but disappeared again until 1990s as two major problems had to be solved:
Precise traps
- Imprecise traps complicate debugging and OS code
- Note, precise interrupts are relatively easy to provide
Branch prediction
- Amount of exploitable instruction-level parallelism (ILP) limited by control hazards
Also, simpler machine designs in new technology beat complicated machines in old technology
- Big advantage to fit processor & caches on one chip
- Microprocessors had era of 1%/week performance scaling

5 Separating Completion from Commit
Re-order buffer holds register results from completion until commit
- Entries allocated in program order during decode
- Buffers completed values and exception state until in-order commit point
- Completed values can be used by dependents before committed (bypassing)
- Each entry holds program counter, instruction type, destination register specifier and value if any, and exception status (info often compressed to save hardware)
Memory reordering needs special data structures
- Speculative store address and data buffers
- Speculative load address and data buffers

6 In-Order Commit for Precise Traps
[Figure: pipeline Fetch, Decode, Reorder Buffer, Commit, with Execute beside the reorder buffer; the front end is in-order, the middle is out-of-order, commit is in-order; a "Trap?" check at commit sends Kill signals to the earlier stages and injects the handler PC]
In-order fetch and decode, and dispatch to buffers inside the reorder buffer
Issue from the reorder buffer is out-of-order
Out-of-order completion, values stored in temporary buffers
Commit is in-order, checks for traps, and if none updates architectural state
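The commit rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not any particular machine's design: the names `RobEntry`, `ReorderBuffer`, `allocate`, `complete`, and `commit` are made up for the example, and memory operations and the handler itself are left out.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    pc: int
    dest: Optional[str]          # architectural destination register, if any
    value: Optional[int] = None
    done: bool = False
    exception: bool = False


class ReorderBuffer:
    """Hypothetical reorder buffer: in-order allocate, any-order complete, in-order commit."""

    def __init__(self):
        self.entries = deque()   # oldest instruction on the left
        self.arch_regs = {}      # committed architectural state

    def allocate(self, pc, dest):
        entry = RobEntry(pc, dest)
        self.entries.append(entry)          # program order
        return entry

    def complete(self, entry, value=None, exception=False):
        # Completion may happen in any order; nothing architectural changes here.
        entry.value, entry.exception, entry.done = value, exception, True

    def commit(self):
        """Commit finished instructions from the head; return the trap PC or None."""
        while self.entries and self.entries[0].done:
            head = self.entries[0]
            if head.exception:
                self.entries.clear()        # kill the faulting and all younger instructions
                return head.pc              # PC handed to the trap handler
            if head.dest is not None:
                self.arch_regs[head.dest] = head.value
            self.entries.popleft()
        return None


rob = ReorderBuffer()
a = rob.allocate(0x100, "x1")
b = rob.allocate(0x104, "x2")
c = rob.allocate(0x108, "x3")
rob.complete(c, value=30)                   # youngest instruction finishes first
early_trap = rob.commit()                   # nothing commits: the head is not done
state_after_early_commit = dict(rob.arch_regs)
rob.complete(b, exception=True)
rob.complete(a, value=10)
trap_pc = rob.commit()                      # commits a, then traps on b
```

In this run the result of the youngest instruction is never written to `arch_regs`, even though it completed first, which is the property that makes the trap precise.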

7 Phases of Instruction Execution
[Figure: PC, I-cache, Fetch Buffer, Decode/Rename, Issue Buffer, Functional Units, Result Buffer, Commit, Architectural State]
Fetch: Instruction bits retrieved from instruction cache.
Decode: Instructions dispatched to appropriate issue buffer.
Execute: Instructions and operands issued to functional units. When execution completes, all results and exception flags are available.
Commit: Instruction irrevocably updates architectural state (aka "graduation"), or takes precise trap/interrupt.

8 In-Order versus Out-of-Order Phases
Fetch/decode/rename always in-order
- Need to parse ISA sequentially to get correct semantics
- Proposals for speculative OoO instruction fetch, e.g., Multiscalar. Predict control flow and data dependencies across sequential program segments fetched/decoded/executed in parallel, fixup if prediction wrong
Dispatch (place instruction into machine buffers to wait for issue) also always in-order
- Dispatch sometimes used to mean issue, but not in these lectures

9 In-Order Versus Out-of-Order Issue
In-order issue:
- Issue stalls on RAW dependencies or structural hazards, or possibly WAR/WAW hazards
- Instruction cannot issue to execution units unless all preceding instructions have issued to execution units
Out-of-order issue:
- Instructions dispatched in program order to reservation stations (or other forms of instruction buffer) to wait for operands to arrive, or other hazards to clear
- While earlier instructions wait in issue buffers, following instructions can be dispatched and issued out-of-order
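A toy model makes the difference between the two issue policies concrete. The sketch below assumes one issue slot per cycle, no structural hazards, and already-renamed code in which every register is written at most once (so only RAW dependencies matter). The function name `issue_cycles` and the four-instruction program are invented for the illustration.

```python
import math


def issue_cycles(program, out_of_order):
    """Return the cycle in which each instruction issues.

    program: list of (dest, sources, latency); registers are written at most once.
    """
    # A value that has not been produced yet is "ready" at infinity.
    ready_at = {dest: math.inf for dest, _, _ in program}
    pending = list(range(len(program)))      # not yet issued, in program order
    issued = {}
    cycle = 0
    while pending:
        # In-order issue may only consider the oldest pending instruction.
        window = pending if out_of_order else pending[:1]
        for idx in window:
            dest, srcs, latency = program[idx]
            if all(ready_at.get(s, 0) <= cycle for s in srcs):
                issued[idx] = cycle
                ready_at[dest] = cycle + latency
                pending.remove(idx)
                break                        # one issue per cycle
        cycle += 1
    return [issued[i] for i in range(len(program))]


program = [
    ("p1", ["p0"], 3),   # long-latency load
    ("p2", ["p1"], 1),   # depends on the load
    ("p3", ["p0"], 1),   # independent of the load
    ("p4", ["p3"], 1),   # depends on the independent instruction
]
in_order = issue_cycles(program, out_of_order=False)   # [0, 3, 4, 5]
ooo = issue_cycles(program, out_of_order=True)         # [0, 3, 1, 2]
```

With in-order issue the two independent instructions wait behind the stalled consumer of the load; with out-of-order issue they slip past it and the whole sequence has issued two cycles earlier.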

10 In-Order versus Out-of-Order Completion
All but the simplest machines have out-of-order completion due to different latencies of functional units and desire to bypass values as soon as available
Classic RISC 5-stage integer pipeline just barely has in-order completion
- Load takes two cycles, but following one-cycle integer op completes at the same time, not earlier
- Adding pipelined FPU immediately brings OoO completion

11 In-Order versus Out-of-Order Commit
In-order commit supports precise traps, standard today
- Some proposals to reduce the cost of in-order commit by retiring some instructions early to compact reorder buffer, but this is just an optimized in-order commit
Out-of-order commit was effectively what early OoO machines implemented (imprecise traps) as completion irrevocably changed machine state

12 OoO Design Choices
Where are reservation stations?
- Part of reorder buffer, or in separate issue window?
- Distributed by functional units, or centralized?
How is register renaming performed?
- Tags and data held in reservation stations, with separate architectural register file
- Tags only in reservation stations, data held in unified physical register file

13 Data-in-ROB Design (HP PA8000, Pentium Pro, Core2Duo, Nehalem)
[Figure: ROB drawn as a stack of entries between the Oldest and Free pointers; each entry holds v, i, Opcode, presence bit/Tag/Src1, presence bit/Tag/Src2, Reg, Result, Except?]
Managed as circular buffer in program order, new instructions dispatched to free slots, oldest instruction committed/reclaimed when done (p bit set on result)
Tag is given by index in ROB (Free pointer value)
In dispatch, non-busy source operands read from architectural register file and copied to Src1 and Src2 with presence bit p set. Busy operands copy tag of producer and clear p bit.
Set valid bit v on dispatch, set issued bit i on issue
On completion, search source tags, set p bit and copy data into src on tag match. Write result and exception flags to ROB.
On commit, check exception status, and copy result into architectural register file if no trap.
On trap, flush machine and ROB, set free=oldest, jump to handler
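The dispatch, completion, and commit rules above can be exercised with a small model. This is a sketch under simplifying assumptions: two-source register instructions only, no exceptions or flush, and a per-register map (here called `rename`) to find the newest in-flight producer. The class `DataInRob` and its method names are hypothetical.

```python
class DataInRob:
    """Toy data-in-ROB model: the tag of an instruction is its ROB index."""

    def __init__(self, size, arch_regs):
        self.size = size
        self.slots = [None] * size           # circular buffer of entries
        self.oldest = self.free = 0
        self.count = 0
        self.arch = dict(arch_regs)          # architectural register file
        self.rename = {}                     # arch reg -> tag of newest producer

    def _read(self, reg):
        tag = self.rename.get(reg)
        if tag is None:                      # not busy: read architectural file
            return {"p": True, "tag": None, "val": self.arch[reg]}
        producer = self.slots[tag]
        if producer["done"]:                 # bypass completed, uncommitted result
            return {"p": True, "tag": None, "val": producer["result"]}
        return {"p": False, "tag": tag, "val": None}   # wait on producer's tag

    def dispatch(self, op, dest, src1, src2):
        assert self.count < self.size, "ROB full"
        tag = self.free
        self.slots[tag] = {"op": op, "dest": dest, "done": False, "result": None,
                           "src": [self._read(src1), self._read(src2)]}
        self.rename[dest] = tag
        self.free = (self.free + 1) % self.size
        self.count += 1
        return tag

    def complete(self, tag, result):
        self.slots[tag]["result"] = result
        self.slots[tag]["done"] = True
        for entry in self.slots:             # broadcast: match tag, copy value, set p
            if entry is None:
                continue
            for src in entry["src"]:
                if not src["p"] and src["tag"] == tag:
                    src.update(p=True, tag=None, val=result)

    def commit(self):
        entry = self.slots[self.oldest]
        if entry is None or not entry["done"]:
            return False
        self.arch[entry["dest"]] = entry["result"]
        if self.rename.get(entry["dest"]) == self.oldest:
            del self.rename[entry["dest"]]   # architectural file is current again
        self.slots[self.oldest] = None
        self.oldest = (self.oldest + 1) % self.size
        self.count -= 1
        return True


rob = DataInRob(4, {"x1": 5, "x2": 7, "x3": 0})
t0 = rob.dispatch("add", "x3", "x1", "x2")   # both operands read at dispatch
t1 = rob.dispatch("add", "x1", "x3", "x2")   # first operand waits on tag t0
waiting_before = dict(rob.slots[t1]["src"][0])
rob.complete(t0, 12)
woken = rob.slots[t1]["src"][0]
committed = rob.commit()
```

The second instruction captures the tag of the first at dispatch and receives the value 12 on the completion broadcast, before the first instruction commits.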

14 Managing Rename for Data-in-ROB
Rename table associated with architectural registers, managed in decode/dispatch
[Figure: table with columns p, Tag, Value; one entry per arch. register]
If p bit set, then use value in architectural register file
Else, tag field indicates instruction that will/has produced value
For dispatch, read source operands <p, tag, value> from arch. regfile, and also read <p, result> from producing instruction in ROB, bypassing as needed. Copy to ROB
Write destination arch. register entry with <0, Free, _>, to assign tag to ROB index of this instruction
On commit, update arch. regfile with <1, _, Result>
On trap, reset table (all p=1)

15 Data Movement in Data-in-ROB Design
[Figure: Architectural Register File, ROB (Source Operands, Result Data) and Functional Units, connected by the transfers listed below]
Read operands during decode
Write sources in dispatch
Bypass newer values at dispatch
Read operands at issue
Write results at completion
Read results for commit
Write results at commit

16 Unified Physical Register File (MIPS R10K, Alpha 21264, Intel Pentium 4 & Sandy/Ivy Bridge)
Rename all architectural registers into a single physical register file during decode, no register values read
Functional units read and write from single unified register file holding committed and temporary registers in execute
Commit only updates mapping of architectural register to physical register, no data movement
[Figure: Decode Stage Register Mapping and Committed Register Mapping point into the Unified Physical Register File; Functional Units read operands at issue and write results at completion]

17 Lifetime of Physical Registers
Physical regfile holds committed and temporary values
Physical registers decoupled from ROB entries (no data in ROB)
    Original              Renamed
    ld x1, (x3)           ld P1, (Px)
    addi x3, x1, #4       addi P2, P1, #4
    sub x6, x7, x9        sub P3, Py, Pz
    add x3, x3, x6        add P4, P2, P3
    ld x6, (x1)           ld P5, (P1)
    add x6, x6, x3        add P6, P5, P4
    sd x6, (x1)           sd P6, (P1)
    ld x6, (x11)          ld P7, (Pw)
When can we reuse a physical register?
When next writer of same architectural register commits
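The reuse rule can be checked with a short renaming sketch. It assumes a free list handing out P1 to P7 in order and uses made-up names `Pu` and `Pv` for the initial mappings of x1 and x6, which the slide does not show; the class `Renamer` is likewise only an illustration.

```python
from collections import deque


class Renamer:
    """Toy unified-physical-register renamer with a free list."""

    def __init__(self, mapping, free):
        self.map = dict(mapping)       # architectural -> physical register
        self.free = deque(free)
        self.rob = deque()             # (arch dest, new preg, previous preg)

    def rename(self, op, dest, srcs):
        psrcs = [self.map[s] for s in srcs]       # read sources before remapping dest
        pdest = None
        if dest is None:                          # e.g. a store: no destination
            self.rob.append((None, None, None))
        else:
            pdest = self.free.popleft()
            self.rob.append((dest, pdest, self.map[dest]))
            self.map[dest] = pdest
        return op, pdest, psrcs

    def commit(self):
        """Commit the oldest instruction; free the register it superseded."""
        _, _, previous = self.rob.popleft()
        if previous is not None:
            self.free.append(previous)
        return previous


# Pu and Pv are hypothetical names for mappings the slide leaves unspecified.
r = Renamer({"x1": "Pu", "x3": "Px", "x6": "Pv", "x7": "Py", "x9": "Pz", "x11": "Pw"},
            ["P1", "P2", "P3", "P4", "P5", "P6", "P7"])
program = [
    ("ld", "x1", ["x3"]),
    ("addi", "x3", ["x1"]),
    ("sub", "x6", ["x7", "x9"]),
    ("add", "x3", ["x3", "x6"]),
    ("ld", "x6", ["x1"]),
    ("add", "x6", ["x6", "x3"]),
    ("sd", None, ["x6", "x1"]),
    ("ld", "x6", ["x11"]),
]
renamed = [r.rename(*inst) for inst in program]
freed = [r.commit() for _ in range(4)]
```

The fourth commit is the `add` that overwrites x3, and it frees P2, the register written by the earlier `addi`: P2 stays allocated until the next writer of x3 commits, because any instruction between the two writers may still read it.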

18 Physical Register Management
[Figure: rename machinery before any instruction is dispatched]
Rename Table: x1 -> P8, x3 -> P7, x6 -> P5, x7 -> P6
Physical Regs: P5 = <x6>, P6 = <x7>, P7 = <x3>, P8 = <x1>
Free List and ROB shown empty; ROB fields: use, ex, op, p1, PR1, p2, PR2, Rd, LPRd, PRd
Instruction sequence:
    ld x1, 0(x3)
    addi x3, x1, #4
    sub x6, x7, x6
    add x3, x3, x6
    ld x6, 0(x1)
(LPRd requires third read port on Rename Table for each instruction)

19 Physical Register Management
[Figure: ld x1, 0(x3) dispatched; ROB entry with use set, op = ld, PR1 = P7, Rd = x1, LPRd = P8]

20 Physical Register Management
[Figure: addi x3, x1, #4 dispatched; ROB entry with use set, op = addi, Rd = x3, LPRd = P7]

21 Physical Register Management
[Figure: sub x6, x7, x6 dispatched; ROB entry with use set, op = sub, PR1 = P6, PR2 = P5, Rd = x6, LPRd = P5]

22 Physical Register Management
[Figure: add x3, x3, x6 dispatched; ROB entry with use set, op = add, Rd = x3]

23 Physical Register Management
[Figure: ld x6, 0(x1) dispatched; ROB entry with use set, op = ld, Rd = x6]

24 Physical Register Management
[Figure: Execute & Commit; the first ld has executed (ex set) and written a new <x1> value into the physical register file; on commit its LPRd, P8, is returned to the Free List]

25 Physical Register Management
[Figure: Execute & Commit; addi has executed and written a new <x3> value; on commit its LPRd, P7, joins P8 on the Free List]

26 MIPS R10K Trap Handling
Rename table is repaired by unrenaming instructions in reverse order using the PRd/LPRd fields
The Alpha 21264 had similar physical register file scheme, but kept complete rename table snapshots for each instruction in ROB (80 snapshots total)
- Flash copy all bits from snapshot to active table in one cycle
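The reverse-order repair can be sketched as follows. The starting rename table is the one from the Physical Register Management slides; the free-list order P0 to P4 is an assumption made for the example, and `rename_dest` and `unrename_from` are hypothetical helper names, not R10K terminology.

```python
from collections import deque


def rename_dest(table, free, rob, rd):
    """Allocate a new physical register for rd and log (Rd, LPRd, PRd) in the ROB."""
    prd = free.popleft()
    rob.append((rd, table[rd], prd))
    table[rd] = prd


def unrename_from(table, free, rob, trap_index):
    """Undo the renames of the trapping instruction and everything younger."""
    while len(rob) > trap_index:
        rd, lprd, prd = rob.pop()      # youngest first
        table[rd] = lprd               # restore the previous mapping
        free.appendleft(prd)           # hand the speculative register back


table = {"x1": "P8", "x3": "P7", "x6": "P5", "x7": "P6"}
free = deque(["P0", "P1", "P2", "P3", "P4"])   # assumed allocation order
rob = []
# Destinations of: ld x1; addi x3; sub x6; add x3; ld x6
for rd in ["x1", "x3", "x6", "x3", "x6"]:
    rename_dest(table, free, rob, rd)
speculative = dict(table)

unrename_from(table, free, rob, trap_index=2)  # trap on the sub (third instruction)
```

Walking youngest to oldest matters because x3 and x6 are each renamed twice: undoing the entries in program order would leave the table pointing at an intermediate mapping rather than the one in force just before the trapping instruction. The snapshot scheme on the slide avoids the walk by copying a saved table in one step, at the cost of storing one table per in-flight instruction.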

27 Reorder Buffer Holds Active Instructions (Decoded but not Committed)
[Figure: ROB contents at cycle t and at cycle t + 1, listed from older to newer instructions, with Commit, Execute and Fetch pointers marking progress through the list]
    (Older instructions)
    ld x1, (x3)
    add x3, x1, x2
    sub x6, x7, x9
    add x3, x3, x6
    ld x6, (x1)
    add x6, x6, x3
    sd x6, (x1)
    ld x6, (x1)
    (Newer instructions)

28 Separate Issue Window from ROB
The issue window holds only instructions that have been decoded and renamed but not issued into execution. Has register tags and presence bits, and pointer to ROB entry.
[Figure: issue window entry fields: use, ex, op, p1, PR1, p2, PR2, PRd, ROB#]
Reorder buffer used to hold exception information for commit.
[Figure: ROB between Oldest and Free pointers; entry fields: Done?, Rd, LPRd, PC, Except?]
ROB is usually larger than issue window. Why?

29 Superscalar Register Renaming
During decode, instructions allocated new physical destination register
Source operands renamed to physical register with newest value
Execution unit only sees physical register numbers
[Figure: Inst 1 and Inst 2, each with Op, Dest, Src1, Src2; the Dest fields drive the Update Mapping write ports and the Src fields drive the read addresses of the Rename Table; read data and the Register Free List produce Op, PDest, PSrc1, PSrc2 for each instruction]
Does this work?

30 Superscalar Register Renaming
[Figure: same datapath as the previous slide, with "=?" comparators between the Dest field of Inst 1 and the Src fields of Inst 2, selecting PDest of Inst 1 in place of the Rename Table read data on a match]
Must check for RAW hazards between instructions issuing in same cycle. Can be done in parallel with rename lookup.
MIPS R10K renames 4 serially-RAW-dependent insts/cycle
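The comparator logic can be written out as a short function. This is a sketch with invented names (`rename_group`, the `P10`/`P11` free registers): the table lookups for the whole group use the mapping from before the cycle, and the comparison against older destinations in the same group then overrides any stale lookup.

```python
def rename_group(table, free, group):
    """Rename instructions decoded in the same cycle.

    group: list of (dest, src1, src2) architectural register names, oldest first.
    Returns a list of (pdest, psrc1, psrc2) and updates table and free in place.
    """
    # "Parallel" part: every lookup sees the rename table as it was before this cycle.
    looked_up = [(table[s1], table[s2]) for _, s1, s2 in group]
    pdests = [free.pop(0) for _ in group]

    renamed = []
    for i, (_, s1, s2) in enumerate(group):
        ps1, ps2 = looked_up[i]
        # Comparators: an older instruction in this group writing the same
        # architectural register overrides the table; the newest match wins.
        for j in range(i):
            if group[j][0] == s1:
                ps1 = pdests[j]
            if group[j][0] == s2:
                ps2 = pdests[j]
        renamed.append((pdests[i], ps1, ps2))

    # Table update in program order, so the last writer in the group wins.
    for (dest, _, _), pdest in zip(group, pdests):
        table[dest] = pdest
    return renamed


table = {"x1": "P1", "x2": "P2", "x3": "P3"}
free = ["P10", "P11"]
group = [("x3", "x1", "x2"),    # add x3, x1, x2
         ("x1", "x3", "x3")]    # add x1, x3, x3 reads the x3 written just above
renamed = rename_group(table, free, group)
```

Without the comparison step the second instruction would read P3, the stale mapping of x3, which is why the plain two-port lookup on the previous slide does not work on its own.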

31 Acknowledgements
This course is partly inspired by previous MIT and Berkeley CS252 computer architecture courses created by my collaborators and colleagues:
- Arvind (MIT)
- Joel Emer (Intel/MIT)
- James Hoe (CMU)
- John Kubiatowicz (UCB)
- David Patterson (UCB)
