CS 152, Spring 2012 Section 8


1 CS 152, Spring 2012 Section 8 Christopher Celio University of California, Berkeley

2 Agenda: More Out-of-Order

3 Intel Core 2 Duo (Penryn) vs. NVidia GTX 280
Intel Core 2 Duo (Penryn): dual-core; nm; 410 million transistors; ~2 GHz; 3 or 6 MB of cache; Watts; 107 mm² (each core is 22 mm², L2 SRAM is 6 mm²/MB)
NVidia GTX 280: 10 core(?) (240 stream processors); nm; 1.4 billion transistors; 576 mm²; MHz (core clock); 236 Watts!!!

4 Quiz 2 Will be returned this Tuesday

5 Out-of-Order Control Complexity: MIPS R10000 Control Logic [ SGI/MIPS Technologies Inc., 1995 ] March 14, 2011 CS152, Spring

6 Out of Order Processors. Yeager. The MIPS R10000 Superscalar Microprocessor. IEEE Micro, 1996.

7 Out of Order Processors

8 BOOM: A Single Issue Slot
Question 1: collect CPI with BHT, without, compare to 5-stage
Question 2: Probe the Instruction Window for the potential benefit of dual issue
Question 3: Probe IW for dual issue of ALU/Mem ops
Figure: one issue slot. Fields: Val (issue slot is valid), UOP Code, BrMask, Ctrl..., RDst (Physical Destination Register), RS1 and RS2 (Physical Source Registers) with ready bits p1 and p2. Br Logic resolves or kills the slot. WDest0 and WDest1 (from the register file's two write ports) are compared against RS1 and RS2. The slot raises request to the Issue Select Logic, which returns issue; the Control Signals and register specifiers are then issued to the Register Read stage.

9 BOOM: A Single Issue Slot
BrMask: each instruction gets a br mask... allows us to kill instructions (Br Logic: Resolve or Kill)
Val: issue slot is valid
UOP Code: uop holds the micro-op code (is it a LD, an ADD, etc.)
WDest0, WDest1 (from the register file's two write ports): the register file has two write ports, so watch both ports' write addresses and compare them against RS1 and RS2 to set the ready bits p1 and p2
request: each slot asserts request when ready to fire
issue: one slot gets the issue from the Issue Select Logic
Issued to the Register Read stage: Control Signals, Physical Destination Register, Physical Source Registers
(note: I show a bus implementation, but it's actually implemented with a bunch of muxes)
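
The issue-slot behavior described on this slide can be sketched in a few lines of Python. This is only an illustrative model, not BOOM's actual implementation: the class name `IssueSlot`, the method names, and the fixed-priority `select` policy are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class IssueSlot:
    valid: bool        # Val: slot holds an instruction
    uop: str           # micro-op code, e.g. "ADD" or "LD"
    br_mask: int       # one bit per unresolved branch this uop depends on
    rdst: int          # physical destination register
    rs1: int           # physical source register 1
    rs2: int           # physical source register 2
    p1: bool = False   # source 1 ready
    p2: bool = False   # source 2 ready

    def wakeup(self, wdest0, wdest1):
        """Watch both register-file write ports; None means the port is idle."""
        for wdest in (wdest0, wdest1):
            if wdest is None:
                continue
            if wdest == self.rs1:
                self.p1 = True
            if wdest == self.rs2:
                self.p2 = True

    def resolve(self, branch_bit):
        """Branch predicted correctly: clear its bit from the mask."""
        self.br_mask &= ~branch_bit

    def kill_if_mispredicted(self, branch_bit):
        """Branch mispredicted: invalidate the slot if it depends on it."""
        if self.br_mask & branch_bit:
            self.valid = False

    @property
    def request(self):
        return self.valid and self.p1 and self.p2


def select(slots):
    """Grant issue to one requesting slot (lowest index first: an assumption)."""
    for i, slot in enumerate(slots):
        if slot.request:
            slot.valid = False   # the slot is freed once it issues
            return i
    return None
```

A slot only requests once both source tags have been seen on a write port, and at most one slot is granted per call to `select`, matching the single-issue design.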

10 OOO Styles

11 Data-in-ROB Design (HP PA8000, Intel Pentium Pro, Core2 Duo & Nehalem)
Figure: Register File (holds only committed state); Reorder buffer with columns Ins#, use, exec, op, p1, src1, p2, src2, pd, dest, data and entries t1, t2, ..., tn; Load Unit, FU, FU, FU, Store Unit; Commit; results return as < t, result >.
On dispatch into ROB, ready sources can be in regfile or in ROB dest (copied into src1/src2 if ready before dispatch).
On completion, write to dest field and broadcast to src fields.
On issue, read from ROB src fields.
March 9, 2011 CS152, Spring
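
The three rules on this slide (dispatch, completion, issue) can be traced with a small Python model. It is a rough sketch under simplifying assumptions: the class `DataInROB`, its method names, the `OPS` table, and the dictionary-based ROB entries are hypothetical and stand in for the hardware structures.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}  # illustrative ops only


class DataInROB:
    def __init__(self, regfile):
        self.regfile = dict(regfile)  # committed state only
        self.rob = []                 # entries in program order
        self.latest = {}              # arch reg -> tag of newest in-flight writer
        self.next_tag = 0

    def _entry(self, tag):
        return next(e for e in self.rob if e["tag"] == tag)

    def _read_source(self, reg):
        """Return (ready, value_or_tag) for one source at dispatch."""
        tag = self.latest.get(reg)
        if tag is None:
            return True, self.regfile[reg]      # value is in the regfile
        producer = self._entry(tag)
        if producer["done"]:
            return True, producer["data"]       # value is in a ROB dest field
        return False, tag                       # wait for the broadcast

    def dispatch(self, op, dest, src1, src2):
        p1, s1 = self._read_source(src1)
        p2, s2 = self._read_source(src2)
        tag = self.next_tag
        self.next_tag += 1
        self.rob.append(dict(tag=tag, op=op, dest=dest, p1=p1, src1=s1,
                             p2=p2, src2=s2, done=False, data=None))
        self.latest[dest] = tag
        return tag

    def ready(self, tag):
        e = self._entry(tag)
        return e["p1"] and e["p2"] and not e["done"]

    def execute(self, tag):
        """Issue reads the ROB src fields; completion writes dest and broadcasts."""
        e = self._entry(tag)
        result = OPS[e["op"]](e["src1"], e["src2"])
        e["data"], e["done"] = result, True
        for other in self.rob:
            if not other["p1"] and other["src1"] == tag:
                other["p1"], other["src1"] = True, result
            if not other["p2"] and other["src2"] == tag:
                other["p2"], other["src2"] = True, result

    def commit(self):
        """Retire completed entries in order, copying data into the regfile."""
        while self.rob and self.rob[0]["done"]:
            e = self.rob.pop(0)
            self.regfile[e["dest"]] = e["data"]
            if self.latest.get(e["dest"]) == e["tag"]:
                del self.latest[e["dest"]]
```

Note that commit here moves data from the ROB into the register file, which is the data movement that the unified physical register file design avoids.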

12 Unified Physical Register File (MIPS R10K, Alpha 21264, Intel Pentium 4 & Sandy Bridge)
Rename all architectural registers into a single physical register file during decode, no register values read.
Functional units read and write from single unified register file holding committed and temporary registers in execute.
Commit only updates mapping of architectural register to physical register, no data movement.
Figure: Decode Stage Register Mapping; Read operands at issue; Unified Physical Register File; Committed Register Mapping; Write results at completion; Functional Units.
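
A minimal Python sketch of the unified physical register file idea follows. The class `UnifiedPRF`, its free-list policy, and the register naming (`r0`, `r1`, ...) are assumptions for illustration; real designs differ in how they track and reclaim physical registers.

```python
class UnifiedPRF:
    def __init__(self, num_arch, num_phys):
        self.prf = [0] * num_phys                         # one unified register file
        self.map = {f"r{i}": i for i in range(num_arch)}  # decode-stage mapping
        self.committed = dict(self.map)                   # committed mapping
        self.free = list(range(num_arch, num_phys))       # free physical registers

    def rename(self, dest, src1, src2):
        """Decode: look up source tags and allocate a destination; no values read."""
        ps1, ps2 = self.map[src1], self.map[src2]
        pd = self.free.pop(0)
        self.map[dest] = pd
        return {"dest": dest, "pd": pd, "ps1": ps1, "ps2": ps2}

    def execute(self, uop, fn):
        """Read operands at issue, write the result at completion."""
        self.prf[uop["pd"]] = fn(self.prf[uop["ps1"]], self.prf[uop["ps2"]])

    def commit(self, uop):
        """Commit updates the mapping only; the old physical register is freed."""
        old = self.committed[uop["dest"]]
        self.committed[uop["dest"]] = uop["pd"]
        self.free.append(old)
```

The result value stays in the same physical register from completion through commit; only the committed mapping changes.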

13 21264 Instruction Reordering
As mentioned earlier, uses explicit renaming, as opposed to data-in-ROB design.
What does ROB hold?

14 BOOM
Pipeline stages: Fetch, Decode, Rename, Dispatch, Issue, RegisterRead, Execute, Memory, WB
Figure: Branch Prediction; Br Logic (Resolve Branch); Fetch; Fetch Buffer; Decode; Register Rename; Issue Window; Unified Register File (2R, 2W); ALU; LAQ, SAQ, SDQ; ROB; Commit; Data Mem (addr, wdata, rdata).


15 DEC Alpha 21264
1996/1997
single-core
4-way out-of-order
highly speculative
7-stage
up to 80 instructions in flight
tournament branch predictor
15.2M transistors (6M for logic; rest is caching, history tables)
350 nm
600 MHz
64KB I$, 64KB D$ (on-chip)
1 to 16MB L2$ (off-chip)
314 mm² die (fairly large)

16 DEC Alpha 21264

17 21264 Register Renaming
Registers are renamed, then instructions are inserted into the issue queue (window).
Map table backed up on every in-flight instruction.
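
To make the map-table backup concrete, here is a small Python sketch that saves a copy of the map table for every renamed instruction and restores it on a squash. It is an illustration only: the class `RenameWithSnapshots`, the list-based free list, and the choice to snapshot after each instruction's own rename are assumptions, not the 21264's documented mechanism.

```python
class RenameWithSnapshots:
    def __init__(self, num_arch, num_phys):
        self.map = list(range(num_arch))             # arch reg index -> phys reg
        self.free = list(range(num_arch, num_phys))  # free physical registers
        self.inflight = []                           # (map snapshot, pd) per insn

    def rename(self, dest, srcs):
        phys_srcs = [self.map[s] for s in srcs]
        pd = self.free.pop(0)
        self.map[dest] = pd
        # Back up the map table as it stands right after this instruction.
        self.inflight.append((list(self.map), pd))
        return len(self.inflight) - 1, pd, phys_srcs

    def squash_after(self, idx):
        """Discard everything younger than in-flight instruction idx."""
        snapshot, _ = self.inflight[idx]
        squashed = self.inflight[idx + 1:]
        self.inflight = self.inflight[:idx + 1]
        self.map = list(snapshot)                    # one-step recovery
        self.free = [pd for _, pd in squashed] + self.free
```

Because every in-flight instruction has its own snapshot, recovery from a misprediction at any instruction is a single table restore rather than a walk back through the younger instructions.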

18 21264 Register Renaming
What hazards does renaming obviate?
In what situations is renaming useful?
If you had to choose between branch prediction and renaming, which would you pick?

19 21264 Register Renaming
What hazards does renaming obviate? WAR, WAW
In what situations is renaming useful?
If you had to choose between branch prediction and renaming, which would you pick?

20 21264 Register Renaming
What hazards does renaming obviate? WAR, WAW
In what situations is renaming useful? Code with ILP and name dependencies: loops
If you had to choose between branch prediction and renaming, which would you pick?

21 21264 Register Renaming
What hazards does renaming obviate? WAR, WAW
In what situations is renaming useful? Code with ILP and name dependencies: loops
If you had to choose between branch prediction and renaming, which would you pick? Not much ILP within a basic block, so renaming isn't too useful without branch prediction.
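
The claim that renaming removes WAR and WAW hazards can be checked on a tiny instruction sequence. The sketch below is a toy illustration: the `(dest, src1, src2)` tuple encoding and the helper names `hazards` and `rename` are made up for this example, and the hazard check compares register names pairwise, so it is conservative.

```python
import itertools


def hazards(insns):
    """insns: list of (dest, src1, src2). Returns {(kind, older, younger)}."""
    found = set()
    for j, (dj, *srcs_j) in enumerate(insns):
        for i in range(j):
            di, *srcs_i = insns[i]
            if di in srcs_j:
                found.add(("RAW", i, j))   # younger reads what older wrote
            if dj in srcs_i:
                found.add(("WAR", i, j))   # younger overwrites what older reads
            if dj == di:
                found.add(("WAW", i, j))   # both write the same name
    return found


def rename(insns):
    """Give every destination a fresh physical name (p0, p1, ...)."""
    table = {}                 # architectural name -> current physical name
    counter = itertools.count()
    renamed = []
    for dest, s1, s2 in insns:
        p1, p2 = table.get(s1, s1), table.get(s2, s2)
        pd = f"p{next(counter)}"
        table[dest] = pd
        renamed.append((pd, p1, p2))
    return renamed


# Two "loop iterations" that reuse r1 and r4.
code = [
    ("r1", "r2", "r3"),   # i0
    ("r4", "r1", "r5"),   # i1: reads r1 from i0
    ("r1", "r6", "r7"),   # i2: reuses r1 (WAW with i0, WAR with i1)
    ("r4", "r1", "r5"),   # i3: reads r1 from i2, reuses r4 (WAW with i1)
]
before = hazards(code)
after = hazards(rename(code))
```

After renaming, only the two true dependencies remain (i0 to i1 and i2 to i3), so the second iteration no longer has to wait behind the first because of register-name reuse.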

22 21264 Superscalar Execution
Couldn't fit full bypassing into one clock cycle.
Instead, they fully bypass within each of two clusters; inter-cluster bypass takes another cycle.

23 Question: Stores
When are stores sent to memory? At commit time.
Why are stores saved in a store buffer before commit time? So they can be forwarded to dependent loads.
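
Both answers can be illustrated with a short Python sketch of a store buffer. This is a simplified model under stated assumptions: the class `StoreBuffer` and its method names are hypothetical, addresses match only on exact equality, and partial-width overlaps are ignored.

```python
class StoreBuffer:
    def __init__(self, memory):
        self.memory = memory   # dict of addr -> value; committed state only
        self.pending = []      # uncommitted stores in program order: (addr, data)

    def store(self, addr, data):
        """Executing a store only buffers it; memory is untouched."""
        self.pending.append((addr, data))

    def load(self, addr):
        """Forward from the youngest matching store, else read memory."""
        for st_addr, data in reversed(self.pending):
            if st_addr == addr:
                return data
        return self.memory.get(addr, 0)

    def commit_oldest(self):
        """At commit time the oldest store is finally sent to memory."""
        addr, data = self.pending.pop(0)
        self.memory[addr] = data

    def squash(self):
        """On a misspeculation the buffered stores simply disappear."""
        self.pending.clear()
```

Holding stores until commit keeps memory free of speculative state, while forwarding lets a dependent load proceed without waiting for the store to commit.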

24 BOOM: LD/ST Unit
Figure: SDQ (val, data), SAQ (val, addr), and LAQ (val, addr, st_mask) feed the LD/ST Compare logic; addr and wdata go to Data Mem, and rdata returns to the RF. Compare inputs: sta_val, std_val, st_addr_eq, ld_val, st_mask.
Only showing comparison logic for one load.
ld_is_rdy: load is ready to fire
ld_is_byp: load can be bypassed out of SDQ
byp_idx: location in SDQ to get ld data from

25 BOOM
Pipeline stages: Fetch, Decode, Rename, Dispatch, Issue, RegisterRead, Execute, Memory, WB
Figure: Branch Prediction; Br Logic (Resolve Branch); Fetch; Fetch Buffer; Decode; Register Rename; Issue Window; Unified Register File (2R, 2W); ALU; LAQ, SAQ, SDQ; ROB; Commit; Data Mem (addr, wdata, rdata).
single issue
6-stage
full branch speculation (BHT)
magic, 1-cycle memory (no caches)
no bypasses
no floating point
no exceptions

26 Memory Ordering in the 21264
To execute the critical instruction path quickly, want to execute loads ASAP.
Initially, loads speculatively bypass stores.
On a misspeculation, set a wait bit for that load's PC, so it will behave conservatively from then on.
Clear wait bits periodically.
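
The wait-bit policy can be sketched as a small PC-indexed table. The table size, the index function, and the names `WaitTable`, `may_bypass_stores`, and `record_misspeculation` are assumptions made for this illustration, not details taken from the slide.

```python
class WaitTable:
    def __init__(self, entries=1024):          # table size is an assumption
        self.bits = [False] * entries

    def _index(self, pc):
        return (pc >> 2) % len(self.bits)      # hypothetical hash of the load PC

    def may_bypass_stores(self, pc):
        """Loads speculate past older stores unless their wait bit is set."""
        return not self.bits[self._index(pc)]

    def record_misspeculation(self, pc):
        """A load that bypassed a conflicting store gets its wait bit set."""
        self.bits[self._index(pc)] = True

    def clear(self):
        """Called periodically so stale wait bits do not linger forever."""
        self.bits = [False] * len(self.bits)
```

Periodic clearing matters because a load that conflicted once may not conflict again, and a set wait bit costs performance on every later execution of that load.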

27 Speculation in the 21264
What does the 21264 speculate on?
Next I$ line/way
Branches, indirect jumps
Exceptions
Load/Store ordering
Load hit/miss (shortens hit time by a cycle)
Anything else?

28 Pentium ~mark/330/p6.html Pentium processor

29 Questions?
