" # " $ % & ' ( ) * + $ " % '* + * ' "

Similar documents
Multithreaded Processors. Department of Electrical Engineering Stanford University

250P: Computer Systems Architecture. Lecture 9: Out-of-order execution (continued) Anton Burtsev February, 2019

Handout 2 ILP: Part B

Lecture-13 (ROB and Multi-threading) CS422-Spring

Lecture: Out-of-order Processors. Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1)

November 7, 2014 Prediction

Computer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley

Lecture: SMT, Cache Hierarchies. Topics: memory dependence wrap-up, SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.

Lecture: SMT, Cache Hierarchies. Topics: memory dependence wrap-up, SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.

CPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline

Lecture: SMT, Cache Hierarchies. Topics: SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.1)

CS 152 Computer Architecture and Engineering. Lecture 12 - Advanced Out-of-Order Superscalars

Exploitation of instruction level parallelism

Hardware-Based Speculation

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1

Chapter 3 (CONT II) Instructor: Josep Torrellas CS433. Copyright J. Torrellas 1999,2001,2002,2007,

Hardware-Based Speculation

CPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

Simultaneous Multithreading Processor

Lecture 8: Branch Prediction, Dynamic ILP. Topics: static speculation and branch prediction (Sections )

E0-243: Computer Architecture

Processor Architecture

TDT 4260 lecture 7 spring semester 2015

EECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?)

Hardware-based Speculation

LSU EE 4720 Dynamic Scheduling Study Guide Fall David M. Koppelman. 1.1 Introduction. 1.2 Summary of Dynamic Scheduling Method 3

Announcements. ECE4750/CS4420 Computer Architecture L11: Speculative Execution I. Edward Suh Computer Systems Laboratory

Advanced issues in pipelining

Lecture 9: Dynamic ILP. Topics: out-of-order processors (Sections )

CS450/650 Notes Winter 2013 A Morton. Superscalar Pipelines

Lecture: SMT, Cache Hierarchies. Topics: memory dependence wrap-up, SMT processors, cache access basics (Sections B.1-B.3, 2.1)

CS 2410 Mid term (fall 2015) Indicate which of the following statements is true and which is false.

ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti

CS425 Computer Systems Architecture

5008: Computer Architecture

Lecture 14: Multithreading

Superscalar Processor Design

Lecture 26: Parallel Processing. Spring 2018 Jason Tang

Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University

Four Steps of Speculative Tomasulo cycle 0

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25

Lecture: Out-of-order Processors

Super Scalar. Kalyan Basu March 21,

Lecture 11: Out-of-order Processors. Topics: more ooo design details, timing, load-store queue

Pentium IV-XEON. Computer architectures M

CS 152 Computer Architecture and Engineering. Lecture 18: Multithreading

Chapter 4 The Processor 1. Chapter 4D. The Processor

Pipelining to Superscalar

Multithreading Processors and Static Optimization Review. Adapted from Bhuyan, Patterson, Eggers, probably others

Static vs. Dynamic Scheduling

Instruction-Level Parallelism and Its Exploitation (Part III) ECE 154B Dmitri Strukov

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

CISC 662 Graduate Computer Architecture Lecture 11 - Hardware Speculation Branch Predictions

Page 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls

Metodologie di Progettazione Hardware-Software

Adapted from instructor s. Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK]

EITF20: Computer Architecture Part3.2.1: Pipeline - 3

Reorder Buffer Implementation (Pentium Pro) Reorder Buffer Implementation (Pentium Pro)

Announcements. EE382A Lecture 6: Register Renaming. Lecture 6 Outline. Dynamic Branch Prediction Using History. 1. Branch Prediction (epilog)

CS425 Computer Systems Architecture

CS 152 Computer Architecture and Engineering

CS 152, Spring 2011 Section 8

Complex Pipelining COE 501. Computer Architecture Prof. Muhamed Mudawar

As the amount of ILP to exploit grows, control dependences rapidly become the limiting factor.

Computer Architecture: Multithreading (I) Prof. Onur Mutlu Carnegie Mellon University

CS 152 Computer Architecture and Engineering. Lecture 14: Multithreading

Computer Architecture 计算机体系结构. Lecture 4. Instruction-Level Parallelism II 第四讲 指令级并行 II. Chao Li, PhD. 李超博士

TDT 4260 TDT ILP Chap 2, App. C

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5)

Lecture 12 Branch Prediction and Advanced Out-of-Order Superscalars

Computer System Architecture Quiz #2 April 5th, 2019

Out of Order Processing

Instruction Level Parallelism

Lecture 18: Instruction Level Parallelism -- Dynamic Superscalar, Advanced Techniques,

Hardware-based Speculation

Simultaneous Multithreading (SMT)

Donn Morrison Department of Computer Science. TDT4255 ILP and speculation

Chapter. Out of order Execution

Spring 2010 Prof. Hyesoon Kim. Thanks to Prof. Loh & Prof. Prvulovic

CS 152 Computer Architecture and Engineering

Advanced d Processor Architecture. Computer Systems Laboratory Sungkyunkwan University

Multiple Instruction Issue and Hardware Based Speculation

Multi-cycle Instructions in the Pipeline (Floating Point)

Hyperthreading Technology

Chapter 4. The Processor

CS252 Spring 2017 Graduate Computer Architecture. Lecture 8: Advanced Out-of-Order Superscalar Designs Part II

Lecture 16: Core Design. Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue

CS 152, Spring 2012 Section 8

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution

Current Microprocessors. Efficient Utilization of Hardware Blocks. Efficient Utilization of Hardware Blocks. Pipeline

EEC 581 Computer Architecture. Instruction Level Parallelism (3.6 Hardware-based Speculation and 3.7 Static Scheduling/VLIW)

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Beyond ILP. Hemanth M Bharathan Balaji. Hemanth M & Bharathan Balaji

ESE 545 Computer Architecture Instruction-Level Parallelism (ILP): Speculation, Reorder Buffer, Exceptions, Superscalar Processors, VLIW

Case Study IBM PowerPC 620

Superscalar Processor

Transcription:

! )! # & ) * + * + * & *,+,-

A simple in-order pipeline (Figure 1) has six stages: Update Instruction Address (IA), Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory Access (ME), and Writeback Results (WB). The program counter feeds the instruction cache; decoded instructions read the register file, execute, access the data cache, and write results back, with forwarding paths returning results to earlier stages.

Consider three dependent instructions:

    r1 <- r2 + r3
    r3 <- mem[r1 + 8]
    r5 <- r3 - r4

    Cycle:  1   2   3   4   5   6   7   8   9   10
    I1:     IA  IF  ID  EX  ME  WB
    I2:         IA  IF  ID  *   EX  ME  WB
    I3:             IA  IF  ID  *   *   EX  ME  WB

Figure 2: Three instructions flow down a pipeline; the first forwards data to the second (shown with an arrow). The second instruction first stalls for one cycle, then forwards data to the third (* marks a stall cycle).
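The stall-and-forward timing above can be modeled with a small scoreboard. The sketch below is illustrative, not a design from the text: it assumes one instruction enters EX per cycle and that results are forwarded from the ME stage, so a value becomes usable in EX two cycles after its producer's EX; with those assumptions it reproduces the stall pattern of Figure 2.

```python
# Toy in-order pipeline timing model: computes the cycle in which each
# instruction occupies EX, inserting stall cycles until forwarded source
# operands are available. Assumption: results forward from ME, so a value
# is usable in EX two cycles after the producer's EX cycle.

def schedule(instrs):
    # instrs: list of (dest_register, source_registers)
    ready = {}          # earliest cycle a register's value can reach EX
    ex_cycles = []
    prev_ex = 3         # chosen so the first instruction's EX is cycle 4
    for dest, srcs in instrs:
        earliest = prev_ex + 1                          # in-order, 1 EX/cycle
        need = max([ready.get(s, 0) for s in srcs] + [0])
        prev_ex = max(earliest, need)                   # stall if operand late
        ex_cycles.append(prev_ex)
        ready[dest] = prev_ex + 2                       # forwarded from ME
    return ex_cycles

# r1 <- r2 + r3 ; r3 <- mem[r1 + 8] ; r5 <- r3 - r4
print(schedule([("r1", ("r2", "r3")),
                ("r3", ("r1",)),
                ("r5", ("r3", "r4"))]))   # [4, 6, 8]
```

The second instruction's EX slips from cycle 5 to 6 (one stall) and the third from 6 to 8, matching the bubbles in the timing diagram.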

An alternative way to exploit instruction-level parallelism is the VLIW (very long instruction word) datapath (Figure 3): a single instruction fetch and decode stage controls several functional units and memory-access pipelines in parallel, all sharing one register file. Independent operations are grouped into each long instruction by the compiler rather than by scheduling hardware.

A dynamically scheduled superscalar processor (Figure 4) combines many cooperating structures: a branch predictor and instruction cache feed a fetch buffer and decode pipeline; instructions pass through register rename and wait in issue buffers; execution units read and write physical register files; loads and stores pass through a load queue and store queue in front of an L1 data cache with MSHRs, backed by an L2 cache and main memory. A reorder buffer tracks every instruction in flight, forming the processor's instruction window.

The reorder buffer (ROB) keeps the in-flight instructions in program order. Each entry holds the instruction's program counter, its register mapping, a store flag, a completion flag, and any exception status. The buffer is managed as a circular queue:

- an entry is reserved at the tail when an instruction is dispatched;
- the entry is marked when the instruction completes;
- exceptions are recorded in the entry;
- entries are removed from the head once they are complete, and commit stops if an exception is present.

Figure 5: Reorder buffer; instructions are dispatched into the tail and exit from the head only after they have completed.
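The dispatch/complete/commit discipline can be sketched as a small model. The class below is illustrative; the field names and method interface are assumptions, not the text's design.

```python
from collections import deque

# Minimal reorder-buffer sketch: entries enter at the tail at dispatch and
# leave from the head only when complete; commit stops if the head entry
# recorded an exception.

class ROB:
    def __init__(self):
        self.entries = deque()   # head = left end, tail = right end

    def dispatch(self, pc):
        entry = {"pc": pc, "complete": False, "exception": None}
        self.entries.append(entry)           # reserve entry at the tail
        return entry

    def mark_complete(self, entry, exception=None):
        entry["complete"] = True
        entry["exception"] = exception       # record any exception

    def commit(self):
        committed = []
        while self.entries and self.entries[0]["complete"]:
            if self.entries[0]["exception"]:
                break                        # STOP: exception at the head
            committed.append(self.entries.popleft()["pc"])
        return committed

rob = ROB()
a, b, c = rob.dispatch(0x100), rob.dispatch(0x104), rob.dispatch(0x108)
rob.mark_complete(b)                 # out-of-order completion...
rob.mark_complete(a)
print(rob.commit())                  # ...in-order commit: prints [256, 260]
```

The third instruction remains in the buffer because it has not completed, even though everything ahead of it has committed.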

Dynamic scheduling operates on the dynamic instruction stream produced by instruction fetch. Consider a static loop body:

    loop: r3 <- mem[r4 + r2]
          r7 <- mem[r5 + r2]
          r7 <- r7 * r3
          r1 <- r1 - 1
          mem[r6 + r2] <- r7
          r2 <- r2 + 8
          PC <- loop; r1 != 0

With the benefit of branch prediction, fetch follows the predicted backward branch, so the dynamic instruction stream contains one copy of the loop body per predicted iteration.

Figure 6: A block of instructions (on the left) are fetched (with the benefit of branch prediction) to form the dynamic instruction stream shown at right. A branch instruction appears as an assignment to the program counter (PC).

Register renaming maps each logical register onto a physical register. Suppose the map initially holds r1->p3, r2->p4, r3->p6, r4->p1, r5->p2, r6->p7, r7->p5, and the free pool contains p8, p9, p10, ..., p24. To rename r3 <- mem[r4 + r2]: (a) the source registers r4 and r2 access the map and yield p1 and p4; (b) the first physical register in the free pool, p8, is assigned to the result register r3, the map is updated to r3->p8, and the renamed instruction is p8 <- mem[p1 + p4]. The next instruction, r7 <- mem[r5 + r2], similarly becomes p9 <- mem[p2 + p4].

Figure 7: The register renaming process. (a) First, source registers access the logical-to-physical register map to find their current mappings; (b) then the first physical register in the free pool is assigned to the result register and the register map table is updated.
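The two renaming steps can be sketched directly. The register names follow the example of Figure 7; the class itself is an illustrative assumption, not the text's hardware.

```python
from collections import deque

# Register-renaming sketch: logical sources are looked up in the map table,
# then the destination is assigned a fresh physical register from the free pool.

class Renamer:
    def __init__(self, mapping, free):
        self.map = dict(mapping)      # logical -> physical map table
        self.free = deque(free)       # free pool of physical registers

    def rename(self, dest, srcs):
        phys_srcs = [self.map[s] for s in srcs]   # (a) current mappings
        phys_dest = self.free.popleft()           # (b) allocate from free pool
        self.map[dest] = phys_dest                # update the map table
        return phys_dest, phys_srcs

r = Renamer({"r1": "p3", "r2": "p4", "r3": "p6", "r4": "p1",
             "r5": "p2", "r6": "p7", "r7": "p5"},
            ["p8", "p9", "p10"])
print(r.rename("r3", ["r4", "r2"]))   # ('p8', ['p1', 'p4'])
print(r.rename("r7", ["r5", "r2"]))   # ('p9', ['p2', 'p4'])
```

The two outputs match the renamed instructions p8 <- mem[p1 + p4] and p9 <- mem[p2 + p4] in Figure 7.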

Renaming three iterations of the loop produces the stream below; the dispatch, issue, complete, and commit cycles of each instruction show that issue is out of order while commit remains in order.

    Renamed Stream             dispatch  issue  complete  commit
    p8  <- mem(p1 + p4)            0       1        4        5
    p9  <- mem(p2 + p4)            0       2        5        6
    p10 <- p9 * p8                 0       5       10       11
    p11 <- p3 - 1                  0       1        2       11
    mem(p7 + p4) <- p10            1       3       12       13
    p12 <- p4 + 8                  1       2        3       13
    PC <- loop; p11 != 0           1       2        3       13
    p13 <- mem(p1 + p12)           2       4        7       13
    p14 <- mem(p2 + p12)           2       5        8       14
    p15 <- p14 * p13               2       8       13       14
    p16 <- p11 - 1                 2       3        4       14
    mem(p7 + p12) <- p15           3       6       15       15
    p17 <- p12 + 8                 3       4        5       15
    PC <- loop; p16 != 0           3       4        5       15
    p18 <- mem(p1 + p17)           4       7       10       15
    p19 <- mem(p2 + p17)           4       8       11       16
    p20 <- p19 * p18               4      11       16       17
    p21 <- p16 - 1                 4       5        6       17
    mem(p7 + p17) <- p20           5       9       18       19
    p22 <- p17 + 8                 5       6        7       19

Figure 8: Three iterations of the example instruction stream after renaming. Dispatch, issue, complete, and commit cycles illustrate out-of-order instruction issue and in-order instruction commit.

When the instruction at the head of the ROB has recorded an exception, the architected state must be restored before the exception is handled. The register map entries saved in the ROB are unwound so that the map again reflects the state at the excepting instruction, the program counter is backed up to that instruction's address, and pending stores younger than it are flushed rather than committed.

Figure 9: Example of ROB restoring architected state after an exception. The instruction at the head of the ROB has an exception. The register mapping and PC are backed up, and the pending store instruction is flushed.

Memory instructions complicate out-of-order execution because load and store addresses are not known until they are computed. The hardware must therefore buffer memory operations, decide when a load may safely access the data cache, and forward store data to dependent loads whose addresses match (Figure 10).

The load/store subsystem (Figure 10) generates addresses, translates them through the TLB, and probes the L1 data cache; MSHRs track outstanding misses to memory. Load addresses are held in a load address buffer and compared against the addresses of stores waiting in the store queue; on a match, the store's data is forwarded directly to the load. Stores sit in the store queue, first pending and then complete, until the ROB commits them, after which they drain through coalescing store buffers into the cache.

Figure 10: L1 data cache and buffering subsystem that allow load/store reordering with forwarding of load data.

Figure 11 details the comparison logic. Each load address entering the load address queue is compared against the store queue in two ways: Compare1 matches the store-queue tag of a pending store, enabling forwarding of store data as it arrives from the execution units, while Compare2 matches the addresses of valid store-queue entries, enabling forwarding of data already buffered in the store queue. Commit signals from the ROB mark store-queue entries valid for retirement.

Figure 11: Detailed drawing of load/store buffering and comparison logic.
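The store-queue search that Compare1 and Compare2 implement in parallel hardware can be sketched sequentially. The data layout below is an illustrative assumption.

```python
# Store-to-load forwarding sketch: a load searches the store queue from the
# youngest prior store to the oldest; on an address match the store's data is
# forwarded instead of reading the cache, or the load waits if the data has
# not yet arrived from the execution units.

def load(addr, store_queue, cache):
    # store_queue: oldest-first list of {"addr", "data"} entries, all older
    # than this load; "data" is None until the store's data arrives.
    for st in reversed(store_queue):          # youngest matching store wins
        if st["addr"] == addr:
            if st["data"] is not None:
                return st["data"]             # forward from the store queue
            return None                       # address matches, data pending
    return cache.get(addr, 0)                 # no conflict: read the cache

sq = [{"addr": 0x40, "data": 7}, {"addr": 0x48, "data": None}]
cache = {0x40: 1, 0x50: 9}
print(load(0x40, sq, cache))   # 7: forwarded, not the stale cached 1
print(load(0x50, sq, cache))   # 9: no conflicting store, read from cache
print(load(0x48, sq, cache))   # None: must wait for the store's data
```

Note that the load at 0x40 must take the store-queue value, not the stale value still in the cache.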

A more aggressive design issues loads speculatively before the addresses of all prior stores are known (Figure 12). Each issued load records the store-queue tag of the youngest prior store and whether it was forwarded. When a store address later arrives, Compare3 checks it against the issued loads; an address match with a load that was not forwarded means the speculation was wrong, and the pipeline is flushed and restarted from the offending load. Memory dependence prediction can reduce the frequency of such flushes [32].

Figure 12: Portion of load/store unit that implements speculative issuing of load instructions before prior store addresses are known.

Scaling up an out-of-order processor requires its resources to grow together (Figure 13). Issue width is linearly related to the instruction fetch resources (the achieved fetch width), to the commit width, and to the number of functional units, but roughly quadratically related to the window (ROB) size needed to sustain it. ROB size in turn is linearly related to the issue buffer size, the number of rename registers, and the load/store buffer sizes.

Table 1 compares window (ROB) size, issue buffer size, and issue width across several real processors: the Intel PentiumPro, Intel Core, Intel Pentium 4, IBM Power4, MIPS R10000, Alpha 21264, AMD Opteron, and HP PA-8000. Plotted on logarithmic scales, ROB size and issue buffer size both grow steadily with issue width.

Table 1: The relationship between window size (ROB) and issue width for some real processors.

THE 6600 BARREL AND SLOT

Multithreading dates to the early 1960s: the CDC 6600 [35] used it in its peripheral processors. Ten I/O programs share a single execution pipeline. Each program's state, a program counter and registers, rotates around a "barrel"; on every cycle, the program currently in the "slot" may issue an instruction to the shared, time-shared instruction control. A thread thus sees an effective memory latency of one barrel rotation, and the execution hardware stays busy while individual programs wait on memory.

Figure 14: CDC 6600 Barrel and Slot multi-threading.

The Denelcor HEP [3] applied the same principle to a pipelined scientific processor. Each process is represented by a process status word (PSW). PSWs circulate through a queue: a scheduler selects one, fetches its next instruction from instruction memory, reads register operands from register memory, and dispatches the operation to a function unit. Non-memory instructions return their PSW to the queue after a fixed increment delay, while memory instructions hold their PSW in a buffer of pending memory results until the memory responds.

Figure 15: Block diagram of the Denelcor HEP.
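The barrel's fixed rotation can be sketched as follows; the thread programs are invented for illustration.

```python
from itertools import cycle

# Barrel-style fine-grain multithreading sketch: each cycle the "slot" holds
# the next thread in a fixed rotation, and that thread issues one instruction
# if it has one pending; otherwise the slot goes idle for that cycle.

def barrel(threads, cycles):
    # threads: dict name -> list of instruction strings (oldest first)
    trace = []
    rotation = cycle(threads)                # fixed round-robin rotation
    for _ in range(cycles):
        name = next(rotation)                # thread currently in the slot
        if threads[name]:
            trace.append((name, threads[name].pop(0)))
        else:
            trace.append((name, "idle"))
    return trace

progs = {"T0": ["load", "add"], "T1": ["store"], "T2": ["add", "sub"]}
for slot in barrel(progs, 6):
    print(slot)   # T0/T1/T2 take turns; T1 goes idle once its program ends
```

Unlike a switch-on-event design, the rotation never adapts: an idle thread's slot is simply wasted, which is the price of the scheme's simplicity.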

Because many processes can be in flight at once, the HEP tolerates long memory latencies without forwarding or speculation; like the CDC 6600 peripheral processors, it keeps the pipeline full with instructions from independent threads. Modern simultaneous multithreading designs, such as hyperthreading in the Intel Pentium 4 [6], apply the same insight to out-of-order superscalar pipelines (Figure 16).

In the Pentium 4 pipeline, two threads have their own program counters and proceed through instruction fetch into per-thread uop queues, fed from a shared trace cache backed by the L1 cache. Allocation and rename logic assigns each thread's uops registers and reorder buffer entries; the scheduling, register read, execute, and register write stages are shared; and commit drains each thread's entries from the reorder buffer, with stores passing through a store buffer to the data cache.

Figure 16: Intel Pentium 4 hyperthreading.

Shared structures must keep the threads' entries apart. Each entry carries a thread identifier alongside its tag, and a lookup hits only when both the tag and the thread identifier match.

Figure 17: A thread identifier (TId) separates the entries belonging to different threads in a shared buffer or memory.
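The TId-qualified lookup of Figure 17 can be sketched as below. The field layout is an illustrative assumption, and hardware performs all the comparisons in parallel rather than in a loop.

```python
# Thread-ID tagging sketch: entries in a shared structure carry a thread
# identifier, and a lookup hits only when the entry is valid and both the
# address tag and the requesting thread's ID match.

def lookup(entries, tid, tag):
    for e in entries:
        if e["valid"] and e["tid"] == tid and e["tag"] == tag:
            return e["data"]                  # hit: tag and thread ID match
    return None                               # miss

buf = [{"valid": True, "tid": 0, "tag": 0x1A, "data": "A"},
       {"valid": True, "tid": 1, "tag": 0x1A, "data": "B"}]
print(lookup(buf, 0, 0x1A))   # 'A' -- thread 0's entry
print(lookup(buf, 1, 0x1A))   # 'B' -- same tag, other thread's entry
print(lookup(buf, 0, 0x2B))   # None -- miss
```

The same tag can legitimately appear once per thread; the TId is what prevents one thread from consuming the other's entry.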

Sharing one pipeline among several threads raises a resource-management problem: the hardware must decide how each shared resource is divided among threads and to what end. It is useful to separate objectives (what the sharing should achieve), policies (the rules that pursue an objective), and mechanisms (the hardware that implements a policy), and to distinguish two kinds of shared resources (Figure 18).

Figure 18: Objectives, policies, and mechanisms. Policies apply to two kinds of shared resources: capacity resources (buffers, queues, registers, and caches, which threads occupy over time) and bandwidth resources (pipeline stages and ports, which threads use cycle by cycle).

PERFORMANCE

The most obvious objective is performance: maximize the aggregate instruction throughput of all threads combined. A policy that optimizes only aggregate throughput, however, tends to favor the threads that use resources most efficiently and may starve the others.

FAIRNESS

A second objective is fairness: every thread should make progress, and no thread should be starved while others consume the shared resources. Strict round-robin scheduling, as in the CDC 6600 peripheral processors, is perfectly fair but gives up performance; most practical policies strike a balance between fair sharing and aggregate throughput.

ISOLATION

A third objective is isolation: the performance a thread achieves should be predictable and should not degrade arbitrarily because of what co-scheduled threads happen to do.

IMPLEMENTING OBJECTIVES

Objectives are implemented by choosing a policy for each shared resource and providing mechanisms that enforce it at the corresponding pipeline stage. Bandwidth resources and capacity resources call for different kinds of policies.

BANDWIDTH SHARING

A bandwidth resource is shared by deciding, cycle by cycle, which thread (or threads) may use it; round-robin selection among ready threads is the simplest such policy.

CAPACITY SHARING

A capacity resource can be partitioned, giving each thread a fixed share of the entries; pooled, letting threads compete freely for entries; or duplicated per thread. The Pentium 4 (Figure 19) mostly partitions its buffering resources between its two threads.

Figure 19: Pentium 4 hyper-threading mechanisms and policies. Across the stages (instruction fetch, dispatch, issue, register read, execute, memory access, write-back, commit), the uop queue, rename/allocation tables, issue buffers, load/store buffers, and ROB are partitioned per thread; the trace cache, registers, execution units, and data cache are shared. Fetch and commit bandwidth alternate round-robin between the threads, with pre-emption, and issue is first-ready, first-come-first-served (FR-FCFS).

FEEDBACK MECHANISMS

A policy often needs information beyond its own pipeline stage, for example the occupancy of downstream queues or the number of outstanding misses. Feedback mechanisms monitor status at later stages and report it back to the point where the policy acts.

POLICY COORDINATION

The policies at individual resources should not work at cross purposes: local policies should manage their resources in accordance with a single global policy (Figure 20).

Figure 20: Local policies manage local resources in accordance with a global policy.

Figure 21: A policy may incorporate a feedback mechanism that monitors the status at a later pipeline stage.

SCHEDULING GRANULARITY

Multithreaded processors also differ in the granularity at which threads are interleaved (Figure 22).

Figure 22: Multi-threaded scheduling policies: (a) coarse-grain, (b) fine-grain, (c) simultaneous. Each diagram plots the issue slots (issue width) used in successive cycles. A coarse-grain policy runs one thread for many cycles before switching; a fine-grain policy, as in the CDC 6600 peripheral processors and the HEP, switches threads every cycle; a simultaneous policy, as in the Pentium 4 (Figure 19), can fill a single cycle's issue slots from more than one thread.

THREAD SELECTION

Whatever the granularity, the hardware needs a rule for selecting the next thread, the simplest being the fixed round-robin rotation of the 6600 barrel.

Thread selection policies include: round-robin rotation; skipping threads that are stalled, for example on a TLB or cache miss; and ordering the ready instructions FCFS or first-ready FCFS (FR-FCFS), as in the Pentium 4.

WORK CONSERVATION

A sharing policy is work-conserving if a resource never goes idle while some thread has work for it. Fixed partitioning is not work-conserving: entries reserved for an idle thread are wasted, whereas a pooled resource automatically flows to the threads that can use it.

CAPACITY POLICIES
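The difference between partitioned and pooled (work-conserving) capacity sharing can be sketched with a toy allocator; the sizes and thread names here are invented for illustration.

```python
# Work-conservation sketch: a partitioned buffer refuses entries beyond a
# thread's fixed share even when the buffer has free space; a pooled buffer
# grants entries as long as any space remains.

def allocate(requests, total, per_thread_limit=None):
    # requests: thread ids asking for one entry each, in arrival order
    used = {}
    granted = 0
    for t in requests:
        if granted == total:
            break                             # buffer completely full
        if per_thread_limit is not None and used.get(t, 0) >= per_thread_limit:
            continue                          # partitioned: share exhausted
        used[t] = used.get(t, 0) + 1
        granted += 1
    return granted

reqs = ["T0"] * 6 + ["T1"]                    # T0 busy, T1 nearly idle
print(allocate(reqs, total=8))                      # pooled: grants 7
print(allocate(reqs, total=8, per_thread_limit=4))  # partitioned: grants 5
```

With the pool, all seven requests fit in the eight entries; with a fixed half-and-half partition, T0 is cut off at four entries while three entries of T1's share sit empty.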

Pooled capacity resources still need limits: without them, a single thread, for instance one streaming through memory with many outstanding misses, can fill a shared queue and squeeze the other threads out. Capacity policies therefore bound each thread's occupancy of pooled resources. Figure 23 illustrates the two basic scheduling granularities on a two-thread workload.

Figure 23: Scheduling of two threads with fine-grain round-robin scheduling and coarse-grain switch-on-event scheduling. Cycles lost to cache misses and pipeline stalls in one thread are covered by instructions from the other. With coarse-grain scheduling a thread runs until an event such as a cache miss forces a switch; with fine-grain scheduling the threads alternate every cycle. Fine-grain interleaving also reduces the need for forwarding hardware, because dependent instructions of the same thread are naturally separated in the pipeline (Figure 24).

Figure 24: Fine-grain multithreading of pipelines without forwarding hardware. There are more gaps due to stalls in the individual threads; however, fine-grain multi-threading is able to fill in most of the gaps.

SCHEDULING GRANULARITY

In a superscalar processor, instruction execution alternates between bursts of useful work and miss events.

Figure 25: Superscalar processors' instruction execution is interspersed with miss events (branch mispredictions and cache/TLB misses).

Miss events cap the instruction throughput a single thread can achieve, no matter how wide the machine; multithreading reclaims the issue slots a single thread would waste. Coarse-grain designs such as the IBM RS64 IV switch threads on exactly these events.

SINGLE THREAD POLICIES

When only one thread is active, partitioned resources should be recombined and granted to it, so that single-thread performance does not suffer; when additional threads become active, the resources are divided again.

Figure 26: Relationship between the number of active threads and the aggregate issue buffer size (total issue buffer entries on a 0-70 scale, for one to eight active threads).

FETCH UNIT MECHANISMS AND POLICIES

The fetch unit is the natural first point of control: the threads chosen for fetch determine the instruction mix available to every later pipeline stage.

Fetch policies range from simple round-robin over the active threads to feedback-driven schemes that favor threads making effective progress and throttle threads stalled behind cache misses. The IBM Power5, for example, steers fetch and dispatch using the occupancy of downstream structures.

INSTRUCTION ISSUE POLICIES

The best-known study of SMT fetch and issue policies is by Tullsen et al. [14]. Their ICOUNT policy fetches each cycle for the thread with the fewest instructions in the decode, rename, and queue stages; this favors threads that are draining their instructions quickly and keeps the issue queues balanced. Figure 27 shows ICOUNT clearly outperforming round-robin fetch as the number of threads grows.

Figure 27: Performance comparison of Round-Robin and ICOUNT fetch policies in an 8-way SMT processor (from [14]); instructions per cycle plotted against the number of threads.

RETIREMENT POLICIES

Commit bandwidth from the ROB must also be scheduled among threads; the IBM Power5, for example, alternates commit between its two threads round-robin.

FAIRNESS POLICIES
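The core of ICOUNT is a minimum-selection over per-thread counts of in-flight instructions. The sketch below (with invented counts) shows the decision rule; the real policy also breaks ties and respects fetch availability.

```python
# ICOUNT fetch-policy sketch (after Tullsen et al. [14]): each cycle, fetch
# for the thread with the fewest instructions in the pre-issue stages
# (decode, rename, and the instruction queues).

def icount_pick(inflight):
    # inflight: dict thread -> count of its instructions in pre-issue stages
    return min(inflight, key=inflight.get)

print(icount_pick({"T0": 12, "T1": 3, "T2": 7}))   # 'T1'
```

A thread clogging the queues, for example behind a load miss, accumulates a high count and is automatically fetched less until it drains.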

Fairness policies prevent any thread from monopolizing shared resources, typically by comparing each thread's throughput when multithreaded against its throughput running alone. ICOUNT has a fairness side effect: a thread hoarding queue entries automatically receives less fetch bandwidth.

EXPLICIT PRIORITY POLICIES

Software can also assign threads explicit priorities. The IBM Power5 supports per-thread priority levels and allocates decode cycles according to the difference in priorities, so a high-priority thread receives most of the machine while a low-priority thread still makes progress (Figure 34).

The IBM RS64 IV is a coarse-grain multithreaded processor built on a conventional in-order pipeline (Figure 28): IA, IF, ID, EX, ME, and WB stages, with a 16-entry instruction buffer, an 8-entry branch target buffer, and an 8-entry thread switch buffer. It holds the state of two threads; the foreground thread runs until a long-latency event, typically a cache miss, triggers a switch to the background thread.

Figure 28: The IBM RS64 IV pipeline has conventional in-order pipeline stages. It is a 4-way superscalar processor and has instruction buffers to reduce branch misprediction and thread switch delays.

Figure 29 shows the thread-switch timing on a data cache miss:

    cycles:   0  1  2 ...                                            ... 12
    Load 1:   IA IF (inst buffer) ID EX ME WB
    Inst 1:   IA IF (inst buffer) ID EX ME        -- miss => flush
    Inst 2:   IA IF (thread switch buffer) ID EX ME WB

When Load 1 misses in the data cache, the instructions behind it from the same thread are flushed, and the other thread's instructions, already waiting in the thread switch buffer, enter decode after a 3-cycle switch penalty.

Figure 29: Thread switch timing on a data cache miss. The processor is 4-way superscalar, but to simplify the figure this is not shown.

Figure 30 breaks down the causes of thread switches.

Figure 30: Causes for thread switches: L1 cache misses, IERAT misses, TLB misses, L2 cache misses, timeout, priority, and miscellaneous. The IERAT serves as an instruction TLB.

The Sun Niagara [12] is a fine-grain multithreaded processor. Each pipeline supports four threads with a dedicated thread select (TS) stage between instruction fetch and decode (Figure 31). Per-thread program counters and instruction buffers feed the select logic; the register file holds all four threads' registers, and the store buffers are partitioned per thread. The thread select policy chooses among the threads each cycle using instruction types, misses, traps, and resource conflicts.

Figure 31: Block diagram of SUN Niagara multi-threaded processor pipeline.

Threads that encounter long-latency events are removed from selection until the event resolves. Loads are treated speculatively: a dependent instruction from the same thread may be scheduled back-to-back on the assumption that the load hits in the data cache, as in the example of Figure 32.

    cycles:  0   1   2   3   4   5   6   7
    load 0:  TS  ID  EX  ME  WB
    add 1:       TS  ID  EX  ME  WB
    load 1:          TS  ID  EX  ME  WB
    add 0:               TS  ID  EX  ME  WB

Figure 32: Example of Niagara thread scheduling. The add instruction from thread 0 is issued speculatively, assuming thread 0's load hits in the data cache; the hit/miss outcome is known just in time for the load's data to be forwarded to the add.

The IBM Power5 is a simultaneous multithreaded processor: two threads share a wide out-of-order pipeline, and a global completion table (GCT) plays the role of the reorder buffer. Figure 33 summarizes the mechanisms and policies used to manage its shared resources.
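A least-recently-selected rotation over the ready threads approximates the thread selection sketched above; the LRU detail and the data structures are our assumptions for illustration, not Niagara's exact logic.

```python
# Thread-select sketch: each cycle, pick the least recently selected thread
# among those ready; a thread waiting on a miss or trap is not ready.

def select(order, ready):
    # order: threads from least to most recently selected
    for t in order:
        if ready[t]:
            order.remove(t)
            order.append(t)               # becomes most recently selected
            return t
    return None                           # no ready thread: pipeline bubble

order = ["T0", "T1", "T2", "T3"]
ready = {"T0": True, "T1": True, "T2": False, "T3": True}
print(select(order, ready))   # 'T0'
print(select(order, ready))   # 'T1'
print(select(order, ready))   # 'T3'  (T2 skipped: not ready)
print(select(order, ready))   # 'T0'  (rotation continues)
```

Marking a thread not-ready on a load miss and ready again when the data returns gives exactly the switch-and-resume behavior described for the TS stage.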

Figure 33: Resource management diagram for IBM Power5. Program counters and instruction buffers are per-thread; the branch predictors, I-cache, rename tables, issue buffers, registers, execution units, load/store buffers, and data cache are pooled. Fetch alternates round-robin between the threads; the dispatch policy uses feedback from GCT occupancy and the load miss queue; issue among ready instructions is FCFS; and commit serves the two threads' GCT groups round-robin.

Figure 34: Thread performance for different settings of thread priorities in the IBM Power5.

REFERENCES

[2] J. E. Smith and G. S. Sohi, "The Microarchitecture of Superscalar Processors," Proceedings of the IEEE, 1995, pp. 1609-1624.
[3] B. J. Smith, "Architecture and Applications of the HEP Multiprocessor Computer System," Proc. SPIE Real-Time Signal Processing IV, 1981, pp. 241-248.
[6] D. T. Marr et al., "Hyper-Threading Technology Architecture and Microarchitecture," Intel Technology Journal, vol. 6, no. 1, 2002.
[12] P. Kongetira, K. Aingaran, and K. Olukotun, "Niagara: A 32-Way Multithreaded Sparc Processor," IEEE Micro, 2005, pp. 21-29.
[14] D. M. Tullsen et al., "Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor," Proc. 23rd Int. Symp. on Computer Architecture, 1996, pp. 191-202.
[32] G. Z. Chrysos and J. S. Emer, "Memory Dependence Prediction Using Store Sets," Proc. 25th Int. Symp. on Computer Architecture, 1998, pp. 142-153.
[35] J. E. Thornton, "Parallel Operation in the Control Data 6600," AFIPS Proc. Fall Joint Computer Conference, 1964, pp. 33-40.