Computer Architecture: Multiple Issue. Berk Sunar and Thomas Eisenbarth ECE 505


1 Computer Architecture: Multiple Issue Berk Sunar and Thomas Eisenbarth ECE 505

2 Outline
5 stages of RISC
Type of hazards
Static and Dynamic Branch Prediction
Pipelining with Exceptions
Pipelining with Floating-Point Operations
Loop Unrolling
Correlating and Tournament Branch Prediction
Dynamic Scheduling: Scoreboard
Dynamic Scheduling: Tomasulo
Hardware-Based Speculation
Multiple Issue: VLIW (3.7)
Multiple Issue: Superscalar, speculative (3.8)
Branch Target Buffer: Principles (3.9)

3 Multiple Issue Processors
Data and control stalls are eliminated with dynamic scheduling and speculation, and performance gets close to 1 IPC. To go beyond 1 IPC, more than 1 instruction must be issued (and completed) per cycle. Three major flavors of multiple issue:
1. Statically scheduled superscalar processors
2. VLIW (very long instruction word) processors
3. Dynamically scheduled superscalar processors

4 Superscalar Processors
Issue a varying number of instructions per clock, either in-order (statically scheduled) or out-of-order (dynamically scheduled).
VLIW Processors
Issue a fixed number of instructions, formatted as one large instruction with the parallelism explicitly indicated by the instruction. Inherently statically scheduled by the compiler. High similarity to superscalar!

5 Overview of Multiple Issue Processors
Common name | Issue structure | Hazard detection | Scheduling | Distinguishing characteristic | Examples
Superscalar (static) | Dynamic | Hardware | Static | In-order execution | Mostly embedded: MIPS, ARM (e.g. Cortex-A8)
Superscalar (dynamic) | Dynamic | Hardware | Dynamic | Some out-of-order execution, but no speculation | None so far
Superscalar (speculative) | Dynamic | Hardware | Dynamic with speculation | Out-of-order execution with speculation | Intel Core ix, AMD Phenom, IBM POWER7
VLIW/LIW | Static | Primarily software | Static | All hazards determined by compiler (often implicitly) | Mostly signal processing, e.g. TI C6x
EPIC | Primarily static | Primarily software | Mostly static | All hazards determined and indicated explicitly by compiler | Itanium

6 Basic concept of VLIW
VLIW uses multiple independent functional units (as a superscalar does), but VLIW packages the operations for all FUs into one very large instruction. Overheads grow with the amount of parallelism. Example: a VLIW with 1x integer FU, 2x load/store FU and 2x FP FU, at 16 to 24 opcode bits per FU: an 80 to 120 bit instruction word.
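The 80-to-120-bit figure is just five per-FU opcode fields concatenated. A minimal sketch of that packing, with purely illustrative field widths (real VLIW encodings differ per machine):

```python
# Hypothetical packing: five 24-bit operation slots concatenated into one
# 120-bit VLIW instruction word. SLOT_BITS and the slot order are
# illustrative assumptions, not a real encoding.
SLOT_BITS = 24

def pack_vliw(slots):
    """Pack per-FU opcodes (ints < 2**SLOT_BITS) into one wide word."""
    word = 0
    for op in slots:
        assert 0 <= op < 1 << SLOT_BITS
        word = (word << SLOT_BITS) | op
    return word

# int FU + 2x load/store FU + 2x FP FU -> 5 slots, 120 bits total
word = pack_vliw([0x1, 0x2, 0x3, 0x4, 0x5])
print(word.bit_length() <= 5 * SLOT_BITS)  # True
```

An empty slot still consumes its full field, which is the root of the code-size disadvantage discussed below.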

7 VLIW: Example
Program with loop (same as last week):
Loop: L.D F0,0(R1)
      MUL.D F4,F0,F2
      S.D F4,0(R1)
      DADDIU R1,R1,#-8
      BNE R1,R2,Loop
VLIW requires heavy unrolling to be efficient. Q: How many unrolls to prevent stalls? Assuming these latencies:
Source instruction | Destination instruction | Latency (cycles)
FP ALU op | FP ALU op | 3
FP ALU op | Store double | 2
Load double | FP ALU op | 1
Load double | Store double | 0
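The latency table determines how far apart dependent instructions must issue: with a latency of l stall cycles, a consumer can issue no earlier than l + 1 cycles after its producer. A small sketch of that arithmetic for one iteration's dependence chain (the encoding of operation kinds is ours, not the book's):

```python
# Latency table from the slide: extra stall cycles between a producer
# and a dependent consumer. Keys are (producer_kind, consumer_kind).
latency = {
    ("FP_ALU", "FP_ALU"): 3,
    ("FP_ALU", "STORE"): 2,
    ("LOAD", "FP_ALU"): 1,
    ("LOAD", "STORE"): 0,
}

def consumer_cycle(producer_cycle, producer_kind, consumer_kind):
    """Earliest issue cycle for a dependent instruction."""
    return producer_cycle + 1 + latency[(producer_kind, consumer_kind)]

ld = 1                                       # L.D issues in cycle 1
fp = consumer_cycle(ld, "LOAD", "FP_ALU")    # FP op: cycle 3
sd = consumer_cycle(fp, "FP_ALU", "STORE")   # S.D: cycle 6
print(ld, fp, sd)  # 1 3 6
```

One iteration's chain thus spans 6 cycles while occupying only 3 slots, so heavy unrolling is needed to fill the remaining VLIW slots.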

8 VLIW: How many unrolls?
Slots per VLIW instruction: Memory unit 1 | Memory unit 2 | FP unit 1 | FP unit 2 | Int unit
Instructions to schedule: L.D F0,0(R1); ADD.D F4,F0,F2; S.D F4,0(R1); DADDUI R1,R1,#-56; BNE R1,R2,Loop

9 VLIW: Book proposes 7 unrolls
Cycle | Memory unit 1 | Memory unit 2 | FP unit 1 | FP unit 2 | Int unit
1 | L.D F0,0(R1) | L.D F6,-8(R1) | | |
2 | L.D F10,-16(R1) | L.D F14,-24(R1) | | |
3 | L.D F18,-32(R1) | L.D F22,-40(R1) | ADD.D F4,F0,F2 | ADD.D F8,F6,F2 |
4 | L.D F26,-48(R1) | | ADD.D F12,F10,F2 | ADD.D F16,F14,F2 |
5 | | | ADD.D F20,F18,F2 | ADD.D F24,F22,F2 |
6 | S.D F4,0(R1) | S.D F8,-8(R1) | ADD.D F28,F26,F2 | |
7 | S.D F12,-16(R1) | S.D F16,-24(R1) | | |
8 | S.D F20,+24(R1) | S.D F24,+16(R1) | | | DADDUI R1,R1,#-56
9 | S.D F28,+8(R1) | | | | BNE R1,R2,Loop
23 instructions in 9 cycles: 2.5 IPC
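The IPC claim is easy to sanity-check: 7 unrolled iterations contribute a load, an FP add, and a store each, plus the shared loop overhead.

```python
# Quick check of the slide's IPC figure for the 7-times-unrolled schedule.
loads, adds, stores, overhead = 7, 7, 7, 2   # overhead = DADDUI + BNE
instructions = loads + adds + stores + overhead
cycles = 9
print(instructions, instructions / cycles)   # 23 instructions, ~2.56 IPC
```

So roughly 2.5 IPC, against a theoretical peak of 5 for this five-slot machine: barely half the slots are filled even with heavy unrolling.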

10 VLIW: Disadvantages
Increase in code size: non-fully filled instruction words result in excess code, and VLIW requires heavy unrolling (further increasing code size).
Operation in lockstep: shared components like caches can cause the whole processor to stall.
Code compatibility: if the number/delay of units varies across processor families, code needs to be recompiled for each machine.

11 Outline
5 stages of RISC
Type of hazards
Static and Dynamic Branch Prediction
Pipelining with Exceptions
Pipelining with Floating-Point Operations
Loop Unrolling
Correlating and Tournament Branch Prediction
Dynamic Scheduling: Scoreboard
Dynamic Scheduling: Tomasulo
Hardware-Based Speculation
Multiple Issue: VLIW (3.7)
Multiple Issue: Superscalar, speculative (3.8)
Branch Target Buffer: Principles (3.9)

12 Putting it all together
Goal: combine multiple issue with dynamic scheduling and speculation.
Example: a Tomasulo machine extended to issue 2 instructions per cycle. Separate load/store unit, integer unit and FP unit, each able to initiate 1 instruction per cycle. Instructions are issued in-order to prevent violations of program semantics.

13 Multiple Issue Tomasulo + Speculation
The architecture stays essentially the same. However, components must be made redundant, e.g. the CDB must broadcast up to 2 (N) results per cycle. Issue and completion logic become more complex.

14 Multiple Issue with Dynamic Scheduling
Problem: instructions scheduled in parallel may depend on each other. Tables have to be updated in parallel, either by pipelining the table updates (issuing logic) or by widening the issue logic (or both). → The issuing step becomes a bottleneck, as its complexity grows with N² (for N IPC). The back-end of the pipeline must be able to complete and commit multiple instructions per clock. This is easier, since dependences were already resolved during issue.
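The N² growth comes from pairwise comparisons: every instruction in an N-wide bundle must be checked against every earlier instruction in the same bundle. A one-line model of that count:

```python
# Number of (earlier, later) instruction pairs to compare within one
# N-wide issue bundle -- the source of the ~N^2 issue-logic growth.
def intra_bundle_checks(n):
    return n * (n - 1) // 2

for n in (1, 2, 4, 8):
    print(n, intra_bundle_checks(n))  # 0, 1, 6, 28 comparisons
```

Each comparison is itself replicated per source operand and per table port, so the hardware cost grows even faster than this raw pair count suggests.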

15 Multiple Issue: Steps
All steps must be performed in one clock cycle:
1. Assign a reservation station and a ROB entry for every instruction that may issue next. (Possible without knowing the instructions, by limiting the number of ROB and RS entries assignable per unit class.) Instructions that cannot be issued (no FU available) are delayed in-order.
2. Analyze the dependences of the instructions in the issued instruction bundle.
3. If an instruction in the bundle depends on an earlier one in the same bundle, use the ROB number to update the reservation-station table of the dependent instruction. Otherwise, proceed just as before.
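Steps 2 and 3 can be sketched in a few lines. This is an illustrative model, not real issue hardware: instructions are reduced to (destination, sources) pairs and ROB tags are hypothetical integers.

```python
# Sketch of bundle issue: pre-assign ROB tags (step 1), then for each
# source operand decide whether it comes from the register file or from
# the ROB entry of an earlier instruction in the same bundle (steps 2-3).
def analyze_bundle(bundle, next_rob_tag):
    plans = []
    writer = {}  # reg -> ROB tag of the latest in-bundle writer
    for i, (dest, srcs) in enumerate(bundle):
        tag = next_rob_tag + i               # step 1: pre-assigned ROB entry
        src_from = {s: writer.get(s, "regfile") for s in srcs}
        plans.append((tag, dest, src_from))
        writer[dest] = tag                   # later bundle members wait on us
    return plans

# LD R2,0(R1) and DADDIU R2,R2,#1 issued as one bundle:
bundle = [("R2", ["R1"]), ("R2", ["R2"])]
print(analyze_bundle(bundle, 10))
# [(10, 'R2', {'R1': 'regfile'}), (11, 'R2', {'R2': 10})]
```

Note the second instruction reads R2 from ROB entry 10 rather than the register file, exactly the intra-bundle forwarding that step 3 describes.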

16 Multiple Issue + Speculation: Example
Example program:
LOOP: LD R2,0(R1)
      DADDIU R2,R2,#1
      SD R2,0(R1)
      DADDIU R1,R1,#8
      BNE R2,R3,LOOP
Q: Multiple issue performance with and without speculation?
Without speculation, the 2nd LD must wait for BNE to execute. With speculation, the 2nd LD can execute as soon as R1 is updated. → Data-dependent branches limit performance without speculation.

17-32 Example: without speculation, cycles 1-19
[Slides 17-32 fill in a timing table (Iteration, Instruction, Issue, Execute, Mem access, Write CDB) cycle by cycle for three iterations of the loop. Two instructions issue per cycle: iteration 1 starts issuing in cycle 1, iteration 2 in cycle 4, iteration 3 in cycle 7. Because each LD must wait for the preceding BNE to execute, the iterations serialize, and the BNE of the third iteration does not execute until cycle 19.]

33-43 Example: with speculation, cycles 4-14
[Slides 33-43 fill in the same timing table, now with an additional Commit column. Nothing changes up to cycle 4. Speculative execution starts in cycle 5: iteration 2's LD executes before its BNE has resolved, which is why a ROB is needed. Executing the next iteration early is OK because of register renaming, and the SD writes memory only on commit. The BNE of the third iteration commits in cycle 14; after the first iteration, the processor commits almost 2 IPC.]

44 Example: with speculation
With speculation, the example code completes after 14 cycles instead of 19 cycles. → An advantage for data-dependent branches. However, speculation can also result in worse performance and much higher power consumption when branches are mispredicted.

45 Outline
5 stages of RISC
Type of hazards
Static and Dynamic Branch Prediction
Pipelining with Exceptions
Pipelining with Floating-Point Operations
Loop Unrolling
Correlating and Tournament Branch Prediction
Dynamic Scheduling: Scoreboard
Dynamic Scheduling: Tomasulo
Hardware-Based Speculation
Multiple Issue: VLIW (3.7)
Multiple Issue: Superscalar, speculative (3.8)
Branch Target Buffer: Principles (3.9)

46 Branch Target Buffer (BTB)
Multiple issue requires high IF bandwidth. Branch prediction predicts the branch outcome only. Even with the best possible prediction of the branch outcome, we still have to wait for the branch target address to be determined before IF can proceed.

47 Branch Target Buffer (BTB)
The BTB decides whether an undecoded instruction is a branch, and if so, predicts the following PC.

48 Branch Target Buffer (BTB)
The BTB decides whether an undecoded instruction is a branch, and if so, predicts the following PC. The BTB only contains info for control instructions (jumps and branches). For all other instructions (and predicted not-taken branches), the next PC is PC+4. How to update the BTB?
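The lookup-by-PC and update-on-resolution behavior can be sketched as a small direct-mapped table. The size, indexing, and eviction policy below are illustrative assumptions, not any particular processor's design:

```python
# Minimal direct-mapped BTB sketch: indexed by PC, tagged with the full
# PC, storing the predicted next PC for taken branches.
ENTRIES = 1024

class BTB:
    def __init__(self):
        self.table = [None] * ENTRIES  # each entry: (tag, predicted_next_pc)

    def _index(self, pc):
        return (pc >> 2) % ENTRIES     # word-aligned PCs

    def predict(self, pc):
        """Prediction for an undecoded fetch: target on hit, PC+4 on miss."""
        entry = self.table[self._index(pc)]
        if entry and entry[0] == pc:
            return entry[1]            # hit: predict taken, fetch from target
        return pc + 4                  # miss: not a (taken) branch

    def update(self, pc, taken, target):
        """After the branch resolves: insert taken branches, evict not-taken."""
        idx = self._index(pc)
        if taken:
            self.table[idx] = (pc, target)
        elif self.table[idx] and self.table[idx][0] == pc:
            self.table[idx] = None

btb = BTB()
print(hex(btb.predict(0x400)))       # 0x404: miss -> PC+4
btb.update(0x400, True, 0x100)
print(hex(btb.predict(0x400)))       # 0x100: hit -> predicted target
```

Evicting not-taken branches, as `update` does here, is one common answer to the slide's "How to update the BTB?" question: only taken branches need an entry, since a miss already predicts fall-through.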

49 BTB explained


More information

Question 1 (5 points) Consider a cache with the following specifications Address space is 1024 words. The memory is word addressable The size of the

Question 1 (5 points) Consider a cache with the following specifications Address space is 1024 words. The memory is word addressable The size of the Question 1 (5 points) Consider a cache with the following specifications Address space is 1024 words. he memory is word addressable he size of the cache is 8 blocks; each block is 4 words (32 words cache).

More information

DYNAMIC AND SPECULATIVE INSTRUCTION SCHEDULING

DYNAMIC AND SPECULATIVE INSTRUCTION SCHEDULING DYNAMIC AND SPECULATIVE INSTRUCTION SCHEDULING Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 3, John L. Hennessy and David A. Patterson,

More information

TDT 4260 lecture 7 spring semester 2015

TDT 4260 lecture 7 spring semester 2015 1 TDT 4260 lecture 7 spring semester 2015 Lasse Natvig, The CARD group Dept. of computer & information science NTNU 2 Lecture overview Repetition Superscalar processor (out-of-order) Dependencies/forwarding

More information

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1)

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) ILP vs. Parallel Computers Dynamic Scheduling (Section 3.4, 3.5) Dynamic Branch Prediction (Section 3.3) Hardware Speculation and Precise

More information

Chapter 4 The Processor 1. Chapter 4D. The Processor

Chapter 4 The Processor 1. Chapter 4D. The Processor Chapter 4 The Processor 1 Chapter 4D The Processor Chapter 4 The Processor 2 Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline

More information

Handout 2 ILP: Part B

Handout 2 ILP: Part B Handout 2 ILP: Part B Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism Loop unrolling by compiler to increase ILP Branch prediction to increase ILP

More information

CS433 Midterm. Prof Josep Torrellas. October 16, Time: 1 hour + 15 minutes

CS433 Midterm. Prof Josep Torrellas. October 16, Time: 1 hour + 15 minutes CS433 Midterm Prof Josep Torrellas October 16, 2014 Time: 1 hour + 15 minutes Name: Alias: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 4 Questions. Please budget your

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 568 Part 10 Compiler Techniques / VLIW Israel Koren ECE568/Koren Part.10.1 FP Loop Example Add a scalar

More information

EITF20: Computer Architecture Part3.2.1: Pipeline - 3

EITF20: Computer Architecture Part3.2.1: Pipeline - 3 EITF20: Computer Architecture Part3.2.1: Pipeline - 3 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Dynamic scheduling - Tomasulo Superscalar, VLIW Speculation ILP limitations What we have done

More information

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 09

More information

DYNAMIC INSTRUCTION SCHEDULING WITH SCOREBOARD

DYNAMIC INSTRUCTION SCHEDULING WITH SCOREBOARD DYNAMIC INSTRUCTION SCHEDULING WITH SCOREBOARD Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 3, John L. Hennessy and David A. Patterson,

More information

HY425 Lecture 09: Software to exploit ILP

HY425 Lecture 09: Software to exploit ILP HY425 Lecture 09: Software to exploit ILP Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS November 4, 2010 ILP techniques Hardware Dimitrios S. Nikolopoulos HY425 Lecture 09: Software to exploit

More information

Lecture-13 (ROB and Multi-threading) CS422-Spring

Lecture-13 (ROB and Multi-threading) CS422-Spring Lecture-13 (ROB and Multi-threading) CS422-Spring 2018 Biswa@CSE-IITK Cycle 62 (Scoreboard) vs 57 in Tomasulo Instruction status: Read Exec Write Exec Write Instruction j k Issue Oper Comp Result Issue

More information

Metodologie di Progettazione Hardware-Software

Metodologie di Progettazione Hardware-Software Metodologie di Progettazione Hardware-Software Advanced Pipelining and Instruction-Level Paralelism Metodologie di Progettazione Hardware/Software LS Ing. Informatica 1 ILP Instruction-level Parallelism

More information

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining Several Common Compiler Strategies Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining Basic Instruction Scheduling Reschedule the order of the instructions to reduce the

More information

HY425 Lecture 09: Software to exploit ILP

HY425 Lecture 09: Software to exploit ILP HY425 Lecture 09: Software to exploit ILP Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS November 4, 2010 Dimitrios S. Nikolopoulos HY425 Lecture 09: Software to exploit ILP 1 / 44 ILP techniques

More information

Page # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer

Page # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture,

More information

Lecture: Pipeline Wrap-Up and Static ILP

Lecture: Pipeline Wrap-Up and Static ILP Lecture: Pipeline Wrap-Up and Static ILP Topics: multi-cycle instructions, precise exceptions, deep pipelines, compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2) 1 Multicycle

More information

Lecture 9: Multiple Issue (Superscalar and VLIW)

Lecture 9: Multiple Issue (Superscalar and VLIW) Lecture 9: Multiple Issue (Superscalar and VLIW) Iakovos Mavroidis Computer Science Department University of Crete Example: Dynamic Scheduling in PowerPC 604 and Pentium Pro In-order Issue, Out-of-order

More information

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation

More information

COSC4201 Instruction Level Parallelism Dynamic Scheduling

COSC4201 Instruction Level Parallelism Dynamic Scheduling COSC4201 Instruction Level Parallelism Dynamic Scheduling Prof. Mokhtar Aboelaze Parts of these slides are taken from Notes by Prof. David Patterson (UCB) Outline Data dependence and hazards Exposing parallelism

More information

ELEC 5200/6200 Computer Architecture and Design Fall 2016 Lecture 9: Instruction Level Parallelism

ELEC 5200/6200 Computer Architecture and Design Fall 2016 Lecture 9: Instruction Level Parallelism ELEC 5200/6200 Computer Architecture and Design Fall 2016 Lecture 9: Instruction Level Parallelism Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University,

More information

TDT 4260 TDT ILP Chap 2, App. C

TDT 4260 TDT ILP Chap 2, App. C TDT 4260 ILP Chap 2, App. C Intro Ian Bratt (ianbra@idi.ntnu.no) ntnu no) Instruction level parallelism (ILP) A program is sequence of instructions typically written to be executed one after the other

More information

Page 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer

Page 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer Pipeline CPI http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson

More information

Website for Students VTU NOTES QUESTION PAPERS NEWS RESULTS

Website for Students VTU NOTES QUESTION PAPERS NEWS RESULTS Advanced Computer Architecture- 06CS81 Hardware Based Speculation Tomasulu algorithm and Reorder Buffer Tomasulu idea: 1. Have reservation stations where register renaming is possible 2. Results are directly

More information

Lecture 18: Instruction Level Parallelism -- Dynamic Superscalar, Advanced Techniques,

Lecture 18: Instruction Level Parallelism -- Dynamic Superscalar, Advanced Techniques, Lecture 18: Instruction Level Parallelism -- Dynamic Superscalar, Advanced Techniques, ARM Cortex-A53, and Intel Core i7 CSCE 513 Computer Architecture Department of Computer Science and Engineering Yonghong

More information

計算機結構 Chapter 4 Exploiting Instruction-Level Parallelism with Software Approaches

計算機結構 Chapter 4 Exploiting Instruction-Level Parallelism with Software Approaches 4.1 Basic Compiler Techniques for Exposing ILP 計算機結構 Chapter 4 Exploiting Instruction-Level Parallelism with Software Approaches 吳俊興高雄大學資訊工程學系 To avoid a pipeline stall, a dependent instruction must be

More information

Computer Architecture Practical 1 Pipelining

Computer Architecture Practical 1 Pipelining Computer Architecture Issued: Monday 28 January 2008 Due: Friday 15 February 2008 at 4.30pm (at the ITO) This is the first of two practicals for the Computer Architecture module of CS3. Together the practicals

More information

CSE 502 Graduate Computer Architecture

CSE 502 Graduate Computer Architecture Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 CSE 502 Graduate Computer Architecture Lec 15-19 Inst. Lvl. Parallelism Instruction-Level Parallelism and Its Exploitation Larry Wittie

More information

Instruction Level Parallelism

Instruction Level Parallelism Instruction Level Parallelism The potential overlap among instruction execution is called Instruction Level Parallelism (ILP) since instructions can be executed in parallel. There are mainly two approaches

More information

Advanced Computer Architecture CMSC 611 Homework 3. Due in class Oct 17 th, 2012

Advanced Computer Architecture CMSC 611 Homework 3. Due in class Oct 17 th, 2012 Advanced Computer Architecture CMSC 611 Homework 3 Due in class Oct 17 th, 2012 (Show your work to receive partial credit) 1) For the following code snippet list the data dependencies and rewrite the code

More information

ESE 545 Computer Architecture Instruction-Level Parallelism (ILP): Speculation, Reorder Buffer, Exceptions, Superscalar Processors, VLIW

ESE 545 Computer Architecture Instruction-Level Parallelism (ILP): Speculation, Reorder Buffer, Exceptions, Superscalar Processors, VLIW Computer Architecture ESE 545 Computer Architecture Instruction-Level Parallelism (ILP): Speculation, Reorder Buffer, Exceptions, Superscalar Processors, VLIW 1 Review from Last Lecture Leverage Implicit

More information

Lecture: Static ILP. Topics: predication, speculation (Sections C.5, 3.2)

Lecture: Static ILP. Topics: predication, speculation (Sections C.5, 3.2) Lecture: Static ILP Topics: predication, speculation (Sections C.5, 3.2) 1 Scheduled and Unrolled Loop Loop: L.D F0, 0(R1) L.D F6, -8(R1) L.D F10,-16(R1) L.D F14, -24(R1) ADD.D F4, F0, F2 ADD.D F8, F6,

More information

Processor (IV) - advanced ILP. Hwansoo Han

Processor (IV) - advanced ILP. Hwansoo Han Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle

More information

CS433 Homework 3 (Chapter 3)

CS433 Homework 3 (Chapter 3) CS433 Homework 3 (Chapter 3) Assigned on 10/3/2017 Due in class on 10/17/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies

More information

ECE 252 / CPS 220 Advanced Computer Architecture I. Lecture 14 Very Long Instruction Word Machines

ECE 252 / CPS 220 Advanced Computer Architecture I. Lecture 14 Very Long Instruction Word Machines ECE 252 / CPS 220 Advanced Computer Architecture I Lecture 14 Very Long Instruction Word Machines Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall11.html

More information

Getting CPI under 1: Outline

Getting CPI under 1: Outline CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more

More information

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 15 Very Long Instruction Word Machines

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 15 Very Long Instruction Word Machines ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 15 Very Long Instruction Word Machines Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15 www.duke.edu/~bcl15/class/class_ece252fall11.html

More information

CS433 Homework 2 (Chapter 3)

CS433 Homework 2 (Chapter 3) CS Homework 2 (Chapter ) Assigned on 9/19/2017 Due in class on 10/5/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration..

More information

Advanced Computer Architecture

Advanced Computer Architecture ECE 563 Advanced Computer Architecture Fall 2010 Lecture 6: VLIW 563 L06.1 Fall 2010 Little s Law Number of Instructions in the pipeline (parallelism) = Throughput * Latency or N T L Throughput per Cycle

More information

Scoreboard information (3 tables) Four stages of scoreboard control

Scoreboard information (3 tables) Four stages of scoreboard control Scoreboard information (3 tables) Instruction : issued, read operands and started execution (dispatched), completed execution or wrote result, Functional unit (assuming non-pipelined units) busy/not busy

More information

CMSC411 Fall 2013 Midterm 2 Solutions

CMSC411 Fall 2013 Midterm 2 Solutions CMSC411 Fall 2013 Midterm 2 Solutions 1. (12 pts) Memory hierarchy a. (6 pts) Suppose we have a virtual memory of size 64 GB, or 2 36 bytes, where pages are 16 KB (2 14 bytes) each, and the machine has

More information

Instruction-Level Parallelism and Its Exploitation (Part III) ECE 154B Dmitri Strukov

Instruction-Level Parallelism and Its Exploitation (Part III) ECE 154B Dmitri Strukov Instruction-Level Parallelism and Its Exploitation (Part III) ECE 154B Dmitri Strukov Dealing With Control Hazards Simplest solution to stall pipeline until branch is resolved and target address is calculated

More information

/ : Computer Architecture and Design Fall Midterm Exam October 16, Name: ID #:

/ : Computer Architecture and Design Fall Midterm Exam October 16, Name: ID #: 16.482 / 16.561: Computer Architecture and Design Fall 2014 Midterm Exam October 16, 2014 Name: ID #: For this exam, you may use a calculator and two 8.5 x 11 double-sided page of notes. All other electronic

More information

UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects

UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects Announcements UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects Inf3 Computer Architecture - 2017-2018 1 Last time: Tomasulo s Algorithm Inf3 Computer

More information

CS 2410 Mid term (fall 2018)

CS 2410 Mid term (fall 2018) CS 2410 Mid term (fall 2018) Name: Question 1 (6+6+3=15 points): Consider two machines, the first being a 5-stage operating at 1ns clock and the second is a 12-stage operating at 0.7ns clock. Due to data

More information

CS252 Graduate Computer Architecture Lecture 6. Recall: Software Pipelining Example

CS252 Graduate Computer Architecture Lecture 6. Recall: Software Pipelining Example CS252 Graduate Computer Architecture Lecture 6 Tomasulo, Implicit Register Renaming, Loop-Level Parallelism Extraction Explicit Register Renaming John Kubiatowicz Electrical Engineering and Computer Sciences

More information

Complex Pipelining COE 501. Computer Architecture Prof. Muhamed Mudawar

Complex Pipelining COE 501. Computer Architecture Prof. Muhamed Mudawar Complex Pipelining COE 501 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Diversified Pipeline Detecting

More information

CS433 Homework 2 (Chapter 3)

CS433 Homework 2 (Chapter 3) CS433 Homework 2 (Chapter 3) Assigned on 9/19/2017 Due in class on 10/5/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies

More information

EE 4683/5683: COMPUTER ARCHITECTURE

EE 4683/5683: COMPUTER ARCHITECTURE EE 4683/5683: COMPUTER ARCHITECTURE Lecture 4A: Instruction Level Parallelism - Static Scheduling Avinash Kodi, kodi@ohio.edu Agenda 2 Dependences RAW, WAR, WAW Static Scheduling Loop-carried Dependence

More information