Pipelining & Hazards. Prof. Thomas Wenisch. Lecture 3, EECS 470.
Slide 1: Pipelining & Hazards
Prof. Thomas Wenisch, Fall 2
edu/courses/eecs4
Wenisch 2 -- Portions Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar
Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, and University of Wisconsin.
Slide 2: Announcements
Reminders:
  HW # due Friday 9/4. Hand in at start of discussion.
  Programming assignment # due Friday 9/4. Electronic hand-in by midnight.
Tools: problem with the vcs GUI.
  You can use the GUI on relic or sunlogin.
  Be sure to provide a relative path to 4submit.
Slide 3: Readings
For today:
  Cramming More Components onto ICs. G.E. Moore
  H & P Chapter A. A.6
For Monday:
  No new readings
Slide 4: Outline: Understanding the Execution Core
  1. 370's 5-stage pipeline (review)
  2. Implementing pipeline interlocks (review)
  3. Scoreboard scheduling (CDC 6600)
  4. Tomasulo's OoO scheduling algorithm (IBM 360/91)
  5. Precise interrupts with a Reorder Buffer (P6)
  6. Modern OoO (MIPS R10K, Netburst)
Slide 5: Pipelining Idealism
Motivation: increase throughput with little increase in hardware.
Repetition of identical operations
  The same operations are to be performed repeatedly on a large number of different inputs.
Uniform suboperations
  The operation to be pipelined can be evenly partitioned into uniform-latency suboperations.
Repetition of independent operations
  All repetitions of the same operation are mutually independent, i.e. no data dependences and no resource conflicts.
Good examples: automobile assembly line, floating-point multiplier. But an instruction pipeline???
Slide 6: Pipeline Illustrated
[Figure: a block of combinational logic with n gate delays, unpipelined and split into two and three stages.]
  One stage of n gate delays: BW = ~(1/n)
  Two stages of n/2 gate delays each: BW = ~(2/n)
  Three stages of n/3 gate delays each: BW = ~(3/n)
Slide 7: Instruction Pipeline Design
Identical operations? Unify instruction types.
  Coalesce instruction types into one multi-function pipe.
  Minimize external fragmentation (some idling stages).
Uniform suboperations? Balance pipeline stages.
  Quantize stages to yield balanced stages.
  Minimize internal fragmentation (some waiting stages).
Independent operations? Resolve data and resource hazards.
  Inter-instruction dependency detection and resolution.
  Minimize performance loss.
Slide 8: 370 Processor Pipeline Review
Stages: Fetch, Decode, Execute, Memory, (Write-back)
[Figure: five-stage datapath with PC (+4), I-cache, register file, ALU, and D-cache.]
T_pipeline = T_base / 5
Slide 9: Stage 1: Fetch
Fetch an instruction from memory every cycle.
  Use PC to index memory.
  Increment PC (assume no branches for now).
Write state to the pipeline register (IF/ID).
  The next stage will read this pipeline register.
  Note that the pipeline register must be edge triggered.
Slide 10: [Figure: Stage 1 fetch datapath. The PC (with enable) indexes the instruction memory/cache; the instruction bits are latched into the IF/ID pipeline register (with enable), which feeds the rest of the pipelined datapath.]
Slide 11: Stage 2: Decode
Decode opcode bits.
  May set up control signals for later stages.
Read input operands from the register file.
  Specified by the regA and regB fields of the instruction bits.
Write state to the pipeline register (ID/EX):
  Opcode
  Register contents
  Offset & destination fields
  PC+1 (even though decode didn't use it)
Slide 12: [Figure: Stage 2 decode datapath. The instruction bits and PC arrive from the Stage 1 fetch datapath through the IF/ID pipeline register; regA and regB index the register file (which also has destReg, data, and enable inputs); the contents of regA, the contents of regB, the PC, and the control signals are written to the ID/EX pipeline register for the rest of the pipelined datapath.]
Slide 13: Stage 3: Execute
Perform the ALU operation.
  Input operands can be:
    Contents of regA or regB
    Offset field of the instruction
  Branches: calculate PC+1+offset.
Write state to the pipeline register (EX/Mem):
  ALU result, contents of regB, and PC+1+offset
  Instruction bits for opcode and destReg specifiers
Slide 14: [Figure: Stage 3 execute datapath. The contents of regA and regB, the PC, and the control signals arrive from the Stage 2 decode datapath through the ID/EX pipeline register; the ALU result, the contents of regB, the PC+1+offset value, and the control signals are written to the EX/Mem pipeline register for the rest of the pipelined datapath.]
Slide 15: Stage 4: Memory Operation
Perform the data cache access for memory ops.
  The ALU result contains the address for ld and st.
  Opcode bits control the memory R/W and enable signals.
Write state to the pipeline register (Mem/WB):
  ALU result and MemData
  Instruction bits for opcode and destReg specifiers
Slide 16: [Figure: Stage 4 memory datapath. The ALU result, the contents of regB, the PC+1+offset value, and the control signals arrive from the Stage 3 execute datapath through the EX/Mem pipeline register; the data memory (with enable and R/W inputs) produces the memory read data; the ALU result, memory read data, and control signals are written to the Mem/WB pipeline register. The PC+1+offset value goes back to the MUX before the PC in stage 1, along with the MUX control for the PC input.]
Slide 17: Stage 5: Write back
Write the result to the register file (if required).
  Write MemData to destReg for an ld instruction.
  Write the ALU result to destReg for an arithmetic instruction.
  Opcode bits control the register write enable signal.
Slide 18: [Figure: Stage 5 write-back datapath. The ALU result, memory read data, and control signals arrive from the Stage 4 memory datapath through the Mem/WB pipeline register. A MUX selects the value that goes back to the data input of the register file; a second MUX selects the instruction bits that go back to the destination register specifier; the control signals drive the register write enable.]
Slide 19: Sample Code (Simple)
Run the following code on a pipelined datapath:
  add  1 2 3   ; reg 3 = reg 1 + reg 2
  nand 4 5 6   ; reg 6 = ~(reg 4 & reg 5)
  lw   2 4 20  ; reg 4 = Mem[reg 2 + 20]
  add  2 5 5   ; reg 5 = reg 2 + reg 5
  sw   3 7 10  ; Mem[reg 3 + 10] = reg 7
Slides 20-30: [Figure sequence: the sample code stepping through the pipelined datapath (PC, instruction memory, register file R0-R7, ALU, data memory, and the IF/ID, ID/EX, EX/Mem, and Mem/WB pipeline registers).]
  Slide 20: The empty datapath.
  Slide 21: Initial state. Every pipeline register holds a noop.
  Slide 22: Time 1. Fetch: add.
  Slide 23: Time 2. Fetch: nand. Decode: add.
  Slide 24: Time 3. Fetch: lw. Decode: nand. Execute: add.
  Slide 25: Time 4. Fetch: second add. Decode: lw. Execute: nand. Memory: add.
  Slide 26: Time 5. Fetch: sw. Decode: second add. Execute: lw. Memory: nand. Write back: add.
  Slide 27: Time 6. No more instructions. Decode: sw. Execute: second add. Memory: lw. Write back: nand.
  Slide 28: Time 7. Execute: sw. Memory: second add. Write back: lw.
  Slide 29: Time 8. Memory: sw. Write back: second add.
  Slide 30: Time 9. Write back: sw.
Slide 31: Time graphs
Time: 1         2         3         4         5         6         7         8         9
add   fetch     decode    execute   memory    writeback
nand            fetch     decode    execute   memory    writeback
lw                        fetch     decode    execute   memory    writeback
add                                 fetch     decode    execute   memory    writeback
sw                                            fetch     decode    execute   memory    writeback
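The time graph can be regenerated mechanically. The sketch below is only an illustration, assuming an ideal five-stage pipeline with no stalls; the names `STAGES` and `time_graph` are hypothetical and do not come from the slides.

```python
# Minimal sketch: ideal 5-stage pipeline, one instruction fetched per cycle,
# no hazards or stalls (an assumption; real pipelines stall).
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]

def time_graph(instructions):
    """Map instruction index -> {cycle: stage}; cycles start at 1."""
    graph = {}
    for i, _ in enumerate(instructions):
        # Instruction i is fetched in cycle i + 1 and advances one stage per cycle.
        graph[i] = {i + 1 + s: stage for s, stage in enumerate(STAGES)}
    return graph

program = ["add", "nand", "lw", "add", "sw"]
graph = time_graph(program)
last_cycle = max(max(row) for row in graph.values())

for i, name in enumerate(program):
    cells = [graph[i].get(c, "") for c in range(1, last_cycle + 1)]
    print(f"{name:6}" + "".join(f"{cell:10}" for cell in cells))
```

Five instructions finish in 5 + 4 = 9 cycles: the pipeline needs four cycles to fill, then completes one instruction per cycle.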
Slide 32: Balancing Pipeline Stages
Stage latencies:
  IF:  T_IF  = 6 units
  ID:  T_ID  = 2 units
  EX:  T_EX  = 9 units
  MEM: T_MEM = 5 units
  WB:  T_WB  = 9 units
Without pipelining: T_cyc = T_IF + T_ID + T_EX + T_MEM + T_WB = 31
Pipelined: T_cyc = max{T_IF, T_ID, T_EX, T_MEM, T_WB} = 9
Speedup = 31 / 9
Can we do better in terms of either performance or efficiency?
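The arithmetic on this slide can be checked with a few lines. This is a sketch that assumes the stage latencies are 6, 2, 9, 5, and 9 units for IF, ID, EX, MEM, and WB; the variable names are illustrative.

```python
from fractions import Fraction

# Stage latencies in abstract time units (assumed from the slide).
stage_latency = {"IF": 6, "ID": 2, "EX": 9, "MEM": 5, "WB": 9}

# Unpipelined: one long cycle covers every stage back to back.
t_unpipelined = sum(stage_latency.values())

# Pipelined: the clock is set by the slowest stage.
t_pipelined = max(stage_latency.values())

speedup = Fraction(t_unpipelined, t_pipelined)
ideal_speedup = len(stage_latency)

print(f"unpipelined T_cyc = {t_unpipelined}")
print(f"pipelined   T_cyc = {t_pipelined}")
print(f"speedup = {speedup} (about {float(speedup):.2f}, ideal would be {ideal_speedup})")
```

The gap between 31/9 and the ideal factor of 5 comes entirely from unbalanced stages: the 2-unit decode stage still occupies a full 9-unit cycle.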
Slide 33: Granularity of Pipeline Stages
Coarser-grained machine cycle: 4 machine cycles / instruction cycle
  Stage 1: IF & ID, T_IF&ID = 8 units
  Stage 2: EX, T_EX = 9 units
  Stage 3: MEM, T_MEM = 5 units
  Stage 4: WB, T_WB = 9 units
Finer-grained machine cycle: 11 machine cycles / instruction cycle
  T_cyc = 3 units
  [Figure: the IF, ID, EX, MEM, and WB steps spread over machine cycles 1-11, with delay slots padding the steps that do not fill a whole number of cycles.]
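Stage quantization is just a ceiling division of each step's latency by the chosen machine cycle. The sketch below assumes the same latencies as the balancing slide (6, 2, 9, 5, 9 units) and a hypothetical helper `machine_cycles`; it reproduces the 4-cycle and 11-cycle figures and shows how much time each choice wastes.

```python
import math

def machine_cycles(step_latency, t_cyc):
    """Number of machine cycles each step needs when the clock period is t_cyc."""
    return {name: math.ceil(lat / t_cyc) for name, lat in step_latency.items()}

# Coarser grain: IF and ID merged (6 + 2 = 8 units), clock set by the 9-unit stages.
coarse = machine_cycles({"IF&ID": 8, "EX": 9, "MEM": 5, "WB": 9}, t_cyc=9)

# Finer grain: 3-unit clock, long steps subdivided into several cycles.
fine = machine_cycles({"IF": 6, "ID": 2, "EX": 9, "MEM": 5, "WB": 9}, t_cyc=3)

useful_work = 6 + 2 + 9 + 5 + 9  # 31 units of real work per instruction
coarse_waste = sum(coarse.values()) * 9 - useful_work
fine_waste = sum(fine.values()) * 3 - useful_work

print("coarse:", coarse, "total", sum(coarse.values()), "waste", coarse_waste)
print("fine:  ", fine, "total", sum(fine.values()), "waste", fine_waste)
```

Under these assumptions the finer grain idles for 2 units per instruction instead of 5, at the cost of more pipeline registers and more instructions in flight.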
Slide 34: Balancing Pipeline Stages
Two methods for stage quantization:
  Merging multiple subcomputations into one
  Subdividing a subcomputation into multiple subcomputations
Current trends:
  Deeper pipelines (more and more stages)
  Multiplicity of different (sub)pipelines
  Pipelining of memory access (tricky)
Slide 35: Why Not Deeper Pipelines
Instruction pipelines are not ideal, i.e. instructions in different stages can have dependencies.
Suppose:
  add  1 2 3
  nand 3 4 5
RAW!!
[Timing diagram, t0-t5: add proceeds through F D E M W; nand is fetched behind it and stalls until the value of register 3 is available.]
Slide 36: Types of Dependencies and Hazards
Data dependence (both memory and register)
  True dependence (RAW): an instruction must wait for all required input operands.
  Anti dependence (WAR): a later write must not clobber a still-pending earlier read.
  Output dependence (WAW): an earlier write must not clobber an already-finished later write.
Control dependence (aka procedural dependence)
  Conditional branches cause uncertainty in instruction sequencing.
  Instructions following a conditional branch depend on the resolution of the branch instruction (more exact definition later).
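The three data dependence types reduce to set intersections over the registers each instruction reads and writes. The following is a minimal sketch; the `Insn` record and `dependences` function are hypothetical names, and the register numbers are illustrative.

```python
from collections import namedtuple

# Hypothetical record: which registers an instruction reads and writes.
Insn = namedtuple("Insn", "name reads writes")

def dependences(earlier, later):
    """Classify the data dependences of `later` on `earlier` (program order)."""
    kinds = []
    if set(earlier.writes) & set(later.reads):
        kinds.append("RAW")  # true dependence: later reads what earlier writes
    if set(earlier.reads) & set(later.writes):
        kinds.append("WAR")  # anti dependence: later overwrites what earlier reads
    if set(earlier.writes) & set(later.writes):
        kinds.append("WAW")  # output dependence: both write the same register
    return kinds

add = Insn("add 1 2 3", reads=(1, 2), writes=(3,))
nand = Insn("nand 3 4 5", reads=(3, 4), writes=(5,))
clobber = Insn("add 5 6 1", reads=(5, 6), writes=(1,))
rewrite = Insn("add 4 4 3", reads=(4,), writes=(3,))

print(dependences(add, nand))     # nand reads r3, which add writes
print(dependences(add, clobber))  # clobber writes r1, which add reads
print(dependences(add, rewrite))  # both write r3
```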
Slide 37: Terminology
Pipeline hazards:
  Potential violations of program dependences
  Must ensure program dependences are not violated
Hazard resolution:
  Static method: performed at compile time in software
  Dynamic method: performed at run time using hardware
Pipeline interlock:
  Hardware mechanisms for dynamic hazard resolution
  Must detect and enforce dependences at run time
Slide 38: Necessary Conditions for Data Hazards
Instruction i precedes instruction j in program order; j is in stage X while i is in stage Y.
  WAW hazard: stage X holds j: r_k <- _ (Reg Write); stage Y holds i: r_k <- _ (Reg Write)
  WAR hazard: stage X holds j: r_k <- _ (Reg Write); stage Y holds i: _ <- r_k (Reg Read)
  RAW hazard: stage X holds j: _ <- r_k (Reg Read); stage Y holds i: r_k <- _ (Reg Write)
Hazard distance:
  dist(i,j) <= dist(X,Y)  =>  Hazard!!
  dist(i,j) > dist(X,Y)   =>  Safe
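The distance test can be written as a one-line predicate. In this sketch `is_hazard` is a hypothetical name, and the effective stage distance of 2 used in the example is an assumption (it corresponds to a register file that is written before it is read within the same cycle); a different register file design changes that number.

```python
def is_hazard(dist_ij, dist_xy):
    """A dependence becomes a hazard when the two instructions are no farther
    apart in the instruction stream than the two stages are in the pipeline."""
    return dist_ij <= dist_xy

# Assumed effective distance between the register-read stage and the
# register-write stage for this example.
READ_TO_WRITE_DISTANCE = 2

for gap in (1, 2, 3):
    status = "hazard" if is_hazard(gap, READ_TO_WRITE_DISTANCE) else "safe"
    print(f"dependent instructions {gap} apart: {status}")
```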
Slide 39: Handling Data Hazards
Avoidance (static)
  Make sure there are no hazards in the code.
Detect and stall (dynamic)
  Stall until earlier instructions finish.
Detect and forward (dynamic)
  Get the correct value from elsewhere in the pipeline.
Slide 40: Handling Data Hazards: Avoidance
The programmer/compiler must know implementation details.
  Insert noops between dependent instructions.
    add  1 2 3   ; write R3 in cycle 5
    noop
    noop
    nand 3 4 5   ; read R3 in cycle 6
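A compiler pass for static avoidance could look roughly like the sketch below. The tuple layout `(op, regA, regB, dest)`, the function `insert_noops`, and the `min_gap` parameter are all illustrative assumptions; the default of 3 matches the two noops shown on the slide between a producer and its immediate consumer.

```python
NOOP = ("noop", None, None, None)

def insert_noops(program, min_gap=3):
    """Pad `program` so every reader sits at least `min_gap` slots after the
    instruction that writes one of its source registers.

    Each instruction is a tuple (op, regA, regB, dest); this layout is a
    simplification for illustration, not a real ISA encoding.
    """
    out = []
    for insn in program:
        _, reg_a, reg_b, _ = insn
        sources = {r for r in (reg_a, reg_b) if r is not None}
        needed = 0
        # Look back over the instructions close enough to cause a hazard.
        for back in range(1, min_gap):
            if back > len(out):
                break
            prev_dest = out[-back][3]
            if prev_dest is not None and prev_dest in sources:
                needed = max(needed, min_gap - back)
        out.extend([NOOP] * needed)
        out.append(insn)
    return out

program = [("add", 1, 2, 3), ("nand", 3, 4, 5)]
padded = insert_noops(program)
for insn in padded:
    print(insn[0])
```

Because `min_gap` is baked into the binary, a new implementation with a different pipeline depth would need the code to be regenerated.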
Slide 41: Problems with Avoidance
Binary compatibility
  New implementations may require more noops.
Code size
  Higher instruction cache footprint
  Longer binary load times
  Worse in machines that execute multiple instructions / cycle
    Intel Itanium: 25-40% of instructions are noops
Slower execution
  CPI = 1, but many instructions are noops
Slide 42: Handling Data Hazards: Detect & Stall
Detection
  Compare regA & regB with the destReg of preceding instructions.
    3-bit comparators
Stall
  Do not advance the pipeline register for Fetch/Decode.
  Pass a noop to Execute.
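Detect and stall can be modeled in a few lines. This sketch assumes the instruction in decode is compared against the destination registers held in the two pipeline registers ahead of it, and that a value in the final write-back stage is already readable; `needs_stall`, `count_stalls`, and the tuple layout `(op, regA, regB, dest)` are hypothetical.

```python
def needs_stall(reg_a, reg_b, id_ex_dest, ex_mem_dest):
    """True if a source register of the decoding instruction is still pending."""
    sources = {r for r in (reg_a, reg_b) if r is not None}
    pending = {d for d in (id_ex_dest, ex_mem_dest) if d is not None}
    return bool(sources & pending)

def count_stalls(program):
    """Count the noops injected while running `program` (no forwarding)."""
    id_ex_dest = ex_mem_dest = None
    stalls = 0
    pc = 0
    while pc < len(program):
        _, reg_a, reg_b, dest = program[pc]
        if needs_stall(reg_a, reg_b, id_ex_dest, ex_mem_dest):
            # Hold fetch/decode and pass a noop to execute.
            stalls += 1
            id_ex_dest, ex_mem_dest = None, id_ex_dest
        else:
            # Advance: the decoded instruction moves into execute.
            id_ex_dest, ex_mem_dest = dest, id_ex_dest
            pc += 1
    return stalls

program = [("add", 1, 2, 3), ("nand", 3, 4, 5)]
print("stall cycles:", count_stalls(program))
```

Each stall cycle adds directly to CPI, which is the motivation for forwarding.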
Slides 43-56: [Figure sequence: detect and stall on the pipelined datapath (Fetch, Decode, Execute, Memory, WB) for add 1 2 3 followed by nand 3 4 5.]
  Slides 43-45: The datapath, shown with the dest and op fields carried in each pipeline register, then with fwd fields added.
  Slide 46: End of cycle 1. add is in IF/ID.
  Slide 47: End of cycle 2. nand is in IF/ID; add is in ID/EX.
  Slide 48: First half of cycle 3. Hazard detection examines nand's regA and regB.
  Slides 49-50: Hazard detected. Comparators match nand's regA (3) against the destination register (3) pending in the pipeline.
  Slide 51: First half of cycle 3. The hazard signal drives the enables of the PC and the IF/ID register.
  Slide 52: End of cycle 3. nand stays in IF/ID; a noop is in ID/EX; add is in EX/Mem.
  Slide 53: First half of cycle 4. Hazard on register 3 again.
  Slide 54: End of cycle 4. nand stays in IF/ID; noops are in ID/EX and EX/Mem; add is in Mem/WB.
  Slide 55: First half of cycle 5. No hazard.
  Slide 56: End of cycle 5. The next add is in IF/ID; nand is in ID/EX; noops are in EX/Mem and Mem/WB.
Slide 57: Problems with Detect & Stall
CPI increases on every hazard.
Are these stalls necessary? Not always!
  The new value for R3 is in the EX/Mem register.
  Reroute the result to the nand.
    Called forwarding or bypassing
Slide 58: Handling Data Hazards: Detect & Forward
Detection
  Same as detect and stall, but each possible hazard requires a different forwarding path.
Forward
  Add data paths for all possible sources.
  Add a mux in front of the ALU to select the source.
Bypassing logic is often a critical path in wide-issue machines.
  The number of paths grows quadratically with machine width.
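The mux in front of the ALU picks the youngest in-flight producer of each source register. The sketch below assumes two forwarding sources (the EX/Mem and Mem/WB pipeline registers) and a one-cycle stall for a load followed immediately by a consumer; the function names and the `"lw"` opcode string are illustrative, and a real design may need additional paths.

```python
def forward_select(src_reg, ex_mem_dest, mem_wb_dest):
    """Choose where an ALU operand comes from; the newest producer wins."""
    if src_reg is not None and src_reg == ex_mem_dest:
        return "EX/Mem"   # result computed one cycle ago
    if src_reg is not None and src_reg == mem_wb_dest:
        return "Mem/WB"   # result computed (or loaded) two cycles ago
    return "regfile"      # no in-flight producer: use the register file value

def load_use_stall(sources, id_ex_op, id_ex_dest):
    """A load's data is not available until after the memory stage, so a
    consumer directly behind it must still stall for one cycle."""
    return id_ex_op == "lw" and id_ex_dest is not None and id_ex_dest in sources

# nand reads r3 right behind an add that writes r3.
print(forward_select(3, ex_mem_dest=3, mem_wb_dest=None))
# A later instruction reads r3 two slots behind the add.
print(forward_select(3, ex_mem_dest=5, mem_wb_dest=3))
# sw reads r6 right behind an lw that writes r6.
print(load_use_stall({6, 2}, id_ex_op="lw", id_ex_dest=6))
```

With N instructions issued per cycle, every operand of every instruction needs a mux input from every in-flight producer, which is where the quadratic growth comes from.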
Slide 59: Forwarding example
  add  1 2 3   // r3 = r1 + r2
  nand 3 4 5   // r5 = r3 NAND r4
  add  6 3 7   // r7 = r3 + r6
  lw   3 6 10  // r6 = MEM[r3+10]
  sw   6 2 12  // MEM[r6+12] = r2
[Time graph for add, nand, add, lw, sw]
Slides 60-71: [Figure sequence: detect and forward on the pipelined datapath for the forwarding example. Hazard tags (H1, H2, H3) travel down the pipeline registers with the instructions that need forwarded values.]
  Slide 60: First half of cycle 3. Hazard on register 3: nand is in decode behind add.
  Slide 61: End of cycle 3. The second add is in IF/ID; nand is in ID/EX; the first add is in EX/Mem.
  Slide 62: First half of cycle 4. New hazard: the second add also reads register 3.
  Slide 63: End of cycle 4. lw is in IF/ID; the second add is in ID/EX; nand is in EX/Mem; the first add is in Mem/WB.
  Slide 64: First half of cycle 5. No hazard.
  Slide 65: End of cycle 5. sw is in IF/ID; lw is in ID/EX; the second add is in EX/Mem; nand is in Mem/WB.
  Slide 66: First half of cycle 6. Hazard on register 6: sw needs the value that lw has not yet loaded, so the PC and IF/ID enables are used to stall.
  Slide 67: End of cycle 6. sw stays in IF/ID; a noop is in ID/EX; lw is in EX/Mem; the second add is in Mem/WB.
  Slide 68: First half of cycle 7. Hazard on register 6.
  Slide 69: End of cycle 7. sw is in ID/EX; the noop is in EX/Mem; lw is in Mem/WB.
  Slide 70: First half of cycle 8.
  Slide 71: End of cycle 8. sw is in EX/Mem; the noop is in Mem/WB.
Slide 72: Load Delay Slot (MIPS R2000)
[Timing diagram, t0-t5: i, j, and k each proceed through F D E M W, one cycle apart.]
  h: R_k <- --
  i: R_k <- MEM[ - ]
  j: -- <- R_k
  k: -- <- R_k
The effect of a delayed load is not visible to the instructions in its delay slots.
Which R_k do we really mean?
Slide 73: Control Hazards
  beq
  sub
[Timing diagram, t0-t5: beq proceeds through F D E M W; the sub fetched behind it is squashed.]
Slide 74: Handling Control Hazards
Avoidance (static)
  No branches?
  Convert branches to predication.
    Control dependence becomes data dependence.
Detect and stall (dynamic)
  Stop fetch until the branch resolves.
Speculate and squash (dynamic)
  Keep going past the branch; throw away instructions if wrong.
Slide 75: Avoidance: if-conversion
Source:
  if (a == b) {
    x++;
    y = n / d;
  }
With a branch:
  sub t <- a, b
  jnz t, PC+2
  add x <- x, #1
  div y <- n, d
With predicated instructions:
  sub t <- a, b
  add(t) x <- x, #1
  div(t) y <- n, d
With conditional moves:
  sub t <- a, b
  add t2 <- x, #1
  div t3 <- n, d
  cmov(t) x <- t2
  cmov(t) y <- t3
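The conditional-move form can be mimicked in a high-level language to see why it is equivalent. This is only a sketch: `branchy` and `if_converted` are hypothetical names, integer division stands in for `div`, and the guard on the divisor models the assumption that an unconditionally executed divide does not trap.

```python
def branchy(a, b, x, y, n, d):
    """Original control flow: the body runs only when a == b."""
    if a == b:
        x = x + 1
        y = n // d
    return x, y

def if_converted(a, b, x, y, n, d):
    """Branch-free form: compute both results, then select with the predicate."""
    t = (a == b)                      # predicate (the slide derives it from a - b)
    t2 = x + 1                        # always computed
    t3 = n // d if d != 0 else 0      # always computed; assumed not to trap
    x = t2 if t else x                # cmov(t) x <- t2
    y = t3 if t else y                # cmov(t) y <- t3
    return x, y

for args in [(1, 1, 10, 0, 8, 2), (1, 2, 10, 0, 8, 2), (3, 3, -1, 5, 9, 4)]:
    print(args, branchy(*args), if_converted(*args))
```

The price of removing the branch is that the add and the divide always execute, so if-conversion pays off mainly when the branch is hard to predict and the guarded work is cheap.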
Slide 76: Handling Control Hazards: Detect & Stall
Detection
  In decode, check whether the opcode is a branch or jump.
Stall
  Hold the next instruction in Fetch.
  Pass a noop to Decode.
Slide 77: Problems with Detect & Stall
CPI increases on every branch.
Are these stalls necessary? Not always!
  The branch is only taken half the time.
  Assume the branch is NOT taken.
    Keep fetching; treat the branch as a noop.
    If wrong, make sure the bad instructions don't complete.
Slide 78: Handling Control Hazards: Speculate & Squash
Speculate
  Assume the branch is not taken.
Squash
  Overwrite the opcodes in Fetch, Decode, and Execute with noop.
  Pass the target to Fetch.
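A back-of-the-envelope CPI comparison shows what speculation buys. The numbers in this sketch are assumptions for illustration: a 3-cycle penalty (the three squashed slots in Fetch, Decode, and Execute, with the same cost assumed for a stall), one instruction in five being a branch, and half of all branches taken.

```python
from fractions import Fraction

def cpi_detect_and_stall(branch_fraction, penalty=3):
    """Every branch pays the full penalty while fetch waits for it to resolve."""
    return 1 + branch_fraction * penalty

def cpi_speculate_and_squash(branch_fraction, taken_fraction, penalty=3):
    """Only mispredicted (here: taken) branches pay the penalty."""
    return 1 + branch_fraction * taken_fraction * penalty

branch_fraction = Fraction(1, 5)  # assumed: 20% of instructions are branches
taken_fraction = Fraction(1, 2)   # assumed: half of all branches are taken

cpi_stall = cpi_detect_and_stall(branch_fraction)
cpi_squash = cpi_speculate_and_squash(branch_fraction, taken_fraction)

print(f"detect & stall:     CPI = {float(cpi_stall):.2f}")
print(f"speculate & squash: CPI = {float(cpi_squash):.2f}")
```

Predicting the branch direction more accurately than "always not taken" shrinks the mispredicted fraction further.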
Slide 79: [Figure: squash on a taken beq for the sequence beq, sub, add, nand. The datapath shows the instruction memory, control, register file, sign extend, ALU with its equal output, and data memory. The opcodes entering IF/ID, ID/EX, and EX/Mem are overwritten with noop while beq moves on to Mem/WB.]
Slide 80: Problems with Speculate & Squash
Always assumes the branch is not taken.
Can we do better? Yes.
  Predict branch direction and target!
  Why is this possible? Program behavior repeats.
More on branch prediction to come...
Slide 81: Branch Delay Slot (MIPS, SPARC)
[Timing diagram, t0-t5, without a delay slot:]
  branch: F D E M W
  next:   F, then squashed
  target: F D E M W
The instruction in the delay slot executes even on a taken branch.
[Timing diagram with a delay slot:]
  branch: F D E M W
  delay:  F D E M W
  target: F D E M W
Example:
  i: beq 1, 2, tgt
  j: add 3, 4, 5
What can we put here?
Slide 82: Pipeline Hazard Checklist
Memory data dependences
  Output dependence (WAW)
  Anti dependence (WAR)
  True data dependence (RAW)
Register data dependences
  Output dependence (WAW)
  Anti dependence (WAR)
  True data dependence (RAW)
Control dependences
Slide 83: Instruction Dependences and Pipeline Hazards
Sequential code semantics:
  i1: xxxx
  i2: xxxx
  i3: xxxx
A true dependence between two instructions may only involve one substep of each instruction.
The implied sequential precedences are overspecifications. They are sufficient but not necessary to ensure program correctness.
EECS 470 Lecture 4. Pipelining & Hazards II. Fall 2018 Jon Beaumont
GAS STATION Pipelining & Hazards II Fall 208 Jon Beaumont http://www.eecs.umich.edu/courses/eecs470 Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, artin, Roth, Shen, Smith,
More informationEECS 470. Further review: Pipeline Hazards and More. Lecture 2 Winter 2018
EECS 470 Further review: Pipeline Hazards and ore Lecture 2 Winter 208 Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, artin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar,
More information(Basic) Processor Pipeline
(Basic) Processor Pipeline Nima Honarmand Generic Instruction Life Cycle Logical steps in processing an instruction: Instruction Fetch (IF_STEP) Instruction Decode (ID_STEP) Operand Fetch (OF_STEP) Might
More informationPipeline design. Mehran Rezaei
Pipeline design Mehran Rezaei How Can We Improve the Performance? Exec Time = IC * CPI * CCT Optimization IC CPI CCT Source Level * Compiler * * ISA * * Organization * * Technology * With Pipelining We
More informationEECS 470 Lecture 2. Performance, Power & ISA. Fall Jon Beaumont
Performance, Power & ISA Fall 218 Jon Beaumont Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, artin, udge, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie
More informationEECS 470. Control Hazards and ILP. Lecture 3 Winter 2014
EECS 470 Control Hazards and ILP Lecture 3 Winter 2014 Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of