Design of Digital Circuits Lecture 14: Pipelining. Prof. Onur Mutlu ETH Zurich Spring April 2018


Transcription

1 Design of Digital Circuits, Lecture 14: Pipelining. Prof. Onur Mutlu, ETH Zurich, Spring 2018, 19 April 2018

2 Agenda for Today & Next Few Lectures
Previous lectures:
  Single-cycle Microarchitectures
  Multi-cycle and Microprogrammed Microarchitectures
Pipelining
Issues in Pipelining: Control & Data Dependence Handling, State Maintenance and Recovery
Out-of-Order Execution
Issues in OoO Execution: Load-Store Handling

3 Lecture Announcement
Monday, April 30, 2018, 16:15-17:15, CAB G 61
Apéro after the lecture
Prof. Wen-Mei Hwu (University of Illinois at Urbana-Champaign)
D-INFK Distinguished Colloquium
Innovative Applications and Technology Pivots: A Perfect Storm in Computing

4 Readings for This Week and Next Week
H&H, Chapter 7.5: Pipelined Processor
H&H (finish Chapter 7)
Smith and Sohi, "The Microarchitecture of Superscalar Processors," Proceedings of the IEEE, 1995
  More advanced pipelining
  Interrupt and exception handling
  Out-of-order and superscalar execution concepts

5 Can We Do Better?

6 Can We Do Better?
What limitations do you see with the multi-cycle design?
Limited concurrency
  Some hardware resources are idle during different phases of the instruction processing cycle
  "Fetch" logic is idle when an instruction is being decoded or executed
  Most of the datapath is idle when a memory access is happening

7 Can We Use the Idle Hardware to Improve Concurrency?
Goal: More concurrency → higher instruction throughput (i.e., more "work" completed in one cycle)
Idea: When an instruction is using some resources in its processing phase, process other instructions on idle resources not needed by that instruction
  E.g., when an instruction is being decoded, fetch the next instruction
  E.g., when an instruction is being executed, decode another instruction
  E.g., when an instruction is accessing data memory (ld/st), execute the next instruction
  E.g., when an instruction is writing its result into the register file, access data memory for the next instruction

8 Pipelining

9 Pipelining: Basic Idea
More systematically: Pipeline the execution of multiple instructions
Analogy: "Assembly line processing" of instructions
Idea:
  Divide the instruction processing cycle into distinct "stages" of processing
  Ensure there are enough hardware resources to process one instruction in each stage
  Process a different instruction in each stage
  Instructions consecutive in program order are processed in consecutive stages
Benefit: Increases instruction processing throughput (1/CPI)
Downside: Start thinking about this

10 Example: Execution of Four Independent ADDs
Multi-cycle: 4 cycles per instruction

  F D E W
          F D E W
                  F D E W
                          F D E W
  Time →

Pipelined: 4 cycles per 4 instructions (steady state)

  F D E W
    F D E W
      F D E W
        F D E W
  Time →

Is life always this beautiful?
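
The cycle counts in this example can be checked with a small sketch. It assumes a 4-stage F/D/E/W pipeline with no stalls; the function names are illustrative and not part of the lecture.

```python
def multicycle_cycles(num_instructions, cycles_per_instruction=4):
    """Total cycles when each instruction finishes before the next one starts."""
    return num_instructions * cycles_per_instruction


def pipelined_cycles(num_instructions, num_stages=4):
    """Total cycles when a new instruction enters the pipeline every cycle.

    The first instruction needs num_stages cycles to reach the last stage;
    every later instruction completes one cycle after its predecessor.
    No stalls are modeled (an assumption of this sketch).
    """
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


for n in (4, 1000):
    print(n, multicycle_cycles(n), pipelined_cycles(n))
```

For the four ADDs the pipeline still needs a few cycles to fill, so the "4 cycles per 4 instructions" figure only holds in steady state; for long instruction streams the cost approaches one cycle per instruction.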

11 The Laundry Analogy
[Figure: laundry timeline starting at 6 PM; task order A, B, C, D.]
"place one dirty load of clothes in the washer"
"when the washer is finished, place the wet load in the dryer"
"when the dryer is finished, take out the dry load and fold"
"when folding is finished, ask your roommate (??) to put the clothes away"
- steps to do a load are sequentially dependent
- no dependence between different loads
- different steps do not share resources
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

12 Pipelining Multiple Loads of Laundry
[Figure: two laundry timelines starting at 6 PM; task order A, B, C, D.]
- 4 loads of laundry in parallel
- no additional resources
- throughput increased by 4
- latency per load is the same
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

13 Pipelining Multiple Loads of Laundry: In Practice
[Figure: two laundry timelines starting at 6 PM; task order A, B, C, D.]
the slowest step decides throughput
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

14 Pipelining Multiple Loads of Laundry: In Practice
[Figure: two laundry timelines starting at 6 PM; task order A, B, C, D; loads alternate between two dryers.]
throughput restored (2 loads per hour) using 2 dryers
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

15 An Ideal Pipeline
Goal: Increase throughput with little increase in cost (hardware cost, in case of instruction processing)
Repetition of identical operations
  The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
Repetition of independent operations
  No dependencies between repeated operations
Uniformly partitionable suboperations
  Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
Fitting examples: automobile assembly line, doing laundry
What about the instruction processing "cycle"?

16 Ideal Pipelining
combinational logic (F,D,E,M,W), T psec: BW = ~(1/T)
T/2 ps (F,D,E) | T/2 ps (M,W): BW = ~(2/T)
T/3 ps (F,D) | T/3 ps (E,M) | T/3 ps (M,W): BW = ~(3/T)

17 More Realistic Pipeline: Throughput
Nonpipelined version with delay T
  BW = 1 / (T + S), where S = latch delay
k-stage pipelined version
  BW_k-stage = 1 / (T/k + S)
  BW_max = 1 / (1 gate delay + S)
Latch delay reduces throughput (switching overhead b/w stages)

18 More Realistic Pipeline: Cost
Nonpipelined version with combinational cost G
  Cost = G + L, where L = latch cost
k-stage pipelined version
  Cost_k-stage = G + Lk
Latches increase hardware cost
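
The throughput and cost models of the realistic pipeline (BW = 1/(T/k + S), Cost = G + Lk) can be explored numerically. The sketch below uses made-up values for T, S, G and L purely for illustration; only the two formulas come from the slides.

```python
def bandwidth(T, S, k=1):
    """Throughput of a k-stage pipeline: 1 / (T/k + S).

    T is the total combinational delay and S the latch delay; k=1 gives
    the nonpipelined case 1 / (T + S).
    """
    return 1.0 / (T / k + S)


def cost(G, L, k=1):
    """Hardware cost of a k-stage pipeline: G + L*k (G gates, L per latch stage)."""
    return G + L * k


# Illustrative numbers only (assumptions, not from the lecture).
T_PS, S_PS = 800.0, 20.0
GATES, LATCH_COST = 10000.0, 500.0

for k in (1, 2, 4, 8, 16):
    speedup = bandwidth(T_PS, S_PS, k) / bandwidth(T_PS, S_PS, 1)
    print(f"k={k:2d}  speedup={speedup:5.2f}  cost={cost(GATES, LATCH_COST, k):8.0f}")
```

With a nonzero latch delay the speedup grows more slowly than k while the cost keeps growing linearly, which is the trade-off the two slides point at.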

19 Pipelining Instruction Processing

20 Remember: The Instruction Processing Cycle
Fetch, Decode, Evaluate Address, Fetch Operands, Execute, Store Result
1. Instruction fetch (IF)
2. Instruction decode and register operand fetch (ID/RF)
3. Execute/Evaluate memory address (EX/AG)
4. Memory operand fetch (MEM)
5. Store/writeback result (WB)

21 Remember the Single-Cycle Uarch
[Figure: single-cycle datapath with PC, instruction memory, register file, sign extend, ALU, and data memory; the control unit drives RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; PCSrc1 = Jump, PCSrc2 = Br Taken.]
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Delay T, BW = ~(1/T)

22 Dividing Into Stages
IF: Instruction fetch (200 ps)
ID: Instruction decode/register file read (100 ps)
EX: Execute/address calculation (200 ps)
MEM: Memory access (200 ps)
WB: Write back (100 ps)
[Figure: single-cycle datapath partitioned into the five stages; annotations "ignore for now" and "RF write".]
Is this the correct partitioning? Why not 4 or 6 stages? Why not different boundaries?
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

23 Instruction Pipeline Throughput
[Figure: program execution order vs. time for lw $1, 100($0); lw $2, 200($0); lw $3, 300($0), each passing through Instruction fetch, Reg, ALU, Data access, Reg. Nonpipelined: a new instruction starts every 800 ps. Pipelined: a new instruction starts every 200 ps.]
5-stage speedup is 4, not 5 as predicted by the ideal model. Why?
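
A short sketch of why the speedup is 4 rather than 5. It assumes stage latencies of 200, 100, 200, 200 and 100 ps for IF, ID, EX, MEM and WB (the values consistent with the 800 ps and 200 ps periods in the figure); the helper name is illustrative.

```python
STAGE_LATENCY_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def clock_periods(stage_latency_ps):
    """Return (single-cycle period, pipelined period) in picoseconds.

    A single-cycle design must fit all stages in one cycle; a pipelined
    design is clocked by its slowest stage (latch overhead ignored here).
    """
    single_cycle = sum(stage_latency_ps.values())
    pipelined = max(stage_latency_ps.values())
    return single_cycle, pipelined


single_cycle, pipelined = clock_periods(STAGE_LATENCY_PS)
print("single-cycle period:", single_cycle, "ps")
print("pipelined period:   ", pipelined, "ps")
print("steady-state speedup:", single_cycle / pipelined)
print("ideal speedup:       ", len(STAGE_LATENCY_PS))
```

Because the stages are not balanced, the 100 ps stages waste half of every 200 ps cycle, so the steady-state speedup is limited by the slowest stage rather than by the stage count.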

24 Enabling Pipelined Processing: Pipeline Registers
IF: Instruction fetch; ID: Instruction decode/register file read; EX: Execute/address calculation; MEM: Memory access; WB: Write back
No resource is used by more than 1 stage!
Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB
[Figure: pipelined datapath; labeled pipeline-register fields include PC_F, IR_D, PC_D+4, PC_E+4, A_E, B_E, Imm_E, PC_M, Aout_M, B_M, MDR_W, Aout_W. Total delay T, T/k ps per stage.]
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

25 Pipelined Operation Example
[Figure: an lw instruction moving through Instruction fetch, Instruction decode, Execution, Memory, and Write back in the pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB).]
All instruction classes must follow the same path and timing through the pipeline stages.
Any performance impact?
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

26 Pipelined Operation Example
[Figure: lw $10, 20($1) followed by sub $11, $2, $3 moving through Instruction fetch, Instruction decode, Execution, Memory, and Write back in the pipelined datapath, clock cycles 1 to 6.]
Is life always this beautiful?
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

27 Illustrating Pipeline Operation: Operation View

        t0   t1   t2   t3   t4   t5
Inst0   IF   ID   EX   MEM  WB
Inst1        IF   ID   EX   MEM  WB
Inst2             IF   ID   EX   MEM  WB
Inst3                  IF   ID   EX   MEM  WB
Inst4                       IF   ID   EX   MEM  WB

steady state (full pipeline)

28 Illustrating Pipeline Operation: Resource View

       t0   t1   t2   t3   t4   t5   t6   t7   t8   t9   t10
IF     I0   I1   I2   I3   I4   I5   I6   I7   I8   I9   I10
ID          I0   I1   I2   I3   I4   I5   I6   I7   I8   I9
EX               I0   I1   I2   I3   I4   I5   I6   I7   I8
MEM                   I0   I1   I2   I3   I4   I5   I6   I7
WB                         I0   I1   I2   I3   I4   I5   I6
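
The resource view can be generated mechanically: instruction i occupies stage number d (counting from 0) in cycle i + d. The following is a minimal sketch under the assumption of a stall-free 5-stage pipeline; the function names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def resource_view(num_instructions, num_cycles, stages=STAGES):
    """Map each stage to a per-cycle list of instruction indices (None = empty)."""
    table = {stage: [None] * num_cycles for stage in stages}
    for inst in range(num_instructions):
        for depth, stage in enumerate(stages):
            cycle = inst + depth  # no stalls: one stage per cycle
            if cycle < num_cycles:
                table[stage][cycle] = inst
    return table


def format_view(table):
    """Render the table as text, one row per stage."""
    rows = []
    for stage, slots in table.items():
        cells = ["I%d" % i if i is not None else "" for i in slots]
        rows.append(stage.ljust(4) + " ".join(c.ljust(4) for c in cells).rstrip())
    return "\n".join(rows)


print(format_view(resource_view(num_instructions=11, num_cycles=11)))
```

From cycle t4 onward every stage holds an instruction, which is the steady state (full pipeline) shown in the operation view.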

29 Control Points in a Pipeline
[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) with its control points: PCSrc, RegWrite, Branch, ALUSrc, ALUOp, RegDst, MemWrite, MemRead, MemtoReg.]
Identical set of control points as the single-cycle datapath!!
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

30 Control Signals in a Pipeline
For a given instruction
  => same control signals as single-cycle, but
  control signals required at different cycles, depending on stage
Option 1: decode once using the same logic as single-cycle and buffer signals until consumed
[Figure: Control block in decode produces the EX, M, and WB signal groups, which are carried through the ID/EX, EX/MEM, and MEM/WB pipeline registers.]
Option 2: carry relevant "instruction word/field" down the pipeline and decode locally within each or in a previous stage
Which one is better?

31 Pipelined Control Signals
[Figure: pipelined datapath with the Control block in decode; the EX, M, and WB signal groups travel through ID/EX, EX/MEM, and MEM/WB and drive RegDst, ALUOp, ALUSrc, Branch, MemRead, MemWrite, MemtoReg, RegWrite, and PCSrc.]
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

32 Another Example: Single-Cycle and Pipelined [Carnegie Mellon slide]
[Figure: single-cycle datapath (PC, Instruction Memory, Register File, ALU, Data Memory, Sign Extend) above its pipelined version with Fetch, Decode, Execute, Memory, and Writeback stages; pipelined signal names carry a stage suffix, e.g. PCF, InstrD, SrcAE, SrcBE, WriteDataE, WriteRegE, ALUOutM, WriteDataM, ALUOutW, ReadDataW, ResultW.]

33 Another Example: Correct Pipelined Datapath
[Figure: pipelined datapath in which WriteReg is carried through the pipeline as WriteRegE, WriteRegM, and WriteRegW.]
WriteReg must arrive at the same time as Result

34 Another Example: Pipelined Control
[Figure: Control Unit decoding Op (31:26) and Funct (5:0) in the Decode stage; RegWrite, MemtoReg, MemWrite, Branch, ALUControl, ALUSrc, and RegDst are pipelined with D, E, M, W suffixes down to the stage that uses them.]
Same control unit as single-cycle processor
Control delayed to proper pipeline stage

35 Remember: An Ideal Pipeline
Goal: Increase throughput with little increase in cost (hardware cost, in case of instruction processing)
Repetition of identical operations
  The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
Repetition of independent operations
  No dependencies between repeated operations
Uniformly partitionable suboperations
  Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
Fitting examples: automobile assembly line, doing laundry
What about the instruction processing "cycle"?

36 Instruction Pipeline: Not An Ideal Pipeline
Identical operations ... NOT!
  => different instructions → not all need the same stages
  Forcing different instructions to go through the same pipe stages
  → external fragmentation (some pipe stages idle for some instructions)
Uniform suboperations ... NOT!
  => different pipeline stages → not the same latency
  Need to force each stage to be controlled by the same clock
  → internal fragmentation (some pipe stages are too fast but all take the same clock cycle time)
Independent operations ... NOT!
  => instructions are not independent of each other
  Need to detect and resolve inter-instruction dependencies to ensure the pipeline provides correct results
  → pipeline stalls (pipeline is not always moving)

37 Issues in Pipeline Design
Balancing work in pipeline stages
  How many stages and what is done in each stage
Keeping the pipeline correct, moving, and full in the presence of events that disrupt pipeline flow
  Handling dependences
    Data
    Control
  Handling resource contention
  Handling long-latency (multi-cycle) operations
Handling exceptions, interrupts
Advanced: Improving pipeline throughput
  Minimizing stalls

38 Causes of Pipeline Stalls
Stall: A condition when the pipeline stops moving
Resource contention
Dependences (between instructions)
  Data
  Control
Long-latency (multi-cycle) operations

39 Dependences and Their Types
Also called "dependency" or, less desirably, "hazard"
Dependences dictate ordering requirements between instructions
Two types
  Data dependence
  Control dependence
Resource contention is sometimes called resource dependence
  However, this is not fundamental to (dictated by) program semantics, so we will treat it separately

40 Handling Resource Contention
Happens when instructions in two pipeline stages need the same resource
Solution 1: Eliminate the cause of contention
  Duplicate the resource or increase its throughput
  E.g., use separate instruction and data memories (caches)
  E.g., use multiple ports for memory structures
Solution 2: Detect the resource contention and stall one of the contending stages
  Which stage do you stall?
  Example: What if you had a single read and write port for the register file?

41 Example Resource Dependence: RegFile
The register file can be read and written in the same cycle:
  write takes place during the 1st half of the cycle
  read takes place during the 2nd half of the cycle => no problem!!!
  However, operations that involve the register file have only half a clock cycle to complete the operation!!
[Figure: pipeline timing diagram (IM, RF, ALU, DM, RF per instruction) for
  add $s0, $s2, $s3
  and $t0, $s0, $s1
  or  $t1, $s4, $s0
  sub $t2, $s0, $s5]

42 Data Dependences
Types of data dependences
  Flow dependence (true data dependence, read after write)
  Output dependence (write after write)
  Anti dependence (write after read)
Which ones cause stalls in a pipelined machine?
  For all of them, we need to ensure semantics of the program is correct
  Flow dependences always need to be obeyed because they constitute true dependence on a value
  Anti and output dependences exist due to the limited number of architectural registers
    They are dependence on a name, not a value
    We will later see what we can do about them

43 Data Dependence Types
Flow dependence: Read-after-Write (RAW)
  r3 ← r1 op r2
  r5 ← r3 op r4
Anti dependence: Write-after-Read (WAR)
  r3 ← r1 op r2
  r1 ← r4 op r5
Output dependence: Write-after-Write (WAW)
  r3 ← r1 op r2
  r5 ← r3 op r4
  r3 ← r6 op r7
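
The three dependence types can be detected by comparing the destination and source registers of an older and a younger instruction. The sketch below is illustrative only; the `Instr` type and `dependences` function are names invented for this example.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    dest: str               # destination register name
    srcs: Tuple[str, ...]   # source register names


def dependences(older: Instr, younger: Instr) -> Set[str]:
    """Return the data dependence types from `older` to `younger`."""
    kinds = set()
    if older.dest in younger.srcs:
        kinds.add("RAW")  # flow: younger reads what older writes
    if younger.dest in older.srcs:
        kinds.add("WAR")  # anti: younger overwrites what older reads
    if older.dest == younger.dest:
        kinds.add("WAW")  # output: both write the same register
    return kinds


i1 = Instr("r3", ("r1", "r2"))
print(dependences(i1, Instr("r5", ("r3", "r4"))))  # flow dependence
print(dependences(i1, Instr("r1", ("r4", "r5"))))  # anti dependence
print(dependences(i1, Instr("r3", ("r6", "r7"))))  # output dependence
```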

44 Pipelined Operation Example
[Figure: lw $10, 20($1) followed by sub $11, $2, $3 moving through Instruction fetch, Instruction decode, Execution, Memory, and Write back in the pipelined datapath, clock cycles 1 to 6.]
What if the SUB were dependent on LW?
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

45 Data Dependence Handling

46 Reading for Next Few Lectures
H&H, Chapter
Smith and Sohi, "The Microarchitecture of Superscalar Processors," Proceedings of the IEEE, 1995
  More advanced pipelining
  Interrupt and exception handling
  Out-of-order and superscalar execution concepts

47 How to Handle Data Dependences
Anti and output dependences are easier to handle
  write to the destination in one stage and in program order
Flow dependences are more interesting
Five fundamental ways of handling flow dependences
  Detect and wait until value is available in register file
  Detect and forward/bypass data to dependent instruction
  Detect and eliminate the dependence at the software level
    No need for the hardware to detect dependence
  Predict the needed value(s), execute "speculatively", and verify
  Do something else (fine-grained multithreading)
    No need to detect

48 Interlocking
Detection of dependence between instructions in a pipelined processor to guarantee correct execution
Software based interlocking vs. Hardware based interlocking
MIPS acronym?

49 Approaches to Dependence Detection (I)
Scoreboarding
  Each register in register file has a Valid bit associated with it
  An instruction that is writing to the register resets the Valid bit
  An instruction in Decode stage checks if all its source and destination registers are Valid
    Yes: No need to stall. No dependence
    No: Stall the instruction
Advantage:
  Simple. 1 bit per register
Disadvantage:
  Need to stall for all types of dependences, not only flow dep.
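
The valid-bit scoreboard described on this slide can be sketched in a few lines. This is a simplified model, not the lecture's implementation: registers are integer indices, and the class and method names are invented for illustration.

```python
class Scoreboard:
    """One Valid bit per register; a pending write clears the bit."""

    def __init__(self, num_registers=32):
        self.valid = [True] * num_registers

    def can_issue(self, srcs, dest):
        """Decode-stage check: all source and destination registers must be Valid."""
        return all(self.valid[r] for r in (*srcs, dest))

    def issue(self, dest):
        """An instruction that will write `dest` resets its Valid bit."""
        self.valid[dest] = False

    def writeback(self, dest):
        """The write has completed, so the register is Valid again."""
        self.valid[dest] = True


sb = Scoreboard(num_registers=8)
sb.issue(3)                         # r3 <- r1 op r2 is in flight
print(sb.can_issue((3, 4), 5))      # False: source r3 not yet written
print(sb.can_issue((1, 2), 3))      # False: destination r3 has a pending write
print(sb.can_issue((1, 2), 4))      # True: no pending write involved
sb.writeback(3)
print(sb.can_issue((3, 4), 5))      # True after the write completes
```

Checking the destination bit as well as the source bits is what makes this scheme stall on more than flow dependences, at the price of only one bit of state per register.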

50 Not Stalling on Anti and Output Dependences
What changes would you make to the scoreboard to enable this?

51 Approaches to Dependence Detection (II)
Combinational dependence check logic
  Special logic that checks if any instruction in later stages is supposed to write to any source register of the instruction that is being decoded
    Yes: stall the instruction/pipeline
    No: no need to stall, no flow dependence
Advantage:
  No need to stall on anti and output dependences
Disadvantage:
  Logic is more complex than a scoreboard
  Logic becomes more complex as we make the pipeline deeper and wider (flash-forward: think superscalar execution)

52 Once You Detect the Dependence in Hardware
What do you do afterwards?
Observation: Dependence between two instructions is detected before the communicated data value becomes available
Option 1: Stall the dependent instruction right away
Option 2: Stall the dependent instruction only when necessary → data forwarding/bypassing
Option 3:

53 Data Forwarding/Bypassing
Problem: A consumer (dependent) instruction has to wait in decode stage until the producer instruction writes its value in the register file
Goal: We do not want to stall the pipeline unnecessarily
Observation: The data value needed by the consumer instruction can be supplied directly from a later stage in the pipeline (instead of only from the register file)
Idea: Add additional dependence check logic and data forwarding paths (buses) to supply the producer's value to the consumer right after the value is available
Benefit: Consumer can move in the pipeline until the point the value can be supplied → less stalling

54 A Special Case of Data Dependence
Control dependence
  Data dependence on the Instruction Pointer / Program Counter

55 Control Dependence
Question: What should the fetch PC be in the next cycle?
Answer: The address of the next instruction
  All instructions are control dependent on previous ones. Why?
If the fetched instruction is a non-control-flow instruction:
  Next Fetch PC is the address of the next-sequential instruction
  Easy to determine if we know the size of the fetched instruction
If the instruction that is fetched is a control-flow instruction:
  How do we determine the next Fetch PC?
In fact, how do we know whether or not the fetched instruction is a control-flow instruction?

56 Data Dependence Handling: Concepts and Implementation

57 Remember: Data Dependence Types
Flow dependence: Read-after-Write (RAW)
  r3 ← r1 op r2
  r5 ← r3 op r4
Anti dependence: Write-after-Read (WAR)
  r3 ← r1 op r2
  r1 ← r4 op r5
Output dependence: Write-after-Write (WAW)
  r3 ← r1 op r2
  r5 ← r3 op r4
  r3 ← r6 op r7

58 RAW Dependence Handling
Which one of the following flow dependences lead to conflicts in the 5-stage pipeline?

addi ra r- -    IF   ID   EX   MEM  WB
addi r- ra -         IF   ID   EX   MEM  WB
addi r- ra -              IF   ID   EX   MEM
addi r- ra -                   IF   ID   EX
addi r- ra -                        IF   ID   ?
addi r- ra -                             IF

59 Pipeline Stall: Resolving Data Dependence
[Figure: pipeline diagram (IF, ID, ALU, MEM, WB) for instructions h, i, j, k, l over cycles t0 to t5, in which instruction j repeats its ID stage and the instructions behind it repeat IF while j waits for the result of i.]
i: rx ← _
j: _ ← rx   dist(i,j)=1: bubble
j: _ ← rx   dist(i,j)=2: bubble
j: _ ← rx   dist(i,j)=3: bubble
j: _ ← rx   dist(i,j)=4
Stall = make the dependent instruction wait until its source data value is available
  1. stop all up-stream stages
  2. drain all down-stream stages
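
How many bubbles a dependent instruction needs follows from where registers are read and written. The sketch below assumes a 5-stage pipeline that reads registers in stage 2 (ID) and writes them in stage 5 (WB), with no forwarding; the `same_cycle_ok` flag models a register file that writes in the first half of a cycle and reads in the second half. All names are illustrative.

```python
def bubbles_needed(dist, read_stage=2, write_stage=5, same_cycle_ok=False):
    """Bubbles needed so consumer j (dist instructions after producer i) reads the new value.

    Producer i is fetched in cycle 0, so it is in stage s during cycle s - 1.
    Consumer j is fetched in cycle `dist` and, without stalls, reads its
    registers in cycle dist + read_stage - 1.
    """
    write_cycle = write_stage - 1
    earliest_read = write_cycle if same_cycle_ok else write_cycle + 1
    unstalled_read = dist + read_stage - 1
    return max(0, earliest_read - unstalled_read)


for d in range(1, 6):
    print(d, bubbles_needed(d), bubbles_needed(d, same_cycle_ok=True))
```

Under the stricter assumption, distances 1 to 3 need bubbles and distance 4 does not, matching the dist(i,j) cases on the slide; allowing a same-cycle write and read removes one bubble from each case.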

60 How to Implement Stalling
[Figure: pipelined datapath with control (IF/ID, ID/EX, EX/MEM, MEM/WB) and a Stall signal.]
Stall
  disable PC and IF/ID latching; ensure stalled instruction stays in its stage
  Insert "invalid" instructions/nops into the stage following the stalled one (called "bubbles")
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

61 RAW Data Dependence Example
One instruction writes a register ($s0) and next instructions read this register => read after write (RAW) dependence.
  add writes into $s0 in the first half of cycle 5
  and reads $s0 on cycle 3, obtaining the wrong value
  or reads $s0 on cycle 4, again obtaining the wrong value
  sub reads $s0 in the second half of cycle 5, obtaining the correct value
  subsequent instructions read the correct value of $s0
Wrong values are obtained only if the pipeline handles data dependences wrong!
[Figure: pipeline timing diagram (IM, RF, ALU, DM, RF per instruction) for
  add $s0, $s2, $s3
  and $t0, $s0, $s1
  or  $t1, $s4, $s0
  sub $t2, $s0, $s5]

62 Compile-Time Detection and Elimination
[Figure: pipeline timing diagram for
  add $s0, $s2, $s3
  nop
  nop
  and $t0, $s0, $s1
  or  $t1, $s4, $s0
  sub $t2, $s0, $s5]
Insert enough NOPs for the required result to be ready
Or (if you can) move independent useful instructions up
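
A compiler-style NOP insertion pass can be sketched as follows. It assumes that a consumer must be at least `min_distance` instructions after its producer; the default of 3 corresponds to a 5-stage pipeline whose register file writes in the first half of a cycle and reads in the second half. The `Instr` type, `NOP` constant and `insert_nops` function are hypothetical names for this example.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    text: str               # assembly text, for display only
    dest: str               # destination register ("" if none)
    srcs: Tuple[str, ...]   # source registers


NOP = Instr("nop", "", ())


def insert_nops(program: List[Instr], min_distance: int = 3) -> List[Instr]:
    """Insert NOPs so every reader is >= min_distance slots after its latest writer."""
    scheduled: List[Instr] = []
    for instr in program:
        for src in instr.srcs:
            # Find the most recent earlier instruction that writes this source.
            for pos in range(len(scheduled) - 1, -1, -1):
                if scheduled[pos].dest == src:
                    while len(scheduled) - pos < min_distance:
                        scheduled.append(NOP)
                    break
        scheduled.append(instr)
    return scheduled


program = [
    Instr("add $s0, $s2, $s3", "$s0", ("$s2", "$s3")),
    Instr("and $t0, $s0, $s1", "$t0", ("$s0", "$s1")),
    Instr("or $t1, $s4, $s0", "$t1", ("$s4", "$s0")),
    Instr("sub $t2, $s0, $s5", "$t2", ("$s0", "$s5")),
]
for instr in insert_nops(program):
    print(instr.text)
```

A real compiler would first try to fill these slots with independent useful instructions and fall back to NOPs only when none are available.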

63 Data Forwarding
Also called Data Bypassing
We have already seen the basic idea before
Forward the result value to the dependent instruction as soon as the value is available
Remember dataflow?
  Data value supplied to dependent instruction as soon as it is available
  Instruction executes when all its operands are available
Data forwarding brings a pipeline closer to data flow execution principles

64 Data Forwarding
[Figure: pipeline timing diagram (IM, RF, ALU, DM, RF per instruction) for
  add $s0, $s2, $s3
  and $t0, $s0, $s1
  or  $t1, $s4, $s0
  sub $t2, $s0, $s5]

65 Data Forwarding
[Figure: pipelined datapath and control unit extended with a Hazard Unit; signals shown at the Hazard Unit include RsE, RtE, WriteRegM, WriteRegW, RegWriteM, RegWriteW, ForwardAE, and ForwardBE.]

66 Data Forwarding
Forward to Execute stage from either:
  Memory stage or
  Writeback stage
When should we forward from either the Memory or the Writeback stage?
  If that stage will write a destination register and the destination register matches the source register.
  If both the Memory and Writeback stages contain matching destination registers, the Memory stage should have priority, because it contains the more recently executed instruction.

67 Data Forwarding
Forward to Execute stage from either:
  Memory stage or
  Writeback stage
Forwarding logic for ForwardAE (pseudo code):
  if ((rsE != 0) AND (rsE == WriteRegM) AND RegWriteM) then
      ForwardAE = 10   # forward from Memory stage
  else if ((rsE != 0) AND (rsE == WriteRegW) AND RegWriteW) then
      ForwardAE = 01   # forward from Writeback stage
  else
      ForwardAE = 00   # no forwarding
Forwarding logic for ForwardBE is the same, but replace rsE with rtE
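
The forwarding pseudo code translates directly into an executable sketch. The encoding 10 / 01 / 00 for Memory / Writeback / no forwarding is assumed here, and the Python function and parameter names are illustrative.

```python
FORWARD_FROM_MEMORY = 0b10
FORWARD_FROM_WRITEBACK = 0b01
NO_FORWARDING = 0b00


def forward_select(src_reg_e, write_reg_m, reg_write_m, write_reg_w, reg_write_w):
    """Select the forwarding mux input for one Execute-stage source register.

    Call with rsE to obtain ForwardAE and with rtE to obtain ForwardBE.
    Register 0 is never forwarded because it is hardwired to zero in MIPS.
    """
    if src_reg_e != 0 and src_reg_e == write_reg_m and reg_write_m:
        return FORWARD_FROM_MEMORY      # most recent producer has priority
    if src_reg_e != 0 and src_reg_e == write_reg_w and reg_write_w:
        return FORWARD_FROM_WRITEBACK
    return NO_FORWARDING


# $s0 is register 16: producer in Memory stage, consumer in Execute stage.
print(bin(forward_select(16, write_reg_m=16, reg_write_m=True,
                         write_reg_w=8, reg_write_w=True)))
```

Testing the Memory stage first implements the priority rule: when both later stages hold a matching destination, the Memory stage contains the more recently executed instruction.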

68 Stalling
[Figure: pipeline timing diagram for
  lw  $s0, 40($0)
  and $t0, $s0, $s1   (Trouble!)
  or  $t1, $s4, $s0
  sub $t2, $s0, $s5]
Forwarding is sufficient to resolve RAW data dependences, but there are cases when forwarding is not possible due to pipeline design and instruction latencies

69 Stalling
[Figure: pipeline timing diagram for
  lw  $s0, 40($0)
  and $t0, $s0, $s1   (Trouble!)
  or  $t1, $s4, $s0
  sub $t2, $s0, $s5]
The lw instruction does not finish reading data until the end of the Memory stage, so its result cannot be forwarded to the Execute stage of the next instruction.

70 Stalling
[Figure: pipeline timing diagram for
  lw  $s0, 40($0)
  and $t0, $s0, $s1   (Trouble!)
  or  $t1, $s4, $s0
  sub $t2, $s0, $s5]
The lw instruction has a two-cycle latency, therefore a dependent instruction cannot use its result until two cycles later.
The lw instruction receives data from memory at the end of cycle 4. But the and instruction needs that data as a source operand at the beginning of cycle 4. There is no way to supply the data with forwarding.

71 Stalling
[Figure: pipeline timing diagram for
  lw  $s0, 40($0)
  and $t0, $s0, $s1   (register read stage repeated)
  or  $t1, $s4, $s0   (fetch stage repeated)
  sub $t2, $s0, $s5
with the repeated cycle marked "Stall".]

72 Stalling Hardware
Stalls are supported by:
  adding enable inputs (EN) to the Fetch and Decode pipeline registers
  and a synchronous reset/clear (CLR) input to the Execute pipeline register
  or an INV bit associated with each pipeline register
When a lw stall occurs
  StallD and StallF are asserted to force the Decode and Fetch stage pipeline registers to hold their old values.
  FlushE is also asserted to clear the contents of the Execute stage pipeline register, introducing a bubble
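
The slide names the outputs StallF, StallD and FlushE but not the detection condition. The sketch below assumes the usual textbook formulation for this MIPS pipeline: a stall is needed when the instruction in Execute is a load (MemtoRegE set) and its destination rtE matches either source register of the instruction in Decode. Treat that condition and the function names as assumptions of this example.

```python
def lw_stall(rs_d, rt_d, rt_e, mem_to_reg_e):
    """True when the Decode-stage instruction depends on a load still in Execute."""
    return bool(mem_to_reg_e) and (rs_d == rt_e or rt_d == rt_e)


def hazard_controls(rs_d, rt_d, rt_e, mem_to_reg_e):
    """Derive the stall/flush controls from the load-use condition."""
    stall = lw_stall(rs_d, rt_d, rt_e, mem_to_reg_e)
    return {
        "StallF": stall,  # hold the Fetch-stage pipeline register (PC)
        "StallD": stall,  # hold the Decode-stage pipeline register
        "FlushE": stall,  # clear the Execute-stage register: insert a bubble
    }


# lw $s0, 40($0) in Execute (rt = $s0 = 16), and $t0, $s0, $s1 in Decode.
print(hazard_controls(rs_d=16, rt_d=17, rt_e=16, mem_to_reg_e=True))
```

After the one-cycle stall the loaded value can be forwarded from the Writeback stage, so the rest of the dependence is covered by the forwarding logic.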

73 Design of Digital Circuits, Lecture 14: Pipelining. Prof. Onur Mutlu, ETH Zurich, Spring 2018, 19 April 2018

74 Backup Slides

75 How to Handle Data Dependences
Anti and output dependences are easier to handle
  write to the destination in one stage and in program order
Flow dependences are more interesting
Five fundamental ways of handling flow dependences
  Detect and wait until value is available in register file
  Detect and forward/bypass data to dependent instruction
  Detect and eliminate the dependence at the software level
    No need for the hardware to detect dependence
  Predict the needed value(s), execute "speculatively", and verify
  Do something else (fine-grained multithreading)
    No need to detect

76 Fine-Grained Multithreading

77 Fine-Grained Multithreading
Idea: Hardware has multiple thread contexts (PC+registers). Each cycle, fetch engine fetches from a different thread.
  By the time the fetched branch/instruction resolves, no instruction is fetched from the same thread
  Branch/instruction resolution latency overlapped with execution of other threads' instructions
+ No logic needed for handling control and data dependences within a thread
-- Single thread performance suffers
-- Extra logic for keeping thread contexts
-- Does not overlap latency if not enough threads to cover the whole pipeline

78 Fine-Grained Multithreading (II)
Idea: Switch to another thread every cycle such that no two instructions from a thread are in the pipeline concurrently
Tolerates the control and data dependency latencies by overlapping the latency with useful work from other threads
Improves pipeline utilization by taking advantage of multiple threads
Thornton, "Parallel Operation in the Control Data 6600," AFIPS 1964.
Smith, "A pipelined, shared resource MIMD computer," ICPP
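
The "no two instructions from a thread in the pipeline concurrently" property can be checked with a small model. It assumes strict round-robin thread selection and a pipeline in which every instruction spends one cycle per stage; the function names are illustrative.

```python
def fetch_schedule(num_threads, num_cycles):
    """Thread id fetched in each cycle under strict round-robin selection."""
    return [cycle % num_threads for cycle in range(num_cycles)]


def max_in_flight_per_thread(schedule, pipeline_depth):
    """Largest number of instructions from one thread in the pipeline at once.

    The instructions in flight at any time are those fetched during the
    last `pipeline_depth` cycles, so each window of that length is inspected.
    """
    worst = 0
    for start in range(len(schedule)):
        window = schedule[start:start + pipeline_depth]
        for thread in set(window):
            worst = max(worst, window.count(thread))
    return worst


print(max_in_flight_per_thread(fetch_schedule(8, 32), pipeline_depth=8))
print(max_in_flight_per_thread(fetch_schedule(4, 32), pipeline_depth=8))
```

With at least as many threads as pipeline stages the result is 1, so no intra-thread dependence checking is needed; with fewer threads, instructions of the same thread overlap and the latency is no longer fully hidden.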

79 Fine-Grained Multithreading: History
CDC 6600's peripheral processing unit is fine-grained multithreaded
  Thornton, "Parallel Operation in the Control Data 6600," AFIPS 1964.
  Processor executes a different I/O thread every cycle
  An operation from the same thread is executed every 10 cycles
Denelcor HEP (Heterogeneous Element Processor)
  Smith, "A pipelined, shared resource MIMD computer," ICPP
  threads/processor
  available queue vs. unavailable (waiting) queue for threads
  each thread can have only 1 instruction in the processor pipeline; each thread independent
  to each thread, processor looks like a non-pipelined machine
  system throughput vs. single thread performance tradeoff

80 Fine-Grained Multithreading in HEP
Cycle time: 100 ns
8 stages → 800 ns to complete an instruction
  assuming no memory access
No control and data dependency checking

81 Multithreaded Pipeline Example
Slide credit: Joel Emer

82 Sun Niagara Multithreaded Pipeline
Kongetira et al., "Niagara: A 32-Way Multithreaded Sparc Processor," IEEE Micro

83 Fine-Grained Multithreading
Advantages
+ No need for dependency checking between instructions (only one instruction in pipeline from a single thread)
+ No need for branch prediction logic
+ Otherwise-bubble cycles used for executing useful instructions from different threads
+ Improved system throughput, latency tolerance, utilization
Disadvantages
- Extra hardware complexity: multiple hardware contexts (PCs, register files, ...), thread selection logic
- Reduced single thread performance (one instruction fetched every N cycles from the same thread)
- Resource contention between threads in caches and memory
- Some dependency checking logic between threads remains (load/store)

84 Modern GPUs Are FGMT Machines

85 NVIDIA GeForce GTX 285 "core"
64 KB of storage for thread contexts (registers)
[Figure legend: data-parallel (SIMD) func. unit, control shared across 8 units; multiply-add; multiply; instruction stream decode; execution context storage.]
Slide credit: Kayvon Fatahalian

86 NVIDIA GeForce GTX 285 core: 64 KB of storage for thread contexts (registers). Groups of 32 threads share an instruction stream (each group is a warp): they execute the same instruction on different data. Up to 32 warps are interleaved in an FGMT manner. Up to 1024 thread contexts can be stored. Slide credit: Kayvon Fatahalian

87 NVIDIA GeForce GTX 285 [Figure: chip diagram showing the cores and texture (Tex) units.] 30 cores on the GTX 285: 30,720 threads. Slide credit: Kayvon Fatahalian
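The thread counts on the GTX 285 slides follow from simple arithmetic, sketched below in Python. The inputs are the figures from the slides (32 threads per warp, up to 32 warps per core, 30 cores, 64 KB of context storage per core). The 4-byte register width is an assumption made only to express the per-thread storage as a register count.

```python
THREADS_PER_WARP = 32
MAX_WARPS_PER_CORE = 32
NUM_CORES = 30
CONTEXT_STORAGE_BYTES = 64 * 1024  # per core
REGISTER_BYTES = 4                 # assumption: 32-bit registers


def thread_contexts_per_core():
    """Maximum thread contexts that one core interleaves."""
    return THREADS_PER_WARP * MAX_WARPS_PER_CORE


def thread_contexts_per_chip():
    """Maximum thread contexts across all cores."""
    return NUM_CORES * thread_contexts_per_core()


def registers_per_thread_at_full_occupancy():
    """Context storage per thread, in registers, when every context is used."""
    bytes_per_thread = CONTEXT_STORAGE_BYTES // thread_contexts_per_core()
    return bytes_per_thread // REGISTER_BYTES


if __name__ == "__main__":
    print(thread_contexts_per_core())
    print(thread_contexts_per_chip())
    print(registers_per_thread_at_full_occupancy())
```

Under these assumptions, a thread that needs more registers than the full-occupancy share reduces the number of warps a core can interleave.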

88 End of Fine-Grained Multithreading

CMSC Computer Architecture Lecture 5: Pipelining. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 5: Pipelining. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 5: Pipeliig Prof. Yajig Li Uiversity of Chicago Admiistrative Stuff Lab1 Due toight Lab2: out later today; due 2 weeks from ow Review sessio this Friday Turig award

More information

Design of Digital Circuits Lecture 16: Dependence Handling. Prof. Onur Mutlu ETH Zurich Spring April 2017

Design of Digital Circuits Lecture 16: Dependence Handling. Prof. Onur Mutlu ETH Zurich Spring April 2017 Design of Digital Circuits Lecture 16: Dependence Handling Prof. Onur Mutlu ETH Zurich Spring 2017 27 April 2017 Agenda for Today & Next Few Lectures! Single-cycle Microarchitectures! Multi-cycle and Microprogrammed

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter The Processor Part A path Desig Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler. CPI ad

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Single-Cycle Disadvantages & Advantages

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Single-Cycle Disadvantages & Advantages COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 4 The Processor Pipeliig Sigle-Cycle Disadvatages & Advatages Clk Uses the clock cycle iefficietly the clock cycle must

More information

Topics. Lecture 12: Pipelining. Introduction to pipelining. Pipelined datapath. Hazards in pipeline. Performance. Design issues.

Topics. Lecture 12: Pipelining. Introduction to pipelining. Pipelined datapath. Hazards in pipeline. Performance. Design issues. Lecture 2: Pipelining Topics Introduction to pipelining Performance Pipelined datapath Design issues Hazards in pipeline Types Solutions Pipelining is Natural! Laundry Example Use case scenario Ann, Brian,

More information

Design of Digital Circuits Lecture 17: Pipelining Issues. Prof. Onur Mutlu ETH Zurich Spring April 2017

Design of Digital Circuits Lecture 17: Pipelining Issues. Prof. Onur Mutlu ETH Zurich Spring April 2017 Design of Digital Circuits Lecture 17: Pipelining Issues Prof. Onur Mutlu ETH Zurich Spring 2017 28 April 2017 Agenda for Today & Next Few Lectures! Single-cycle Microarchitectures! Multi-cycle and Microprogrammed

More information

Design of Digital Circuits Lecture 15: Pipelining. Prof. Onur Mutlu ETH Zurich Spring April 2017

Design of Digital Circuits Lecture 15: Pipelining. Prof. Onur Mutlu ETH Zurich Spring April 2017 Design of Digital Circuits Lecture 5: Pipelining Prof. Onur Mutlu ETH Zurich Spring 27 3 April 27 Agenda for Today & Next Few Lectures! Single-cycle Microarchitectures! Multi-cycle and Microprogrammed

More information

Design of Digital Circuits Lecture 17: Out-of-Order, DataFlow, Superscalar Execution. Prof. Onur Mutlu ETH Zurich Spring April 2018

Design of Digital Circuits Lecture 17: Out-of-Order, DataFlow, Superscalar Execution. Prof. Onur Mutlu ETH Zurich Spring April 2018 Desig of Digital Circuits Lecture 17: Out-of-Order, DataFlow, Superscalar Executio Prof. Our Mutlu ETH Zurich Sprig 2018 27 April 2018 Ageda for Today & Next Few Lectures Sigle-cycle Microarchitectures

More information

CHW 362 : Computer Architecture & Organization

CHW 362 : Computer Architecture & Organization CHW 362 : Computer Architecture & Organization Instructors: Dr Ahmed Shalaby Dr Mona Ali http://bu.edu.eg/staff/ahmedshalaby4# http://www.bu.edu.eg/staff/mona.abdelbaset Review: Instruction Formats R-Type

More information

Computer Architectures

Computer Architectures Computer Architectures Pipelined instruction execution Hazards, stages balancing, super-scalar systems Pavel Píša, Michal Štepanovský, Miroslav Šnorek Main source of inspiration: Patterson Czech Technical

More information

CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW. Prof. Yanjing Li University of Chicago

CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW. Prof. Yanjing Li University of Chicago CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW Prof. Yajig Li Uiversity of Chicago Admiistrative Stuff Lab2 due toight Exam I: covers lectures 1-9 Ope book, ope otes, close device

More information

Design of Digital Circuits Lecture 16: Out-of-Order Execution. Prof. Onur Mutlu ETH Zurich Spring April 2018

Design of Digital Circuits Lecture 16: Out-of-Order Execution. Prof. Onur Mutlu ETH Zurich Spring April 2018 Desig of Digital Circuits Lecture 16: Out-of-Order Executio Prof. Our Mutlu ETH Zurich Sprig 2018 26 April 2018 Ageda for Today & Next Few Lectures Sigle-cycle Microarchitectures Multi-cycle ad Microprogrammed

More information

Slide Set 7 for Lecture Section 01

Slide Set 7 for Lecture Section 01 Slide Set 7 for Lecture Section 01 for ENCM 369 Winter 2017 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary February 2017 ENCM 369 Winter

More information

Multi-Threading. Hyper-, Multi-, and Simultaneous Thread Execution

Multi-Threading. Hyper-, Multi-, and Simultaneous Thread Execution Multi-Threadig Hyper-, Multi-, ad Simultaeous Thread Executio 1 Performace To Date Icreasig processor performace Pipeliig. Brach predictio. Super-scalar executio. Out-of-order executio. Caches. Hyper-Threadig

More information

CMSC Computer Architecture Lecture 10: Caches. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 10: Caches. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 10: Caches Prof. Yajig Li Uiversity of Chicago Midterm Recap Overview ad fudametal cocepts ISA Uarch Datapath, cotrol Sigle cycle, multi cycle Pipeliig Basic idea,

More information

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design College of Computer ad Iformatio Scieces Departmet of Computer Sciece CSC 220: Computer Orgaizatio Uit 11 Basic Computer Orgaizatio ad Desig 1 For the rest of the semester, we ll focus o computer architecture:

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor Advanced Issues

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor Advanced Issues COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 4 The Processor Advaced Issues Review: Pipelie Hazards Structural hazards Desig pipelie to elimiate structural hazards.

More information

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 12: Virtual Memory Prof. Yajig Li Uiversity of Chicago A System with Physical Memory Oly Examples: most Cray machies early PCs Memory early all embedded systems

More information

Arquitectura de Computadores

Arquitectura de Computadores Arquitectura de Computadores Capítulo 2. Procesadores segmetados Based o the origial material of the book: D.A. Patterso y J.L. Heessy Computer Orgaizatio ad Desig: The Hardware/Software Iterface 4 th

More information

CS252 Spring 2017 Graduate Computer Architecture. Lecture 6: Out-of-Order Processors

CS252 Spring 2017 Graduate Computer Architecture. Lecture 6: Out-of-Order Processors CS252 Sprig 2017 Graduate Computer Architecture Lecture 6: Out-of-Order Processors Lisa Wu, Krste Asaovic http://ist.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 2 WU UCB CS252 SP17 Last Time i Lecture

More information

CMSC Computer Architecture Lecture 3: ISA and Introduction to Microarchitecture. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 3: ISA and Introduction to Microarchitecture. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 3: ISA ad Itroductio to Microarchitecture Prof. Yajig Li Uiversity of Chicago Lecture Outlie ISA uarch (hardware implemetatio of a ISA) Logic desig basics Sigle-cycle

More information

CMSC Computer Architecture Lecture 2: ISA. Prof. Yanjing Li Department of Computer Science University of Chicago

CMSC Computer Architecture Lecture 2: ISA. Prof. Yanjing Li Department of Computer Science University of Chicago CMSC 22200 Computer Architecture Lecture 2: ISA Prof. Yajig Li Departmet of Computer Sciece Uiversity of Chicago Admiistrative Stuff Lab1 out toight Due Thursday (10/18) Lab1 review sessio Tomorrow, 10/05,

More information

ENCM 501 Winter 2019 Assignment 6 for the Week of March 11

ENCM 501 Winter 2019 Assignment 6 for the Week of March 11 page of 8 ENCM 5 Winter 29 Assignment 6 for the Week of March Steve Norman Department of Electrical & Computer Engineering University of Calgary February 29 Assignment instructions and other documents

More information

CMSC Computer Architecture Lecture 4: Single-Cycle uarch and Pipelining. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 4: Single-Cycle uarch and Pipelining. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 4: Single-Cycle uarch and Pipelining Prof. Yanjing Li University of Chicago Administrative Stuff! Lab1 due at 11:59pm today! Lab2 out " Pipeline ARM simulator "

More information

CENG 5133 Computer Architecture Design Spring Sample Exam 2

CENG 5133 Computer Architecture Design Spring Sample Exam 2 CENG 533 Computer Architecture Design Spring 24 Sample Exam 2. (6 pt) Determine the propagation delay and contamination delay of the following circuit using the gate delays given below. Gate t pd (ps)

More information

Chapter 4 The Datapath

Chapter 4 The Datapath The Ageda Chapter 4 The Datapath Based o slides McGraw-Hill Additioal material 24/25/26 Lewis/Marti Additioal material 28 Roth Additioal material 2 Taylor Additioal material 2 Farmer Tae the elemets that

More information

CMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 11: More Caches Prof. Yajig Li Uiversity of Chicago Lecture Outlie Caches 2 Review Memory hierarchy Cache basics Locality priciples Spatial ad temporal How to access

More information

End Semester Examination CSE, III Yr. (I Sem), 30002: Computer Organization

End Semester Examination CSE, III Yr. (I Sem), 30002: Computer Organization Ed Semester Examiatio 2013-14 CSE, III Yr. (I Sem), 30002: Computer Orgaizatio Istructios: GROUP -A 1. Write the questio paper group (A, B, C, D), o frot page top of aswer book, as per what is metioed

More information

Appendix D. Controller Implementation

Appendix D. Controller Implementation COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Appedix D Cotroller Implemetatio Cotroller Implemetatios Combiatioal logic (sigle-cycle); Fiite state machie (multi-cycle, pipelied);

More information

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one

More information

Data diverse software fault tolerance techniques

Data diverse software fault tolerance techniques Data diverse software fault tolerace techiques Complemets desig diversity by compesatig for desig diversity s s limitatios Ivolves obtaiig a related set of poits i the program data space, executig the

More information

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number

More information

Chapter 4 The Processor 1. Chapter 4A. The Processor

Chapter 4 The Processor 1. Chapter 4A. The Processor Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

Design of Digital Circuits Lecture 21: SIMD Processors II and Graphics Processing Units

Design of Digital Circuits Lecture 21: SIMD Processors II and Graphics Processing Units Desig of Digital Circuits Lecture 21: SIMD Processors II ad Graphics Processig Uits Dr. Jua Gómez Lua Prof. Our Mutlu ETH Zurich Sprig 2018 17 May 2018 New Course: Bachelor s Semiar i Comp Arch Fall 2018

More information

Computer Architecture Lecture 8: SIMD Processors and GPUs. Prof. Onur Mutlu ETH Zürich Fall October 2017

Computer Architecture Lecture 8: SIMD Processors and GPUs. Prof. Onur Mutlu ETH Zürich Fall October 2017 Computer Architecture Lecture 8: SIMD Processors ad GPUs Prof. Our Mutlu ETH Zürich Fall 2017 18 October 2017 Ageda for Today & Next Few Lectures SIMD Processors GPUs Itroductio to GPU Programmig Digitaltechik

More information

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count

More information

Design of Digital Circuits Lecture 13: Multi-Cycle Microarch. Prof. Onur Mutlu ETH Zurich Spring April 2017

Design of Digital Circuits Lecture 13: Multi-Cycle Microarch. Prof. Onur Mutlu ETH Zurich Spring April 2017 Design of Digital Circuits Lecture 3: Multi-Cycle Microarch. Prof. Onur Mutlu ETH Zurich Spring 27 6 April 27 Agenda for Today & Next Few Lectures! Single-cycle Microarchitectures! Multi-cycle and Microprogrammed

More information

CMSC Computer Architecture Lecture 15: Multi-Core. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 15: Multi-Core. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 15: Multi-Core Prof. Yajig Li Uiversity of Chicago Course Evaluatio Very importat Please fill out! 2 Lab3 Brach Predictio Competitio 8 teams etered the competitio,

More information

Chapter 4 Threads. Operating Systems: Internals and Design Principles. Ninth Edition By William Stallings

Chapter 4 Threads. Operating Systems: Internals and Design Principles. Ninth Edition By William Stallings Operatig Systems: Iterals ad Desig Priciples Chapter 4 Threads Nith Editio By William Stalligs Processes ad Threads Resource Owership Process icludes a virtual address space to hold the process image The

More information

This Unit: Dynamic Scheduling. Can Hardware Overcome These Limits? Scheduling: Compiler or Hardware. The Problem With In-Order Pipelines

This Unit: Dynamic Scheduling. Can Hardware Overcome These Limits? Scheduling: Compiler or Hardware. The Problem With In-Order Pipelines This Uit: Damic Schedulig CSE 560 Computer Sstems Architecture Damic Schedulig Slides origiall developed b Drew Hilto (IBM) ad Milo Marti (Uiversit of Peslvaia) App App App Sstem software Mem CPU I/O Code

More information

Master Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1

Master Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1 Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Memory Hierarchy (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Itroductio Programmers wat ulimited amouts

More information

Course Site: Copyright 2012, Elsevier Inc. All rights reserved.

Course Site:   Copyright 2012, Elsevier Inc. All rights reserved. Course Site: http://cc.sjtu.edu.c/g2s/site/aca.html 1 Computer Architecture A Quatitative Approach, Fifth Editio Chapter 2 Memory Hierarchy Desig 2 Outlie Memory Hierarchy Cache Desig Basic Cache Optimizatios

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5 Morga Kaufma Publishers 26 February, 28 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Set-Associative Cache Architecture Performace Summary Whe CPU performace icreases:

More information

Elementary Educational Computer

Elementary Educational Computer Chapter 5 Elemetary Educatioal Computer. Geeral structure of the Elemetary Educatioal Computer (EEC) The EEC coforms to the 5 uits structure defied by vo Neuma's model (.) All uits are preseted i a simplified

More information

Switching Hardware. Spring 2018 CS 438 Staff, University of Illinois 1

Switching Hardware. Spring 2018 CS 438 Staff, University of Illinois 1 Switchig Hardware Sprig 208 CS 438 Staff, Uiversity of Illiois Where are we? Uderstad Differet ways to move through a etwork (forwardig) Read sigs at each switch (datagram) Follow a kow path (virtual circuit)

More information

COMP2611: Computer Organization. The Pipelined Processor

COMP2611: Computer Organization. The Pipelined Processor COMP2611: Computer Organization The 1 2 Background 2 High-Performance Processors 3 Two techniques for designing high-performance processors by exploiting parallelism: Multiprocessing: parallelism among

More information

Design of A Six-stage Pipelined MIPS Processor Based on FPGA

Design of A Six-stage Pipelined MIPS Processor Based on FPGA Design of A Six-stage Pipelined MIPS Processor Based on FPGA Qiao-Zhi Sun, De-Chun Kong, Cheng-Long Zhao, and Hui-Bin Shi Department of Computer Science and Technology, Nanjing University of Aeronautics

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Review Istructio Set Architecture Istructio Set The repertoire of istructios of a computer Differet computers have differet istructio

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 20 Itroductio to Trasactio Processig Cocepts ad Theory Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Trasactio Describes local

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Determined by ISA and compiler. Determined by CPU hardware

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Determined by ISA and compiler. Determined by CPU hardware COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface ARM Editio Chapter 4 The Processor Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler CPI ad Cycle time Determied

More information

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will

More information

Lecture 8: Data Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 8: Data Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 8: Data Hazard and Resolution James C. Hoe Department of ECE Carnegie ellon University 18 447 S18 L08 S1, James C. Hoe, CU/ECE/CALC, 2018 Your goal today Housekeeping detect and resolve

More information

Design of Digital Circuits Lecture 20: SIMD Processors. Prof. Onur Mutlu ETH Zurich Spring May 2018

Design of Digital Circuits Lecture 20: SIMD Processors. Prof. Onur Mutlu ETH Zurich Spring May 2018 Desig of Digital Circuits Lecture 20: SIMD Processors Prof. Our Mutlu ETH Zurich Sprig 2018 11 May 2018 New Course: Bachelor s Semiar i Comp Arch Fall 2018 2 credit uits Rigorous semiar o fudametal ad

More information

ECS 154B Computer Architecture II Spring 2009

ECS 154B Computer Architecture II Spring 2009 ECS 154B Computer Architecture II Spring 2009 Pipelining Datapath and Control 6.2-6.3 Partially adapted from slides by Mary Jane Irwin, Penn State And Kurtis Kredo, UCD Pipelined CPU Break execution into

More information

CSCI-564 Advanced Computer Architecture

CSCI-564 Advanced Computer Architecture CSCI-564 Advanced Computer Architecture Lecture 6: Pipelining Review Bo Wu Colorado School of Mines Wake up! Time to do laundry! The Laundry Analogy Place one dirty load of clothes in the washer When the

More information

Uniprocessors. HPC Prof. Robert van Engelen

Uniprocessors. HPC Prof. Robert van Engelen Uiprocessors HPC Prof. Robert va Egele Overview PART I: Uiprocessors PART II: Multiprocessors ad ad Compiler Optimizatios Parallel Programmig Models Uiprocessors Multiprocessors Processor architectures

More information

Threads and Concurrency in Java: Part 1

Threads and Concurrency in Java: Part 1 Cocurrecy Threads ad Cocurrecy i Java: Part 1 What every computer egieer eeds to kow about cocurrecy: Cocurrecy is to utraied programmers as matches are to small childre. It is all too easy to get bured.

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University The Processor Logic Design Conventions Building a Datapath A Simple Implementation Scheme An Overview of Pipelining Pipelined

More information

Lecture 7 Pipelining. Peng Liu.

Lecture 7 Pipelining. Peng Liu. Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt

More information

Threads and Concurrency in Java: Part 1

Threads and Concurrency in Java: Part 1 Threads ad Cocurrecy i Java: Part 1 1 Cocurrecy What every computer egieer eeds to kow about cocurrecy: Cocurrecy is to utraied programmers as matches are to small childre. It is all too easy to get bured.

More information

Chapter 5: Processor Design Advanced Topics. Microprogramming: Basic Idea

Chapter 5: Processor Design Advanced Topics. Microprogramming: Basic Idea 5-1 Chapter 5 Processor Desig Advaced Topics Chapter 5: Processor Desig Advaced Topics Topics 5.3 Microprogrammig Cotrol store ad microbrachig Horizotal ad vertical microprogrammig 5- Chapter 5 Processor

More information

Page 1. Why Care About the Memory Hierarchy? Memory. DRAMs over Time. Virtual Memory!

Page 1. Why Care About the Memory Hierarchy? Memory. DRAMs over Time. Virtual Memory! Why Care About the Memory Hierarchy? Memory Virtual Memory -DRAM Memory Gap (latecy) Reasos: Multi process systems (abstractio & memory protectio) Solutio: Tables (holdig per process traslatios) Fast traslatio

More information

CPE 335 Computer Organization. Basic MIPS Pipelining Part I

CPE 335 Computer Organization. Basic MIPS Pipelining Part I CPE 335 Computer Organization Basic MIPS Pipelining Part I Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/courses/cpe335_s08/index.html CPE232 Basic MIPS Pipelining

More information

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin.   School of Information Science and Technology SIST CS 110 Computer Architecture Pipelining Guest Lecture: Shu Yin http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS61C

More information

Design of Digital Circuits 2017 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan)

Design of Digital Circuits 2017 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan) Microarchitecture Design of Digital Circuits 27 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan) http://www.syssec.ethz.ch/education/digitaltechnik_7 Adapted from Digital

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN ARM COMPUTER ORGANIZATION AND DESIGN Edition The Hardware/Software Interface Chapter 4 The Processor Modified and extended by R.J. Leduc - 2016 To understand this chapter, you will need to understand some

More information

EE University of Minnesota. Midterm Exam #1. Prof. Matthew O'Keefe TA: Eric Seppanen. Department of Electrical and Computer Engineering

EE University of Minnesota. Midterm Exam #1. Prof. Matthew O'Keefe TA: Eric Seppanen. Department of Electrical and Computer Engineering EE 4363 1 Uiversity of Miesota Midterm Exam #1 Prof. Matthew O'Keefe TA: Eric Seppae Departmet of Electrical ad Computer Egieerig Uiversity of Miesota Twi Cities Campus EE 4363 Itroductio to Microprocessors

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Columbia University CSEE 3827 Fundamentals of Computer Systems Final Exam

Columbia University CSEE 3827 Fundamentals of Computer Systems Final Exam Columbia University CSEE 3827 Fundamentals of Computer Systems Final Exam Prof. Martha A. Kim December 7, 23 Name: First Last (Family) UNI (e.g., mak29) You are allowed 3 hours. You may consult your own

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 4 Processor Part 2: Pipelining (Ch.4) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations from Mike

More information

Reliable Transmission. Spring 2018 CS 438 Staff - University of Illinois 1

Reliable Transmission. Spring 2018 CS 438 Staff - University of Illinois 1 Reliable Trasmissio Sprig 2018 CS 438 Staff - Uiversity of Illiois 1 Reliable Trasmissio Hello! My computer s ame is Alice. Alice Bob Hello! Alice. Sprig 2018 CS 438 Staff - Uiversity of Illiois 2 Reliable

More information

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3. Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview

More information

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control EE 459/500 HDL Based Digital Desig with Programmable Logic Lecture 13 Cotrol ad Sequecig: Hardwired ad Microprogrammed Cotrol Refereces: Chapter s 4,5 from textbook Chapter 7 of M.M. Mao ad C.R. Kime,

More information

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27) Lecture Topics Today: Data and Control Hazards (P&H 4.7-4.8) Next: continued 1 Announcements Exam #1 returned Milestone #5 (due 2/27) Milestone #6 (due 3/13) 2 1 Review: Pipelined Implementations Pipelining

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:

More information

Instruction and Data Streams

Instruction and Data Streams Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Data Parallelism 1 (vector & SIMD extesios) (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Istructio ad

More information

UNIVERSITY OF MORATUWA

UNIVERSITY OF MORATUWA UNIVERSITY OF MORATUWA FACULTY OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING B.Sc. Egieerig 2014 Itake Semester 2 Examiatio CS2052 COMPUTER ARCHITECTURE Time allowed: 2 Hours Jauary 2016

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5.

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5. Morga Kaufma Publishers 26 February, 208 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Virtual Memory Review: The Memory Hierarchy Take advatage of the priciple

More information

Computer Architecture

Computer Architecture Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in

More information

Announcements. Reading. Project #4 is on the web. Homework #1. Midterm #2. Chapter 4 ( ) Note policy about project #3 missing components

Announcements. Reading. Project #4 is on the web. Homework #1. Midterm #2. Chapter 4 ( ) Note policy about project #3 missing components Aoucemets Readig Chapter 4 (4.1-4.2) Project #4 is o the web ote policy about project #3 missig compoets Homework #1 Due 11/6/01 Chapter 6: 4, 12, 24, 37 Midterm #2 11/8/01 i class 1 Project #4 otes IPv6Iit,

More information

One advantage that SONAR has over any other music-sequencing product I ve worked


Computer Architecture. Microcomputer Architecture and Interfacing Colorado School of Mines Professor William Hoff


Recursion. Computer Science S-111 Harvard University David G. Sullivan, Ph.D. Review: Method Frames


A collection of open-sourced RISC-V processors


Copyright 2016 Ramez Elmasri and Shamkant B. Navathe


ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control


CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19


Working on the Pipeline


Topics. Pipelining Lessons (Task, Resource) Pipelining Lessons (Time cycles) Pipelining is Natural! Laundry Example


Review: The ACID properties


COMPUTER ORGANIZATION AND DESIGN


Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI


Multiprocessors. HPC Prof. Robert van Engelen


. Written in factored form it is easy to see that the roots are 2, 2, i,


COMPUTER ORGANIZATION AND DESIGN


The University of Adelaide, School of Computer Science 22 November Computer Architecture. A Quantitative Approach, Sixth Edition.


CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today's Content


CS 61C: Great Ideas in Computer Architecture Control and Pipelining


EITF20: Computer Architecture Part2.2.1: Pipeline-1
