CENG 3420 Computer Organization and Design. Lecture 07: MIPS Processor - II. Bei Yu

Size: px
Start display at page:

Download "CENG 3420 Computer Organization and Design. Lecture 07: MIPS Processor - II. Bei Yu"

Transcription

1 CENG 3420 Compute Oganization and Design Lectue 07: MIPS Pocesso - II Bei Yu CEG3420 L07.1 Sping 2016

2 Review: Instuction Citical Paths q Calculate cycle time assuming negligible delays (fo muxes, contol unit, sign extend, PC access, shift left 2, wies) except: Instuction and Data Memoy (4 ns) and addes (2 ns) Registe File access (eads o wites) (1 ns) Inst. I Mem Reg Rd Op D Mem Reg W Total R- type load stoe beq jump CEG3420 L07.2 Sping 2016

3 Review: Single Cycle Disadvantages & Advantages q Uses the clock cycle inefficiently the clock cycle must be timed to accommodate the slowest inst especially poblematic fo moe complex instuctions like floating point multiply Clk Cycle 1 Cycle 2 lw sw Waste q May be wasteful of aea since some functional units (e.g., addes) must be duplicated since they can not be shaed duing a clock cycle but q It is simple and easy to undestand CEG3420 L07.3 Sping 2016

4 How Can We Make It Faste? q Stat fetching and executing the next instuction befoe the cuent one has completed Pipelining (all?) moden pocessos ae pipelined fo pefomance Remembe the pefomance equation: CPU time = CPI * CC * IC q Unde ideal conditions and with a lage numbe of instuctions, the speedup fom pipelining is appoximately equal to the numbe of pipe stages A five stage pipeline is nealy five times faste because the CC is nealy five times faste q Fetch (and execute) moe than one instuction at a time Supescala pocessing stay tuned CEG3420 L07.4 Sping 2016

5 The Five Stages of Load Instuction Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 lw IFetch Dec Exec Mem WB q IFetch: Instuction Fetch and Update PC q Dec: Registes Fetch and Instuction Decode q Exec: Execute R-type; calculate memoy addess q Mem: Read/wite the data fom/to the Data Memoy q WB: Wite the esult data into the egiste file CEG3420 L07.5 Sping 2016

6 A Pipelined MIPS Pocesso q Stat the next instuction befoe the cuent one has completed impoves thoughput - total amount of wok done in a given time instuction latency (execution time, delay time, esponse time - time fom the stat of an instuction to its completion) is not educed Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 lw IFetch Dec Exec Mem WB sw IFetch Dec Exec Mem WB R-type IFetch Dec Exec Mem WB - clock cycle (pipeline stage time) is limited by the slowest stage - fo some stages don t need the whole clock cycle (e.g., WB) - fo some instuctions, some stages ae wasted cycles (i.e., nothing is done duing that cycle fo that instuction) CEG3420 L07.6 Sping 2016

7 Single Cycle vesus Pipeline Single Cycle Implementation (CC = 800 ps): Cycle 1 Cycle 2 Clk lw sw Waste Pipeline Implementation (CC = 200 ps): 400 ps lw IFetch Dec Exec Mem WB sw IFetch Dec Exec Mem WB R-type IFetch Dec Exec Mem WB q To complete an entie instuction in the pipelined case takes 1000 ps (as compaed to 800 ps fo the single cycle case). Why? q How long does each take to complete 1,000,000 adds? CEG3420 L07.7 Sping 2016

8 Pipelining the MIPS ISA q What makes it easy all instuctions ae the same length (32 bits) - can fetch in the 1 st stage and decode in the 2 nd stage few instuction fomats (thee) with symmety acoss fomats - can begin eading egiste file in 2 nd stage memoy opeations occu only in loads and stoes - can use the execute stage to calculate memoy addesses each instuction wites at most one esult (i.e., changes the machine state) and does it in the last few pipeline stages (MEM o WB) opeands must be aligned in memoy so a single data tansfe takes only one data memoy access CEG3420 L07.8 Sping 2016

9 MIPS Pipeline Datapath Additions/Mods q State egistes between each pipeline stage to isolate them IF:IFetch ID:Dec EX:Execute MEM: MemAccess WB: WiteBack IF/ID ID/EX EX/MEM Add PC 4 Instuction Memoy Read Addess Read Add 1 Registe Read Read Add 2Data 1 File Wite Add Read Data 2 Wite Data Shift left 2 Add Addess Wite Data Data Memoy Read Data MEM/WB Sign 16 Extend 32 System Clock CEG3420 L07.9 Sping 2016

10 MIPS Pipeline Contol Path Modifications q All contol signals can be detemined duing Decode and held in the state egistes between pipeline stages PCSc ID/EX EX/MEM IF/ID Contol PC 4 Instuction Memoy Read Addess Add RegWite Read Add 1 Registe Read Read Add 2Data 1 File Wite Add Read Data 2 Wite Data Sign 16 Extend 32 Shift left 2 Sc Add cntl Op Banch Addess Wite Data Data Memoy Read Data MemRead MEM/WB MemtoReg RegDst CEG3420 L07.10 Sping 2016

11 Pipeline Contol q IF Stage: ead Inst Memoy (always asseted) and wite PC (on System Clock) q ID Stage: no optional contol signals to set Reg Dst EX Stage MEM Stage WB Stage Op1 Op0 Sc Bch Mem Read Mem Wite Reg Wite Mem toreg R lw sw X X beq X X CEG3420 L07.11 Sping 2016

12 Gaphically Repesenting MIPS Pipeline q Can help with answeing questions like: How many cycles does it take to execute this code? What is the doing duing cycle 4? Is thee a hazad, why does it occu, and how can it be fixed? CEG3420 L07.12 Sping 2016

13 Othe Pipeline Stuctues Ae Possible q What about the (slow) multiply opeation? Make the clock twice as slow o let it take two cycles (since it doesn t use the DM stage) MUL q What if the data memoy access is twice as slow as the instuction memoy? make the clock twice as slow o let data memoy access take two cycles (and keep the same clock ate) IM Reg DM1 DM2 Reg CEG3420 L07.13 Sping 2016

14 Othe Sample Pipeline Altenatives q ARM7 IM Reg EX PC update IM access decode eg access op DM access shift/otate commit esult (wite back) q XScale PC update BTB access stat IM access IM1 IM2 Reg DM1 Reg SHFT DM2 IM access decode eg 1 access op shift/otate eg 2 access DM wite eg wite stat DM access exception CEG3420 L07.14 Sping 2016

15 Why Pipeline? Fo Pefomance! Time (clock cycles) I n s t. O d e Inst 0 Inst 1 Inst 2 Inst 3 Once the pipeline is full, one instuction is completed evey cycle, so CPI = 1 Inst 4 Time to fill the pipeline CEG3420 L07.15 Sping 2016

16 Can Pipelining Get Us Into Touble? q Yes: Pipeline Hazads stuctual hazads: - a equied esouce is busy data hazads: - attempt to use data befoe it is eady contol hazads: - deciding on contol action depends on pevious instuction q Can usually esolve hazads by waiting pipeline contol must detect the hazad and take action to esolve hazads CEG3420 L07.16 Sping 2016

17 Stuctue Hazads q Conflict fo use of a esouce q In MIPS pipeline with a single memoy Load/stoe equies data access Instuction fetch equies instuction access q Hence, pipeline datapaths equie sepaate instuction/data memoies O sepaate instuction/data caches q Since Registe File CEG3420 L07.17 Sping 2016

18 Resolve Stuctual Hazad 1 Time (clock cycles) I n s t. lw Inst 1 Mem Reg Mem Reg Mem Reg Mem Reg Reading data fom memoy O d e Inst 2 Inst 3 Mem Reg Mem Reg Mem Reg Mem Reg Inst 4 Reading instuction fom memoy CEG3420 L07.18 Sping 2016 Mem Reg Mem Reg q Fix with sepaate inst and data memoies (I$ and D$)

19 Resolve Stuctual Hazad 2 Time (clock cycles) I n s t. O d e add $1, Inst 1 Inst 2 add $2,$1, Fix egiste file access hazad by doing eads in the second half of the cycle and wites in the fist half clock edge that contols egiste witing clock edge that contols loading of pipeline state egistes CEG3420 L07.19 Sping 2016

20 Data Hazads q Dependencies backwad in time cause hazads I n s t. O d e add $1, sub $4,$1,$5 and $6,$1,$7 o $8,$1,$9 xo $4,$1,$5 q Read befoe wite data hazad CEG3420 L07.20 Sping 2016

21 Data Hazads: Registe Usage q Dependencies backwad in time cause hazads add $1, sub $4,$1,$5 and $6,$1,$7 o $8,$1,$9 xo $4,$1,$5 q Read befoe wite data hazad CEG3420 L07.21 Sping 2016

22 Data Hazads: Load Memoy q Dependencies backwad in time cause hazads I n s t. O d e lw $1,4($2) sub $4,$1,$5 and $6,$1,$7 o $8,$1,$9 xo $4,$1,$5 q Load-use data hazad CEG3420 L07.22 Sping 2016

23 Resolve Data Hazads 1: Inset Stall I n s t. add $1, stall Can fix data hazad by waiting stall but impacts CPI O d e stall sub $4,$1,$5 and $6,$1,$7 CEG3420 L07.23 Sping 2016

24 Resolve Data Hazads 2: Fowading I n s t. add $1, sub $4,$1,$5 Fix data hazads by fowading esults as soon as they ae available to whee they ae needed O d e and $6,$1,$7 o $8,$1,$9 xo $4,$1,$5 CEG3420 L07.24 Sping 2016

25 Resolve Data Hazads 2: Fowading I n s t. add $1, sub $4,$1,$5 Fix data hazads by fowading esults as soon as they ae available to whee they ae needed O d e and $6,$1,$7 o $8,$1,$9 xo $4,$1,$5 CEG3420 L07.25 Sping 2016

26 Fowad Unit Output Signals CEG3420 L07.26 Sping 2016

27 Datapath with Fowading Hadwae PCSc ID/EX EX/MEM IF/ID Contol PC 4 Instuction Memoy Read Addess Add Read Add 1 Registe Read Read Add 2Data 1 File Wite Add Read Data 2 Wite Data 16 Sign 32 Extend Shift left 2 Add cntl Banch Addess Data Memoy Wite Data Read Data MEM/WB Fowad Unit CEG3420 L07.27 Sping 2016

28 Datapath with Fowading Hadwae PCSc ID/EX EX/MEM IF/ID Contol PC 4 Instuction Memoy Read Addess Add Read Add 1 Registe Read Read Add 2Data 1 File Wite Add Read Data 2 Wite Data 16 Sign 32 Extend Shift left 2 Add cntl Banch Addess Data Memoy Wite Data Read Data MEM/WB EX/MEM.RegisteRd ID/EX.RegisteRt ID/EX.RegisteRs Fowad Unit MEM/WB.RegisteRd CEG3420 L07.28 Sping 2016

29 Data Fowading Contol Conditions 1. EX Fowad Unit: if (EX/MEM.RegWite and (EX/MEM.RegisteRd!= 0) and (EX/MEM.RegisteRd == ID/EX.RegisteRs)) FowadA = 10 if (EX/MEM.RegWite and (EX/MEM.RegisteRd!= 0) and (EX/MEM.RegisteRd == ID/EX.RegisteRt)) FowadB = MEM Fowad Unit: if (MEM/WB.RegWite and (MEM/WB.RegisteRd!= 0) and (MEM/WB.RegisteRd == ID/EX.RegisteRs)) FowadA = 01 if (MEM/WB.RegWite and (MEM/WB.RegisteRd!= 0) and (MEM/WB.RegisteRd == ID/EX.RegisteRt)) FowadB = 01 Fowads the esult fom the pevious inst. to eithe input of the Fowads the esult fom the second pevious inst. to eithe input of the CEG3420 L07.29 Sping 2016

30 Fowading Illustation I n s t. add $1, sub $4,$1,$5 O d e and $6,$7,$1 EX fowading MEM fowading CEG3420 L07.30 Sping 2016

31 Yet Anothe Complication! q Anothe potential data hazad can occu when thee is a conflict between the esult of the WB stage instuction and the MEM stage instuction which should be fowaded? I n s t. O d e add $1,$1,$2 add $1,$1,$3 add $1,$1,$4 CEG3420 L07.31 Sping 2016

32 Yet Anothe Complication! q Anothe potential data hazad can occu when thee is a conflict between the esult of the WB stage instuction and the MEM stage instuction which should be fowaded? I n s t. O d e add $1,$1,$2 add $1,$1,$3 add $1,$1,$4 CEG3420 L07.32 Sping 2016

33 EX: Coected MEM Fowad Unit q MEM Fowad Unit: if (MEM/WB.RegWite and (MEM/WB.RegisteRd!= 0) and (EX/MEM.RegisteRd!= ID/EX.RegisteRs) and (MEM/WB.RegisteRd == ID/EX.RegisteRs)) FowadA = 01 if (MEM/WB.RegWite and (MEM/WB.RegisteRd!= 0) and (EX/MEM.RegisteRd!= ID/EX.RegisteRt) and (MEM/WB.RegisteRd == ID/EX.RegisteRt)) FowadB = 01 CEG3420 L07.33 Sping 2016

34 Memoy-to-Memoy Copies q Fo loads immediately followed by stoes (memoy-tomemoy copies) can avoid a stall by adding fowading hadwae fom the MEM/WB egiste to the data memoy input. Would need to add a Fowad Unit and a mux to the MEM stage I n s t. O d e lw $1,4($2) sw $1,4($3) CEG3420 L07.34 Sping 2016

35 Fowading with Load-use Data Hazads I n s t. O d e lw $1,4($2) sub $4,$1,$5 and $6,$1,$7 o $8,$1,$9 xo $4,$1,$5 IM Reg DM CEG3420 L07.35 Sping 2016

36 Fowading with Load-use Data Hazads I n s t. O d e lw $1,4($2) sub $4,$1,$5 and $6,$1,$7 o $8,$1,$9 xo $4,$1,$5 IM Reg DM q Will still need one stall cycle even with fowading CEG3420 L07.36 Sping 2016

37 Fowading with Load-use Data Hazads I n s t. O d e lw $1,4($2) stall sub $4,$1,$5 and $6,$1,$7 o $8,$1,$9 xo $4,$1,$5 IM Reg DM q Will still need one stall cycle even with fowading CEG3420 L07.37 Sping 2016

38 Load-use Hazad Detection Unit (optional) q Need a Hazad detection Unit in the ID stage that insets a stall between the load and its use 1. ID Hazad detection Unit: if (ID/EX.MemRead and ((ID/EX.RegisteRt == IF/ID.RegisteRs) o (ID/EX.RegisteRt == IF/ID.RegisteRt))) stall the pipeline q The fist line tests to see if the instuction now in the EX stage is a lw; the next two lines check to see if the destination egiste of the lw matches eithe souce egiste of the instuction in the ID stage (the load-use instuction) q Afte this one cycle stall, the fowading logic can handle the emaining data hazads CEG3420 L07.38 Sping 2016

39 Adding the Hazad/Stall Hadwae (optional) PCSc Hazad Unit 0 ID/EX EX/MEM PC 4 Instuction Memoy Read Addess Add IF/ID Contol 1 Read Add 1 Registe Read Read Add 2Data 1 File Wite Add Read Data 2 Wite Data 16 Sign 32 Extend Shift left 2 Add cntl Banch Addess Data Memoy Wite Data Read Data MEM/WB Fowad Unit CEG3420 L07.39 Sping 2016

40 Adding the Hazad/Stall Hadwae (optional) PCSc Hazad Unit 0 ID/EX ID/EX.MemRead EX/MEM PC 4 Instuction Memoy Read Addess Add IF/ID Contol 0 1 Read Add 1 Registe Read Read Add 2Data 1 File Wite Add Read Data 2 Wite Data 16 Sign 32 Extend Shift left 2 Add cntl Banch Addess Data Memoy Wite Data Read Data MEM/WB ID/EX.RegisteRt Fowad Unit CEG3420 L07.40 Sping 2016

41 Contol Hazads q When the flow of instuction addesses is not sequential (i.e., PC = PC + 4); incued by change of flow instuctions Unconditional banches (j, jal, j) Conditional banches (beq, bne) Exceptions q Possible appoaches Stall (impacts CPI) Move decision point as ealy in the pipeline as possible, theeby educing the numbe of stall cycles Delay decision (equies compile suppot) Pedict and hope fo the best! q Contol hazads occu less fequently than data hazads, but thee is nothing as effective against contol hazads as fowading is fo data hazads CEG3420 L07.41 Sping 2016

42 Contol Hazads 1: Jumps Incu One Stall q Jumps not decoded until ID, so one flush is needed To flush, set IF.Flush to zeo the instuction field of the IF/ID pipeline egiste (tuning it into a nop) I n s t. j flush Fix jump hazad by waiting flush O d e j taget CEG3420 L07.42 Sping 2016 q Fotunately, jumps ae vey infequent only 3% of the SPECint instuction mix

43 Datapath Banch and Jump Hadwae Jump PCSc Shift left 2 ID/EX EX/MEM IF/ID Contol PC 4 Read Addess Add Instuction Memoy PC+4[31-28] Read Add 1 Registe Read Read Add 2Data 1 File Wite Add Read Data 2 Wite Data 16 Sign 32 Extend Shift left 2 Add cntl Banch Addess Data Memoy Wite Data Read Data MEM/WB Fowad Unit CEG3420 L07.43 Sping 2016

44 Suppoting ID Stage Jumps Jump PCSc Shift left 2 ID/EX EX/MEM IF/ID Contol PC 4 Add Instuction Memoy Read 0 Addess PC+4[31-28] Read Add 1 Registe Read Read Add 2Data 1 File Wite Add Read Data 2 Wite Data 16 Sign 32 Extend Shift left 2 Add cntl Banch Addess Data Memoy Wite Data Read Data MEM/WB Fowad Unit CEG3420 L07.44 Sping 2016

45 Contol Hazads 2: Banch Inst q Dependencies backwad in time cause hazads I n s t. O d e beq lw Inst 3 Inst 4 CEG3420 L07.45 Sping 2016

46 One Way to Fix a Banch Contol Hazad I n s t. beq flush Fix banch hazad by waiting flush but affects CPI O d e flush flush beq taget Inst 3 IM Reg DM CEG3420 L07.46 Sping 2016

47 Anothe Way to Fix a Banch Contol Hazad q Move banch decision hadwae back to as ealy in the pipeline as possible i.e., duing the decode cycle I n s t. beq flush Fix banch hazad by waiting flush O d e beq taget Inst 3 IM Reg DM CEG3420 L07.47 Sping 2016

48 Two Types of Stalls q Nop instuction (o bubble) inseted between two instuctions in the pipeline (as done fo load-use situations) Keep the instuctions ealie in the pipeline (late in the code) fom pogessing down the pipeline fo a cycle ( bounce them in place with wite contol signals) Inset nop by zeoing contol bits in the pipeline egiste at the appopiate stage Let the instuctions late in the pipeline (ealie in the code) pogess nomally down the pipeline q Flushes (o instuction squashing) wee an instuction in the pipeline is eplaced with a nop instuction (as done fo instuctions located sequentially afte j instuctions) Zeo the contol bits fo the instuction to be flushed CEG3420 L07.48 Sping 2016

49 Reducing the Delay of Banches q Move the banch decision hadwae back to the EX stage Reduces the numbe of stall (flush) cycles to two Adds an and gate and a 2x1 mux to the EX timing path q Add hadwae to compute the banch taget addess and evaluate the banch decision to the ID stage Reduces the numbe of stall (flush) cycles to one (like with jumps) - But now need to add fowading hadwae in ID stage Computing banch taget addess can be done in paallel with RegFile ead (done fo all instuctions only used when needed) Compaing the egistes can t be done until afte RegFile ead, so compaing and updating the PC adds a mux, a compaato, and an and gate to the ID timing path q Fo deepe pipelines, banch decision points can be even late in the pipeline, incuing moe stalls CEG3420 L07.49 Sping 2016

50 ID Banch Fowading Issues q MEM/WB fowading is taken cae of by the nomal RegFile wite befoe ead opeation WB add3 $1, MEM add2 $3, EX add1 $4, ID beq $1,$2,Loop IF next_seq_inst q Need to fowad fom the EX/MEM pipeline stage to the ID compaison hadwae fo cases like WB add3 $3, MEM add2 $1, EX add1 $4, ID beq $1,$2,Loop IF next_seq_inst if (IDcontol.Banch and (EX/MEM.RegisteRd!= 0) and (EX/MEM.RegisteRd == IF/ID.RegisteRs)) FowadC = 1 if (IDcontol.Banch and (EX/MEM.RegisteRd!= 0) and (EX/MEM.RegisteRd == IF/ID.RegisteRt)) FowadD = 1 Fowads the esult fom the second pevious inst. to eithe input of the compae CEG3420 L07.50 Sping 2016

51 ID Banch Fowading Issues, con t q If the instuction immediately befoe the banch poduces one of the banch souce opeands, then a stall needs to be inseted (between the WB add3 $3, MEM add2 $4, EX add1 $1, ID beq $1,$2,Loop IF next_seq_inst beq and add1) since the EX stage opeation is occuing at the same time as the ID stage banch compae opeation Bounce the beq (in ID) and next_seq_inst (in IF) in place (ID Hazad Unit deassets PC.Wite and IF/ID.Wite) Inset a stall between the add in the EX stage and the beq in the ID stage by zeoing the contol bits going into the ID/EX pipeline egiste (done by the ID Hazad Unit) q If the banch is found to be taken, then flush the instuction cuently in IF (IF.Flush) CEG3420 L07.51 Sping 2016

52 Suppoting ID Stage Banches (optional) PCSc Banch Hazad Unit 0 1 ID/EX EX/MEM IF/ID Contol 0 PC 4 Add Instuction Memoy Read 0 Addess IF.Flush Shift left 2 Read Add 1 RegFile Read Add 2 Read Data 1 Wite Add ReadData 2 Wite Data 16 Sign Extend Add 32 Compae cntl Data Memoy Read Data Addess Wite Data MEM/WB Fowad Unit Fowad Unit CEG3420 L07.52 Sping 2016

53 Delayed Banches q If the banch hadwae has been moved to the ID stage, then we can eliminate all banch stalls with delayed banches which ae defined as always executing the next sequential instuction afte the banch instuction the banch takes effect afte that next instuction MIPS compile moves an instuction to immediately afte the banch that is not affected by the banch (a safe instuction) theeby hiding the banch delay q With deepe pipelines, the banch delay gows equiing moe than one delay slot Delayed banches have lost populaity compaed to moe expensive but moe flexible (dynamic) hadwae banch pediction Gowth in available tansistos has made hadwae banch pediction elatively cheape CEG3420 L07.53 Sping 2016

54 Scheduling Banch Delay Slots A. Fom befoe banch B. Fom banch taget C. Fom fall though add $1,$2,$3 if $2=0 then delay slot sub $4,$5,$6 add $1,$2,$3 if $1=0 then delay slot q A is the best choice, fills delay slot and educes IC add $1,$2,$3 if $1=0 then delay slot sub $4,$5,$6 becomes becomes becomes add $1,$2,$3 if $2=0 then if $1=0 then add $1,$2,$3 add $1,$2,$3 if $1=0 then sub $4,$5,$6 sub $4,$5,$6 q In B and C, the sub instuction may need to be copied, inceasing IC q In B and C, must be okay to execute sub when banch fails CEG3420 L07.54 Sping 2016

55 Static Banch Pediction q Resolve banch hazads by assuming a given outcome and poceeding without waiting to see the actual banch outcome 1. Pedict not taken always pedict banches will not be taken, continue to fetch fom the sequential instuction steam, only when banch is taken does the pipeline stall If taken, flush instuctions afte the banch (ealie in the pipeline) - in IF, ID, and EX stages if banch logic in MEM thee stalls - In IF and ID stages if banch logic in EX two stalls - in IF stage if banch logic in ID one stall ensue that those flushed instuctions haven t changed the machine state automatic in the MIPS pipeline since machine state changing opeations ae at the tail end of the pipeline (MemWite (in MEM) o RegWite (in WB)) estat the pipeline at the banch destination CEG3420 L07.55 Sping 2016

56 Flushing with Mispediction (Not Taken) I n s t. 4 beq $1,$2,2 8 sub $4,$1,$5 O d e q To flush the IF stage instuction, asset IF.Flush to zeo the instuction field of the IF/ID pipeline egiste (tansfoming it into a nop) CEG3420 L07.56 Sping 2016

57 Flushing with Mispediction (Not Taken) I n s t. O d e 4 beq $1,$2,2 8 flush sub $4,$1,$5 16 and $6,$1,$7 20 o 8,$1,$9 q To flush the IF stage instuction, asset IF.Flush to zeo the instuction field of the IF/ID pipeline egiste (tansfoming it into a nop) CEG3420 L07.57 Sping 2016

58 Banching Stuctues q Pedict not taken woks well fo top of the loop banching stuctues Loop: beq $1,$2,Out But such loops have jumps at the bottom of the loop to etun to the top of the loop and incu the jump stall ovehead 1 nd loop inst... last loop inst j Loop Out: fall out inst q Pedict not taken doesn t wok well fo bottom of the loop banching stuctues Loop: 1 st loop inst 2 nd loop inst... last loop inst bne $1,$2,Loop fall out inst CEG3420 L07.58 Sping 2016

59 Static Banch Pediction, con t q Resolve banch hazads by assuming a given outcome and poceeding 2. Pedict taken pedict banches will always be taken Pedict taken always incus one stall cycle (if banch destination hadwae has been moved to the ID stage) Is thee a way to cache the addess of the banch taget instuction?? q As the banch penalty inceases (fo deepe pipelines), a simple static pediction scheme will hut pefomance. With moe hadwae, it is possible to ty to pedict banch behavio dynamically duing pogam execution 3. Dynamic banch pediction pedict banches at untime using un-time infomation CEG3420 L07.59 Sping 2016

60 Dynamic Banch Pediction q A banch pediction buffe (aka banch histoy table (BHT)) in the IF stage addessed by the lowe bits of the PC, contains bit(s) passed to the ID stage though the IF/ID pipeline egiste that tells whethe the banch was taken the last time it was execute Pediction bit may pedict incoectly (may be a wong pediction fo this banch this iteation o may be fom a diffeent banch with the same low ode PC bits) but the doesn t affect coectness, just pefomance - Banch decision occus in the ID stage afte detemining that the fetched instuction is a banch and checking the pediction bit(s) If the pediction is wong, flush the incoect instuction(s) in pipeline, estat the pipeline with the ight instuction, and invet the pediction bit(s) - A 4096 bit BHT vaies fom 1% mispediction (nasa7, tomcatv) to 18% (eqntott) CEG3420 L07.60 Sping 2016

61 Banch Taget Buffe q The BHT pedicts when a banch is taken, but does not tell whee its taken to! A banch taget buffe (BTB) in the IF stage caches the banch taget addess, but we also need to fetch the next sequential instuction. The pediction bit in IF/ID selects which next instuction will be loaded into IF/ID at the next clock edge - Would need a two ead pot instuction memoy O the BTB can cache the banch taken instuction while the instuction memoy is fetching the next sequential instuction PC BTB Instuction Memoy Read 0 Addess q If the pediction is coect, stalls can be avoided no matte which diection they go CEG3420 L07.61 Sping 2016

62 1-bit Pediction Accuacy q q A 1-bit pedicto will be incoect twice when not taken Assume pedict_bit = 0 to stat (indicating banch not taken) and loop contol is at the bottom of the loop code 1. Fist time though the loop, the pedicto mispedicts the banch since the banch is taken back to the top of the loop; invet pediction bit (pedict_bit = 1) 2. As long as banch is taken (looping), pediction is coect 3. Exiting the loop, the pedicto again mispedicts the banch since this time the banch is not taken falling out of the loop; invet pediction bit (pedict_bit = 0) Loop: 1 st loop inst 2 nd loop inst... last loop inst bne $1,$2,Loop fall out inst Fo 10 times though the loop we have a 80% pediction accuacy fo a banch that is taken 90% of the time CEG3420 L07.62 Sping 2016

63 2-bit Pedictos q A 2-bit scheme can give 90% accuacy since a pediction must be wong twice befoe the pediction bit is changed Taken Taken Pedict Taken Pedict Not Taken Not taken Taken Not taken Taken Pedict Taken Not taken Pedict Not Taken Not taken Loop: 1 st loop inst 2 nd loop inst... last loop inst bne $1,$2,Loop fall out inst CEG3420 L07.63 Sping 2016

64 2-bit Pedictos q A 2-bit scheme can give 90% accuacy since a pediction must be wong twice befoe the pediction bit is changed ight 9 times 1 Taken 0 Taken Pedict Taken 11 Pedict 01 Not Taken wong on loop fall out Not taken Taken ight on 1 st iteation Not taken Taken Pedict 10 Taken 1 Not taken 0 00Pedict Not Taken Not taken Loop: 1 st loop inst 2 nd loop inst... last loop inst bne $1,$2,Loop fall out inst q BHT also stoes the initial FSM state CEG3420 L07.64 Sping 2016

65 Dealing with Exceptions q Exceptions (aka inteupts) ae just anothe fom of contol hazad. Exceptions aise fom R-type aithmetic oveflow Tying to execute an undefined instuction An I/O device equest An OS sevice equest (e.g., a page fault, TLB exception) A hadwae malfunction q The pipeline has to stop executing the offending instuction in midsteam, let all pio instuctions complete, flush all following instuctions, set a egiste to show the cause of the exception, save the addess of the offending instuction, and then jump to a peaanged addess (the addess of the exception handle code) q The softwae (OS) looks at the cause of the exception and deals with it CEG3420 L07.65 Sping 2016

66 Two Types of Exceptions q Inteupts asynchonous to pogam execution caused by extenal events may be handled between instuctions, so can let the instuctions cuently active in the pipeline complete befoe passing contol to the OS inteupt handle simply suspend and esume use pogam q Taps (Exception) synchonous to pogam execution caused by intenal events condition must be emedied by the tap handle fo that instuction, so much stop the offending instuction midsteam in the pipeline and pass contol to the OS tap handle the offending instuction may be etied (o simulated by the OS) and the pogam may continue o it may be aboted CEG3420 L07.66 Sping 2016

67 Whee in the Pipeline Exceptions Occu q Aithmetic oveflow q Undefined instuction q TLB o page fault q I/O sevice equest q Hadwae malfunction Stage(s)? Synchonous? CEG3420 L07.67 Sping 2016

68 Whee in the Pipeline Exceptions Occu q Aithmetic oveflow q Undefined instuction q TLB o page fault q I/O sevice equest q Hadwae malfunction Stage(s)? EX ID IF, MEM any any Synchonous? yes yes yes no no q Bewae that multiple exceptions can occu simultaneously in a single clock cycle CEG3420 L07.68 Sping 2016

69 Multiple Simultaneous Exceptions I n s t. O d e Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 q Hadwae sots the exceptions so that the ealiest instuction is the one inteupted fist CEG3420 L07.69 Sping 2016

70 Multiple Simultaneous Exceptions I n s t. O d e Inst 0 Inst 1 Inst 2 Inst 3 D$ page fault aithmetic oveflow undefined instuction Inst 4 I$ page fault q Hadwae sots the exceptions so that the ealiest instuction is the one inteupted fist CEG3420 L07.70 Sping 2016

71 Additions to MIPS to Handle Exceptions (optional) q Cause egiste (ecods exceptions) hadwae to ecod in Cause the exceptions and a signal to contol wites to it (CauseWite) q EPC egiste (ecods the addesses of the offending instuctions) hadwae to ecod in EPC the addess of the offending instuction and a signal to contol wites to it (EPCWite) Exception softwae must match exception to instuction q A way to load the PC with the addess of the exception handle Expand the PC input mux whee the new input is hadwied to the exception handle addess - (e.g., hex fo aithmetic oveflow) q A way to flush offending instuction and the ones that follow it CEG3420 L07.71 Sping 2016

72 Datapath with Contols fo Exceptions (optional) PC 4 Instuction Memoy Read 0 Addess hex Add IF.Flush PCSc IF/ID Hazad Unit Contol Shift left 2 Read Add 1 RegFile Read Add 2 Read Data 1 Wite Add ReadData 2 Wite Data 16 0 Sign Extend Fowad Unit Banch ID.Flush 1 0 Add 32 Compae ID/EX Cause EPC EX.Flush 0 0 cntl Fowad Unit EX/MEM Data Memoy Read Data Addess Wite Data MEM/WB CEG3420 L07.72 Sping 2016

73 Summay q All moden day pocessos use pipelining fo pefomance (a CPI of 1 and a fast CC) q Pipeline clock ate limited by slowest pipeline stage so designing a balanced pipeline is impotant q Must detect and esolve hazads Stuctual hazads esolved by designing the pipeline coectly Data hazads - Stall (impacts CPI) - Fowad (equies hadwae suppot) Contol hazads put the banch decision hadwae in as ealy a stage in the pipeline as possible - Stall (impacts CPI) - Delay decision (equies compile suppot) - Static and dynamic pediction (equies hadwae suppot) q Pipelining complicates exception handling CEG3420 L07.73 Sping 2016

CENG 3420 Lecture 07: Pipeline

CENG 3420 Lecture 07: Pipeline CENG 3420 Lectue 07: Pipeline Bei Yu byu@cse.cuhk.edu.hk CENG3420 L07.1 Sping 2017 Outline q Review: Flip-Flop Contol Signals q Pipeline Motivations q Pipeline Hazads q Exceptions CENG3420 L07.2 Sping

More information

Computer Science 141 Computing Hardware

Computer Science 141 Computing Hardware Compute Science 141 Computing Hadwae Fall 2006 Havad Univesity Instucto: Pof. David Books dbooks@eecs.havad.edu [MIPS Pipeline Slides adapted fom Dave Patteson s UCB CS152 slides and May Jane Iwin s CSE331/431

More information

Chapter 4 (Part III) The Processor: Datapath and Control (Pipeline Hazards)

Chapter 4 (Part III) The Processor: Datapath and Control (Pipeline Hazards) Chapte 4 (Pat III) The Pocesso: Datapath and Contol (Pipeline Hazads) 陳瑞奇 (J.C. Chen) 亞洲大學資訊工程學系 Adapted fom class notes by Pof. M.J. Iwin, PSU and Pof. D. Patteson, UCB 1 吃感冒藥副作用怎麼辦? http://big5.sznews.com/health/images/attachement/jpg/site3/20120319/001558d90b3310d0c1683e.jpg

More information

The Processor: Improving Performance Data Hazards

The Processor: Improving Performance Data Hazards The Pocesso: Impoving Pefomance Data Hazads Monday 12 Octobe 15 Many slides adapted fom: and Design, Patteson & Hennessy 5th Edition, 2014, MK and fom Pof. May Jane Iwin, PSU Summay Pevious Class Pipeline

More information

COSC 6385 Computer Architecture. - Pipelining

COSC 6385 Computer Architecture. - Pipelining COSC 6385 Compute Achitectue - Pipelining Sping 2012 Some of the slides ae based on a lectue by David Culle, Pipelining Pipelining is an implementation technique wheeby multiple instuctions ae ovelapped

More information

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards CISC 662 Gaduate Compute Achitectue Lectue 6 - Hazads Michela Taufe http://www.cis.udel.edu/~taufe/teaching/cis662f07 Powepoint Lectue Notes fom John Hennessy and David Patteson s: Compute Achitectue,

More information

Introduction To Pipelining. Chapter Pipelining1 1

Introduction To Pipelining. Chapter Pipelining1 1 Intoduction To Pipelining Chapte 6.1 - Pipelining1 1 Mooe s Law Mooe s Law says that the numbe of pocessos on a chip doubles about evey 18 months. Given the data on the following two slides, is this tue?

More information

Administrivia. CMSC 411 Computer Systems Architecture Lecture 5. Data Hazard Even with Forwarding Figure A.9, Page A-20

Administrivia. CMSC 411 Computer Systems Architecture Lecture 5. Data Hazard Even with Forwarding Figure A.9, Page A-20 Administivia CMSC 411 Compute Systems Achitectue Lectue 5 Basic Pipelining (cont.) Alan Sussman als@cs.umd.edu as@csu dedu Homewok poblems fo Unit 1 due today Homewok poblems fo Unit 3 posted soon CMSC

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hadwae Oganization and Design Lectue 16: Pipelining Adapted fom Compute Oganization and Design, Patteson & Hennessy, UCB Last time: single cycle data path op System clock affects pimaily the Pogam

More information

COEN-4730 Computer Architecture Lecture 2 Review of Instruction Sets and Pipelines

COEN-4730 Computer Architecture Lecture 2 Review of Instruction Sets and Pipelines 1 COEN-4730 Compute Achitectue Lectue 2 Review of nstuction Sets and Pipelines Cistinel Ababei Dept. of Electical and Compute Engineeing Maquette Univesity Cedits: Slides adapted fom pesentations of Sudeep

More information

CS 61C: Great Ideas in Computer Architecture. Pipelining Hazards. Instructor: Senior Lecturer SOE Dan Garcia

CS 61C: Great Ideas in Computer Architecture. Pipelining Hazards. Instructor: Senior Lecturer SOE Dan Garcia CS 61C: Geat Ideas in Compute Achitectue Pipelining Hazads Instucto: Senio Lectue SOE Dan Gacia 1 Geat Idea #4: Paallelism So9wae Paallel Requests Assigned to compute e.g. seach Gacia Paallel Theads Assigned

More information

Computer Architecture. Pipelining and Instruction Level Parallelism An Introduction. Outline of This Lecture

Computer Architecture. Pipelining and Instruction Level Parallelism An Introduction. Outline of This Lecture Compute Achitectue Pipelining and nstuction Level Paallelism An ntoduction Adapted fom COD2e by Hennessy & Patteson Slide 1 Outline of This Lectue ntoduction to the Concept of Pipelined Pocesso Pipelined

More information

Lecture 8 Introduction to Pipelines Adapated from slides by David Patterson

Lecture 8 Introduction to Pipelines Adapated from slides by David Patterson Lectue 8 Intoduction to Pipelines Adapated fom slides by David Patteson http://www-inst.eecs.bekeley.edu/~cs61c/ * 1 Review (1/3) Datapath is the hadwae that pefoms opeations necessay to execute pogams.

More information

CMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1

CMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1 CMCS 611-101 Advanced Compute Achitectue Lectue 6 Intoduction to Pipelining Septembe 23, 2009 www.csee.umbc.edu/~younis/cmsc611/cmsc611.htm Mohamed Younis CMCS 611, Advanced Compute Achitectue 1 Pevious

More information

UCB CS61C : Machine Structures

UCB CS61C : Machine Structures inst.eecs.bekeley.edu/~cs61c UCB CS61C : Machine Stuctues Lectue SOE Dan Gacia Lectue 28 CPU Design : Pipelining to Impove Pefomance 2010-04-05 Stanfod Reseaches have invented a monitoing technique called

More information

CENG 3420 Lecture 06: Pipeline

CENG 3420 Lecture 06: Pipeline CENG 3420 Lecture 06: Pipeline Bei Yu byu@cse.cuhk.edu.hk CENG3420 L06.1 Spring 2019 Outline q Pipeline Motivations q Pipeline Hazards q Exceptions q Background: Flip-Flop Control Signals CENG3420 L06.2

More information

Lecture Topics ECE 341. Lecture # 12. Control Signals. Control Signals for Datapath. Basic Processing Unit. Pipelining

Lecture Topics ECE 341. Lecture # 12. Control Signals. Control Signals for Datapath. Basic Processing Unit. Pipelining EE 341 Lectue # 12 Instucto: Zeshan hishti zeshan@ece.pdx.edu Novembe 10, 2014 Potland State Univesity asic Pocessing Unit ontol Signals Hadwied ontol Datapath contol signals Dealing with memoy delay Pipelining

More information

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number

More information

CSE4201. Computer Architecture

CSE4201. Computer Architecture CSE 4201 Compute Achitectue Pof. Mokhta Aboelaze Pats of these slides ae taken fom Notes by Pof. David Patteson at UCB Outline MIPS and instuction set Simple pipeline in MIPS Stuctual and data hazads Fowading

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Instruc>on Level Parallelism

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Instruc>on Level Parallelism Agenda CS 61C: Geat Ideas in Compute Achitectue (Machine Stuctues) Instuc>on Level Paallelism Instuctos: Randy H. Katz David A. PaJeson hjp://inst.eecs.bekeley.edu/~cs61c/fa10 Review Instuc>on Set Design

More information

User Visible Registers. CPU Structure and Function Ch 11. General CPU Organization (4) Control and Status Registers (5) Register Organisation (4)

User Visible Registers. CPU Structure and Function Ch 11. General CPU Organization (4) Control and Status Registers (5) Register Organisation (4) PU Stuctue and Function h Geneal Oganisation Registes Instuction ycle Pipelining anch Pediction Inteupts Use Visible Registes Vaies fom one achitectue to anothe Geneal pupose egiste (GPR) ata, addess,

More information

Review from last lecture

Review from last lecture CSE820 Gaduate Compute Achitectue Week 3 Pefomance + Pipeline Review Based on slides by David Patteson Review fom last lectue Tacking and extapolating technology pat of achitect s esponsibility Expect

More information

CS 2461: Computer Architecture 1 Program performance and High Performance Processors

CS 2461: Computer Architecture 1 Program performance and High Performance Processors Couse Objectives: Whee ae we. CS 2461: Pogam pefomance and High Pefomance Pocessos Instucto: Pof. Bhagi Naahai Bits&bytes: Logic devices HW building blocks Pocesso: ISA, datapath Using building blocks

More information

Lecture #22 Pipelining II, Cache I

Lecture #22 Pipelining II, Cache I inst.eecs.bekeley.edu/~cs61c CS61C : Machine Stuctues Lectue #22 Pipelining II, Cache I Wiewold cicuits 2008-7-29 http://www.maa.og/editoial/mathgames/mathgames_05_24_04.html http://www.quinapalus.com/wi-index.html

More information

You Are Here! Review: Hazards. Agenda. Agenda. Review: Load / Branch Delay Slots 7/28/2011

You Are Here! Review: Hazards. Agenda. Agenda. Review: Load / Branch Delay Slots 7/28/2011 CS 61C: Geat Ideas in Compute Achitectue (Machine Stuctues) Instuction Level Paallelism: Multiple Instuction Issue Guest Lectue: Justin Hsia Softwae Paallel Requests Assigned to compute e.g., Seach Katz

More information

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14 MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK

More information

CS 61C: Great Ideas in Computer Architecture Instruc(on Level Parallelism: Mul(ple Instruc(on Issue

CS 61C: Great Ideas in Computer Architecture Instruc(on Level Parallelism: Mul(ple Instruc(on Issue CS 61C: Geat Ideas in Compute Achitectue Instuc(on Level Paallelism: Mul(ple Instuc(on Issue Instuctos: Kste Asanovic, Randy H. Katz hbp://inst.eecs.bekeley.edu/~cs61c/fa12 1 Paallel Requests Assigned

More information

Chapter 4 The Processor 1. Chapter 4A. The Processor

Chapter 4 The Processor 1. Chapter 4A. The Processor Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

14:332:331 Pipelined Datapath

14:332:331 Pipelined Datapath 14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate

More information

Chapter 4 The Processor 1. Chapter 4B. The Processor

Chapter 4 The Processor 1. Chapter 4B. The Processor Chapter 4 The Processor 1 Chapter 4B The Processor Chapter 4 The Processor 2 Control Hazards Branch determines flow of control Fetching next instruction depends on branch outcome Pipeline can t always

More information

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add

More information

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Computer Architecture. Lecture 6.1: Fundamentals of

Computer Architecture. Lecture 6.1: Fundamentals of CS3350B Computer Architecture Winter 2015 Lecture 6.1: Fundamentals of Instructional Level Parallelism Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and

More information

Review: Moore s Law. EECS 252 Graduate Computer Architecture Lecture 2. Review: Joy s Law in ManyCore world. Bell s Law new class per decade

Review: Moore s Law. EECS 252 Graduate Computer Architecture Lecture 2. Review: Joy s Law in ManyCore world. Bell s Law new class per decade EECS 252 Gaduate Compute Achitectue Lectue 2 ℵ 0 Review of Instuction Sets, Pipelines, and Caches Januay 26 th, 2009 Review Mooe s Law John Kubiatowicz Electical Engineeing and Compute Sciences Univesity

More information

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many

More information

Lecture 9 Pipeline and Cache

Lecture 9 Pipeline and Cache Lecture 9 Pipeline and Cache Peng Liu liupeng@zju.edu.cn 1 What makes it easy Pipelining Review all instructions are the same length just a few instruction formats memory operands appear only in loads

More information

CPE 335 Computer Organization. Basic MIPS Pipelining Part I

CPE 335 Computer Organization. Basic MIPS Pipelining Part I CPE 335 Computer Organization Basic MIPS Pipelining Part I Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/courses/cpe335_s08/index.html CPE232 Basic MIPS Pipelining

More information

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ... CHAPTER 6 1 Pipelining Instruction class Instruction memory ister read ALU Data memory ister write Total (in ps) Load word 200 100 200 200 100 800 Store word 200 100 200 200 700 R-format 200 100 200 100

More information

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19 CO2-3224 Computer Architecture and Programming Languages CAPL Lecture 8 & 9 Dr. Kinga Lipskoch Fall 27 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be

More information

Any modern computer system will incorporate (at least) two levels of storage:

Any modern computer system will incorporate (at least) two levels of storage: 1 Any moden compute system will incopoate (at least) two levels of stoage: pimay stoage: andom access memoy (RAM) typical capacity 32MB to 1GB cost pe MB $3. typical access time 5ns to 6ns bust tansfe

More information

The Processor: Improving the performance - Control Hazards

The Processor: Improving the performance - Control Hazards The Processor: Improving the performance - Control Hazards Wednesday 14 October 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU Summary

More information

Lecture 7 Pipelining. Peng Liu.

Lecture 7 Pipelining. Peng Liu. Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 4 Processor Part 2: Pipelining (Ch.4) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations from Mike

More information

COMP2611: Computer Organization. The Pipelined Processor

COMP2611: Computer Organization. The Pipelined Processor COMP2611: Computer Organization The 1 2 Background 2 High-Performance Processors 3 Two techniques for designing high-performance processors by exploiting parallelism: Multiprocessing: parallelism among

More information

MIPS An ISA for Pipelining

MIPS An ISA for Pipelining Pipelining: Basic and Intermediate Concepts Slides by: Muhamed Mudawar CS 282 KAUST Spring 2010 Outline: MIPS An ISA for Pipelining 5 stage pipelining i Structural Hazards Data Hazards & Forwarding Branch

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one

More information

LECTURE 3: THE PROCESSOR

LECTURE 3: THE PROCESSOR LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined

More information

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Thomas Polzer Institut für Technische Informatik

Thomas Polzer Institut für Technische Informatik Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =

More information

Overview of Control. CS 152 Computer Architecture and Engineering Lecture 11. Multicycle Controller Design

Overview of Control. CS 152 Computer Architecture and Engineering Lecture 11. Multicycle Controller Design S 152 ompute chitectue and Engineeing Lectue 11 Multicycle ontolle Design Oveview of ontol ontol may be designed using one of seveal initial epesentations. The choice of sequence contol, and how logic

More information

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:

More information

Accelerating Storage with RDMA Max Gurtovoy Mellanox Technologies

Accelerating Storage with RDMA Max Gurtovoy Mellanox Technologies Acceleating Stoage with RDMA Max Gutovoy Mellanox Technologies 2018 Stoage Develope Confeence EMEA. Mellanox Technologies. All Rights Reseved. 1 What is RDMA? Remote Diect Memoy Access - povides the ability

More information

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards CISC 662 Graduate Computer Architecture Lecture 6 - Hazards Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

Computer Architecture

Computer Architecture Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University The Processor Logic Design Conventions Building a Datapath A Simple Implementation Scheme An Overview of Pipelining Pipelined

More information

Modern Computer Architecture

Modern Computer Architecture Modern Computer Architecture Lecture2 Pipelining: Basic and Intermediate Concepts Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each

More information

A Memory Efficient Array Architecture for Real-Time Motion Estimation

A Memory Efficient Array Architecture for Real-Time Motion Estimation A Memoy Efficient Aay Achitectue fo Real-Time Motion Estimation Vasily G. Moshnyaga and Keikichi Tamau Depatment of Electonics & Communication, Kyoto Univesity Sakyo-ku, Yoshida-Honmachi, Kyoto 66-1, JAPAN

More information

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3. Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview

More information

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and

More information

CPE 335. Basic MIPS Architecture Part II

CPE 335. Basic MIPS Architecture Part II CPE 335 Computer Organization Basic MIPS Architecture Part II Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/courses/cpe335_s08/index.html CPE232 Basic MIPS Architecture

More information

Processor (II) - pipelining. Hwansoo Han

Processor (II) - pipelining. Hwansoo Han Processor (II) - pipelining Hwansoo Han Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 =2.3 Non-stop: 2n/0.5n + 1.5 4 = number

More information

ECS 154B Computer Architecture II Spring 2009

ECS 154B Computer Architecture II Spring 2009 ECS 154B Computer Architecture II Spring 2009 Pipelining Datapath and Control 6.2-6.3 Partially adapted from slides by Mary Jane Irwin, Penn State And Kurtis Kredo, UCD Pipelined CPU Break execution into

More information

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes NAME: STUDENT NUMBER: EE557--FALL 1999 MAKE-UP MIDTERM 1 Closed books, closed notes Q1: /1 Q2: /1 Q3: /1 Q4: /1 Q5: /15 Q6: /1 TOTAL: /65 Grade: /25 1 QUESTION 1(Performance evaluation) 1 points We are

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle? CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:

More information

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1 Lecture 3 Pipelining Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1 A "Typical" RISC ISA 32-bit fixed format instruction (3 formats) 32 32-bit GPR (R0 contains zero, DP take pair)

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/sp16 1 Pipelined Execution Representation Time

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Introduction to Pipelined Datapath

Introduction to Pipelined Datapath 14:332:331 Computer Architecture and Assembly Language Week 12 Introduction to Pipelined Datapath [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 W12.1 Review:

More information

Chapter 4 (Part II) Sequential Laundry

Chapter 4 (Part II) Sequential Laundry Chapter 4 (Part II) The Processor Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Sequential Laundry 6 P 7 8 9 10 11 12 1 2 A T a s k O r d e r A B C D 30 30 30 30 30 30 30 30 30 30

More information

1 Hazards COMP2611 Fall 2015 Pipelined Processor

1 Hazards COMP2611 Fall 2015 Pipelined Processor 1 Hazards Dependences in Programs 2 Data dependence Example: lw $1, 200($2) add $3, $4, $1 add can t do ID (i.e., read register $1) until lw updates $1 Control dependence Example: bne $1, $2, target add

More information

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building

More information

Pre-requisites. This is a textbook-based course. Chapter 1. Pipelines, Performance, Caches, and Virtual Memory. January 2009 Paul H J Kelly

Pre-requisites. This is a textbook-based course. Chapter 1. Pipelines, Performance, Caches, and Virtual Memory. January 2009 Paul H J Kelly 332 Advanced Compute Achitectue Chapte 1 Intoduction and eview of Pipelines, Pefomance, Caches, and Vitual Januay 2009 Paul H J Kelly These lectue notes ae patly based on the couse text, Hennessy and Patteson

More information

Getting Started PMW-EX1/PMW-EX3. 1 Rotate the grip with the RELEASE button pressed. Overview. Connecting the Computer and PMW-EX1/EX3

Getting Started PMW-EX1/PMW-EX3. 1 Rotate the grip with the RELEASE button pressed. Overview. Connecting the Computer and PMW-EX1/EX3 A PMW-EX1/PMW-EX3 Getting Stated Oveview This document descibes how to use the XDCAM EX Vesion Up Tool (heeafte Vesion Up Tool ) to upgade the PMW-EX1 and PMW-EX3 to vesion 1.20 (PMW-EX1) o vesion 1.10

More information

ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design ECE232: Hardware Organization and Design Lecture 17: Pipelining Wrapup Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Outline The textbook includes lots of information Focus on

More information

CSEE 3827: Fundamentals of Computer Systems

CSEE 3827: Fundamentals of Computer Systems CSEE 3827: Fundamentals of Computer Systems Lecture 21 and 22 April 22 and 27, 2009 martha@cs.columbia.edu Amdahl s Law Be aware when optimizing... T = improved Taffected improvement factor + T unaffected

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science 6 PM 7 8 9 10 11 Midnight Time 30 40 20 30 40 20

More information

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu CENG 342 Computer Organization and Design Lecture 6: MIPS Processor - I Bei Yu CEG342 L6. Spring 26 The Processor: Datapath & Control q We're ready to look at an implementation of the MIPS q Simplified

More information

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Pipeline Hazards Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hazards What are hazards? Situations that prevent starting the next instruction

More information

CENG 3420 Lecture 06: Datapath

CENG 3420 Lecture 06: Datapath CENG 342 Lecture 6: Datapath Bei Yu byu@cse.cuhk.edu.hk CENG342 L6. Spring 27 The Processor: Datapath & Control q We're ready to look at an implementation of the MIPS q Simplified to contain only: memory-reference

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin.   School of Information Science and Technology SIST CS 110 Computer Architecture Pipelining Guest Lecture: Shu Yin http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS61C

More information

Design a MIPS Processor (2/2)

Design a MIPS Processor (2/2) 93-2Digital System Design Design a MIPS Processor (2/2) Lecturer: Chihhao Chao Advisor: Prof. An-Yeu Wu 2005/5/13 Friday ACCESS IC LABORTORY Outline v 6.1 An Overview of Pipelining v 6.2 A Pipelined Datapath

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 27: Midterm2 review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Midterm 2 Review Midterm will cover Section 1.6: Processor

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor 1 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation

Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation CSE 564 Computer Architecture Summer 2017 Department of Computer Science and Engineering Yonghong Yan yan@oakland.edu www.secs.oakland.edu/~yan

More information

ELE 655 Microprocessor System Design

ELE 655 Microprocessor System Design ELE 655 Microprocessor System Design Section 2 Instruction Level Parallelism Class 1 Basic Pipeline Notes: Reg shows up two places but actually is the same register file Writes occur on the second half

More information

Pipelining: Hazards Ver. Jan 14, 2014

Pipelining: Hazards Ver. Jan 14, 2014 POLITECNICO DI MILANO Parallelism in wonderland: are you ready to see how deep the rabbit hole goes? Pipelining: Hazards Ver. Jan 14, 2014 Marco D. Santambrogio: marco.santambrogio@polimi.it Simone Campanoni:

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4

Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4 IC220 Set #9: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life Return to Chapter 4 Midnight Laundry Task order A B C D 6 PM 7 8 9 0 2 2 AM 2 Smarty Laundry Task order A B C D 6 PM

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

ECE154A Introduction to Computer Architecture. Homework 4 solution

ECE154A Introduction to Computer Architecture. Homework 4 solution ECE154A Introduction to Computer Architecture Homework 4 solution 4.16.1 According to Figure 4.65 on the textbook, each register located between two pipeline stages keeps data shown below. Register IF/ID

More information

MapReduce Optimizations and Algorithms 2015 Professor Sasu Tarkoma

MapReduce Optimizations and Algorithms 2015 Professor Sasu Tarkoma apreduce Optimizations and Algoithms 2015 Pofesso Sasu Takoma www.cs.helsinki.fi Optimizations Reduce tasks cannot stat befoe the whole map phase is complete Thus single slow machine can slow down the

More information