CENG 3420 Lecture 07: Pipeline
1 CENG 3420 Lecture 07: Pipeline
Bei Yu
CENG3420 L07.1 Spring 2017
2 Outline
- Review: Flip-Flop Control Signals
- Pipeline Motivations
- Pipeline Hazards
- Exceptions
4 Clocking Methodologies
- A clocking methodology defines when signals can be read and when they can be written
  [Figure: clock waveform marking the falling (negative) edge, the rising (positive) edge, and one clock cycle]
  - clock rate = 1/(clock cycle)
  - e.g., 10 nsec clock cycle = 100 MHz clock rate; 1 nsec clock cycle = 1 GHz clock rate
- State element design choices
  - level-sensitive latch
  - master-slave and edge-triggered flip-flops
5 Review: Latches vs Flip-flops
- Output is equal to the stored value inside the element
- Change of state (value) is based on the clock
  - Latches: output changes whenever the inputs change and the clock is asserted (level-sensitive methodology); two-sided timing constraint
  - Flip-flop: output changes only on a clock edge (edge-triggered methodology); one-sided timing constraint
- A clocking methodology defines when signals can be read and written: we would NOT want to read a signal at the same time it was being written
6 Review: Design A Latch
- Store one bit of information: cross-coupled inverter
- How to change the value stored?
  - R: reset signal
  - S: set signal
  [Figure: SR latch and other latch structures]
7 Review: Design A Flip-Flop
- Based on gated latch
- Master-slave positive-edge-triggered D flip-flop
8 Review: Latch and Flip-Flop
- Latch is level-sensitive
- Flip-flop is edge-triggered
9 Our Implementation
- An edge-triggered methodology
- Typical execution
  - read contents of some state elements
  - send values through some combinational logic
  - write results to one or more state elements
  [Figure: State element 1, combinational logic, State element 2, all clocked, spanning one clock cycle]
- Assumes state elements are written on every clock cycle; if not, need an explicit write control signal
  - write occurs only when both the write control is asserted and the clock edge occurs
11 Review: Instruction Critical Paths
- Calculate cycle time assuming negligible delays (for muxes, control unit, sign extend, PC access, shift left 2, wires) except:
  - Instruction and Data Memory (4 ns)
  - ALU and adders (2 ns)
  - Register File access (reads or writes) (1 ns)
[Table, entries left blank on the slide: columns Inst., I Mem, Reg Rd, ALU Op, D Mem, Reg Wr, Total; rows R-type, load, store, beq, jump]
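The blank table on this slide can be filled in by summing the delays of the units each instruction class uses. The sketch below is one way to do that arithmetic; the mapping from instruction class to units (`PATHS`) is an assumption based on the usual single-cycle MIPS datapath, not something stated on the slide.

```python
# Component delays from the slide, in nanoseconds.
DELAY_NS = {"I Mem": 4, "Reg Rd": 1, "ALU Op": 2, "D Mem": 4, "Reg Wr": 1}

# Assumed: which units lie on the critical path of each instruction class.
PATHS = {
    "R-type": ["I Mem", "Reg Rd", "ALU Op", "Reg Wr"],
    "load":   ["I Mem", "Reg Rd", "ALU Op", "D Mem", "Reg Wr"],
    "store":  ["I Mem", "Reg Rd", "ALU Op", "D Mem"],
    "beq":    ["I Mem", "Reg Rd", "ALU Op"],
    "jump":   ["I Mem"],
}

def critical_path_ns(inst):
    """Sum the delays of the units used by one instruction class."""
    return sum(DELAY_NS[unit] for unit in PATHS[inst])

totals = {inst: critical_path_ns(inst) for inst in PATHS}

# A single-cycle design must size its clock for the slowest instruction.
cycle_time_ns = max(totals.values())

for inst, total in totals.items():
    print(f"{inst:7} {total:2d} ns")
print("single-cycle clock:", cycle_time_ns, "ns")
```

Under these assumptions the load sets the cycle time, which is the inefficiency the next slide discusses.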
12 Review: Single Cycle Disadvantages & Advantages
- Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
  - especially problematic for more complex instructions like floating point multiply
  [Figure: Clk with Cycle 1 and Cycle 2; lw fills its cycle, sw leaves part of its cycle wasted]
- May be wasteful of area, since some functional units (e.g., adders) must be duplicated because they cannot be shared during a clock cycle
but
- It is simple and easy to understand
13 How Can We Make It Faster?
- Start fetching and executing the next instruction before the current one has completed
  - Pipelining: (all?) modern processors are pipelined for performance
  - Remember the performance equation: CPU time = CPI * CC * IC
- Under ideal conditions and with a large number of instructions, the speedup from pipelining is approximately equal to the number of pipe stages
  - A five-stage pipeline is nearly five times faster because the CC is nearly five times faster
- Fetch (and execute) more than one instruction at a time
  - Superscalar processing: stay tuned
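The "approximately equal to the number of pipe stages" claim can be checked numerically. This is a minimal sketch assuming perfectly balanced stages, no hazards, and no pipeline-register overhead; the function name `ideal_speedup` is made up for illustration.

```python
def ideal_speedup(num_instructions, num_stages):
    """Speedup of an ideal k-stage pipeline over an unpipelined design.

    Time is measured in units of one stage delay:
      unpipelined: every instruction takes num_stages units
      pipelined:   num_stages units to fill, then one instruction per unit
    """
    unpipelined = num_instructions * num_stages
    pipelined = num_stages + num_instructions - 1
    return unpipelined / pipelined

for n in (1, 10, 1_000, 1_000_000):
    print(n, round(ideal_speedup(n, 5), 4))
```

For a single instruction there is no gain; the speedup approaches the stage count only as the instruction count grows.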
14 The Five Stages of Load Instruction
[Figure: lw occupies IFetch, Dec, Exec, Mem, WB in Cycles 1 to 5]
- IFetch: Instruction Fetch and Update PC
- Dec: Registers Fetch and Instruction Decode
- Exec: Execute R-type; calculate memory address
- Mem: Read/write the data from/to the Data Memory
- WB: Write the result data into the register file
15 A Pipelined MIPS Processor
- Start the next instruction before the current one has completed
  - improves throughput: total amount of work done in a given time
  - instruction latency (execution time, delay time, response time: time from the start of an instruction to its completion) is not reduced
[Figure: lw, sw, and R-type each pass through IFetch, Dec, Exec, Mem, WB, each starting one cycle after the previous instruction, on a Cycle 1 to Cycle 8 axis]
- clock cycle (pipeline stage time) is limited by the slowest stage
- some stages don't need the whole clock cycle (e.g., WB)
- for some instructions, some stages are wasted cycles (i.e., nothing is done during that cycle for that instruction)
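The staggered diagram on this slide can be regenerated with a few lines of code. The sketch below assumes an ideal pipeline (no stalls or flushes) and unique instruction labels; `pipeline_schedule` and `total_cycles` are illustrative names, not part of the lecture.

```python
STAGES = ["IFetch", "Dec", "Exec", "Mem", "WB"]

def pipeline_schedule(instructions):
    """Map each instruction to {cycle: stage}, assuming no stalls or flushes."""
    schedule = {}
    for i, name in enumerate(instructions):
        # Instruction i is fetched in cycle i + 1 and moves one stage per cycle.
        schedule[name] = {i + 1 + s: stage for s, stage in enumerate(STAGES)}
    return schedule

def total_cycles(num_instructions):
    """Cycles until the last instruction leaves WB in the ideal case."""
    return num_instructions + len(STAGES) - 1

program = ["lw", "sw", "R-type"]
schedule = pipeline_schedule(program)
for name in program:
    cycles = range(1, total_cycles(len(program)) + 1)
    row = [schedule[name].get(c, "") for c in cycles]
    print(f"{name:7}", " ".join(f"{stage:7}" for stage in row))
```

Each instruction still spends five cycles in flight, which is why latency is not reduced even though one instruction finishes per cycle once the pipeline is full.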
16 Single Cycle versus Pipeline
- Single Cycle Implementation (CC = 800 ps): [Figure: lw in Cycle 1; sw in Cycle 2 with part of the cycle wasted]
- Pipeline Implementation (CC = 200 ps): [Figure: lw, sw, and R-type overlapped, each passing through IFetch, Dec, Exec, Mem, WB; a 400 ps interval is marked]
- To complete an entire instruction in the pipelined case takes 1000 ps (as compared to 800 ps for the single cycle case). Why?
- How long does each take to complete 1,000,000 adds?
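Both questions on this slide reduce to short arithmetic. One instruction takes 5 stages x 200 ps = 1000 ps in the pipeline because every stage is stretched to the length of the slowest one. The sketch below works out the 1,000,000-add case, assuming no hazards and a five-stage pipeline; the function names are illustrative.

```python
def single_cycle_time_ps(num_instructions, cc_ps=800):
    """Each instruction takes exactly one (long) clock cycle."""
    return num_instructions * cc_ps

def pipelined_time_ps(num_instructions, cc_ps=200, num_stages=5):
    """num_stages cycles for the first instruction, then one more per instruction."""
    return (num_stages + num_instructions - 1) * cc_ps

n = 1_000_000
single = single_cycle_time_ps(n)
pipelined = pipelined_time_ps(n)
print(f"single cycle: {single / 1e6:.4f} us")
print(f"pipelined:    {pipelined / 1e6:.4f} us")
print(f"speedup:      {single / pipelined:.4f}")
```

The speedup here comes out close to 4 rather than 5, because the 200 ps stage time is set by the slowest stage and the five stages together (1000 ps) are longer than the 800 ps single-cycle path.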
17 Pipelining the MIPS ISA
- What makes it easy
  - all instructions are the same length (32 bits)
    - can fetch in the 1st stage and decode in the 2nd stage
  - few instruction formats (three) with symmetry across formats
    - can begin reading register file in 2nd stage
  - memory operations occur only in loads and stores
    - can use the execute stage to calculate memory addresses
  - each instruction writes at most one result (i.e., changes the machine state) and does it in the last few pipeline stages (MEM or WB)
  - operands must be aligned in memory so a single data transfer takes only one data memory access
18 MIPS Pipeline Datapath Additions/Mods
- State registers between each pipeline stage to isolate them
[Figure: pipelined datapath with stages IF: IFetch, ID: Dec, EX: Execute, MEM: MemAccess, WB: WriteBack, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers and driven by the System Clock]
19 MIPS Pipeline Control Path Modifications
- All control signals can be determined during Decode and held in the state registers between pipeline stages
[Figure: pipelined datapath with a Control unit in ID; control signals shown: PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemtoReg]
20 Pipeline Control
- IF Stage: read Inst Memory (always asserted) and write PC (on System Clock)
- ID Stage: no optional control signals to set
[Table, mostly blank on the slide: columns grouped as EX Stage (RegDst, ALUOp1, ALUOp0, ALUSrc), MEM Stage (Brch, MemRead, MemWrite), WB Stage (RegWrite, MemtoReg); rows R, lw, sw, beq; the sw and beq rows each carry two X (don't-care) entries]
21 Graphically Representing MIPS Pipeline
- Can help with answering questions like:
  - How many cycles does it take to execute this code?
  - What is the ALU doing during cycle 4?
  - Is there a hazard, why does it occur, and how can it be fixed?
22 Other Pipeline Structures Are Possible
- What about the (slow) multiply operation?
  - Make the clock twice as slow, or
  - let it take two cycles (since it doesn't use the DM stage)
  [Figure: pipeline with a MUL stage]
- What if the data memory access is twice as slow as the instruction memory?
  - make the clock twice as slow, or
  - let data memory access take two cycles (and keep the same clock rate)
  [Figure: pipeline stages labeled IM, Reg, DM1, DM2, Reg]
23 Other Sample Pipeline Alternatives
- ARM7 [Figure: stages IM, Reg, EX; activities labeled: PC update, IM access; decode, reg access; ALU op, DM access, shift/rotate, commit result (write back)]
- XScale [Figure: stages labeled IM1, IM2, Reg, SHFT, DM1, DM2; activities labeled: PC update, BTB access, start IM access; IM access; decode, reg 1 access; shift/rotate, reg 2 access; ALU op; start DM access, exception; DM write, reg write]
24 Why Pipeline? For Performance!
[Figure: Inst 0 to Inst 4 in instruction order against time (clock cycles), with the time to fill the pipeline marked]
- Once the pipeline is full, one instruction is completed every cycle, so CPI = 1
26 Can Pipelining Get Us Into Trouble?
- Yes: Pipeline Hazards
  - structural hazards: a required resource is busy
  - data hazards: attempt to use data before it is ready
  - control hazards: deciding on control action depends on previous instruction
- Can usually resolve hazards by waiting
  - pipeline control must detect the hazard
  - and take action to resolve hazards
27 Structure Hazards
- Conflict for use of a resource
- In MIPS pipeline with a single memory
  - Load/store requires data access
  - Instruction fetch requires instruction access
- Hence, pipeline datapaths require separate instruction/data memories
  - Or separate instruction/data caches
- Since Register File
28 Resolve Structural Hazard 1
[Figure: lw and Inst 1 to Inst 4 in instruction order against time (clock cycles); lw reading data from memory coincides with a later instruction being read from memory]
- Fix with separate inst and data memories (I$ and D$)
29 Resolve Structural Hazard 2
[Figure: add $1, ; Inst 1; Inst 2; add $2,$1, in instruction order against time (clock cycles); labels mark the clock edge that controls register writing and the clock edge that controls loading of pipeline state registers]
- Fix register file access hazard by doing reads in the second half of the cycle and writes in the first half
30 Data Hazards
- Dependencies backward in time cause hazards
[Figure: instruction order add $1, ; sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5]
- Read before write data hazard
31 Data Hazards: Register Usage
- Dependencies backward in time cause hazards
[Figure: add $1, ; sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5]
- Read before write data hazard
32 Data Hazards: Load Memory
- Dependencies backward in time cause hazards
[Figure: instruction order lw $1,4($2); sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5]
- Load-use data hazard
33 Resolve Data Hazards 1: Insert Stall
[Figure: instruction order add $1, ; stall; stall; sub $4,$1,$5; and $6,$1,$7]
- Can fix data hazard by waiting (stall), but this impacts CPI
34 Resolve Data Hazards 2: Forwarding
[Figure: instruction order add $1, ; sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5]
- Fix data hazards by forwarding results as soon as they are available to where they are needed
36 Forward Unit Output Signals
37 Datapath with Forwarding Hardware
[Figure: pipelined datapath with a Forward Unit added]
38 Datapath with Forwarding Hardware
[Figure: pipelined datapath with a Forward Unit whose inputs are labeled ID/EX.RegisterRs, ID/EX.RegisterRt, EX/MEM.RegisterRd, and MEM/WB.RegisterRd]
39 Data Forwarding Control Conditions
1. EX Forward Unit:
   if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd == ID/EX.RegisterRs)) ForwardA = 10
   if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd == ID/EX.RegisterRt)) ForwardB = 10
   Forwards the result from the previous inst. to either input of the ALU
2. MEM Forward Unit:
   if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (MEM/WB.RegisterRd == ID/EX.RegisterRs)) ForwardA = 01
   if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (MEM/WB.RegisterRd == ID/EX.RegisterRt)) ForwardB = 01
   Forwards the result from the second previous inst. to either input of the ALU
40 Forwarding Illustration
[Figure: instruction order add $1, ; sub $4,$1,$5; and $6,$7,$1; the EX forwarding and MEM forwarding paths are marked]
41 Yet Another Complication!
- Another potential data hazard can occur when there is a conflict between the result of the WB stage instruction and the MEM stage instruction: which should be forwarded?
[Figure: instruction order add $1,$1,$2; add $1,$1,$3; add $1,$1,$4]
43 EX: Corrected MEM Forward Unit
- MEM Forward Unit:
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (EX/MEM.RegisterRd != ID/EX.RegisterRs) and (MEM/WB.RegisterRd == ID/EX.RegisterRs)) ForwardA = 01
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (EX/MEM.RegisterRd != ID/EX.RegisterRt) and (MEM/WB.RegisterRd == ID/EX.RegisterRt)) ForwardB = 01
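The EX and MEM forwarding conditions can be exercised as a small executable model. This is a sketch, not the lecture's hardware: the `PipeReg` record and function names are invented for illustration, the select value "00" (use the register file value) is assumed from the usual textbook convention, and EX priority is expressed by testing the EX/MEM condition first rather than by the extra inequality term on the slide.

```python
from dataclasses import dataclass

@dataclass
class PipeReg:
    """Hypothetical view of the fields the Forward Unit reads from a pipeline register."""
    reg_write: bool  # RegWrite control bit carried in the register
    rd: int          # destination register number (RegisterRd)

def forward_select(ex_mem, mem_wb, src_reg):
    """Return the mux select for one ALU input: '10', '01', or '00'."""
    # EX hazard: the previous instruction's result is in EX/MEM (most recent value).
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src_reg:
        return "10"
    # MEM hazard: only reached when EX/MEM does not supply this source register.
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src_reg:
        return "01"
    return "00"  # assumed encoding: no forwarding, use the register file output

def forwarding_unit(ex_mem, mem_wb, id_ex_rs, id_ex_rt):
    """Return (ForwardA, ForwardB) for the instruction currently in EX."""
    return (forward_select(ex_mem, mem_wb, id_ex_rs),
            forward_select(ex_mem, mem_wb, id_ex_rt))

# Third add of: add $1,$1,$2 ; add $1,$1,$3 ; add $1,$1,$4
# Both older adds write $1, so the newer EX/MEM result must win.
print(forwarding_unit(PipeReg(True, 1), PipeReg(True, 1), id_ex_rs=1, id_ex_rt=4))
```

Checking the EX condition first is one way to guarantee that the most recent result wins when both older instructions write the same register.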
44 Memory-to-Memory Copies
- For loads immediately followed by stores (memory-to-memory copies), can avoid a stall by adding forwarding hardware from the MEM/WB register to the data memory input
  - Would need to add a Forward Unit and a mux to the MEM stage
[Figure: instruction order lw $1,4($2); sw $1,4($3)]
45 Forwarding with Load-use Data Hazards
[Figure: instruction order lw $1,4($2); sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5; pipeline stage labels IM, Reg, DM]
46 Forwarding with Load-use Data Hazards
[Figure: instruction order lw $1,4($2); sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5; pipeline stage labels IM, Reg, DM]
- Will still need one stall cycle even with forwarding
47 Forwarding with Load-use Data Hazards
[Figure: instruction order lw $1,4($2); stall; sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5; pipeline stage labels IM, Reg, DM]
- Will still need one stall cycle even with forwarding
48 Load-use Hazard Detection Unit (optional)
- Need a Hazard detection Unit in the ID stage that inserts a stall between the load and its use
  1. ID Hazard detection Unit:
     if (ID/EX.MemRead
         and ((ID/EX.RegisterRt == IF/ID.RegisterRs)
         or (ID/EX.RegisterRt == IF/ID.RegisterRt)))
       stall the pipeline
- The first line tests to see if the instruction now in the EX stage is a lw; the next two lines check to see if the destination register of the lw matches either source register of the instruction in the ID stage (the load-use instruction)
- After this one cycle stall, the forwarding logic can handle the remaining data hazards
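The hazard detection condition translates directly into a predicate. The sketch below is illustrative only; the function name and argument names are assumptions that mirror the pipeline register fields named on the slide.

```python
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID must wait one cycle for a load in EX.

    id_ex_mem_read: MemRead bit of the instruction in EX (set for lw)
    id_ex_rt:       destination register of that lw
    if_id_rs/rt:    source registers of the instruction in ID
    """
    return bool(id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))

# lw $1,4($2) in EX, sub $4,$1,$5 in ID: sub reads $1, so stall.
print(load_use_stall(True, 1, 1, 5))
# add in EX (MemRead = 0): forwarding alone handles it, no stall.
print(load_use_stall(False, 1, 1, 5))
```

Note that, as on the slide, the condition does not check whether the instruction in ID actually uses its Rt field as a source, so it may stall conservatively in some cases.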
49 Adding the Hazard/Stall Hardware (optional)
[Figure: pipelined datapath with forwarding hardware plus a Hazard Unit in the ID stage]
50 Adding the Hazard/Stall Hardware (optional)
[Figure: pipelined datapath with forwarding hardware plus a Hazard Unit in the ID stage; the Hazard Unit inputs ID/EX.MemRead and ID/EX.RegisterRt are labeled]
51 Control Hazards
- When the flow of instruction addresses is not sequential (sequential meaning PC = PC + 4); incurred by change of flow instructions
  - Unconditional branches (j, jal, jr)
  - Conditional branches (beq, bne)
  - Exceptions
- Possible approaches
  - Stall (impacts CPI)
  - Move decision point as early in the pipeline as possible, thereby reducing the number of stall cycles
  - Delay decision (requires compiler support)
  - Predict and hope for the best!
- Control hazards occur less frequently than data hazards, but there is nothing as effective against control hazards as forwarding is for data hazards
52 Control Hazards 1: Jumps Incur One Stall
- Jumps not decoded until ID, so one flush is needed
  - To flush, set IF.Flush to zero the instruction field of the IF/ID pipeline register (turning it into a nop)
[Figure: instruction order j; flush; j target]
- Fix jump hazard by waiting (flush)
- Fortunately, jumps are very infrequent: only 3% of the SPECint instruction mix
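The cost of the one-cycle jump flush can be put in CPI terms. This is a back-of-the-envelope sketch assuming a base CPI of 1 and that every jump costs exactly one extra cycle; `effective_cpi` is an illustrative helper, not part of the lecture.

```python
def effective_cpi(base_cpi, penalties):
    """Base CPI plus the average stall cycles per instruction.

    penalties: iterable of (fraction_of_instructions, stall_cycles) pairs.
    """
    return base_cpi + sum(freq * cycles for freq, cycles in penalties)

# Jumps are 3% of the SPECint mix (from the slide) and each costs one flush.
cpi_with_jumps = effective_cpi(1.0, [(0.03, 1)])
print(round(cpi_with_jumps, 2))
```

Under these assumptions jumps alone add about 3% to the execution time, which is why the slide treats them as a minor cost.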
53 Datapath Branch and Jump Hardware
[Figure: pipelined datapath with forwarding, extended with jump hardware: a Jump control signal, a Shift left 2 unit, and PC+4[31-28] forming the jump address]
54 Supporting ID Stage Jumps
[Figure: pipelined datapath with forwarding and jump hardware (Jump control signal, Shift left 2 unit, PC+4[31-28]); a 0 input is added at the Instruction Memory output]
55 Control Hazards 2: Branch Inst
- Dependencies backward in time cause hazards
[Figure: instruction order beq; lw; Inst 3; Inst 4]
56 One Way to Fix a Branch Control Hazard
[Figure: instruction order beq; flush; flush; flush; beq target; Inst 3; pipeline stage labels IM, Reg, DM]
- Fix branch hazard by waiting (flush), but this affects CPI
57 Another Way to Fix a Branch Control Hazard
- Move branch decision hardware back to as early in the pipeline as possible, i.e., during the decode cycle
[Figure: instruction order beq; flush; beq target; Inst 3; pipeline stage labels IM, Reg, DM]
- Fix branch hazard by waiting (flush)
58 Two Types of Stalls
- Nop instruction (or bubble) inserted between two instructions in the pipeline (as done for load-use situations)
  - Keep the instructions earlier in the pipeline (later in the code) from progressing down the pipeline for a cycle ("bounce" them in place with write control signals)
  - Insert nop by zeroing control bits in the pipeline register at the appropriate stage
  - Let the instructions later in the pipeline (earlier in the code) progress normally down the pipeline
- Flushes (or instruction squashing), where an instruction in the pipeline is replaced with a nop instruction (as done for instructions located sequentially after j instructions)
  - Zero the control bits for the instruction to be flushed
59 Reducing the Delay of Branches
- Move the branch decision hardware back to the EX stage
  - Reduces the number of stall (flush) cycles to two
  - Adds an and gate and a 2x1 mux to the EX timing path
- Add hardware to compute the branch target address and evaluate the branch decision to the ID stage
  - Reduces the number of stall (flush) cycles to one (like with jumps)
    - But now need to add forwarding hardware in ID stage
  - Computing branch target address can be done in parallel with RegFile read (done for all instructions, only used when needed)
  - Comparing the registers can't be done until after RegFile read, so comparing and updating the PC adds a mux, a comparator, and an and gate to the ID timing path
- For deeper pipelines, branch decision points can be even later in the pipeline, incurring more stalls
60 ID Branch Forwarding Issues
- MEM/WB forwarding is taken care of by the normal RegFile write before read operation
  WB: add3 $1, | MEM: add2 $3, | EX: add1 $4, | ID: beq $1,$2,Loop | IF: next_seq_inst
- Need to forward from the EX/MEM pipeline stage to the ID comparison hardware for cases like
  WB: add3 $3, | MEM: add2 $1, | EX: add1 $4, | ID: beq $1,$2,Loop | IF: next_seq_inst
  if (IDcontrol.Branch and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd == IF/ID.RegisterRs)) ForwardC = 1
  if (IDcontrol.Branch and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd == IF/ID.RegisterRt)) ForwardD = 1
  Forwards the result from the second previous inst. to either input of the compare
61 ID Branch Forwarding Issues, con't
- If the instruction immediately before the branch produces one of the branch source operands, then a stall needs to be inserted (between the beq and add1) since the EX stage ALU operation is occurring at the same time as the ID stage branch compare operation
  WB: add3 $3, | MEM: add2 $4, | EX: add1 $1, | ID: beq $1,$2,Loop | IF: next_seq_inst
  - "Bounce" the beq (in ID) and next_seq_inst (in IF) in place (ID Hazard Unit deasserts PC.Write and IF/ID.Write)
  - Insert a stall between the add in the EX stage and the beq in the ID stage by zeroing the control bits going into the ID/EX pipeline register (done by the ID Hazard Unit)
- If the branch is found to be taken, then flush the instruction currently in IF (IF.Flush)
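The three ID-stage branch cases (value already readable from the register file, value forwarded from EX/MEM, value still being computed in EX) can be summarized in one decision function. This is a sketch under simplifying assumptions: the producing instructions are ALU operations, `id_ex_dest` is a hypothetical name for the destination register of the instruction in EX (`None` if it writes no register), and the return labels are invented for illustration.

```python
def branch_operand_action(src_reg, id_ex_dest, ex_mem_rd):
    """Decide how a branch in ID obtains one of its source operands.

    Returns 'stall'   if the instruction in EX is still computing the value,
            'forward' if the value sits in EX/MEM (ForwardC/ForwardD = 1),
            'regfile' if the normal register file read is already correct.
    """
    if src_reg == 0:
        return "regfile"      # register $0 is never forwarded
    if id_ex_dest == src_reg:
        return "stall"        # producer is in EX right now
    if ex_mem_rd == src_reg:
        return "forward"      # producer is in MEM, result is in EX/MEM
    return "regfile"          # producer in WB or older: write-before-read

# beq $1,$2,Loop in ID for the three situations on the slides:
print(branch_operand_action(1, id_ex_dest=4, ex_mem_rd=3))  # add3 $1 is in WB
print(branch_operand_action(1, id_ex_dest=4, ex_mem_rd=1))  # add2 $1 is in MEM
print(branch_operand_action(1, id_ex_dest=1, ex_mem_rd=4))  # add1 $1 is in EX
```

The stall check comes first because the instruction in EX is the most recent writer; if it targets the source register, any older value in EX/MEM would be stale.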
62 Supporting ID Stage Branches (optional)
[Figure: pipelined datapath with the branch Compare unit and target adder moved into the ID stage, a Hazard Unit, the IF.Flush signal, and two Forward Units (one for the ID compare inputs, one for the ALU inputs)]
63 Delayed Branches
- If the branch hardware has been moved to the ID stage, then we can eliminate all branch stalls with delayed branches, which are defined as always executing the next sequential instruction after the branch instruction; the branch takes effect after that next instruction
  - MIPS compiler moves an instruction to immediately after the branch that is not affected by the branch (a safe instruction), thereby hiding the branch delay
- With deeper pipelines, the branch delay grows, requiring more than one delay slot
  - Delayed branches have lost popularity compared to more expensive but more flexible (dynamic) hardware branch prediction
  - Growth in available transistors has made hardware branch prediction relatively cheaper
64 Scheduling Branch Delay Slots
A. From before branch:
     add $1,$2,$3
     if $2=0 then
       [delay slot]
   becomes
     if $2=0 then
       add $1,$2,$3
B. From branch target:
     sub $4,$5,$6   (at the branch target)
     add $1,$2,$3
     if $1=0 then
       [delay slot]
   becomes
     add $1,$2,$3
     if $1=0 then
       sub $4,$5,$6
C. From fall through:
     add $1,$2,$3
     if $1=0 then
       [delay slot]
     sub $4,$5,$6
   becomes
     add $1,$2,$3
     if $1=0 then
       sub $4,$5,$6
- A is the best choice, fills delay slot and reduces IC
- In B and C, the sub instruction may need to be copied, increasing IC
- In B and C, must be okay to execute sub when branch fails
65 Static Branch Prediction
- Resolve branch hazards by assuming a given outcome and proceeding without waiting to see the actual branch outcome
1. Predict not taken: always predict branches will not be taken, continue to fetch from the sequential instruction stream, only when branch is taken does the pipeline stall
  - If taken, flush instructions after the branch (earlier in the pipeline)
    - in IF, ID, and EX stages if branch logic in MEM: three stalls
    - in IF and ID stages if branch logic in EX: two stalls
    - in IF stage if branch logic in ID: one stall
  - ensure that those flushed instructions haven't changed the machine state: automatic in the MIPS pipeline since machine state changing operations are at the tail end of the pipeline (MemWrite (in MEM) or RegWrite (in WB))
  - restart the pipeline at the branch destination
66 Flushing with Misprediction (Not Taken)
[Figure: instruction order, by address: 4 beq $1,$2,2; 8 sub $4,$1,$5]
- To flush the IF stage instruction, assert IF.Flush to zero the instruction field of the IF/ID pipeline register (transforming it into a nop)
67 Flushing with Misprediction (Not Taken)
[Figure: instruction order, by address: 4 beq $1,$2,2; 8 sub $4,$1,$5 (flushed); 16 and $6,$1,$7; 20 or $8,$1,$9]
- To flush the IF stage instruction, assert IF.Flush to zero the instruction field of the IF/ID pipeline register (transforming it into a nop)
68 Branching Structures
- Predict not taken works well for "top of the loop" branching structures
    Loop: beq $1,$2,Out
          1st loop inst
          ...
          last loop inst
          j Loop
    Out:  fall out inst
  - But such loops have jumps at the bottom of the loop to return to the top of the loop, and incur the jump stall overhead
- Predict not taken doesn't work well for "bottom of the loop" branching structures
    Loop: 1st loop inst
          2nd loop inst
          ...
          last loop inst
          bne $1,$2,Loop
          fall out inst
69 Static Branch Prediction, con't
- Resolve branch hazards by assuming a given outcome and proceeding
2. Predict taken: predict branches will always be taken
  - Predict taken always incurs one stall cycle (if branch destination hardware has been moved to the ID stage)
  - Is there a way to "cache" the address of the branch target instruction?
- As the branch penalty increases (for deeper pipelines), a simple static prediction scheme will hurt performance. With more hardware, it is possible to try to predict branch behavior dynamically during program execution
3. Dynamic branch prediction: predict branches at run-time using run-time information
70 Dynamic Branch Prediction
- A branch prediction buffer (aka branch history table (BHT)) in the IF stage, addressed by the lower bits of the PC, contains bit(s) passed to the ID stage through the IF/ID pipeline register that tell whether the branch was taken the last time it was executed
  - Prediction bit may predict incorrectly (may be a wrong prediction for this branch this iteration or may be from a different branch with the same low order PC bits), but this doesn't affect correctness, just performance
    - Branch decision occurs in the ID stage after determining that the fetched instruction is a branch and checking the prediction bit(s)
  - If the prediction is wrong, flush the incorrect instruction(s) in pipeline, restart the pipeline with the right instruction, and invert the prediction bit(s)
    - A 4096 bit BHT varies from 1% misprediction (nasa7, tomcatv) to 18% (eqntott)
71 Branch Target Buffer
- The BHT predicts when a branch is taken, but does not tell where it's taken to!
  - A branch target buffer (BTB) in the IF stage caches the branch target address, but we also need to fetch the next sequential instruction. The prediction bit in IF/ID selects which "next" instruction will be loaded into IF/ID at the next clock edge
    - Would need a two read port instruction memory
  - Or the BTB can cache the branch taken instruction while the instruction memory is fetching the next sequential instruction
[Figure: the PC addresses both the BTB and the Instruction Memory (Read Address); a mux selects between them]
- If the prediction is correct, stalls can be avoided no matter which direction they go
72 1-bit Prediction Accuracy
- A 1-bit predictor will be incorrect twice when not taken
- Assume predict_bit = 0 to start (indicating branch not taken) and loop control is at the bottom of the loop code
  1. First time through the loop, the predictor mispredicts the branch since the branch is taken back to the top of the loop; invert prediction bit (predict_bit = 1)
  2. As long as branch is taken (looping), prediction is correct
  3. Exiting the loop, the predictor again mispredicts the branch since this time the branch is not taken, falling out of the loop; invert prediction bit (predict_bit = 0)
    Loop: 1st loop inst
          2nd loop inst
          ...
          last loop inst
          bne $1,$2,Loop
          fall out inst
- For 10 times through the loop we have an 80% prediction accuracy for a branch that is taken 90% of the time
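The 80% figure can be reproduced by replaying the branch outcomes of one pass through the loop. This is a minimal simulation of a single prediction bit (no table, no aliasing); the function name and the encoding 1 = taken, 0 = not taken are assumptions made for the example.

```python
def one_bit_accuracy(outcomes, predict_bit=0):
    """Fraction of correct predictions for a single 1-bit predictor.

    outcomes: sequence of actual branch results, 1 = taken, 0 = not taken.
    """
    correct = 0
    for taken in outcomes:
        if predict_bit == taken:
            correct += 1
        else:
            predict_bit = taken  # invert the bit on every misprediction
    return correct / len(outcomes)

# Ten executions of the bottom-of-loop bne: taken 9 times, then falls out.
loop_outcomes = [1] * 9 + [0]
print(one_bit_accuracy(loop_outcomes))
```

The two mispredictions are the first iteration (bit still 0 from the previous exit) and the final fall-out, matching steps 1 and 3 on the slide.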
73 2-bit Predictors
- A 2-bit scheme can give 90% accuracy since a prediction must be wrong twice before the prediction bit is changed
[Figure: four-state FSM with two Predict Taken states and two Predict Not Taken states, linked by Taken / Not taken transitions]
    Loop: 1st loop inst
          2nd loop inst
          ...
          last loop inst
          bne $1,$2,Loop
          fall out inst
74 2-bit Predictors
- A 2-bit scheme can give 90% accuracy since a prediction must be wrong twice before the prediction bit is changed
[Figure: four-state FSM with states 11 and 10 (Predict Taken) and 01 and 00 (Predict Not Taken), linked by Taken / Not taken transitions; annotations: right on 1st iteration, right 9 times, wrong on loop fall out]
    Loop: 1st loop inst
          2nd loop inst
          ...
          last loop inst
          bne $1,$2,Loop
          fall out inst
- BHT also stores the initial FSM state
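The 90% figure can be checked with a small simulation. The sketch below uses a saturating 2-bit counter, which is a common variant and is an assumption here: the exact transitions of the FSM drawn on the slide may differ, although both need two wrong guesses in a row to change the predicted direction. The starting state (2, weakly taken) is also assumed, as the state left behind by a previous pass through the loop.

```python
def two_bit_accuracy(outcomes, state=2):
    """Fraction of correct predictions for one saturating 2-bit counter.

    state: 0..3; states 2 and 3 predict taken, states 0 and 1 predict not taken.
    outcomes: sequence of actual branch results, 1 = taken, 0 = not taken.
    """
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == bool(taken):
            correct += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)

loop_outcomes = [1] * 9 + [0]
print(two_bit_accuracy(loop_outcomes))       # one pass through the loop
print(two_bit_accuracy(loop_outcomes * 2))   # two consecutive passes
```

Only the loop fall-out is mispredicted: a single not-taken outcome moves the counter from strongly to weakly taken, so the first iteration of the next pass is still predicted correctly.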
76 Dealing with Exceptions
- Exceptions (aka interrupts) are just another form of control hazard. Exceptions arise from
  - R-type arithmetic overflow
  - Trying to execute an undefined instruction
  - An I/O device request
  - An OS service request (e.g., a page fault, TLB exception)
  - A hardware malfunction
- The pipeline has to stop executing the offending instruction in midstream, let all prior instructions complete, flush all following instructions, set a register to show the cause of the exception, save the address of the offending instruction, and then jump to a prearranged address (the address of the exception handler code)
- The software (OS) looks at the cause of the exception and deals with it
77 Two Types of Exceptions
- Interrupts: asynchronous to program execution
  - caused by external events
  - may be handled between instructions, so can let the instructions currently active in the pipeline complete before passing control to the OS interrupt handler
  - simply suspend and resume user program
- Traps (Exception): synchronous to program execution
  - caused by internal events
  - condition must be remedied by the trap handler for that instruction, so must stop the offending instruction midstream in the pipeline and pass control to the OS trap handler
  - the offending instruction may be retried (or simulated by the OS) and the program may continue, or it may be aborted
78 Where in the Pipeline Exceptions Occur
- Arithmetic overflow
- Undefined instruction
- TLB or page fault
- I/O service request
- Hardware malfunction
[Table columns left blank on this slide: Stage(s)? and Synchronous?]
79 Where in the Pipeline Exceptions Occur
  Exception               Stage(s)?   Synchronous?
  Arithmetic overflow     EX          yes
  Undefined instruction   ID          yes
  TLB or page fault       IF, MEM     yes
  I/O service request     any         no
  Hardware malfunction    any         no
- Beware that multiple exceptions can occur simultaneously in a single clock cycle
80 Multiple Simultaneous Exceptions
[Figure: Inst 0 to Inst 4 in instruction order against time]
- Hardware sorts the exceptions so that the earliest instruction is the one interrupted first
81 Multiple Simultaneous Exceptions
[Figure: Inst 0 to Inst 4 in instruction order against time, with a D$ page fault, an arithmetic overflow, an undefined instruction, and an I$ page fault occurring in different instructions]
- Hardware sorts the exceptions so that the earliest instruction is the one interrupted first
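The sorting rule amounts to picking the pending exception whose instruction comes first in program order. The sketch below illustrates only that rule; the pairing of exceptions with instruction numbers is a made-up example (the figure's exact assignment is not recoverable from the text), and `first_exception` is an illustrative name.

```python
def first_exception(pending):
    """Return the pending exception raised by the earliest instruction.

    pending: list of (instruction_index, cause) pairs, where a lower index
    means the instruction is earlier in program order. Returns None if empty.
    """
    if not pending:
        return None
    return min(pending, key=lambda entry: entry[0])

# Hypothetical cycle in which four instructions fault at once.
pending = [
    (4, "I$ page fault"),          # youngest instruction, in IF
    (3, "undefined instruction"),  # in ID
    (2, "arithmetic overflow"),    # in EX
    (1, "D$ page fault"),          # oldest of the four, in MEM
]
print(first_exception(pending))
```

Handling the oldest instruction first keeps exceptions precise: everything before it completes, and it and the younger instructions are flushed and can fault again after restart if needed.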
82 Additions to MIPS to Handle Exceptions (optional)
- Cause register (records exceptions): hardware to record in Cause the exceptions and a signal to control writes to it (CauseWrite)
- EPC register (records the addresses of the offending instructions): hardware to record in EPC the address of the offending instruction and a signal to control writes to it (EPCWrite)
  - Exception software must match exception to instruction
- A way to load the PC with the address of the exception handler
  - Expand the PC input mux where the new input is hardwired to the exception handler address (e.g., hex for arithmetic overflow)
- A way to flush offending instruction and the ones that follow it
83 Datapath with Controls for Exceptions (optional)
[Figure: pipelined datapath with ID-stage branch compare, Hazard Unit, and Forward Units, extended with Cause and EPC registers, the IF.Flush, ID.Flush, and EX.Flush signals, and a hardwired exception handler address input to the PC mux]
84 Summary
- All modern day processors use pipelining for performance (a CPI of 1 and a fast CC)
- Pipeline clock rate limited by slowest pipeline stage, so designing a balanced pipeline is important
- Must detect and resolve hazards
  - Structural hazards: resolved by designing the pipeline correctly
  - Data hazards
    - Stall (impacts CPI)
    - Forward (requires hardware support)
  - Control hazards: put the branch decision hardware in as early a stage in the pipeline as possible
    - Stall (impacts CPI)
    - Delay decision (requires compiler support)
    - Static and dynamic prediction (requires hardware support)
- Pipelining complicates exception handling
More informationECE331: Hardware Organization and Design
ECE331: Hadwae Oganization and Design Lectue 16: Pipelining Adapted fom Compute Oganization and Design, Patteson & Hennessy, UCB Last time: single cycle data path op System clock affects pimaily the Pogam
More informationCOEN-4730 Computer Architecture Lecture 2 Review of Instruction Sets and Pipelines
1 COEN-4730 Compute Achitectue Lectue 2 Review of nstuction Sets and Pipelines Cistinel Ababei Dept. of Electical and Compute Engineeing Maquette Univesity Cedits: Slides adapted fom pesentations of Sudeep
More informationCENG 3420 Lecture 06: Pipeline
CENG 3420 Lecture 06: Pipeline Bei Yu byu@cse.cuhk.edu.hk CENG3420 L06.1 Spring 2019 Outline q Pipeline Motivations q Pipeline Hazards q Exceptions q Background: Flip-Flop Control Signals CENG3420 L06.2
More informationCS 61C: Great Ideas in Computer Architecture. Pipelining Hazards. Instructor: Senior Lecturer SOE Dan Garcia
CS 61C: Geat Ideas in Compute Achitectue Pipelining Hazads Instucto: Senio Lectue SOE Dan Gacia 1 Geat Idea #4: Paallelism So9wae Paallel Requests Assigned to compute e.g. seach Gacia Paallel Theads Assigned
More informationLecture 8 Introduction to Pipelines Adapated from slides by David Patterson
Lectue 8 Intoduction to Pipelines Adapated fom slides by David Patteson http://www-inst.eecs.bekeley.edu/~cs61c/ * 1 Review (1/3) Datapath is the hadwae that pefoms opeations necessay to execute pogams.
More informationComputer Architecture. Pipelining and Instruction Level Parallelism An Introduction. Outline of This Lecture
Compute Achitectue Pipelining and nstuction Level Paallelism An ntoduction Adapted fom COD2e by Hennessy & Patteson Slide 1 Outline of This Lectue ntoduction to the Concept of Pipelined Pocesso Pipelined
More informationUCB CS61C : Machine Structures
inst.eecs.bekeley.edu/~cs61c UCB CS61C : Machine Stuctues Lectue SOE Dan Gacia Lectue 28 CPU Design : Pipelining to Impove Pefomance 2010-04-05 Stanfod Reseaches have invented a monitoing technique called
More informationCMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1
CMCS 611-101 Advanced Compute Achitectue Lectue 6 Intoduction to Pipelining Septembe 23, 2009 www.csee.umbc.edu/~younis/cmsc611/cmsc611.htm Mohamed Younis CMCS 611, Advanced Compute Achitectue 1 Pevious
More informationLecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1
Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number
More informationLecture Topics ECE 341. Lecture # 12. Control Signals. Control Signals for Datapath. Basic Processing Unit. Pipelining
EE 341 Lectue # 12 Instucto: Zeshan hishti zeshan@ece.pdx.edu Novembe 10, 2014 Potland State Univesity asic Pocessing Unit ontol Signals Hadwied ontol Datapath contol signals Dealing with memoy delay Pipelining
More informationCSE4201. Computer Architecture
CSE 4201 Compute Achitectue Pof. Mokhta Aboelaze Pats of these slides ae taken fom Notes by Pof. David Patteson at UCB Outline MIPS and instuction set Simple pipeline in MIPS Stuctual and data hazads Fowading
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Instruc>on Level Parallelism
Agenda CS 61C: Geat Ideas in Compute Achitectue (Machine Stuctues) Instuc>on Level Paallelism Instuctos: Randy H. Katz David A. PaJeson hjp://inst.eecs.bekeley.edu/~cs61c/fa10 Review Instuc>on Set Design
More informationUser Visible Registers. CPU Structure and Function Ch 11. General CPU Organization (4) Control and Status Registers (5) Register Organisation (4)
PU Stuctue and Function h Geneal Oganisation Registes Instuction ycle Pipelining anch Pediction Inteupts Use Visible Registes Vaies fom one achitectue to anothe Geneal pupose egiste (GPR) ata, addess,
More informationCS 2461: Computer Architecture 1 Program performance and High Performance Processors
Couse Objectives: Whee ae we. CS 2461: Pogam pefomance and High Pefomance Pocessos Instucto: Pof. Bhagi Naahai Bits&bytes: Logic devices HW building blocks Pocesso: ISA, datapath Using building blocks
More informationReview from last lecture
CSE820 Gaduate Compute Achitectue Week 3 Pefomance + Pipeline Review Based on slides by David Patteson Review fom last lectue Tacking and extapolating technology pat of achitect s esponsibility Expect
More informationLecture #22 Pipelining II, Cache I
inst.eecs.bekeley.edu/~cs61c CS61C : Machine Stuctues Lectue #22 Pipelining II, Cache I Wiewold cicuits 2008-7-29 http://www.maa.og/editoial/mathgames/mathgames_05_24_04.html http://www.quinapalus.com/wi-index.html
More informationMIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14
MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK
More informationYou Are Here! Review: Hazards. Agenda. Agenda. Review: Load / Branch Delay Slots 7/28/2011
CS 61C: Geat Ideas in Compute Achitectue (Machine Stuctues) Instuction Level Paallelism: Multiple Instuction Issue Guest Lectue: Justin Hsia Softwae Paallel Requests Assigned to compute e.g., Seach Katz
More informationCS 61C: Great Ideas in Computer Architecture Instruc(on Level Parallelism: Mul(ple Instruc(on Issue
CS 61C: Geat Ideas in Compute Achitectue Instuc(on Level Paallelism: Mul(ple Instuc(on Issue Instuctos: Kste Asanovic, Randy H. Katz hbp://inst.eecs.bekeley.edu/~cs61c/fa12 1 Paallel Requests Assigned
More informationChapter 4 The Processor 1. Chapter 4A. The Processor
Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationChapter 4 The Processor 1. Chapter 4B. The Processor
Chapter 4 The Processor 1 Chapter 4B The Processor Chapter 4 The Processor 2 Control Hazards Branch determines flow of control Fetching next instruction depends on branch outcome Pipeline can t always
More information14:332:331 Pipelined Datapath
14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationThe Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture
The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count
More informationOutline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception
Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add
More informationDepartment of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri
Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many
More informationComputer Architecture. Lecture 6.1: Fundamentals of
CS3350B Computer Architecture Winter 2015 Lecture 6.1: Fundamentals of Instructional Level Parallelism Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and
More informationCENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu
CENG 342 Computer Organization and Design Lecture 6: MIPS Processor - I Bei Yu CEG342 L6. Spring 26 The Processor: Datapath & Control q We're ready to look at an implementation of the MIPS q Simplified
More informationReview: Moore s Law. EECS 252 Graduate Computer Architecture Lecture 2. Review: Joy s Law in ManyCore world. Bell s Law new class per decade
EECS 252 Gaduate Compute Achitectue Lectue 2 ℵ 0 Review of Instuction Sets, Pipelines, and Caches Januay 26 th, 2009 Review Mooe s Law John Kubiatowicz Electical Engineeing and Compute Sciences Univesity
More informationCPE 335 Computer Organization. Basic MIPS Pipelining Part I
CPE 335 Computer Organization Basic MIPS Pipelining Part I Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/courses/cpe335_s08/index.html CPE232 Basic MIPS Pipelining
More informationAny modern computer system will incorporate (at least) two levels of storage:
1 Any moden compute system will incopoate (at least) two levels of stoage: pimay stoage: andom access memoy (RAM) typical capacity 32MB to 1GB cost pe MB $3. typical access time 5ns to 6ns bust tansfe
More informationLecture 9 Pipeline and Cache
Lecture 9 Pipeline and Cache Peng Liu liupeng@zju.edu.cn 1 What makes it easy Pipelining Review all instructions are the same length just a few instruction formats memory operands appear only in loads
More informationCO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19
CO2-3224 Computer Architecture and Programming Languages CAPL Lecture 8 & 9 Dr. Kinga Lipskoch Fall 27 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be
More informationThe Processor: Improving the performance - Control Hazards
The Processor: Improving the performance - Control Hazards Wednesday 14 October 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU Summary
More informationPipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...
CHAPTER 6 1 Pipelining Instruction class Instruction memory ister read ALU Data memory ister write Total (in ps) Load word 200 100 200 200 100 800 Store word 200 100 200 200 700 R-format 200 100 200 100
More informationLecture 7 Pipelining. Peng Liu.
Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt
More informationOutline Marquette University
COEN-4710 Computer Hardware Lecture 4 Processor Part 2: Pipelining (Ch.4) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations from Mike
More informationCOMP2611: Computer Organization. The Pipelined Processor
COMP2611: Computer Organization The 1 2 Background 2 High-Performance Processors 3 Two techniques for designing high-performance processors by exploiting parallelism: Multiprocessing: parallelism among
More informationLECTURE 3: THE PROCESSOR
LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined
More informationComputer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining
Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationCOSC 6385 Computer Architecture - Pipelining
COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage
More informationMIPS An ISA for Pipelining
Pipelining: Basic and Intermediate Concepts Slides by: Muhamed Mudawar CS 282 KAUST Spring 2010 Outline: MIPS An ISA for Pipelining 5 stage pipelining i Structural Hazards Data Hazards & Forwarding Branch
More informationLecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University
Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will
More informationComputer Architecture
Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More informationThomas Polzer Institut für Technische Informatik
Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =
More informationCISC 662 Graduate Computer Architecture Lecture 6 - Hazards
CISC 662 Graduate Computer Architecture Lecture 6 - Hazards Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationOverview of Control. CS 152 Computer Architecture and Engineering Lecture 11. Multicycle Controller Design
S 152 ompute chitectue and Engineeing Lectue 11 Multicycle ontolle Design Oveview of ontol ontol may be designed using one of seveal initial epesentations. The choice of sequence contol, and how logic
More informationComputer Organization and Structure. Bing-Yu Chen National Taiwan University
Computer Organization and Structure Bing-Yu Chen National Taiwan University The Processor Logic Design Conventions Building a Datapath A Simple Implementation Scheme An Overview of Pipelining Pipelined
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationA Memory Efficient Array Architecture for Real-Time Motion Estimation
A Memoy Efficient Aay Achitectue fo Real-Time Motion Estimation Vasily G. Moshnyaga and Keikichi Tamau Depatment of Electonics & Communication, Kyoto Univesity Sakyo-ku, Yoshida-Honmachi, Kyoto 66-1, JAPAN
More informationCS 61C: Great Ideas in Computer Architecture Pipelining and Hazards
CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/sp16 1 Pipelined Execution Representation Time
More informationPipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science
Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and
More informationData Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard
Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:
More informationAccelerating Storage with RDMA Max Gurtovoy Mellanox Technologies
Acceleating Stoage with RDMA Max Gutovoy Mellanox Technologies 2018 Stoage Develope Confeence EMEA. Mellanox Technologies. All Rights Reseved. 1 What is RDMA? Remote Diect Memoy Access - povides the ability
More informationEI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)
EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building
More informationPipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.
Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview
More informationModern Computer Architecture
Modern Computer Architecture Lecture2 Pipelining: Basic and Intermediate Concepts Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each
More informationCPE 335. Basic MIPS Architecture Part II
CPE 335 Computer Organization Basic MIPS Architecture Part II Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/courses/cpe335_s08/index.html CPE232 Basic MIPS Architecture
More informationCENG 3420 Lecture 06: Datapath
CENG 342 Lecture 6: Datapath Bei Yu byu@cse.cuhk.edu.hk CENG342 L6. Spring 27 The Processor: Datapath & Control q We're ready to look at an implementation of the MIPS q Simplified to contain only: memory-reference
More informationCS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST
CS 110 Computer Architecture Pipelining Guest Lecture: Shu Yin http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS61C
More informationIntroduction to Pipelined Datapath
14:332:331 Computer Architecture and Assembly Language Week 12 Introduction to Pipelined Datapath [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 W12.1 Review:
More informationPipelining: Hazards Ver. Jan 14, 2014
POLITECNICO DI MILANO Parallelism in wonderland: are you ready to see how deep the rabbit hole goes? Pipelining: Hazards Ver. Jan 14, 2014 Marco D. Santambrogio: marco.santambrogio@polimi.it Simone Campanoni:
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More informationProcessor (II) - pipelining. Hwansoo Han
Processor (II) - pipelining Hwansoo Han Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 =2.3 Non-stop: 2n/0.5n + 1.5 4 = number
More informationLecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1
Lecture 3 Pipelining Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1 A "Typical" RISC ISA 32-bit fixed format instruction (3 formats) 32 32-bit GPR (R0 contains zero, DP take pair)
More informationDesign a MIPS Processor (2/2)
93-2Digital System Design Design a MIPS Processor (2/2) Lecturer: Chihhao Chao Advisor: Prof. An-Yeu Wu 2005/5/13 Friday ACCESS IC LABORTORY Outline v 6.1 An Overview of Pipelining v 6.2 A Pipelined Datapath
More informationECS 154B Computer Architecture II Spring 2009
ECS 154B Computer Architecture II Spring 2009 Pipelining Datapath and Control 6.2-6.3 Partially adapted from slides by Mary Jane Irwin, Penn State And Kurtis Kredo, UCD Pipelined CPU Break execution into
More informationChapter 4. The Processor
Chapter 4 The Processor 1 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A
More informationEE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes
NAME: STUDENT NUMBER: EE557--FALL 1999 MAKE-UP MIDTERM 1 Closed books, closed notes Q1: /1 Q2: /1 Q3: /1 Q4: /1 Q5: /15 Q6: /1 TOTAL: /65 Grade: /25 1 QUESTION 1(Performance evaluation) 1 points We are
More informationSome material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier
Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science 6 PM 7 8 9 10 11 Midnight Time 30 40 20 30 40 20
More informationChapter 4 (Part II) Sequential Laundry
Chapter 4 (Part II) The Processor Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Sequential Laundry 6 P 7 8 9 10 11 12 1 2 A T a s k O r d e r A B C D 30 30 30 30 30 30 30 30 30 30
More information3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?
CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationECE232: Hardware Organization and Design
ECE232: Hardware Organization and Design Lecture 17: Pipelining Wrapup Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Outline The textbook includes lots of information Focus on
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More informationPipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Pipeline Hazards Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hazards What are hazards? Situations that prevent starting the next instruction
More informationPre-requisites. This is a textbook-based course. Chapter 1. Pipelines, Performance, Caches, and Virtual Memory. January 2009 Paul H J Kelly
332 Advanced Compute Achitectue Chapte 1 Intoduction and eview of Pipelines, Pefomance, Caches, and Vitual Januay 2009 Paul H J Kelly These lectue notes ae patly based on the couse text, Hennessy and Patteson
More information1 Hazards COMP2611 Fall 2015 Pipelined Processor
1 Hazards Dependences in Programs 2 Data dependence Example: lw $1, 200($2) add $3, $4, $1 add can t do ID (i.e., read register $1) until lw updates $1 Control dependence Example: bne $1, $2, target add
More informationELE 655 Microprocessor System Design
ELE 655 Microprocessor System Design Section 2 Instruction Level Parallelism Class 1 Basic Pipeline Notes: Reg shows up two places but actually is the same register file Writes occur on the second half
More informationGetting Started PMW-EX1/PMW-EX3. 1 Rotate the grip with the RELEASE button pressed. Overview. Connecting the Computer and PMW-EX1/EX3
A PMW-EX1/PMW-EX3 Getting Stated Oveview This document descibes how to use the XDCAM EX Vesion Up Tool (heeafte Vesion Up Tool ) to upgade the PMW-EX1 and PMW-EX3 to vesion 1.20 (PMW-EX1) o vesion 1.10
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationCSEE 3827: Fundamentals of Computer Systems
CSEE 3827: Fundamentals of Computer Systems Lecture 21 and 22 April 22 and 27, 2009 martha@cs.columbia.edu Amdahl s Law Be aware when optimizing... T = improved Taffected improvement factor + T unaffected
More informationControl Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.
Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions Stage Instruction Fetch Instruction Decode Execution / Effective addr Memory access Write-back Abbreviation
More informationECE331: Hardware Organization and Design
ECE331: Hardware Organization and Design Lecture 27: Midterm2 review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Midterm 2 Review Midterm will cover Section 1.6: Processor
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationAdvanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017
Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation
More informationPipelining! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar DEIB! 30 November, 2017!
Advanced Topics on Heterogeneous System Architectures Pipelining! Politecnico di Milano! Seminar Room @ DEIB! 30 November, 2017! Antonio R. Miele! Marco D. Santambrogio! Politecnico di Milano! 2 Outline!
More informationLecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation
Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation CSE 564 Computer Architecture Summer 2017 Department of Computer Science and Engineering Yonghong Yan yan@oakland.edu www.secs.oakland.edu/~yan
More information