Computer Architecture: Pipelining and Instruction Level Parallelism, An Introduction
- Barnaby Cook
- 6 years ago
Slide 1: Computer Architecture: Pipelining and Instruction Level Parallelism, An Introduction
Adapted from COD2e by Hennessy & Patterson. Chapter 6 - Pipelining Basics.

Slide 2: Outline of This Lecture
- Introduction to the Concept of Pipelined Processor
- Pipelined Datapath and Pipelined Control
- Pipeline Example: Instructions Interaction
- Pipeline Hazards
  - Forwarding
  - Stalls
- Introduction to Instruction Level Parallelism
  - Superscalar, VLIW
  - Out-of-order execution
  - Branch Prediction
- Future
Slide 3: The Five Stages of Load
- IF: Instruction Fetch. Fetch the instruction from the Instruction Memory.
- RF/ID: Registers Fetch and Instruction Decode.
- EX: Calculate the memory address.
- MEM: Read the data from the Data Memory.
- WB: Write the data back to the register file.

Cycle     1      2      3      4      5
Load      IF     RF/ID  EX     MEM    WB

Slide 4: Key Ideas Behind Pipelining
- Analogy: grading the midterm exams
  - 6 problems, six people grading the exam
  - Each person grades ONE problem
  - Pass the exam to the next person as soon as one finishes her part
  - Assume each problem takes 0.15 hour to grade
  - Each individual exam still takes 0.9 hours to grade
  - But with 6 people, all exams can be graded much quicker: 100 exams take about (100 + 5) x 0.15 = 15.75 hours with six pipelined graders, vs. 100 x 0.9 = 90 hours for one grader working alone
- The load instruction has 5 stages:
  - Five independent functional units work on each stage
  - Each functional unit is used only once
  - Another load can start as soon as the 1st finishes its IF stage
  - Each load still takes five cycles to complete
  - The throughput, however, is much higher
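The grading analogy can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this example, and it assumes every stage takes the same time (0.15 hour per problem, the figure given on the slide) and that a new exam enters the pipeline as soon as the first grader is free.

```python
def sequential_time(n_items, n_stages, stage_time):
    """One worker performs every stage of every item, one item after another."""
    return n_items * n_stages * stage_time


def pipelined_time(n_items, n_stages, stage_time):
    """n_stages workers, one per stage.

    The first item needs n_stages steps to drain through the pipe;
    every later item finishes one step after the item in front of it.
    """
    return (n_stages + n_items - 1) * stage_time


if __name__ == "__main__":
    alone = sequential_time(100, 6, 0.15)
    team = pipelined_time(100, 6, 0.15)
    print(f"one grader: {alone:.2f} h, six pipelined graders: {team:.2f} h")
    print(f"speedup: {alone / team:.2f}x (upper bound is the 6 stages)")
```

The latency of a single exam is unchanged (six steps either way); only the throughput improves, which is the point the slide makes about the load instruction.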
Slide 5: Pipelining the Load Instruction
- The five independent functional units in the pipeline are:
  - Instruction Memory for the IF stage
  - Register File's read ports for the RF/ID stage
  - ALU for the EX stage
  - Data Memory for the MEM stage
  - Register File's write port (bus W) for the WB stage
- 1 instruction enters the pipeline every cycle
- 1 instruction comes out of the pipeline (completes) every cycle
- Effective Cycles per Instruction (CPI) is 1

Cycle     1      2      3      4      5      6      7
1st lw    IF     RF/ID  EX     MEM    WB
2nd lw           IF     RF/ID  EX     MEM    WB
3rd lw                  IF     RF/ID  EX     MEM    WB

Slide 6: Four Stages of R-type
- IF: Instruction Fetch. Fetch the instruction from the Instruction Memory.
- RF/ID: Registers Fetch and Instruction Decode.
- EX: ALU operates on the two register operands.
- WB: Write the ALU output back to the register file.

Cycle     1      2      3      4
R-type    IF     RF/ID  EX     WB
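The load timing diagram on Slide 5 follows a simple rule: instruction i enters the pipeline in cycle i and occupies one stage per cycle. A minimal sketch of that rule is shown below; the helper names are hypothetical, and it assumes an ideal pipeline with no stalls.

```python
STAGES = ["IF", "RF/ID", "EX", "MEM", "WB"]


def pipeline_chart(n_loads):
    """Row i lists the stage the i-th load occupies in each cycle ('' = not in the pipe)."""
    total_cycles = len(STAGES) + n_loads - 1
    chart = []
    for i in range(n_loads):
        row = [""] * i + STAGES + [""] * (total_cycles - i - len(STAGES))
        chart.append(row)
    return chart


def cycles_per_instruction(n_instructions, depth=len(STAGES)):
    """Average cycles per instruction, including the cycles needed to fill the pipe."""
    return (depth + n_instructions - 1) / n_instructions


if __name__ == "__main__":
    for row in pipeline_chart(3):
        print(" ".join(f"{stage:6}" for stage in row))
    print(cycles_per_instruction(3), cycles_per_instruction(1000))
```

For three loads the chart spans seven cycles, matching the slide. The average CPI only approaches 1 as the instruction count grows, because the fill time of the pipe is amortized over more instructions.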
Slide 7: Pipelining the R-type and Load Instruction
- We have a problem: two instructions try to write to the register file at the same time!

Cycle     1      2      3      4      5      6      7      8      9
R-type    IF     RF/ID  EX     WB
R-type           IF     RF/ID  EX     WB
Load                    IF     RF/ID  EX     MEM    WB
R-type                         IF     RF/ID  EX     WB
R-type                                IF     RF/ID  EX     WB

Oops! We have a problem: the Load and the R-type issued right after it both reach WB in Cycle 7.

Slide 8: Important Observation
- A functional unit can be used only once per instruction
- Each functional unit must be used at the same stage for all instructions:
  - Load uses the Register File's Write Port during its 5th stage (IF RF/ID EX MEM WB)
  - R-type uses the Register File's Write Port during its 4th stage (IF RF/ID EX WB)
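The write-port collision can be found mechanically by recording the cycle in which each instruction reaches WB. The sketch below is an illustration under simple assumptions (one instruction issued per cycle, a single register-file write port); the names `write_port_conflicts` and `R_TYPE_PADDED` are invented for this example.

```python
R_TYPE = ["IF", "RF/ID", "EX", "WB"]
LOAD = ["IF", "RF/ID", "EX", "MEM", "WB"]
# R-type with a do-nothing MEM stage, so that WB is always the 5th stage.
R_TYPE_PADDED = ["IF", "RF/ID", "EX", "MEM", "WB"]


def write_port_conflicts(program):
    """program: list of stage lists; instruction k is issued in cycle k (1-based).

    Returns {cycle: [issue numbers]} for every cycle in which more than
    one instruction is in WB, i.e. needs the single write port.
    """
    users = {}
    for issue_cycle, stages in enumerate(program, start=1):
        wb_cycle = issue_cycle + stages.index("WB")
        users.setdefault(wb_cycle, []).append(issue_cycle)
    return {cycle: who for cycle, who in users.items() if len(who) > 1}


if __name__ == "__main__":
    mixed = [R_TYPE, R_TYPE, LOAD, R_TYPE, R_TYPE]
    print(write_port_conflicts(mixed))
    padded = [R_TYPE_PADDED, R_TYPE_PADDED, LOAD, R_TYPE_PADDED, R_TYPE_PADDED]
    print(write_port_conflicts(padded))
```

With the mixed program the third instruction (the load) and the fourth (an R-type) collide in cycle 7. Once every instruction writes in its 5th stage, no two instructions issued in different cycles can share a WB cycle.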
Slide 9: Solution: Delay WB a Cycle
- Delay R-type's register write by one cycle:
  - R-type instructions also use the Reg File's write port at Stage 5
  - The MEM stage is a NOOP stage: nothing is being done

Cycle     1      2      3      4      5      6      7      8      9
R-type    IF     RF/ID  EX     MEM    WB
R-type           IF     RF/ID  EX     MEM    WB
Load                    IF     RF/ID  EX     MEM    WB
R-type                         IF     RF/ID  EX     MEM    WB
R-type                                IF     RF/ID  EX     MEM    WB

Slide 10: A Pipelined Datapath
[Figure: pipelined datapath. The IF, RF/ID, EX, MEM, and WB stages are separated by the IF/ID, ID/Ex, Ex/MEM, and MEM/WB pipeline registers. Control signals shown: RegWr, ExtOp, ALUOp, Branch, RegDst, ALUSrc, MemWr, MemtoReg.]
Slide 11: How About Control Signals?
- Control Signals at Stage N = Func(Instr. at Stage N), where N = EX, MEM, or WB
- Example: Control Signals at EX Stage = Func(Load's EX)

[Figure: pipelined datapath with a Load in the EX stage. EX-stage control values: ALUOp=Add, ExtOp=1, RegDst=0, ALUSrc=1. The Ex/MEM register receives the Load's address.]

Slide 12: Pipeline Control
- The Main Control generates the control signals during RF/ID
  - Control signals for EX (ExtOp, ALUSrc, ...) are used 1 cycle later
  - Control signals for MEM (MemWr, Branch) are used 2 cycles later
  - Control signals for WB (MemtoReg, RegWr) are used 3 cycles later

Signals carried by each pipeline register:
- IF/ID Register: feeds the Main Control
- ID/Ex Register: ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, RegWr
- Ex/MEM Register: MemWr, Branch, MemtoReg, RegWr
- MEM/WB Register: MemtoReg, RegWr
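The way control signals travel with an instruction can be sketched as a chain of dictionaries, one per pipeline register, where each register keeps only the signals that later stages still need. This is an illustrative model, not the slide's hardware: `main_control` and `clock` are hypothetical names, the decode table covers only `lw`, and the MEM and WB values for `lw` are the usual textbook ones rather than values stated on the slides.

```python
EX_SIGNALS = ("ExtOp", "ALUSrc", "ALUOp", "RegDst")
MEM_SIGNALS = ("MemWr", "Branch")
WB_SIGNALS = ("MemtoReg", "RegWr")


def main_control(opcode):
    """Hypothetical decode table; the EX values for lw follow Slide 11."""
    table = {
        "lw": dict(ExtOp=1, ALUSrc=1, ALUOp="add", RegDst=0,
                   MemWr=0, Branch=0, MemtoReg=1, RegWr=1),
    }
    return table[opcode]


def clock(pipe, opcode):
    """Return the pipeline registers after one clock edge.

    pipe maps a register name to its control signals (None = bubble).
    opcode is the instruction being decoded in RF/ID this cycle, or None.
    """
    id_ex, ex_mem = pipe["ID/EX"], pipe["EX/MEM"]
    return {
        "ID/EX": main_control(opcode) if opcode else None,
        # EX has consumed its signals; pass on only what MEM and WB need.
        "EX/MEM": {k: id_ex[k] for k in MEM_SIGNALS + WB_SIGNALS} if id_ex else None,
        # MEM has consumed its signals; pass on only what WB needs.
        "MEM/WB": {k: ex_mem[k] for k in WB_SIGNALS} if ex_mem else None,
    }


if __name__ == "__main__":
    pipe = {"ID/EX": None, "EX/MEM": None, "MEM/WB": None}
    for op in ["lw", None, None]:
        pipe = clock(pipe, op)
        print(pipe)
```

After the third clock edge only MemtoReg and RegWr remain, sitting in the MEM/WB register, which matches the "used 3 cycles later" statement for the WB signals.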
Slide 13: Single Cycle, Multi-Cycle, Pipelined
[Figure: timing comparison of the three implementations.]
- Single Cycle Implementation: Load, then Store, each in one long clock cycle; part of the Store's cycle is wasted
- Multiple Cycle Implementation: Load takes IF Reg EX MEM WB (5 cycles), Store takes IF Reg EX MEM (4 cycles), then the next instruction starts its IF
- Pipeline Implementation: Load, Store, and the following instruction each go through IF Reg EX MEM WB, overlapped and offset by one cycle

Slide 14: Hazards: Challenge to Pipelining
- Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
  - Structural hazards: HW cannot support this combination of instructions. The earlier case of Load and R-type is like a structural hazard, but normally a structural hazard cannot be fixed by retiming an instruction.
  - Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
  - Control hazards: pipelining of branches & other instructions
- A common solution is to stall the later part of the pipeline until the hazard is resolved
Slide 15: Data Hazard on r1
- Dependencies backwards in time are hazards

Instruction order:
  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11

[Figure: pipeline diagram, time (clock cycles) across, instruction order down; stages IF, ID/RF, EX, MEM, WB.]

Slide 16: HW Stalls to Resolve Hazard
- Dependencies backwards in time are hazards
- Eliminate reverse time by a stall

Instruction order:
  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11

[Figure: the same pipeline diagram with bubbles inserted so that sub reads r1 only after add has written it.]
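The number of bubbles needed without forwarding can be worked out by tracking when each register becomes readable. The sketch below is a simplified model, not the slide's hardware: `raw_stalls` is a made-up name, instructions are written as (destination, source, source) register numbers, and the `split_cycle_regfile` flag is an assumption that selects whether a register written in WB can be read in ID during that same cycle.

```python
def raw_stalls(program, split_cycle_regfile=True):
    """Bubbles inserted before each instruction in a 5-stage pipeline with NO forwarding.

    program: list of (dest, src1, src2) register numbers.
    split_cycle_regfile: if True, a value written in WB can be read by an
    instruction in ID in the same cycle; if False, only in the next cycle.
    """
    ready = {}      # register -> first cycle in which ID may read the new value
    stalls = []
    prev_id = 1     # so that the first instruction decodes in cycle 2
    for dest, *srcs in program:
        earliest = prev_id + 1
        id_cycle = max([earliest] + [ready.get(r, 0) for r in srcs])
        stalls.append(id_cycle - earliest)
        wb_cycle = id_cycle + 3            # ID -> EX -> MEM -> WB
        ready[dest] = wb_cycle if split_cycle_regfile else wb_cycle + 1
        prev_id = id_cycle
    return stalls


if __name__ == "__main__":
    program = [
        (1, 2, 3),     # add r1,r2,r3
        (4, 1, 3),     # sub r4,r1,r3
        (6, 1, 7),     # and r6,r1,r7
        (8, 1, 9),     # or  r8,r1,r9
        (10, 1, 11),   # xor r10,r1,r11
    ]
    print(raw_stalls(program))
    print(raw_stalls(program, split_cycle_regfile=False))
```

Only `sub` has to wait: the instructions behind it are delayed along with it, so by the time they decode, r1 has already been written. Under this model the wait is two bubbles with a split-cycle register file and three without.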
Slide 17: Insight: Data is available!
- Pipeline registers already contain the needed data
- Forward the data to the appropriate unit

Instruction order:
  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11

[Figure: pipeline diagram, time (clock cycles) across, instruction order down; stages IF, ID/RF, EX, MEM, WB.]

Slide 18: HW for Forwarding (Bypassing)
- Increase multiplexors to add paths from the pipeline registers
- Assumes a register read during a write gets the new value (otherwise more results have to be forwarded)
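The multiplexor selection behind forwarding can be sketched as a priority check: take the newest matching result first. This is a simplified illustration with invented names (`forward_source`, `forwarding_plan`); it assumes every instruction in the program is an ALU operation that writes its destination, that register 0 is the hardwired zero register as in MIPS, and that the register file returns the new value when read and written in the same cycle.

```python
def forward_source(src_reg, ex_mem_dest, mem_wb_dest):
    """Where the EX stage should take operand src_reg from."""
    if src_reg != 0 and src_reg == ex_mem_dest:
        return "EX/MEM"      # produced by the instruction one ahead
    if src_reg != 0 and src_reg == mem_wb_dest:
        return "MEM/WB"      # produced by the instruction two ahead
    return "REGFILE"         # already written back (or never in flight)


def forwarding_plan(program):
    """program: list of (dest, src1, src2). Returns the operand sources per instruction."""
    plan = []
    for i, (_dest, *srcs) in enumerate(program):
        ex_mem_dest = program[i - 1][0] if i >= 1 else None
        mem_wb_dest = program[i - 2][0] if i >= 2 else None
        plan.append([forward_source(s, ex_mem_dest, mem_wb_dest) for s in srcs])
    return plan


if __name__ == "__main__":
    program = [
        (1, 2, 3),     # add r1,r2,r3
        (4, 1, 3),     # sub r4,r1,r3
        (6, 1, 7),     # and r6,r1,r7
        (8, 1, 9),     # or  r8,r1,r9
        (10, 1, 11),   # xor r10,r1,r11
    ]
    for instr, sources in zip(program, forwarding_plan(program)):
        print(instr, sources)
```

In this model `sub` takes r1 from the EX/MEM register and `and` takes it from MEM/WB, while `or` and `xor` can read the register file, so no bubbles are needed for this sequence.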
Slide 19: Forwarding Cannot Hide All Hazards

Instruction order:
  lw  r1, 0(r2)
  sub r4,r1,r6
  and r6,r1,r7
  or  r8,r1,r9

[Figure: pipeline diagram, time (clock cycles) across, instruction order down; stages IF, ID/RF, EX, MEM, WB.]

Slide 20: Option: HW Stalls to Resolve Hazard
- "Interlock": checks for the hazard & stalls

Instruction order:
  lw  r1, 0(r2)
  stall
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9

[Figure: the same pipeline diagram with a bubble inserted between lw and sub.]
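The interlock's check is small: stall when the instruction directly behind a load reads the load's destination register. The sketch below illustrates that check under the assumption of full forwarding everywhere else; `load_use_stalls` and the tuple encoding of instructions are invented for this example.

```python
def load_use_stalls(program):
    """Bubbles inserted before each instruction, assuming full forwarding.

    program: list of (op, dest, srcs) with srcs a tuple of register numbers.
    A load's data is available only after MEM, so the instruction right
    behind it needs one bubble if it reads the loaded register.
    """
    stalls = []
    prev = None
    for op, dest, srcs in program:
        hazard = prev is not None and prev[0] == "lw" and prev[1] in srcs
        stalls.append(1 if hazard else 0)
        prev = (op, dest, srcs)
    return stalls


if __name__ == "__main__":
    program = [
        ("lw", 1, (2,)),      # lw  r1, 0(r2)
        ("sub", 4, (1, 3)),   # sub r4,r1,r3
        ("and", 6, (1, 7)),   # and r6,r1,r7
        ("or", 8, (1, 9)),    # or  r8,r1,r9
    ]
    print(load_use_stalls(program))

    # Same code with a hypothetical unrelated instruction scheduled after the load.
    scheduled = [program[0], ("add", 12, (10, 11))] + program[1:]
    print(load_use_stalls(scheduled))
```

The second call shows why scheduling helps: placing any instruction that does not read r1 directly after the load removes the bubble.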
Slide 21: Option: SW Resolves Hazard
- SW inserts independent instructions
- Worst case: performance is no better or worse than stalling

Instruction order:
  lw  r1, 0(r2)
  (unrelated instruction)
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9

[Figure: pipeline diagram, time (clock cycles) across, instruction order down; stages IF, ID/RF, EX, MEM, WB.]

Slide 22: Control Hazard on Branches
Slide 23: Hazards on Branches

Instruction order:
     beq r1,r2,L
     sub r4,r1,r3
     and r6,r2,r7
     or  r8,r7,r9
  L: add r1,r2,r1

[Figure: pipeline diagram, time (clock cycles) across; stages IF, ID/RF, EX, MEM, WB.]

- Stall for two cycles on every branch!

Slide 24: CPI Impact: Branch Stall Impact
- If CPI = 1, 30% branch, stall 2 cycles => new CPI = 1.6!
- Reducing the branch penalty
  - MIPS branch is already more aggressive than most
  - Limited eq/neq allows us to determine the branch condition early (after EX), rather than late (e.g., after MEM)
- Doing better
  - Use a separate comparator rather than the ALU, and move the branch decision to RF (hard!!!)
  - Reduces the penalty to 1 cycle
- Going further, a variety of techniques:
  - Separating branch and destination
  - Separating branch condition and branch decision
  - Hardware prediction of branches
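The CPI figure on Slide 24 comes from adding the average stall cycles per instruction to the base CPI. A minimal sketch of that arithmetic follows; the function name is made up, and it assumes every branch pays the full stall penalty and that branches are the only source of stalls.

```python
def cpi_with_branch_stalls(base_cpi, branch_fraction, stall_cycles):
    """Effective CPI when every branch stalls the pipeline for stall_cycles."""
    return base_cpi + branch_fraction * stall_cycles


if __name__ == "__main__":
    two_cycle = cpi_with_branch_stalls(1.0, 0.30, 2)
    one_cycle = cpi_with_branch_stalls(1.0, 0.30, 1)
    print(f"2-cycle penalty: CPI = {two_cycle:.2f}")
    print(f"1-cycle penalty: CPI = {one_cycle:.2f}")
```

Under the same assumptions, moving the branch decision earlier so that the penalty drops to one cycle lowers the effective CPI from 1.6 to 1.3.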
Slide 25: When Is Pipelining Hard?
- Interrupts: 5 instructions executing in a 5-stage pipeline
  - How to stop the pipeline?
  - Restart?
  - Who caused the interrupt?

Stage   Problem interrupts occurring
IF      Page fault on instruction fetch; misaligned memory access; memory-protection violation
ID      Undefined or illegal opcode
EX      Arithmetic interrupt
MEM     Page fault on data fetch; misaligned memory access; memory-protection violation

- Load with data page fault, Add with instruction page fault?
- Solution 1: interrupt vector/instruction, restart everything incomplete

Slide 26: First Generation RISC Pipelines
- All instructions: 1 pipeline order ("static schedule")
- Register write in last stage + reads performed in first stage after issue
  - Simplify/eliminate hazards
- Memory access in stage 4
  - Avoid all memory hazards
- Control hazards use delayed branch (with fast path)
- RAW hazards use bypass, except on load results
  - Load resolved by delayed load or stall
- Good pipeline performance at little cost/complexity
Slide 27: Summary of Pipelining Basics
- Speedup <= Pipeline Depth (equal only in the ideal case with no stalls)
- Hazards limit performance on computers:
  - Structural: need more HW resources
  - Data: need forwarding, compiler scheduling
  - Control: early evaluation & PC, delayed branch, prediction
- Increasing the length of the pipe increases the impact of hazards, since pipelining helps instruction bandwidth, not latency
- Compilers can reduce the cost of data & control hazards
  - Load delay slots
  - Branch delay slots
- Exceptions (also FP, ISA) make pipelining harder

Slide 28: Advanced Pipelining
- Pipelining exploits parallelism among instructions by overlapping them
  - Called Instruction Level Parallelism (ILP)
- Limited by a variety of things:
  - Parallelism in the program
  - Compiler technology in exposing parallelism
  - Functional unit capability: how many overlapping instructions
  - Ability of hardware to find instructions to run in parallel
- Exploiting ILP is a hot topic in processor design:
  - Lots of different approaches
  - Multiple instructions/cycle
  - Compiler vs. HW for scheduling instructions
  - Both architecture approaches and compiler approaches
Slide 29: Exploiting Available ILP

Technique      Description                                              HW Limitation
Pipelining     Overlapped IF D Ex M W stages                            Issue rate, FU stalls, FU depth
Super-scalar   Issue multiple scalar instructions per cycle             Hazard resolution
VLIW           Each instruction specifies multiple scalar operations    Packing

[Figure: IF D Ex M W timing diagrams for each technique.]

Slide 30: Easy Superscalar
[Figure: block diagram with I-Cache, Int Reg, Inst Issue and Bypass, FP Reg, Int Unit, Load / Store Unit, FP Add, FP Mul, D-Cache.]
- Issue integer and FP operations in parallel!
  - Potential hazards?
  - Expected speedup?
  - What combinations of instructions make sense?
Slide 31: Issuing Multiple Instructions/Cycle
- Superscalar: 2 instructions, 1 FP & 1 anything else
  - Fetch 64 bits/clock cycle; Int on left, FP on right
  - Can only issue the 2nd instruction if the 1st instruction issues
  - More ports for FP registers to do FP load & FP op in a pair

Type               Pipe stages
Int. instruction   IF  ID  EX  MEM WB
FP instruction     IF  ID  EX  MEM WB
Int. instruction       IF  ID  EX  MEM WB
FP instruction         IF  ID  EX  MEM WB
Int. instruction           IF  ID  EX  MEM WB
FP instruction             IF  ID  EX  MEM WB

- 1 cycle load delay expands to 3 instructions in SS
  - The instruction in the right half can't use the result, nor can either instruction in the next slot

Slide 32: Dynamic Branch Prediction
- Predict the direction of branches based on past behavior
  - Keep a cache of branch behavior, look up the prediction
- Performance = f(accuracy, cost of misprediction)
- Branch prediction buffer: lower bits of the PC address index a table of 1-bit values
  - Says whether or not the branch was taken last time
  - Evaluate the actual branch condition; if the prediction is incorrect:
    - recover by flushing the pipeline, restarting fetch
    - reset the prediction
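A 1-bit branch prediction buffer fits in a few lines. The sketch below is illustrative: the class and helper names are invented, the table size is arbitrary, and the index assumes 4-byte aligned instructions so the two lowest PC bits are dropped before taking the low-order bits.

```python
class OneBitPredictor:
    """Table of 1-bit entries indexed by the low-order bits of the branch PC."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.table = [False] * (1 << index_bits)   # False = predict not taken

    def _index(self, pc):
        return (pc >> 2) & self.mask   # assumption: 4-byte aligned instructions

    def predict(self, pc):
        return self.table[self._index(pc)]

    def update(self, pc, taken):
        # Remember only what the branch did last time.
        self.table[self._index(pc)] = taken


def count_mispredictions(predictor, pc, outcomes):
    """Run one branch through the predictor and count wrong guesses."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1      # the pipeline would be flushed here
        predictor.update(pc, taken)
    return wrong


if __name__ == "__main__":
    # A loop branch: taken 9 times, then falls through; the loop runs twice.
    outcomes = ([True] * 9 + [False]) * 2
    print(count_mispredictions(OneBitPredictor(), 0x400, outcomes))
```

A loop-closing branch shows the known weakness of a single bit: it mispredicts on the loop exit and again on the first iteration of the next run. Two branches whose PCs share the same low-order bits also share an entry and can disturb each other's predictions.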
Slide 33: Speculative Superscalar Execution
- Get all available parallelism
  - across branches
  - in the face of cache misses
  - limited only by data dependences
- Goal: resources and available bandwidth are the only HW limit
- Branch prediction: execute instructions speculatively
- Hazard detection and aggressive resolution
- Out-of-order execution (dynamic scheduling)
- In-order completion
  - exception handling easier
  - handles incorrect speculation

[Figure: Instruction Fetch, Decode, Instruction Window, Execution Units.]
- Look ahead and prefetch instructions
- Issue multiple instructions to the Execution Units when inputs are available

Slide 34: Variety of Modern Microprocessors

Processor     Instruction Completion Rate   Scheduling of pipeline     Branch prediction
PowerPC                                     Dynamic, nonspeculative    HW
MIPS R                                      Dynamic, speculative       HW
Pentium       4                             Dynamic, nonspeculative    HW
UltraSPARC    4                             Static                     HW
Merced        ?                             Static?                    Static?
Slide 35: Limits to Multi-Issue Machines
- Inherent limitations of ILP
  - 1 branch in 5 instructions => how to keep a 5-way VLIW busy?
  - Latencies of units => many operations must be scheduled
  - Need about Pipeline Depth x No. Functional Units of independent operations
- Difficulties in building HW
  - Duplicate FUs to get parallel execution
  - Increase ports to the Register File (3 x integer/FP rate)
  - Increase ports to memory
  - Decoding challenge and impact on clock rate, pipeline depth
- Limitations specific to either SS or VLIW implementation
  - Decode issue in SS
  - VLIW code size: unroll loops + wasted fields in VLIW
  - VLIW lock step => 1 hazard & all instructions stall
  - VLIW & binary compatibility

Slide 36: Summary
- Instruction Level Parallelism in SW or HW
- Loop level parallelism is easiest to see
- SW dependencies/compiler sophistication determine if the compiler can unroll loops
- SW scheduling
- HW scheduling
- Branch prediction
- Superscalar and VLIW
  - CPI < 1
  - Dynamic issue vs. static issue
  - More instructions issued per clock, larger penalty of hazards
- Future? Stay tuned
Slide 37: Single Memory => Structural Hazard
Instruction order: Load, Instr 1, Instr 2, Instr 3, Instr 4
[Figure: pipeline diagram, time (clock cycles) across, instruction order down. Every instruction uses the single memory (MEM) for its fetch, and Load uses it again for its data access.]

Slide 38: Stall to Resolve Structural Hazard
Instruction order: Load, Instr 1, Instr 2, Instr 3 (stall), Instr 4
[Figure: the same pipeline diagram with a bubble inserted so that the fetch of Instr 3 waits for the memory.]
Slide 39: Duplicate to Resolve Hazard
- Separate Instruction Cache (Im) & Data Cache (Dm)
Instruction order: Load, Instr 1, Instr 2, Instr 3, Instr 4
[Figure: pipeline diagram, time (clock cycles) across, instruction order down; each instruction uses Im, Reg, Dm, Reg.]
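The single-memory structural hazard and its two fixes (stall, or duplicate the memory) can be compared with a small fetch scheduler. This is a simplified sketch with invented names: it assumes one memory port, that data accesses win over instruction fetches, and that a stall delays only the fetch of the waiting instruction and those behind it.

```python
def fetch_cycles(ops, single_memory=True):
    """Cycle in which each instruction is fetched.

    ops: list of opcode strings; 'lw' and 'sw' access data memory in MEM.
    single_memory: True if instruction fetch and data access share one port.
    """
    data_access = set()   # cycles in which a load/store occupies the memory
    cycle = 1
    fetched = []
    for op in ops:
        while single_memory and cycle in data_access:
            cycle += 1                     # bubble: the fetch must wait
        fetched.append(cycle)
        if op in ("lw", "sw"):
            data_access.add(cycle + 3)     # IF -> ID -> EX -> MEM
        cycle += 1
    return fetched


if __name__ == "__main__":
    ops = ["lw", "instr1", "instr2", "instr3", "instr4"]
    print(fetch_cycles(ops, single_memory=True))
    print(fetch_cycles(ops, single_memory=False))
```

With a shared memory the fourth instruction (Instr 3) is fetched one cycle late, because its fetch would coincide with the load's data access. With separate instruction and data memories every instruction is fetched in consecutive cycles.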
The Processor: Improving Performance Data Hazards
The Pocesso: Impoving Pefomance Data Hazads Monday 12 Octobe 15 Many slides adapted fom: and Design, Patteson & Hennessy 5th Edition, 2014, MK and fom Pof. May Jane Iwin, PSU Summay Pevious Class Pipeline
More informationCOEN-4730 Computer Architecture Lecture 2 Review of Instruction Sets and Pipelines
1 COEN-4730 Compute Achitectue Lectue 2 Review of nstuction Sets and Pipelines Cistinel Ababei Dept. of Electical and Compute Engineeing Maquette Univesity Cedits: Slides adapted fom pesentations of Sudeep
More informationCOSC 6385 Computer Architecture. - Pipelining
COSC 6385 Compute Achitectue - Pipelining Sping 2012 Some of the slides ae based on a lectue by David Culle, Pipelining Pipelining is an implementation technique wheeby multiple instuctions ae ovelapped
More informationIntroduction To Pipelining. Chapter Pipelining1 1
Intoduction To Pipelining Chapte 6.1 - Pipelining1 1 Mooe s Law Mooe s Law says that the numbe of pocessos on a chip doubles about evey 18 months. Given the data on the following two slides, is this tue?
More informationLecture 8 Introduction to Pipelines Adapated from slides by David Patterson
Lectue 8 Intoduction to Pipelines Adapated fom slides by David Patteson http://www-inst.eecs.bekeley.edu/~cs61c/ * 1 Review (1/3) Datapath is the hadwae that pefoms opeations necessay to execute pogams.
More informationCISC 662 Graduate Computer Architecture Lecture 6 - Hazards
CISC 662 Gaduate Compute Achitectue Lectue 6 - Hazads Michela Taufe http://www.cis.udel.edu/~taufe/teaching/cis662f07 Powepoint Lectue Notes fom John Hennessy and David Patteson s: Compute Achitectue,
More informationComputer Science 141 Computing Hardware
Compute Science 141 Computing Hadwae Fall 2006 Havad Univesity Instucto: Pof. David Books dbooks@eecs.havad.edu [MIPS Pipeline Slides adapted fom Dave Patteson s UCB CS152 slides and May Jane Iwin s CSE331/431
More informationChapter 4 (Part III) The Processor: Datapath and Control (Pipeline Hazards)
Chapte 4 (Pat III) The Pocesso: Datapath and Contol (Pipeline Hazads) 陳瑞奇 (J.C. Chen) 亞洲大學資訊工程學系 Adapted fom class notes by Pof. M.J. Iwin, PSU and Pof. D. Patteson, UCB 1 吃感冒藥副作用怎麼辦? http://big5.sznews.com/health/images/attachement/jpg/site3/20120319/001558d90b3310d0c1683e.jpg
More informationAdministrivia. CMSC 411 Computer Systems Architecture Lecture 5. Data Hazard Even with Forwarding Figure A.9, Page A-20
Administivia CMSC 411 Compute Systems Achitectue Lectue 5 Basic Pipelining (cont.) Alan Sussman als@cs.umd.edu as@csu dedu Homewok poblems fo Unit 1 due today Homewok poblems fo Unit 3 posted soon CMSC
More informationECE331: Hardware Organization and Design
ECE331: Hadwae Oganization and Design Lectue 16: Pipelining Adapted fom Compute Oganization and Design, Patteson & Hennessy, UCB Last time: single cycle data path op System clock affects pimaily the Pogam
More informationUCB CS61C : Machine Structures
inst.eecs.bekeley.edu/~cs61c UCB CS61C : Machine Stuctues Lectue SOE Dan Gacia Lectue 28 CPU Design : Pipelining to Impove Pefomance 2010-04-05 Stanfod Reseaches have invented a monitoing technique called
More informationCS 61C: Great Ideas in Computer Architecture. Pipelining Hazards. Instructor: Senior Lecturer SOE Dan Garcia
CS 61C: Geat Ideas in Compute Achitectue Pipelining Hazads Instucto: Senio Lectue SOE Dan Gacia 1 Geat Idea #4: Paallelism So9wae Paallel Requests Assigned to compute e.g. seach Gacia Paallel Theads Assigned
More informationCENG 3420 Computer Organization and Design. Lecture 07: MIPS Processor - II. Bei Yu
CENG 3420 Compute Oganization and Design Lectue 07: MIPS Pocesso - II Bei Yu CEG3420 L07.1 Sping 2016 Review: Instuction Citical Paths q Calculate cycle time assuming negligible delays (fo muxes, contol
More informationCMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1
CMCS 611-101 Advanced Compute Achitectue Lectue 6 Intoduction to Pipelining Septembe 23, 2009 www.csee.umbc.edu/~younis/cmsc611/cmsc611.htm Mohamed Younis CMCS 611, Advanced Compute Achitectue 1 Pevious
More informationCENG 3420 Lecture 07: Pipeline
CENG 3420 Lectue 07: Pipeline Bei Yu byu@cse.cuhk.edu.hk CENG3420 L07.1 Sping 2017 Outline q Review: Flip-Flop Contol Signals q Pipeline Motivations q Pipeline Hazads q Exceptions CENG3420 L07.2 Sping
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Instruc>on Level Parallelism
Agenda CS 61C: Geat Ideas in Compute Achitectue (Machine Stuctues) Instuc>on Level Paallelism Instuctos: Randy H. Katz David A. PaJeson hjp://inst.eecs.bekeley.edu/~cs61c/fa10 Review Instuc>on Set Design
More informationLecture #22 Pipelining II, Cache I
inst.eecs.bekeley.edu/~cs61c CS61C : Machine Stuctues Lectue #22 Pipelining II, Cache I Wiewold cicuits 2008-7-29 http://www.maa.og/editoial/mathgames/mathgames_05_24_04.html http://www.quinapalus.com/wi-index.html
More informationCSE4201. Computer Architecture
CSE 4201 Compute Achitectue Pof. Mokhta Aboelaze Pats of these slides ae taken fom Notes by Pof. David Patteson at UCB Outline MIPS and instuction set Simple pipeline in MIPS Stuctual and data hazads Fowading
More informationYou Are Here! Review: Hazards. Agenda. Agenda. Review: Load / Branch Delay Slots 7/28/2011
CS 61C: Geat Ideas in Compute Achitectue (Machine Stuctues) Instuction Level Paallelism: Multiple Instuction Issue Guest Lectue: Justin Hsia Softwae Paallel Requests Assigned to compute e.g., Seach Katz
More informationUser Visible Registers. CPU Structure and Function Ch 11. General CPU Organization (4) Control and Status Registers (5) Register Organisation (4)
PU Stuctue and Function h Geneal Oganisation Registes Instuction ycle Pipelining anch Pediction Inteupts Use Visible Registes Vaies fom one achitectue to anothe Geneal pupose egiste (GPR) ata, addess,
More informationCS 61C: Great Ideas in Computer Architecture Instruc(on Level Parallelism: Mul(ple Instruc(on Issue
CS 61C: Geat Ideas in Compute Achitectue Instuc(on Level Paallelism: Mul(ple Instuc(on Issue Instuctos: Kste Asanovic, Randy H. Katz hbp://inst.eecs.bekeley.edu/~cs61c/fa12 1 Paallel Requests Assigned
More informationLecture Topics ECE 341. Lecture # 12. Control Signals. Control Signals for Datapath. Basic Processing Unit. Pipelining
EE 341 Lectue # 12 Instucto: Zeshan hishti zeshan@ece.pdx.edu Novembe 10, 2014 Potland State Univesity asic Pocessing Unit ontol Signals Hadwied ontol Datapath contol signals Dealing with memoy delay Pipelining
More informationReview from last lecture
CSE820 Gaduate Compute Achitectue Week 3 Pefomance + Pipeline Review Based on slides by David Patteson Review fom last lectue Tacking and extapolating technology pat of achitect s esponsibility Expect
More informationCS 2461: Computer Architecture 1 Program performance and High Performance Processors
Couse Objectives: Whee ae we. CS 2461: Pogam pefomance and High Pefomance Pocessos Instucto: Pof. Bhagi Naahai Bits&bytes: Logic devices HW building blocks Pocesso: ISA, datapath Using building blocks
More informationReview: Moore s Law. EECS 252 Graduate Computer Architecture Lecture 2. Review: Joy s Law in ManyCore world. Bell s Law new class per decade
EECS 252 Gaduate Compute Achitectue Lectue 2 ℵ 0 Review of Instuction Sets, Pipelines, and Caches Januay 26 th, 2009 Review Mooe s Law John Kubiatowicz Electical Engineeing and Compute Sciences Univesity
More informationData Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard
Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:
More informationPipeline design. Mehran Rezaei
Pipeline design Mehran Rezaei How Can We Improve the Performance? Exec Time = IC * CPI * CCT Optimization IC CPI CCT Source Level * Compiler * * ISA * * Organization * * Technology * With Pipelining We
More informationLecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University
Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will
More informationPre-requisites. This is a textbook-based course. Chapter 1. Pipelines, Performance, Caches, and Virtual Memory. January 2009 Paul H J Kelly
332 Advanced Compute Achitectue Chapte 1 Intoduction and eview of Pipelines, Pefomance, Caches, and Vitual Januay 2009 Paul H J Kelly These lectue notes ae patly based on the couse text, Hennessy and Patteson
More informationECE232: Hardware Organization and Design
ECE232: Hardware Organization and Design Lecture 17: Pipelining Wrapup Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Outline The textbook includes lots of information Focus on
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #19: Pipelining II 2005-07-21 Andy Carle CS 61C L19 Pipelining II (1) Review: Datapath for MIPS PC instruction memory rd rs rt registers
More informationECE4680 Computer Organization and Architecture. Designing a Pipeline Processor
ECE468 Computer Organization and Architecture Designing a Pipeline Processor Pipelined processors overlap instructions in time on common execution resources. ECE468 Pipeline. 22-4-3 Start X:4 Branch Jump
More informationLecture 7 Pipelining. Peng Liu.
Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt
More informationCOSC 6385 Computer Architecture - Pipelining
COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage
More informationSome material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier
Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science Cases that affect instruction execution semantics
More informationFinal Exam Spring 2017
COE 3 / ICS 233 Computer Organization Final Exam Spring 27 Friday, May 9, 27 7:3 AM Computer Engineering Department College of Computer Sciences & Engineering King Fahd University of Petroleum & Minerals
More informationCS420/520 Homework Assignment: Pipelining
CS42/52 Homework Assignment: Pipelining Total: points. 6.2 []: Using a drawing similar to the Figure 6.8 below, show the forwarding paths needed to execute the following three instructions: Add $2, $3,
More informationFour Steps of Speculative Tomasulo cycle 0
HW support for More ILP Hardware Speculative Execution Speculation: allow an instruction to issue that is dependent on branch, without any consequences (including exceptions) if branch is predicted incorrectly
More informationELE 655 Microprocessor System Design
ELE 655 Microprocessor System Design Section 2 Instruction Level Parallelism Class 1 Basic Pipeline Notes: Reg shows up two places but actually is the same register file Writes occur on the second half
More informationECE154A Introduction to Computer Architecture. Homework 4 solution
ECE154A Introduction to Computer Architecture Homework 4 solution 4.16.1 According to Figure 4.65 on the textbook, each register located between two pipeline stages keeps data shown below. Register IF/ID
More informationLecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1
Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number
More informationBasic Pipelining Concepts
Basic ipelining oncepts Appendix A (recommended reading, not everything will be covered today) Basic pipelining ipeline hazards Data hazards ontrol hazards Structural hazards Multicycle operations Execution
More informationMIPS An ISA for Pipelining
Pipelining: Basic and Intermediate Concepts Slides by: Muhamed Mudawar CS 282 KAUST Spring 2010 Outline: MIPS An ISA for Pipelining 5 stage pipelining i Structural Hazards Data Hazards & Forwarding Branch
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationPipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.
Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview
More informationECE331: Hardware Organization and Design
ECE331: Hardware Organization and Design Lecture 27: Midterm2 review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Midterm 2 Review Midterm will cover Section 1.6: Processor
More informationDepartment of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri
Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationComputer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining
Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one
More informationECS 154B Computer Architecture II Spring 2009
ECS 154B Computer Architecture II Spring 2009 Pipelining Datapath and Control 6.2-6.3 Partially adapted from slides by Mary Jane Irwin, Penn State And Kurtis Kredo, UCD Pipelined CPU Break execution into
More informationCS 61C: Great Ideas in Computer Architecture Pipelining and Hazards
CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/sp16 1 Pipelined Execution Representation Time
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationLECTURE 3: THE PROCESSOR
LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU
More informationCSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content
3/6/8 CSCI 42: Computer Architectures The Processor (2) Fengguang Song Department of Computer & Information Science IUPUI Today s Content We have looked at how to design a Data Path. 4.4, 4.5 We will design
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationComputer Architecture
Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
Chapter 4 The Processor 1. Chapter 4A. The Processor
Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1
CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI
CSCI 402: Computer Architectures The Processor (2) Fengguang Song Department of Computer & Information Science IUPUI Full Datapath Branch Target Instruction Fetch Immediate 4 Today's Contents We have looked
Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science
Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and
CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath
CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath CPE 442 single-cycle datapath.1 Outline of Today s Lecture Recap and Introduction Where are we with respect to the BIG picture?
COMP2611: Computer Organization. The Pipelined Processor
COMP2611: Computer Organization The 1 2 Background 2 High-Performance Processors 3 Two techniques for designing high-performance processors by exploiting parallelism: Multiprocessing: parallelism among
ECE473 Computer Architecture and Organization. Pipeline: Control Hazard
Computer Architecture and Organization Pipeline: Control Hazard Lecturer: Prof. Yifeng Zhu Fall, 2015 Portions of these slides are derived from: Dave Patterson UCB Lec 15.1 Pipelining Outline Introduction
Module 6 STILL IMAGE COMPRESSION STANDARDS
Module 6 STILL IMAGE COMPRESSION STANDARDS Lesson 17 JPEG-2000 Architecture and Features Instructional Objectives At the end of this lesson, the students should be able to: 1. State the shortcomings of JPEG standard.
Pipelined Processor Design
Pipelined Processor Design Pipelined Implementation: MIPS Virendra Singh Indian Institute of Science Bangalore virendra@computer.org Lecture 20 SE-273: Processor Design Courtesy: Prof. Vishwani Agrawal
Chapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...
CHAPTER 6 1 Pipelining Instruction class Instruction memory Register read ALU Data memory Register write Total (in ps) Load word 200 100 200 200 100 800 Store word 200 100 200 200 700 R-format 200 100 200 100
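The two complete rows of the stage-latency table in the snippet above (load word and store word) can be checked with a short Python sketch. The variable names are illustrative only, and the `0` entry for store word's register write is an assumption inferred from its 700 ps total; the truncated R-format row is left out rather than guessed.

```python
# Stage latencies in picoseconds, copied from the two complete table rows.
STAGES = ("instruction memory", "register read", "ALU",
          "data memory", "register write")

latencies_ps = {
    "load word": (200, 100, 200, 200, 100),
    # Assumption: store word skips the register-write stage (0 ps),
    # which is consistent with the 700 ps total quoted in the snippet.
    "store word": (200, 100, 200, 200, 0),
}

# Total time per instruction when every stage runs back to back.
totals = {name: sum(stages) for name, stages in latencies_ps.items()}

# A single-cycle clock must fit the slowest instruction listed here;
# a pipelined clock must fit the slowest individual stage.
single_cycle_period = max(totals.values())
pipelined_period = max(max(stages) for stages in latencies_ps.values())

for name, total in totals.items():
    print(f"{name}: {total} ps")
print(f"single-cycle period: {single_cycle_period} ps")
print(f"pipelined period: {pipelined_period} ps")
```

Only the rows visible in the snippet are used, so the two clock periods are lower bounds for that table, not a statement about the full instruction set.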
Design a MIPS Processor (2/2)
93-2 Digital System Design Design a MIPS Processor (2/2) Lecturer: Chihhao Chao Advisor: Prof. An-Yeu Wu 2005/5/13 Friday ACCESS IC LABORATORY Outline 6.1 An Overview of Pipelining 6.2 A Pipelined Datapath
THE THETA BLOCKCHAIN
THE THETA BLOCKCHAIN Theta is a decentralized video streaming network, powered by a new blockchain and token. By Theta Labs, Inc. Last Updated: Nov 21, 2017 Version 1.0 1 OUTLINE Motivation Reputation Dependent
COMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN 5th Edition The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined
Thomas Polzer Institut für Technische Informatik
Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =
COMPUTER ORGANIZATION AND DESIGN. 5th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction
CS 61C: Great Ideas in Computer Architecture Control and Pipelining
CS 61C: Great Ideas in Computer Architecture Control and Pipelining Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs6c/sp6 Datapath Control Signals ExtOp: zero, sign
ECE260: Fundamentals of Computer Engineering
ECE260: Fundamentals of Computer Engineering Pipelined Datapath and Control James Moscola Dept. of Engineering & Computer Science York College of Pennsylvania ECE260: Fundamentals of Computer Engineering
Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction
What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages
What is Pipelining? Is a key implementation technique used to make fast CPUs Is an implementation technique whereby multiple instructions are overlapped in execution It takes advantage of parallelism
Accelerating Storage with RDMA Max Gurtovoy Mellanox Technologies
Accelerating Storage with RDMA Max Gurtovoy Mellanox Technologies 2018 Storage Developer Conference EMEA. Mellanox Technologies. All Rights Reserved. 1 What is RDMA? Remote Direct Memory Access - provides the ability
Working on the Pipeline
Computer Science 6C Spring 27 Working on the Pipeline Datapath Control Signals Computer Science 6C Spring 27 MemWr: write memory MemtoReg: ALU; Mem RegDst: rt ; rd RegWr: write register 4 PC Ext Imm6 Adder
ECE331: Hardware Organization and Design
ECE331: Hardware Organization and Design Lecture 35: Final Exam Review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Material from Earlier in the Semester Throughput and latency
Chapter 4 The Processor 1. Chapter 4B. The Processor
Chapter 4 The Processor 1 Chapter 4B The Processor Chapter 4 The Processor 2 Control Hazards Branch determines flow of control Fetching next instruction depends on branch outcome Pipeline can t always
361 datapath.1. Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath
361 datapath.1 Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath Outline of Today s Lecture Introduction Where are we with respect to the BIG picture? Questions and Administrative
Appendix C: Pipelining: Basic and Intermediate Concepts
Appendix C: Pipelining: Basic and Intermediate Concepts Key ideas and simple pipeline (Section C.1) Hazards (Sections C.2 and C.3) Structural hazards Data hazards Control hazards Exceptions (Section C.4)
Pipelined Processor Design
Pipelined Processor Design Pipelined Implementation: MIPS Virendra Singh Computer Design and Test Lab. Indian Institute of Science (IISc) Bangalore virendra@computer.org Advance Computer Architecture http://www.serc.iisc.ernet.in/~viren/courses/aca/aca.htm
EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)
EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building
The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture
The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count
Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4
IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life Return to Chapter 4 Midnight Laundry (figure: task order A B C D on a timeline starting at 6 PM) Smarty Laundry (figure: task order A B C D on a timeline starting at 6 PM)
CS3350B Computer Architecture Winter Lecture 5.7: Single-Cycle CPU: Datapath Control (Part 2)
CS3350B Computer Architecture Winter 25 Lecture 5.7: Single-Cycle CPU: Datapath Control (Part 2) Marc Moreno Maza www.csd.uwo.ca/courses/cs335b [Adapted from lectures on Computer Organization and Design,
CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. Complications With Long Instructions
CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3 Long Instructions & MIPS Case Study Complications With Long Instructions So far, all MIPS instructions take 5 cycles But haven't talked
DYNAMIC STORAGE ALLOCATION. Hanan Samet
ds0 DYNAMIC STORAGE ALLOCATION Hanan Samet Computer Science Department and Center for Automation Research and Institute for Advanced Computer Studies University of Maryland College Park, Maryland 07 e-mail: hjs@umiacs.umd.edu
Advanced Computer Architecture
Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes
CS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #19 Designing a Single-Cycle CPU 27-7-26 Scott Beamer Instructor AI Focuses on Poker CS61C L19 CPU Design : Designing a Single-Cycle CPU
ECE468 Computer Organization and Architecture. Designing a Single Cycle Datapath
ECE468 Computer Organization and Architecture Designing a Single Cycle Datapath ECE468 datapath1 The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Input Datapath
CSEE 3827: Fundamentals of Computer Systems
CSEE 3827: Fundamentals of Computer Systems Lecture 21 and 22 April 22 and 27, 2009 martha@cs.columbia.edu Amdahl's Law Be aware when optimizing... $T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$
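The Amdahl's Law relation quoted in the snippet above (improved time equals affected time divided by the improvement factor, plus unaffected time) can be tried out with a minimal Python sketch. The function name and the sample numbers (80 s affected, 20 s unaffected, factor 4) are made up for illustration and do not come from the lecture.

```python
def improved_time(t_affected, t_unaffected, improvement_factor):
    """Execution time after speeding up only the affected part."""
    if improvement_factor <= 0:
        raise ValueError("improvement factor must be positive")
    return t_affected / improvement_factor + t_unaffected


# Hypothetical workload: 80 s benefits from the optimization, 20 s does not.
t_affected, t_unaffected = 80.0, 20.0
t_old = t_affected + t_unaffected
t_new = improved_time(t_affected, t_unaffected, improvement_factor=4)

print(f"old time: {t_old} s, new time: {t_new} s")
print(f"overall speedup: {t_old / t_new}")
```

With these sample values a 4x improvement to 80% of the run time gives an overall speedup of only 2.5x, which is the point of the "be aware when optimizing" remark.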
Pipelining. CSC Friday, November 6, 2015
Pipelining CSC 211.01 Friday, November 6, 2015 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not
DLX Unpipelined Implementation
LECTURE - 06 DLX Unpipelined Implementation Five cycles: IF, ID, EX, MEM, WB Branch and store instructions: 4 cycles only What is the CPI? F_branch = 0.12, F_store = 0.05 CPI = 0.17 × 4 + 0.83 × 5 = 5 − 0.17 = 4.83 Further reduction
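The CPI question in the DLX snippet above is a weighted average: branches (0.12) and stores (0.05) take 4 cycles, and everything else takes 5. A minimal Python sketch of that arithmetic follows; the helper name `weighted_cpi` is hypothetical.

```python
def weighted_cpi(mix):
    """Average CPI from (fraction of instructions, cycles) pairs."""
    total_fraction = sum(fraction for fraction, _ in mix)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * cycles for fraction, cycles in mix)


# Frequencies quoted in the snippet: branches and stores need only 4 cycles.
f_branch, f_store = 0.12, 0.05
f_four_cycle = f_branch + f_store

cpi = weighted_cpi([
    (f_four_cycle, 4),        # branch and store instructions
    (1.0 - f_four_cycle, 5),  # all remaining instructions use all 5 cycles
])
print(f"CPI = {cpi:.2f}")
```

Equivalently, each 4-cycle instruction saves one cycle relative to 5, so the CPI is 5 minus the 0.17 fraction of such instructions.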
Chapter 4. The Processor
Chapter 4 The Processor 1 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A