Topics. Pipelining Lessons (Task, Resource) Pipelining Lessons (Time cycles) Pipelining is Natural! Laundry Example
|
|
- Steven Harrington
- 5 years ago
- Views:
Transcription
1 Lecture : Pipelining opics Introduction to pipelining Perfmance Pipelined datapath Design issues Hazards in pipeline ypes F a program with billion instructions, Execution ime =? Pipelining Lessons (ask, Resource) A way to parallelize: a load consists of multiple -tasks operating simultaneously using different resources (hardware equipments) Pipelining in parallel: a load is processed over multiple steps (called pipeline stages segments ) using cresponding tools in parallel Pipelining doesn t help latency of single task; it helps throughput of entire wkloads Potential speedup = Number of pipe-stages (Pipeline depth) Ideal could be: 8hrs/hrs= ime to fill pipeline time to drain reduce speedup: Speedup in this example: 8hrs/3.5hrs=.3 {Wash, Dry, Fold, Stash} 5 Pipelining is Natural! Laundry Example Pipelining Lessons (ime cycles) Use case scenario Ann, Brian, Cathy, Dave each has one load of clothes to wash, dry, fold, put away Load parameters Step Device Cycle Wash Washer 3 min Dry Dryer 3 min 3 Fold Folder 3 min Put away Stasher (put clothes into drawers) Efficient ganisation? 3 min wash Washer dry put away Dryer fold Folder Stasher 3 min a s k O r d e r A B C D ime 6 PM Suppose new Folder takes minutes, new Stasher takes minutes. How much faster is pipeline? /* see the last slide f answer */ Pipeline rate is limited by the slowest pipeline stage Unbalanced lengths of pipe stages reduces speedup 6 Sequential Laundry Steps in Executing MIPS instruction instruction stages hardware resources Sequential laundry takes 8 hours f loads 3 ) IF: uction (PC -> memy) Hardware: uction () ) ID: uction (isters) Hardware: ister [read] () 3) EX: (Perfm Operation) Hardware: ) MEM: Access (f data) Hardware: () Load: Read from Ste: to IF ID EX 5) WB: Back ( to ister) Hardware: ister [write] () WB o simplify pipeline, every instruction takes the same pipeline depth (even if not needed; unused stages are NOP.); every stage has the same length. 7 Pipelined Laundry: Start wk ASAP Pipelined Execution Representation Single-Cycle Sequential uction uction uction Read / Read/ uction Pipelined Read/ Read / So the latency is 5 ps; ime between instructions is 5 ps ime (ps) /* see the Revision quiz slide f answer */ 3 uction Read/ Pipelined laundry takes 3.5 hours f loads! So the latency is ps; ime between instructions is ps Pipeline stages not balanced perfectly: instructions take the same pipeline depth; some instructions have NOP stages; every stage has the same length as the longest. 8
2 5: :6 :6 5: 5: 5: :6 :6 5: 5: Pipelining Abstraction instruction stages hardware resources Pipeline Hazards (conflicts) the nd half of the st half of the cycle the cycle lw $s, ($) lw $ $s add $s3, $t, $t $s, $s, $s5 $s5, $t5, $t6 sw $s6, ($s) add $t $t $s $s5 $s3 - $t5 $t6 sw $s $s $s5 $s6 ime (cycles) : uction : ister : Limits to pipelining: hazards prevent next instruction from executing during its designated clock cycle [intermediate input] Structural hazards [decision making] uction depends on the result of pri instruction that is still in the pipeline Attempt to use the same resource in two different ways at the same time; hardware cannot suppt a particular combination of instructions. Pipelining of branches other control instructions stall the pipeline until the next value of PC is known $s7,, $t $t $s7 o simplify pipeline, every instruction takes the same pipeline depth (even if not needed; unused stages are NOP.); every stage has the same length. 3 Pipelined path Single is a Structural Hazard PC' PC In s tru c tio n W E 3 A RD A RD ister W D 3 SrcA SrcB Zero Result W rite D a ta W E W D Read W rite R e g : PCPlus SignImm << PCBranch Pipeline registers Result O utw PC' PCF In strd In s tru c tio n W E 3 A RD A RD ister W D 3 SrcBE W ritee W ritee : W E OutM W ritem W D ReadW SignIm m E << PCPlusF PCPlusD PCPlusE Mem y W riteback State transition: need f propagating state at the end of every stage to push the instruction through the pipeline. Pipeline registers: add registers between every two stages (used as state elements) Each instruction is fetched from memy 3% of instructions are load ste Not possible to access same memy twice in same clock pulse PC' PCF PCPlusF Pipeline registers IF/ID ID/EX EX/MEM MEM/WB D Ins truc tio n WE3 5: A RD :6 A RD ister WD3 :6 5: SignImmE << 5: PCPlusD PCPlusE SrcBE W ritee E : OutW WE OutM ReadW W ritem WD M : W ritew : W riteback Pipeline registers separate pipeline stages Names of registers: names of two stages they separate IF/ID, ID/EX, EX/MEM, MEM/WB. Why no register WB/IF? Wide enough to ste all data ( control signals) passed between stages Structural Hazards Solutions Example: Each instruction is fetched from memy = memy access 3% of instructions are load ste = memy access otal no. of accesses expected per cycle = otal no. of accesses allowed per cycle = no duplicated rsc Solution: level cache is split into: instruction cache: I-Cache, each instruction needs one read from I-cache data cache: D-Cache 3% instructions need access to D-cache we can maintain CPI of /* see the Revision quiz slide f answer */ 5 Problems f Pipelining A B C D Limits to pipelining: hazards prevent next instruction from executing during its designated clock cycle $s add $s $s $s $s $s hazard: B, C, D depends on the result of A. Possible solutions to data hazard : Static Insert s in code at compile time Rearrange code at compile time $s $t $s $s5 $t - Dynamic $t ime (cycles) Fward data at run time Stall the process/pipeline at run time Later instructions use the results of the earlier instructions (attempt to use an item befe it is ready) Solution: Compile-ime Hazard Elimination Insert enough s f result to be ready Or find other instructions which will logically fit between add sequent instructions (move independent instructions fward) $s add $s $s $s $s $s $s $t $s $s5 ime (cycles) $t - $t 6
3 3:6 5: 5: :6 5: :6 5: 5: in : RAW, WAR Solution: Dynamic Hazard Elimination $s add $s3 fwarding: the result can be passed directly to the functional unit which needs it $s $s $s $s $s $t $s $s5 $t - Issue: he sequent instructions need this value $s earlier ADD gets value of $s after which stage? $t ime (cycles) AND needs befe its stage ; OR in stage ; SUB in stage /* see the Revision quiz slide f answer */ RAW: may not complete befe read is attempted his is the most common type of data hazard It is created by the logic of the program Obviously a value must be computed befe it is used. Fwarding can be used to overcome this hazard WAR: the location is overwritten befe read is completed his type of data hazard is rare (Antidependences) his can only occur if results are written early in the pipeline opers are read late in the pipeline. (only complex addressing modes require opers to be read late in the pipeline.) ister renaming can be used to overcome this hazard A large no of architecture registers reduces name dependencies. 7 Fwarding (rd) (rs) (rt) SrcA SrcB Fward result to EX stage of sequent instructions from either: PC' stage back stage of ADD PCF A RD uction PCPlusF D Control Unit Op Funct A A WD3 PCPlusD Sign Extend WE3 W rited MemtoD MemD ControlD : SrcD DstD BranchD ister RD RD SignImmD RsD RtD RdD W ritee M W ritew MemtoE MemtoM MemtoW MemE MemM ControlE : SrcE DstE PCSrcM BranchE BranchM WE OutM ReadW A RD SrcBE W ritee W ritem WD RsE OutW W ritee : M : W ritew : Hazard Unit FwardAE FwardBE SignImmE << PCPlusE M W 8 uctions in the pipeline are fetched in der of the program his flow would change when there is a branch, a procedure call Branch penalty: he pipeline has to stall until the address of the next instruction is determined Resolve the condition of the branch conditional branch only Calculate branch address conditional branch unconditional branch return from procedure In our pipeline When is branch address calculated? ID stage (stage ) When is condition evaluated? EX stage (stage 3) What resources are used? [Adder], Fwarding can fail where result is not ready lw gets value of $s in the which stage? ime (cycles) $ $s lw $s, ($) lw rouble! $s $t $s $s $t $s $s $t $s5 - Must delay/stall instruction ( pipeline bubble ) ime (cycles) $ $s lw $s, ($) lw $s $s $t $s $s $s $t $s Stall $s $t $s5 - [Stall when branch decoded]: 3 cycles lost per branch! (too much waste as there are ~3% branch instructions) Delayed branches: execute instructions not dependent on branch direction befe the address is resolved Predict Branch Not aken (e.g. f selections): 7% branches not taken on average Execution continued down the sequential path If branch taken, squash/cancel instructions in pipeline Predict Branch aken (e.g. f loops): 53% MIPS branches taken on average Can t proceed until we know the branch address (ID stage) Sophisticated branch prediction: execute the me likely stream of instructions, undo if guessed wrongly 3 in So far we considered data hazards limited to registers here may also be data hazards in memy A cache miss may delay load ste instruction Ste load instructions on the same address too close in time Name dependency: only happens to use the same name hree categies: Classification depending on der of reads writes: RAW read after write WAR write after read WAW write after write What about RAR read after read? some machines enfce memy accesses strictly in program der this affects pipeline perfmance some allow a different der in general me hardware is required f control Delayed branches Compilers move instruction that has no conflict with branch into delay slot Fills about 6% of branch delay slots About 8% of instructions executed in branch delay slots useful in computation 8% (6% x 8%) of slots usefully filled Where to get branch delay slot instructions? Befe branch instruction From fall through only valuable when branch-not-taken From the target address only valuable when branch-taken Original sequence add $ $5 $6 beq $ $ Redered to this beq $ $ add $ $5 $6
4 ime (cycles) States in -bit prediction scheme Predict-not-taken branch success instructions in sequence PC already calculated, so use it to get next instruction his scheme is simplest to implement Squash instructions in pipeline if branch taken the instructions in IF, ID EX stage are discarded nothing has been written so far $t beq $t, $t, $t - lw $s $s $s $s 8 Flush these instructions A loop of iterations befe exit: // f (i=; i<; i) a[i] = a[i] *.; aken N N Feedback Not aken Perfmance f loops one misprediction on the last execution of the loop one misprediction on the first execution of the loop (previous execution of this loop ended with branch not taken) f example if the loop executed times: 8% accuracy F less regular branches much lower accuracy may even be zero if the branch is alternately: not taken - taken C $s $s5-3 $s 6 slt, $s, $s3 slt $s3 slt 5 States in -bit prediction scheme Predict-taken branch (Early Branch Resolution) 8 C Can t proceed until we know the branch address (ID stage) MIPS still incurs one cycle branch penalty [Penalty now is only one lost cycle] Makes sense if branch conditions are me complex take longer to calculate $t beq $t, $t, lw $t $s $s 3 $s 6 slt, $s, $s3 slt $s3 slt ime (cycles) Flush this instruction Finite state automaton the -bit value is a saturating counter increment on taken decrement on not taken Use bits per entry instead of one Prediction must be wrong twice befe it is changed if predicted taken start execution at target as soon as target is known if predicted not-taken continue sequential execution until direction Change prediction only if getting misprediction twice 6 3 Pipelined Perfmance Example he rest is f self-study Static branch prediction can be perfmed by the compiler by hinting all fward branches not-taken all backward branches (loops) as taken unrolling loops to decrease the frequency of branches can be based on program traces f frequently executed programs we can determine which branches are me likely to be taken, which me likely to be not taken Dynamic branch prediction Prediction is based on the run-time behaviour of the program Behaviour of the branch Behaviour of the preceding branches Requires additional hardware to ste this infmation Branch Prediction Buffer Branch Histy able 7 SPECIN benchmark: 5% loads % stes % branches % jumps 5% R-type Suppose: % of loads used by next instruction 5% of branches mispredicted All jumps flush next instruction What is the average CPI? Load/Branch CPI = when no stalling, when stalling. hus, CPI lw = (.6) (.) =. CPIb eq = (.75) (.5) =.5 F a program with billion instructions, Execution ime =? Average CPI = (.5)(.) (.)() (.)(.5) (.)() (.5)() =.5 3 Branch Prediction Buffer [x8] beq $t,$t,l Pipelined Perfmance Example A table indexed by the lower ption of the address of the branch instruction Contains one bit per entry: recently taken not taken like a cache but every access is a hit he prediction MAY NO be f the branch currently executed it depends how many entries in the table me entries -> me hardware -> slower start branch execution accding to the prediction if prediction crect continue if prediction increct invert the prediction bit ste in the table cancel partially executed instructions start the proper sequence of instructions Pipelined process critical path: c = (t read t mux t eq t AND t mux t setup ) = [ ] ps = 55 ps Pipelined process critical path ) t pcq t mem t setup ) (t read t mux t eq t AND t mux t setup ) *half cycle c = max 3) t pcq t mux t mux t t setup ) t pcq t memwrite t setup 5) back (t pcq t mux t write ) *half cycle 8 3
5 Pipelined Perfmance Example F a program with billion instructions executing on a pipelined MIPS process, CPI =.5 c = 55 ps Execution ime = (# instructions) CPI c = ( )(.5)(55 - ) = 63 seconds What is speedup? 33 Summary Pipelining is a fundamental concept Multiple steps using distinct resources Exploiting parallelism in instructions What makes it easy? (MIPS vs. 8x86) All instructions are the same length simple instruction fetch Just a few instruction fmats read registers befe decode instruction opers only in loads stes fewer pipeline stages aligned memy access / load; memy access / ste What makes it hard? Structural hazards: suppose we had only one cache? Need me HW resources : need to wry about branch instructions? Branch prediction, delayed branch : an instruction depends on results of a previous instruction? need fwarding, compiler scheduling 3 Revision quiz How much faster is pipeline (slide 6)? Answer: min So the latency is 5 ps; ime between instructions is ps 5 List the steps f executing MIPS instructions through pipelining datapath. In addition, what functional hardware components are used? here are three types of pipeline hazards, what they are? answer (slide 5): answer (slide 7): With the delayed branch strategy, getting instructions from fall through is one solution. his solution is valuable only when branchnot-taken. ) rue ) False 35 Recommended readings ext readings are listed in eaching Schedule Learning Guide PH: ebook is available PH5: no ebook on library site yet PH5: companion materials (e.g. online sections f further readings) /?ISBN=
Topics. Lecture 12: Pipelining. Introduction to pipelining. Pipelined datapath. Hazards in pipeline. Performance. Design issues.
Lecture 2: Pipelining Topics Introduction to pipelining Performance Pipelined datapath Design issues Hazards in pipeline Types Solutions Pipelining is Natural! Laundry Example Use case scenario Ann, Brian,
More informationCS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST
CS 110 Computer Architecture Pipelining Guest Lecture: Shu Yin http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS61C
More informationModern Computer Architecture
Modern Computer Architecture Lecture2 Pipelining: Basic and Intermediate Concepts Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each
More informationLecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1
Lecture 3 Pipelining Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1 A "Typical" RISC ISA 32-bit fixed format instruction (3 formats) 32 32-bit GPR (R0 contains zero, DP take pair)
More informationOutline Marquette University
COEN-4710 Computer Hardware Lecture 4 Processor Part 2: Pipelining (Ch.4) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations from Mike
More informationCOSC 6385 Computer Architecture - Pipelining
COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage
More informationCPS104 Computer Organization and Programming Lecture 19: Pipelining. Robert Wagner
CPS104 Computer Organization and Programming Lecture 19: Pipelining Robert Wagner cps 104 Pipelining..1 RW Fall 2000 Lecture Overview A Pipelined Processor : Introduction to the concept of pipelined processor.
More informationComputer Architecture. Lecture 6.1: Fundamentals of
CS3350B Computer Architecture Winter 2015 Lecture 6.1: Fundamentals of Instructional Level Parallelism Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and
More informationPipelining. Maurizio Palesi
* Pipelining * Adapted from David A. Patterson s CS252 lecture slides, http://www.cs.berkeley/~pattrsn/252s98/index.html Copyright 1998 UCB 1 References John L. Hennessy and David A. Patterson, Computer
More informationAdvanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017
Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation
More informationSome material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier
Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science 6 PM 7 8 9 10 11 Midnight Time 30 40 20 30 40 20
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationDepartment of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri
Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many
More informationLecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation
Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation CSE 564 Computer Architecture Summer 2017 Department of Computer Science and Engineering Yonghong Yan yan@oakland.edu www.secs.oakland.edu/~yan
More informationData Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard
Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:
More informationLecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1
Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number
More informationLECTURE 3: THE PROCESSOR
LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU
More informationCPE Computer Architecture. Appendix A: Pipelining: Basic and Intermediate Concepts
CPE 110408443 Computer Architecture Appendix A: Pipelining: Basic and Intermediate Concepts Sa ed R. Abed [Computer Engineering Department, Hashemite University] Outline Basic concept of Pipelining The
More informationComputer Architecture
Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More informationMidnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4
IC220 Set #9: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life Return to Chapter 4 Midnight Laundry Task order A B C D 6 PM 7 8 9 0 2 2 AM 2 Smarty Laundry Task order A B C D 6 PM
More informationLecture 6: Pipelining
Lecture 6: Pipelining i CSCE 26 Computer Organization Instructor: Saraju P. ohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed from various books, websites, authors pages, and other
More informationPipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...
CHAPTER 6 1 Pipelining Instruction class Instruction memory ister read ALU Data memory ister write Total (in ps) Load word 200 100 200 200 100 800 Store word 200 100 200 200 700 R-format 200 100 200 100
More informationCISC 662 Graduate Computer Architecture Lecture 6 - Hazards
CISC 662 Graduate Computer Architecture Lecture 6 - Hazards Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationPipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.
Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview
More informationCSE 141 Computer Architecture Spring Lectures 11 Exceptions and Introduction to Pipelining. Announcements
CSE 4 Computer Architecture Spring 25 Lectures Exceptions and Introduction to Pipelining May 4, 25 Announcements Reading Assignment Sections 5.6, 5.9 The Processor Datapath and Control Section 6., Enhancing
More informationChapter 4 The Processor 1. Chapter 4B. The Processor
Chapter 4 The Processor 1 Chapter 4B The Processor Chapter 4 The Processor 2 Control Hazards Branch determines flow of control Fetching next instruction depends on branch outcome Pipeline can t always
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationPipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science
Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and
More informationPipelining, Instruction Level Parallelism and Memory in Processors. Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010
Pipelining, Instruction Level Parallelism and Memory in Processors Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010 NOTE: The material for this lecture was taken from several
More informationChapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts
CS359: Computer Architecture Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts Yanyan Shen Department of Computer Science and Engineering Shanghai Jiao Tong University Parallel
More informationPage 1. Pipelining: Its Natural! Chapter 3. Pipelining. Pipelined Laundry Start work ASAP. Sequential Laundry A B C D. 6 PM Midnight
Pipelining: Its Natural! Chapter 3 Pipelining Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes A B C D Dryer takes 40 minutes Folder
More informationEI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)
EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building
More informationCO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19
CO2-3224 Computer Architecture and Programming Languages CAPL Lecture 8 & 9 Dr. Kinga Lipskoch Fall 27 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be
More informationPipeline: Introduction
Pipeline: Introduction These slides are derived from: CSCE430/830 Computer Architecture course by Prof. Hong Jiang and Dave Patterson UCB Some figures and tables have been derived from : Computer System
More informationThomas Polzer Institut für Technische Informatik
Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =
More informationECE154A Introduction to Computer Architecture. Homework 4 solution
ECE154A Introduction to Computer Architecture Homework 4 solution 4.16.1 According to Figure 4.65 on the textbook, each register located between two pipeline stages keeps data shown below. Register IF/ID
More informationECE232: Hardware Organization and Design
ECE232: Hardware Organization and Design Lecture 17: Pipelining Wrapup Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Outline The textbook includes lots of information Focus on
More informationChapter 8. Pipelining
Chapter 8. Pipelining Overview Pipelining is widely used in modern processors. Pipelining improves system performance in terms of throughput. Pipelined organization requires sophisticated compilation techniques.
More informationChapter 4 The Processor 1. Chapter 4A. The Processor
Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationDEE 1053 Computer Organization Lecture 6: Pipelining
Dept. Electronics Engineering, National Chiao Tung University DEE 1053 Computer Organization Lecture 6: Pipelining Dr. Tian-Sheuan Chang tschang@twins.ee.nctu.edu.tw Dept. Electronics Engineering National
More informationPipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Pipeline Hazards Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hazards What are hazards? Situations that prevent starting the next instruction
More informationOverview. Appendix A. Pipelining: Its Natural! Sequential Laundry 6 PM Midnight. Pipelined Laundry: Start work ASAP
Overview Appendix A Pipelining: Basic and Intermediate Concepts Basics of Pipelining Pipeline Hazards Pipeline Implementation Pipelining + Exceptions Pipeline to handle Multicycle Operations 1 2 Unpipelined
More informationECEC 355: Pipelining
ECEC 355: Pipelining November 8, 2007 What is Pipelining Pipelining is an implementation technique whereby multiple instructions are overlapped in execution. A pipeline is similar in concept to an assembly
More informationMIPS An ISA for Pipelining
Pipelining: Basic and Intermediate Concepts Slides by: Muhamed Mudawar CS 282 KAUST Spring 2010 Outline: MIPS An ISA for Pipelining 5 stage pipelining i Structural Hazards Data Hazards & Forwarding Branch
More informationComputer Systems Architecture Spring 2016
Computer Systems Architecture Spring 2016 Lecture 01: Introduction Shuai Wang Department of Computer Science and Technology Nanjing University [Adapted from Computer Architecture: A Quantitative Approach,
More informationELE 655 Microprocessor System Design
ELE 655 Microprocessor System Design Section 2 Instruction Level Parallelism Class 1 Basic Pipeline Notes: Reg shows up two places but actually is the same register file Writes occur on the second half
More informationOutline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception
Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add
More information1 Hazards COMP2611 Fall 2015 Pipelined Processor
1 Hazards Dependences in Programs 2 Data dependence Example: lw $1, 200($2) add $3, $4, $1 add can t do ID (i.e., read register $1) until lw updates $1 Control dependence Example: bne $1, $2, target add
More informationComputer Architectures
Computer Architectures Pipelined instruction execution Hazards, stages balancing, super-scalar systems Pavel Píša, Michal Štepanovský, Miroslav Šnorek Main source of inspiration: Patterson Czech Technical
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationAppendix A. Overview
Appendix A Pipelining: Basic and Intermediate Concepts 1 Overview Basics of Pipelining Pipeline Hazards Pipeline Implementation Pipelining + Exceptions Pipeline to handle Multicycle Operations 2 1 Unpipelined
More informationCS 61C: Great Ideas in Computer Architecture Pipelining and Hazards
CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/sp16 1 Pipelined Execution Representation Time
More informationComputer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining
Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one
More informationCS 61C: Great Ideas in Computer Architecture Control and Pipelining
CS 6C: Great Ideas in Computer Architecture Control and Pipelining Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs6c/sp6 Datapath Control Signals ExtOp: zero, sign
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More informationInstruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31
4.16 Exercises 419 Exercise 4.11 In this exercise we examine in detail how an instruction is executed in a single-cycle datapath. Problems in this exercise refer to a clock cycle in which the processor
More informationWorking on the Pipeline
Computer Science 6C Spring 27 Working on the Pipeline Datapath Control Signals Computer Science 6C Spring 27 MemWr: write memory MemtoReg: ALU; Mem RegDst: rt ; rd RegWr: write register 4 PC Ext Imm6 Adder
More informationComputer Architecture Spring 2016
Computer Architecture Spring 2016 Lecture 02: Introduction II Shuai Wang Department of Computer Science and Technology Nanjing University Pipeline Hazards Major hurdle to pipelining: hazards prevent the
More informationSome material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier
Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science Cases that affect instruction execution semantics
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationCS 251, Winter 2018, Assignment % of course mark
CS 251, Winter 2018, Assignment 5.0.4 3% of course mark Due Wednesday, March 21st, 4:30PM Lates accepted until 10:00am March 22nd with a 15% penalty 1. (10 points) The code sequence below executes on a
More informationLecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University
Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will
More informationBasic Pipelining Concepts
Basic ipelining oncepts Appendix A (recommended reading, not everything will be covered today) Basic pipelining ipeline hazards Data hazards ontrol hazards Structural hazards Multicycle operations Execution
More informationThe Processor (3) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
The Processor (3) Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)
More informationOrange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining
Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture Pipelining Recall Pipelining is parallelizing execution Key to speedups in processors Split instruction
More informationECE260: Fundamentals of Computer Engineering
Pipelining James Moscola Dept. of Engineering & Computer Science York College of Pennsylvania Based on Computer Organization and Design, 5th Edition by Patterson & Hennessy What is Pipelining? Pipelining
More informationCSEE 3827: Fundamentals of Computer Systems
CSEE 3827: Fundamentals of Computer Systems Lecture 21 and 22 April 22 and 27, 2009 martha@cs.columbia.edu Amdahl s Law Be aware when optimizing... T = improved Taffected improvement factor + T unaffected
More informationInstruction Pipelining Review
Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number
More informationAppendix C: Pipelining: Basic and Intermediate Concepts
Appendix C: Pipelining: Basic and Intermediate Concepts Key ideas and simple pipeline (Section C.1) Hazards (Sections C.2 and C.3) Structural hazards Data hazards Control hazards Exceptions (Section C.4)
More informationChapter 3. Pipelining. EE511 In-Cheol Park, KAIST
Chapter 3. Pipelining EE511 In-Cheol Park, KAIST Terminology Pipeline stage Throughput Pipeline register Ideal speedup Assume The stages are perfectly balanced No overhead on pipeline registers Speedup
More informationChapter 4 (Part II) Sequential Laundry
Chapter 4 (Part II) The Processor Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Sequential Laundry 6 P 7 8 9 10 11 12 1 2 A T a s k O r d e r A B C D 30 30 30 30 30 30 30 30 30 30
More informationECE473 Computer Architecture and Organization. Pipeline: Control Hazard
Computer Architecture and Organization Pipeline: Control Hazard Lecturer: Prof. Yifeng Zhu Fall, 2015 Portions of these slides are derived from: Dave Patterson UCB Lec 15.1 Pipelining Outline Introduction
More informationCS 251, Winter 2019, Assignment % of course mark
CS 251, Winter 2019, Assignment 5.1.1 3% of course mark Due Wednesday, March 27th, 5:30PM Lates accepted until 1:00pm March 28th with a 15% penalty 1. (10 points) The code sequence below executes on a
More informationCOMP2611: Computer Organization. The Pipelined Processor
COMP2611: Computer Organization The 1 2 Background 2 High-Performance Processors 3 Two techniques for designing high-performance processors by exploiting parallelism: Multiprocessing: parallelism among
More information3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?
CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:
More informationMIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14
MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationLecture 7 Pipelining. Peng Liu.
Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt
More informationLecture 15: Pipelining. Spring 2018 Jason Tang
Lecture 15: Pipelining Spring 2018 Jason Tang 1 Topics Overview of pipelining Pipeline performance Pipeline hazards 2 Sequential Laundry 6 PM 7 8 9 10 11 Midnight Time T a s k O r d e r A B C D 30 40 20
More informationLecture 2: Processor and Pipelining 1
The Simple BIG Picture! Chapter 3 Additional Slides The Processor and Pipelining CENG 6332 2 Datapath vs Control Datapath signals Control Points Controller Datapath: Storage, FU, interconnect sufficient
More informationProcessor (II) - pipelining. Hwansoo Han
Processor (II) - pipelining Hwansoo Han Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 =2.3 Non-stop: 2n/0.5n + 1.5 4 = number
More informationCOMPUTER ORGANIZATION AND DESIGN
ARM COMPUTER ORGANIZATION AND DESIGN Edition The Hardware/Software Interface Chapter 4 The Processor Modified and extended by R.J. Leduc - 2016 To understand this chapter, you will need to understand some
More information14:332:331 Pipelined Datapath
14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate
More informationPipelining: Overview. CPSC 252 Computer Organization Ellen Walker, Hiram College
Pipelining: Overview CPSC 252 Computer Organization Ellen Walker, Hiram College Pipelining the Wash Divide into 4 steps: Wash, Dry, Fold, Put Away Perform the steps in parallel Wash 1 Wash 2, Dry 1 Wash
More informationSI232 Set #20: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Chapter 6 ADMIN. Reading for Chapter 6: 6.1,
SI232 Set #20: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life Chapter 6 ADMIN ing for Chapter 6: 6., 6.9-6.2 2 Midnight Laundry Task order A 6 PM 7 8 9 0 2 2 AM B C D 3 Smarty
More informationSuggested Readings! Recap: Pipelining improves throughput! Processor comparison! Lecture 17" Short Pipelining Review! ! Readings!
1! 2! Suggested Readings!! Readings!! H&P: Chapter 4.5-4.7!! (Over the next 3-4 lectures)! Lecture 17" Short Pipelining Review! 3! Processor components! Multicore processors and programming! Recap: Pipelining
More informationLecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)
Lecture Topics Today: Data and Control Hazards (P&H 4.7-4.8) Next: continued 1 Announcements Exam #1 returned Milestone #5 (due 2/27) Milestone #6 (due 3/13) 2 1 Review: Pipelined Implementations Pipelining
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationBasic Instruction Timings. Pipelining 1. How long would it take to execute the following sequence of instructions?
Basic Instruction Timings Pipelining 1 Making some assumptions regarding the operation times for some of the basic hardware units in our datapath, we have the following timings: Instruction class Instruction
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction
More informationChapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction
More informationCOSC4201 Pipelining. Prof. Mokhtar Aboelaze York University
COSC4201 Pipelining Prof. Mokhtar Aboelaze York University 1 Instructions: Fetch Every instruction could be executed in 5 cycles, these 5 cycles are (MIPS like machine). Instruction fetch IR Mem[PC] NPC
More informationELCT 501: Digital System Design
ELCT 501: Digital System Lecture 8: Pipelining Dr. Mohamed Abd El Ghany, Pipelining: Its Natural! Laundry Example Ann, brian, cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes
More informationCMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1
CMCS 611-101 Advanced Computer Architecture Lecture 8 Control Hazards and Exception Handling September 30, 2009 www.csee.umbc.edu/~younis/cmsc611/cmsc611.htm Mohamed Younis CMCS 611, Advanced Computer
More informationCSE Lecture 13/14 In Class Handout For all of these problems: HAS NOT CANNOT Add Add Add must wait until $5 written by previous add;
CSE 30321 Lecture 13/14 In Class Handout For the sequence of instructions shown below, show how they would progress through the pipeline. For all of these problems: - Stalls are indicated by placing the
More informationECE331: Hardware Organization and Design
ECE331: Hardware Organization and Design Lecture 35: Final Exam Review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Material from Earlier in the Semester Throughput and latency
More information