Microcomputer Systems, Chapter 6: Enhancing Performance with Pipelining. Prof. 陳伯寧, Department of Communications Engineering, National Chiao Tung University


Pipeline is natural! Laundry example
Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold.
Washer takes 30 minutes.
Dryer takes 40 minutes.
Folder takes 20 minutes.

Sequential laundry
[Figure: task order A, B, C, D plotted against time, one load after another]
Sequential laundry takes 6 hours for 4 loads.
If they learned pipelining, how long would laundry take?

Pipelined laundry: start work ASAP
[Figure: task order A, B, C, D plotted against time, loads overlapped]
Pipelined laundry takes 3.5 hours for 4 loads.
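The two totals above can be checked with a short calculation. The sketch below is only an illustration: the function names are hypothetical, the stage times are taken as 30, 40 and 20 minutes, and it assumes a load may wait between stages until the next machine is free.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Each load enters a stage as soon as the load and the machine are both free."""
    machine_free_at = [0] * len(stage_minutes)
    for _ in range(loads):
        load_ready_at = 0  # time at which this load leaves the previous stage
        for stage, duration in enumerate(stage_minutes):
            start = max(load_ready_at, machine_free_at[stage])
            machine_free_at[stage] = start + duration
            load_ready_at = machine_free_at[stage]
    return machine_free_at[-1]


stages = [30, 40, 20]  # washer, dryer, folder (minutes)
print(sequential_minutes(stages, 4) / 60, "hours sequential")
print(pipelined_minutes(stages, 4) / 60, "hours pipelined")
```

The pipelined total is dominated by the 40-minute dryer, which is the slowest stage.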

Pipeline lessons
Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload.
The pipeline rate is limited by the slowest pipeline stage.
Multiple tasks operate simultaneously using different resources.
Potential speedup = number of pipeline stages.
Unbalanced lengths of pipe stages reduce speedup.
Time to fill the pipeline and time to drain it reduce speedup.
Stall for dependences.

Pipeline is natural! Laundry example
Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold.
Washer takes 30 minutes.
Dryer takes 30 minutes.
Folder takes 30 minutes.
Storer takes 30 minutes to put clothes into drawers.

Sequential laundry
[Figure: task order A, B, C, D plotted against time, one load after another]
Sequential laundry takes 8 hours for 4 loads.
If they learned pipelining, how long would laundry take?

Pipelined laundry: start work ASAP
[Figure: task order A, B, C, D plotted against time, loads overlapped]
Pipelined laundry takes 3.5 hours for 4 loads!

The five stages of the load instruction
[Figure: Load occupies Cycle 1 to Cycle 5 as Ifetch, Reg/Dec, Exec, Mem, Wr]
Ifetch: Instruction fetch. Fetch the instruction from the instruction memory.
Reg/Dec: Registers fetch and instruction decode.
Exec: Calculate the memory address.
Mem: Read the data from the data memory.
Wr: Write the data back to the register file.

Pipeline execution
[Figure: program flow against time; each instruction passes through IFetch, Dcd, Exec, Mem, WB, starting one cycle after the previous instruction]
Utilization?
Now we just have to make it work.

Single cycle, multiple cycle vs. pipeline
[Figure: clock diagrams of three implementations.
Single-cycle implementation: Load, then Store, with part of the Store cycle wasted.
Multiple-cycle implementation: Load (Ifetch, Reg, Exec, Mem, Wr), then Store (Ifetch, Reg, Exec, Mem), then R-type (Ifetch, ...), over Cycle 1 to Cycle 10.
Pipeline implementation: Load, Store and R-type each pass through Ifetch, Reg, Exec, Mem, Wr, overlapped.]

Why pipeline?
Time for each step, by instruction class:
Instruction class | Instruction fetch | Register read | ALU operation | Data access | Register write | Total time
lw | 200 ps | 100 ps | 200 ps | 200 ps | 100 ps | 800 ps
sw | 200 ps | 100 ps | 200 ps | 200 ps | - | 700 ps
add, sub, and, or, slt | 200 ps | 100 ps | 200 ps | - | 100 ps | 600 ps
branch | 200 ps | 100 ps | 200 ps | - | - | 500 ps

Why pipeline? Single cycle vs. pipelined performance
[Figure: single-cycle execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0): one instruction every 800 ps, 2400 ps in total]
[Figure: pipelined execution of the same three instructions: one instruction starts every 200 ps, 1400 ps in total]
Speed-up factor = 2400/1400 = 1.7

Why pipeline? Single cycle vs. pipelined performance
Suppose we add N additional instructions to the previous 3 instructions.
Single-cycle machine: 2400 + N * 800 ps
Pipelined machine: 1400 + N * 200 ps
The speed-up factor becomes (2400 + 800 N)/(1400 + 200 N), which approaches 4 for large N.

Speed-up of pipeline
The ideal speed-up from pipelining equals the number of pipeline stages: 5 in the previous example.
However, because the stages take unequal time, the attainable speed-up is the time required by the longest instruction divided by the time required by the longest stage, i.e., 800 ps / 200 ps = 4.
As the instruction count increases, the speed-up of the pipeline approaches this value, i.e., 4 in the previous example.
Notably, 5 is a non-achievable speed-up ratio.
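The way the speed-up climbs toward 4 can be tabulated directly. This is a minimal sketch under the slide's assumptions (every instruction is an 800 ps lw on the single-cycle machine, the pipeline has 5 stages and a 200 ps clock); the function names are hypothetical.

```python
STAGES = 5
SINGLE_CYCLE_PS = 800     # time of one lw on the single-cycle machine
PIPELINE_CLOCK_PS = 200   # clock period set by the longest pipeline stage


def single_cycle_time(n):
    """Total time for n instructions executed one after another."""
    return n * SINGLE_CYCLE_PS


def pipelined_time(n):
    """The first instruction needs STAGES cycles; each later one adds a cycle."""
    return (n + STAGES - 1) * PIPELINE_CLOCK_PS


def speedup(n):
    return single_cycle_time(n) / pipelined_time(n)


for n in (3, 1_003, 1_000_003):
    print(n, "instructions: speed-up", round(speedup(n), 3))
```

For 3 instructions this reproduces 2400 ps versus 1400 ps; the fill time of 4 cycles becomes negligible as the instruction count grows, so the ratio tends to 800/200.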

Why pipeline? Because the resources are there!
[Figure: instruction order (Inst 0 to Inst 4) against time in clock cycles; each instruction uses instruction memory (Im), registers, ALU, data memory (Dm) and registers in successive cycles]

Can pipelining get us into trouble?
Yes: pipeline hazards, defined as situations in which the next instruction cannot be executed in the following clock cycle.
1. Structural hazards: an attempt to use the same resource in two different ways at the same time.
E.g., the next instruction and the data to be written are placed in the same memory chip.
2. Control hazards: an attempt to make a decision before the condition is evaluated.
E.g., branch instructions.

Can pipelining get us into trouble?
[Figure: pipeline stalling on a conditional branch]
If one cannot resolve the branch in the second stage, then stalling on branches causes an even larger slowdown.
E.g., conditional branches make up 7% of the instructions in gcc.

Can pipelining get us into trouble?
Why is a pipeline stall nicknamed a bubble?
[Figure: an example explanation]
Some may wish to use prediction to resolve the stall on branches.
Just execute the next instruction anyway (guess that the condition will fail).
If the guess of executing the next instruction is wrong, just bubble the previous execution.
A dynamic prediction based on history is also possible!

Can pipelining get us into trouble?
[Figure: pipeline timing when the guess is right]
[Figure: pipeline timing when the guess is wrong]

Can pipelining get us into trouble?
Another way to solve the control hazard: delayed branch.
Ask the assembler to reorder the program so that an instruction that is not affected by the branch is placed after the conditional branch instruction.
E.g., the assembly code in the previous slide can be reordered in this way.
[Figure: reordered assembly code]

Can pipelining get us into trouble?
3. Data hazards: an attempt to use an item before it is ready; more specifically, an instruction depends on the results of a previous instruction that is still in the pipeline.
E.g.,
  add $s0, $t0, $t1 ; $s0 = $t0 + $t1
  sub $t2, $s0, $t3 ; $t2 = $s0 - $t3
Since the second instruction has to wait for the first instruction to pass the fifth stage, three additional bubbles may need to be added to the pipeline.

Can pipelining get us into trouble?
Solution to the data hazard: forwarding or bypassing.
[Figure: the add result forwarded from the ALU output to the ALU input of sub]

Can pipelining get us into trouble?
Sometimes an additional stall is needed even with forwarding or bypassing.
[Figure: a load followed by a dependent instruction, with one bubble]

Summary
Pipelining is a fundamental concept: multiple steps using distinct resources.
Utilize the capabilities of the datapath by pipelined instruction processing:
  start the next instruction while working on the current one;
  limited by the length of the longest stage (plus fill/flush latency);
  detect and resolve hazards.
Pipeline control must:
  detect the hazard;
  take action (or delay action) to resolve hazards.

Pipelining
What makes it easy (for MIPS):
  all instructions are the same length;
  just a few instruction formats;
  memory operands appear only in loads and stores.
What makes it hard (in general)?
  structural hazards: suppose we had only one memory;
  control hazards: need to worry about branch instructions;
  data hazards: an instruction depends on a previous instruction.

Pipelining
We'll build a simple pipeline and look at these issues.
We'll talk about modern processors and what really makes it hard:
  exception handling;
  trying to improve performance with out-of-order execution, etc.

Basic idea: What do we need to add to actually split the datapath into stages?
IF: Instruction fetch
ID: Instruction decode / register file read
EX: Execute / address calculation
MEM: Memory access
WB: Write back
[Figure: single-cycle datapath divided into the five stages, annotated with a possible cause of control hazard and a possible cause of data hazard]

Pipelined datapath
[Figure: datapath with pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB between the stages]

Pipeline example: lw $s, 2($t)
Shaded area = active area; right half = read and left half = write.
1st stage: Instruction fetch
2nd stage: Instruction decode
3rd stage: Execution
4th stage: Memory
5th stage: Write back
[Figures: one pipelined datapath diagram per stage, with registers IF/ID, ID/EX, EX/MEM and MEM/WB and the active units shaded]

Problem with the previous design
The write register number has been changed by the subsequent instruction by the time the load reaches the write-back stage.
[Figure: pipelined datapath during write back]

Corrected datapath
[Figure: pipelined datapath in which the write register number travels with the instruction through ID/EX, EX/MEM and MEM/WB]

Graphically representing pipelines
Multiple-clock-cycle pipeline diagrams: an overall view spanning multiple clock cycles.
Single-clock-cycle pipeline diagrams: each view shows a snapshot of one clock cycle.

Multiple-clock-cycle pipeline diagram
[Figure: program execution order (lw $10, 20($1), then sub $11, $2, $3) against time in clock cycles]
Can help with answering questions like:
  how many cycles does it take to execute this code?
  what is the ALU doing during cycle 4?
Use this representation to help understand datapaths.

Single-clock-cycle pipeline diagram
[Figure: datapath snapshot with lw $10, 20($1) entering the pipeline]

Single-clock-cycle pipeline diagram
[Figures: successive datapath snapshots, one per clock cycle, as sub $11, $2, $3 follows lw $10, 20($1) through the pipeline; the last snapshot contains only sub]

Pipeline control
[Figure: pipelined datapath annotated with the control signals PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite and MemtoReg]

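Pipeline control
We have 5 stages. What needs to be controlled in each stage?
  Instruction fetch and PC increment
  Instruction decode / register fetch
  Execution
  Memory stage
  Write back
How would control be handled in an automobile plant?
  a fancy control center telling everyone what to do?
  should we use a finite state machine?

Pipeline control
Pass control signals along just like the data.
Execution/address calculation stage control lines: RegDst, ALUOp1, ALUOp0, ALUSrc
Memory access stage control lines: Branch, MemRead, MemWrite
Write-back stage control lines: RegWrite, MemtoReg
Instruction | RegDst | ALUOp1 | ALUOp0 | ALUSrc | Branch | MemRead | MemWrite | RegWrite | MemtoReg
R-format | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0
lw | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1
sw | X | 0 | 0 | 1 | 0 | 0 | 1 | 0 | X
beq | X | 0 | 1 | 0 | 1 | 0 | 0 | 0 | X
[Figure: the control unit writes the EX, MEM and WB signal groups into ID/EX; the MEM and WB groups continue to EX/MEM, and the WB group to MEM/WB]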
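The idea of passing control signals along with the data can be sketched in a few lines. This is an illustration rather than the slides' design: the function names are hypothetical, the signal values follow the usual textbook MIPS control table (an assumption here), and None stands for a don't-care (X) entry.

```python
# Control values per instruction class, grouped by the stage that consumes them.
# Assumed from the standard textbook table; None marks a don't-care (X).
CONTROL = {
    "R-format": {"EX": {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0},
                 "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
                 "WB": {"RegWrite": 1, "MemtoReg": 0}},
    "lw":       {"EX": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
                 "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
                 "WB": {"RegWrite": 1, "MemtoReg": 1}},
    "sw":       {"EX": {"RegDst": None, "ALUOp": 0b00, "ALUSrc": 1},
                 "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
                 "WB": {"RegWrite": 0, "MemtoReg": None}},
    "beq":      {"EX": {"RegDst": None, "ALUOp": 0b01, "ALUSrc": 0},
                 "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
                 "WB": {"RegWrite": 0, "MemtoReg": None}},
}


def decode(instr_class):
    """ID stage: generate every control signal once and latch it in ID/EX."""
    return {stage: dict(signals) for stage, signals in CONTROL[instr_class].items()}


def leave_ex(id_ex):
    """EX consumes its own group; the MEM and WB groups move on to EX/MEM."""
    return id_ex["EX"], {"MEM": id_ex["MEM"], "WB": id_ex["WB"]}


def leave_mem(ex_mem):
    """MEM consumes its own group; only the WB group moves on to MEM/WB."""
    return ex_mem["MEM"], {"WB": ex_mem["WB"]}


id_ex = decode("lw")
ex_signals, ex_mem = leave_ex(id_ex)
mem_signals, mem_wb = leave_mem(ex_mem)
print(ex_signals, mem_signals, mem_wb)
```

Each pipeline register carries only the groups that later stages still need, so no central controller has to remember which instruction is in which stage.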

Datapath with control
[Figure: pipelined datapath with the control unit; the EX, MEM and WB control groups are stored in ID/EX, EX/MEM and MEM/WB]

Dependences
Problem with starting the next instruction before the first one is finished:
dependences that go backward in time are data hazards.

Dependences
[Figure: value of register $2 in each clock cycle, with the following program in execution order]
  sub $2, $1, $3
  and $12, $2, $5
  or $13, $6, $2
  add $14, $2, $2
  sw $15, 100($2)

Software solution
Have the compiler guarantee no hazards.
Where do we insert the nops?
  sub $2, $1, $3
  and $12, $2, $5
  or $13, $6, $2
  add $14, $2, $2
  sw $15, 100($2)
After inserting nops:
  sub $2, $1, $3
  nop
  nop
  and $12, $2, $5
  or $13, $6, $2
  add $14, $2, $2
  sw $15, 100($2)
Problem: this really slows us down!
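The nop placement above can be reproduced mechanically. The following is a minimal sketch, not a real compiler pass: the tuple encoding of instructions and the name insert_nops are hypothetical, and it assumes no forwarding and a register file that is written in the first half of a cycle and read in the second half, so a consumer must sit at least two slots behind its producer.

```python
NOP = ("nop", None, ())


def insert_nops(program, gap=2):
    """Pad with nops so that no instruction reads a register written by
    one of the `gap` instructions immediately before it.

    Each instruction is (text, destination register or None, source registers).
    """
    out = []
    for text, dest, sources in program:
        needed = 0
        # distance 1 is the instruction emitted immediately before this one
        for distance, (_, prev_dest, _) in enumerate(reversed(out[-gap:]), start=1):
            if prev_dest is not None and prev_dest in sources:
                needed = max(needed, gap - distance + 1)
        out.extend([NOP] * needed)
        out.append((text, dest, sources))
    return out


program = [
    ("sub $2, $1, $3",  "$2",  ("$1", "$3")),
    ("and $12, $2, $5", "$12", ("$2", "$5")),
    ("or $13, $6, $2",  "$13", ("$6", "$2")),
    ("add $14, $2, $2", "$14", ("$2", "$2")),
    ("sw $15, 100($2)", None,  ("$15", "$2")),
]
for text, _, _ in insert_nops(program):
    print(text)
```

Only the and needs padding: by the time or, add and sw read $2, the sub is already far enough ahead.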

Forwarding
Use temporary results; don't wait for them to be written.
  register file forwarding to handle a read and a write to the same register;
  ALU forwarding.

Forwarding
[Figure: values of register $2, EX/MEM and MEM/WB in each clock cycle for the program below, with the result forwarded from the pipeline registers to the dependent instructions]
  sub $2, $1, $3
  and $12, $2, $5
  or $13, $6, $2
  add $14, $2, $2
  sw $15, 100($2)

Forwarding
[Figure: pipelined datapath with a forwarding unit and multiplexers on the ALU inputs. The forwarding unit receives Rs and Rt (from IF/ID.RegisterRs and IF/ID.RegisterRt, carried through ID/EX), EX/MEM.RegisterRd and MEM/WB.RegisterRd. The two Rt's are identical.]

Load word can still cause a (data) hazard: an instruction tries to read a register following a load instruction that writes to the same register.
Thus, in addition to a forwarding unit, we need a (time-backward) hazard detection unit to stall the instruction that follows the load.
[Figure: program execution order against time in clock cycles]
  lw $2, 20($1)
  and $4, $2, $5
  or $8, $2, $6
  add $9, $4, $2
  slt $1, $6, $7
This time-backward hazard cannot be solved by forwarding.
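The comparison performed by the forwarding unit can be sketched as a small function. This is an illustration under assumptions: the function name and return labels are hypothetical, and the RegWrite qualifiers and the register-0 check follow the usual textbook conditions rather than anything spelled out on the slide.

```python
def forward_select(source_reg, ex_mem_rd, ex_mem_reg_write,
                   mem_wb_rd, mem_wb_reg_write):
    """Choose where one ALU operand should come from.

    source_reg is ID/EX.RegisterRs or ID/EX.RegisterRt (a register number).
    The most recent producer (EX/MEM) takes priority over the older one (MEM/WB).
    """
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == source_reg:
        return "EX/MEM"
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == source_reg:
        return "MEM/WB"
    return "REGFILE"


# `and $12, $2, $5` in EX while `sub $2, $1, $3` is in MEM:
print(forward_select(2, ex_mem_rd=2, ex_mem_reg_write=True,
                     mem_wb_rd=0, mem_wb_reg_write=False))
# `or $13, $6, $2` in EX: `and` (writes $12) is in MEM, `sub` (writes $2) is in WB:
print(forward_select(2, ex_mem_rd=12, ex_mem_reg_write=True,
                     mem_wb_rd=2, mem_wb_reg_write=True))
```

A load followed immediately by a dependent instruction is the case this function cannot help with, because the loaded value does not exist yet when the dependent instruction reaches EX.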

We can stall the pipeline by keeping an instruction in the same stage.
[Figure: program execution order against time in clock cycles; the or instruction repeats its stage in cycle 4 and a bubble is inserted]
  lw $2, 20($1)
  and $4, $2, $5
  or $8, $2, $6
  add $9, $4, $2
  slt $1, $6, $7

We can stall the pipeline by keeping an instruction in the same stage.
[Figure: the same program with the bubble shown as an instruction]
  lw $2, 20($1)
  and becomes nop
  and $4, $2, $5
  or $8, $2, $6
  add $9, $4, $2
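The decision to hold an instruction in its stage can be written as one condition. The sketch below assumes the usual textbook load-use test on ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs and IF/ID.RegisterRt; the function name is hypothetical.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID reads the register that a load
    currently in EX is about to write (load-use hazard)."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


# lw $2, 20($1) is in EX (Rt = 2); and $4, $2, $5 is in ID (Rs = 2, Rt = 5).
print(must_stall(True, 2, 2, 5))
# One cycle later the lw has left EX, so the and no longer needs to wait.
print(must_stall(False, 2, 2, 5))
```

When the condition holds, the instructions in IF and ID stay where they are for one cycle and a nop travels down the pipeline in their place.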

Hazard detection unit
Stall by letting an instruction that won't write anything go forward.
[Figure: pipelined datapath with a hazard detection unit (inputs ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt; it controls the writing of the PC and of IF/ID) alongside the forwarding unit (inputs Rs, Rt, EX/MEM.RegisterRd, MEM/WB.RegisterRd)]

Branch hazard
When we decide to branch, other instructions are in the pipeline!
[Figure: program execution order against time in clock cycles]
  40 beq $1, $3, 72
  44 and $12, $2, $5
  48 or $13, $6, $2
  52 add $14, $2, $2
  72 lw $4, 50($7)

Branch hazard
We are predicting branch not taken.
We need to add hardware for flushing instructions (as well as the necessary inter-stage register contents) if we are wrong.

Flushing instructions: an early decision at the ID stage
[Figure: pipelined datapath with an IF.Flush signal, the register comparison (=) and branch address adder moved to the ID stage, the hazard detection unit and the forwarding unit]

Dynamic branch prediction
Assuming branch-not-taken is not the only possible prediction.
If the branch is taken, we have a penalty of one cycle.
For our simple design, this is reasonable.
With deeper pipelines, the penalty increases and static branch prediction drastically hurts performance.
Solution: dynamic branch prediction.
A branch prediction buffer or branch history table can be added to record whether a branch was taken the last time. This helps improve the prediction accuracy, especially for loops.
1-bit prediction: the prediction changes as soon as one prediction error occurs.
2-bit prediction: the prediction changes only if a prediction is wrong twice consecutively.

Dynamic branch prediction
[Figure: state diagram of a 2-bit prediction scheme, with two "predict taken" states and two "predict not taken" states and transitions labelled taken / not taken]
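The benefit for loops can be seen by counting mispredictions for both schemes. This is a minimal sketch under assumptions: both predictors start out predicting taken, the function names are hypothetical, and the 2-bit variant shown is one that flips only after two consecutive wrong predictions, as described above.

```python
def one_bit_misses(outcomes, prediction=True):
    """1-bit scheme: remember only the last outcome."""
    misses = 0
    for taken in outcomes:
        if taken != prediction:
            misses += 1
            prediction = taken  # flips after a single wrong prediction
    return misses


def two_bit_misses(outcomes, prediction=True):
    """2-bit scheme: flip the prediction only after two consecutive misses."""
    misses = 0
    wrong_once = False  # second state bit: was the previous prediction wrong?
    for taken in outcomes:
        if taken == prediction:
            wrong_once = False
        else:
            misses += 1
            if wrong_once:
                prediction = taken
                wrong_once = False
            else:
                wrong_once = True
    return misses


# A loop branch: taken 9 times, then not taken on exit; the loop runs 3 times.
loop = ([True] * 9 + [False]) * 3
print("1-bit misses:", one_bit_misses(loop))
print("2-bit misses:", two_bit_misses(loop))
```

The 1-bit scheme mispredicts both the loop exit and the first iteration of the next run, while the 2-bit scheme mispredicts only the exit.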

Dynamic branch prediction
Sophisticated techniques:
  Correlating predictors that base the prediction on global behavior and recently executed branches (e.g., the prediction for a specific branch instruction is based on what happened in previous branches).
  Tournament predictors that use different types of prediction strategies and keep track of which one is performing best.
  A branch delay slot which the compiler tries to fill with a useful instruction (making the one-cycle delay part of the ISA).
Branch prediction is especially important because it enables other, more advanced pipelining techniques to be effective!
Modern processors predict correctly 95% of the time!

Improving performance
Try to avoid stalls! E.g., reorder these instructions:
  lw $t0, 0($t1)
  lw $t2, 4($t1)
  sw $t2, 0($t1)
  sw $t0, 4($t1)
into:
  lw $t0, 0($t1)
  lw $t2, 4($t1)
  sw $t0, 4($t1)
  sw $t2, 0($t1)
Dynamic pipeline scheduling:
  Hardware chooses which instructions to execute next.
  Will execute instructions out of order (e.g., doesn't wait for a dependency to be resolved, but rather keeps going!).
  Speculates on branches and keeps the pipeline full (may need to roll back if the prediction is incorrect).
Trying to exploit instruction-level parallelism.

Advanced pipelining
Increase the depth of the pipeline.
Start more than one instruction each cycle (multiple issue).
Loop unrolling to expose more instruction-level parallelism (better scheduling).
Superscalar processors:
  DEC Alpha 21264: 9-stage pipeline, 6-instruction issue.
All modern processors are superscalar and issue multiple instructions, usually with some limitations (e.g., different "pipes").
Very long instruction word (VLIW): static multiple issue (relies more on compiler technology).
This class has given you the background you need to learn more!

Summary
Pipelining does not improve latency, but does improve throughput.
[Figure: two charts placing the designs single-cycle (Section 5.4), multicycle (Section 5.5), pipelined, deeply pipelined, multiple-issue pipelined (Section 6.9) and multiple issue with deep pipeline (Section 6.10). Recoverable axis labels: slower to faster; instructions per clock (IPC = 1/CPI); several; use latency in instructions.]

Exceptions/Interrupts
Another form of control hazard is the exception.
E.g., add $1, $2, $1 happens to overflow.
Then we need to transfer control immediately to the exception routine at some specific location, because we do not want this invalid value to contaminate other registers or memory locations.
Result: another flush signal should be generated after the execution unit.

Basic interrupt processing for x86
How does an 8086 system handle interrupts? The software view.
From 00000H to 003FFH, there are 256 types. Notably, type = interrupt vector.
Basically, the first 32 interrupt vectors are reserved for the system.
The remaining 224 interrupt vectors are user interrupt vectors.
[Figure: memory map from 00000H to FFFFFH; the area 00000H~003FFH is reserved for interrupts and each vector holds a CS:IP pair]

Basic interrupt processing for x86
Example: INT 12
The above is equivalent to
  CS = [4*12+2] = [50] = [32H] = 1234H
  IP = [4*12] = [48] = [30H] = 0005H
Then the type 12 interrupt subroutine is placed at 12345H.
[Figure: memory map. Addresses 33H, 32H, 31H, 30H hold 12H, 34H, 00H, 05H inside the area 00000H~003FFH reserved for interrupts. The currently executing program transfers to the interrupt subroutine at CS:IP = 12345H and returns afterwards.]

Basic interrupt processing for x86
Assignment of software interrupts.
Type 0: Divide error.
Caused by DIV and IDIV when the quotient exceeds the maximum value that the division instruction allows.
Example: DIV CX computes (DX*65536 + AX)/CX, leaving the quotient in AX and the remainder in DX.
If CX = DX, as in the slide's example, the quotient is at least 10000H > FFFFH and does not fit in AX, so a type 0 interrupt is launched.
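The address arithmetic of the INT 12 example can be traced step by step. This is a small sketch, not BIOS or operating-system code: the dictionary-based memory model and the function names are hypothetical, and it assumes the 8086 real-mode conventions of little-endian words and physical address = CS * 16 + IP.

```python
def read_word(memory, address):
    """Read a 16-bit little-endian word from a byte-addressed memory."""
    return memory[address] | (memory[address + 1] << 8)


def vector_entry(interrupt_type):
    """Addresses of the IP word and the CS word for a given interrupt type."""
    return 4 * interrupt_type, 4 * interrupt_type + 2


def handler_address(memory, interrupt_type):
    """Physical address of the interrupt subroutine (20-bit, real mode)."""
    ip_address, cs_address = vector_entry(interrupt_type)
    ip = read_word(memory, ip_address)
    cs = read_word(memory, cs_address)
    return (cs * 16 + ip) & 0xFFFFF


# Vector table bytes for type 12, as in the example above.
memory = {0x30: 0x05, 0x31: 0x00, 0x32: 0x34, 0x33: 0x12}
print([hex(a) for a in vector_entry(12)])
print(hex(handler_address(memory, 12)))
```

Since every vector occupies 4 bytes, the 256 types fill exactly the first 1024 bytes of memory.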

Basic interrupt processing for x86
Hardware interrupt
[Figure: hardware interrupt]

Interrupt vector function table
Number | CPU interrupt | PC interrupt
00H | Divide error | Divide error
01H | Single step | Single step (debug)
02H | NMI (hardware interrupt) | Nonmaskable interrupt pin
03H | Breakpoint | Breakpoint
04H | Interrupt on overflow | Arithmetic overflow
05H | BOUND interrupt | Print screen key and BOUND instruction
06H | Invalid opcode | Illegal instruction error
07H | Coprocessor emulation interrupt | Coprocessor not present interrupt
08H | Double fault | Timer tick (hardware) (approximately 18.2 Hz)
09H | Coprocessor segment overrun | Keyboard (hardware)
0AH | Invalid task state segment | Hardware interrupt 2 (system bus) (cascade in AT)
0BH | Segment not present | Hardware interrupt 3 (system bus)
0CH | Stack fault | Hardware interrupt 4 (system bus)
0DH | General protection fault | Hardware interrupt 5 (system bus)
0EH | Page fault | Hardware interrupt 6 (system bus)
0FH | Reserved* | Hardware interrupt 7 (system bus)
10H | Floating-point error | Video BIOS
11H | Alignment check interrupt | Equipment environment
12H | Reserved* | Conventional memory size
13H | Reserved* | Direct disk service
14H | Reserved* | Serial COM port service
15H | Reserved* | Miscellaneous service
16H | Reserved* | Keyboard service
17H | Reserved* | Parallel port LPT service
18H | Reserved* | ROM BASIC
19H | Reserved* | Reboot
1AH | Reserved* | Clock service
1BH | Reserved* | Control-break handler
1CH | Reserved* | User timer service

Interrupt vector function table (continued)
Number | CPU interrupt | PC interrupt
1DH | Reserved* | Pointer for video parameter table
1EH | Reserved* | Pointer for disk drive parameter table
1FH | Reserved* | Pointer to graphics character pattern table
20H | User interrupts | Terminate program
21H | User interrupts | DOS services
22H | User interrupts | Program termination handler
23H | User interrupts | Control-C handler
24H | User interrupts | Critical error handler
25H | User interrupts | Read disk
26H | User interrupts | Write disk
27H | User interrupts | Terminate and stay resident
28H | User interrupts | DOS idle
29H-2EH | User interrupts | unused
2FH | User interrupts | Multiplex handler
30H-6FH | User interrupts | unused
70H | User interrupts | Hardware interrupt 8 (AT style computer)
71H | User interrupts | Hardware interrupt 9 (AT style computer)
72H | User interrupts | Hardware interrupt 10 (AT style computer)
73H | User interrupts | Hardware interrupt 11 (AT style computer)
74H | User interrupts | Hardware interrupt 12 (AT style computer)
75H | User interrupts | Hardware interrupt 13 (AT style computer)
76H | User interrupts | Hardware interrupt 14 (AT style computer)
77H | User interrupts | Hardware interrupt 15 (AT style computer)
78H-FFH | User interrupts | unused

Superscalar and dynamic pipelining
Superpipelining = longer pipelines.
Superscalar = replicate the internal components of the computer so that multiple instructions can be processed in every pipeline stage. The hardware may still issue only one instruction if a certain condition for parallel issue is not met.
Dynamic pipelining = later, ready-to-go instructions can proceed in parallel even if a hazard has occurred in a previous instruction and is still being resolved.
Sections after 6.9 will not be included in exams.

Suggested exercises
6., 6.2, 6.3, 6.4, 6.6, 6.7, 6.9, 6.2, 6.3, 6.35


Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1 Lecture 3 Pipelining Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1 A "Typical" RISC ISA 32-bit fixed format instruction (3 formats) 32 32-bit GPR (R0 contains zero, DP take pair)

More information

The Pipelined MIPS Processor

The Pipelined MIPS Processor 1 The niversity of Texas at Dallas Lecture #20: The Pipeline IPS Processor The Pipelined IPS Processor We complete our study of AL architecture by investigating an approach providing even higher performance

More information

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3. Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview

More information

Pipelining. Maurizio Palesi

Pipelining. Maurizio Palesi * Pipelining * Adapted from David A. Patterson s CS252 lecture slides, http://www.cs.berkeley/~pattrsn/252s98/index.html Copyright 1998 UCB 1 References John L. Hennessy and David A. Patterson, Computer

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S. Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing (2) Pipeline Performance Assume time for stages is ps for register read or write

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined

More information

Enhanced Performance with Pipelining

Enhanced Performance with Pipelining Chapter 6 Enhanced Performance with Pipelining Note: The slides being presented represent a mi. Some are created by ark Franklin, Washington University in St. Lois, Dept. of CSE. any are taken from the

More information

Pipeline: Introduction

Pipeline: Introduction Pipeline: Introduction These slides are derived from: CSCE430/830 Computer Architecture course by Prof. Hong Jiang and Dave Patterson UCB Some figures and tables have been derived from : Computer System

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Module 4c: Pipelining

Module 4c: Pipelining Module 4c: Pipelining R E F E R E N C E S : S T A L L I N G S, C O M P U T E R O R G A N I Z A T I O N A N D A R C H I T E C T U R E M O R R I S M A N O, C O M P U T E R O R G A N I Z A T I O N A N D A

More information

Page 1. Pipelining: Its Natural! Chapter 3. Pipelining. Pipelined Laundry Start work ASAP. Sequential Laundry A B C D. 6 PM Midnight

Page 1. Pipelining: Its Natural! Chapter 3. Pipelining. Pipelined Laundry Start work ASAP. Sequential Laundry A B C D. 6 PM Midnight Pipelining: Its Natural! Chapter 3 Pipelining Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes A B C D Dryer takes 40 minutes Folder

More information

ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design ECE232: Hardware Organization and Design Lecture 17: Pipelining Wrapup Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Outline The textbook includes lots of information Focus on

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Pipelining, Instruction Level Parallelism and Memory in Processors. Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010

Pipelining, Instruction Level Parallelism and Memory in Processors. Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010 Pipelining, Instruction Level Parallelism and Memory in Processors Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010 NOTE: The material for this lecture was taken from several

More information

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture Pipelining Recall Pipelining is parallelizing execution Key to speedups in processors Split instruction

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science 6 PM 7 8 9 10 11 Midnight Time 30 40 20 30 40 20

More information

Pipeline Processors David Rye :: MTRX3700 Pipelining :: Slide 1 of 15

Pipeline Processors David Rye :: MTRX3700 Pipelining :: Slide 1 of 15 Pipeline Processors Pipelining :: Slide 1 of 15 Pipeline Processors A common feature of modern processors Works like a series production line An operation is divided into k decoupled (independent) elementary

More information

Multi-cycle Datapath (Our Version)

Multi-cycle Datapath (Our Version) ulti-cycle Datapath (Our Version) npc_sel Next PC PC Instruction Fetch IR File Operand Fetch A B ExtOp ALUSrc ALUctr Ext ALU R emrd emwr em Access emto Data em Dst Wr. File isters added: IR: Instruction

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle? CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

CPU Pipelining Issues

CPU Pipelining Issues CPU Pipelining Issues What have you been beating your head against? This pipe stuff makes my head hurt! L17 Pipeline Issues & Memory 1 Pipelining Improve performance by increasing instruction throughput

More information

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Moore s Law Gordon Moore @ Intel (1965) 2 Computer Architecture Trends (1)

More information

Lecture 19 Introduction to Pipelining

Lecture 19 Introduction to Pipelining CSE 30321 Lecture 19 Pipelining (Part 1) 1 Lecture 19 Introduction to Pipelining CSE 30321 Lecture 19 Pipelining (Part 1) Basic pipelining basic := single, in-order issue single issue one instruction at

More information

Advanced Computer Architecture Pipelining

Advanced Computer Architecture Pipelining Advanced Computer Architecture Pipelining Dr. Shadrokh Samavi Some slides are from the instructors resources which accompany the 6 th and previous editions of the textbook. Some slides are from David Patterson,

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

Design a MIPS Processor (2/2)

Design a MIPS Processor (2/2) 93-2Digital System Design Design a MIPS Processor (2/2) Lecturer: Chihhao Chao Advisor: Prof. An-Yeu Wu 2005/5/13 Friday ACCESS IC LABORTORY Outline v 6.1 An Overview of Pipelining v 6.2 A Pipelined Datapath

More information

Processor Architecture

Processor Architecture Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong (jinkyu@skku.edu)

More information

Computer Systems Architecture Spring 2016

Computer Systems Architecture Spring 2016 Computer Systems Architecture Spring 2016 Lecture 01: Introduction Shuai Wang Department of Computer Science and Technology Nanjing University [Adapted from Computer Architecture: A Quantitative Approach,

More information

Thomas Polzer Institut für Technische Informatik

Thomas Polzer Institut für Technische Informatik Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =

More information

Basic Pipelining Concepts

Basic Pipelining Concepts Basic ipelining oncepts Appendix A (recommended reading, not everything will be covered today) Basic pipelining ipeline hazards Data hazards ontrol hazards Structural hazards Multicycle operations Execution

More information

Pipelining. CSC Friday, November 6, 2015

Pipelining. CSC Friday, November 6, 2015 Pipelining CSC 211.01 Friday, November 6, 2015 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not

More information

Appendix A. Overview

Appendix A. Overview Appendix A Pipelining: Basic and Intermediate Concepts 1 Overview Basics of Pipelining Pipeline Hazards Pipeline Implementation Pipelining + Exceptions Pipeline to handle Multicycle Operations 2 1 Unpipelined

More information

Overview. Appendix A. Pipelining: Its Natural! Sequential Laundry 6 PM Midnight. Pipelined Laundry: Start work ASAP

Overview. Appendix A. Pipelining: Its Natural! Sequential Laundry 6 PM Midnight. Pipelined Laundry: Start work ASAP Overview Appendix A Pipelining: Basic and Intermediate Concepts Basics of Pipelining Pipeline Hazards Pipeline Implementation Pipelining + Exceptions Pipeline to handle Multicycle Operations 1 2 Unpipelined

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University The Processor Logic Design Conventions Building a Datapath A Simple Implementation Scheme An Overview of Pipelining Pipelined

More information

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content 3/6/8 CSCI 42: Computer Architectures The Processor (2) Fengguang Song Department of Computer & Information Science IUPUI Today s Content We have looked at how to design a Data Path. 4.4, 4.5 We will design

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor 1 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

Pipelining: Overview. CPSC 252 Computer Organization Ellen Walker, Hiram College

Pipelining: Overview. CPSC 252 Computer Organization Ellen Walker, Hiram College Pipelining: Overview CPSC 252 Computer Organization Ellen Walker, Hiram College Pipelining the Wash Divide into 4 steps: Wash, Dry, Fold, Put Away Perform the steps in parallel Wash 1 Wash 2, Dry 1 Wash

More information

Chapter 5 (a) Overview

Chapter 5 (a) Overview Chapter 5 (a) Overview (a) The principles of pipelining (a) A pipelined design of SRC (b) Pipeline hazards (b) Instruction-level parallelism (ILP) Superscalar processors Very Long Instruction Word (VLIW)

More information

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Dynamic Control Hazard Avoidance

Dynamic Control Hazard Avoidance Dynamic Control Hazard Avoidance Consider Effects of Increasing the ILP Control dependencies rapidly become the limiting factor they tend to not get optimized by the compiler more instructions/sec ==>

More information

ECE260: Fundamentals of Computer Engineering

ECE260: Fundamentals of Computer Engineering Pipelining James Moscola Dept. of Engineering & Computer Science York College of Pennsylvania Based on Computer Organization and Design, 5th Edition by Patterson & Hennessy What is Pipelining? Pipelining

More information

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add

More information

Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation

Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation CSE 564 Computer Architecture Summer 2017 Department of Computer Science and Engineering Yonghong Yan yan@oakland.edu www.secs.oakland.edu/~yan

More information

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI CSCI 42: Computer Architectures The Processor (2) Fengguang Song Department of Computer & Information Science IUPUI Full Datapath Branch Target Instruction Fetch Immediate 4 Today s Contents We have looked

More information

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/sp16 1 Pipelined Execution Representation Time

More information

Overview of Pipelining

Overview of Pipelining EEC 58 Compter Architectre Pipelining Department of Electrical Engineering and Compter Science Cleveland State University Fndamental Principles Overview of Pipelining Pipelined Design otivation: Increase

More information

1 Hazards COMP2611 Fall 2015 Pipelined Processor

1 Hazards COMP2611 Fall 2015 Pipelined Processor 1 Hazards Dependences in Programs 2 Data dependence Example: lw $1, 200($2) add $3, $4, $1 add can t do ID (i.e., read register $1) until lw updates $1 Control dependence Example: bne $1, $2, target add

More information

Topics. Lecture 12: Pipelining. Introduction to pipelining. Pipelined datapath. Hazards in pipeline. Performance. Design issues.

Topics. Lecture 12: Pipelining. Introduction to pipelining. Pipelined datapath. Hazards in pipeline. Performance. Design issues. Lecture 2: Pipelining Topics Introduction to pipelining Performance Pipelined datapath Design issues Hazards in pipeline Types Solutions Pipelining is Natural! Laundry Example Use case scenario Ann, Brian,

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 35: Final Exam Review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Material from Earlier in the Semester Throughput and latency

More information

Lecture 15: Pipelining. Spring 2018 Jason Tang

Lecture 15: Pipelining. Spring 2018 Jason Tang Lecture 15: Pipelining Spring 2018 Jason Tang 1 Topics Overview of pipelining Pipeline performance Pipeline hazards 2 Sequential Laundry 6 PM 7 8 9 10 11 Midnight Time T a s k O r d e r A B C D 30 40 20

More information

Processor (II) - pipelining. Hwansoo Han

Processor (II) - pipelining. Hwansoo Han Processor (II) - pipelining Hwansoo Han Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 =2.3 Non-stop: 2n/0.5n + 1.5 4 = number

More information

Chapter 4 The Processor 1. Chapter 4A. The Processor

Chapter 4 The Processor 1. Chapter 4A. The Processor Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Pipeline Hazards Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hazards What are hazards? Situations that prevent starting the next instruction

More information

Chapter 4 The Processor 1. Chapter 4B. The Processor

Chapter 4 The Processor 1. Chapter 4B. The Processor Chapter 4 The Processor 1 Chapter 4B The Processor Chapter 4 The Processor 2 Control Hazards Branch determines flow of control Fetching next instruction depends on branch outcome Pipeline can t always

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017 Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation

More information

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation

More information

MIPS An ISA for Pipelining

MIPS An ISA for Pipelining Pipelining: Basic and Intermediate Concepts Slides by: Muhamed Mudawar CS 282 KAUST Spring 2010 Outline: MIPS An ISA for Pipelining 5 stage pipelining i Structural Hazards Data Hazards & Forwarding Branch

More information

ELCT 501: Digital System Design

ELCT 501: Digital System Design ELCT 501: Digital System Lecture 8: Pipelining Dr. Mohamed Abd El Ghany, Pipelining: Its Natural! Laundry Example Ann, brian, cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes

More information

CPE Computer Architecture. Appendix A: Pipelining: Basic and Intermediate Concepts

CPE Computer Architecture. Appendix A: Pipelining: Basic and Intermediate Concepts CPE 110408443 Computer Architecture Appendix A: Pipelining: Basic and Intermediate Concepts Sa ed R. Abed [Computer Engineering Department, Hashemite University] Outline Basic concept of Pipelining The

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Lecture 7 Pipelining. Peng Liu.

Lecture 7 Pipelining. Peng Liu. Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt

More information

CMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1

CMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1 CMCS 611-101 Advanced Computer Architecture Lecture 8 Control Hazards and Exception Handling September 30, 2009 www.csee.umbc.edu/~younis/cmsc611/cmsc611.htm Mohamed Younis CMCS 611, Advanced Computer

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information