Computer Architectures. DLX ISA: Pipelined Implementation



The Pipelining Principle

Pipelining is nowadays the main technique used to speed up a CPU. The key idea is general and is applied in several industrial fields (production lines, oil pipelines, ...).

A system S has to execute a task A N times: the inputs $A_1, A_2, A_3, \dots, A_N$ enter S and the results $R_1, R_2, R_3, \dots, R_N$ leave it.

Latency: time elapsed between the beginning and the end of task A ($T_A$).
Throughput: frequency at which tasks are completed.

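The Pipelining Principle

1) Sequential System: the tasks $A_1, A_2, A_3, \dots, A_N$ are executed one after another, each taking $T_A$.
Latency (execution time of a single instruction) = $T_A$
Throughput(1) = $1 / T_A$

2) Pipelined System: each task is split into four phases $P_1, P_2, P_3, P_4$, executed by the stages $S_1, S_2, S_3, S_4$ ($S_i$: pipeline stage).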

The Pipelining Principle

[Figure: timing diagram of tasks $A_1, \dots, A_n$ flowing through stages $S_1, \dots, S_4$; each task starts its phase $P_1$ one pipeline cycle after the previous task.]

$T_P$: pipeline cycle
Latency(2) = $4 \cdot T_P = T_A$
Throughput(2) = $1 / T_P = 4 / T_A = 4 \cdot$ Throughput(1)

The Pipelining Principle (2)

Pipelining does not decrease the time needed to carry out each single task: Latency(2) = Latency(1).

Pipelining, instead, increases the throughput by a factor K equal to the number of stages of the pipeline: Throughput(2) = K * Throughput(1).

This yields a reduction, by the same factor K, of the total execution time $T_N$ of a sequence of N tasks:

$T_N = N / \text{Throughput}$
$T_N(1) = N / \text{Throughput}(1)$, $T_N(2) = N / \text{Throughput}(2)$
Speedup (2 vs 1) $= T_N(1) / T_N(2) = \text{Throughput}(2) / \text{Throughput}(1) = K$

The Pipelining Principle (2)

Ideal case (perfectly balanced pipeline): $T_P = T_{P_i} = T_A / K$, so Speedup = K.
Real case ((slightly) unbalanced pipeline): $T_P = \max(T_{P_1}, T_{P_2}, \dots, T_{P_K})$, so Speedup < K.

Example: $T_A = 20t$ (t: time unit), with $T_{P_1} = 5t$, $T_{P_2} = 5t$, $T_{P_3} = 6t$, $T_{P_4} = 4t$. Then $T_P = 6t$ and
Speedup (2 vs 1) $= T_A / T_P = 20t / 6t \approx 3.33$ (< 4)
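The balanced and unbalanced cases above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function name `pipeline_speedup` is hypothetical, and it models only the steady state (the time needed to fill the pipeline is ignored, as in the formulas above).

```python
def pipeline_speedup(stage_delays):
    """Return (T_A, T_P, speedup) for a pipeline with the given stage delays.

    Assumption: steady-state behaviour only, no register overhead.
    """
    t_a = sum(stage_delays)   # latency of the whole task, T_A
    t_p = max(stage_delays)   # pipeline cycle, set by the slowest stage
    return t_a, t_p, t_a / t_p


if __name__ == "__main__":
    # Example from the slide: 5t, 5t, 6t, 4t
    print(pipeline_speedup([5, 5, 6, 4]))
    # Perfectly balanced pipeline: speedup equals the number of stages
    print(pipeline_speedup([5, 5, 5, 5]))
```

With the slide's stage delays the speedup is 20/6, below the ideal value K = 4 that the balanced pipeline reaches.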

Pipelining in a CPU (DLX)

Tasks $A_1, A_2, A_3, \dots, A_N$ correspond to instructions $I_1, I_2, I_3, \dots, I_N$.

[Figure: the CPU datapath as five combinational circuits (IF, ID, EX, MEM, WB) separated by the pipeline registers (flip-flops) IF/ID, ID/EX, EX/MEM, MEM/WB.]

N.B. This architecture is COMPLETELY different from the sequential one.
Pipeline cycle = clock cycle = delay of the slowest stage.
CPI = 1 (ideally!)

Pipeline in the DLX

Instr i      IF   ID   EX   MEM  WB
Instr i+1         IF   ID   EX   MEM  WB
Instr i+2              IF   ID   EX   MEM  WB
Instr i+3                   IF   ID   EX   MEM  WB
Instr i+4                        IF   ID   EX   MEM  WB

CPI (ideally) = 1

Overhead introduced by the pipeline registers: $T_{clk} = T_d + T_P + T_{su}$
$T_{clk}$: clock cycle
$T_d$: delay of the input stage register
$T_P$: delay of the slowest combinational stage
$T_{su}$: set-up time of the output stage register

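[Figure: a combinational circuit with delay $T_P$ between an input and an output pipeline register, annotated with the delay of the input stage register, the delay of the slowest combinational stage and the set-up time of the output stage register.]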

Requirements for implementation of the pipeline

Each stage has to be active during each clock cycle.

The PC has to be incremented in the IF stage (instead of ID). An ADDER has to be introduced in the IF stage (PC <- PC + 4). Since instructions are aligned, the two least-significant bits are always 0, so in practice a 30-bit register (counter) is incremented by 1 each clock cycle.

Two MDRs are required (referred to as LMDR and SMDR) to handle the situation where a LOAD is immediately followed by a STORE (WB-MEM overlap: two data items waiting to be written, one in memory and the other in the RF, overlap).

At every clock cycle it has to be possible to execute 2 memory accesses (IF, MEM), to the Instruction Memory (IM) and to the Data Memory (DM): Harvard Architecture.

The CPU clock is determined by the slowest stage: IM and DM have to be cache memories (on-chip).

Pipeline registers store both data and control information (the Control Unit is distributed among the pipeline stages).

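DLX Pipelined Datapath

[Figure: the five stages IF, ID, EX, MEM, WB separated by the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, with PC and +4 ADDER, instruction memory, decoder, register file (RS1, RS2, RD), sign extension (SE), "=0?" test, ALU and data memory.]

Annotations recoverable from the figure:
PC: actually a programmable counter, since the two least-significant bits are always 0.
MUX for SCn; MUX for Branch (also <0 and >0), which acts on the output if jumping.
ADDER path for computing the new PC when branching.
JAL and JALR: the PC is stored in R31.
Sign extension (SE): for operations with immediates.
Number of the destination register in case of LOAD and ALU instructions.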

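ID stage (between IF/ID and ID/EX)

[Figure: detail of the ID stage. The IR fields feed the decoder (opcode), the register file (RS1, RS2, RD) and the sign extension unit (SE). Recoverable labels: IR (J, JAL) 26-bit field; IR 15-0 (Offset / Immediate / JR / Branch / Load destination register); IR (Opcode); IR (R instructions); PC 31-0 (JAL, JALR); sign extension over bits 31-16 for Immediate/Branch and over bits 31-26 for Jump; "Info travelling with the instruction"; outputs A and B; Data (from WB stage); Number of the destination register (from WB stage).]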

DLX Pipelined Datapath

[Figure: pipelined datapath with the registers carried through the stages: PC1 and IR1 in IF/ID; PC2, A, B and IR2 in ID/EX; PC3, COND, X, SMDR and IR3 in EX/MEM; PC4, LMDR, Y and IR4 in MEM/WB. MUX for Branch and MUX for SCn (also <0 and >0); JAL and JALR store the PC in R31; the number of the destination register travels with the instruction.]

Legend:
SMDR: Store Memory Data Register
LMDR: Load Memory Data Register
IRi: Instruction Register i
X: ALU output, or DMAR, or Branch Target Address
Y: data computed from previous stages

Pipelined execution of an ALU instruction

IF:   IR <- M[PC]; PC <- PC + 4; PC1 <- PC + 4
ID:   A <- RS1; B <- RS2; PC2 <- PC1; IR2 <- IR1; ID/EX <- instruction decode
EX:   X <- A op B   or   X <- A op ((IR2_15)^16 ## IR2_15..0)   [PC3 <- PC2] [IR3 <- IR2]
MEM:  Y <- X (temporary storage, waiting for WB)   [PC4 <- PC3] [IR4 <- IR3]
WB:   RD <- Y

The decoded opcode is carried along all stages.
NOTE: for these instructions, RS2/RD need to be carried along the pipeline up to the WB stage.
NOTE: IRi bits that are not needed for all instructions are dropped during successive stages. From a stage to the next one, those bits that are needed for all instructions are kept.
X: ALUOUTPUT (in EX/MEM), Y: ALUOUTPUT1
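The immediate operand used in the EX stage, written as (IR_15)^16 ## IR_15..0, is the low 16 bits of the instruction with bit 15 replicated over the upper 16 bits. A minimal sketch of that sign extension follows; the function name `sign_extend_imm16` is hypothetical and the instruction word is assumed to be given as a 32-bit unsigned integer.

```python
def sign_extend_imm16(ir):
    """Return (IR_15)^16 ## IR_15..0 as a 32-bit unsigned value."""
    imm = ir & 0xFFFF                  # IR_15..0: the 16-bit immediate field
    sign = (imm >> 15) & 1             # IR_15: the sign bit of the immediate
    upper = 0xFFFF if sign else 0x0000 # sign bit replicated 16 times
    return (upper << 16) | imm         # '##' is concatenation


if __name__ == "__main__":
    print(hex(sign_extend_imm16(0x00008000)))  # negative immediate
    print(hex(sign_extend_imm16(0x12347FFF)))  # positive immediate
```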

Pipelined execution of a MEM instruction

IF:   IR <- M[PC]; PC <- PC + 4; PC1 <- PC + 4
ID:   A <- RS1; B <- RS2; PC2 <- PC1; IR2 <- IR1; ID/EX <- instruction decode
EX:   X <- A op ((IR2_15)^16 ## IR2_15..0); SMDR <- B   [PC3 <- PC2] [IR3 <- IR2]
MEM:  LMDR <- M[X] (LOAD)   or   M[X] <- SMDR (STORE)   [PC4 <- PC3] [IR4 <- IR3]
WB:   RD <- LMDR (LOAD) [sign ext.]

The decoded opcode is carried along all stages.
X: DMAR (Data Memory Address Register)

Pipelined execution of a BRANCH instruction (normally after a SCn instruction)

IF:   IR <- M[PC]; PC <- PC + 4; PC1 <- PC + 4
ID:   A <- RS1; B <- RS2; PC2 <- PC1; IR2 <- IR1; ID/EX <- instruction decode
EX:   X <- PC2 op ((IR_15)^16 ## IR_15..0); Cond <- A op 0   [PC3 <- PC2] [IR3 <- IR2]
MEM:  if (Cond) PC <- X   [PC4 <- PC3] [IR4 <- IR3]
WB:   (NOP)

The decoded opcode is carried along all stages.
If the branch is taken, the PC is overwritten in the MEM stage.
X: BTA (BRANCH TARGET ADDRESS)
The branch is performed on the current value of register A.

Pipelined execution of a JR instruction

IF:   IR <- M[PC]; PC <- PC + 4; PC1 <- PC + 4
ID:   A <- RS1; B <- RS2; PC2 <- PC1; IR2 <- IR1; ID/EX <- instruction decode
EX:   X <- A   [IR3 <- IR2] [PC3 <- PC2]
MEM:  PC <- X   [IR4 <- IR3] [PC4 <- PC3]
WB:   (NOP)

The decoded opcode is carried along all stages.
What would the stage sequence be for a J instruction?

Pipelined execution of a JAL or JALR instruction

IF:   IR <- M[PC]; PC <- PC + 4; PC1 <- PC + 4
ID:   A <- RS1; B <- RS2; PC2 <- PC1; IR2 <- IR1; ID/EX <- instruction decode
EX:   PC3 <- PC2; [IR3 <- IR2]; X <- A (if JALR)   or   X <- PC2 + ((IR_25)^6 ## IR_25..0) (if JAL)
MEM:  PC <- X; PC4 <- PC3   [IR4 <- IR3]
WB:   R31 <- PC4

The decoded opcode is carried along all stages. In this case the PCi values are used.
NOTE: writing to R31 can NOT be done on-the-fly, since it could overlap with another register write operation.

What would be the sequence in case of SCn (e.g. SLT R1,R2,R3)?

IF:   IR <- M[PC]; PC <- PC + 4; PC1 <- PC + 4
ID:   A <- RS1; B <- RS2; PC2 <- PC1; IR2 <- IR1; ID/EX <- instruction decode
EX, MEM, WB:   ???

Pipeline hazards

A hazard occurs when, in a specific clock cycle, an instruction currently flowing through a pipeline stage cannot be executed in that clock cycle.

Structural Hazards: the same resource is used by two different pipeline stages, so the instructions currently in those stages cannot be executed simultaneously.

Data Hazards: they are due to dependencies between instructions. For example, an instruction needs to read a register not yet written by a previous instruction (Read After Write, RAW).

Control Hazards: the instructions that follow a branch depend on the branch result (taken/not taken).

The instruction that cannot be executed has to be stopped ("pipeline stall" or "pipeline bubbling"), together with all the following instructions, while the previous instructions proceed normally (so as to eliminate the hazard).

Hazards and stalls

The consequence of a data hazard: if instruction I(i) needs the result of instruction I(i-1) (data are read in the ID stage), it has to wait until after the WB of I(i-1).

         Clk1  Clk2  Clk3  Clk4  Clk5  Clk6  Clk7  Clk8  Clk9  Clk10 Clk11 Clk12
I(i-3)   IF    ID    EX    MEM   WB
I(i-2)         IF    ID    EX    MEM   WB
I(i-1)               IF    ID    EX    MEM   WB
I(i)                       IF    S     S     S     ID    EX    MEM   WB
I(i+1)                           IF    S     S     S     ID    EX    MEM   WB

Stall: the clock signal for I(i), I(i+1), ... is stopped for three cycles.

Without stalls: $T_N = N \cdot 1 \cdot CLK$
With the three stalls above: $T_5 = 8 \cdot CLK = (5 + 3) \cdot CLK = 5 \cdot (1 + 3/5) \cdot CLK$
In general: $T_N = N \cdot (1 + S) \cdot CLK$, where 1 is the ideal CPI, S is the number of stalls per instruction and (1 + S) is the effective CPI.
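The relation between stalls and effective CPI can be reproduced with a few lines of code. This is only an illustrative sketch: the function names are hypothetical, and it counts steady-state cycles (one completion per clock plus the stall cycles), as the slide does.

```python
def execution_cycles(n_instructions, stall_cycles):
    """Clock cycles needed to complete n instructions in steady state."""
    return n_instructions + stall_cycles   # ideal CPI of 1 plus the stalls


def effective_cpi(n_instructions, stall_cycles):
    """Effective CPI = 1 + S, with S = stalls per instruction."""
    return execution_cycles(n_instructions, stall_cycles) / n_instructions


if __name__ == "__main__":
    # Example from the slide: 5 instructions, 3 stall cycles
    print(execution_cycles(5, 3))   # number of clock cycles
    print(effective_cpi(5, 3))      # 1 + 3/5
```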

Forwarding

                   Clk1  Clk2  Clk3  Clk4  Clk5  Clk6  Clk7  Clk8  Clk9
ADD R3, R1, R4     IF    ID    EX    MEM   WB
SUB R7, R3, R5           IF    ID    EX    MEM   WB                       hazard
OR  R1, R3, R5                 IF    ID    EX    MEM   WB                 hazard
LW  R6, 100 (R3)                     IF    ID    EX    MEM   WB           hazard
AND R9, R5, R3                             IF    ID    EX    MEM   WB     no hazard

For the LW, too, the data is not yet in the RF, since it is written on the positive clock edge at the end of WB (the register value is read in ID).

Forwarding allows eliminating almost all RAW hazards of the DLX pipeline without stalling the pipeline. (NOTE: in the DLX, registers are modified only in WB.)

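Forwarding implementation

[Figure: datapath from the RF to the ALU across ID/EX, EX/MEM and MEM/WB, with a Forwarding Unit (FU) driving the MUXes at the ALU inputs (A, B, Offset). The FU receives RS1/RS2 and the opcode of the instruction in EX, RD1 (destination register/opcode) and RD2/opcode of the instructions in the following stages.]

Forwarding Unit: comparison between RS1, RS2 and RD1, RD2, and the opcodes.
MUX in front of ID/EX: it allows anticipating the register value on ID/EX. MUX control: IF/ID opcode and comparison of RD with RS1 and RS2 (IF/ID).
Alternatively, SPLIT-CYCLE (see next): write before read, often performed inside the RF.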

Forwarding Unit

Within the Forwarding Unit, the opcodes of the instructions in the EX, MEM and WB stages are decoded. If the instruction in the EX stage needs a register value (either A or B, i.e. it is an ALU instruction, NOT a J or Branch instruction), the opcodes of the instructions in the MEM and WB stages are examined. If they require a register update, the number of the involved register is compared with the register numbers of the instruction in the EX stage. If there is a match, the corresponding data is forwarded to the EX stage, replacing the data read from the register file.

The bypass MUXes (inputs of the ID/EX barrier) are needed because a fetched instruction can require the contents of a register whose number matches that of the instruction in the WB stage (if it must store a register value). In this case the data must be read from the MEM/WB barrier instead of from the register file.

Alternatively, split-cycle: the clock period T is divided into two half-periods. In the first half-period the register is written; in the second half-period the register is read.
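The comparison logic of the Forwarding Unit can be sketched as an operand selector. This is an illustrative model, not the original design: the names `StageInfo` and `select_operand` are hypothetical, the "writes a register" flag stands in for the opcode decoding, and the case of a LOAD in the MEM stage (whose data is not yet available) is deliberately left out. The sketch also assumes that R0 is never forwarded, since in the DLX R0 always reads as zero, and that the instruction in MEM takes priority over the one in WB because it is the more recent writer.

```python
from dataclasses import dataclass


@dataclass
class StageInfo:
    """What the Forwarding Unit needs to know about a later stage."""
    writes_reg: bool   # decoded from the opcode: does it update a register?
    rd: int            # number of the destination register
    value: int         # result travelling with the instruction


def select_operand(rs, rf_value, mem_stage, wb_stage):
    """Return the value the EX stage should use for source register rs."""
    if rs != 0:  # R0 is hardwired to zero, never forwarded
        # Most recent producer first: the instruction currently in MEM
        if mem_stage.writes_reg and mem_stage.rd == rs:
            return mem_stage.value
        # Then the instruction currently in WB
        if wb_stage.writes_reg and wb_stage.rd == rs:
            return wb_stage.value
    return rf_value  # no match: use the data read from the register file


if __name__ == "__main__":
    mem = StageInfo(writes_reg=True, rd=3, value=111)
    wb = StageInfo(writes_reg=True, rd=3, value=222)
    print(select_operand(3, 999, mem, wb))  # forwarded from MEM
    print(select_operand(5, 999, mem, wb))  # read from the register file
```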

Data hazard due to LOAD instructions

LW  R1,32(R6)    IF   ID   EX   MEM  WB
ADD R4,R1,R7          IF   ID   EX   MEM  WB
SUB R5,R1,R8               IF   ID   EX   MEM
AND R6,R1,R7                    IF   ID   EX

NOTE: the datum required by the ADD is available only at the end of the MEM stage. The hazard cannot be eliminated by means of forwarding (unless an additional input is added to the MUXes between memory and ALU and everything is done in the same clock cycle, which means delays: there is a memory access in between, which is already slow by itself!).

The pipeline needs to be stalled:

LW  R1,32(R6)    IF   ID   EX   MEM  WB
ADD R4,R1,R7          IF   ID   S    EX   MEM  WB
SUB R5,R1,R8               IF   S    ID   EX   MEM
AND R6,R1,R7                    S    IF   ID   EX

As a matter of fact, the clock signal is not generated. The clock block is propagated along the pipeline one stage at a time. From the end of the stall onwards: standard forwarding MEM -> EX.

Delayed load

In many RISC CPUs, the hazard associated with the LOAD instruction is not handled by the hardware through pipeline stalling; instead, it is handled in software by the compiler (delayed load):

LOAD instruction
delay slot
Next instruction

The compiler tries to fill the delay slot with a useful instruction (worst case: NOP).

Original sequence:
LW  R1, 32 (R6)
LW  R3, 10 (R4)
ADD R5, R1, R3
LW  R6, 20 (R7)
LW  R8, 40 (R9)

Scheduled sequence:
LW  R1, 32 (R6)
LW  R3, 10 (R4)
LW  R6, 20 (R7)
ADD R5, R1, R3
LW  R8, 40 (R9)
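The effect of the compiler's reordering can be checked by counting load-use pairs, that is, a LOAD immediately followed by an instruction that reads the loaded register. The sketch below is an assumption-laden illustration: the tuple encoding `(opcode, destination, sources)` and the function name `count_load_use_stalls` are hypothetical, and each load-use pair is assumed to cost exactly one stall (or one NOP in the delay slot).

```python
def count_load_use_stalls(program):
    """Count LOADs whose result is read by the very next instruction.

    Each instruction is a tuple (opcode, dest_register, source_registers).
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "LW" and prev_dest in cur_sources:
            stalls += 1
    return stalls


# The two sequences from the slide
original = [
    ("LW", "R1", ("R6",)),
    ("LW", "R3", ("R4",)),
    ("ADD", "R5", ("R1", "R3")),
    ("LW", "R6", ("R7",)),
    ("LW", "R8", ("R9",)),
]
scheduled = [
    ("LW", "R1", ("R6",)),
    ("LW", "R3", ("R4",)),
    ("LW", "R6", ("R7",)),
    ("ADD", "R5", ("R1", "R3")),
    ("LW", "R8", ("R9",)),
]

if __name__ == "__main__":
    print(count_load_use_stalls(original))   # LW R3 is followed by ADD using R3
    print(count_load_use_stalls(scheduled))  # the delay slot holds LW R6
```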

Control Hazards

PC      BEQZ R4, 200
PC+4    SUB  R7, R3, R5
PC+8    OR   R1, R3, R5
PC+12   LW   R6, 100 (R8)

Next instruction address: if R4 = 0, the Branch Target Address (taken); if R4 ≠ 0, PC+4 (not taken).

                          Clk1  Clk2  Clk3  Clk4  Clk5  Clk6  Clk7  Clk8
BEQZ R4, 200              IF    ID    EX    MEM   WB
SUB  R7, R3, R5                 IF    ID    EX    MEM   WB
OR   R1, R3, R5                       IF    ID    EX    MEM   WB
LW   R6, 100 (R8)                           IF    ID    EX    MEM   WB
AND  R9, R5, R3 (at BTA)                          IF    ID    EX    MEM

The new PC value is computed at the end of the EX stage of the branch (ALUOUT); the new value is in the PC one clock later. The fetch with the new PC therefore takes place in Clk 5.

DLX Pipelined Datapath

[Figure: datapath with the stages Instruction Fetch, Instruction Decode (Branch or JMP), Execute, Memory, Write Back and the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, showing BEQZ R4, 200 and the feedback path of the new PC.]

When the new PC acts on the IM, three instructions have already travelled through the first three stages (EX included).

NOTE: if the feedback signal of the new PC were taken directly from the ALU output instead of from ALUOUT, the required stalls would obviously be 2, but the clock would be slower!

Handling the Control Hazards

Always Stall (a three-clock block is propagated)

BEQZ R4,200 goes through IF, ID, EX, MEM, WB in Clk 1 to Clk 5; the following fetches are stalled (S) and the fetch at the new PC takes place afterwards.
(The previous instruction has not been decoded yet. Real situation: repeated, PC <- PC - 4.)

Hypothesis: branch frequency = 25 %
CPI = (1 + S) = (1 + 3 * 0.25) = 1.75

Predict Not Taken

BEQZ R4, 200 is followed into the pipeline by SUB R7, R3, R5, OR R1, R3, R5 and LW R6, 100 (R8). If the branch turns out to be taken, the new value is sampled by the PC at branch completion and the three following instructions are flushed: they become NOPs. No problem, since no instruction has gone through WB!
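The CPI figure for the always-stall policy, and the way predict-not-taken improves on it, can be expressed as two small formulas. This is a hedged sketch: the function names are hypothetical, and the fraction of taken branches used for predict-not-taken is an assumed parameter that does not appear in the slides.

```python
def cpi_always_stall(branch_freq, penalty=3):
    """Every branch costs `penalty` stall cycles."""
    return 1 + branch_freq * penalty


def cpi_predict_not_taken(branch_freq, taken_fraction, penalty=3):
    """Only taken branches pay the penalty (their successors are flushed)."""
    return 1 + branch_freq * taken_fraction * penalty


if __name__ == "__main__":
    # Slide hypothesis: 25 % of the instructions are branches
    print(cpi_always_stall(0.25))
    # Assumption for illustration only: half of the branches are taken
    print(cpi_predict_not_taken(0.25, 0.5))
```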

Stalls with jumps (1/3)

[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) in which, if jumping, a NOP is forced into each of the first three pipeline registers.]

On the first positive clock edge after sampling the assertion of the jumping condition, 3 NOPs must be inserted to replace the 3 unwanted instructions already present in the pipeline.

Stalls when jumping (2/3)

[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) in which, if jumping, a NOP is forced into each of the first two pipeline registers.]

NOTE: in this case the jump condition and the new PC are sent to the MUX in the same clock cycle as the processing of the condition.

On the first positive clock edge after sampling the assertion of the jumping condition, 2 NOPs must be inserted to replace the 2 unwanted instructions.

Stalls when jumping (3/3)

[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) in which, if jumping, a NOP is forced into the IF/ID pipeline register only.]

NOTE: in this case the jumping condition and the new PC control the MUX at the same moment as the processing of the condition.

On the first positive clock edge after the assertion of the jumping condition, a NOP is inserted to replace the instruction currently in the IF/ID stage.

To reduce the number of stalls: independent ALU for BRANCH/JMP

IF:   IR <- M[PC]; PC <- PC + 4; PC1 <- PC + 4
ID:   A <- RS1; B <- RS2; PC2 <- PC1; ID/EX <- decode; ID/EX <- opcode ext.
      BTA <- PC1 + ((IR_15)^16 ## IR_15..0)   or   BTA <- PC1 + ((IR_25)^6 ## IR_25..0)
      if Branch: if (RS1 op 0) PC <- BTA
      if JMP: always PC <- BTA
      (new fetch: only one stall)
EX, MEM, WB: as before

The BTA is computed by an additional full adder in the ID stage.

NOTE: here there is only one stall, since the new value is inserted in the PC on the positive clock edge that ends the ID stage while, in the previous case, it was inserted after the MEM stage, that is, two clocks later!

N.B. The full adder is separate from the +4 adder (this means it works in parallel with the addition required to compute the next instruction address!); otherwise the same adder has to be used together with some multiplexers (to select whether to add 4 or the offset, and whether to use PC or PC1).

BRANCH/JMP: 1 stall

[Figure: IF and ID stages with the standard +4 adder and a separate ADDER for branches, PC1 and PC2, decoder, RF with outputs A and B, "=0?" test, and offset with sign extension (SE) for branches.]

Annotations recoverable from the figure:
Standard addition (+4) and Branch ADDER: displacement of the branch instruction added to the PC of the branch instruction.
The new PC is selected according to the opcode and the value of the branch test register.
PC2: this actually coincides with the current value in PC (can be avoided).

NOTE: for the unconditional jump instructions there is an analogous situation: we only need to provide further inputs to the MUXes of the PC, taking into consideration either the RS1 register (JR and JALR) or the 26 least-significant bits of the IR with SE (J and JAL), to be added to the current PC.

Delayed branch

Similarly to the LOAD case, in several RISC CPUs the hazard associated with BRANCH instructions is handled in software by the compiler (delayed branch):

BRANCH instruction
delay slot
delay slot
delay slot
Next instruction

The compiler tries to fill the delay slots with useful instructions (worst case: NOP).

Delayed branch/jump

Original:
Add R5, R4, R3
Sub R6, R5, R2
Or  R14, R6, R21
Sne R1, R8, R9   ; branch condition
Br  R1, +100

Obviously, in this group of instructions there must be no jumps!

Compiled:
Sne R1, R8, R9   ; branch condition
Br  R1, +100
Add R5, R4, R3   ; executed in both cases
Sub R6, R5, R2   ; executed in both cases
Or  R14, R6, R21 ; executed in both cases

If no suitable instructions are available to be postponed, the compiler inserts NOPs instead.

Handling the Control Hazards

Dynamic Prediction: Branch Target Buffer -> no stall (almost...)

Each entry of the buffer holds: PC tag | predicted PC | T/NT

The PC is compared with the tags:
HIT: fetch with the predicted PC
MISS: fetch with PC + 4

N.B. Here the branch slot is selected during the clock cycle that loads IR1 in IF/ID.

Correct prediction: no stall.
Wrong prediction: 1-3 stalls (depending on the stage in which the correct fetch address becomes available, see before).
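The hit/miss behaviour of the Branch Target Buffer can be sketched with a small lookup table. This is a simplified illustration, not the slide's hardware: the class name `BranchTargetBuffer` and its methods are hypothetical, a Python dictionary keyed by the full PC stands in for the tag comparison, and a hit on an entry marked "not taken" is assumed to fall back to PC + 4.

```python
class BranchTargetBuffer:
    """Toy BTB: maps a branch PC to (predicted PC, taken flag)."""

    def __init__(self):
        self.entries = {}  # PC tag -> (predicted_pc, predicted_taken)

    def next_fetch_pc(self, pc):
        """Address to fetch after the instruction at `pc`."""
        entry = self.entries.get(pc)
        if entry is not None:            # HIT
            predicted_pc, predicted_taken = entry
            if predicted_taken:
                return predicted_pc      # fetch with the predicted PC
        return pc + 4                    # MISS (or predicted not taken)

    def update(self, pc, target, taken):
        """Record the actual outcome once the branch has been resolved."""
        self.entries[pc] = (target, taken)


if __name__ == "__main__":
    btb = BranchTargetBuffer()
    print(btb.next_fetch_pc(0x100))      # miss: sequential fetch
    btb.update(0x100, 0x1C8, taken=True)
    print(btb.next_fetch_pc(0x100))      # hit: predicted target
```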

Prediction Buffer

The simplest implementation uses a single bit that indicates what happened the last time the branch was executed.

[Figure: two nested loops, Loop1 (outer) and Loop2 (inner).]

When exiting Loop2 the prediction fails (the branch is predicted as taken, but it is actually untaken); it then fails again when Loop2 is entered once more, since the branch is now predicted as untaken.

When one outcome is predominant, each occurrence of the opposite outcome causes two consecutive errors.

Hence, usually two bits are used for branch prediction.

[Figure: state diagram of the two-bit predictor, with transitions labelled TAKEN and UNTAKEN.]
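The difference between one-bit and two-bit prediction can be seen by replaying the outcomes of an inner-loop branch. The sketch below is illustrative only: the function names are hypothetical, and the two-bit scheme is modelled as a saturating counter, which is one common variant and not necessarily the exact state diagram of the slide.

```python
def mispredictions_1bit(outcomes, last_taken=True):
    """One bit: predict whatever happened the last time."""
    misses = 0
    for taken in outcomes:
        if last_taken != taken:
            misses += 1
        last_taken = taken           # remember only the last outcome
    return misses


def mispredictions_2bit(outcomes, counter=3):
    """Two bits as a saturating counter: 0-1 predict untaken, 2-3 taken."""
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Move one step towards the actual outcome, saturating at 0 and 3
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


if __name__ == "__main__":
    # Inner-loop branch: taken three times, then untaken on exit,
    # with the inner loop executed three times by the outer loop.
    outcomes = [True, True, True, False] * 3
    print(mispredictions_1bit(outcomes))
    print(mispredictions_2bit(outcomes))
```

With this pattern the one-bit scheme misses on every loop exit and again on the following re-entry, while the two-bit counter misses only on the exits.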

DLX computer. Electronic Computers M

DLX computer. Electronic Computers M DLX computer Electronic Computers 1 RISC architectures RISC vs CISC (Reduced Instruction Set Computer vs Complex Instruction Set Computer In CISC architectures the 10% of the instructions are used in 90%

More information

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017 Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Pipelining: Basic Concepts

Pipelining: Basic Concepts Pipelining: Basic Concepts Prof. Cristina Silvano Dipartimento di Elettronica e Informazione Politecnico di ilano email: silvano@elet.polimi.it Outline Reduced Instruction Set of IPS Processor Implementation

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined

More information

Pipelining. Maurizio Palesi

Pipelining. Maurizio Palesi * Pipelining * Adapted from David A. Patterson s CS252 lecture slides, http://www.cs.berkeley/~pattrsn/252s98/index.html Copyright 1998 UCB 1 References John L. Hennessy and David A. Patterson, Computer

More information

What do we have so far? Multi-Cycle Datapath (Textbook Version)

What do we have so far? Multi-Cycle Datapath (Textbook Version) What do we have so far? ulti-cycle Datapath (Textbook Version) CPI: R-Type = 4, Load = 5, Store 4, Branch = 3 Only one instruction being processed in datapath How to lower CPI further? #1 Lec # 8 Summer2001

More information

LECTURE 3: THE PROCESSOR

LECTURE 3: THE PROCESSOR LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU

More information

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:

More information

Pipeline Architecture RISC

Pipeline Architecture RISC Pipeline Architecture RISC Independent tasks with independent hardware serial No repetitions during the process pipelined Pipelined vs Serial Processing Instruction Machine Cycle Every instruction must

More information

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism

More information

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3. Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good CPU performance equation: T = I x CPI x C Both effective CPI and clock cycle C are heavily influenced by CPU design. For single-cycle CPU: CPI = 1 good Long cycle time bad On the other hand, for multi-cycle

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Thoai Nam Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy & David a Patterson,

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle? CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Instruction Pipelining Review

Instruction Pipelining Review Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number

More information

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/sp16 1 Pipelined Execution Representation Time

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Pipeline Thoai Nam Outline Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy

More information

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19 CO2-3224 Computer Architecture and Programming Languages CAPL Lecture 8 & 9 Dr. Kinga Lipskoch Fall 27 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be

More information

1 Hazards COMP2611 Fall 2015 Pipelined Processor

1 Hazards COMP2611 Fall 2015 Pipelined Processor 1 Hazards Dependences in Programs 2 Data dependence Example: lw $1, 200($2) add $3, $4, $1 add can t do ID (i.e., read register $1) until lw updates $1 Control dependence Example: bne $1, $2, target add

More information

What is Pipelining? RISC remainder (our assumptions)

What is Pipelining? RISC remainder (our assumptions) What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism

More information

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many

More information

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14 MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK

More information

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number

More information

Chapter 4 (Part II) Sequential Laundry

Chapter 4 (Part II) Sequential Laundry Chapter 4 (Part II) The Processor Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Sequential Laundry 6 P 7 8 9 10 11 12 1 2 A T a s k O r d e r A B C D 30 30 30 30 30 30 30 30 30 30

More information

Chapter 4 The Processor 1. Chapter 4A. The Processor
