Module 4c: Pipelining


1 Module 4c: Pipelining. References: Stallings, Computer Organization and Architecture; Morris Mano, Computer Organization and Architecture; Patterson and Hennessy, Computer Organization and Design.

2 Pipeline. The performance of a computer can be increased by increasing the performance of the CPU. One way to do this is to execute more than one task at a time, a technique referred to as pipelining. The idea of pipelining is to allow the processing of a new task to begin even though the processing of the previous task has not ended.

3 The root of the single-cycle processor's problems is that the cycle time has to be long enough for the slowest instruction. Solution: break the instruction into smaller steps and execute each step (instead of the entire instruction) in one cycle. The cycle time is then the time it takes to execute the longest step, so all the steps should be kept at similar lengths. This is the essence of the multiple-cycle processor.

4 The advantages of the multiple-cycle processor: the cycle time is much shorter, and different instructions take different numbers of cycles to complete (a load takes five cycles, a jump only three). It also allows a functional unit to be used more than once per instruction.

5 Pipeline. A single task is divided into several small independent processes (segments). [Figure: processes T1, T2, T3 flowing through Segment 1, Segment 2 and Segment 3.]

6 Analogy: Pipelined Laundry. Non-pipelined approach: 1. run one load of clothes through the washer, 2. run the load through the dryer, 3. fold the clothes (optional step for students), 4. put the clothes away (also optional). Two loads? Start all over.

7 Analogy: Pipelined Laundry. While the first load is drying, put the second load in the washing machine. When the first load is being folded and the second load is in the dryer, put the third load in the washing machine.

8 [Figure: time versus task-order diagram for four loads A, B, C, D starting at 6 PM.] Non-pipelined: 16 units of time. Pipelined: 7 units of time. Notice that all the processes have the same length.

9 Pipeline: Space-Time Diagram. Illustrates the behaviour of a pipeline. [Figure: segments S1-S4 versus clock cycles, with processes P1-P4 advancing one segment per clock; S = segment, P = process.]

10 Pipelining Lessons. Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload. Multiple tasks operate simultaneously using different resources. Potential speedup = number of pipe stages. The pipeline rate is limited by the slowest pipeline stage, and unbalanced lengths of pipe stages reduce speedup. The time to fill the pipeline and the time to drain it also reduce speedup. Stall for dependences.

11 Design Issues. Since the segments are connected in sequence, the next segment cannot start execution until it has received the result from the previous segment (in this case, pipelining is not ideal). So the cycle time of the segments must be the same. However, the execution time of each segment is generally not the same; therefore, for synchronization, the cycle time of the pipeline is based on the longest execution time among the segments.

12 Pipeline performance: Degree of Speedup. Let t_n be the cycle time without pipelining and t_p the cycle time with pipelining. An ideal pipeline divides a task into k independent sequential processes, each requiring t_p time units, so the task itself requires k*t_p time units. For n iterations of the task, the execution times are: with no pipelining, n*t_n time units; with pipelining, k*t_p + (n-1)*t_p = (k + (n-1))*t_p time units. The degree of speedup is thus S = (execution time without pipelining) / (execution time with pipelining) = n*t_n / [(k + (n-1))*t_p]. If n is much larger than k (n >> k) and t_n = k*t_p, the maximum speedup is S_max = k. (In an ideal pipeline, each process requires the same time unit.)
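As a quick aid, here is a minimal Python sketch (not part of the original slides; the function names are our own) that evaluates the formulas above:

def pipeline_time(k, n, t_p):
    """Execution time of n tasks on a k-segment pipeline: (k + (n-1)) * t_p."""
    return (k + (n - 1)) * t_p

def speedup(k, n, t_n, t_p):
    """S = n*t_n / [(k + (n-1)) * t_p]."""
    return (n * t_n) / pipeline_time(k, n, t_p)

# When t_n = k*t_p and n >> k, S approaches the maximum speedup k:
print(speedup(k=4, n=1_000_000, t_n=4 * 50, t_p=50))   # ~3.99999, i.e. -> 4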

13 The Laundromat. In a Laundromat the following processes occur: run one load of clothes through the washer: 8; run the load through the dryer: 4; fold the clothes: 5; put the clothes away: 3. Answer these: Determine the cycle time for the pipelined and non-pipelined process. Determine the execution time for both pipelined and non-pipelined execution. What is the maximum speedup possible? What is the real speedup value?

14 Non-Ideal Pipeline Structure: Example. The operands pass through all segments in a fixed sequence. Segments are separated by registers that hold the intermediate results between stages; each segment consists of a circuit that performs a sub-operation. [Figure: a control unit driving registers R1..Rm and segments S1..Sm, with Data In entering Segment 1 and Data Out leaving Segment m.]

15 Example 1: Pipeline (Non-ideal case). Given a 4-segment pipeline where each segment has the following delay time: Segment 1: 40 ns, Segment 2: 25 ns, Segment 3: 45 ns, Segment 4: 45 ns. The delay time for the interface register is 5 ns. Calculate: i) the cycle time of the non-pipelined and pipelined versions, ii) the execution time for 100 tasks, iii) the real speedup, and iv) the maximum speedup.

16 Example 1: Pipeline. [Figure: control unit with Data Input flowing through Segment 1 (40 ns), Segment 2 (25 ns), Segment 3 (45 ns) and Segment 4 (45 ns) to Data Output.] i) Cycle time: t_n = (40 + 25 + 45 + 45 + 5) ns = 160 ns; t_p = longest segment delay + interface register delay = (45 + 5) ns = 50 ns.

17 Example 1: Pipeline. ii) Execution time for 100 tasks = k*t_p + (n-1)*t_p = (k + (n-1))*t_p = ((4 * 50) + (99 * 50)) ns = (50 * 103) ns = 5150 ns. For the non-pipelined system, the total execution time for 100 tasks = n*t_n = 100 * 160 ns = 16000 ns. iii) The real speedup for 100 tasks: speedup = execution time non-pipelined / execution time pipelined = 16000 / 5150 ≈ 3.11. iv) Maximum speedup: S_max = k = 4.
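The numbers in Example 1 can be re-checked with the helper functions sketched earlier (delays in ns; the variable names are illustrative, not from the slides):

segments = [40, 25, 45, 45]
reg = 5                          # interface register delay
k, n = len(segments), 100

t_n = sum(segments) + reg        # 160 ns (non-pipelined cycle time)
t_p = max(segments) + reg        # 50 ns  (pipelined cycle time)

print(pipeline_time(k, n, t_p))  # 5150 ns
print(n * t_n)                   # 16000 ns
print(speedup(k, n, t_n, t_p))   # ~3.11 real speedup; S_max = k = 4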

18 Instruction Pipeline. The instruction cycle clearly shows the sequence of operations that take place to execute a single instruction. A good design goal of any system is to have all of its components performing useful work all of the time (high efficiency). Following the instruction cycle in a purely sequential fashion does not permit this level of efficiency. Analogy: an automobile assembly line performs all tasks concurrently, but on different (sequential) instructions. The result is temporal parallelism, i.e. the instruction pipeline. This causes the instruction fetch and execute phases to overlap and perform simultaneous operations.

19 Implementation of Instruction Pipeline: Case 1. Divide the instruction cycle into two processes: instruction fetch (fetch cycle) and everything else (execution phase). While one instruction is in execution, pre-fetch the next instruction. This assumes the memory bus will be idle at some point during the execution phase and, in the ideal situation, reduces the time to fetch an instruction to zero. Instruction prefetching is also known as fetch overlap.

20 Implementation of Instruction Pipeline: Case 1. [Figure: sequential execution runs Fetch #1, Execute #1, Fetch #2, Execute #2 one after another; the pipelined version overlaps Fetch #2 with Execute #1.]

21 Implementation of Instruction Pipeline: Case 1. Fetch and execute are not the same size: they take different times to operate, so this is not really neat or ideal; execution is generally longer than fetch. Whenever the execution unit is not using memory, the control unit increments the program counter and reads consecutive instructions from memory (efficient): fetch them and put them into a queue. But sometimes execution needs to branch, so all the prefetching becomes useless (e.g. instruction 123 is needed rather than 102): flush the queue and fetch again. Also, if there is a branch, fetch must wait for the target address from execute, and execute must wait until that address is fetched.

22 Implementation of Instruction Pipeline: Case 2. Complex computer instructions require other, finer phases, and having more stages can give further speedup. Example: use a 6-stage pipeline: Fetch instruction (FI): read the next instruction into a buffer; Decode instruction (DI): determine the opcode and operand specifiers; Calculate operands (CO): calculate the effective address of each operand; Fetch operands (FO): fetch the operands from memory; Execute instruction (EI); Write (store) operand (WO): store the result in memory. The various stages will be closer to equal duration; let's assume equal duration.

23 Implementation of Instruction Pipeline: Case 2. Use multiple execution functional units to parallelize the actual execution phase of several instructions. Use branching strategies to minimize the impact of branches.

24 Space-Time Diagram. A 6-stage pipeline can reduce the execution time of 9 instructions from 54 to 14 time units. Non-pipelined: 9 instructions x 6 stages = 54 time units. Pipelined: 6 + (9 - 1) = 14 time units.
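A minimal sketch of our own (not from the slides) that prints this ideal space-time diagram for the 6-stage pipeline and 9 instructions:

stages = ["FI", "DI", "CO", "FO", "EI", "WO"]
n_instr = 9
width = len(stages) + n_instr - 1              # total clock cycles = 14

for s, name in enumerate(stages):
    row = [""] * width
    for i in range(n_instr):
        row[i + s] = f"I{i + 1}"               # instruction i occupies stage s at cycle i+s
    print(f"{name}: " + " ".join(f"{c:>3}" for c in row))

print("Pipelined time units:", width)            # 14
print("Non-pipelined:", len(stages) * n_instr)   # 54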

25 Comments on the diagram. It assumes each instruction goes through all 6 stages, which is not always true (example: a load instruction does not need the WO stage). It assumes there are no memory conflicts (example: FI, FO and WO all involve memory access and cannot be done simultaneously, unless the FO or WO stage is null or the value needed is already in the cache, in which case this is not a problem). It assumes no interrupt or branching happens. It assumes no data dependency, where the CO stage may depend on the contents of a register that could be altered by a previous instruction still in the pipeline.

26 6-stage CPU Instruction Pipeline

27 Example 2. The estimated timings for each stage of an instruction pipeline: instruction fetch 2 ns, register read 1 ns, ALU operation 2 ns, data memory access 2 ns, register write 1 ns.

28 [Figure: program execution of mov ax, num; mov bx, num2; mov cx, num3 through the stages instruction fetch, register read, ALU, data access and register write. Without pipelining a new instruction starts every 8 ns (the sum of the stage times); with pipelining a new instruction starts every 2 ns (the longest stage time).]
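A quick check of Example 2 in Python (all times in ns; stage names as on the slide, the dictionary itself is our own framing):

stage_times = {"Instruction fetch": 2, "Register read": 1,
               "ALU operation": 2, "Data memory": 2, "Register write": 1}

t_n = sum(stage_times.values())   # 8 ns: a new instruction every 8 ns without pipelining
t_p = max(stage_times.values())   # 2 ns: a new instruction every 2 ns with pipelining
print(t_n, t_p)                   # 8 2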

29 Pipeline Limitations

30 Pipeline Limitations. These difficulties are sometimes called limitations or hazards. In general there are 3 major difficulties that cause an instruction pipeline to deviate from its normal operation (i.e. affect pipeline performance): Resource conflict (or resource hazard or structural hazard), caused by access to memory by 2 segments at the same time. Data dependency (or data hazard): a conflict arises when an instruction depends on the result of a previous instruction, but this result is not yet available. Branch difficulties (or control hazard): arise from branch and other instructions that change the PC value. Pipeline depth is often not included in the list, but it does have an effect on pipeline performance (so it will be discussed).

31 Pipeline Limitations. Hazards in the pipeline can make the pipeline stall. Eliminating a hazard often requires that some instructions in the pipeline be allowed to proceed while others are delayed. When an instruction is stalled, instructions issued later than the stalled instruction are stopped, while the ones issued earlier must continue. No new instructions are fetched during the stall.

32 Pipeline Limitation: Pipeline depth. If the speedup is based on the number of stages, why not build lots of stages? Each stage uses latches at its input (output) to buffer the next set of inputs, so more stages mean more hardware overhead. If the stage granularity is reduced too much, the latches and their control become a significant hardware overhead. There is also a time overhead in the propagation time through the latches, which limits the rate at which data can be clocked through the pipeline. The logic to handle memory and register use and to control the overall pipeline increases significantly with increasing pipeline depth.

33 "6 deep", i.e. a 6-stage pipeline.

34 Pipeline Limitation: Data dependency. Pipelining must ensure that computed results are the same as if the computation were performed in strict sequential order. With multiple stages, two instructions in execution in the pipeline may have data dependencies, so the pipeline must be designed to prevent this. Data dependencies limit when an instruction can be input to the pipeline. Data dependency is shown in the following portion of a program: A = B + C; D = E + A; C = G x H; A = D / H. Here D needs A, but the value of A cannot be read because it is still being written by the previous instruction.

35 Say that this is a 5-stage pipeline: I1: ADD AX, BX ; [AX] <-- [AX] + [BX] and I2: SUB AX, CX ; [AX] <-- [AX] - [CX]. The ADD instruction does not update register AX until stage 5, but the SUB instruction needs the value at the beginning of stage 3, and it cannot fetch something that is not ready. To maintain correct operation, the pipeline must stall for 2 clock cycles, resulting in inefficient pipeline usage.

36 Solutions to Data Dependency. Hardware interlocks: an interlock is a circuit that detects instructions with a data dependency and inserts the required delays to resolve conflicts. Operand forwarding: use a circuit to detect a possible conflict, then use extra hardware to keep results for future use down the pipeline, allowing the result of the ALU to be sent directly as an input to the ALU for use by ALU operations in the next instruction cycle. Delayed load (NOP): the compiler is used to detect conflicts and reorder instructions to delay the loading of conflicting data, using the NOP instruction. A small sketch of the NOP idea follows below.
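The following toy Python sketch illustrates the compiler-side "delayed load" idea: insert NOPs when an instruction reads a register that the previous instruction writes. The (dest, sources) tuples are a simplification of our own, not a real ISA model; the 2-cycle stall matches the 5-stage ADD/SUB example above.

def insert_nops(program, stalls=2):
    """program: list of (dest_reg, (src_regs...)) tuples."""
    out, prev_dest = [], None
    for dest, srcs in program:
        if prev_dest is not None and prev_dest in srcs:
            out.extend(["NOP"] * stalls)      # delay the dependent instruction
        out.append((dest, srcs))
        prev_dest = dest
    return out

prog = [("AX", ("AX", "BX")),   # ADD AX, BX
        ("AX", ("AX", "CX"))]   # SUB AX, CX (needs the AX written by ADD)
print(insert_nops(prog))        # two NOPs appear between the instructions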

37 Solution to Data Dependency: Operand Forwarding. [Figure: ALU with inputs Src1, Src2 and result RSLT; a forwarding data path feeds the ALU result back to the ALU input, allowing the result of the ALU to be used directly as an ALU input instead of waiting for the operand store.]

38 Solution to Data Dependency: Delayed Load (NOP). [Space-time diagram: I1, NOP, I2, I3, I4 flow through the stages Fetch Instruction, Decode Instruction, Execute Instruction, Store Result, with the NOP separating I1 and I2 in every stage.]

39 Pipeline Limitation: Conflict of Resources. Occurs when two segments need to access memory at the same time. Can be solved by implementing modular memory. [Space-time diagram: instructions I1-I5 in a 4-stage pipeline (Fetch Instruction, Decode Instruction, Execute Instruction, Store Result); the fetch of I4, the fetch of an indirect operand for I3 and the store/write of I1 all need access to memory at the same time.]

40 5-stage pipeline, ideal case. We have a conflict with 2 instructions needing the same resource. Assume that memory has a single port, so data reads and writes can only happen one at a time, and assume that the source operand for I1 is in memory. A delay in any stage can cause pipeline stalls: the FI stage for I3 must idle for 1 clock cycle before beginning.

41 Handling Resource Conflict. This scenario describes a memory conflict caused by the instruction fetch of I3 and the memory-resident operand fetch of I1.

42 Handling Resource Conflict. A Harvard architecture (separate instruction and data memories) alleviates this issue.

43 Pipeline Limitation 4: Branching. For the pipeline to have the desired operational speedup, we must feed it with long strings of instructions; there must be a steady flow of instructions. However, 15-20% of instructions in an assembly-level stream are (conditional) branches, and of these, 60-70% take the branch to a target address. For a conditional branch it is impossible to determine whether the branch will be taken until it is actually executed. The impact of branches is that the pipeline never really operates at its full capacity, limiting the performance improvement that is derived from the pipeline.

44 Example 4: Branching. [Space-time diagram: once the branch is resolved in the Execute stage, the instructions already fetched and decoded behind it must be flushed out, leaving idle slots in the Fetch Instruction, Decode Instruction, Execute Instruction and Store Result stages.]

45 Solutions for Branching: delayed branch (using NOP), rearranging the instructions, and implementation of an instruction queue.

46 Solution for Branching: Delayed Branch Using NOP. When the compiler detects a branch, it automatically inserts several NOPs so that there are no interruptions in the pipeline. Example: the sequence I4 MOV, I5 INC, I6 ADD, I7 RET KK becomes I4 MOV, I5 INC, I6 ADD, I7 RET KK, NOP, NOP, NOP (before I8). The number of NOPs used is 1 less than the number of segments: if k = number of segments, then NOPs to be used = k - 1 = 4 - 1 = 3. A small sketch of this rule follows below.
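A toy Python sketch of the delayed-branch rule just described: after every branch the compiler inserts k - 1 NOPs (k = number of pipeline segments). Instructions here are plain strings, and treating "JMP"/"RET" mnemonics as branches is our own simplifying assumption.

def delay_branches(program, k=4, branches=("JMP", "RET")):
    out = []
    for instr in program:
        out.append(instr)
        if instr.split()[0] in branches:      # k - 1 NOPs after each branch
            out.extend(["NOP"] * (k - 1))
    return out

print(delay_branches(["MOV", "INC", "ADD", "RET KK"]))
# ['MOV', 'INC', 'ADD', 'RET KK', 'NOP', 'NOP', 'NOP']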

47 Solution for Branching: Delayed Branch Using NOP. [Space-time diagram: I4, I5, I6, I7, NOP, NOP, NOP, KK flow through the four stages Fetch Inst. (FI), Decode Inst. (DI), Execute Instr. (EI), Store Result.] With 4 segments, 3 NOPs are used, wasting 3 pipeline clock cycles per stage.

48 Solution for Branching: Rearranging Instructions. Tasks are rearranged so that the pipeline can operate efficiently. How to rearrange? Bring the branch instruction up a few notches; the number of notches = number of segments - 1 = k - 1 = 4 - 1 = 3. [Space-time diagram: the original order Mov, Add, Inc, Ret becomes Ret, Mov, Add, Inc, and the instructions flow through the stages Fetch Instruction, Decode Instruction, Execute Instruction, Store Result without idle slots.]

49 Solution for Branching: Rearranging Instructions. Original order: I4 MOV, I5 INC, I6 ADD, I7 RET KK, I8. How to rearrange? Bring the branch instruction up a few notches; the number of notches = number of segments - 1 = k - 1 = 4 - 1 = 3, so the order becomes I7 RET KK, I4 MOV, I5 INC, I6 ADD. [Space-time diagram: I7, I4, I5, I6, KK flow through Fetch Inst. (FI), Decode Inst. (DI), Execute Instr. (EI), Store Result with no flushed instructions.]

50 Example: Rearranging Instructions. I1 MOV AX,BX; I2 ADD BX,CX; I3 INC CX; I4 JMP LOSSY; I5 MOV DX,1; I6 INC BX; ... I11 LOSSY: DEC AX. Pipeline with branching hazard: [Space-time diagram: I1-I4 proceed normally through Fetch Inst. (FI), Decode Inst. (DI), Execute Instr. (EI), Store Result; I5 and I6 are fetched but flushed out when the JMP executes, and the pipeline sits idle until I11 enters.]

51 Example: Rearranging Instructions. Bring the JMP instruction up by 3 notches: the order I1 MOV AX,BX; I2 ADD BX,CX; I3 INC CX; I4 JMP LOSSY; I5 MOV DX,1; I6 INC BX; ... I11 LOSSY: DEC AX becomes I4 JMP LOSSY; I1 MOV AX,BX; I2 ADD BX,CX; I3 INC CX; I5 MOV DX,1; I6 INC BX; ... Pipeline after rearranging instructions: [Space-time diagram: I4, I1, I2, I3, I11 flow through Fetch Inst. (FI), Decode Inst. (DI), Execute Instr. (EI), Store Result with no flushed instructions.]

52 Solution for Branching: Instruction Queue. Prefetch the target instruction: prefetch both possible next instructions in the case of a conditional branch. [Figure: an instruction queue holding S1, S2, JMP, S4 feeding the stages Fetch Instruction, Fetch Operand, Execute Instruction, Store Result.] When the FI segment detects a JMP, the next instruction address corresponding to the JMP is determined; S4 is deleted from the queue and the new instruction is fetched.

53 Example 5: Instruction Pipeline. Given a pipeline that consists of 5 segments: Fetch Instruction, Decode Instruction, Fetch Operand, Execute Instruction and Store Result. Say n = 3: I_n: MOV AX, NUM ; AX <-- NUM. I_n+1: ADD BX, AX ; BX <-- BX + AX. I_n+2: JMP LOOP1 ; jump to I_n+10. I_n+3: ... I_n+10: LOOP1: ...

54 I3 MOV AX,NUM ; AX <-- NUM. I4 ADD BX, AX ; BX <-- BX + AX. I5 JMP LOOP1 ; jump to I13. I6 ... I13 LOOP1: Here we have a branching problem. Draw the space-time diagram for the execution of the instructions and identify the branching and data dependency hazards. [Space-time diagram: I3-I8 enter the stages Fetch Inst. (FI), Decode Inst. (DI), Fetch Opr. (FO), Execute Instr. (EI), Store Result; when I5 (JMP) is executed, all instructions fetched after it are flushed, and the pipeline is idle until the JMP completes and I13 is fetched and completed.]

55 I3 MOV AX,NUM ; AX <-- NUM. I4 ADD BX, AX ; BX <-- BX + AX. I5 JMP LOOP1 ; jump to I13. I6 ... I13 LOOP1: Here we also have a data dependency problem: I4 needs AX, but AX is still being processed by I3. [Space-time diagram: the same execution as before; the data dependency shows up where I4 must wait for the result of I3.]

56 Solve the data dependency problem using NOP instructions: insert a NOP between I3 and I4, giving the sequence I3, NOP, I4, I5, ... [Space-time diagram: I3, NOP, I4, I5, ..., I13 flow through Fetch Inst. (FI), Decode Inst. (DI), Fetch Opr. (FO), Execute Instr. (EI), Store Result.]

57 Solve the branching problem by rearranging the instructions. The number of notches = number of segments - 1 = 5 - 1 = 4, so the sequence I3, NOP, I4, I5, ... becomes I5, I2, I3, NOP, I4, ... [Space-time diagram: I5, I2, I3, NOP, I4, ..., I13 flow through Fetch Inst. (FI), Decode Inst. (DI), Fetch Opr. (FO), Execute Instr. (EI), Store Result.]

58 Solve the branching problem using NOP instructions. The sequence is I3, NOP, I4, I5, ... and the number of NOPs = number of segments - 1 = 5 - 1 = 4. [Space-time diagram: I3, NOP, I4, I5, NOP, NOP, NOP, NOP, I13 flow through Fetch Inst. (FI), Decode Inst. (DI), Fetch Opr. (FO), Execute Instr. (EI), Store Result.]

59 Try this out. Given a pipeline that consists of 4 segments: Fetch Instruction, Decode Instruction, Execute Instruction and Store Result. Use this information, with the program I1 MOV AX,BX; I2 SUB BX,2; I3 ADD BX,NUMB; I4 JMP LOSSY; I5 INC BX; I6 DEC NUMB; I7 LOSSY: DEC AX, to answer the following: Draw the space-time diagram for the execution. Label clearly the data dependency problem, the flushed instructions and the empty/idle slots. Solve the data dependency using NOP instructions. Solve the branching problem by rearranging the instructions. Draw a new space-time diagram.

60 I1 MOV AX,BX; I2 SUB BX,2; I3 ADD BX,NUMB; I4 JMP LOSSY; I5 INC BX; I6 DEC NUMB; I7 LOSSY: DEC AX. Draw the space-time diagram for the execution; label the data dependency problem, the flushed instructions and the empty/idle slots. [Space-time diagram: I1-I7 in the stages Fetch Inst. (FI), Decode Inst. (DI), Execute Instr. (EI), Store Result; the data dependency appears between I2 and I3 (I3 needs the BX value written by I2), I5 and I6 are flushed when the JMP executes, and idle slots appear until I7 enters the pipeline.]

61 I1 MOV AX,BX; I2 SUB BX,2; I3 ADD BX,NUMB; I4 JMP LOSSY; I5 INC BX; I6 DEC NUMB; I7 LOSSY: DEC AX. A) Solve the data dependency using NOP instructions: I1, I2, NOP, I3, I4. B) Solve the branching problem by rearranging the instructions: I1, I4, I2, NOP, I3. C) Draw a new space-time diagram. [Space-time diagram: I1, I4, I2, NOP, I3, I7 flow through Fetch Inst. (FI), Decode Inst. (DI), Execute Instr. (EI), Store Result.]

62 Arithmetic Pipeline. The processing of a complex arithmetic operation is subdivided into several task segments. Every parameter involved in the arithmetic operation is processed in the segments, controlled by the CPU clock. Examples: S_i = A_i * B_i + C_i; Z_i = A_i * B_i + C_i * D_i; K_i = A_i * B_i + C_i / D_i.

63 Example 6: Designing an Arithmetic Pipeline. Design an arithmetic pipeline for the operation S = A_i * B_i + C_i where i = 1, 2, 3, ..., n. Specifications: a 3-segment pipeline; each segment is allowed to process only two tasks. Components: register (R), adder (A), multiplier (M).

64 Example 6. S = A_i * B_i + C_i where i = 1, 2, 3, ..., n. The sequence of operations (following the standard mathematical order): access the required data into the registers, perform the multiplication, then perform the addition.

65 Example 6: S = A_i * B_i + C_i. Segment definition: Segment 1: Reg1 <-- A_i, Reg2 <-- B_i. Segment 2 (multiply): Reg3 <-- A_i * B_i, Reg4 <-- C_i. Segment 3 (add): Reg5 <-- Reg3 + Reg4, S <-- Reg5. Notice that each segment executes only 2 sub-tasks.
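A minimal Python model of this 3-segment pipeline (our own sketch, not from the slides); for simplicity C_i is carried along from segment 1 instead of being read directly in segment 2, which produces the same results:

def arithmetic_pipeline(A, B, C):
    seg1 = None        # output register of segment 1: (A_i, B_i, C_i)
    seg2 = None        # output register of segment 2: (A_i*B_i, C_i)
    results = []
    for i in range(len(A) + 2):                        # +2 cycles to drain the pipe
        if seg2 is not None:                           # segment 3: add
            results.append(seg2[0] + seg2[1])
        seg2 = (seg1[0] * seg1[1], seg1[2]) if seg1 is not None else None   # segment 2: multiply
        seg1 = (A[i], B[i], C[i]) if i < len(A) else None                   # segment 1: load registers
    return results

print(arithmetic_pipeline([1, 2, 3], [4, 5, 6], [7, 8, 9]))    # [11, 18, 27]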
