Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining

1 Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture Pipelining

2 Recall
- Pipelining is parallelizing execution
- Key to speedups in processors
- Split instruction execution into stages
- The five stages for MIPS execution
- Add memory to store state
OCC - CS/CIS CS116-Ch00-Orientation. 1998 Morgan Kaufmann Publishers (Augmented & Modified by M. Malaty and M. Beers)

3 Pipeline Hazards
- Hazard: a situation in which the next instruction cannot execute in the following clock cycle
- Types of hazards:
  - Structural hazards
  - Control hazards
  - Data hazards

4 Structural Hazards
- Instructions use the same resource in different ways at the same time, and the hardware cannot support the combination
- Example: a single memory for instructions & data
  - With four or more instructions in the pipeline, the 1st instruction can be accessing data while the 4th instruction is being fetched
  - Both need to access the memory in the same clock cycle
- Since MIPS was designed with two distinct memories, we don't encounter this problem => no hazard

5 Structural Hazards
- MIPS can easily avoid other structural hazards
- We can always resolve hazards by waiting
  - Pipeline control must detect the hazard
  - Then take action (or delay action) to resolve it

6 Structural Hazards
- A single memory is a structural hazard
[Figure: pipeline diagram, time (clock cycles) against instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4), each instruction passing through Mem, Reg, ALU, Mem, Reg]
- Two memory accesses fall in the same clock cycle if the same memory is used
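The single-memory conflict can be checked with a short sketch. It assumes the five-stage pipeline above (written here as IF, ID, EX, MEM, WB) with one instruction fetched per cycle; the function name `memory_conflicts` and the tuple layout are illustrative choices, not part of the slides.

```python
# Assumed 5-stage pipeline, one instruction fetched per clock cycle.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def memory_conflicts(uses_data_memory):
    """Return (cycle, fetching_index, data_access_index) for every clock cycle
    in which a single shared memory would be needed by both IF and MEM.

    uses_data_memory[i] is True when instruction i (0-based) is a load or store.
    Instruction i occupies stage s (0-based) during cycle i + s + 1.
    """
    conflicts = []
    n = len(uses_data_memory)
    for i, is_mem in enumerate(uses_data_memory):
        if not is_mem:
            continue
        mem_cycle = i + STAGES.index("MEM") + 1
        fetch_idx = mem_cycle - 1  # the instruction whose IF falls in mem_cycle
        if fetch_idx < n:
            conflicts.append((mem_cycle, fetch_idx, i))
    return conflicts

# A load followed by four other instructions, as in the diagram.
print(memory_conflicts([True, False, False, False, False]))
```

With a load first, the only conflict reported is in cycle 4, between the load's data access and the fetch of the fourth instruction.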

7 Control Hazards
- An attempt to make a decision, based on the result of one instruction, before the condition is evaluated (caused by branch instructions)
- First solution: pipeline stall (bubble)
  - Pause (wait) before continuing the pipeline, until the decision is clear
  - Calculate the branch address and update the PC during the second stage

8 Control Hazards
- First solution: pipeline stall (bubble)
  - The next instruction halts until the condition result is known
  - It is as if a no-operation were inserted in the 3rd step
  - The next instruction will be executed in the 4th step
[Figure: program execution order against time; each instruction passes through Instruction fetch, Reg, ALU, Data access, Reg]
  add $4, $5, $6
  beq $1, $2, 40      (fetched 2 ns after add)
  lw $3, 300($0)      (fetched 4 ns after beq; the period in between has no fetch, a bubble)
- The delay is 4 ns only after adding the extra HW

9 Control Hazards
- Disadvantage of the first solution (stall): stalls slow down the pipeline
- Second solution: predict
  - Guess one direction, then back up if wrong
  - Always predict that the branch will not be taken
  - If you are right, the pipeline proceeds at full speed (1 clock cycle)
  - If you are wrong, do a pipeline stall (2 clock cycles)

10 Reduce Delay of Branches
- Move branch execution earlier in the pipeline
- Test the beq condition using XOR instead of subtraction
  - Faster, since no carry is required
[Figure: pipelined datapath with an IF.Flush signal, hazard detection unit, forwarding unit, control, the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, and an equality comparator (=) at the register file outputs]

11 Control Hazards
- Second solution: predict (fig. 6.50)
  - For beq, the branch decision is made in cycle 4
  - The 3 following instructions will be fetched & begin execution
  - If the branch is not taken, no time is lost (no stall)
  - If the branch is taken, these instructions have to be flushed
  - Flushing usually replaces the instruction with a nop instruction
[Figure: program execution order against time in clock cycles (CC 1 onward); each instruction passes through IM, Reg, DM, Reg]
     beq $1, $3, 72
  44 and $12, $2, $5
  48 or $13, $6, $2
  52 add $14, $2, $2
  72 lw $4, 50($7)

12 Control Hazards
- Second solution: predict
  - Pipeline when the branch is not taken: no time is wasted
[Figure: program execution order against time; each instruction passes through Instruction fetch, Reg, ALU, Data access, Reg]
  add $4, $5, $6
  beq $1, $2, 40      (fetched 2 ns after add)
  lw $3, 300($0)      (fetched 2 ns after beq)

13 Control Hazards
- Second solution: predict
  - When the branch is taken, 2 ns are wasted
  - The branch decision test has been moved to the 2nd stage
[Figure: program execution order against time; each instruction passes through Instruction fetch, Reg, ALU, Data access, Reg]
  add $4, $5, $6
  beq $1, $2, 40      (fetched 2 ns after add)
  bubble
  or $7, $8, $9       (fetched 4 ns after beq)
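A rough comparison of the first two solutions, using the cycle counts quoted above (1 clock cycle when the guess is right, 2 when a bubble is needed). The function names and the `taken_fraction` parameter are assumptions made for this sketch.

```python
def cycles_per_branch_stall():
    """First solution: every branch pays for one bubble (2 clock cycles)."""
    return 2

def cycles_per_branch_predict_not_taken(taken_fraction):
    """Second solution: 1 cycle when the branch is not taken (guess right),
    2 cycles when it is taken (guess wrong, one bubble).

    taken_fraction is the assumed fraction of branches that are taken.
    """
    if not 0.0 <= taken_fraction <= 1.0:
        raise ValueError("taken_fraction must be between 0 and 1")
    return 1 * (1 - taken_fraction) + 2 * taken_fraction

for fraction in (0.0, 0.25, 0.5, 1.0):
    print(fraction, cycles_per_branch_stall(),
          cycles_per_branch_predict_not_taken(fraction))
```

Under these assumptions, predicting not-taken is never worse than always stalling, and it matches stalling only when every branch is taken.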

14 Control Hazards
- Disadvantage of the second solution (predict): rigid, and does not account for the behavior of specific branches
- Third solution: dynamic branch prediction
  - Guess depending on the previous behavior of the branch
  - If right, the pipeline proceeds at full speed
  - If wrong, do a pipeline stall and change the prediction for next time
  - The prediction changes over the lifetime of the program
  - Prediction hardware has ~90% accuracy
  - The cost of a misprediction is higher
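One simple way to "guess depending on previous behavior" is a 1-bit predictor that repeats the last outcome of the branch. The slides do not say which scheme the hardware uses, so the sketch below is only an illustration; `simulate_one_bit_predictor` and the sample branch history are assumptions.

```python
def simulate_one_bit_predictor(outcomes, initial_prediction=False):
    """Count correct predictions for one branch.

    outcomes is a sequence of booleans (True = branch taken).
    The predictor guesses the last observed outcome and flips on a miss.
    """
    prediction = initial_prediction
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        else:
            prediction = taken  # change the prediction for next time
    return correct

# A loop branch taken 9 times and then not taken, executed twice.
history = ([True] * 9 + [False]) * 2
hits = simulate_one_bit_predictor(history)
print(hits, "of", len(history), "predicted correctly")
```

For this history the predictor misses twice per loop execution (on entry and on exit), giving 16 correct guesses out of 20.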

15 Control Hazards
- Fourth solution: delayed branch
  - Operation: always execute the instruction immediately after the branch, which must not depend on the branch, or
  - Try to execute the branch first and delay the next instruction
  - This is not visible to the programmer
  - Compilers fill ~50% of delay slots with useful instructions
[Figure: program execution order against time, with the instructions switched; each instruction passes through Instruction fetch, Reg, ALU, Data access, Reg]
  beq $1, $2, 40
  add $4, $5, $6      (delayed branch slot; fetched 2 ns after beq)
  lw $3, 300($0)      (fetched 2 ns after add)

16 Data Hazards
- Problem: an instruction depends on the result of a previous instruction still in the pipeline
  - An attempt to use an item before it is ready
- Solution: forwarding (bypassing)
  - Supply the needed intermediate results to the next instruction's stages as soon as they are evaluated
  - Get the item early from internal resources
- Forwarding: the result is passed forward from an earlier to a later instruction
- Bypassing: the result is passed around the register file to the desired unit

17 Data Hazards - Dependencies
[Figure: program execution order against time in clock cycles (CC 1 onward), with the value of register $2 shown per cycle; each instruction passes through IM, Reg, DM, Reg]
  sub $2, $1, $3
  and $12, $2, $5
  or $13, $6, $2
  add $14, $2, $2
  sw $15, 100($2)
- Backward dependencies are data hazards
- Forward dependencies are not hazards

18 Data Hazards - Dependencies
- Example: the problem with starting the next instruction before the first is finished
  - The sub instruction writes into $2
  - All following instructions read $2
  - The proper value is unavailable until the register is written (in cycle 5)
- Dependencies that go backward in time are data hazards

19 Data Hazards
- Example:
  add $s0, $t0, $t1
  sub $t2, $s0, $t3
- The subtract instruction immediately uses $s0, which is written by the add instruction
- The add instruction doesn't write the result until the 5th stage
- Without intervention, a data hazard could severely stall the pipeline
- Solution: as soon as the ALU creates the sum for the add, forward it as an input for the subtract

20 Data Hazards
- Forwarding
  - Only valid if the destination stage is later in time than the source stage
  - The output of the ALU (EX) stage of the add instruction is forwarded to the input of the ALU stage of the sub instruction
[Figure: program execution order against time]
  add $s0, $t0, $t1     IF ID EX MEM WB
  sub $t2, $s0, $t3        IF ID EX MEM WB

21 Forwarding
- For some instruction types, we need to stall even with forwarding
  - When an R-format instruction comes immediately after a load instruction whose result it uses
  - This is done to prevent backward dependencies
[Figure: program execution order against time]
  lw $s0, 20($t1)       IF ID EX MEM WB
  bubble
  sub $t2, $s0, $t3           IF ID EX MEM WB

22 Forwarding
- Without bubble: backward dependence (data hazard)
  lw $s0, 20($t1)       IF ID EX MEM WB
  sub $t2, $s0, $t3        IF ID EX MEM WB
- With bubble: forward dependence (no hazard)
  lw $s0, 20($t1)       IF ID EX MEM WB
  bubble
  sub $t2, $s0, $t3           IF ID EX MEM WB

23 Forwarding
- Solution: supply inputs to the ALU by forwarding results as soon as they are evaluated
  - Don't wait for the result to be written into the register file
- Register file forwarding
  - Handles a read and a write to the same register
- ALU forwarding
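The stall counts implied by the forwarding slides can be derived from stage timing. The sketch assumes a five-stage pipeline (IF, ID, EX, MEM, WB) in which the register file can be written and read in the same cycle, an ALU result is forwardable after EX, and a loaded value is forwardable after MEM; `stalls_needed` and its parameters are names invented here.

```python
def stalls_needed(producer_is_load, distance, forwarding):
    """Bubbles required between a producer and a consumer of the same register.

    distance is how many instructions later the consumer is (1 = next one).
    Without forwarding the consumer's ID stage must not come before the
    producer's WB stage, which is 3 cycles after the producer's ID.
    With forwarding an ALU result reaches the next instruction's EX stage in
    time; a loaded value is ready one cycle later, after MEM.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if not forwarding:
        return max(0, 3 - distance)
    if producer_is_load:
        return max(0, 2 - distance)
    return 0

# add $s0, ... followed immediately by sub ..., $s0, ...
print(stalls_needed(False, 1, forwarding=False))  # two bubbles without forwarding
print(stalls_needed(False, 1, forwarding=True))   # none with ALU forwarding
# lw $s0, ... followed immediately by sub ..., $s0, ...
print(stalls_needed(True, 1, forwarding=True))    # one bubble even with forwarding
```

The last case is the load followed by an R-format instruction: forwarding alone cannot remove the bubble, because the loaded value does not exist until the end of MEM.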

24 Forwarding
- Example: $2 holds 10 at the beginning and -20 at the end
- Pipeline registers are used to forward data
[Figure: program execution order against time in clock cycles (CC 1 to CC 9), with rows for the value of register $2, of EX/MEM, and of MEM/WB; each instruction passes through IM, Reg, DM, Reg]
  Value of EX/MEM:  X X X -20 X X X X X
  Value of MEM/WB:  X X X X -20 X X X X
  sub $2, $1, $3
  and $12, $2, $5
  or $13, $6, $2
  add $14, $2, $2
  sw $15, 100($2)

25 Improving Performance
- Exercise: for the following code, which resembles a swap procedure:
  # $t1 = Addr v[k]
  lw $t0, 0($t1)    # $t0 (temp) = v[k]
  lw $t2, 4($t1)    # $t2 = v[k+1]
  sw $t2, 0($t1)    # v[k] = $t2
  sw $t0, 4($t1)    # v[k+1] = $t0
  - Draw the pipeline
  - Find the hazards in this code
  - Find out how to reorder these instructions to avoid stalls

26 Improving Performance
- What about this order?
  # $t1 = Addr v[k]
  lw $t0, 0($t1)    # $t0 (temp) = v[k]
  lw $t2, 4($t1)    # $t2 = v[k+1]
  sw $t0, 4($t1)    # v[k+1] = $t0
  sw $t2, 0($t1)    # v[k] = $t2
- On a machine with forwarding, the reordered sequence will take 4 clock cycles
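The effect of the reordering can be checked by scanning for load-use pairs. This is a sketch under one assumption: forwarding is available, so the only stall counted is a load immediately followed by an instruction that reads the loaded register (the store's data register is treated like any other source). The tuple encoding and `count_load_use_stalls` are illustrative.

```python
# Each instruction is (opcode, destination register or None, source registers).
def count_load_use_stalls(program):
    """Count one-cycle bubbles caused by a load followed directly by a user."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        if prev_op == "lw" and prev_dest in curr_sources:
            stalls += 1
    return stalls

original = [
    ("lw", "$t0", ("$t1",)),        # lw $t0, 0($t1)
    ("lw", "$t2", ("$t1",)),        # lw $t2, 4($t1)
    ("sw", None, ("$t2", "$t1")),   # sw $t2, 0($t1)
    ("sw", None, ("$t0", "$t1")),   # sw $t0, 4($t1)
]
# Swap the two stores, as in the reordered listing.
reordered = [original[0], original[1], original[3], original[2]]

print(count_load_use_stalls(original), count_load_use_stalls(reordered))
```

The original order has one load-use pair (the second lw and the store of $t2); swapping the stores separates them and leaves none.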

27 Recent Trends in Performance
- Super-pipelining
- Super-scalar
- Dynamic pipelining

28 Super-Pipelining
- Remember: speedup is related to the number of stages
- Idea: make longer pipelines (more stages)
  - Rebalance the remaining steps so they are the same length
- Example: laundry
  - Divide washing into wash, rinse, & spin => 6 stages instead of 4
- Recent microprocessors have >= 8 stages

29 Super-Scalar
- Idea: replicate internal components to launch multiple instructions at the same time
- Effect: the instruction execution rate exceeds the clock rate (CPI < 1)
- Example: laundry
  - 3 washers
  - 3 dryers
  - 3 assistants to fold
  - 3 assistants to put away laundry
- Example: vote count

30 Super-Scalar
- Today's super-scalar computers have 2-6 instructions in every pipeline stage
- Problem: difficult to implement if the instructions in the stream depend on each other or don't meet the issue criteria

31 Super-Scalar MIPS
- Assumptions:
  - 2 instructions issued per clock cycle
  - An ALU/branch instruction, in parallel with a load/store instruction
- Need to fetch & decode 64 bits of instructions
- We examine the instructions & possibly swap them before sending them to the ALU or memory unit, to reduce hazards
- Need extra HW

32 Super-Scalar MIPS
- Need extra hardware: a separate ALU for address calculation
[Figure: super-scalar datapath with PC, instruction memory, registers, two sign-extend units, the original ALU, a second ALU producing the data memory address, and data memory]

33 Super-Scalar MIPS Example
- Example (page 513)
  Loop: lw   $t0, 0($s1)
        addu $t0, $t0, $s2
        sw   $t0, 0($s1)
        addi $s1, $s1, -4
        bne  $s1, $zero, Loop
- Assumption: $s1 contains +16 => the loop iterates 4 times
- 5 instructions per iteration, each executed 4 times
- Original number of cycles needed (at one instruction per cycle) = 5 * 4 = 20 cycles
- Exercise: draw the pipeline for the 4 iterations

34 Super-Scalar MIPS Example
- When adding another ALU => CPI should be ~0.5
- 4 cycles per loop iteration
- Number of cycles = 4 * 4 = 16 => CPI (performance) = 16/20 = 0.8 => CPI value far from optimal

        ALU / Branch instruction    Data transfer instruction    Cycle
  Loop:                             lw $t0, 0($s1)               1
        addi $s1, $s1, -4                                        2
        addu $t0, $t0, $s2                                       3
        bne $s1, $zero, Loop        sw $t0, 4($s1)               4

35 Super-Scalar MIPS Loop Unrolling Example
- Multiple copies of the body of the loop are made
- Different iterations are scheduled together
- Code in super-scalar MIPS with loop unrolling (4 copies of the loop body):
  - 12 out of 14 instructions work in super-scalar mode
  - Total number of cycles = 8
  - CPI (performance) = 8/20 = 0.4 relative to the 20 instructions of the original loop (8/14 = 0.57 counting the 14 unrolled instructions) => better performance

36 Loop Unrolling Example

        ALU / Branch instruction    Data transfer instruction    Cycle
  Loop: addi $s1, $s1, -16          lw $t0, 0($s1)               1
                                    lw $t1, 12($s1)              2
        addu $t0, $t0, $s2          lw $t2, 8($s1)               3
        addu $t1, $t1, $s2          lw $t3, 4($s1)               4
        addu $t2, $t2, $s2          sw $t0, 16($s1)              5
        addu $t3, $t3, $s2          sw $t1, 12($s1)              6
                                    sw $t2, 8($s1)               7
        bne $s1, $zero, Loop        sw $t3, 4($s1)               8
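The figures quoted for the unrolled loop (8 cycles, 14 instructions, 12 of them issued in pairs) can be recomputed from the schedule. The sketch below records only the mnemonic and first register of each slot; the list layout and the `summarize` helper are assumptions made for illustration.

```python
from fractions import Fraction

# (ALU/branch slot, data-transfer slot) per clock cycle; None is an empty slot.
schedule = [
    ("addi $s1", "lw $t0"),
    (None,       "lw $t1"),
    ("addu $t0", "lw $t2"),
    ("addu $t1", "lw $t3"),
    ("addu $t2", "sw $t0"),
    ("addu $t3", "sw $t1"),
    (None,       "sw $t2"),
    ("bne $s1",  "sw $t3"),
]

def summarize(schedule):
    """Return (cycles, instructions issued, instructions issued in pairs)."""
    cycles = len(schedule)
    instructions = sum(slot is not None for pair in schedule for slot in pair)
    paired = sum(2 for alu, mem in schedule if alu is not None and mem is not None)
    return cycles, instructions, paired

cycles, instructions, paired = summarize(schedule)
print(cycles, instructions, paired)
# Cycles per instruction of the original (non-unrolled) 20-instruction run.
print(float(Fraction(cycles, 20)))
# Cycles per instruction actually issued by the unrolled loop.
print(float(Fraction(cycles, instructions)))
```

Dividing by 20 measures the work of the four original iterations; dividing by 14 gives the CPI of the unrolled code itself.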

37 Dynamic Pipelining
- Later instructions are executed while waiting for a stall to be resolved
- Pipeline divided into 3 major units:
  - Instruction fetch & issue unit: sends instructions in order
  - Execution units: can execute in parallel (or out of order)
  - Commit unit: sends instructions out in order again

38 Dynamic Pipelining
[Figure: instruction fetch and decode unit (in-order issue) feeding four reservation stations; each feeds a functional unit (Integer, Integer, Floating point, Load/Store) that executes out of order; results go to the commit unit (in-order commit)]

39 Dynamic Pipelining
- Instruction fetch & issue unit:
  - Fetches instructions
  - Decodes instructions
  - Sends each instruction to the corresponding functional unit
- Execution unit:
  - 5-10 functional units
  - Each functional unit has a buffer (reservation station) that holds the operands & the operation
  - When all operands are in the buffer & the functional unit is ready, the result is calculated
- Commit unit:
  - Decides when it is safe to put the result back into the register file or memory
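The in-order issue, out-of-order execute, in-order commit flow can be illustrated with a toy model. It makes strong simplifying assumptions that are not in the slides: one instruction issues per cycle, instructions are independent, each has a fixed latency, and at most one instruction commits per cycle. The `simulate` function is hypothetical.

```python
def simulate(latencies):
    """Return (finish cycles, completion order, commit cycles).

    Instruction i issues in cycle i + 1 and finishes latencies[i] cycles later.
    The commit unit retires instructions strictly in program order,
    at most one per cycle, and never before they have finished.
    """
    finish = [i + 1 + latency for i, latency in enumerate(latencies)]
    completion_order = sorted(range(len(latencies)), key=lambda i: finish[i])
    commits = []
    last_commit = 0
    for finished_at in finish:
        last_commit = max(finished_at, last_commit + 1)
        commits.append(last_commit)
    return finish, completion_order, commits

# A slow instruction (latency 4) followed by two fast ones (latency 1).
finish, order, commits = simulate([4, 1, 1])
print(finish)   # when each instruction finishes executing
print(order)    # indices in the order they finish
print(commits)  # when each instruction commits
```

The two fast instructions finish before the slow one, yet their results are held back until the slow instruction has committed, which is the job the slides assign to the commit unit.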

40 Dynamic Pipelining
- The hardware performs the scheduling
  - HW tries to find instructions to execute
  - Out-of-order or parallel execution is possible
- Speculative execution: combining dynamic scheduling & branch prediction

41 Dynamic Pipelining
- All modern processors are very complicated
  - DEC Alpha 21264: 9-stage pipeline, 6-instruction issue
  - PowerPC and Pentium: branch history table
  - Compiler technology is as important as HW

42 Pentium Pro & PowerPC 604 pipeline organization
[Figure: PC, instruction cache, branch prediction, and instruction queue feed a decode/dispatch unit with a register file; six reservation stations feed functional units labeled Branch, Integer, Integer, Floating point, Store, Complex integer, Load, Load/store; a data cache and a commit unit with a reorder buffer complete the pipeline]

43 Summary
- Pipelining is a fundamental concept
  - Multiple steps using distinct resources
- Utilize the capabilities of the datapath by pipelined instruction processing
  - Start the next instruction while working on the current one
  - Limited by the length of the longest stage (plus fill/sink)
  - Detect and resolve hazards
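The "longest stage plus fill" limit can be made concrete with a small timing sketch. The stage lengths below are assumed values chosen for illustration (the slides only show a 2 ns clock), and hazards are ignored; the function names are hypothetical.

```python
def nonpipelined_time(stage_times, n):
    """Each instruction runs all stages before the next one starts."""
    return n * sum(stage_times)

def pipelined_time(stage_times, n):
    """The clock is set by the longest stage; the first instruction needs
    len(stage_times) cycles to fill the pipeline, and each later
    instruction completes one cycle after the previous one."""
    return (len(stage_times) + n - 1) * max(stage_times)

# Assumed stage lengths in ns: fetch, register read, ALU, data access, write.
stage_times_ns = [2, 1, 2, 2, 1]

for n in (3, 1000):
    print(n, nonpipelined_time(stage_times_ns, n), pipelined_time(stage_times_ns, n))
```

For a handful of instructions the fill time dominates; as the count grows, the time per instruction approaches the length of the longest stage.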

44 G4 Processor


46 Athlon Processor

47 OCC - CS/CIS CS116-Ch00-Orientation 1998 Morgan Kaufmann Publishers (Augmented 1998 Morgan & Modified Kaufmann by M.Malaty Publishers ( and Augmented M. Beers) & Modified by M.Malaty) 47 47

1 Hazards COMP2611 Fall 2015 Pipelined Processor

1 Hazards COMP2611 Fall 2015 Pipelined Processor 1 Hazards Dependences in Programs 2 Data dependence Example: lw $1, 200($2) add $3, $4, $1 add can t do ID (i.e., read register $1) until lw updates $1 Control dependence Example: bne $1, $2, target add

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

ELE 655 Microprocessor System Design

ELE 655 Microprocessor System Design ELE 655 Microprocessor System Design Section 2 Instruction Level Parallelism Class 1 Basic Pipeline Notes: Reg shows up two places but actually is the same register file Writes occur on the second half

More information

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many

More information

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one

More information

Pipelining. CSC Friday, November 6, 2015

Pipelining. CSC Friday, November 6, 2015 Pipelining CSC 211.01 Friday, November 6, 2015 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 4 Processor Part 2: Pipelining (Ch.4) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations from Mike

More information

Thomas Polzer Institut für Technische Informatik

Thomas Polzer Institut für Technische Informatik Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =

More information

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14 MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK

More information

Processor (IV) - advanced ILP. Hwansoo Han

Processor (IV) - advanced ILP. Hwansoo Han Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle? CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:

More information

Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4

Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4 IC220 Set #9: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life Return to Chapter 4 Midnight Laundry Task order A B C D 6 PM 7 8 9 0 2 2 AM 2 Smarty Laundry Task order A B C D 6 PM

More information

Advanced Instruction-Level Parallelism

Advanced Instruction-Level Parallelism Advanced Instruction-Level Parallelism Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

The Processor: Instruction-Level Parallelism

The Processor: Instruction-Level Parallelism The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy

More information

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3. Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview

More information

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin.   School of Information Science and Technology SIST CS 110 Computer Architecture Pipelining Guest Lecture: Shu Yin http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS61C

More information
