Processor Design Pipelined Processor. Hung-Wei Tseng

1 Processor Design Pipelined Processor Hung-Wei Tseng

2 Pipelining

3 Pipelining. Break up the logic with registers into pipeline stages. Each stage can act on a different instruction/data. The states/control signals of the instructions are held in the pipeline registers (latches).

4 Pipelining. [Figure: the pipeline during cycles #1 to #5] After the 5th cycle, the processor can work on 5 instructions in parallel.

5 Pipelining. [Figure: the pipeline during cycles #6 to #10] The processor can complete 1 instruction each cycle: CPI == 1 if everything works perfectly!
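Here is a quick check of the arithmetic behind slides 4 and 5. It is a minimal sketch that assumes a hazard-free pipeline. With k stages, N instructions finish in N + k - 1 cycles: k cycles for the first instruction, then one more per instruction. So CPI approaches 1 as N grows. The function names are hypothetical.

```cpp
#include <cassert>

// Ideal k-stage pipeline with no hazards: the first instruction needs k cycles to
// pass through all stages, and every later instruction completes one cycle after
// the one before it.
long long idealCycles(long long instructions, int stages = 5) {
    return instructions <= 0 ? 0 : instructions + stages - 1;
}

// Cycles per instruction for the same ideal pipeline; approaches 1 for long streams.
double idealCpi(long long instructions, int stages = 5) {
    assert(instructions > 0);
    return static_cast<double>(idealCycles(instructions, stages)) / instructions;
}
```

For the 5-instruction examples on these slides, idealCycles(5) is 9 (CPI 1.8). For a million instructions, the CPI is within 0.0001 of 1.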

6 Single-cycle vs. pipeline

7 Cycle time of a pipeline processor. The critical path is the longest possible delay between two registers in a design. The critical path sets the cycle time, since the cycle time must be long enough for a signal to traverse the critical path. Lengthening or shortening non-critical paths does not change performance. Ideally, all paths are about the same length.
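The sketch below makes the critical-path argument concrete. It uses hypothetical per-stage delays (200, 100, 200, 200, 100 ps for IF, ID, EX, MEM, WB), which are not numbers from the slides. The single-cycle clock must cover the sum of all stages. The pipelined clock only has to cover the slowest stage plus an assumed pipeline-register overhead.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Single-cycle design: one clock period must cover the whole datapath.
int singleCycleTime(const std::vector<int>& stageDelaysPs) {
    return std::accumulate(stageDelaysPs.begin(), stageDelaysPs.end(), 0);
}

// Pipelined design: the clock period is set by the slowest stage (the critical path),
// plus any pipeline-register overhead. Speeding up a non-critical stage changes nothing.
int pipelineCycleTime(const std::vector<int>& stageDelaysPs, int registerOverheadPs = 0) {
    assert(!stageDelaysPs.empty());
    return *std::max_element(stageDelaysPs.begin(), stageDelaysPs.end()) + registerOverheadPs;
}
```

With those delays, the single-cycle clock is 800 ps and the pipelined clock is 200 ps. Shortening the 100 ps ID stage to 50 ps leaves the pipelined clock at 200 ps, because ID is not on the critical path.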

8 Designing a 5-stage pipeline processor for MIPS

9 Basic steps of execution. Instruction fetch: where is the instruction? In the instruction memory. Decode: what is the instruction, and where are the operands? In the registers. Execute: the ALUs. Memory access: where is my data? In the data memory. Write back: where do I put the result? In the registers. Determine the next PC. [Figure: a processor with the PC, instruction memory, registers R0 to R31, ALU, and data memory, next to sample contents of the instruction and data memories]

10 Pipeline a MIPS processor. Instruction Fetch (IF): fetch the instruction from the instruction memory. Instruction Decode (ID): figure out the incoming instruction and fetch the operands from the registers. Execution (EX): perform ALU functions. Memory Access (MEM): read/write the data memory. Write Back (WB): write the results to the register file.

11 From single-cycle to pipeline. [Figure: the single-cycle MIPS datapath (PC, instruction memory, register file, sign extend, ALU, data memory, and control, with PCSrc = Branch & Zero), cut into the Instruction Fetch, Instruction Decode, Execution, Memory Access, and Write Back stages by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers] Will this work?

12 Pipelined processor. [Figure: the pipelined datapath executing add $1, $2, $3; lw $4, 0($5); sub $6, $7, $8; sub $9, $10, $11; sw $1, 0($12)]

13 Pipelined processor. [Figure: the same instruction sequence advancing one stage further through the pipelined datapath]

14 Pipelined processor. [Figure: the control unit in ID generates the control signals, grouped into the fields used by the EX, MEM, and WB stages] Where can I find these?

15 Pipelined processor. [Figure: the control fields are stored in the ID/EX pipeline register and travel with their instruction into EX/MEM and MEM/WB]

16 Pipelined processor. Is this right? [Figure: the pipelined datapath with the RegWrite signal and the RegDst multiplexer highlighted]

17 Pipelined processor. [Figure: revised datapath: the RegDst multiplexer is in the EX stage, and the destination register number and RegWrite are carried through EX/MEM and MEM/WB back to the register file]

18 5-stage pipelined processor. [Figure: the complete 5-stage pipelined MIPS datapath with pipelined control]
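Slides 14 to 18 make the point that the control signals and the destination register number must travel with their instruction. The sketch below shows what the pipeline registers could hold. It assumes textbook-style (Patterson & Hennessy) signal names and an ALU that only adds. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Control signals produced in ID, grouped by the stage that consumes them.
struct ExCtl  { bool regDst = false; bool aluSrc = false; int aluOp = 0; };
struct MemCtl { bool branch = false; bool memRead = false; bool memWrite = false; };
struct WbCtl  { bool regWrite = false; bool memToReg = false; };

struct IdEx {                 // written at the end of ID
    ExCtl ex; MemCtl mem; WbCtl wb;
    std::uint32_t readData1 = 0, readData2 = 0, imm = 0;
    int rt = 0, rd = 0;
};
struct ExMem {                // EX control bits are no longer needed
    MemCtl mem; WbCtl wb;
    std::uint32_t aluResult = 0, readData2 = 0;
    int writeReg = 0;         // destination register travels with the instruction
};
struct MemWb {                // only the WB control bits remain
    WbCtl wb;
    std::uint32_t memData = 0, aluResult = 0;
    int writeReg = 0;
};

// EX stage: use the EX bits, pass the MEM/WB bits and the chosen destination along.
ExMem execute(const IdEx& in) {
    ExMem out;
    out.mem = in.mem;
    out.wb = in.wb;
    std::uint32_t operandB = in.ex.aluSrc ? in.imm : in.readData2;  // ALUSrc mux
    out.aluResult = in.readData1 + operandB;      // sketch: ALUOp ignored, always add
    out.readData2 = in.readData2;                 // store data for sw
    out.writeReg = in.ex.regDst ? in.rd : in.rt;  // RegDst mux
    return out;
}

// MEM stage: the data-memory access is not modelled; `loaded` stands in for its result.
MemWb memoryStage(const ExMem& in, std::uint32_t loaded) {
    MemWb out;
    out.wb = in.wb;
    out.memData = in.mem.memRead ? loaded : 0;
    out.aluResult = in.aluResult;
    out.writeReg = in.writeReg;
    return out;
}

// WB stage: the register file is written using MEM/WB's own RegWrite and writeReg,
// not fields of whatever instruction is currently in ID.
std::uint32_t writeBackValue(const MemWb& in) {
    return in.wb.memToReg ? in.memData : in.aluResult;  // MemtoReg mux
}
```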

19 Simplified pipeline diagram. Use symbols to represent the physical resources, with the abbreviations of the pipeline stages: IF, ID, EX, MEM, WB. The horizontal axis represents the timeline; the vertical axis represents the instruction stream. Example: add $1, $2, $3; lw $4, 0($5); sub $6, $7, $8; sub $9, $10, $11; sw $1, 0($12).
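The simplified diagram is easy to generate for the ideal, hazard-free case. Here is a small sketch that produces one row per instruction. The function name and the fixed 4-character columns are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Builds a hazard-free pipeline diagram: one row per instruction, one 4-character
// column per cycle, each instruction entering IF one cycle after the previous one.
std::vector<std::string> pipelineDiagram(const std::vector<std::string>& program) {
    const char* const stages[] = {"IF", "ID", "EX", "MEM", "WB"};
    std::vector<std::string> rows;
    for (std::size_t i = 0; i < program.size(); ++i) {
        std::string row = program[i];
        row.resize(18, ' ');             // fixed-width instruction label
        row += std::string(4 * i, ' ');  // cycles before this instruction is fetched
        for (const char* stage : stages) {
            std::string cell = stage;
            cell.resize(4, ' ');
            row += cell;
        }
        rows.push_back(row);
    }
    return rows;
}
```

Printing the rows for the five-instruction example (for instance with std::cout) gives the familiar staircase: each row starts one column to the right of the previous one.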

20 Pipeline hazards

21 Pipeline hazards. Even though we divide the pipeline stages perfectly, it is still hard to achieve CPI == 1. Pipeline hazards: Structural hazard: the hardware does not allow two pipeline stages to work concurrently. Data hazard: a later instruction in a pipeline stage depends on the outcome of an earlier instruction that is still in the pipeline. Control hazard: the processor is not sure which instruction to fetch next.

22 Can we get the right result? Given the current 5-stage pipeline, how many of the following MIPS code sequences can work correctly? Each sequence is a variation of a: add $1, $2, $3; b: lw $4, 0($5); c: sub $6, $7, $8; d: sub $9, $10, $11; e: sw $1, 0($12). I: b is lw $4, 0($1), so b cannot get the $1 produced by a in time: data hazard. II: d reads $1, so both a and d access $1 in the 5th cycle: structural hazard. III: c is a branch (bne) to label L, so we don't know whether d and e will be executed or not: control hazard. IV: the unmodified sequence, which is the only one that works correctly.

23 Structural hazard

24 Structural hazard. The hardware cannot support the combination of instructions that we want to execute in the same cycle. The original pipeline incurs a structural hazard when two instructions compete for the same register in the same cycle, one writing it and one reading it. Solution: write early, read late. Writes occur at the clock edge and complete long enough before the end of the clock cycle, which leaves enough time for the outputs to settle for reads. The revised register file is the default one from now on! Example: sequence II from slide 22, where a writes $1 and d reads $1 in the same cycle.

25 Structural hazard. The design of the hardware causes structural hazards, so we need to modify the hardware design to avoid them.

26 Data hazard

27 Data hazard. A data hazard occurs when an instruction in the pipeline needs a value that is not available yet. Data dependences: the output of an instruction is the input of a later instruction. A dependence may result in a data hazard if the later instruction consumes the result while the earlier instruction that produces it is still in the pipeline.

28 Solution to data hazards I: Stall. When a source operand of an instruction is not ready, stall the pipeline: suspend the instruction and the following instructions, and allow the previous instructions to proceed. This introduces a pipeline bubble: a bubble does nothing and propagates through the pipeline like a nop instruction. How to stall the pipeline? Disable the PC update and disable the pipeline registers of the earlier pipeline stages. When the stall is over, re-enable the pipeline registers and the PC updates.

29 Hazard detection & stall. [Figure: pipelined datapath with a hazard detection unit that controls PCWrite and IF/ID Write and can insert a nop into ID/EX] The hazard detection unit checks whether the destination register of the instruction in EX equals a source register of the instruction in ID, and whether the destination register of the instruction in MEM equals a source register of the instruction in ID. Insert a nop if we need to stall.
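The two checks on this slide, and what a stall does to the front of the pipeline, can be sketched as follows. The type, field, and signal names are hypothetical. The check treats both rs and rt as sources. That is conservative: it can stall needlessly for instructions such as lw, whose rt is a destination, but it never misses a real hazard.

```cpp
#include <cassert>

// Minimal view of the instruction sitting in a pipeline stage (hypothetical fields).
struct StageInfo {
    bool valid = false;     // false for an empty slot or a bubble
    bool regWrite = false;  // will it write the register file?
    int dest = 0;           // its destination register number
};

// Without forwarding: the instruction in ID must wait while an instruction in EX or
// MEM has yet to write one of its sources. A producer already in WB is fine, because
// the register file writes early and reads late. $0 never causes a hazard.
bool mustStallNoForwarding(int rs, int rt, const StageInfo& ex, const StageInfo& mem) {
    assert(rs >= 0 && rs < 32 && rt >= 0 && rt < 32);
    auto pending = [rs, rt](const StageInfo& s) {
        return s.valid && s.regWrite && s.dest != 0 && (s.dest == rs || s.dest == rt);
    };
    return pending(ex) || pending(mem);
}

// What a stall does to the front of the pipeline.
struct StallControl {
    bool pcWrite;       // false: the PC is frozen, so the same instruction is fetched again
    bool ifIdWrite;     // false: IF/ID keeps the stalled instruction in ID
    bool insertBubble;  // true: ID/EX receives a nop (all control signals zero)
};

StallControl stallControl(bool stall) {
    return StallControl{!stall, !stall, stall};
}
```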

30 Performance of stall. Insert a nop in the EX stage; then insert another nop in EX while the previous nop moves to MEM. Code: add $1, $2, $3; lw $4, 0($1); sub $5, $2, $4; sub $1, $3, $1; sw $1, 0($5). 15 cycles! CPI == 3 (if there were no stalls, CPI would be just 1!)

31 Solution to data hazards II: Forwarding. The result is available after the EX and MEM stages, but it is only published to the register file in WB! The data is already there, so we should use it right away. This is also called bypassing. Code: add $1, $2, $3; lw $4, 0($1); sub $5, $2, $4; sub $1, $3, $1; sw $1, 0($5). For add, we can obtain the result as soon as its EX stage finishes!

32 Solution to data hazards II: Forwarding. Take the values wherever they are! Code: add $1, $2, $3; lw $4, 0($1); sub $5, $2, $4; sub $1, $3, $1; sw $1, 0($5). 10 cycles! CPI == 2 (not optimal, but much better!)
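The 15-cycle and 10-cycle counts on slides 30 and 32 can be reproduced with a simple timing model. This sketch uses the register numbers shown in those slides. It treats every source operand as needed at the start of EX and keeps a stalled instruction waiting in ID. The names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>
#include <map>
#include <vector>

// Simplified instruction record: destination (-1 = none) and up to two sources (-1 = unused).
struct Inst {
    int dst;
    int src1;
    int src2;
    bool isLoad;
};

// Cycle in which the last instruction leaves WB, for an in-order 5-stage pipeline
// (IF, ID, EX, MEM, WB) whose register file writes early and reads late.
// Without forwarding, a reader may be in ID no earlier than the producer's WB cycle.
// With full forwarding, it only waits when it directly follows a load.
int totalCycles(const std::vector<Inst>& program, bool forwarding) {
    std::map<int, int> earliestId;  // register -> earliest ID cycle for a reader
    int id = 1;                     // so the first instruction is in ID in cycle 2
    int lastWb = 0;
    for (const Inst& in : program) {
        int next = id + 1;  // in order: at most one instruction enters ID per cycle
        for (int src : {in.src1, in.src2}) {
            auto it = earliestId.find(src);
            if (src > 0 && it != earliestId.end()) next = std::max(next, it->second);
        }
        id = next;        // stall cycles are spent waiting in ID
        lastWb = id + 3;  // then EX, MEM, WB without further delay
        if (in.dst > 0) {
            if (!forwarding) {
                earliestId[in.dst] = id + 3;  // producer's WB cycle
            } else {
                // The reader's EX may directly follow the producer's EX (ALU) or MEM (load).
                earliestId[in.dst] = in.isLoad ? id + 2 : id + 1;
            }
        }
    }
    return lastWb;
}

// Slides 30-32: add $1,$2,$3; lw $4,0($1); sub $5,$2,$4; sub $1,$3,$1; sw $1,0($5)
std::vector<Inst> slideExample() {
    return {
        {1, 2, 3, false},   // add $1, $2, $3
        {4, 1, -1, true},   // lw  $4, 0($1)
        {5, 2, 4, false},   // sub $5, $2, $4
        {1, 3, 1, false},   // sub $1, $3, $1
        {-1, 1, 5, false},  // sw  $1, 0($5): store data and base are both sources
    };
}
```

totalCycles(slideExample(), false) gives 15 and totalCycles(slideExample(), true) gives 10. The one remaining forwarding stall is the lw followed directly by the sub that reads $4.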

33 When can/should we forward data? When the instruction entering the EX stage consumes a result from a previous instruction that is entering the MEM stage or the WB stage, i.e., a source of the instruction entering the EX stage is the destination of an instruction entering the MEM or WB stage. The previous instruction must be an instruction that updates the register file.
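The conditions on this slide translate almost directly into the forwarding unit's logic. In this sketch, the select encoding follows the common textbook convention, which is an assumption; the real encoding is a design choice. The type names are hypothetical. The sketch adds two usual checks: the destination must not be $0, and EX/MEM (the newer result) takes priority over MEM/WB.

```cpp
#include <cassert>

// Destination information of an older instruction held in a pipeline register.
struct WriteInfo {
    bool regWrite;
    int rd;
};

// Mux select values (assumed encoding): 0 = operand read from the register file,
// 2 = forward from EX/MEM, 1 = forward from MEM/WB.
enum class Forward { None = 0, FromMemWb = 1, FromExMem = 2 };

// Decide where source register `src` of the instruction entering EX should come from.
// EX/MEM is checked first: if both older instructions write `src`, the newer value wins.
Forward forwardSelect(int src, const WriteInfo& exMem, const WriteInfo& memWb) {
    assert(src >= 0 && src < 32);
    if (exMem.regWrite && exMem.rd != 0 && exMem.rd == src) return Forward::FromExMem;
    if (memWb.regWrite && memWb.rd != 0 && memWb.rd == src) return Forward::FromMemWb;
    return Forward::None;
}
// ForwardA = forwardSelect(rs of ID/EX, ...); ForwardB = forwardSelect(rt of ID/EX, ...).
```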

34 Forwarding in hardware. [Figure: the forwarding unit compares Rs and Rt of the current instruction (Ins#2) in ID/EX with the destination of the previous instruction (Ins#1). Using the control signals (RegWrite) of Ins#1, it drives the ForwardA and ForwardB multiplexers in front of the ALU, so that the ALU result of Ins#1 can replace a register-file value] How about load?

35 Forwarding in hardware. [Figure: the forwarding unit also uses the Rd and control signals of Ins#1 in MEM/WB to forward its ALU or memory result through ForwardA and ForwardB]

36 There is still a case where we have to stall... Revisit the following code: add $1, $2, $3; lw $4, 0($1); sub $5, $2, $4; sub $1, $3, $1; sw $1, 0($5). lw generates its result in the MEM stage, so we have to stall. If the instruction entering the EX stage depends on a load instruction that has not finished its MEM stage yet, we have to stall! We call this hazard detection. We need to know the following: 1. whether the instruction in EX/MEM updates a register (RegWrite); 2. whether the instruction in EX/MEM reads memory (MemRead); 3. whether the destination register of EX/MEM is a source of ID/EX (rs or rt of ID/EX == rt of EX/MEM).
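Here is a sketch of the load-use check with the three conditions listed above, plus the usual exclusion of $0. The type and field names are hypothetical. It compares rt against both sources of the consumer, even if the consumer does not actually read rt. That can cause an unnecessary stall but never a wrong result.

```cpp
#include <cassert>

// Fields the hazard detection unit looks at (hypothetical names, MIPS-style rs/rt).
struct Slot {
    bool regWrite = false;  // updates a register
    bool memRead = false;   // reads data memory, i.e. it is a load
    int rs = 0;
    int rt = 0;             // for a load, rt is the destination
};

// The three conditions: the older instruction (`load`, one stage ahead) writes a
// register, reads memory, and its destination rt is a source (rs or rt) of `consumer`.
// If so, the consumer must stall one cycle even with forwarding.
bool loadUseStall(const Slot& load, const Slot& consumer) {
    assert(load.rt >= 0 && load.rt < 32);
    return load.regWrite && load.memRead && load.rt != 0 &&
           (load.rt == consumer.rs || load.rt == consumer.rt);
}
```

For the code above, lw $4, 0($1) followed by sub $5, $2, $4 triggers the stall; an add in place of the lw does not, because its result can be forwarded.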

37 Hazard detection with forwarding. [Figure: pipelined datapath combining the hazard detection unit (reading ID/EX.MemRead and driving PCWrite and IF/ID Write) with the forwarding unit (ForwardA, ForwardB)]

38 Control hazard

39 Control hazard. The processor cannot determine the next PC to fetch. LOOP: lw $t3, 0($s0); addi $t0, $t0, 1; add $v0, $v0, $t3; addi $s0, $s0, 4; bne $t0, $t1, LOOP. The next lw $t3, 0($s0) has to stall until the branch is resolved: 7 cycles per loop iteration.

40 Reducing the overhead of control hazards

41 Solution I: Delayed branches. An agreement between the ISA and the hardware. Branch delay slots: the next N instructions after a branch are always executed. The compiler decides which instructions go into the branch delay slots; reordering the instructions must not affect the correctness of the program. MIPS has one branch delay slot. Good: simple hardware. Bad: N cannot change, and sometimes the compiler cannot find good candidates for the slots.

42 Solution I: Delayed branches. Original loop: LOOP: lw $t3, 0($s0); addi $t0, $t0, 1; add $v0, $v0, $t3; addi $s0, $s0, 4; bne $t0, $t1, LOOP. With addi $s0, $s0, 4 moved into the branch delay slot: LOOP: lw $t3, 0($s0); addi $t0, $t0, 1; add $v0, $v0, $t3; bne $t0, $t1, LOOP; addi $s0, $s0, 4. The next lw $t3, 0($s0) still stalls: 6 cycles per loop iteration.

43 Solution II: Always predict not-taken. Always predict that the next PC is PC+4. LOOP: lw $t3, 0($s0); addi $t0, $t0, 1; add $v0, $v0, $t3; addi $s0, $s0, 4; bne $t0, $t1, LOOP. The instructions after the loop (a sw of $v0 and add $t4, $t3, $t5) are fetched right behind the branch. If the branch is not taken: no stalls! If the branch is taken: it doesn't hurt compared with stalling. Flush the instructions fetched incorrectly (they turn into nops) and fetch lw $t3, 0($s0): 7 cycles per loop iteration.

44 Solution III: Always predict taken. [Figure: pipelined datapath with hazard detection and forwarding, where the branch target is computed in the EX stage]

45 Solution III: Always predict taken. [Figure: the branch target adder (shift left 2 and Add) moved into the ID stage] We still have to stall 1 cycle.

46 Solution III: Always predict taken. [Figure: a Branch Target Buffer added next to the PC] Consult the BTB in the fetch stage.

47 Branch Target Buffer. [Figure: the PC looks up the Branch Target Buffer, whose entries hold the branch PC and the target address or target instruction]
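Here is a behavioral sketch of how fetch could use a BTB. It assumes a hypothetical interface: an entry is recorded when a branch is resolved as taken, and fetch uses the stored target on a hit and PC + 4 on a miss. A hardware BTB is a small, fixed-size tagged table, so the unbounded hash map here is a simplification, and the addresses in the check are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Sketch of a BTB consulted in the fetch stage; an unbounded hash map stands in for
// the small tagged table that real hardware uses.
class BranchTargetBuffer {
public:
    // Remember the target of a branch once it has been resolved as taken.
    void record(std::uint32_t branchPc, std::uint32_t target) { table_[branchPc] = target; }

    // Hit: predict taken and fetch from the stored target. Miss: fetch PC + 4.
    std::uint32_t nextPc(std::uint32_t pc) const {
        assert(pc % 4 == 0);  // MIPS instructions are word-aligned
        auto it = table_.find(pc);
        return it != table_.end() ? it->second : pc + 4;
    }

private:
    std::unordered_map<std::uint32_t, std::uint32_t> table_;
};
```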

48 Solution III: Always predict taken. Always predict taken with the help of the BTB. LOOP: lw $t3, 0($s0); addi $t0, $t0, 1; add $v0, $v0, $t3; addi $s0, $s0, 4; bne $t0, $t1, LOOP. The next iteration's lw $t3, 0($s0), addi $t0, $t0, 1, and add $v0, $v0, $t3 follow the branch without stalls. 5 cycles per loop iteration (CPI == 1!!!). But what if the branch is not always taken?

49 Dynamic branch prediction

50 1-bit counter. Predict that this branch will go the same way as the last time it executed: 1 for taken, 0 for not taken. [Figure: the PC looks up the Branch Target Buffer; each entry holds a branch PC, its target, and a 1-bit history, and the matching entry predicts Taken!]

51 2-bit counter. A 2-bit counter for each branch. Predict taken if the counter value >= 2. If the prediction is in a taken state, fetch from the target PC; otherwise, use PC+4. [Figure: state machine with the states Taken 3 (11), Taken 2 (10), Not Taken 1 (01), and Not Taken 0 (00). A taken outcome moves one state toward 3 and a not-taken outcome one state toward 0. The BTB entry for the current PC holds the counter and predicts Taken!]
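To compare the two schemes on slides 50 and 51, here is a small simulation sketch. It assumes the 1-bit predictor starts at taken and the 2-bit counter starts at 2 (weakly taken); the slides do not fix the initial state. The test uses a loop branch that is taken 9 times and then falls through, run three times in a row. After the first run, the 1-bit predictor misses twice per run: at the exit, and again on re-entry. The 2-bit counter misses only once per run.

```cpp
#include <cassert>
#include <vector>

// 1-bit predictor: predict whatever this branch did last time.
struct OneBit {
    bool last = true;  // assumed initial state: taken
    bool predict() const { return last; }
    void update(bool taken) { last = taken; }
};

// 2-bit saturating counter: 0, 1 = predict not taken; 2, 3 = predict taken.
struct TwoBit {
    int counter = 2;  // assumed initial state: weakly taken
    bool predict() const { return counter >= 2; }
    void update(bool taken) {
        if (taken && counter < 3) ++counter;
        else if (!taken && counter > 0) --counter;
        assert(counter >= 0 && counter <= 3);
    }
};

// Replays a sequence of outcomes of one branch and counts wrong predictions.
template <typename Predictor>
int countMispredictions(const std::vector<bool>& outcomes) {
    Predictor p;
    int wrong = 0;
    for (bool taken : outcomes) {
        if (p.predict() != taken) ++wrong;
        p.update(taken);
    }
    return wrong;
}
```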

52 Performance of 2-bit counter. A 2-bit state machine for each branch. for (i = 0; i < 10; i++) { sum += a[i]; } [Figure: the 2-bit state machine] For i = 1 to 9, the prediction is T and the outcome is T; for i = 10, the prediction is T but the outcome is NT: 90% accuracy! Application: 80% ALU and 20% branch instructions, with branches resolved in the EX stage. Average CPI? 1 + 20% * (1 - 90%) * 2 = 1.04

53 Make the prediction better. Consider the following code: i = 0; do { if (i % 3 != 0) a[i] *= 2; a[i] += i; } while (++i < 100); Branch Y is the if, which is taken (skipping a[i] *= 2) when i % 3 == 0; Branch X is the loop condition. Can we capture the pattern? Outcomes: i = 0: Y T, X T; i = 1: Y NT, X T; i = 2: Y NT, X T; i = 3: Y T, X T; i = 4: Y NT, X T; i = 5: Y NT, X T; i = 6: Y T, X T; i = 7: Y NT.

54 Predict using history. Instead of using the PC to choose the predictor, use a bit vector (the global history register, GHR) made up of the previous branch outcomes. Each entry in the history table has its own counter. [Figure: an n-bit GHR, e.g. (T, NT, T), indexes a history table with 2^n entries; the selected counter predicts Taken!]
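Here is a sketch of the global-history predictor with a 4-bit GHR. Its 2-bit counters all start at 0, as on slide 55. The class and function names are hypothetical. The test replays the outcome stream of the loop on slide 53: Y is taken when i % 3 == 0, and X is taken until the loop exits. In the steady state, every 4-outcome history is followed by a unique outcome. So the predictor stops missing after a short warm-up, and the only later miss is the final loop exit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Global-history predictor: an n-bit GHR of recent outcomes (1 = taken) indexes a
// table of 2^n 2-bit saturating counters. The branch PC is not used at all.
class GlobalHistoryPredictor {
public:
    explicit GlobalHistoryPredictor(unsigned historyBits) {
        assert(historyBits >= 1 && historyBits <= 16);
        mask_ = (1u << historyBits) - 1;
        table_.assign(std::size_t{1} << historyBits, 0);  // all counters start at 0
    }
    bool predict() const { return table_[ghr_] >= 2; }
    void update(bool taken) {
        std::uint8_t& c = table_[ghr_];
        if (taken && c < 3) ++c;
        else if (!taken && c > 0) --c;
        ghr_ = ((ghr_ << 1) | (taken ? 1u : 0u)) & mask_;  // shift in the newest outcome
    }

private:
    unsigned mask_ = 0;
    unsigned ghr_ = 0;  // starts as all not-taken (0000 for 4 bits)
    std::vector<std::uint8_t> table_;
};

// Outcome stream of the loop on slide 53: Y, then X, for each iteration i.
std::vector<bool> loopOutcomes(int iterations) {
    std::vector<bool> out;
    for (int i = 0; i < iterations; ++i) {
        out.push_back(i % 3 == 0);          // Branch Y
        out.push_back(i + 1 < iterations);  // Branch X: not taken only at the exit
    }
    return out;
}

// Counts wrong predictions, ignoring the first `warmup` branches.
int mispredictions(const std::vector<bool>& outcomes, unsigned historyBits, std::size_t warmup) {
    GlobalHistoryPredictor predictor(historyBits);
    int wrong = 0;
    for (std::size_t k = 0; k < outcomes.size(); ++k) {
        if (k >= warmup && predictor.predict() != outcomes[k]) ++wrong;
        predictor.update(outcomes[k]);
    }
    return wrong;
}
```

With 60 iterations, mispredictions(loopOutcomes(60), 4, 60) is 1: the only miss in the second half is the final loop exit.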

55 Performance of global history predictor. Consider the same code: i = 0; do { if (i % 3 != 0) a[i] *= 2; a[i] += i; } while (++i < 100); where Branch Y is the if (taken when i % 3 == 0) and Branch X is the loop condition. Assume that we start with a 4-bit GHR = 0000 and all counters are 0. [Table: for branches Y and X at i = 0 to 10, the GHR, the BHT counter it selects, the prediction, the actual outcome, and the updated counter] After which i does the prediction become nearly perfect?

56 Branch prediction and modern processors

57 Deeper pipeline. Higher frequencies by shortening the pipeline stages. Higher marketing value, since consumers usually link performance with frequency. Potentially higher power consumption, as dynamic/active power = αCV²f. If the execution time is better, the processor may still consume less energy.

58 Case Study

59 Intel Pentium 4 Microarch.

60 Intel Pentium 4. Very deep pipeline in order to achieve high frequency (starting from 1.5 GHz)! 20 stages in NetBurst: 1-2 TC Nxt IP, 3-4 TC Fetch, 5 Drive, 6 Alloc, 7-8 Rename, 9 Que, 10-12 Sch, 13-14 Disp, 15-16 RF, 17 Ex, 18 Flgs, 19 Br Ck, 20 Drive. 31 stages in Prescott (130 W at 3.6 GHz, 65 nm). Reference: The Microarchitecture of the Pentium 4 Processor.

61 AMD Athlon 64

62 12-stage pipeline: AMD Athlon 64. [Figure: the 12 integer pipeline stages, from instruction address decode, instruction memory access, and instruction byte pick, through decode and pack, dispatch, scheduling, and execution, to D-cache address and D-cache access] 89 W TDP (Opteron 2.2 GHz, 90 nm).

63 Demo revisited. Why does sorting the array speed up the code despite the increased instruction count? if (option) std::sort(data, data + arraysize); then an outer loop repeats a fixed number of times: { int threshold = std::rand(); for (unsigned j = 0; j < arraysize; ++j) { if (data[j] >= threshold) sum++; } }
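Here is one way to see why sorting helps without timing anything. The sum is the same either way. In a sorted array, though, the outcome of data[j] >= threshold changes at most once per scan, so the branch predictor is almost always right. In random data, it changes roughly every other element. The sketch below counts those outcome changes as a rough proxy for predictability. The data range 0 to 255, the fixed threshold, and the function names are assumptions, not part of the original demo.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Same work as the demo's inner loop: count the elements >= threshold.
long long countAtLeast(const std::vector<int>& data, int threshold) {
    long long sum = 0;
    for (std::size_t j = 0; j < data.size(); ++j)
        if (data[j] >= threshold) ++sum;  // the data-dependent branch
    return sum;
}

// How many times the outcome of that branch changes during one scan:
// a rough proxy for how hard the branch is to predict.
int outcomeFlips(const std::vector<int>& data, int threshold) {
    int flips = 0;
    for (std::size_t j = 1; j < data.size(); ++j)
        if ((data[j] >= threshold) != (data[j - 1] >= threshold)) ++flips;
    return flips;
}

// Reproducible pseudo-random input in the range 0..255.
std::vector<int> randomData(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<int> v(n);
    for (int& x : v) x = dist(gen);
    return v;
}
```

Sorting a copy with std::sort leaves countAtLeast unchanged. It drops outcomeFlips from thousands to at most 1 per scan, which is why the sorted version runs faster despite the extra sorting work.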

64 Deep pipelining and data hazards

65 Data hazard revisited. How many cycles does it take to execute the following code? Draw the pipeline execution diagram, assuming that we have full data forwarding. lw $t0, 0($a0); lw $a0, 0($t0); bne $a0, $zero, EX. Answer: 9 cycles.

66 Intel's latest: Skylake. [Figure: Skylake core block diagram. Front end: branch prediction unit (BPU) and 32K L1 instruction cache. The MSROM (4 uops/cycle), decoded ICache (DSB, 6 uops/cycle), and legacy decode pipeline (5 uops/cycle) feed the instruction decode queue (IDQ, or micro-op queue). Then Allocate/Rename/Retire/MoveElimination/ZeroIdiom, and a scheduler issuing to: port 0 (Int ALU, Vec FMA, Vec MUL, Vec Add, Vec ALU, Vec Shft, Divide, Branch2); port 1 (Int ALU, Fast LEA, Vec FMA, Vec MUL, Vec Add, Vec ALU, Vec Shft, Int MUL, Slow LEA); port 5 (Int ALU, Fast LEA, Vec SHUF, Vec ALU, CVT); port 6 (Int ALU, Int Shft, Branch1); ports 2 and 3 (LD/STA); port 4 (STD); port 7 (STA). Memory: 32K L1 data cache and 256K unified L2 cache] Good reference for Intel microarchitectures:


More information

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes NAME: STUDENT NUMBER: EE557--FALL 1999 MAKE-UP MIDTERM 1 Closed books, closed notes Q1: /1 Q2: /1 Q3: /1 Q4: /1 Q5: /15 Q6: /1 TOTAL: /65 Grade: /25 1 QUESTION 1(Performance evaluation) 1 points We are

More information

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions. Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions Stage Instruction Fetch Instruction Decode Execution / Effective addr Memory access Write-back Abbreviation

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations Determined by ISA

More information

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141 EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 13 Project Introduction You will design and optimize a RISC-V processor Phase 1: Design

More information

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Pipelined Processor Design. EE/ECE 4305: Computer Architecture University of Minnesota Duluth By Dr. Taek M. Kwon

Pipelined Processor Design. EE/ECE 4305: Computer Architecture University of Minnesota Duluth By Dr. Taek M. Kwon Pipelined Processor Design EE/ECE 4305: Computer Architecture University of Minnesota Duluth By Dr. Taek M. Kwon Concept Identification of Pipeline Segments Add Pipeline Registers Pipeline Stage Control

More information

Thomas Polzer Institut für Technische Informatik

Thomas Polzer Institut für Technische Informatik Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =

More information

Pipelined datapath Staging data. CS2504, Spring'2007 Dimitris Nikolopoulos

Pipelined datapath Staging data. CS2504, Spring'2007 Dimitris Nikolopoulos Pipelined datapath Staging data b 55 Life of a load in the MIPS pipeline Note: both the instruction and the incremented PC value need to be forwarded in the next stage (in case the instruction is a beq)

More information

LECTURE 3: THE PROCESSOR

LECTURE 3: THE PROCESSOR LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds? Chapter 4: Assessing and Understanding Performance 1. Define response (execution) time. 2. Define throughput. 3. Describe why using the clock rate of a processor is a bad way to measure performance. Provide

More information

CS3350B Computer Architecture Quiz 3 March 15, 2018

CS3350B Computer Architecture Quiz 3 March 15, 2018 CS3350B Computer Architecture Quiz 3 March 15, 2018 Student ID number: Student Last Name: Question 1.1 1.2 1.3 2.1 2.2 2.3 Total Marks The quiz consists of two exercises. The expected duration is 30 minutes.

More information

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will

More information

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count

More information

14:332:331 Pipelined Datapath

14:332:331 Pipelined Datapath 14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate

More information

CSE 378 Midterm 2/12/10 Sample Solution

CSE 378 Midterm 2/12/10 Sample Solution Question 1. (6 points) (a) Rewrite the instruction sub $v0,$t8,$a2 using absolute register numbers instead of symbolic names (i.e., if the instruction contained $at, you would rewrite that as $1.) sub

More information

Pipelining is Hazardous!

Pipelining is Hazardous! Pipelining is Hazardous! Hazards are situations where pipelining does not work as elegantly as we would like Three kinds Structural hazards -- we have run out of a hardware resource. Data hazards -- an

More information

COMP2611: Computer Organization. The Pipelined Processor

COMP2611: Computer Organization. The Pipelined Processor COMP2611: Computer Organization The 1 2 Background 2 High-Performance Processors 3 Two techniques for designing high-performance processors by exploiting parallelism: Multiprocessing: parallelism among

More information

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27) Lecture Topics Today: Data and Control Hazards (P&H 4.7-4.8) Next: continued 1 Announcements Exam #1 returned Milestone #5 (due 2/27) Milestone #6 (due 3/13) 2 1 Review: Pipelined Implementations Pipelining

More information

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31 4.16 Exercises 419 Exercise 4.11 In this exercise we examine in detail how an instruction is executed in a single-cycle datapath. Problems in this exercise refer to a clock cycle in which the processor

More information

EIE/ENE 334 Microprocessors

EIE/ENE 334 Microprocessors EIE/ENE 334 Microprocessors Lecture 6: The Processor Week #06/07 : Dejwoot KHAWPARISUTH Adapted from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 2009, Elsevier (MK) http://webstaff.kmutt.ac.th/~dejwoot.kha/

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Recall. ISA? Instruction Fetch Instruction Decode Operand Fetch Execute Result Store Next Instruction Instruction Format or Encoding how is it decoded? Location of operands and

More information

Chapter 5 Solutions: For More Practice

Chapter 5 Solutions: For More Practice Chapter 5 Solutions: For More Practice 1 Chapter 5 Solutions: For More Practice 5.4 Fetching, reading registers, and writing the destination register takes a total of 300ps for both floating point add/subtract

More information

The Processor: Instruction-Level Parallelism

The Processor: Instruction-Level Parallelism The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy

More information

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation

More information

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture Pipelining Recall Pipelining is parallelizing execution Key to speedups in processors Split instruction

More information

CS232 Final Exam May 5, 2001

CS232 Final Exam May 5, 2001 CS232 Final Exam May 5, 2 Name: This exam has 4 pages, including this cover. There are six questions, worth a total of 5 points. You have 3 hours. Budget your time! Write clearly and show your work. State

More information

DEE 1053 Computer Organization Lecture 6: Pipelining

DEE 1053 Computer Organization Lecture 6: Pipelining Dept. Electronics Engineering, National Chiao Tung University DEE 1053 Computer Organization Lecture 6: Pipelining Dr. Tian-Sheuan Chang tschang@twins.ee.nctu.edu.tw Dept. Electronics Engineering National

More information

Lecture 7 Pipelining. Peng Liu.

Lecture 7 Pipelining. Peng Liu. Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle? CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:

More information

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good CPU performance equation: T = I x CPI x C Both effective CPI and clock cycle C are heavily influenced by CPU design. For single-cycle CPU: CPI = 1 good Long cycle time bad On the other hand, for multi-cycle

More information

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel

More information

ELE 655 Microprocessor System Design

ELE 655 Microprocessor System Design ELE 655 Microprocessor System Design Section 2 Instruction Level Parallelism Class 1 Basic Pipeline Notes: Reg shows up two places but actually is the same register file Writes occur on the second half

More information

Chapter 5: The Processor: Datapath and Control

Chapter 5: The Processor: Datapath and Control Chapter 5: The Processor: Datapath and Control Overview Logic Design Conventions Building a Datapath and Control Unit Different Implementations of MIPS instruction set A simple implementation of a processor

More information

CS 351 Exam 2 Mon. 11/2/2015

CS 351 Exam 2 Mon. 11/2/2015 CS 351 Exam 2 Mon. 11/2/2015 Name: Rules and Hints The MIPS cheat sheet and datapath diagram are attached at the end of this exam for your reference. You may use one handwritten 8.5 11 cheat sheet (front

More information

What about branches? Branch outcomes are not known until EXE What are our options?

What about branches? Branch outcomes are not known until EXE What are our options? What about branches? Branch outcomes are not known until EXE What are our options? 1 Control Hazards 2 Today Quiz Control Hazards Midterm review Return your papers 3 Key Points: Control Hazards Control

More information

PIPELINING: HAZARDS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

PIPELINING: HAZARDS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah PIPELINING: HAZARDS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Homework 1 submission deadline: Jan. 30 th This

More information

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction EECS150 - Digital Design Lecture 10- CPU Microarchitecture Feb 18, 2010 John Wawrzynek Spring 2010 EECS150 - Lec10-cpu Page 1 Processor Microarchitecture Introduction Microarchitecture: how to implement

More information

Chapter 4. The Processor. Computer Architecture and IC Design Lab

Chapter 4. The Processor. Computer Architecture and IC Design Lab Chapter 4 The Processor Introduction CPU performance factors CPI Clock Cycle Time Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS

More information

Chapter 4 (Part II) Sequential Laundry

Chapter 4 (Part II) Sequential Laundry Chapter 4 (Part II) The Processor Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Sequential Laundry 6 P 7 8 9 10 11 12 1 2 A T a s k O r d e r A B C D 30 30 30 30 30 30 30 30 30 30

More information

CS2100 Computer Organisation Tutorial #10: Pipelining Answers to Selected Questions

CS2100 Computer Organisation Tutorial #10: Pipelining Answers to Selected Questions CS2100 Computer Organisation Tutorial #10: Pipelining Answers to Selected Questions Tutorial Questions 2. [AY2014/5 Semester 2 Exam] Refer to the following MIPS program: # register $s0 contains a 32-bit

More information

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16 4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor 1 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A

More information

These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously on different instructions.

These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously on different instructions. MIPS Pipe Line 2 Introduction Pipelining To complete an instruction a computer needs to perform a number of actions. These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously

More information

ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design ECE232: Hardware Organization and Design Lecture 17: Pipelining Wrapup Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Outline The textbook includes lots of information Focus on

More information

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours.

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours. This exam is open book and open notes. You have 2 hours. Problems 1-4 refer to a proposed MIPS instruction lwu (load word - update) which implements update addressing an addressing mode that is used in

More information

LECTURE 9. Pipeline Hazards

LECTURE 9. Pipeline Hazards LECTURE 9 Pipeline Hazards PIPELINED DATAPATH AND CONTROL In the previous lecture, we finalized the pipelined datapath for instruction sequences which do not include hazards of any kind. Remember that

More information

Lecture 8: Control COS / ELE 375. Computer Architecture and Organization. Princeton University Fall Prof. David August

Lecture 8: Control COS / ELE 375. Computer Architecture and Organization. Princeton University Fall Prof. David August Lecture 8: Control COS / ELE 375 Computer Architecture and Organization Princeton University Fall 2015 Prof. David August 1 Datapath and Control Datapath The collection of state elements, computation elements,

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN ARM COMPUTER ORGANIZATION AND DESIGN Edition The Hardware/Software Interface Chapter 4 The Processor Modified and extended by R.J. Leduc - 2016 To understand this chapter, you will need to understand some

More information

CSEE 3827: Fundamentals of Computer Systems

CSEE 3827: Fundamentals of Computer Systems CSEE 3827: Fundamentals of Computer Systems Lecture 21 and 22 April 22 and 27, 2009 martha@cs.columbia.edu Amdahl s Law Be aware when optimizing... T = improved Taffected improvement factor + T unaffected

More information

Static, multiple-issue (superscaler) pipelines

Static, multiple-issue (superscaler) pipelines Static, multiple-issue (superscaler) pipelines Start more than one instruction in the same cycle Instruction Register file EX + MEM + WB PC Instruction Register file EX + MEM + WB 79 A static two-issue

More information

ECE Exam II - Solutions November 8 th, 2017

ECE Exam II - Solutions November 8 th, 2017 ECE 3056 Exam II - Solutions November 8 th, 2017 1. (15 pts) To the base pipeline we add data forwarding to EX, data hazard detection and stall generation, and branches implemented in MEM and predicted

More information

Quiz for Chapter 4 The Processor3.10

Quiz for Chapter 4 The Processor3.10 Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: 1. [6 points] For the MIPS datapath shown below, several lines

More information

Final Exam Spring 2017

Final Exam Spring 2017 COE 3 / ICS 233 Computer Organization Final Exam Spring 27 Friday, May 9, 27 7:3 AM Computer Engineering Department College of Computer Sciences & Engineering King Fahd University of Petroleum & Minerals

More information

EE 457 Unit 6a. Basic Pipelining Techniques

EE 457 Unit 6a. Basic Pipelining Techniques EE 47 Unit 6a Basic Pipelining Techniques 2 Pipelining Introduction Consider a drink bottling plant Filling the bottle = 3 sec. Placing the cap = 3 sec. Labeling = 3 sec. Would you want Machine = Does

More information

Processor (I) - datapath & control. Hwansoo Han

Processor (I) - datapath & control. Hwansoo Han Processor (I) - datapath & control Hwansoo Han Introduction CPU performance factors Instruction count - Determined by ISA and compiler CPI and Cycle time - Determined by CPU hardware We will examine two

More information