Computer Architectures

1 Computer Architectures: Pipelined instruction execution
Hazards, stage balancing, super-scalar systems
Pavel Píša, Michal Štepanovský, Miroslav Šnorek
Main source of inspiration: Patterson
Czech Technical University in Prague, Faculty of Electrical Engineering
English version partially supported by: European Social Fund Prague & EU: We invest in your future.
AEB36APO Computer Architectures Ver..

2 Motivation: AMD Bulldozer 15h (FX, Opteron), 2011

3 Motivation: Intel Nehalem (Core i7), 2008

4 The goal of today's lecture
Convert/extend the CPU presented in lecture 2 into a pipelined CPU design.
The following instructions are considered for our CPU design: add, sub, and, or, slt, addi, lw, sw and beq.
Instruction formats, given as field(width in bits) and bit positions:
Type R: opcode(6) 31:26, rs(5) 25:21, rt(5) 20:16, rd(5) 15:11, shamt(5) 10:6, funct(6) 5:0
Type I: opcode(6) 31:26, rs(5) 25:21, rt(5) 20:16, immediate(16) 15:0
Type J: opcode(6) 31:26, address(26) 25:0
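The field layout above can be illustrated with a short sketch. The function name `decode_fields` and the dictionary keys are illustrative choices, not part of the lecture; the bit positions are those of the standard MIPS R, I and J formats.

```python
def decode_fields(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its R/I/J-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,    # bits 31:26 (all types)
        "rs": (word >> 21) & 0x1F,        # bits 25:21 (R, I)
        "rt": (word >> 16) & 0x1F,        # bits 20:16 (R, I)
        "rd": (word >> 11) & 0x1F,        # bits 15:11 (R)
        "shamt": (word >> 6) & 0x1F,      # bits 10:6  (R)
        "funct": word & 0x3F,             # bits 5:0   (R)
        "immediate": word & 0xFFFF,       # bits 15:0  (I)
        "address": word & 0x3FFFFFF,      # bits 25:0  (J)
    }


# add $t0, $t1, $t2 is encoded as 0x012A4020 (rs = $t1 = 9, rt = $t2 = 10, rd = $t0 = 8)
fields = decode_fields(0x012A4020)
```

Every field is extracted regardless of the instruction type; which of them are meaningful is decided by the opcode, which is the job of the control unit.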

5 Single cycle CPU together with memories (from lecture 2)
[Figure: single-cycle datapath with PC, instruction memory, register file, sign extension, ALU and data memory; the control unit decodes Opcode and Funct into MemToReg, MemWrite, Branch, ALUControl, ALUSrc, RegDest and RegWrite]

6 Single cycle CPU performance: $IPS = IC / T = IPC_{avg} \cdot f_{CLK}$
What is the maximal possible frequency of this CPU? It is given by the latency of the critical path, which is lw in our case:
$T_c = t_{PC} + t_{Mem} + t_{RFread} + t_{ALU} + t_{Mem} + t_{Mux} + t_{RFsetup}$
[Figure: single-cycle datapath, from lecture 2]

7 Single cycle CPU throughput: $IPS = IC / T = IPC_{avg} \cdot f_{CLK}$
$T_c = t_{PC} + t_{Mem} + t_{RFread} + t_{ALU} + t_{Mem} + t_{Mux} + t_{RFsetup}$
Consider the following parameters:
$t_{PC}$ = 30 ns
$t_{Mem}$ = 300 ns
$t_{RFread}$ = 150 ns
$t_{ALU}$ = 200 ns
$t_{Mux}$ = 20 ns
$t_{RFsetup}$ = 20 ns
Then $T_c$ = 1020 ns --> $f_{CLK,max}$ = 980 kHz, IPS = 980e3 = 980 000 instructions per second (from lecture 2).
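As a quick check of the arithmetic, the critical-path sum can be evaluated directly. This is a minimal sketch that assumes delays of 30, 300, 150, 200, 20 and 20 ns for the PC, memory, register file read, ALU, multiplexer and register file setup; the variable names are illustrative.

```python
# Assumed component delays in nanoseconds.
t_pc, t_mem, t_rf_read, t_alu, t_mux, t_rf_setup = 30, 300, 150, 200, 20, 20

# Critical path of lw: the memory is traversed twice (instruction fetch, data read).
t_c_ns = t_pc + t_mem + t_rf_read + t_alu + t_mem + t_mux + t_rf_setup

# A period in ns gives a frequency in GHz as 1 / t, so in kHz it is 1e6 / t.
f_clk_khz = 1e6 / t_c_ns

# With one instruction per cycle, IPS equals the clock frequency in Hz.
ips = 1e9 / t_c_ns
```

With these inputs the period is 1020 ns and the clock frequency is roughly 980 kHz.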

8 Pipelined instruction execution
Suppose that instruction execution can be divided into 5 stages: IF, ID, EX, MEM, WB (Instruction Fetch, Instruction Decode and operand fetch, Execute, Memory Access, Write Back), and let $\tau = \max\{\tau_i\}_{i=1}^{k}$, where $\tau_i$ is the time required for signal propagation (propagation delay) through the i-th stage.
IF: set up the PC for the memory and fetch the instruction it points to. Update PC = PC + 4.
ID: decode the opcode and read the registers specified by the instruction, check for equality (for a possible beq instruction), sign-extend the offset, and compute the branch target address for the branch case (that is, extend the offset and add it to the PC).
EX: execute the function / pass the register values through the ALU.
MEM: read/write main memory for the load/store instruction case.
WB: write the result into the RF for instructions of the register-register class or a load instruction (the result source is the ALU or memory).

9 Instruction-level parallelism - pipelining
[Figure: time diagram (horizontal axis: time) of instructions I1 to I10 advancing one stage per cycle through the rows IF, ID, EX, MEM and ST]
The time to execute n instructions in the k-stage pipeline: $T_k = k\tau + (n-1)\tau$
Speedup: $S_k = \frac{T_1}{T_k} = \frac{nk\tau}{k\tau + (n-1)\tau}$, $\lim_{n \to \infty} S_k = k$
Prerequisite: the pipeline is optimally balanced and the circuit can be divided arbitrarily.
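The speedup expression can be evaluated numerically to see how quickly it approaches k. This is a minimal sketch; the function names and the choice of tau = 1 as a time unit are assumptions made for illustration.

```python
def pipeline_time(n: int, k: int, tau: float = 1.0) -> float:
    """Time to execute n instructions in a k-stage pipeline: k*tau + (n-1)*tau."""
    return k * tau + (n - 1) * tau


def speedup(n: int, k: int, tau: float = 1.0) -> float:
    """Speedup over non-pipelined execution, which takes n*k*tau."""
    return (n * k * tau) / pipeline_time(n, k, tau)


# Speedup of a 5-stage pipeline for growing instruction counts.
for n in (1, 10, 100, 1_000_000):
    print(n, speedup(n, k=5))
```

A single instruction gains nothing (speedup 1), and the value only approaches 5 for long instruction sequences, which is why the pipeline fill overhead matters for short runs.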

10 Instruction-level parallelism - pipelining
Pipelining does not reduce the execution time of individual instructions; the effect is just the opposite.
Hazards:
structural (resolved by duplication),
data (result of data dependencies: RAW, WAR, WAW),
control (caused by instructions which change the PC).
Hazard prevention can result in a pipeline stall or a pipeline flush.
Remark: A deeper pipeline (more stages) has shorter sequences of gates in each stage, which allows the operating frequency of the processor to be increased. However, more stages mean higher overhead: instructions have to be arranged in the pipeline more carefully, and a stall or pipeline flush causes a more significant lag.

11 Instruction-level parallelism: semantics violations
Data hazard (flow of instructions and expected effect):
ADD R1,R2,R3 (ADD writes the new value to R1)
SUB R4,R1,R3 (SUB reads an incorrect value from R1)
Control hazard:
BEQZ R3, M (condition and new PC evaluation, PC set to branch target)
ADD R6,R1,R2
instruction 3
instruction 4
M: ADD R4,R6,R7
Should the instructions following the branch be fetched (and then executed)?
[Figure: IF ID EX MEM WB timing of the instructions above]
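The data hazard in the ADD/SUB pair can be found mechanically by comparing destination and source registers of nearby instructions. The sketch below is illustrative only: the tuple representation `(mnemonic, dest, sources)`, the function name and the default window of two following instructions are assumptions (the window reflects a 5-stage pipeline whose register file is written in the first half of a cycle and read in the second half). It ignores intervening overwrites of the same register.

```python
def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for every RAW
    dependence where the consumer follows the producer within `window`."""
    found = []
    for i, (_, dest, _) in enumerate(program):
        last = min(i + 1 + window, len(program))
        for j in range(i + 1, last):
            if dest in program[j][2]:
                found.append((i, j, dest))
    return found


program = [
    ("ADD", "R1", ("R2", "R3")),
    ("SUB", "R4", ("R1", "R3")),
]
hazards = raw_hazards(program)
```

For the two instructions above the only reported dependence is on R1, between instruction 0 and instruction 1.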

12 Non-pipelined execution (from lecture 2)
[Figure: single-cycle datapath with PC, instruction memory, register file, sign extension, ALU and data memory]

13 Pipelined execution
[Figure: the same datapath divided by inter-stage registers, with stage labels Fetch, Decode, Execute, WriteBack; signals carry stage suffixes (PCPlus4F, PCPlus4D, PCPlus4E, AluOutM, AluOutW, WriteRegE, WriteRegM, WriteRegW)]

14 Pipelined execution
[Figure: the pipelined datapath with the control unit added (inputs Opcode and Funct; outputs MemToReg, MemWrite, Branch, ALUControl, ALUSrc, RegDest, RegWrite)]

15 The same design but drawn scaled down
[Figure: pipelined datapath in which the control signals travel through the stages with the suffixes D, E, M, W (for example RegWriteD, RegWriteE, RegWriteM, RegWriteW)]

16 Cause of the data hazards
The register file is accessed from two pipeline stages (Decode, WriteBack).
The actual write occurs in the first half of the clock cycle and the read in the second half, so there is no hazard for the $s0 input operand of sub.
RAW (Read After Write) hazard: and (or) requires $s0 in cycle 3 (4).
How can such a hazard be prevented without degrading pipeline throughput?

17 Forwarding to avoid data hazards
If a result is available (computed) before the subsequent instruction(s) require the value, the data hazard can be avoided by forwarding.
A hazard is indicated when one of the source registers in the EX stage is the same as the destination register in the MEM or WB stage.
The register numbers are fed to the Hazard Unit.
The RegWrite signals from the MEM and WB stages have to be monitored as well, to check that the register number on the WriteReg lines actually takes effect (lw / sw etc.).
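The forwarding decision described above can be written as a small function. This is a sketch under stated assumptions: the function name and the returned labels "MEM", "WB" and "RF" are hypothetical stand-ins for the multiplexer select values, and the check that register 0 is never forwarded is an added assumption based on MIPS register $0 being hardwired to zero.

```python
def forward_select(src_reg_e, write_reg_m, reg_write_m, write_reg_w, reg_write_w):
    """Choose the source of one ALU operand in the EX stage.

    Returns "MEM" to forward the ALU result held in the MEM stage,
    "WB" to forward the result being written back, or "RF" to use the
    value read from the register file.
    """
    # The MEM stage is checked first: it holds the most recent result.
    if src_reg_e != 0 and reg_write_m and src_reg_e == write_reg_m:
        return "MEM"
    if src_reg_e != 0 and reg_write_w and src_reg_e == write_reg_w:
        return "WB"
    return "RF"


# Operand register 16 was produced by the instruction now in MEM.
select_a = forward_select(16, write_reg_m=16, reg_write_m=True,
                          write_reg_w=8, reg_write_w=True)
```

The same function would be evaluated once per ALU operand (Rs and Rt in the EX stage). Checking the MEM stage before the WB stage matters when both stages write the same register.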

18 CPU after previous design steps
[Figure: pipelined datapath with the control signals propagated through the D, E, M and W stages, before hazard handling is added]

19 Data hazards solved by forwarding
[Figure: the pipelined datapath extended with a Hazard unit; inputs RsE, RtE, WriteRegM, WriteRegW, RegWriteM, RegWriteW; outputs ForwardAE and ForwardBE selecting the ALU operands SrcAE and SrcBE]

20 Data hazard avoided by pipeline stall
If subsequent instructions require a result before it is available in the CPU, the pipeline has to be stalled (a stall state is inserted).
A stall resolves the hazard but reduces system throughput.
The pipeline stages preceding the one affected by the hazard are stalled until all results required by the subsequent instructions are available; the results are then forwarded to the sink which requires their value.

21 Data hazard avoided by pipeline stall
The stall is realized by holding the content of the inter-stage registers (gating their clocks or blocking their latch enable signals).
Results from the colliding stages have to be discarded: certain control signals in the CPU (RF or memory write enable, branch gating) are reset (held low).
Both are achieved by introducing control signals that hold and/or reset the inter-stage registers.
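A typical case that forwarding alone cannot cover is a load followed immediately by an instruction that uses the loaded value, because the data only leaves memory at the end of the MEM stage. The sketch below shows how hold and reset signals could be derived for that case. It is an assumption-laden illustration, not the lecture's hazard unit: the function name is hypothetical, and `FlushE` is a made-up name for the reset of the Decode/Execute register; `StallF` and `StallD` stand for the hold signals of the PC and the Fetch/Decode register.

```python
def load_use_stall(rs_d, rt_d, rt_e, mem_to_reg_e):
    """Derive hold/reset signals for a load-use hazard.

    rs_d, rt_d   -- source register numbers of the instruction in Decode
    rt_e         -- destination register of the instruction in Execute
    mem_to_reg_e -- True when the instruction in Execute is a load
    """
    stall = bool(mem_to_reg_e and (rs_d == rt_e or rt_d == rt_e))
    return {
        "StallF": stall,  # hold the PC
        "StallD": stall,  # hold the Fetch/Decode register
        "FlushE": stall,  # reset the Decode/Execute register (insert a bubble)
    }


# lw $s0, ... is in Execute while an instruction reading $s0 (reg 16) is in Decode.
signals = load_use_stall(rs_d=16, rt_d=17, rt_e=16, mem_to_reg_e=True)
```

Holding the earlier stages while clearing the register after them inserts exactly one bubble; after that cycle the loaded value can be forwarded.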

22 Processor design built so far
[Figure: pipelined datapath with the Hazard unit generating ForwardAE and ForwardBE]

23 Processor with data hazards avoided by stall
[Figure: the same datapath with enable (EN) inputs on the PC and the Fetch/Decode register, a clear (CLR) input on the Decode/Execute register, and the Hazard unit additionally generating StallF and StallD]

24 Control hazards (branch and jump)
The result is not known before the 4th cycle. Why?

25 Control hazards: better to know the result earlier
If the result of the comparison can be evaluated in the 2nd cycle, the misprediction penalty can be reduced.
But evaluating the comparison at an earlier stage can induce new RAW hazards!
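The benefit of earlier branch resolution can be estimated with the usual average-CPI model. This is a rough sketch: the function name, the 20 % branch share and the assumption that every branch pays the full penalty are hypothetical inputs chosen for illustration. The penalties of 3 and 1 cycles correspond to resolving the branch in the 4th and the 2nd cycle respectively.

```python
def effective_cpi(branch_fraction, wrong_path_rate, penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction when some branches waste cycles."""
    return base_cpi + branch_fraction * wrong_path_rate * penalty_cycles


# Hypothetical workload: 20 % branches, each one paying the full penalty.
cpi_resolved_in_cycle_4 = effective_cpi(0.2, 1.0, penalty_cycles=3)
cpi_resolved_in_cycle_2 = effective_cpi(0.2, 1.0, penalty_cycles=1)
```

Under these assumptions the average CPI drops from about 1.6 to about 1.2, before counting any extra stalls caused by the new RAW hazards in the Decode stage.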

26 Resolve control hazards by early evaluation and flush
[Figure: pipelined datapath with the branch comparator (=) and PCBranchD moved to the Decode stage, generating PCSrcD, and a clear (CLR) input on the Fetch/Decode register]

27 Resolve RAW hazards by forwarding or stalling
[Figure: the same datapath with the Hazard unit additionally receiving BranchD and generating Forward BD; stage annotations in pipeline order: Stall, Forward / Stall, No Action Required]

28 We are finished: the pipelined processor is designed
[Figure: complete pipelined datapath with the control unit and the Hazard unit (StallF, StallD, BranchD, Forward BD, ForwardAE, ForwardBE, RegWriteM, RegWriteW)]

29 Pipelined CPU performance: $IPS = IC / T = IPC_{avg} \cdot f_{CLK}$
What is the maximal acceptable frequency for the CPU? Which stage is the slowest one? The cycle time is determined by the slowest stage.
For our case: $T_c$ = 300 ns --> $f_{CLK}$ = 3 333 kHz
If the pipeline fill overhead is neglected (i.e. no pipeline stalls and flushes are considered), then the ideal IPC = 1.
IPS = 3 333e3 = 3 333 000 instructions per second
Introducing the 5-stage pipeline increases performance (throughput) 3 333 / 980 = 3.4 times (considering IPC = 1).
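The comparison can be reproduced with the relation IPS = IPC times the clock frequency. In this sketch the clock periods of 1020 ns (single-cycle) and 300 ns (pipelined) are taken as assumed inputs, the function name is illustrative, and the IPC of 0.8 is a purely hypothetical value showing how stalls and flushes reduce the ideal gain.

```python
def throughput_ips(t_c_ns: float, ipc: float = 1.0) -> float:
    """Instructions per second for a clock period in ns and an average IPC."""
    f_clk_hz = 1e9 / t_c_ns
    return ipc * f_clk_hz


single_cycle_ips = throughput_ips(1020)        # assumed single-cycle period
pipelined_ideal_ips = throughput_ips(300)      # ideal pipeline, IPC = 1
pipelined_real_ips = throughput_ips(300, 0.8)  # hypothetical IPC with stalls

ideal_speedup = pipelined_ideal_ips / single_cycle_ips
real_speedup = pipelined_real_ips / single_cycle_ips
```

The ideal ratio is 1020 / 300 = 3.4, well below the stage count of 5, because the stages are not balanced: the memory access dominates the cycle time.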

30 What is the result of the design? Return to the non-pipelined CPU version
[Figure: non-pipelined datapath with control unit, instruction memory and data memory]

31 What is the result of the design? Return to the non-pipelined CPU version
[Figure: the same design with the instruction and data memories drawn outside and the remaining logic grouped into "Control unit (control path)" and "Data/ALU (data path)"]

32 What is the result of the design?
[Figure: block diagram of the processor, consisting of the control unit and the data path (ALU, registers), connected to the instruction memory (PC as address, instruction as read data) and to the data memory (address for data read/write, data to write, write enable, read data)]

33 CPU design result: pipelined version
[Figure: complete pipelined datapath with control unit and Hazard unit]

34 Pipelined CPU timing
The timing (AC) characteristics of a synchronous sequential circuit:
$t_{setup}$: input setup time
$t_{hold}$: input hold time
Constraint for the setup time before the clock edge: $T_c \ge t_{pcq} + t_{pd} + t_{setup}$
$t_{pd}$: combinational logic propagation delay

35 Pipelined processor timing
Constraint for the setup time (considering the clock distribution jitter): $T_c \ge t_{pcq} + t_{pd} + t_{setup} + t_{skew}$
Clock distribution jitter becomes the limiting factor if it reaches or exceeds the value of $t_{pd}$ (too deep a pipeline, too many stages).
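The constraint shows why deepening the pipeline gives diminishing returns: only the logic delay shrinks with the number of stages, while the register and clock overhead stays constant. The sketch below assumes perfectly balanced stages and uses hypothetical delays in picoseconds; the function names are illustrative.

```python
def min_cycle_time(t_pcq, t_pd, t_setup, t_skew=0.0):
    """Smallest clock period allowed by the setup-time constraint."""
    return t_pcq + t_pd + t_setup + t_skew


def cycle_time_for_depth(total_logic_delay, stages, t_pcq, t_setup, t_skew):
    """Clock period when the logic is split evenly into `stages` stages."""
    return min_cycle_time(t_pcq, total_logic_delay / stages, t_setup, t_skew)


# Hypothetical values in ps: 1000 ps of logic, 30 ps clock-to-Q,
# 20 ps setup time, 50 ps clock skew/jitter.
t_c_5 = cycle_time_for_depth(1000, 5, t_pcq=30, t_setup=20, t_skew=50)
t_c_20 = cycle_time_for_depth(1000, 20, t_pcq=30, t_setup=20, t_skew=50)
```

With these numbers, going from 5 to 20 stages quadruples the stage count but only halves the clock period (300 ps to 150 ps), and at 20 stages the skew term already equals the per-stage logic delay.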

36 Pipeline stage balancing
Linear pipelining applies to tree-based adders, multipliers, (unrolled) iterative dividers, and similar circuits.
Balancing: the goal is to divide the processing into N stages in such a way that the stage propagation delays are roughly the same.
The number of stages reflects the preference for performance (throughput) versus latency.

37 Superpipeline and beyond
A 5-stage pipeline that is not well balanced: IF ID EX MEM WB (using IM, RF, DM, RF)
A deeper pipeline is the result of decomposing stages into more stages: IF IS RF EX DF DS TC WB
It allows the CPU to work at higher frequencies but introduces many problems as well: complex forwarding, more pipeline stalls, and hazards that need to be solved by complex logic.

38 Typical pipeline depths in today's CPUs
P5 (Pentium): 5
P6 (Pentium 3): 10
P6 (Pentium Pro): 14
NetBurst (Willamette, 180 nm) - Celeron, Pentium 4: 20
NetBurst (Northwood, 130 nm) - Celeron, Pentium 4, Pentium 4 HT: 20
NetBurst (Prescott, 90 nm) - Celeron D, Pentium 4, Pentium 4 HT, Pentium 4 ExEd: 31
NetBurst (Cedar Mill, 65 nm): 31
NetBurst (Presler, 65 nm) - Pentium D: 31
Core: 14
Bonnell: 16
K7 Architecture - Athlon: 10-15
K8 - Athlon 64, Sempron, Opteron, Turion 64: 12-17
ARM 8-9: 5
ARM 11: 8
Cortex A7: 8-10
Cortex A8: 13
Cortex A15: 15-25
The Optimum Pipeline Depth for a Microprocessor:

39 Branch stall discussion and delay slots
Reading and fetching from the instruction memory is expensive, and the result of the condition evaluation in branch instructions (or, even worse, the target of indirect branch instructions) has to be known before the next instruction can be fetched and executed.
The stall state is a waste of cycles. Options to use those cycle(s) are:
Start fetching and executing the instruction(s) following the branch and flush/discard the results if it is resolved that they should not be executed.
Extend the above by adding a branch predictor (taken/not-taken) and a branch target cache (BTB).
Execute one or more instructions after the branch unconditionally in a so-called delay slot.
Unconditional execution of delay slots is common for many DSPs (digital signal processors) and some RISC architectures (MIPS, SPARC).


More information

Lecture 4: Review of MIPS. Instruction formats, impl. of control and datapath, pipelined impl.

Lecture 4: Review of MIPS. Instruction formats, impl. of control and datapath, pipelined impl. Lecture 4: Review of MIPS Instruction formats, impl. of control and datapath, pipelined impl. 1 MIPS Instruction Types Data transfer: Load and store Integer arithmetic/logic Floating point arithmetic Control

More information



