Computer Architectures
- Cecilia Mosley
- 5 years ago
Transcription
1 Computer Architectures
Pipelined instruction execution: hazards, stage balancing, super-scalar systems
Pavel Píša, Michal Štepanovský, Miroslav Šnorek
Main source of inspiration: Patterson
Czech Technical University in Prague, Faculty of Electrical Engineering
English version partially supported by: European Social Fund Prague & EU: We invest in your future.
AEB36APO Computer Architectures Ver..
2 Motivation: AMD Bulldozer 15h (FX, Opteron), 2011
3 Motivation: Intel Nehalem (Core i7), 2008
4 The goal of today's lecture
Convert/extend the CPU presented in lecture 2 into a pipelined CPU design.
The following instructions are considered for our CPU design: add, sub, and, or, slt, addi, lw, sw and beq.
Instruction formats (field(width in bits), bit positions 31..0):
R: opcode(6) 31:26, rs(5) 25:21, rt(5) 20:16, rd(5) 15:11, shamt(5) 10:6, funct(6) 5:0
I: opcode(6) 31:26, rs(5) 25:21, rt(5) 20:16, immediate(16) 15:0
J: opcode(6) 31:26, address(26) 25:0
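As a hedged illustration of the three formats, the following Python sketch slices a 32-bit instruction word into its fields. It assumes the standard MIPS32 bit positions (opcode 31:26, rs 25:21, rt 20:16, rd 15:11, shamt 10:6, funct 5:0, immediate 15:0, address 25:0). The names `Fields`, `bits`, `sign_extend16` and `decode` are made up for this example and do not come from the lecture.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    opcode: int
    rs: int
    rt: int
    rd: int
    shamt: int
    funct: int
    imm: int      # 16-bit immediate, sign-extended (meaningful for I-type)
    address: int  # 26-bit target field (meaningful for J-type)


def bits(word: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend16(value: int) -> int:
    """Interpret a 16-bit field as a two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value


def decode(word: int) -> Fields:
    """Extract every field; which ones are meaningful depends on the opcode."""
    return Fields(
        opcode=bits(word, 31, 26),
        rs=bits(word, 25, 21),
        rt=bits(word, 20, 16),
        rd=bits(word, 15, 11),
        shamt=bits(word, 10, 6),
        funct=bits(word, 5, 0),
        imm=sign_extend16(bits(word, 15, 0)),
        address=bits(word, 25, 0),
    )


if __name__ == "__main__":
    print(decode(0x02328020))  # add $s0, $s1, $s2 (R-type)
    print(decode(0x8E680020))  # lw $t0, 32($s3)   (I-type)
```

The decoder deliberately extracts all fields at once, which mirrors how the hardware wires every bit range out of the instruction register and lets the control unit decide which ones are used.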
5 Single cycle CPU together with memories (from lecture 2)
[Figure: single-cycle datapath with PC, instruction memory, register file, sign extension, ALU and data memory; the control unit takes Opcode and Funct and drives MemToReg, MemWrite, Branch, ALUControl, ALUSrc, RegDst and RegWrite.]
6 Single cycle CPU performance (from lecture 2)
$IPS = IC / T = IPC_{avg} \cdot f_{CLK}$
What is the maximal possible frequency of this CPU? It is given by the latency of the critical path, which is lw in our case:
$T_c = t_{PC} + t_{Mem} + t_{RFread} + t_{ALU} + t_{Mem} + t_{Mux} + t_{RFsetup}$
[Figure: single-cycle datapath from lecture 2.]
7 Single cycle CPU throughput (from lecture 2)
$IPS = IC / T = IPC_{avg} \cdot f_{CLK}$
$T_c = t_{PC} + t_{Mem} + t_{RFread} + t_{ALU} + t_{Mem} + t_{Mux} + t_{RFsetup}$
Consider the following parameters:
t_PC = 30 ns, t_Mem = 300 ns, t_RFread = 150 ns, t_ALU = 200 ns, t_Mux = 20 ns, t_RFsetup = 20 ns
Then Tc = 1020 ns --> f_CLK max = 980 kHz, IPS = 980e3 = 980 000 instructions per second
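The critical-path arithmetic can be checked with a few lines of Python. This is only a sketch: the delay values (30, 300, 150, 200, 20 and 20 ns) are assumed here because they are consistent with the 3.4x speedup quoted later in the lecture, and the dictionary keys and function names are illustrative.

```python
# Assumed component delays in nanoseconds (illustrative reconstruction).
DELAYS_NS = {
    "t_PC": 30,
    "t_Mem": 300,
    "t_RFread": 150,
    "t_ALU": 200,
    "t_Mux": 20,
    "t_RFsetup": 20,
}

# Components on the lw critical path; memory is visited twice
# (instruction fetch and data read).
LW_PATH = ["t_PC", "t_Mem", "t_RFread", "t_ALU", "t_Mem", "t_Mux", "t_RFsetup"]


def cycle_time_ns(delays, path):
    """Single-cycle period = sum of the delays along the critical path."""
    return sum(delays[name] for name in path)


def max_frequency_khz(tc_ns):
    """Convert a clock period in ns to a frequency in kHz."""
    return 1e6 / tc_ns


if __name__ == "__main__":
    tc = cycle_time_ns(DELAYS_NS, LW_PATH)
    print(f"Tc = {tc} ns, f_max = {max_frequency_khz(tc):.0f} kHz")
```

With IPC = 1 for the single-cycle design, the instruction rate equals the clock frequency.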
8 Pipelined instruction execution
Suppose that instruction execution can be divided into 5 stages: IF, ID, EX, MEM, WB
- IF: Instruction Fetch
- ID: Instruction Decode (and operand fetch)
- EX: Execute
- MEM: Memory access
- WB: Write Back
and $\tau = \max\{\tau_i\}_{i=1}^{k}$, where $\tau_i$ is the time required for signal propagation (propagation delay) through the i-th stage.
What each stage does:
- IF: set up the PC for the memory and fetch the instruction it points to. Update PC = PC + 4.
- ID: decode the opcode and read the registers specified by the instruction, check for equality (for a possible beq instruction), sign-extend the offset, and compute the branch target address for the branch case (that is, extend the offset and add the PC).
- EX: execute the function / pass register values through the ALU.
- MEM: read/write main memory for load/store instructions.
- WB: write the result into the RF for register-register instructions or loads (the result source is the ALU or the memory).
9 Instruction-level parallelism: pipelining
IF   I1  I2  I3  I4  I5  I6  I7  I8  I9  I10
ID       I1  I2  I3  I4  I5  I6  I7  I8  I9
EX           I1  I2  I3  I4  I5  I6  I7  I8
MEM              I1  I2  I3  I4  I5  I6  I7
ST                   I1  I2  I3  I4  I5  I6
time -->
The time to execute n instructions in a k-stage pipeline: $T_k = k\tau + (n-1)\tau$
Speedup: $S_k = \frac{T_1}{T_k} = \frac{n k \tau}{k\tau + (n-1)\tau}$, $\lim_{n \to \infty} S_k = k$
Prerequisite: the pipeline is optimally balanced and the circuit can be divided arbitrarily.
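A small Python sketch of these formulas shows how quickly the speedup approaches k as n grows. The function names are illustrative, and the model assumes an ideally balanced pipeline with no stalls, as the slide does.

```python
def pipelined_time(n, k, tau):
    """T_k = k*tau + (n - 1)*tau: fill the pipeline once, then one result per cycle."""
    return k * tau + (n - 1) * tau


def sequential_time(n, k, tau):
    """T_1 = n*k*tau: every instruction passes all k stages before the next starts."""
    return n * k * tau


def speedup(n, k):
    """S_k = T_1 / T_k; tau cancels out."""
    return (n * k) / (k + n - 1)


if __name__ == "__main__":
    for n in (1, 10, 100, 10_000):
        print(n, round(speedup(n, k=5), 3))
```

For a single instruction the speedup is 1 (no overlap is possible); for long instruction streams it tends towards the number of stages.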
10 Instruction-level parallelism: pipelining
Pipelining does not reduce the execution time of an individual instruction; the effect is just the opposite.
Hazards:
- structural (resolved by duplication),
- data (result of data dependencies: RAW, WAR, WAW),
- control (caused by instructions which change the PC).
Hazard prevention can result in a pipeline stall or a pipeline flush.
Remark: A deeper pipeline (more stages) means shorter sequences of gates in each stage, which allows the operating frequency of the processor to be increased. But more stages also mean higher overhead: instructions have to be arranged better for the pipeline, and a stall or pipeline flush causes a more significant lag.
11 Instruction-level parallelism: semantics violations
Data hazard (flow of instructions and expected effect):
ADD R1,R2,R3      IF ID EX MEM WB
SUB R4,R1,R3         IF ID EX MEM WB
ADD writes the new value to R1; SUB reads an incorrect value from R1.
Control hazard:
BEQZ R3, M        IF ID EX MEM WB
ADD R6,R1,R2         IF ID EX MEM WB
instruction 3           IF ID EX MEM WB
instruction 4              IF ID EX MEM WB
M: ADD R4,R6,R7               IF ID EX MEM WB
The condition and the new PC are evaluated first; only then is the PC set to the branch target.
Should the instructions following the branch be fetched (and then executed)?
12 Non-pipelined execution (from lecture 2)
[Figure: single-cycle datapath: PC, instruction memory, register file, sign extension, ALU, data memory.]
13 Pipelined execution
[Figure: the same datapath divided by inter-stage registers into Fetch, Decode, Execute, Memory and WriteBack stages; signal names carry stage suffixes (PCPlus4F/D/E, WriteDataE/M, WriteRegE/M/W, AluOutM/W).]
14 Pipelined execution
[Figure: the pipelined datapath with the control unit added (inputs Opcode, Funct; outputs MemToReg, MemWrite, Branch, ALUControl, ALUSrc, RegDst, RegWrite).]
15 The same design but drawn scaled down
[Figure: pipelined datapath and control unit; the control signals travel through the inter-stage registers together with the data (RegWriteD/E/M/W, MemToRegD/E/M/W, MemWriteD/E/M, ALUControlD/E, ALUSrcD/E, RegDstD/E, BranchD/E).]
16 Cause of the data hazards
- The register file is accessed from two pipeline stages (Decode, WriteBack). The actual write occurs in the first half of the clock cycle and the read in the second half, so there is no hazard for the $s0 input operand of sub.
- RAW (Read After Write) hazard: and (or) requires $s0 in cycle 3 (4).
How can such a hazard be prevented without degrading pipeline throughput?
17 Forwarding to avoid data hazards
- If a result is available (computed) before the subsequent instruction(s) require the value, the data hazard can be avoided by forwarding.
- A hazard is indicated when one of the source registers in the EX stage is the same as the destination register in the MEM or WB stage.
- The register numbers are fed to the Hazard Unit.
- The RegWrite signal from the MEM and WB stages has to be monitored as well, to check that the register number on the WriteReg lines really takes effect (lw / sw etc.).
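The forwarding rule can be sketched as a small Python function. This is a minimal model under stated assumptions, not the lecture's circuit: the select encoding (0b00 = register file, 0b01 = value from WB, 0b10 = value from MEM) and the special handling of register 0 follow a common textbook convention, and the function name is illustrative.

```python
def forward_select(src_e, write_reg_m, reg_write_m, write_reg_w, reg_write_w):
    """Choose the operand source for one ALU input in the EX stage.

    src_e        source register number in EX (RsE or RtE)
    write_reg_m  destination register number in MEM (WriteRegM)
    reg_write_m  True if the instruction in MEM writes a register
    write_reg_w  destination register number in WB (WriteRegW)
    reg_write_w  True if the instruction in WB writes a register
    """
    # Assumption: register 0 is hard-wired to zero, so it is never forwarded.
    if src_e != 0 and reg_write_m and src_e == write_reg_m:
        return 0b10  # newest value: ALU result waiting in the MEM stage
    if src_e != 0 and reg_write_w and src_e == write_reg_w:
        return 0b01  # older value: result being written back
    return 0b00      # no hazard: use the value read from the register file


if __name__ == "__main__":
    # add $16,...; sub ...,$16,... : the producer is in MEM when sub is in EX
    print(forward_select(16, 16, True, 0, False))
```

The MEM stage is checked first so that, when both stages hold a result for the same register, the more recent one wins. The same function would be evaluated once for each ALU operand (ForwardAE and ForwardBE).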
18 CPU after previous design steps
[Figure: pipelined datapath with control unit, before hazard handling is added.]
19 Data hazards solved by forwarding
[Figure: the same design extended with register numbers RsD/RtD/RdD and RsE/RtE/RdE, multiplexers in front of the ALU inputs, and a hazard unit that takes RegWriteM and RegWriteW and drives ForwardAE and ForwardBE.]
20 Data hazard avoided by pipeline stall
- If subsequent instructions require a result before it is available in the CPU, the pipeline has to be stalled (a stall state is inserted).
- A stall is a means of resolving the hazard, but it reduces system throughput.
- The pipeline stages preceding the one affected by the hazard are stalled until all results required by the subsequent instructions are available; the results are then forwarded to the sink that requires their value.
21 Data hazard avoided by pipeline stall
- The stall is realized by holding the content of the inter-stage registers (gating their clocks or blocking their latch enable signals).
- Results from the colliding stages have to be discarded: certain control signals in the CPU (RF or memory write enable, branch gating) are reset (held low).
- Both are achieved by introducing control signals that hold and/or reset the inter-stage registers.
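The hold and reset signals can be sketched for the classic load-use case, where a lw in EX produces a value that the instruction in ID needs one cycle too early for forwarding. The Python model below is an assumption-laden sketch in the style of common textbook designs: `StallF` and `StallD` correspond to the stall signals named in the diagrams, while `FlushE` is a hypothetical name for the reset (CLR) of the register in front of the Execute stage.

```python
def load_use_stall(rs_d, rt_d, rt_e, mem_to_reg_e):
    """True when the instruction in ID reads the register a load in EX will write.

    rs_d, rt_d    source register numbers of the instruction in Decode
    rt_e          destination register of a load in Execute (lw writes rt)
    mem_to_reg_e  True if the instruction in Execute is a load
    """
    return bool(mem_to_reg_e and (rs_d == rt_e or rt_d == rt_e))


def hazard_controls(rs_d, rt_d, rt_e, mem_to_reg_e):
    """Derive the hold/reset signals for one clock cycle."""
    stall = load_use_stall(rs_d, rt_d, rt_e, mem_to_reg_e)
    return {
        "StallF": stall,  # hold the PC (do not fetch a new instruction)
        "StallD": stall,  # hold the Fetch/Decode inter-stage register
        "FlushE": stall,  # hypothetical name: clear the register feeding Execute
    }


if __name__ == "__main__":
    # lw $8, 0($4) in EX, add $9, $8, $10 in ID -> one bubble is needed
    print(hazard_controls(rs_d=8, rt_d=10, rt_e=8, mem_to_reg_e=True))
```

This check is slightly conservative: it also stalls when the instruction in Decode carries the matching number in its rt field without actually reading it, which costs a cycle but never produces a wrong result.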
22 Processor design built so far
[Figure: pipelined datapath with control unit and a hazard unit driving ForwardAE and ForwardBE.]
23 Processor with data hazards avoided by stall
[Figure: the same design with enable (EN) and clear (CLR) inputs added to the PC and inter-stage registers; the hazard unit additionally drives StallF and StallD.]
24 Control hazards (branch and jump)
The branch result is not known before the 4th cycle. Why?
25 Control hazards: better to know the result earlier
- If the result of the comparison can be evaluated in the 2nd cycle, the misprediction penalty can be reduced.
- But processing the comparison at an earlier stage can induce new RAW hazards!
26 Resolve control hazards by early evaluation and flush
[Figure: the branch comparison (equality comparator, EqualD) and PCBranchD are moved into the Decode stage, which generates PCSrcD; a CLR input allows the fetched instruction to be flushed.]
27 Resolve RAW hazards by forwarding or stalling
[Figure: the same design with BranchD fed to the hazard unit and forwarding into the Decode-stage comparator (ForwardBD); annotations across the Execute, Memory and WriteBack stages: Stall, Forward / Stall, No Action Required.]
28 We are finished: the pipelined processor is designed
[Figure: complete pipelined processor with control unit and hazard unit (StallF, StallD, BranchD, ForwardBD, ForwardAE, ForwardBE, RegWriteM, RegWriteW).]
29 Pipelined CPU performance
$IPS = IC / T = IPC_{avg} \cdot f_{CLK}$
What is the maximal acceptable frequency for the CPU? Which stage is the slowest one? The cycle time is determined by the slowest stage.
For our case: Tc = 300 ns --> f_CLK max = 3 333 kHz
If the pipeline fill overhead is neglected (i.e. no pipeline stalls and flushes are considered), then the ideal IPC = 1.
IPS = 3 333e3 = 3 333 000 instructions per second
Introducing the 5-stage pipeline increases performance (throughput) 3 333 000 / 980 000 = 3.4 times (considering IPC = 1).
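The comparison can be reproduced numerically. The sketch below assumes a single-cycle period of 1020 ns and a pipelined period of 300 ns (values chosen to be consistent with the 3.4x figure) and an ideal IPC of 1; the names are illustrative. It also makes the earlier remark concrete that pipelining lengthens the latency of an individual instruction.

```python
SINGLE_CYCLE_TC_NS = 1020.0  # assumed single-cycle clock period
PIPELINED_TC_NS = 300.0      # assumed period of the slowest pipeline stage
STAGES = 5


def throughput_ips(cycle_time_ns, ipc=1.0):
    """Instructions per second for a given clock period and average IPC."""
    return ipc * 1e9 / cycle_time_ns


gain = throughput_ips(PIPELINED_TC_NS) / throughput_ips(SINGLE_CYCLE_TC_NS)

# One instruction needs one cycle in the single-cycle CPU,
# but STAGES cycles to travel through the pipeline.
latency_single_ns = SINGLE_CYCLE_TC_NS
latency_pipelined_ns = STAGES * PIPELINED_TC_NS

if __name__ == "__main__":
    print(f"throughput gain: {gain:.1f}x")
    print(f"latency: {latency_single_ns:.0f} ns -> {latency_pipelined_ns:.0f} ns")
```

Any stall or flush lowers the average IPC below 1, so the real gain is smaller than this ideal figure.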
30 What is the result of the design? Return back to the non-pipelined CPU version
[Figure: non-pipelined datapath with control unit.]
31 What is the result of the design? Return back to the non-pipelined CPU version
[Figure: the same design with the instruction and data memories drawn separately and two regions marked: control unit (control path) and Data/ALU (data path).]
32 What is the result of the design?
[Figure: the processor consists of a control unit and a data-path (ALU, registers). It is connected to the instruction memory (address from the PC, instruction read) and to the data memory (address for data read/write, data to write, write enable, read data).]
33 CPU design result: pipelined version
[Figure: complete pipelined processor with control unit and hazard unit.]
34 Pipelined CPU timing
The timing (AC) characteristics of a synchronous sequential circuit:
- t_setup: input setup time
- t_hold: input hold time
- t_pcq: clock-to-Q propagation delay of the register
- t_pd: propagation delay of the combinational logic
Constraint for the setup time before the clock edge: $T_c \ge t_{pcq} + t_{pd} + t_{setup}$
35 Pipelined processor timing
Constraint for the setup time when the clock distribution jitter is considered: $T_c \ge t_{pcq} + t_{pd} + t_{setup} + t_{skew}$
Clock distribution jitter becomes the limiting factor if it reaches or exceeds the value of $t_{pd}$ (too deep a pipeline / too many stages).
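A short Python sketch of the setup-time constraint illustrates why splitting the logic into more stages gives diminishing returns. All numeric values below are hypothetical and chosen only for illustration; the function names are not from the lecture.

```python
def min_cycle_time(t_pcq, t_pd, t_setup, t_skew=0):
    """Smallest clock period that satisfies Tc >= t_pcq + t_pd + t_setup + t_skew."""
    return t_pcq + t_pd + t_setup + t_skew


def useful_fraction(t_pcq, t_pd, t_setup, t_skew=0):
    """Share of the clock period spent in combinational logic (useful work)."""
    return t_pd / min_cycle_time(t_pcq, t_pd, t_setup, t_skew)


if __name__ == "__main__":
    # Hypothetical register and clock parameters in ns.
    t_pcq, t_setup, t_skew = 30, 20, 50
    for t_pd in (250, 125, 50):  # halving the logic depth per stage
        tc = min_cycle_time(t_pcq, t_pd, t_setup, t_skew)
        frac = useful_fraction(t_pcq, t_pd, t_setup, t_skew)
        print(f"t_pd={t_pd:>3} ns  Tc={tc:>3} ns  useful={frac:.0%}")
```

Once t_pd shrinks to the size of t_skew, the fixed register and clock overhead takes up most of every cycle, which is the limiting case the slide describes.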
36 Pipeline stages balancing
- Linear pipelining applies to tree-based adders, multipliers, (unrolled) iterative dividers, etc.
- Balancing: the goal is to divide the processing into N stages in such a way that the stage propagation delays are roughly the same.
- The number of stages reflects the preference for performance (throughput) versus latency.
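Balancing can be illustrated with a brute-force Python sketch. It assumes, hypothetically, that the logic is a chain of blocks with known delays and that pipeline registers can only be placed between blocks; it then tries every placement and keeps the one with the smallest worst-case stage delay. The function name and the example delays are invented for this illustration, and real designs would use a more efficient search.

```python
from itertools import combinations


def balance(delays, n_stages):
    """Split a chain of block delays into n_stages contiguous stages.

    Returns (worst_stage_delay, stages) for the best split found by
    exhaustive search over all register positions.
    """
    if not 1 <= n_stages <= len(delays):
        raise ValueError("n_stages must be between 1 and the number of blocks")
    best = None
    for cuts in combinations(range(1, len(delays)), n_stages - 1):
        bounds = (0, *cuts, len(delays))
        stages = [delays[a:b] for a, b in zip(bounds, bounds[1:])]
        worst = max(sum(stage) for stage in stages)
        if best is None or worst < best[0]:
            best = (worst, stages)
    return best


if __name__ == "__main__":
    worst, stages = balance([10, 20, 30, 40], 2)
    print(worst, stages)
```

The worst stage delay sets the clock period, so a lower value means higher throughput, while the total latency is the number of stages times that period.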
37 Superpipeline and beyond
Not well balanced 5-stage pipeline:
Units:  IM RF DM RF
Stages: IF ID EX MEM WB
A deeper pipeline is the result of decomposing stages into more stages:
Units:  IM RF DM RF
Stages: IF IS RF EX DF DS TC WB
It allows the CPU to work at higher frequencies but introduces many problems as well: complex forwarding, more pipeline stalls, and hazards that need to be solved by complex logic.
38 Typical pipeline depths in today's CPUs
- P5 (Pentium): 5
- P6 (Pentium 3): 10
- P6 (Pentium Pro): 14
- NetBurst (Willamette, 180 nm) - Celeron, Pentium 4: 20
- NetBurst (Northwood, 130 nm) - Celeron, Pentium 4, Pentium 4 HT: 20
- NetBurst (Prescott, 90 nm) - Celeron D, Pentium 4, Pentium 4 HT, Pentium 4 ExEd: 31
- NetBurst (Cedar Mill, 65 nm): 31
- NetBurst (Presler, 65 nm) - Pentium D: 31
- Core: 14
- Bonnell: 16
- K7 Architecture - Athlon: 10-15
- K8 - Athlon 64, Sempron, Opteron, Turion 64: 12-17
- ARM 8-9: 5
- ARM 11: 8
- Cortex A7: 8-10
- Cortex A8: 13
- Cortex A15: 15-25
The Optimum Pipeline Depth for a Microprocessor:
39 Branch stall discussion and delay slots
The instruction memory read and fetch is expensive, and the result of condition evaluation in branch instructions (or, even worse, the target of indirect branch instructions) has to be evaluated before the next fetch and execute. The stall state is a waste of cycles. Options to use those cycle(s) are:
- Start the fetch and execution of the instruction(s) following the branch and flush/discard the results if it turns out that they should not be executed.
- Extend the above by adding a condition result/branch predictor (taken/not-taken) and a branch target cache (BTB).
- Execute one or more instructions after the branch unconditionally in a (so-called) delay slot.
Unconditional execution of delay slots is common in many DSP (digital signal processor) and some RISC architectures (MIPS, SPARC).
More informationImproving Performance: Pipelining
Improving Performance: Pipelining Memory General registers Memory ID EXE MEM WB Instruction Fetch (includes PC increment) ID Instruction Decode + fetching values from general purpose registers EXE EXEcute
More informationPipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome
Thoai Nam Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy & David a Patterson,
More informationDetermined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version
MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationChapter 4. The Processor
Chapter 4 The Processor 1 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A
More information4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16
4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt
More informationLECTURE 5. Single-Cycle Datapath and Control
LECTURE 5 Single-Cycle Datapath and Control PROCESSORS In lecture 1, we reminded ourselves that the datapath and control are the two components that come together to be collectively known as the processor.