CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19


CO2-3224 Computer Architecture and Programming Languages CAPL Lecture 18 & 19 Dr. Kinga Lipskoch Fall 27

Single Cycle Disadvantages & Advantages
- Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
  - Especially problematic for more complex instructions such as floating-point multiply
- [Timing diagram: lw fills Cycle 1; sw finishes early in Cycle 2 and the remainder of that cycle is wasted]
- May be wasteful of area, since some functional units (e.g., adders) must be duplicated because they cannot be shared during a clock cycle
- But it is simple and easy to understand

Multicycle Datapath Approach (1)
- Let an instruction take more than 1 clock cycle to complete
  - Break up instructions into steps where each step takes a cycle, while trying to
    - balance the amount of work to be done in each step
    - restrict each cycle to use only one major functional unit
  - Not every instruction takes the same number of clock cycles
- In addition to faster clock rates, multicycle allows functional units to be used more than once per instruction, as long as they are used on different clock cycles. As a result:
  - only one memory is needed, but only one memory access per cycle is possible
  - only one ALU/adder is needed, but only one ALU operation per cycle is possible

Multicycle Datapath Approach (2)
- At the end of a cycle, store values needed in a later cycle by the current instruction in an internal register (not visible to the programmer)
- All of these registers (except IR) hold data only between a pair of adjacent clock cycles (no write control signal needed)
- [Figure: multicycle datapath with PC, memory, IR, MDR, register file, A, B, ALU, and ALUout]
  - IR: Instruction Register
  - MDR: Memory Data Register
  - A, B: register file read data registers
  - ALUout: ALU output register
- Data used by subsequent instructions are stored in programmer-visible registers (i.e., register file, PC, or memory)

Additional Registers Needed
- Data used by the same instruction in a later cycle must be stored in additional registers
- Their position is determined by two factors:
  - which units fit into the same clock cycle
  - what data is needed in later cycles
- Instruction register (IR) and memory data register (MDR) added to save the output of the memory for an instruction read and a data read
- A and B registers added to hold the register operand values read from the register file
- ALUOut register holds the output of the ALU
- All registers (except IR) hold data just between a pair of adjacent cycles and thus do not need a write control signal

Multicycle Datapath
- Functional units are shared for different purposes
  - Multiplexer needed for the memory address: PC or ALUOut
  - Three ALUs replaced by one ALU
    - additional multiplexer for the first ALU input (A or PC) added
    - 4-way multiplexer for the second ALU input
- More registers and multiplexers, but
  - Fewer memory units (1 instead of 2)
  - Fewer adders (2)
  - Reduced hardware cost

More Control Lines Needed
- Multiple clock cycles per instruction
- State units (PC, memory, registers) need write control lines
- Memory needs a read signal
- Additional multiplexers need a control line each; the 4-way multiplexer needs 2
- PC has three possible sources
  - PCWrite: unconditional write of the PC
  - PCWriteCond: causes a write of the PC if the branch condition is true

Actions of the 1-bit control signals

| Signal name | Effect when deasserted | Effect when asserted |
|---|---|---|
| RegDst | The register destination number for the Write register comes from the rt field (bits 20:16). | The register destination number for the Write register comes from the rd field (bits 15:11). |
| RegWrite | None. | The register on the Write register input is written with the value on the Write data input. |
| ALUSrcA | The first ALU operand comes from the PC. | The first ALU operand comes from the A register. |
| MemRead | None. | Content of memory at the location specified by the Address input is put on the Memory data output. |
| MemWrite | None. | Memory contents at the location specified by the Address input are replaced by the value on the Write data input. |
| MemtoReg | The value fed to the register Write data input comes from ALUOut. | The value fed to the register Write data input comes from the MDR. |
| IorD | The PC is used to supply the address to the memory unit. | ALUOut is used to supply the address to the memory unit. |
| IRWrite | None. | The output of the memory is written into the IR. |
| PCWrite | None. | The PC is written; the source is controlled by PCSource. |
| PCWriteCond | None. | The PC is written if the Zero output from the ALU is also active. |

Actions of the 2-bit control signals

| Signal name | Value (binary) | Effect |
|---|---|---|
| ALUOp | 00 | The ALU performs an add operation. |
| ALUOp | 01 | The ALU performs a subtract operation. |
| ALUOp | 10 | The funct field of the instruction determines the ALU operation. |
| ALUSrcB | 00 | The second input to the ALU comes from the B register. |
| ALUSrcB | 01 | The second input to the ALU is the constant 4. |
| ALUSrcB | 10 | The second input to the ALU is the sign-extended lower 16 bits of the IR. |
| ALUSrcB | 11 | The second input to the ALU is the sign-extended lower 16 bits of the IR, shifted left 2 bits. |
| PCSource | 00 | The output of the ALU (PC+4) is sent to the PC for writing. |
| PCSource | 01 | The contents of ALUOut (branch target address) are sent to the PC for writing. |
| PCSource | 10 | The jump target address (IR[25:0]) shifted left 2 bits and concatenated with PC+4[31:28] is sent to the PC for writing. |
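The two ALU input multiplexers described by ALUSrcA and ALUSrcB can be modelled in a few lines. This is only an illustrative sketch: the function names and the use of plain Python integers for register values are assumptions, not part of the slides.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def select_alu_src_a(alu_src_a, pc, a_reg):
    """2-way mux on the first ALU input: 0 selects the PC, 1 selects register A."""
    return a_reg if alu_src_a else pc


def select_alu_src_b(alu_src_b, b_reg, ir):
    """4-way mux on the second ALU input, controlled by the 2-bit ALUSrcB."""
    imm = sign_extend16(ir)
    choices = {
        0b00: b_reg,      # B register
        0b01: 4,          # constant 4 (used for PC + 4)
        0b10: imm,        # sign-extended lower 16 bits of the IR
        0b11: imm << 2,   # same value shifted left 2 bits (branch offset)
    }
    return choices[alu_src_b]
```

For example, `select_alu_src_b(0b01, b_reg, ir)` always yields the constant 4, regardless of the register and instruction contents.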

The Multicycle Datapath with Control Signals
[Figure: the multicycle datapath with its control unit, which decodes Instr[31-26] and drives PCWriteCond, PCWrite, PCSource, IorD, MemRead, MemWrite, MemtoReg, IRWrite, RegDst, RegWrite, ALUSrcA, ALUSrcB, and ALUOp]

Instructions from ISA Perspective
- Move from one-cycle to multi-cycle
  - Identify steps that take one cycle
  - Equal distribution of execution time
  - At most one operation for each of the modules
    - ALU
    - Register file
    - Memory
- New registers are needed if
  - the signal is computed in one cycle and used in another cycle
  - the inputs of the block generating the signal may change in the second cycle

The Five Execution Steps
1. Instruction fetch
   - Move the instruction from the instruction memory to the instruction register IR
2. Instruction decode and register fetch
   - Provide the register contents for the ALU
3. Execution, memory address computation, or branch completion
4. Memory access or R-type instruction completion
5. Write-back step

Step 1: Instruction Fetch (1)
- Load instruction from memory: IR <= Memory[PC]
  - Set address mux (IorD) = 0 (select instruction address, i.e., the PC)
  - Set MemRead = 1
  - Set IRWrite = 1
- Increment PC: PC <= PC + 4
  - Set ALUSrcA = 0 (get operand from PC)
  - Set ALUSrcB = 01 (get operand 4)
  - Set ALUOp = 00 (add)
  - Allow storing the new PC in the PC register (PCWrite = 1, PCSource = 00)
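The fetch step can be sketched as a small state update. The dictionary-based `state` and `memory` below are hypothetical stand-ins for the datapath registers and the memory unit; the example word `0x8C130008` encodes `lw $19, 8($0)`.

```python
def instruction_fetch(state, memory):
    """Step 1: IR <= Memory[PC]; PC <= PC + 4."""
    # IorD = 0 selects the PC as memory address; MemRead = 1, IRWrite = 1
    state["IR"] = memory[state["PC"]]
    # ALUSrcA = 0 (PC), ALUSrcB = 01 (constant 4), ALUOp = 00 (add),
    # PCSource = 00 and PCWrite = 1 store the ALU result in the PC
    state["PC"] = (state["PC"] + 4) & 0xFFFFFFFF
    return state


# Hypothetical memory image: address -> 32-bit instruction word
memory = {0x00400000: 0x8C130008}
state = instruction_fetch({"PC": 0x00400000, "IR": 0}, memory)
```

After the call, `state["IR"]` holds the fetched word and `state["PC"]` already points to the next instruction, which is why later steps see PC + 4 rather than the address of the current instruction.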

Step 1: Instruction Fetch (2)
[Figure: the multicycle datapath with control signals during instruction fetch]

Step 2: Instruction Decode & Register Fetch (1)
- Switch registers to the output of the register block
  - A <= Register[IR[25:21]] (rs)
  - B <= Register[IR[20:16]] (rt)
  - No signal setting required
- (Always) calculate the branch target address
  - ALUOut <= PC + (sign-extend(IR[15:0]) << 2)
  - The value can simply be ignored if the instruction is not a branch
  - Stored in the ALUOut register
  - Set ALUSrcB = 11 (ALUSrcA remains 0 to select the PC)
  - Set ALUOp = 00 (add)
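A minimal sketch of this step, assuming the register file is a Python list of 32 integers and that `pc` already holds PC + 4 from the fetch step. The helper names are illustrative only.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_fields(ir):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op": (ir >> 26) & 0x3F,   # IR[31:26]
        "rs": (ir >> 21) & 0x1F,   # IR[25:21]
        "rt": (ir >> 16) & 0x1F,   # IR[20:16]
        "rd": (ir >> 11) & 0x1F,   # IR[15:11]
        "funct": ir & 0x3F,        # IR[5:0]
        "imm": ir & 0xFFFF,        # IR[15:0]
    }


def decode_step(pc, ir, regs):
    """Step 2: read rs and rt into A and B, precompute the branch target."""
    f = decode_fields(ir)
    a_reg = regs[f["rs"]]
    b_reg = regs[f["rt"]]
    # ALUSrcA = 0 (PC), ALUSrcB = 11 (offset << 2), ALUOp = 00 (add)
    alu_out = (pc + (sign_extend16(f["imm"]) << 2)) & 0xFFFFFFFF
    return a_reg, b_reg, alu_out


regs = [0] * 32
regs[1], regs[2] = 5, 7
# 0x1022FFFE encodes beq $1, $2, -2 (offset counted in words)
a_reg, b_reg, alu_out = decode_step(0x00400008, 0x1022FFFE, regs)
```

The branch target is computed for every instruction because the opcode is not yet known in this cycle; a non-branch instruction simply overwrites ALUOut later.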

Step 2: Instruction Decode & Register Fetch (2)
[Figure: the multicycle datapath with control signals during instruction decode and register fetch]

Step 3: Execution, Memory Address Computation or Branch Completion
- First cycle where the step depends on the instruction
- Selection is performed by interpreting the op and function fields of the instruction
- Memory reference: calculate the address
  - ALUOut <= A + sign-extend(IR[15:0])
  - Set ALUSrcA = 1 (get operand from A)
  - Set ALUSrcB = 10 (get operand from the sign extension unit)
  - Set ALUOp = 00 (add)
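As a worked example of the address computation, the sketch below adds the sign-extended 16-bit offset to the base register. The function name and the 32-bit wrap-around mask are assumptions made for illustration.

```python
def memory_address(a_reg, ir):
    """Step 3 for lw/sw: ALUOut <= A + sign-extend(IR[15:0])."""
    offset = ir & 0xFFFF
    if offset & 0x8000:          # negative 16-bit offset
        offset -= 0x10000
    return (a_reg + offset) & 0xFFFFFFFF


# 0x8FA8FFFC encodes lw $8, -4($29); assume $29 currently holds 0x7FFFFFF0
address = memory_address(0x7FFFFFF0, 0x8FA8FFFC)
```

Note that, unlike the branch offset in step 2, the load/store offset is a byte offset and is not shifted left.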

Step 3: Memory Reference
[Figure: the multicycle datapath with control signals during memory address computation]

Step 3: Execution, Memory Address Computation or Branch Completion
- Arithmetic-logical instruction (R-type): ALUOut <= A op B
  - Set ALUSrcA = 1 (get operand from A)
  - Set ALUSrcB = 00 (get operand from B)
  - Set ALUOp = 10 (operation code from the funct field of the IR)

Step 3: Arithmetic-Logical Instruction
[Figure: the multicycle datapath with control signals during R-type execution]

Step 3: Execution, Memory Address Computation or Branch Completion
- Branch: if (A == B) PC <= ALUOut
  - Set ALUSrcA = 1 (get operand from A)
  - Set ALUSrcB = 00 (get operand from B)
  - Set ALUOp = 01 (subtraction)
  - Write ALUOut to the PC register if the Zero output is active (PCWriteCond = 1, PCSource = 01)
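The branch decision can be sketched as follows. The function is a hypothetical model: `pc` is the already incremented PC, and `alu_out` is the branch target computed during the decode step.

```python
def branch_completion(pc, a_reg, b_reg, alu_out):
    """Step 3 for beq: if (A == B) PC <= ALUOut, otherwise keep PC."""
    # ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01: the ALU computes A - B
    zero = ((a_reg - b_reg) & 0xFFFFFFFF) == 0
    # PCWriteCond = 1 and PCSource = 01: the PC is written only when Zero is set
    return alu_out if zero else pc
```

With equal operands, `branch_completion(0x00400008, 5, 5, 0x00400000)` returns the target `0x00400000`; with different operands the PC stays at `0x00400008`.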

Step 3: Branch
[Figure: the multicycle datapath with control signals during branch completion]

Step 3: Execution, Memory Address Computation or Branch Completion
- Jump: PC <= {PC[31:28], (IR[25:0] << 2)}
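The concatenation in the jump formula corresponds to simple masking and shifting. A minimal sketch, assuming `pc` is the already incremented PC (PC + 4) and using a hypothetical function name:

```python
def jump_target(pc, ir):
    """PC <= {PC[31:28], IR[25:0] << 2}."""
    upper = pc & 0xF0000000               # PC[31:28], kept in place
    lower = (ir & 0x03FFFFFF) << 2        # IR[25:0] shifted left 2 bits (28 bits)
    return upper | lower


# 0x08100008 encodes j 0x00400020 (opcode 2, 26-bit word address 0x100008)
target = jump_target(0x00400004, 0x08100008)
```

Because only 26 bits come from the instruction, a jump can only reach addresses that share the upper four bits with PC + 4.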

Step 3: Jump
[Figure: the multicycle datapath with control signals during jump completion]

Step 4: Memory Access or R-type Instruction Completion
- Memory reference: ALU controls must remain stable
  - Set IorD = 1 (address from ALUOut)
  - Load from memory: MDR <= Memory[ALUOut]
    - Set MemRead = 1
  - Store to memory: Memory[ALUOut] <= B
    - Set MemWrite = 1

Step 4: Memory Reference (load word)
[Figure: the multicycle datapath with control signals during the memory read of a load word]

Step 4: Memory Reference (store word)
[Figure: the multicycle datapath with control signals during the memory write of a store word]

Step 4: Memory Access or R-type Instruction Completion
- Arithmetic-logical instruction completion: Register[IR[15:11]] <= ALUOut
  - Set RegDst = 1 (select rd as write register)
  - Set RegWrite = 1 (allow write operation)
  - Set MemtoReg = 0 (select ALU data)
  - ALUOp, ALUSrcA, ALUSrcB = constant

Step 4: Arithmetic-Logical Instruction Completion
[Figure: the multicycle datapath with control signals during R-type completion]

Step 5: Write Back
- Write data from memory to the register: Register[IR[20:16]] <= MDR
  - Set RegDst = 0 (select rt as target register)
  - Set RegWrite = 1 (allow write operation)
  - Set MemtoReg = 1 (select memory data)
  - ALUOp, ALUSrcA, ALUSrcB = constant

Step 5: Memory Reference (load word)
[Figure: the multicycle datapath with control signals during the write-back of a load word]

Multicycle Control Unit
- Multicycle datapath control signals are not determined solely by the bits in the instruction
  - e.g., the op code bits tell what operation the ALU should be doing, but not which instruction cycle is to be done next
- Must use a finite state machine (FSM) for control
  - a set of states (current state stored in the State Register)
  - next state function (determined by current state and the input)
  - output function (determined by current state and the input)
- [Figure: combinational control logic with the instruction opcode and the state register as inputs, and the datapath control points and the next state as outputs]
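The next-state function of such an FSM can be sketched as below. The state names and their numbering are assumptions chosen for this illustration (the slides do not list them); the opcodes are the standard MIPS encodings for lw, sw, R-type, beq, and j. The output function that drives the control signals is omitted.

```python
(FETCH, DECODE, MEM_ADDR, MEM_READ, MEM_WB,
 MEM_WRITE, EXECUTE, RTYPE_DONE, BRANCH, JUMP) = range(10)

LW, SW, RTYPE, BEQ, J = 0x23, 0x2B, 0x00, 0x04, 0x02  # MIPS opcodes


def next_state(state, opcode):
    """Next-state function: depends on the current state and the opcode."""
    if state == FETCH:
        return DECODE                      # common part, same for all instructions
    if state == DECODE:                    # first instruction-specific decision
        return {LW: MEM_ADDR, SW: MEM_ADDR, RTYPE: EXECUTE,
                BEQ: BRANCH, J: JUMP}[opcode]
    if state == MEM_ADDR:
        return MEM_READ if opcode == LW else MEM_WRITE
    if state == MEM_READ:
        return MEM_WB
    if state == EXECUTE:
        return RTYPE_DONE
    return FETCH                           # every final state returns to fetch


def cycles_for(opcode):
    """Count clock cycles until the FSM is back in the fetch state."""
    state, count = FETCH, 0
    while True:
        count += 1
        state = next_state(state, opcode)
        if state == FETCH:
            return count
```

Walking the FSM with `cycles_for` reproduces the differing instruction lengths: a load passes through five states, a store or R-type instruction through four, and a branch or jump through three.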

Graphic Representation of FSM
[Figure: state diagram of the control FSM, divided into a common part and an instruction-specific part]

The Five Steps of the Load Instruction
[Timing diagram: lw occupies Cycle 1 to Cycle 5 with IFetch, Dec, Exec, Mem, WB]
1. IFetch: Instruction Fetch and Update PC
2. Dec: Instruction Decode, Register Read, Sign Extend Offset
3. Exec: Execute R-type; Calculate Memory Address; Branch Comparison; Branch and Jump Completion
4. Mem: Memory Read; Memory Write Completion; R-type Completion (RegFile write)
5. WB: Memory Read Completion (RegFile write)
- Instructions take 3 to 5 cycles
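Since instructions take between 3 and 5 cycles, the average CPI depends on the instruction mix. The cycle counts below follow from the step list (loads complete in step 5, stores and R-type instructions in step 4, branches and jumps in step 3); the instruction mix itself is a made-up example, not data from the lecture.

```python
CYCLES = {"lw": 5, "sw": 4, "rtype": 4, "beq": 3, "j": 3}


def average_cpi(mix):
    """Weighted average of cycles per instruction for a given instruction mix."""
    total = sum(mix.values())
    return sum(CYCLES[name] * share for name, share in mix.items()) / total


# Hypothetical instruction mix in percent
mix = {"lw": 25, "sw": 10, "rtype": 50, "beq": 13, "j": 2}
cpi = average_cpi(mix)
```

For this assumed mix the result is (125 + 40 + 200 + 39 + 6) / 100 = 4.1 cycles per instruction.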

Multicycle Advantages & Disadvantages
- Uses the clock cycle efficiently: the clock cycle is timed to accommodate the slowest instruction step
- [Timing diagram: lw occupies Cycles 1 to 5 (IFetch, Dec, Exec, Mem, WB), sw occupies Cycles 6 to 9 (IFetch, Dec, Exec, Mem), and an R-type instruction starts with IFetch in Cycle 10]
- Multicycle implementations allow functional units to be used more than once per instruction, as long as they are used on different clock cycles
- But it requires additional internal state registers, more muxes, and more complicated (FSM) control

Single Cycle vs. Multiple Cycle Timing
- Single Cycle Implementation:
  - [Timing diagram: lw fills Cycle 1; sw finishes early in Cycle 2 and the remainder of that cycle is wasted]
- Multiple Cycle Implementation:
  - [Timing diagram: lw occupies Cycles 1 to 5 (IFetch, Dec, Exec, Mem, WB), sw occupies Cycles 6 to 9 (IFetch, Dec, Exec, Mem), and an R-type instruction starts with IFetch in Cycle 10]
  - The multicycle clock is slower than 1/5th of the single cycle clock due to state register overhead

Summary
- If we understand the instructions, we can build a simple processor
- If instructions take different amounts of time, multi-cycle is better
- Datapath implemented using:
  - Combinational logic for arithmetic
  - State holding elements to remember bits
- Control implemented using:
  - Combinational logic for single-cycle implementation
  - Finite state machine for multi-cycle implementation

How Can We Make It Even Faster?
- Split the multiple instruction cycle into smaller and smaller steps
  - There is a point of diminishing returns where as much time is spent loading the state registers as doing the work
- Start fetching and executing the next instruction before the current one has completed
  - (all?) modern processors are pipelined for performance
  - Remember the performance equation: CPU time = CPI * CC * IC
- Fetch (and execute) more than one instruction at a time
  - Superscalar processing
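The performance equation can be evaluated directly. In the sketch below, CC is taken as the clock cycle time in picoseconds and IC as the instruction count; all numeric values are hypothetical and only illustrate how CPI and cycle time trade off against each other.

```python
def cpu_time_ps(cpi, cc_ps, ic):
    """CPU time = CPI * CC * IC, with the clock cycle time CC in picoseconds."""
    return cpi * cc_ps * ic


# Hypothetical designs running the same 1000 instructions
single_cycle = cpu_time_ps(cpi=1, cc_ps=800, ic=1000)    # long cycle, CPI of 1
multicycle = cpu_time_ps(cpi=3.5, cc_ps=200, ic=1000)    # short cycle, higher CPI
```

A design only wins if the product of all three factors shrinks: here the shorter cycle outweighs the higher CPI, but with an assumed CPI above 4 the multicycle design would be slower.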

Preconditions
- Instruction set design
  - Instructions (ideally) of equal length: enables fetching in the first stage and decoding in the second
  - Few instruction formats, with the source register at the same place in each instruction: the register can be read while the type of instruction is determined
  - Memory operands only in load & store
  - Aligned data: only one memory access/read operation
- Sources of problems
  - Instructions with variable length: multiple memory accesses
  - Unaligned data: multiple memory accesses for one data item

An Analogous Example (1)
- Laundry problem
  - Four processing stages (wash, dry, fold, put away)
  - Identical time per stage (30 minutes)
  - Fixed sequence of usage
  - Total time for n loads: n * 2 hours

An Analogous Example (2)
- Laundry optimization
  - Units operate independently
  - Overlapping use of resources
  - Total time for n loads: 1 * 2 hours + (n-1) * 1/2 hour; for n = 4 this is 3.5 hours
  - Average time per load of laundry: 3.5 h / 4 = 52.5 min
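The laundry arithmetic can be checked with a short calculation. The function names and default arguments are illustrative; the stage count and the 30-minute stage time come from the example above.

```python
def sequential_minutes(loads, stages=4, stage_minutes=30):
    """Each load runs through all stages before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads, stages=4, stage_minutes=30):
    """The first load needs all stages; each further load adds one stage time."""
    if loads == 0:
        return 0
    return stages * stage_minutes + (loads - 1) * stage_minutes


total = pipelined_minutes(4)        # 120 + 3 * 30 minutes
average_per_load = total / 4
```

For four loads the pipelined total is 210 minutes (3.5 hours) compared with 480 minutes sequentially, an average of 52.5 minutes per load even though each individual load still takes 2 hours.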

An Analogous Example (3)
- All stages operate concurrently
- Many tasks are being done in parallel: pipelining improves the throughput of the laundry, while the time to complete a single load (instruction) does not change (latency is not reduced)
  - Pipelining is only faster for many loads
  - Throughput is the far more important metric, because programs execute billions of instructions
- If the stages take the same amount of time, and if all stages can be used, the speedup due to pipelining is equal to the number of stages in the pipeline
  - But these are two ifs; there is a limit for the length of a pipeline beyond which no further speedup will be seen

Single Cycle Datapath
[Figure: the single cycle datapath]

Real Pipeline
- MIPS pipeline steps
  1. IF: Fetch instruction from memory
  2. ID: Read registers while decoding
  3. EX: Execute the operation or calculate an address
  4. MEM: Access an operand in data memory
  5. WB: Write back results in register
- Unequal time for steps (in ps)
- Single cycle: the cycle time depends on the slowest instruction

| Instruction class | Instruction fetch | Register read | ALU operation | Data access | Register write | Total time |
|---|---|---|---|---|---|---|
| Load Word (lw) | 200 ps | 100 ps | 200 ps | 200 ps | 100 ps | 800 ps |
| Store Word (sw) | 200 ps | 100 ps | 200 ps | 200 ps | | 700 ps |
| R-format (add, sub, and, or, slt) | 200 ps | 100 ps | 200 ps | | 100 ps | 600 ps |
| Branch (beq) | 200 ps | 100 ps | 200 ps | | | 500 ps |
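From the stage times one can derive the clock periods of both designs: the single cycle clock must cover the slowest instruction, while a pipelined clock must cover the slowest stage. The sketch below assumes an ideal five-stage pipeline without hazards or stalls; the names are illustrative.

```python
# Stage times in ps: fetch, register read, ALU, data access, register write
STAGE_PS = {
    "lw":    [200, 100, 200, 200, 100],
    "sw":    [200, 100, 200, 200, 0],
    "rtype": [200, 100, 200, 0, 100],
    "beq":   [200, 100, 200, 0, 0],
}

# Single cycle: clock period = total time of the slowest instruction
single_cycle_clock = max(sum(times) for times in STAGE_PS.values())
# Pipelined: clock period = time of the slowest stage
pipelined_clock = max(max(times) for times in STAGE_PS.values())


def single_cycle_time(n):
    """Time for n instructions when each takes one long clock cycle."""
    return n * single_cycle_clock


def pipelined_time(n, stages=5):
    """Ideal pipeline: fill the stages once, then finish one instruction per cycle."""
    return (stages + n - 1) * pipelined_clock if n else 0
```

Under these assumptions, three instructions take 2400 ps in the single cycle design and 1400 ps in the pipeline; the gap approaches a factor of four as the number of instructions grows, not five, because the stages are not perfectly balanced.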

Single Cycle vs. Pipelined Execution
[Figure: timing comparison of single cycle and pipelined execution]
- Register file write in the first half of the cycle
- Register file read in the second half of the cycle