Processor: Multi-Cycle Datapath & Control


Processor: Multi-Cycle Datapath & Control (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3rd Ed., Morgan Kaufmann, 2007)

COURSE CONTENTS
- Introduction
- Instructions
- Computer Arithmetic
- Performance
- Processor: Datapath
- Processor: Control
- Pipelining Techniques
- Memory
- Input/Output Devices

PROCESSOR: DATAPATH & CONTROL
- Multi-Cycle Datapath
- Multi-Cycle Control
- Additional Registers and Multiplexers

Multicycle Approach
- Break up an instruction into steps, each of which takes one clock cycle:
    - balance the amount of work done in each step
    - restrict each cycle to use only one major functional unit
- Different instructions take different numbers of cycles to complete.
- At the end of a cycle:
    - store values for use in later cycles (easiest thing to do)
    - introduce additional internal registers for such temporary storage
- Reuse functional units (reduces hardware cost):
    - use the ALU to compute addresses/results and to increment the PC
    - use a single memory for both instructions and data

Multi-Cycle Datapath: Additional Registers
- Additional internal registers:
    - Instruction register (IR): holds the current instruction
    - Memory data register (MDR): holds data read from memory
    - A register (A) & B register (B): hold register operand values read from the register file
    - ALUOut register (ALUOut): holds the output of the ALU; also serves as the memory address register (MAR)
- All registers except IR hold data only between a pair of adjacent cycles and thus do not need write control signals. IR holds the instruction until the end of that instruction's execution and hence needs a write control signal.
[Figure: multi-cycle datapath showing PC, memory, instruction register, memory data register, register file, A, B, sign extend, shift left 2, ALU, and ALUOut. Note: the jump instruction is ignored here.]

Multi-Cycle Datapath: Additional Multiplexors
- Additional multiplexors:
    - Mux for the first ALU input: selects A or PC (since the ALU is used for both address/result computation and PC increment)
    - Bigger mux for the second ALU input: needed because of two additional inputs, the constant 4 (for the normal PC increment) and the sign-extended & shifted offset field (for branch address computation)
    - Mux for the memory address input: selects the instruction address or the data address
[Figure: multi-cycle datapath with the additional multiplexors on the memory address input and on both ALU inputs. Note: the jump instruction is ignored here.]

Multi-Cycle Datapath & Control
[Figure: complete multi-cycle datapath with the control unit and its control signals]
Note the reason for each control signal; also note that the jump instruction is now included.

Control Signals for Multi-Cycle Datapath
Note:
- There are three possible sources for the value written into the PC (selected by PCSource):
    (1) the regular increment of the PC,
    (2) the conditional branch target from ALUOut,
    (3) the unconditional jump target (lower 26 bits of the instruction in IR shifted left by 2 and concatenated with the upper 4 bits of the incremented PC).
- There are two PC write control signals:
    (1) PCWrite (for the regular increment and the unconditional jump), and
    (2) PCWriteCond (lets the Zero signal cause a PC write if it is asserted during a beq instruction).
- Since one memory is used for both instructions and data, IorD is needed to select the appropriate address.
- IRWrite is needed so that the instruction is written into IR during the first cycle of the instruction (IRWrite = 1) and so that IR is not overwritten by another instruction during the later cycles of the current instruction's execution (by keeping IRWrite = 0).
- other control signals
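A minimal sketch of the PC-update logic described above, written in Python purely for illustration. The function name next_pc, its parameter names, and the 0/1/2 numbering of PCSource are assumptions made for this example (the numbering follows the order in which the three sources are listed); it models the behaviour at one clock edge and is not the hardware description itself.

```python
def next_pc(pc, alu_result, alu_out, ir, zero, pc_write, pc_write_cond, pc_source):
    """Return the PC value after the clock edge (hypothetical helper).

    pc_source (assumed encoding): 0 = ALU result (PC + 4),
                                  1 = ALUOut (branch target),
                                  2 = jump target.
    """
    # Jump target: upper 4 bits of the (already incremented) PC
    # concatenated with the 26-bit field of IR shifted left by 2.
    jump_target = (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    candidates = {0: alu_result, 1: alu_out, 2: jump_target}

    # The PC is written if PCWrite is asserted, or if PCWriteCond is
    # asserted and the ALU Zero output is 1 (beq with equal operands).
    write_enable = pc_write or (pc_write_cond and zero)
    return candidates[pc_source] if write_enable else pc
```

For example, with PCWriteCond asserted and Zero = 1 the PC takes the value in ALUOut, while with Zero = 0 it keeps its current value.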

Breaking the Instruction into 3-5 Execution Steps
1. Instruction Fetch (all instructions)
2. Instruction Decode (all instructions), Register Fetch & Branch Address Computation (in advance, just in case)
3. ALU (R-type) Execution, Memory Address Computation, or Branch Completion (instruction dependent)
4. Memory Access or R-type Instruction Completion (instruction dependent)
5. Memory Read Completion (only for lw)
- At the end of every clock cycle, needed data must be stored into register(s) or memory location(s).
- Each step (which can consist of several parallel operations) takes 1 clock cycle --> instructions take 3 to 5 cycles!
[Figure: clock timing diagram showing events during a cycle, e.g.: data ready, operation, clock in result]

Step 1: Instruction Fetch
- Use the PC to get the instruction (from memory) and put it in the Instruction Register.
- Increment the PC by 4 and put the result back in the PC.
- Can be described succinctly using RTL ("Register-Transfer Language"):
    IR <= Memory[PC];
    PC <= PC + 4;
- Which control signals need to be asserted?
    IorD = 0, MemRead = 1, IRWrite = 1,
    ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCWrite = 1, PCSource = 00
- Why can the instruction read & PC update be in the same step? Look at state element timing.
- What is the advantage of updating the PC now?
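The fetch step can be sketched in Python as follows. This is only an illustrative model: the dictionary-based state and memory and the function name instruction_fetch are assumptions of the example. It shows why both transfers fit in one step: each right-hand side is evaluated with the old PC value, and both registers change together at the clock edge.

```python
def instruction_fetch(state, memory):
    """Model of step 1 (hypothetical helper).

    state:  dict with keys "PC" and "IR"
    memory: dict mapping byte addresses to 32-bit words
    Returns a new state; the old one is left untouched, mimicking
    edge-triggered registers that all update at the same instant.
    """
    new_state = dict(state)
    new_state["IR"] = memory[state["PC"]]               # IR <= Memory[PC]
    new_state["PC"] = (state["PC"] + 4) & 0xFFFFFFFF    # PC <= PC + 4
    return new_state


# Usage: fetch the word stored at address 0x00400000.
before = {"PC": 0x00400000, "IR": 0}
after = instruction_fetch(before, {0x00400000: 0x8D6A0000})
```

Because the new state is built only from the old one, the order of the two assignments inside the function does not matter, just as in the RTL.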

Step 2: Instruction Decode, Reg. Fetch, & Branch Addr. Comp.
- In this step, we decode the instruction in IR (the opcode enters the control unit in order to generate control signals). In parallel, we can:
    - read registers rs and rt, just in case we need them
    - compute the branch address, just in case the instruction is a branch (beq)
- RTL:
    A <= Reg[IR[25:21]];
    B <= Reg[IR[20:16]];
    ALUOut <= PC + (sign-extend(IR[15:0]) << 2);
- Control signals: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00 (add)
- Note: no explicit control signals are needed to write A, B, & ALUOut. They are written automatically by the clock transition at the end of the step.
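As an illustration of the RTL for this step, the sketch below extracts the rs and rt fields and computes the speculative branch target. It assumes the standard MIPS field positions (rs in bits 25:21, rt in bits 20:16, the offset in bits 15:0); the function names sign_extend16 and decode_step and the list-based register file are assumptions of the example.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_step(ir, pc, regs):
    """Model of step 2 (hypothetical helper).

    ir:   32-bit instruction word held in IR
    pc:   current PC (already incremented in step 1)
    regs: list of 32 register values
    Returns the values latched into A, B and ALUOut.
    """
    rs = (ir >> 21) & 0x1F          # IR[25:21]
    rt = (ir >> 16) & 0x1F          # IR[20:16]
    a = regs[rs]                    # A <= Reg[rs]
    b = regs[rt]                    # B <= Reg[rt]
    # ALUOut <= PC + (sign-extend(IR[15:0]) << 2), wrapped to 32 bits
    alu_out = (pc + (sign_extend16(ir) << 2)) & 0xFFFFFFFF
    return a, b, alu_out
```

The branch target is computed for every instruction; it is simply ignored later if the instruction turns out not to be a branch.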

Step 3: Instruction-Dependent Operation
One of four functions, based on instruction type:
- Memory address computation (for lw, sw):
    ALUOut <= A + sign-extend(IR[15:0]);
    Control signals: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- ALU (R-type):
    ALUOut <= A op B;
    Control signals: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
- Conditional branch:
    if (A == B) PC <= ALUOut;
    Control signals: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01 (sub), PCSource = 01, PCWriteCond = 1 (to enable Zero to write the PC if it is 1)
    What is the content of ALUOut during this step? Immediately after this step?
- Jump:
    PC <= PC[31:28] || (IR[25:0] << 2);
    Control signals: PCSource = 10, PCWrite = 1
Note: conditional branch & jump instructions are completed at this step!
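The four alternatives of this step can be sketched as one dispatch function. This is a simplified model under stated assumptions: the string labels "mem", "rtype", "beq" and "jump", the function name execute_step, and the alu_op callable standing in for the R-type operation are all made up for the example. Standard MIPS field positions (offset in bits 15:0, jump field in bits 25:0) are assumed.

```python
MASK32 = 0xFFFFFFFF


def execute_step(kind, a, b, ir, pc, alu_out, alu_op=lambda x, y: x + y):
    """Model of step 3 (hypothetical helper).

    Returns a dict of the registers written at the end of the step.
    alu_out is the value latched in step 2 (the branch target).
    """
    imm = ir & 0xFFFF
    if imm & 0x8000:                       # sign-extend IR[15:0]
        imm -= 0x10000

    if kind == "mem":                      # lw / sw address computation
        return {"ALUOut": (a + imm) & MASK32}
    if kind == "rtype":                    # ALUOut <= A op B
        return {"ALUOut": alu_op(a, b) & MASK32}
    if kind == "beq":
        diff = (a - b) & MASK32            # ALU subtracts; Zero = (diff == 0)
        written = {"ALUOut": diff}         # ALUOut has no write enable
        if diff == 0:
            written["PC"] = alu_out        # PC <= old ALUOut (branch target)
        return written
    if kind == "jump":                     # PC <= PC[31:28] || (IR[25:0] << 2)
        return {"PC": (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)}
    raise ValueError(f"unknown instruction kind: {kind}")
```

The beq case also hints at the question about ALUOut: during the step it still holds the branch target computed in step 2, and since it has no write control it holds A - B immediately afterwards.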

Step 4: Memory Access or ALU (R-type) Instruction Completion
- For lw or sw instructions (access memory):
    MDR <= Memory[ALUOut];
    or
    Memory[ALUOut] <= B;
    Control signals (for lw): IorD = 1 (to select ALUOut as the address), MemRead = 1. Note that no write signal is needed for writing to MDR; it is written automatically by the clock transition at the end of the step.
    Control signals (for sw): IorD = 1 (to select ALUOut as the address), MemWrite = 1
- For ALU (R-type) instructions (write result to register):
    Reg[IR[15:11]] <= ALUOut;
    Control signals: RegDst = 1 (to select the register address), MemtoReg = 0, RegWrite = 1
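A small sketch of the memory-access alternative of this step, for illustration only. The function name memory_access_step, the "lw"/"sw" labels, and the dictionary used as memory are assumptions of the example; in both cases the address comes from ALUOut, which is what IorD selects in this step.

```python
def memory_access_step(kind, memory, alu_out, b, mdr):
    """Model of the lw/sw part of step 4 (hypothetical helper).

    memory:  dict mapping addresses to words
    alu_out: address computed in step 3
    b:       value of register rt latched in step 2
    Returns (new_memory, new_mdr).
    """
    new_memory = dict(memory)
    if kind == "lw":
        mdr = memory[alu_out]          # MDR <= Memory[ALUOut]
    elif kind == "sw":
        new_memory[alu_out] = b        # Memory[ALUOut] <= B
        # MDR is not used by sw; it is left unchanged in this sketch.
    else:
        raise ValueError(f"not a memory instruction: {kind}")
    return new_memory, mdr
```

A store finishes here, whereas a load still needs one more step to move MDR into the register file.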

Step 5: Memory Read Completion
- For the lw instruction only (write data from MDR to register):
    Reg[IR[20:16]] <= MDR;
    Control signals: RegDst = 0 (to select the register address), MemtoReg = 1, RegWrite = 1
Note: the lw instruction is completed at this step!
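The register write used by R-type completion (step 4) and by lw completion (step 5) differs only in the two mux selects. The sketch below models that choice; the function name register_write and the list-based register file are assumptions of the example, and the field positions (rd in bits 15:11, rt in bits 20:16) are the standard MIPS ones.

```python
def register_write(regs, ir, alu_out, mdr, reg_dst, mem_to_reg, reg_write):
    """Model of the register-file write port (hypothetical helper).

    reg_dst:    1 selects rd = IR[15:11], 0 selects rt = IR[20:16]
    mem_to_reg: 1 selects MDR as the data, 0 selects ALUOut
    reg_write:  nothing is written unless this is asserted
    Returns a new list of register values.
    """
    new_regs = list(regs)
    if not reg_write:
        return new_regs
    dest = (ir >> 11) & 0x1F if reg_dst else (ir >> 16) & 0x1F
    value = mdr if mem_to_reg else alu_out
    if dest != 0:                      # MIPS register $zero always reads as 0
        new_regs[dest] = value & 0xFFFFFFFF
    return new_regs
```

With RegDst = 1 and MemtoReg = 0 this performs the R-type write; with RegDst = 0 and MemtoReg = 1 it performs the lw write.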

Summary of Execution Steps

Step 1, Instruction fetch (all instructions):
    IR <= Memory[PC]
    PC <= PC + 4
Step 2, Instruction decode / register fetch / branch address computation (all instructions):
    A <= Reg[IR[25:21]]
    B <= Reg[IR[20:16]]
    ALUOut <= PC + (sign-extend(IR[15:0]) << 2)
Step 3, Execution, address computation, branch/jump completion:
    R-type:            ALUOut <= A op B
    Memory reference:  ALUOut <= A + sign-extend(IR[15:0])
    Branch:            if (A == B) then PC <= ALUOut
    Jump:              PC <= PC[31:28] || (IR[25:0] << 2)
Step 4, Memory access or R-type completion:
    R-type:            Reg[IR[15:11]] <= ALUOut
    Load:              MDR <= Memory[ALUOut]
    Store:             Memory[ALUOut] <= B
Step 5, Memory read completion:
    Load:              Reg[IR[20:16]] <= MDR

Some instructions take fewer cycles than others, so the next instruction can start earlier. Hence, compared to the single-cycle implementation, where all instructions take the same amount of time, the multi-cycle implementation is faster!
Multi-cycle implementation also reduces hardware cost (reduces adders & memory, increases number of

Simple Questions
How many cycles will it take to execute this code?
    lw  $t2, 0($t3)
    lw  $t3, 4($t3)
    beq $t2, $t3, Label    # assume not taken
    add $t5, $t2, $t3
    sw  $t5, 8($t3)
Label: ...
What is going on during the 8th cycle of execution?
In what cycle does the actual addition of $t2 and $t3 take place?
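One way to check answers to these questions is to lay the instructions out cycle by cycle. The sketch below is only an aid under stated assumptions: the table STEPS_PER_INSTRUCTION encodes the 3 to 5 step counts from the execution-step summary (5 for lw, 4 for sw and R-type, 3 for beq and jump), the branch is assumed not taken, and the function name schedule is made up for the example.

```python
# Steps (= cycles) per instruction in the multi-cycle implementation.
STEPS_PER_INSTRUCTION = {"lw": 5, "sw": 4, "add": 4, "beq": 3, "j": 3}


def schedule(program):
    """Return a list of (cycle, instruction index, mnemonic, step number).

    Instructions run strictly one after another; each step takes one cycle.
    """
    timeline = []
    cycle = 0
    for index, mnemonic in enumerate(program):
        for step in range(1, STEPS_PER_INSTRUCTION[mnemonic] + 1):
            cycle += 1
            timeline.append((cycle, index, mnemonic, step))
    return timeline


program = ["lw", "lw", "beq", "add", "sw"]   # the code sequence above
timeline = schedule(program)

total_cycles = len(timeline)                 # 5 + 5 + 3 + 4 + 4
eighth_cycle = timeline[7]                   # entry for cycle 8
add_execute_cycle = next(
    c for c, _, m, s in timeline if m == "add" and s == 3
)
print(total_cycles, eighth_cycle, add_execute_cycle)
# prints: 21 (8, 1, 'lw', 3) 16
```

Under these assumptions the sequence takes 21 cycles, cycle 8 is step 3 (memory address computation) of the second lw, and the addition of $t2 and $t3 happens in cycle 16, which is step 3 of the add.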