The overall datapath for R-type, lw, sw, and beq instructions

Designing the Main Control Unit

Remember the three instruction classes (R-type, Memory, Branch):

a) R-type: Op (bits 31-26), rs (25-21, source 1), rt (20-16, source 2), rd (15-11, destination), shamt (10-6), funct (5-0)
b) Memory: Op (bits 31-26), rs (25-21, base), rt (20-16, destination/source), offset (15-0)
c) Branch: Op (bits 31-26), rs (25-21, source 1), rt (20-16, source 2), offset (15-0)

Observations:
- Bits 31-26 are always the opcode field.
- The rs and rt fields always specify the two registers to be read (R-type, beq, sw).
- The base register for sw and lw is always in the rs field.
- The 16-bit offset value is always in bits 15-0 for beq, lw, and sw.
- For lw, the destination register is in the rt field, whereas for R-type it is in the rd field. We need a MUX to select the appropriate destination register field. This new MUX is placed in front of the write register (wr) input of the register file.

For the 4 MUXes, we need 4 select control lines. We also need write control inputs for the register file and the data memory, and a read control input for the data memory. We shall also show the ALU control block and its connection to the ALU.

[Figure: The overall datapath for R-type, lw, sw, and beq instructions]
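The field positions and the destination-register MUX described above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function names and the select signal name `reg_dst` are assumptions, and the example words use the standard MIPS encodings for lw (opcode 35) and add (opcode 0, funct 32).

```python
def decode_fields(word):
    """Split a 32-bit instruction word into its fixed-position fields."""
    return {
        "op":     (word >> 26) & 0x3F,  # bits 31-26, always the opcode
        "rs":     (word >> 21) & 0x1F,  # bits 25-21
        "rt":     (word >> 16) & 0x1F,  # bits 20-16
        "rd":     (word >> 11) & 0x1F,  # bits 15-11 (R-type only)
        "shamt":  (word >> 6) & 0x1F,   # bits 10-6  (R-type only)
        "funct":  word & 0x3F,          # bits 5-0   (R-type only)
        "offset": word & 0xFFFF,        # bits 15-0  (lw, sw, beq)
    }


def write_register(fields, reg_dst):
    """Destination MUX: rd when reg_dst is 1 (R-type), rt when it is 0 (lw).

    The name reg_dst is an assumed label for the MUX select line.
    """
    return fields["rd"] if reg_dst else fields["rt"]


# lw $t0, 4($sp): op=35, rs=29 ($sp), rt=8 ($t0), offset=4
lw_word = (35 << 26) | (29 << 21) | (8 << 16) | 4
# add $t2, $t0, $t1: op=0, rs=8, rt=9, rd=10 ($t2), funct=32
add_word = (8 << 21) | (9 << 16) | (10 << 11) | 32

lw_fields = decode_fields(lw_word)
add_fields = decode_fields(add_word)
print(write_register(lw_fields, 0), write_register(add_fields, 1))
```

The same two fields (rt and rd) are always extracted; only the MUX select decides which one reaches the write register input.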

The main control unit asserts output signals to control the operation of the sequential and combinational blocks at every clock pulse. We have introduced all control inputs and outputs of the circuit previously. These signals are summarized in the following table.

In this truth table the reset input is independent of the opcode and only asserts the PCClr output. Consequently, the table can be decomposed into two parts: reset and execution.

With the truth table above, the control unit asserts the Branch output every time a beq instruction is executed, independent of ALU-Zero. In this case, a circuit in the datapath tests ALU-Zero and generates PCSrc=1 only when both Branch and ALU-Zero are high.

To eliminate the branch circuit from the datapath, the controller must have an ALU-Zero input and a PCSrc output, as specified in the following table. In this case, the controller asserts the PCSrc output directly, depending on the ALU-Zero input. The truth table for this case is shown below.
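The two alternatives for generating PCSrc can be compared with a small sketch. It is an illustration under stated assumptions: the function names are hypothetical, only the branch-related signals are modelled, and the opcodes are the standard MIPS values (beq = 000100).

```python
BEQ_OPCODE = 0b000100  # standard MIPS opcode for beq


def control_branch(opcode):
    """Variant 1 controller: Branch depends on the opcode only."""
    return 1 if opcode == BEQ_OPCODE else 0


def datapath_pcsrc(branch, alu_zero):
    """Variant 1 datapath circuit: PCSrc = Branch AND ALU-Zero."""
    return branch & alu_zero


def control_pcsrc(opcode, alu_zero):
    """Variant 2 controller: ALU-Zero is an input, PCSrc is an output."""
    return 1 if (opcode == BEQ_OPCODE and alu_zero == 1) else 0


def variants_agree():
    """Check both variants for R-type, lw, sw, beq and both Zero values."""
    opcodes = [0b000000, 0b100011, 0b101011, BEQ_OPCODE]
    return all(
        datapath_pcsrc(control_branch(op), zero) == control_pcsrc(op, zero)
        for op in opcodes
        for zero in (0, 1)
    )


print(variants_agree())
```

Both variants produce the same PCSrc; the difference is only whether the AND of Branch and ALU-Zero lives in the datapath or inside the controller.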

Datapath modification for an unconditional jump

Instruction format for jump: Op (bits 31-26, value 2), address (bits 25-0)

jump branches unconditionally to a 32-bit address, which is obtained as follows:
- bits 31-28: from the current PC
- bits 27-2: from the jump instruction (26 bits)
- bits 1-0: 00

Modification required on the datapath: a new MUX is needed to select either the jump address or the previous next-PC value (branch target address or PC+4). The highest four bits of the PC (PC31-28) remain unchanged.
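The way the jump address is assembled can be sketched as follows. The function names and the `jump` select signal are assumptions made for illustration; the bit positions follow the description above.

```python
def jump_target(pc, instruction):
    """Build the 32-bit jump address from the PC and the jump instruction."""
    address26 = instruction & 0x03FFFFFF  # bits 25-0 of the instruction
    upper4 = pc & 0xF0000000              # bits 31-28 taken from the PC
    return upper4 | (address26 << 2)      # bits 1-0 become 00


def select_next_pc(branch_or_pc4, jump_addr, jump):
    """New MUX: pick the jump address when the (assumed) jump signal is 1."""
    return jump_addr if jump else branch_or_pc4


# j with address field 0x100000, executed while the PC is 0x40000008
j_word = (2 << 26) | 0x100000
target = jump_target(0x40000008, j_word)
print(hex(target))
```

Because the 26-bit field is shifted left by two, a jump can reach any word-aligned address inside the 256 MB region selected by the upper four PC bits.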

Problems with the single-cycle implementation

Since every instruction is executed in one clock cycle of the same length, CPI = 1. The length of that clock cycle is determined by the longest path in the design. This longest path is the one used by the load instruction, which sequentially uses:
- the instruction memory
- the register file
- the ALU
- the data memory, and
- the register file again

Note that some of the instructions considered could fit in shorter clock cycles. The shortest is the jump instruction.

Assume the operation times (access times) of the major functional units for this implementation are given as:

[Table: operation times of the major functional units]

As expected, the instruction that takes the longest is lw, which requires 40 ns to complete.

[Table: Minimum processing time for each instruction of the single-cycle processor]

That is, if all instructions must run in one clock cycle, the processor needs 40 ns to complete the lw instruction, and thus the processor clock cycle must be at least 40 ns.

Ex: Let us find out which of the following implementations would be faster:
a) The single-clock-cycle implementation (of fixed length)
b) The variable-length implementation with clock cycle = 10 ns, in which each instruction takes several clock cycles

Instruction mix of the test program (fractions as used in the CPI calculation below):
  Fraction of instructions:  49%  22%  11%  16%  2%
  Number of cycles:          3    4    4    3    2

Answer (single-cycle and multi-cycle cases for the implemented datapath)

Instruction-execution-time = CPI × clock-cycle-time

Compute the CPU-execution-time for each case:
CPU-execution-time = Instruction-count × CPI × clock-cycle-time

a) For the single-clock-cycle implementation: clock cycle time = 40 ns, CPI = 1
   CPU-execution-time = I × 1 × 40 ns = 40 I ns

b) For the multi-clock-cycle implementation the average CPI is:
   CPI = Σ (CPIk × Instruction-countk) / Instruction-count
   CPI = (3 × 0.49) + (4 × 0.22) + (4 × 0.11) + (3 × 0.16) + (2 × 0.02) = 3.31
   CPU-execution-time = I × 3.31 × 10 ns = 33.1 I ns

The performance ratio of the two implementations is:
   40 I ns / 33.1 I ns ≈ 1.21

The variable-clock implementation performs 1.21 times better than the fixed single-clock-cycle implementation.

>> If floating-point operations or a more complicated instruction set are considered in the design, the single-clock-cycle design would need an extremely slow clock. So, instead, use implementation techniques that have a shorter clock cycle and require multiple clock cycles for each instruction.
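The arithmetic of this example can be checked with a short script. The variable names are illustrative; the fractions, cycle counts, and clock periods are the ones used in the example, and the instruction count I is factored out, so the times are per instruction.

```python
# (fraction of instructions, cycles) pairs taken from the example
MIX = [(0.49, 3), (0.22, 4), (0.11, 4), (0.16, 3), (0.02, 2)]

SINGLE_CYCLE_NS = 40.0  # clock period of the single-cycle design
MULTI_CYCLE_NS = 10.0   # clock period of the multi-cycle design


def average_cpi(mix):
    """Weighted average of cycles per instruction over the mix."""
    return sum(fraction * cycles for fraction, cycles in mix)


# Execution time per instruction (divide CPU-execution-time by I)
single_time = 1 * SINGLE_CYCLE_NS
multi_time = average_cpi(MIX) * MULTI_CYCLE_NS
ratio = single_time / multi_time

print(f"average CPI = {average_cpi(MIX):.2f}")
print(f"time per instruction: {single_time:.1f} ns vs {multi_time:.1f} ns")
print(f"performance ratio = {ratio:.2f}")
```

Changing the mix shows how sensitive the result is: the multi-cycle design only wins while the average CPI stays below 4, since 4 × 10 ns equals the 40 ns single cycle.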

SUMMARY for Single Cycle Processor

Advantages
- Single cycle per instruction makes logic and clock simple

Disadvantages
- Inefficient utilization of memory and functional units, since different instructions take different lengths of time
  - The ALU only computes values a small amount of the time
- Cycle time is set by the worst-case path, which leads to long cycle times
  - Load instruction: PC CLK-to-Q + instruction memory access time + register file access time + ALU delay + data memory access time + register file setup time
- All machines would have a CPI of 1

Fixing the Single Cycle: Multicycle implementation
- Divide each instruction into a series of steps
- Each step will take one clock cycle
- Different instructions can now have different CPI
- Requires a few significant changes to organization
- Use registers to separate each stage

Advantages
- Shorter cycle time
- Simple instructions are executed in a short period of time
- Variable cycles per instruction, so execution is no longer restricted to the worst case
- Functional units can be used more than once per instruction
- Less hardware required to implement the processor

Disadvantages
- Requires additional registers to store values between stages
- More timing paths to design, analyze, and tune

Multiple-Cycle Implementation

The single-clock-cycle implementation requires distinct memory units for instruction fetch and data access. A simplified representation of the data processing path starts with the instruction memory and terminates with the data memory. This datapath has four main sections, which require almost the same processing time.

[Figure: A simplified single clock datapath can be divided into four main sections]

In dividing the datapath, we take care that no datapath element lies across the boundary between two sections, and that each section has almost the same processing time.

Benefits of multi-cycle processing
- Divide the execution into steps; each step takes one clock cycle to complete.
- A functional unit can be used more than once per instruction, provided that it is used on different clock cycles (reduced hardware).
- Different instructions may take different numbers of clock cycles (flexibility).

[Figure: Simplified multi-clock-cycle datapath]

ALU usage during each cycle of multi-clock-cycle processing

At the first cycle:
- The ALU calculates PC+4.

At the second cycle:
- The ALU calculates the branch target address, PC+4 + (imm × 4), while the operands are not yet ready at the register-file outputs.

During the first and second cycles the instruction is not yet decoded, and the datapath performs the same operations for all instructions.

At the third cycle:
- For R-format instructions, the ALU performs the operation specified by the instruction.
- For the base-addressing instructions lw and sw, the memory address is calculated.
- For the branch instruction, the ALU compares the contents of the source registers.

At the fourth cycle:
- For R-format instructions, the ALU result is stored into the destination register.
- For the sw instruction, the memory write occurs.
- For the lw instruction, the memory read occurs.

At the fifth cycle:
- For the lw instruction, the memory contents are stored into the destination register.
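The cycle-by-cycle description above can be turned into a small lookup sketch. Two things here are assumptions rather than statements from the slides: the immediate is sign-extended before it is multiplied by 4 (the usual MIPS behaviour), and an instruction is taken to finish in the last cycle in which the list above gives it something to do.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed value (assumed MIPS behaviour)."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """Second-cycle ALU result: PC+4 + (imm x 4), wrapped to 32 bits."""
    return (pc + 4 + sign_extend16(imm16) * 4) & 0xFFFFFFFF


# Work listed for each cycle; "all" marks steps common to every instruction.
STEPS = {
    1: {"all": "ALU computes PC+4"},
    2: {"all": "ALU computes branch target address"},
    3: {"R-format": "ALU operation", "lw": "address calculation",
        "sw": "address calculation", "beq": "compare source registers"},
    4: {"R-format": "write ALU result to register", "sw": "memory write",
        "lw": "memory read"},
    5: {"lw": "write memory data to register"},
}


def cycles_for(kind):
    """Last cycle in which the given instruction class has work to do."""
    return max(c for c, work in STEPS.items() if kind in work or "all" in work)


for kind in ("R-format", "lw", "sw", "beq"):
    print(kind, cycles_for(kind))
```

Under these assumptions beq finishes in three cycles, R-format and sw in four, and lw in five, which is what makes the CPI differ between instruction classes.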