Design of Decode, Control and Associated Datapath Units

ECE/CS 3710 - Computer Design Lab
Lab 3 - Due Date: Thu Oct 18

I. OVERVIEW

In the previous lab, you designed the ALU and hooked it up to the register files. The execute operation of your machine performs arithmetic and logical operations in the ALU: it fetches data from the register files and computes the result. So far, so good. In this lab, you will conceptualize, design and implement the decode and control logic for your CPU and integrate it with the ALU and register files to complete the datapath.

Before you begin designing the components for this lab, I urge you to first review how your ALU works and how many cycles it takes to perform the execute operation, and then decide what control signals need to be generated to complete the data flow through the CPU. Think about what extra registers or status flags (if any) you would need to keep track of instruction execution. Develop a high-level view, preferably a block diagram, and verify conceptually that all your requirements are met. Then decide how to partition the decode, control and associated program counter hardware, and proceed with the implementation. If you have already made plans for augmenting your base instruction set, make your design amenable to later modification so that the extra hardware and control can be accommodated (this is tough, I know, but you can get a feel for what you would have to modify).

In the next lab (Lab 4), you will design the interface between your CPU and the memory. For designing and validating your CPU, you may assume for now that an instruction and/or data is available in the associated registers, such as the Memory Address Register (MAR) and the Memory Data Register (MDR), as and when you need it.

II. FETCHING THE INSTRUCTION

The program counter generates the addresses used to fetch instructions.
Since the CPU will be interfaced with a memory controller, the current address has to be latched in a Memory Address Register (MAR) for the memory controller. Of course, the PC itself may act as an MAR, but this can create the following problem. Usually, the PC is incremented by one the moment an instruction is fetched. Also, when branch instructions are executed, the PC is updated only after the branch target address is computed. In the case of pipelined execution, the PC computation can get rather involved if you implement some form of prediction. Without a separate MAR, how would the PC be interfaced with the memory controller so that the current address stays latched until the corresponding instruction or data has been fetched? That's why implementing an MAR may not be a bad idea. If an MAR is indeed implemented, how would it be interfaced with the PC, and when and how would it get updated? That is the thinking part of this problem. Note that this MAR can be made part of the memory controller, but that's an issue for the next lab. To come to the point: maybe you should implement a memory address register to store the address of the current instruction or data being fetched.
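To make the MAR argument concrete, here is a minimal behavioral sketch in Python (not hardware code). The class and method names (`FetchUnit`, `start_fetch`) are hypothetical and chosen only for illustration; the 16-bit address width follows the word-addressed PC described later in this handout.

```python
MASK16 = 0xFFFF  # assumption: 16-bit word addresses, as stated for the PC in this lab


class FetchUnit:
    """Hypothetical model of a PC paired with a separate MAR."""

    def __init__(self, pc=0):
        self.pc = pc & MASK16
        self.mar = 0

    def start_fetch(self):
        """Latch the current PC into the MAR, then let the PC advance."""
        self.mar = self.pc                # address the memory controller keeps seeing
        self.pc = (self.pc + 1) & MASK16  # the PC is free to move on immediately
        return self.mar


unit = FetchUnit(pc=0x0010)
unit.start_fetch()
# The MAR still holds the address being fetched although the PC has advanced.
print(hex(unit.mar), hex(unit.pc))
```

The point of the sketch is only the ordering: the memory side reads `mar`, so a later PC update (increment or branch) cannot disturb a fetch that is still in flight.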

[Fig. 1. Block Diagram of the Memory-CPU Interface for fetch/decode stages. Labels in the figure: SDRAM; MAR; PC (PC = PC + 1 or PC = branch target address); MDR; IR; mux/buf controls for regfiles; opcodes for ALU; Data ready; write_enables for regfiles (or maybe for PC?); Data to ALU BUS/Regfile; Fetch; Decode.]

A. Program Counter Design

The program counter is a dedicated special register in the machine that holds the address of the next instruction to execute. It must support every kind of update the PC can undergo. For your machine, this means that the PC needs to be incremented by one word (the normal case), added to a (sign-extended) displacement (for branches) or loaded from a register (for jumps). Your datapath needs to be able to perform all of these operations.

If your PC is already set up to feed into your ALU (which is unlikely), then you could perform the displacement calculation by simply setting some input MUXes (or bufs) so that the PC and the immediate go to the ALU and the ALU function is set to add. You might load from a register by setting the ALU so that the appropriate register source passes through the ALU without modification. This value can then be latched into the PC. For the basic increment case, you could either put the PC through one side of the ALU and select a constant 1 for the other side (for example, by putting a constant value on one of the input muxes), or you could build your PC as a loadable counter. With the counter approach, you load the counter for the jump and displacement updates and count for the increment-the-PC function. The choice is yours. The advantage of the counter is that you may not have to use the complete datapath for each PC increment. The advantage of the increment-through-the-ALU approach is that every PC-update function goes through the same process, just with different mux settings.
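The three PC updates described above can be summarized in a small behavioral sketch. This is Python pseudomodeling, not the datapath itself; the function name `next_pc` and the mode strings are assumptions made for illustration, and the displacement is taken to be already sign-extended.

```python
MASK16 = 0xFFFF  # the PC is a 16-bit word address


def next_pc(pc, mode, disp=0, reg_value=0):
    """Return the next PC value for one of the three update modes.

    mode      -- 'inc', 'branch' or 'jump' (hypothetical names)
    disp      -- signed word displacement, already sign-extended
    reg_value -- register contents used as the jump target
    """
    if mode == "inc":
        return (pc + 1) & MASK16      # normal case: advance by one word
    if mode == "branch":
        return (pc + disp) & MASK16   # taken branch: PC plus signed displacement
    if mode == "jump":
        return reg_value & MASK16     # jump: load the PC from a register
    raise ValueError(f"unknown PC update mode: {mode}")


print(hex(next_pc(0x0100, "inc")))
print(hex(next_pc(0x0100, "branch", disp=-4)))
print(hex(next_pc(0x0100, "jump", reg_value=0x2000)))
```

Masking with `0xFFFF` mimics a 16-bit adder that discards the carry out. Whether the displacement is applied to the pre- or post-increment PC is an ISA detail; the sketch simply adds it to the value passed in.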
Remember that when you use the ALU to update the PC, you should not update the condition codes! Finally, remember that for a JAL instruction, the PC needs a path into the register file so that it can be stored in the link register. Your datapath must allow this operation.

Another issue with the PC has to do with signed and unsigned arithmetic. Recall that signed arithmetic is all done with two's complement numbers. This means that the range of numbers in a 16-bit word is -32,768 to 32,767. On the other hand, if you use those 16 bits to encode an unsigned number, you can represent 0 to 65,535 (64k distinct values). Since addresses are usually considered unsigned numbers, we need to consider what it means to have an unsigned PC that is operated on by a two's complement ALU, especially in the face of signed offsets that might require subtraction! Note that our PC addresses 16-bit words and not bytes! That way the bits in the PC are bits 0-15 of the word address, and the PC can address 64k 16-bit words.

B. Memory Data Register and Instruction Registers

When the memory controller fetches and delivers a 16-bit word, the CPU has to know whether an instruction or data was fetched. The control logic knows what is being fetched. If an instruction is being fetched, you would like to store it in a register (usually called the Instruction Register, or IR) for subsequent decode. What if it is data? Would you like to store it in the IR or have a dedicated Memory Data Register? Maybe, in your opinion, the Memory Data Register (MDR) should be inside the memory controller, and the CPU's job would be to: i) fetch the 16-bit instruction from the MDR into the IR; or ii) fetch the 16-bit data from the MDR and put it on the system bus/ALU input MUX/buf; or iii) handle it in any other way you want to implement. The choice is yours, but you need to show consistency. Synchronizing the machine operation is the main thinking issue here.

III. SIGN EXTENSION

Various instructions in our machine make use of sign-extended immediates. Recall from the instruction set handout that immediates in arithmetic operations are sign-extended from 8 bits. Logical immediate operations are zero-extended instead of sign-extended. Check the 427 handout for details.

IV. DECODING THE INSTRUCTION AND GENERATING THE CONTROL SIGNALS

In this assignment, the complete decode logic needs to be designed and completed: opcodes, MUX/buf select signals, read and write enables on the register files, and so on will have to be generated by this piece of sequential control logic.
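As a rough illustration of decode outputs and of the extension rule from Section III, the following Python sketch decodes two immediate instructions into a set of control values. The field layout (opcode in bits 15-12, destination register in bits 11-8, immediate in bits 7-0), the opcode constants and the signal names are all assumptions made for this example; take the real encodings from the instruction set handout.

```python
def sign_extend8(value):
    """Sign-extend an 8-bit field to 16 bits (used for arithmetic immediates)."""
    value &= 0xFF
    return (value | 0xFF00) if (value & 0x80) else value


def zero_extend8(value):
    """Zero-extend an 8-bit field to 16 bits (used for logical immediates)."""
    return value & 0xFF


# Hypothetical opcode values, for illustration only.
OP_ADDI = 0x5
OP_ANDI = 0x1


def decode(ir):
    """Split a 16-bit instruction word into hypothetical control signals."""
    opcode = (ir >> 12) & 0xF
    rdest = (ir >> 8) & 0xF
    imm8 = ir & 0xFF
    if opcode == OP_ADDI:   # arithmetic immediate: sign-extend
        return {"alu_op": "add", "rdest": rdest, "imm": sign_extend8(imm8),
                "use_imm": True, "reg_write": True}
    if opcode == OP_ANDI:   # logical immediate: zero-extend
        return {"alu_op": "and", "rdest": rdest, "imm": zero_extend8(imm8),
                "use_imm": True, "reg_write": True}
    raise NotImplementedError(f"opcode {opcode:#x} is not covered by this sketch")


print(decode(0x53FF))
print(decode(0x13FF))
```

The same 8-bit field `0xFF` becomes `0xFFFF` for the arithmetic case and `0x00FF` for the logical case, which is the only difference between the two extension paths.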
Even though this is the most important feature as far as your machine operation is concerned, you have already developed part of your decode logic/signals: (i) the Rsrc, Rdest read buffers and their controls in the register files; (ii) the write enable control signal; (iii) PSR updates; (iv) anything else? You need to think about what's involved in actually executing each instruction. Since you are (most likely) not building a pipelined machine, the execution model is pretty simple (see footnote 1). Basically, your control needs to sequence some actions in the machine that will result in the instruction being executed.

Footnote 1: If you want to pipeline your machine, you should discuss your plans with me and Neal. I would encourage those of you who have little software coding for their project to attempt some engineering optimizations. Pipelining is one, but be careful: it is easy to understand and difficult to implement.

Instruction Fetch: Already discussed in the previous section. To repeat, before you can execute an instruction, you need to fetch it from memory. Think about what this involves. First you have to get the current PC value into the MAR. Then you read from the SDRAM (through the sdram macro and your sdram interface state machine). The returned data should then be latched into the instruction register (on the rising edge of the clock after DONE goes high from the controller). The result of this sequence of actions is that the next instruction is in the instruction register, so you now know which instruction you are about to execute.

Instruction Decode: In this phase of execution you use the information in the instruction register to set up all the state in the control path that you need to execute the instruction. This may or may not involve a separate clock phase, depending on how your datapath is arranged. Things that get decoded include: mux settings, register file addressing, immediate fields (including sign extension or zero extension), and register enables. Of course, if the decoded value describes an instruction that needs a second word of data (i.e. some extended version of the instruction set; the baseline doesn't have any two-word instructions), you'll need to do a second fetch from the memory, but into a separate register so that you don't overwrite the current instruction. You'll need to have incremented the PC in this case too.

Instruction Execution: Now that everything is set up by the instruction decode logic, you can execute the instruction. In this non-pipelined case this simply means allowing the correct data to go through the datapath and compute a result. Make sure you understand each and every instruction. Note that loads and stores may require some extra work here because you need to tell the SDRAM controller to execute the load or store, and the SDRAM interface is still pending. Also note that a PC operation must be performed somewhere. If the instruction is not a branch or jump, the PC needs to be incremented by one 16-bit word. If it is a taken branch, then a signed offset from the instruction must be added to the PC (the offset is a word offset from the current PC), and if it's a jump, the PC must be loaded from a register. If it's a jump-and-link, then the incremented PC must be stored in the destination register. Make sure to get the details of JAL right!

Writeback to the Register File: Once the result is computed for that instruction, you need, in most cases, to write the answer back to the register file. Most of you have already set up all the relevant information in the datapath (like destination addresses and mux/buf settings), so this is probably nothing more than enabling the register file to do a write on the next cycle. Remember, register files are enabled for write before the clock trigger arrives. By the way, what about writes into the Processor Status Register? Check whether your control logic needs to access the PSR directly (read, write, or read-modify-write).

This cycle then repeats itself for each new instruction.
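The fetch, decode, execute and writeback sequence above can be viewed as a small finite state machine. The Python sketch below models only the next-state logic; the state names, the separate DECODE state and the `writes_register` flag are assumptions for illustration, since the handout leaves the exact partitioning to you.

```python
from enum import Enum, auto


class State(Enum):
    FETCH_ADDR = auto()  # PC -> MAR, start the memory read
    FETCH_WAIT = auto()  # wait for DONE, then latch the IR
    DECODE = auto()      # set up muxes, register addresses, immediates
    EXECUTE = auto()     # let data flow through the datapath, update the PC
    WRITEBACK = auto()   # write the result into the register file


def next_state(state, done=False, writes_register=True):
    """Return the next control state (hypothetical sequencing)."""
    if state is State.FETCH_ADDR:
        return State.FETCH_WAIT
    if state is State.FETCH_WAIT:
        # Stay here until the memory controller raises DONE.
        return State.DECODE if done else State.FETCH_WAIT
    if state is State.DECODE:
        return State.EXECUTE
    if state is State.EXECUTE:
        # Instructions with no register result skip the writeback state.
        return State.WRITEBACK if writes_register else State.FETCH_ADDR
    return State.FETCH_ADDR  # WRITEBACK: start the next instruction


# Walk one instruction through the machine, with DONE arriving late.
state = State.FETCH_ADDR
trace = [state]
for done in (False, False, True, False, False, False):
    state = next_state(state, done=done)
    trace.append(state)
print([s.name for s in trace])
```

Multi-cycle operations (for example a shift-by-n done one bit per cycle) would add a self-loop on EXECUTE in the same way FETCH_WAIT loops on DONE.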
V. SUGGESTIONS ON HOW TO PROCEED

Disclaimer: These are just my suggestions, mostly based on my experience. You by no means have to follow them strictly.

You may want to design an MAR, MDR, IR and PC, and try to synchronize the fetch part. Since the SDRAM interface is not ready yet, in your testbench you can emulate it by writing the PC into the MAR and fetching the instruction from the MDR into the IR. This way, you can de-link the fetch stage from the decode and execute stages. Later on, when we design the memory interface, you would have to make only minimal changes to this fetch stage and perhaps hardly any to the subsequent decoding and execution. If you mix everything in one-helluva-complex-module, re-synchronizing the CPU with the SDRAM would become almost impossible.

Moreover, try to group the instructions for decoding that: i) either perform similar execute operations; or ii) generate similar control signals. I would group the instructions with similar addressing modes together. Furthermore, keep track of how many execute cycles each instruction requires. For example, your shifter, if implemented as a register, may require n cycles for shift-by-n operations. Counters can help, but ensure their correct operation by testing corner cases, resets, roll-overs, and so on.

Implementation of the fetch part of load/store can be postponed until the next lab, when we have the SDRAM interface ready. But you can surely test the decode and control part for load/store operations. Validate each module (fetch/decode stage) separately, then put them together. Finally, you may want to use a program to test your machine. You can do that in the same way as we have been doing so far.

Another important piece of advice: first, try to implement the machine without jumps and branches. Make sure that a purely step-by-step sequential machine operation is correctly updating the PC, is setting up the right control signals at the right time, and so on. Once this is achieved, augment the hardware to include jumps and branches.

Good luck!
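The testbench emulation suggested in Section V (write the PC into the MAR, wait, then move the MDR into the IR) might look like the following behavioral sketch. It is written in Python rather than as an HDL testbench; `StubMemory`, `run_straight_line`, the DONE latency and the instruction words are all placeholders invented for illustration.

```python
MASK16 = 0xFFFF


class StubMemory:
    """Stands in for the SDRAM controller: DONE rises `latency` ticks after a read starts."""

    def __init__(self, contents, latency=2):
        self.contents = contents          # dict: word address -> 16-bit word
        self.latency = max(1, latency)    # at least one cycle, so DONE always arrives
        self.addr = 0
        self.countdown = 0
        self.mdr = 0
        self.done = False

    def start_read(self, mar):
        self.addr = mar
        self.countdown = self.latency
        self.done = False

    def tick(self):
        """Advance one clock cycle."""
        if self.countdown > 0:
            self.countdown -= 1
            if self.countdown == 0:
                self.mdr = self.contents.get(self.addr, 0)
                self.done = True


def run_straight_line(program, n_instructions, latency=2):
    """Fetch n instructions with no jumps or branches; return the final PC and a fetch log."""
    mem = StubMemory(dict(enumerate(program)), latency)
    pc, fetched = 0, []
    for _ in range(n_instructions):
        mar = pc                  # PC -> MAR
        mem.start_read(mar)
        while not mem.done:       # wait for DONE from the stub controller
            mem.tick()
        ir = mem.mdr              # MDR -> IR
        fetched.append((mar, ir))
        pc = (pc + 1) & MASK16    # sequential machine: PC advances by one word
    return pc, fetched


final_pc, log = run_straight_line([0x5101, 0x5202, 0x5303], 3)
print(final_pc, [(hex(a), hex(w)) for a, w in log])
```

Because decode and execute never touch `StubMemory` directly, swapping in the real SDRAM interface later should only affect the fetch loop, which is the de-linking the section recommends.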